A Study of Parallel Processing and Noise Management in Machine Learning
Learning general concepts from a set of training instances has become increasingly important to artificial intelligence researchers constructing knowledge-based systems. It offers a good way to build a prototype knowledge base quickly, avoiding the bottleneck of knowledge acquisition. Symbolic learning strategies can usually be divided, according to the way they process training instances, into two classes: batch learning strategies and incremental learning strategies. Neural learning, like symbolic learning, is another interesting topic in A.I. No matter which strategy is adopted, however, its efficiency is limited by its learning speed and its validity is limited by noise in the training set.

In the first part of this thesis, we study the feasibility of parallel machine learning. Techniques of parallel processing are applied to concept learning to overcome the problem of low learning speed. Three parallel learning models, based respectively on partitioning the learning task across multiple processors, on the principle of divide-and-conquer, and on avoiding unnecessary checking time, are proposed for batch learning, incremental learning, and neural learning. The ID3, version space, and perceptron learning methods are parallelized to show how these three parallel learning models work.

Moreover, the validity and relevance of the finally learned concepts depend heavily on the accuracy of the chosen training instances, and the data provided to learning systems in real applications usually contain noise. Modifying the traditional learning methods to work well in noisy environments is therefore very important. In the second part of this thesis, the ID3, version space, and perceptron learning methods are generalized for this purpose.
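As a concrete reminder of one of the base methods named above, the following is a minimal sketch of standard perceptron learning, the sequential starting point that the thesis parallelizes and generalizes. The function name, learning rate, and toy data set here are illustrative, not taken from the thesis itself.

```python
# Minimal perceptron learner: repeatedly corrects the weights on
# misclassified instances until the training set is consistent
# (or a maximum number of epochs is reached).
def train_perceptron(instances, lr=1.0, max_epochs=100):
    n = len(instances[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in instances:          # each y is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score > 0 else -1
            if pred != y:               # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                 # found a separating hyperplane
            break
    return w, b

# Toy linearly separable training set (logical AND)
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

Because each weight update depends only on one misclassified instance, the inner loop over instances is a natural target for the kind of task partitioning across processors described above.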
Each generalized method possesses some of the following additional capabilities: managing uncertain training instances, taking the differing importance of training instances into consideration, utilizing available prior domain knowledge to guide the learning process, making a trade-off between including positive training instances and excluding negative ones, and decreasing the time complexity of learning at the expense of only a little accuracy. The conventional version space learning algorithm is also generalized to find disjunctive concepts incrementally.

Finally, two-phase learning is designed to effectively solve learning problems in which training instances arrive in two stages. Machine learning in real-world situations usually starts from an initial collection of training instances; learning then proceeds off and on as new training instances arrive intermittently. Applying only batch learning methods or only incremental learning methods cannot effectively and correctly derive the rules when training instances arrive in this two-stage way. Two-phase learning methods, which integrate batch learning methods with incremental learning methods, are apparently more suitable for this kind of learning problem.

In summary, we hope the ideas proposed in this thesis provide some principles for parallel machine learning, noise management, and the integration of different learning methods. More effort is of course needed, since the proposed models and methods still cannot fit all learning strategies.
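The two-stage setting described above can be made concrete with a small skeleton: a batch phase builds an initial hypothesis from the starting collection, and an incremental phase then refines that hypothesis as each new instance arrives. The class and the running-mean example below are hypothetical illustrations of this integration pattern, not the thesis's actual algorithms.

```python
# Hypothetical two-phase learner: phase 1 runs a batch learner on the
# initial collection; phase 2 applies an incremental update per new instance.
class TwoPhaseLearner:
    def __init__(self, batch_learn, incremental_update):
        self.batch_learn = batch_learn                # f(instances) -> hypothesis
        self.incremental_update = incremental_update  # f(hypothesis, instance) -> hypothesis
        self.hypothesis = None

    def initial_phase(self, instances):
        self.hypothesis = self.batch_learn(instances)

    def on_new_instance(self, instance):
        self.hypothesis = self.incremental_update(self.hypothesis, instance)

# Toy instantiation: the "hypothesis" is a running mean with its count,
# so the incremental phase can update it without revisiting old data.
def batch_mean(instances):
    return (sum(instances) / len(instances), len(instances))

def incremental_mean(hyp, x):
    mean, n = hyp
    return ((mean * n + x) / (n + 1), n + 1)

learner = TwoPhaseLearner(batch_mean, incremental_mean)
learner.initial_phase([1.0, 2.0, 3.0])  # batch phase: mean 2.0 over 3 instances
learner.on_new_instance(6.0)            # incremental phase: mean becomes 3.0
```

The design point is that the hypothesis must carry whatever state (here, the instance count) the incremental phase needs, so that later instances can be absorbed without rerunning the batch learner.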