To improve predictive accuracy on a given problem, it is common to combine multiple versions of a prediction algorithm rather than to refine a single algorithm in isolation. This method of combining the predictions of multiple versions of a single algorithm on a single problem is known as ensemble learning. Ensemble learning methods have been a major area of research in several disciplines, such as statistical data mining, pattern recognition, and machine learning. An ensemble method is constructed in one of two ways: either by training the component classifiers in parallel or by training them sequentially. In this thesis, our main focus is on parallel ensemble methods.
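To make the distinction concrete, the following is a minimal sketch of a parallel (bagging-style) ensemble; the choice of scikit-learn decision trees, majority voting, and the parameter names here are illustrative assumptions, not a prescription from the thesis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_parallel_ensemble(X, y, n_estimators=25, random_state=0):
    """Train each component classifier independently on a bootstrap resample."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    members = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # resampling with replacement
        members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return members

def predict_parallel_ensemble(members, X):
    """Aggregate component predictions by majority vote
    (assumes integer-coded class labels)."""
    votes = np.stack([clf.predict(X) for clf in members])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

Because every component is trained independently of the others, the loop parallelizes trivially; a sequential (boosting-style) ensemble would instead make each component depend on the errors of its predecessors.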
The success of ensemble methods has been analyzed from several viewpoints. The first is the maximum-margin viewpoint: an ensemble method is shown to maximize the margin of the classifier, thereby improving its generalization ability. The second comes from the classical bias-variance decomposition of the misclassification error: ensemble methods are shown to reduce the bias and/or the variance of the classifier, and hence to improve generalization. Besides these two approaches to deciphering the mechanism of ensemble methods, there is a third, which concerns the stability of the component algorithm of an ensemble. In this approach, an ensemble method is shown to be effective only if it is more stable than its component classifiers, which approximately means that its generalization ability improves. Following this approach, it is not feasible to construct conventional ensemble methods (more specifically, parallel ensemble methods) from stable component classifiers. However, the main advantage of stable classifiers over their unstable counterparts is that their variance is lower.
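For reference, the classical decomposition invoked above can be written, for squared error with y = f(x) + ε, Var(ε) = σ², and a predictor f̂ trained on a random sample D, as the standard textbook identity:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x;D)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}(x;D)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}(x;D) - \mathbb{E}_D[\hat{f}(x;D)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Averaging independently trained components attacks mainly the variance term, which is why parallel ensembles have traditionally favored unstable (high-variance) base learners, and why using stable (low-variance) ones requires the new designs proposed here.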
In this thesis, our main objective is to construct efficient parallel ensemble methods using stable classifiers, exploiting their low-variance property. For this purpose, we propose two ensemble designs that utilize stable classifiers. The first design follows the third approach mentioned above. In this design, we propose to control the stability of the component classifiers, so that the constructed ensemble gains generalization ability when the components are aggregated (combined). In this type of ensemble, we insert an additional selection/validation step to retain only those component stable classifiers whose generalization ability meets a certain threshold. The selected classifiers are then combined using robust statistics, so that the final ensemble is resistant to unusual output from any of the component classifiers.
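As an illustration of the selection-then-robust-combination step, the sketch below validates each component against a threshold and aggregates with the median; the accuracy threshold, the median as the robust statistic, and numeric (regression- or score-valued) outputs are all assumptions made for the example.

```python
import numpy as np

def select_and_combine(members, X_val, y_val, X_test, min_accuracy=0.6):
    """Keep only components that pass a validation threshold, then
    combine their predictions with the median, a robust statistic."""
    kept = [clf for clf in members
            if np.mean(clf.predict(X_val) == y_val) >= min_accuracy]
    preds = np.stack([clf.predict(X_test) for clf in kept])
    # The median is unaffected by a single component's aberrant output,
    # unlike the mean.
    return np.median(preds, axis=0)
```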
The second design follows the second approach stated above. In this design, we propose to enlarge the feature space of the component base classifiers, so that the bias and the variance of the constructed ensemble are reduced simultaneously. In this type of ensemble, we increase the representational power of the component base classifiers by adding extra features derived from the outputs of additional stable classifiers; in this way the bias can be reduced, while the constitutive steps of the underlying parallel ensemble method also reduce the variance.
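The feature-enlargement idea resembles stacking: the outputs of a stable learner are appended as extra columns before the parallel ensemble is trained on the enlarged matrix. The use of linear discriminant analysis as the stable learner below is purely illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def enlarge_features(X_train, y_train, X):
    """Append the class-probability outputs of a stable classifier
    as extra features alongside the original ones."""
    stable = LinearDiscriminantAnalysis().fit(X_train, y_train)
    return np.hstack([X, stable.predict_proba(X)])
```

The base classifiers then see a richer representation (lowering bias), while the resample-and-aggregate machinery of the parallel ensemble continues to lower variance.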
In both of our ensemble designs, to validate/select the components and to add the outcomes from the stable classifiers, we use samples that are independent of the training resamples (these training resamples are referred to as inbag samples). Such samples arise automatically alongside the inbag samples and are known as out-of-bag samples (OOBS). The main advantage of using these samples in designing our ensembles is that their instances are excluded from the corresponding inbag resample; hence the validation process is optimal and the features added from the stable classifiers are also near-optimal.
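The following shows how, under bootstrap resampling with replacement, the OOB indices fall out of drawing the inbag sample, so no extra holdout data is needed; the function name is illustrative.

```python
import numpy as np

def inbag_and_oob(n, random_state=0):
    """Draw a bootstrap (with-replacement) resample of size n and return
    both the inbag indices and the out-of-bag (OOB) indices."""
    rng = np.random.default_rng(random_state)
    inbag = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), inbag)  # instances never drawn inbag
    return inbag, oob
```

On average about 36.8% (≈ 1/e) of the instances land in the OOB set, which supplies the independent sample used for validation and feature construction.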
Keywords: Out-of-Bag samples, Stable classifier, Parallel ensemble method, Resampling with and without replacement