Ensemble learning
Overview
Supervised learning algorithms perform the task of searching through a hypothesis space to find a suitable hypothesis that will make good predictions for a particular problem.[4] Even if the hypothesis space contains hypotheses that are very well suited to a particular problem, it may be very difficult to find a good one. Ensembles combine multiple hypotheses to form a (hopefully) better hypothesis. The term ensemble is usually reserved for methods that generate multiple hypotheses using the same base learner,[according to whom?] while the broader term multiple classifier systems also covers hybridization of hypotheses that are not induced by the same base learner.[citation needed]
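The benefit of combining hypotheses can be seen in a minimal simulation (an illustrative sketch, not taken from the sources above): it assumes an ensemble of independent base hypotheses, each correct with probability 0.6, and compares a single hypothesis against a majority vote.

import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000       # simulated test examples
n_members = 11           # ensemble size (odd, to avoid ties)
p_correct = 0.6          # assumed accuracy of each independent base hypothesis

# Each column records whether one ensemble member is correct on each example.
correct = rng.random((n_trials, n_members)) < p_correct

single_accuracy = correct[:, 0].mean()
majority_accuracy = (correct.sum(axis=1) > n_members / 2).mean()

print("single hypothesis:   ", round(single_accuracy, 3))   # about 0.60
print("majority vote of 11: ", round(majority_accuracy, 3)) # about 0.75

The gain depends on the members making (partly) independent errors; identical members would vote identically and gain nothing.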
Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model. In one sense, ensemble learning may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation; the alternative is to spend that extra effort training a single non-ensemble model. For the same increase in compute, storage, or communication resources, spreading the increase across two or more methods may improve overall accuracy more than spending it all on a single method. Fast algorithms such as decision trees are commonly used in ensemble methods (for example, random forests), although slower algorithms can benefit from ensemble techniques as well.
By analogy, ensemble techniques have also been used in unsupervised learning scenarios, for example in consensus clustering or in anomaly detection.
Ensemble theory
Bootstrap aggregating (bagging)
Bootstrap aggregation (bagging) involves training an ensemble on bootstrapped data sets. A bootstrapped set is created by sampling from the original training data set with replacement. Thus, a bootstrap set may contain a given example zero, one, or multiple times. Ensemble members can also have limits on the features (e.g., the features considered at the nodes of a decision tree), to encourage exploration of diverse features.[16] The variation of local information across the bootstrap sets, together with these feature restrictions, promotes diversity in the ensemble and can strengthen it.[17] To reduce overfitting, a member can be validated using the out-of-bag set (the examples that are not in its bootstrap set).[18]
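The following is a minimal bagging sketch in Python (using scikit-learn decision trees and a synthetic data set as illustrative choices): each member is trained on a bootstrap sample drawn with replacement, a random feature limit encourages diversity, and each member's out-of-bag examples estimate its error.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
n_members = 25

members, oob_errors = [], []
for _ in range(n_members):
    # Bootstrap set: sample indices with replacement; some examples appear
    # zero times, others once or several times.
    idx = rng.integers(0, len(X), size=len(X))
    oob = np.setdiff1d(np.arange(len(X)), idx)   # examples never drawn
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    members.append(tree)
    oob_errors.append(1.0 - tree.score(X[oob], y[oob]))

print("mean out-of-bag error of members:", np.mean(oob_errors))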
Inference is performed by voting over the predictions of the ensemble members, a process called aggregation. It is illustrated below with an ensemble of four decision trees. The query example is classified by each tree. Because three of the four predict the positive class, the ensemble's overall classification is positive. Random forests like the one shown are a common application of bagging.
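Continuing the bagging sketch above, the aggregation step can be written as a majority vote over the members' predictions, mirroring the four-tree illustration in which three positive votes out of four yield a positive label.

def bagged_predict(members, x):
    # Each member votes on the query example; the most common class wins.
    votes = np.array([m.predict(x.reshape(1, -1))[0] for m in members])
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

print("ensemble prediction for the first example:", bagged_predict(members, X[0]))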
Bayesian model averaging
Bayesian model averaging (BMA) makes predictions by averaging the predictions of models weighted by their posterior probabilities given the data.[19] BMA is known to generally give better answers than a single model obtained, e.g., via stepwise regression, especially where very different models have nearly identical performance on the training set but may otherwise perform quite differently.
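A minimal sketch of the BMA prediction rule follows; the ensemble prediction is the average of each model's prediction weighted by its posterior probability. The models, predictions, and weights here are placeholders for illustration, not a fitted example.

import numpy as np

def bma_predict(predictions, posterior_weights):
    # predictions: array of shape (n_models, ...) with each model's prediction;
    # posterior_weights: p(M_k | data), nonnegative and summing to one.
    w = np.asarray(posterior_weights, dtype=float)
    w = w / w.sum()                       # normalise defensively
    return np.tensordot(w, np.asarray(predictions), axes=1)

# Three hypothetical models predicting the probability of the positive class
# for two query points, weighted by posterior model probabilities.
preds = [[0.9, 0.2], [0.7, 0.4], [0.4, 0.5]]
print(bma_predict(preds, [0.6, 0.3, 0.1]))   # -> [0.79, 0.29]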
The question with any use of Bayes' theorem is the prior, i.e., the probability (perhaps subjective) that each model is the best to use for a given purpose. Conceptually, BMA can be used with any prior. The R packages ensembleBMA[20] and BMA[21] use the prior implied by the Bayesian information criterion (BIC), following Raftery (1995).[22] The R package BAS supports the use of the priors implied by the Akaike information criterion (AIC) and other criteria over the alternative models, as well as priors over the coefficients.
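Under the BIC-based approximation of Raftery (1995), the posterior probability of model k is proportional to exp(-BIC_k / 2), normalised over the candidate models. A short sketch of that weighting (with placeholder BIC values; this is written in Python rather than the R packages named above):

import numpy as np

def bic_weights(bics):
    b = np.asarray(bics, dtype=float)
    delta = b - b.min()                   # subtract the best BIC for numerical stability
    w = np.exp(-delta / 2.0)
    return w / w.sum()

print(bic_weights([1002.3, 1000.0, 1005.8]))  # the best-BIC model gets the most weight

These weights can then be passed as the posterior_weights argument of the bma_predict sketch above.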