Bagging is short for bootstrap aggregating: each individual estimator is trained on a bootstrap sample of the data, i.e., a sample drawn at random with replacement. The final model aggregates the individual estimators, typically by averaging their predictions (for regression) or taking a majority vote (for classification). Bagging models usually perform better than any single estimator because aggregation reduces variance. Random Forest is an example of a bagging-class algorithm. Several hyperparameters can be tuned to improve performance, such as the number of estimators, the maximum depth of the trees, and the maximum number of leaf nodes.
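As a minimal sketch of these hyperparameters, assuming scikit-learn (the text does not name a library) and a purely synthetic dataset, a Random Forest could be configured like this:

```python
# Minimal sketch of bagging hyperparameters, assuming scikit-learn
# (an assumption; the text does not specify a library).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random Forest: bagging over decision trees, with the hyperparameters
# mentioned above (number of estimators, tree depth, leaf nodes).
model = RandomForestClassifier(
    n_estimators=200,      # number of estimators
    max_depth=8,           # depth of each tree
    max_leaf_nodes=64,     # maximum number of leaf nodes per tree
    random_state=0,
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```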

Because each estimator is trained independently, bagging algorithms can be trained in parallel regardless of the size of the dataset, which is useful when parallel computing resources are available. Bagging also reduces overfitting, since each estimator sees only a randomly drawn sample of the full dataset and the errors of the individual estimators are averaged out. Because the ensemble's performance comes from aggregating many such estimators, adding more of them yields diminishing returns: once the variance reduction saturates, increasing the number of estimators will not meaningfully improve the model's performance.
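A small sketch of both points, again assuming scikit-learn: the trees are fit in parallel via `n_jobs=-1`, and the out-of-bag score typically rises quickly and then flattens as the number of estimators grows.

```python
# Sketch of parallel training and diminishing returns from more estimators,
# assuming scikit-learn (an assumption, not specified by the text).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for n in (10, 50, 200, 800):
    model = RandomForestClassifier(
        n_estimators=n,
        oob_score=True,   # evaluate each tree on the samples it did not see
        n_jobs=-1,        # trees are independent, so fit them in parallel
        random_state=0,
    )
    model.fit(X, y)
    # The OOB score usually improves rapidly at first and then plateaus:
    # beyond some point, more estimators no longer help.
    print(f"n_estimators={n:4d}  OOB score={model.oob_score_:.3f}")
```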

When used on imbalanced datasets, bagging algorithms will likely draw few entries from the minority class into each bootstrap sample, so the individual estimators see little of that class and performance on it suffers.
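One way to observe this and partially mitigate it, sketched here with scikit-learn and an illustrative 95/5 class split (both assumptions), is to compare minority-class recall with and without `class_weight="balanced_subsample"`, which reweights classes within each bootstrap sample:

```python
# Sketch of the imbalanced-data issue and one common mitigation, assuming
# scikit-learn; the 95/5 class split below is illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

for cw in (None, "balanced_subsample"):
    model = RandomForestClassifier(
        n_estimators=200, class_weight=cw, n_jobs=-1, random_state=0
    )
    model.fit(X_train, y_train)
    # Recall on the minority class (label 1) highlights the imbalance effect.
    rec = recall_score(y_test, model.predict(X_test))
    print(f"class_weight={cw!s:20s} minority recall={rec:.3f}")
```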


Related: Boosting