What is an ensemble technique?
As discussed in the previous few posts on the challenges of predictive analytics, ensemble techniques address most of them. Incremental learning environments in particular call for ensemble-based algorithms, because the functions that generate the data change over time.
An ensemble technique is a meta-algorithm that combines several machine learning models into a single predictive model in order to reduce variance (bagging), reduce bias (boosting), or otherwise improve prediction quality.
Let us look at a real-world example to understand what an ensemble technique does. Suppose a patient is suffering from a serious disease and needs expert treatment. A team of doctors with different specialties is assembled to treat this critical patient. The doctors discuss their proposed solutions, agree on a single treatment plan, and try to cure the patient.
Another real-world example is an agile team in the software development process. A Scrum team is a group of software engineers with heterogeneous skills (developer, tester, data analyst, etc.). By working together on the same set of problems, the team builds the final software product.
Ensemble techniques (also called multiple classifier systems) are widespread in machine learning, especially in incremental environments. An ensemble is built by combining diverse classifiers. Diversity can be achieved in several ways, and it lets each classifier produce a different decision boundary. With appropriate diversity, the individual classifiers make different errors, and combining them strategically can reduce the total error of the whole system.
An ensemble can be built in several ways, such as:
3) Stacked generalization
4) Mixture of experts
Algorithms based on Bagging and Boosting
1) Bagging meta-estimator
2) Random Forest
5) Light GBM
The need for diversity can be met with approaches such as:
1) Training each classifier on a different data chunk
2) Training each classifier with different parameters of a given classifier architecture
3) Training each classifier with a different classifier model
4) Random Subspace method (training each classifier on a different subset of features)
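To make approaches 1 and 4 more concrete, here is a minimal sketch in plain Python. It assumes that a "data chunk" is a bootstrap sample (rows drawn with replacement) and that the toy dataset and the helper names `bootstrap_sample`, `random_subspace`, and `project` are purely illustrative; they do not come from any particular library.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one data chunk per classifier."""
    return [rng.choice(rows) for _ in rows]

def random_subspace(n_features, k, rng):
    """Pick k distinct feature indices for one classifier."""
    return sorted(rng.sample(range(n_features), k))

def project(rows, feature_idx):
    """Keep only the selected feature columns of every row."""
    return [[row[i] for i in feature_idx] for row in rows]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy dataset: 10 rows, 4 features each.
data = [[i, i * 2, i * 3, i % 2] for i in range(10)]

# Build three different "views" of the data, one per base classifier.
views = []
for _ in range(3):
    idx = random_subspace(n_features=4, k=2, rng=rng)
    chunk = bootstrap_sample(data, rng)
    views.append((idx, project(chunk, idx)))

for idx, rows in views:
    print("features", idx, "first rows", rows[:3])
```

Each base classifier would then be trained on its own view, so that the classifiers see different rows and different features and are less likely to make the same mistakes.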
It is necessary to pick a suitable group of classifiers for an ensemble-based system, and the selected classifiers should be diverse enough that the classification mistakes of one are not repeated by another. This is vital for a multiple classifier system to take full advantage of its structure. When multiple decisions are combined by voting, one expects accurate results on the assumption that most of the experts are more likely than not to be right in their classification decisions.
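The voting argument can be checked with a small calculation. The sketch below assumes an idealized setting that real ensembles only approximate: an odd number `n` of classifiers, each correct with the same probability `p`, making fully independent errors on a two-class problem. The function name `majority_vote_accuracy` is illustrative.

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that more than half of n independent classifiers,
    each correct with probability p, give the right answer."""
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Under these assumptions, three classifiers that are each right 70% of the time reach about 78% as a majority, and the figure keeps rising as more classifiers are added. If the classifiers repeat each other's mistakes, the independence assumption fails and the gain shrinks, which is why diversity matters.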
Ensemble techniques can be classified into two groups:
- Sequential ensemble techniques: The base learners are produced sequentially (an example is the AdaBoost algorithm).
The motivation for sequential techniques is to exploit the dependence between the base learners. Overall performance can be boosted by giving higher weight to instances that earlier learners labeled incorrectly.
- Parallel ensemble techniques: The base learners are produced in parallel (an example is the Random Forest algorithm).
The motivation for parallel techniques is to exploit the independence between the base learners, since averaging can reduce the error dramatically.
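The reweighting idea behind sequential techniques can be sketched as a single round of an AdaBoost-style update for two-class problems. This is a simplified illustration, not a full implementation: the function name `reweight` and the five-instance example are assumptions made here, and the base learner itself is left out.

```python
from math import exp, log

def reweight(weights, misclassified):
    """One AdaBoost-style round: return (alpha, new_weights).

    weights       -- current instance weights, summing to 1
    misclassified -- booleans, True where the current learner was wrong
    """
    # Weighted error of the current learner.
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0 < err < 0.5:
        raise ValueError("weighted error must be in (0, 0.5) for this sketch")
    # The learner's say in the final vote: larger when the error is smaller.
    alpha = 0.5 * log((1 - err) / err)
    # Raise the weight of mistakes, lower the weight of correct instances.
    raw = [w * exp(alpha if wrong else -alpha)
           for w, wrong in zip(weights, misclassified)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Five instances with equal weight; the learner gets the third one wrong.
weights = [0.2] * 5
alpha, new_weights = reweight(weights, [False, False, True, False, False])
print(alpha, new_weights)
```

After the update, the single misclassified instance carries half of the total weight, so the next learner in the sequence is pushed to get it right.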
A fundamental issue in ensemble techniques is choosing an appropriate rule for combining the decisions of multiple experts. The voting rule is applied at the final step of the ensemble system. Various voting rules are presented in the literature. The most commonly used are:
1. Geometric average rule (GA rule)
2. Arithmetic average rule (AA rule)
3. Weighted average rule (Weighted AA rule)
4. Median value rule (MV rule)
5. Majority voting rule (MajV rule)
6. Weighted majority voting rule (Weighted MajV rule)
7. Borda count rule (BC rule)
8. Max and min rule
9. SSC rule
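Several of these rules can be sketched in a few lines of plain Python. The example assumes each classifier outputs one probability per class for a single sample; the `probs` values, the weights, and all function names are hypothetical. The SSC rule is left out because it is not defined in this post.

```python
from collections import Counter
from math import prod
from statistics import median

# Hypothetical outputs of three classifiers for one sample:
# each row is [P(class 0), P(class 1), P(class 2)].
probs = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.5, 0.2, 0.3],
]

def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda c: scores[c])

def combine(probs, rule):
    """Apply `rule` to each class column and return the winning class."""
    return argmax([rule(col) for col in zip(*probs)])

def arithmetic_mean(col):
    return sum(col) / len(col)

def geometric_mean(col):
    return prod(col) ** (1 / len(col))

def weighted_mean(weights):
    """Build a rule that averages a column with per-classifier weights."""
    return lambda col: sum(w * p for w, p in zip(weights, col)) / sum(weights)

def majority_vote(probs, weights=None):
    """Each classifier votes for its top class; weights are optional."""
    weights = weights or [1.0] * len(probs)
    tally = Counter()
    for p, w in zip(probs, weights):
        tally[argmax(p)] += w
    return max(tally, key=tally.get)

def borda_count(probs):
    """Each classifier ranks the classes; worst gets 0 points, best gets most."""
    n_classes = len(probs[0])
    points = [0] * n_classes
    for p in probs:
        ranked = sorted(range(n_classes), key=lambda c: p[c])
        for rank, c in enumerate(ranked):
            points[c] += rank
    return argmax(points)

print("AA   :", combine(probs, arithmetic_mean))
print("GA   :", combine(probs, geometric_mean))
print("WAA  :", combine(probs, weighted_mean([1, 3, 1])))
print("MV   :", combine(probs, median))
print("Max  :", combine(probs, max))
print("Min  :", combine(probs, min))
print("MajV :", majority_vote(probs))
print("WMajV:", majority_vote(probs, weights=[1, 3, 1]))
print("BC   :", borda_count(probs))
```

With these made-up numbers most rules pick class 0, while the weighted majority vote with weights `[1, 3, 1]` picks class 1, because the heavily weighted second classifier prefers it. The rules can disagree on the same inputs, so the choice of rule is a design decision rather than a detail.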
For an implementation of the random forest algorithm in R, check this post by R-bloggers.