Kaggle Ensembling Guide
Model ensembling is a very powerful technique to increase accuracy on a variety of ML tasks. In this article I will share my ensembling approaches for Kaggle Competitions.
For the first part we look at creating ensembles from submission files. The second part will look at creating ensembles through stacked generalization/blending.
Along the way I explain why ensembling reduces the generalization error. Finally, I show different methods of ensembling, together with their results and code so you can try them out yourself.
This is how you win ML competitions: you take other peoples’ work and ensemble them together. — Vitaly Kuznetsov NIPS2014
Creating ensembles from submission files
The most basic and convenient way to ensemble is to ensemble Kaggle submission CSV files. You only need the predictions on the test set for these methods — no need to retrain a model. This makes it a quick way to ensemble already existing model predictions, ideal when teaming up.
Voting ensembles
We first take a look at a simple majority vote ensemble. Let’s see why model ensembling reduces error rate and why it works better to ensemble low-correlated model predictions.
Error correcting codes
During space missions it is very important that all signals are correctly relayed.
If we have a signal in the form of a binary string like:
1110110011101111011111011011
and somehow this signal is corrupted (a bit is flipped) to:
1010110011101111011111011011
then lives could be lost.
A solution was found in error correcting codes. The simplest error correcting code is a repetition code: relay the signal multiple times in equally sized chunks and take a majority vote.
Original signal:
1110110011
Encoded:
10,3 101011001111101100111110110011
Decoding:
1010110011
1110110011
1110110011
Majority vote:
1110110011
Signal corruption is a very rare occurrence and often occurs in small bursts. So it figures that it is even rarer to have a corrupted majority vote.
As long as the corruption is not completely unpredictable (a 50% chance of any bit being flipped), signals can be repaired.
A machine learning example
Suppose we have a test set of 10 samples. The ground truth is all positive (“1”):
1111111111
We furthermore have 3 binary classifiers (A, B, C), each with 70% accuracy. You can view these classifiers for now as pseudo-random number generators which output a “1” 70% of the time and a “0” 30% of the time.
We will now show how these pseudo-classifiers are able to obtain 78% accuracy through a voting ensemble.
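To check the claim empirically, here is a small simulation sketch. It assumes the three pseudo-classifiers are fully independent, which real models rarely are, and it uses many more than 10 samples so the estimate is stable. The function name `simulate_majority_vote` and its parameters are made up for this example.

```python
import random


def simulate_majority_vote(accuracy=0.7, n_models=3, n_samples=200_000, seed=0):
    """Estimate majority-vote accuracy when the ground truth is always "1".

    Each pseudo-classifier independently outputs "1" (a correct answer)
    with probability `accuracy`. Independence is an assumption of this sketch.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        # Count how many of the models voted "1" on this sample.
        votes_for_one = sum(rng.random() < accuracy for _ in range(n_models))
        if votes_for_one > n_models / 2:
            correct += 1
    return correct / n_samples


print(f"single model:  {simulate_majority_vote(n_models=1):.3f}")
print(f"3-model vote:  {simulate_majority_vote(n_models=3):.3f}")
```

With these settings the single model should land near 0.70 and the three-model vote near 0.78. The exact digits depend on the seed and the sample size.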
A pinch of maths
For a majority vote with 3 members we can expect 4 outcomes:
-
All three are correct
0.7 * 0.7 * 0.7 = 0.343