Bootstrap Sampling is a statistical technique in which samples are drawn repeatedly, with replacement, from a data source in order to estimate population parameters from those samples.
Sampling with replacement means that a data point is added back to the data source after it is drawn, so the same point can be selected more than once.
When we have to estimate a parameter of a large population, Bootstrap Sampling can help. We draw a small sample from the population, calculate the statistic of interest, and return the drawn observations to the population. We then repeat this procedure n times and take the average statistic as the estimate of the population parameter.
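The procedure above can be sketched with Python's standard library alone. This is a minimal illustration, assuming we want to estimate the population mean; the function name and parameters are invented for this example.

```python
import random

def bootstrap_estimate(data, n_iterations=1000, sample_size=None, seed=0):
    """Estimate the population mean by averaging the means of
    many bootstrap samples drawn with replacement."""
    rng = random.Random(seed)
    sample_size = sample_size or len(data)
    estimates = []
    for _ in range(n_iterations):
        # random.choices draws with replacement, so a point can repeat
        sample = rng.choices(data, k=sample_size)
        estimates.append(sum(sample) / len(sample))
    # the average of the per-sample statistics is the bootstrap estimate
    return sum(estimates) / len(estimates)

population = list(range(100))          # toy "population", true mean 49.5
print(bootstrap_estimate(population))  # close to the true mean of 49.5
```

With 1000 repetitions the averaged sample means land very close to the true population mean; fewer repetitions give a noisier estimate.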
It helps avoid overfitting and improves the stability of machine learning algorithms. It is also used in many ensemble machine learning methods such as random forests, AdaBoost, gradient boosting, and XGBoost.
The procedure works as follows:
1. Select the sample size.
2. Randomly select an observation from the training data.
3. Add this observation to the previously selected sample.
4. Repeat steps 2 and 3 until the sample reaches the chosen size.
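The steps above can be sketched as a short helper, again using only the standard library. The function name is illustrative, not from any particular library.

```python
import random

def draw_bootstrap_sample(training_data, sample_size, seed=0):
    """Build one bootstrap sample following the steps above: pick a
    size, then repeatedly select a random observation and add it to
    the sample. The observation stays in the pool, so it can recur."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < sample_size:
        observation = rng.choice(training_data)  # drawn with replacement
        sample.append(observation)
    return sample

data = [10, 20, 30, 40, 50]
sample = draw_bootstrap_sample(data, sample_size=5)
print(sample)  # five values from data, possibly with repeats
```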
The observations not selected are usually referred to as the "out-of-bag" samples. In a given Bootstrap resampling iteration, a model is trained on the selected samples and then used to predict the out-of-bag samples.
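A minimal sketch of how the in-bag and out-of-bag split falls out of one resampling iteration (the function is a hypothetical helper, not a library API):

```python
import random

def split_bootstrap_oob(n_rows, seed=0):
    """Draw bootstrap indices with replacement and return both the
    selected ('in-bag') indices and the 'out-of-bag' indices: rows
    that were never chosen in this iteration."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag

in_bag, oob = split_bootstrap_oob(10)
# A model would be fit on the in-bag rows and evaluated on the oob rows;
# on average roughly a third of the rows end up out-of-bag.
```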
Increasing the number of repetitions generally produces a more stable and accurate estimate.