RANDOM FOREST

Lucky Bachawala
3 min readMay 21, 2021

ENSEMBLE TECHNIQUES

Before we start with Random Forest, we need to be clear about the Ensemble method and its types.

Ensemble method

The Ensemble method is a technique that creates multiple models (each trained on a random subset of rows, sampled with overlap) and then combines them into one effective model. The Ensemble method usually provides more accurate solutions than a single model would.

The Ensemble technique is generally categorized into two types: Bagging and Boosting. In this blog we focus on the Bagging technique.

BAGGING

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. The models are trained in parallel, which reduces variance and helps to avoid overfitting.
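To make this concrete, here is a minimal sketch of bagging with scikit-learn. The dataset, number of estimators, and random seeds are illustrative choices, not anything prescribed by the technique itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# A toy classification dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 base models (decision trees by default) is trained
# on a bootstrap sample: rows drawn with replacement, so the samples overlap.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```

Because each base model sees a different bootstrap sample, their individual errors partly cancel out when their predictions are combined.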

Algorithms that work on the principle of the Bagging technique include:

1. Random Forest

RANDOM FOREST

Random Forest, or Random Decision Forest, is an ensemble learning technique for classification and regression. It operates by constructing a multitude of Decision Trees at training time (each on a random subset of rows and features, sampled with overlap) and outputting the mode of the individual trees’ predictions for classification, or their mean/median for regression.

Random Forest corrects the decision tree’s habit of overfitting: aggregating many high-variance trees yields a low-variance ensemble.
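We can see this variance reduction in practice by comparing a single decision tree with a forest on the same data. This is just a sketch on a synthetic dataset; the exact scores will depend on the data and hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Cross-validated accuracy of one tree vs. an ensemble of 100 trees.
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y).mean()
forest_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y
).mean()
print(f"single tree: {tree_acc:.3f}, random forest: {forest_acc:.3f}")
```

On most datasets the forest’s cross-validated accuracy comes out higher, precisely because averaging many deep, overfit trees smooths out their individual mistakes.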

1. Suppose there are ‘C’ columns and ‘R’ rows in a dataset (D).

2. A random subset of rows and columns is selected to create a Decision Tree.

3. Another Decision Tree is created, in parallel, from another random subset of rows and columns, which may or may not overlap with the first tree’s subset.

4. In the same way, many Decision Tree models (M1, M2, M3, …, Mn) are created in parallel on bootstrap samples, with or without overlapping subsets of the data.

5. The prediction is the aggregation of all the models’ outputs: the mode of the individual trees’ predictions for classification, or their mean/median for regression.
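The steps above can be sketched by hand with NumPy and scikit-learn trees. This is a simplified illustration, not a full reimplementation: the row subsets are bootstrap samples (step 2–4), and the column randomness is delegated to each tree’s `max_features` setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

# Steps 2-4: build n_trees models, each on a bootstrap sample of the rows
# (drawn with replacement, so samples overlap between trees); random
# column subsets are handled inside each tree via max_features="sqrt".
n_trees = 25
models = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Step 5: aggregate by the mode (majority vote) for classification.
all_preds = np.stack([m.predict(X) for m in models])  # (n_trees, n_samples)
votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print("ensemble accuracy:", (votes == y).mean())
```

For regression the only change in step 5 would be replacing the majority vote with the mean (or median) of the trees’ predictions.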

Random Forest generally outperforms a single decision tree, but its accuracy is usually lower than that of gradient-boosted trees. However, data characteristics can affect their relative performance.

If you ❤ this article, be sure to click 👏 below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To stay up to date with the world of machine learning, follow me. It’s the best way to find out when I write more articles like this.

You can also follow me on Instagram, find me on LinkedIn or email me directly. I’d love to hear from you.

That’s all folks, Have a nice day :)
