Random Forest

Akash Patel
3 min read · Jun 14, 2021


Random Forest is a Supervised Machine Learning algorithm and, just like a Decision Tree, it can be used for classification as well as regression.

Content

  1. Definition
  2. Assumptions of Random Forest
  3. Advantages of Random Forest
  4. Disadvantages of Random Forest
  5. Why name “Random”
  6. Working of Random Forest
  7. Random Forest Vs Decision Tree
  8. Applications of Random Forest
  9. References

Definition :-

Random Forest (or Random Decision Forest) is a method that operates by constructing multiple Decision Trees during the training phase; the majority decision of those trees is then chosen as the final prediction.

It works on the concept of ensemble learning, a technique that combines multiple classifiers to solve a complex problem and improve the performance of the model.
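
To make the majority-vote idea concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy dataset and parameters are arbitrary. (One caveat: scikit-learn's forest actually averages the trees' class probabilities, which usually, but not always, matches a hard majority vote.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy binary-classification data (shapes and seed chosen arbitrarily).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# Each fitted tree casts a vote for one sample; the forest's final
# decision is, in effect, the class with the most votes.
sample = X[:1]
votes = np.array([tree.predict(sample)[0] for tree in forest.estimators_])
majority = np.bincount(votes.astype(int)).argmax()
print("votes for class 1:", int(votes.sum()), "out of", len(votes))
print("majority vote:", majority, "| forest.predict:", forest.predict(sample)[0])
```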

Assumptions :-

Random Forests have no formal distributional assumptions: they are non-parametric and can handle skewed and multi-modal data, as well as categorical data that is ordinal or non-ordinal.
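
A small sketch of that flexibility, again assuming scikit-learn, NumPy, and pandas; the column names and data are invented for illustration. A heavily skewed numeric feature and an integer-encoded non-ordinal category go straight into the forest, with no scaling or normality transform:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)
df = pd.DataFrame({
    # Log-normal "income": heavily skewed, deliberately left unscaled.
    "income": rng.lognormal(mean=10, sigma=1, size=300),
    # A non-ordinal category; trees only need it encoded as integers.
    "city": rng.choice(["delhi", "mumbai", "pune"], size=300),
})
y = (df["income"] > df["income"].median()).astype(int)  # toy target

df["city"] = OrdinalEncoder().fit_transform(df[["city"]]).ravel()

# No scaling, no normality transform: the forest fits the raw features.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(df, y)
print("training accuracy:", clf.score(df, y))
```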

Advantages :-

Following are a few advantages of Random Forest :-

  1. Feature scaling is not required in Random Forest (a sketch follows this list).
  2. Random Forest can handle large datasets, even with high dimensionality.
  3. It maintains accuracy even when a large proportion of the data points are missing.
  4. It reduces overfitting by combining the results of many different decision trees.
  5. A Random Forest has less variance than a single decision tree.
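
Here is a sketch of advantage 1, assuming scikit-learn: tree splits are threshold comparisons, so standardizing the features should leave the forest essentially unchanged (up to rare floating-point ties), and the two test accuracies below typically come out identical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Same forest, same seed: once on raw features, once on standardized ones.
raw = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

scaler = StandardScaler().fit(X_tr)
scaled = RandomForestClassifier(n_estimators=100, random_state=1)
scaled.fit(scaler.transform(X_tr), y_tr)

print("test accuracy, raw features   :", raw.score(X_te, y_te))
print("test accuracy, scaled features:", scaled.score(scaler.transform(X_te), y_te))
```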

Disadvantages :-

Following are a few disadvantages of Random Forest :-

  1. Complexity is the main disadvantage of Random Forest.
  2. Even a small change in the dataset can create noticeable changes in the individual trees.
  3. Constructing a Random Forest is much harder and more complex than constructing a single Decision Tree.
  4. It is not very computationally efficient and takes more time to train than many other algorithms (a timing sketch follows this list).
  5. Although Random Forest can be used for classification as well as regression, it is generally less well suited to regression.
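
A quick timing sketch for disadvantage 4, assuming scikit-learn; the dataset is synthetic and the absolute numbers depend on your machine, but fitting 100 trees predictably costs far more than fitting one:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20_000, n_features=40, random_state=0)

# Fit one tree, then a 100-tree forest, and time each.
for name, model in [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: fitted in {time.perf_counter() - start:.2f}s")
```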

Why name “Random” :-

Following are the two main concepts that give the algorithm the name “random” (both appear in the sketch after this list) :-

  1. A random sample of the training dataset, rather than the whole dataset, is used to build each tree.
  2. A random subset of the attributes is considered when splitting each node.
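
In scikit-learn (one popular implementation; an assumption, since no library is named above), these two sources of randomness map directly onto constructor parameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,       # concept 1: each tree is trained on a random
                          # sample of rows drawn with replacement
    max_features="sqrt",  # concept 2: each split considers only a random
                          # subset of the features
    random_state=0,
)

X, y = make_classification(random_state=0)  # toy data
forest.fit(X, y)
print(len(forest.estimators_), "trees, each grown on its own bootstrap sample")
```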

Working of Random Forest :-

Random Forest uses the Bagging ensemble technique (Bootstrap Aggregation), which combines multiple learning models to improve the overall result.

Using Bagging, Random Forest builds a number of Decision Trees. Each tree is trained individually on rows sampled with replacement and on a sampled subset of the features; the outcomes of all the trees are then merged to make a more accurate and stable prediction.
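
To illustrate, here is a from-scratch sketch of that procedure using NumPy and scikit-learn's plain decision tree. For simplicity it samples features once per tree, whereas the standard algorithm re-samples them at every split; in practice you would just use sklearn.ensemble.RandomForestClassifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
n_trees, n_feats = 25, 4  # hyperparameters picked arbitrarily for the demo

trees, feature_sets = [], []
for _ in range(n_trees):
    rows = rng.integers(0, len(X), size=len(X))  # row sampling with replacement
    feats = rng.choice(X.shape[1], size=n_feats, replace=False)  # feature sampling
    trees.append(DecisionTreeClassifier().fit(X[rows][:, feats], y[rows]))
    feature_sets.append(feats)

# Merge the individual trees' outcomes with a majority vote.
votes = np.array([t.predict(X[:, f]) for t, f in zip(trees, feature_sets)])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble training accuracy:", (y_pred == y).mean())
```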

Random Forest Vs Decision Tree :-

As the names imply (a collection of Trees is called a Forest), Random Forest uses a collection of Decision Trees to make accurate predictions.

A Decision Tree uses the complete dataset, while Random Forest uses only randomly selected rows (records or observations) and columns (features or variables) to build each of its many Decision Trees.

A Decision Tree is simpler to understand than a Random Forest.

There is a real chance of overfitting with a single Decision Tree, but much less chance with a Random Forest, as the sketch below illustrates.
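
A small sketch of that comparison, assuming scikit-learn and using its built-in breast-cancer dataset; the exact scores depend on the split and seeds, but the unpruned tree typically shows a perfect train score with a larger train/test gap than the forest:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# Compare train vs. test accuracy for a lone tree and for a forest.
for name, model in [
    ("decision tree", DecisionTreeClassifier(random_state=7)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=7)),
]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train={model.score(X_tr, y_tr):.3f}, "
          f"test={model.score(X_te, y_te):.3f}")
```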

Applications of Random Forest :-

Following are a few major applications of Random Forest :-

  1. Healthcare and Medicine :- ex: Diabetes Prediction
  2. Stock Market :- ex: Stock Market Sentiment Analysis
  3. E-commerce :- ex: Product Recommendation
  4. Banking Industry :- ex: Credit Card Fraud Detection

References :-

  1. Wikipedia
  2. KDnuggets Blogs
  3. Other sources
