Decision Tree

Akash Patel
5 min read · Jun 12, 2021


Decision Tree is a supervised machine learning algorithm used for both classification and regression.

Content

  1. Definition
  2. Types of Decision Tree
  3. Terminologies
  4. Advantages of Decision Tree
  5. Disadvantages of Decision Tree
  6. Assumptions of Decision Tree
  7. Attribute Selection Measures
  8. Working of Decision Tree
  9. Points to remember
  10. References

Definition :-

Decision Tree is a tree-structured classifier that can also be used to determine a course of action. Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label.

(Figure: Decision Tree structure.)
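To make this concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the Iris dataset and the max_depth value are illustrative choices, not part of the original discussion.

# Minimal sketch: fit a small tree and print its structure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Each internal node is a test on an attribute, each branch an outcome
# of the test, and each leaf a class label.
print(export_text(tree, feature_names=data.feature_names))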

Types of Decision Trees :-

Based on the type of the target variable, decision trees are divided into two categories :-

  1. Categorical Variable Decision Tree :- A decision tree with a categorical target variable is called a categorical variable decision tree.
  2. Continuous Variable Decision Tree :- A decision tree with a continuous target variable is called a continuous variable decision tree.
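As a small sketch of the two types, scikit-learn exposes a classifier and a regressor with the same interface; the tiny datasets below are made up purely for illustration.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical target -> categorical variable decision tree (classification).
clf = DecisionTreeClassifier()
clf.fit([[0], [1], [2], [3]], ["no", "no", "yes", "yes"])
print(clf.predict([[2.5]]))  # expected: ['yes']

# Continuous target -> continuous variable decision tree (regression).
reg = DecisionTreeRegressor()
reg.fit([[0], [1], [2], [3]], [0.0, 1.1, 1.9, 3.2])
print(reg.predict([[2.5]]))  # a value interpolated from the training targets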

Terminologies in Decision Tree :-

Following are a few terminologies widely used in decision trees.

Root Node :- It is the very first node of the tree and has no parent. It represents the entire population or sample, and it is further split into two or more decision nodes.

Decision Node :- A node that splits into further sub-nodes; it represents a test on an attribute.

Leaf Node :- They are also called terminal nodes and they do not split further.

Sub-Tree :- It is also called a branch, and it is a subdivision of the entire tree.

Pruning :- It is the opposite of splitting. The removal of sub-nodes from a decision tree is called pruning.

Advantages of Decision Tree :-

Following are a few advantages of the Decision Tree :-

  1. It is easy to understand, visualize and interpret.
  2. Little effort is required for data preparation.
  3. Both numerical and categorical data can be handled by Decision Tree.
  4. Non-linear relationships between features do not affect its performance.
  5. It is relatively robust to missing values and outliers, so it needs less data cleaning than many other algorithms.

Disadvantages of Decision Tree :-

Following are a few disadvantages of the Decision Tree :-

  1. Overfitting occurs when the algorithm captures noise in the data.
  2. Even a small variation in the data can make the model unstable. Ensemble techniques such as bagging and boosting are used to resolve this.
  3. A highly complicated model tends to have low bias but high variance, which makes it difficult for the model to generalize to new data.
  4. For datasets with dominating classes, it builds a biased tree, so the data should be balanced before fitting.
  5. Compared to other algorithms, it often gives lower predictive accuracy.

Assumptions of Decision Tree :-

Following are some assumptions of the decision tree :-

  1. In the beginning, the whole training set is considered as the root.
  2. Attributes are placed at the root or at internal nodes of the tree in an ordered manner using a statistical approach ( an attribute selection measure ).
  3. Feature values are preferred to be categorical; continuous values are discretized before building the model.
  4. Records are distributed recursively on the basis of attribute values.

Attribute Selection Measures ( ASM ):-

Attribute Selection Measures ( ASM ) are techniques that help select the best attribute for the root node and for the sub-nodes.

Following are the main techniques for ASM :-

  1. Entropy :- It is the measure of randomness, unpredictability, or impurity in the dataset. The higher the entropy, the harder it is to draw any conclusion from that information.

Entropy equation: E(S) = −Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of samples in S that belong to class i.
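A small sketch of this measure in Python, assuming the class labels of a node are given as a plain list; the example labels are made up.

import math
from collections import Counter

def entropy(labels):
    # E(S) = -sum over classes of p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0 — maximal impurity
print(entropy(["yes"] * 10))              # 0.0 — a pure node (prints as -0.0)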

2. Information Gain :- It is the measure of the change in entropy after the dataset is split on an attribute. A decision tree always tries to maximize information gain, and the attribute with the highest information gain is chosen for the split first.

Information Gain equation: IG(S, A) = E(S) − Σᵥ ( |Sᵥ| / |S| ) · E(Sᵥ), where Sᵥ is the subset of S for which attribute A takes value v.
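A sketch of the same idea in code, reusing the entropy function from the previous sketch (redefined here so the snippet runs on its own); the split below is hypothetical.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # IG(S, A) = E(S) minus the weighted average of the children's entropies.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4
print(information_gain(parent, [left, right]))  # ~0.278: the split reduces entropy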

3. Gini Impurity :- It is a measure of impurity used while creating a decision tree in the CART ( Classification and Regression Tree ) algorithm. The Gini index performs only binary splits.

Gini Impurity equation: Gini(S) = 1 − Σᵢ pᵢ², where pᵢ is the proportion of samples belonging to class i.
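A matching sketch for Gini impurity, again assuming the labels of a node come as a plain list.

from collections import Counter

def gini(labels):
    # Gini(S) = 1 - sum over classes of p_i^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5 — maximal for two classes
print(gini(["yes"] * 10))              # 0.0 — a pure node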

Working of the Decision Tree :-

A decision tree uses multiple algorithms to decide whether and how to split a node into two or more sub-nodes. Each new split increases the homogeneity of the resulting sub-nodes.

The tree considers splits on all the available variables and then selects the one which results in the most homogeneous sub-nodes.

The choice of splitting algorithm also depends on the type of the target variable.
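As a hedged illustration of that search, the sketch below tries every candidate threshold on a single numeric feature and keeps the split with the lowest weighted Gini impurity; the feature values and labels are invented for the example.

from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    # Keep the threshold whose children are most homogeneous.
    best = (None, float("inf"))  # (threshold, weighted child impurity)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

print(best_split([1, 2, 3, 4, 5, 6], ["no", "no", "no", "yes", "yes", "yes"]))
# -> (3, 0.0): splitting at 3 yields two pure sub-nodes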

Following are a few algorithms used in Decision Tree :-

ID3 ( Iterative Dichotomiser 3 ) :- It is a classification algorithm that follows a greedy approach to building a decision tree, using information gain to choose each split.

C4.5 :- It is an extension of the ID3 algorithm and is used to generate a decision tree.

CART ( Classification and Regression Tree ) :- It is a powerful algorithm that can be used for both classification and regression problems.

CHAID ( Chi-square Automatic Interaction Detection ) :- It is used to find the relationship between a categorical outcome variable and categorical predictor variables.
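Of these, scikit-learn's decision trees implement an optimized version of CART; as a small sketch, the criterion parameter switches the attribute selection measure between Gini impurity and entropy (the Iris dataset is used only for illustration).

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# CART with Gini impurity (the default) and with entropy.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
    print(criterion, tree.get_depth(), tree.score(X, y))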

Points to remember :-

Pruning and Random Forest are two techniques to reduce overfitting in a Decision Tree.
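A brief sketch of both remedies in scikit-learn, assuming a recent version that supports cost-complexity pruning via ccp_alpha; the alpha value and dataset are illustrative, not tuned.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pruning: a non-zero ccp_alpha removes sub-nodes that add little value.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_tr, y_tr)

# Random Forest: aggregate many decorrelated trees instead of trusting one.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(pruned.score(X_te, y_te), forest.score(X_te, y_te))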

Decision Tree is not an ensemble algorithm because it does not aggregate the results of multiple trees.

When the relationship between the features and the target is linear, linear regression is preferred over a Decision Tree, while for non-linear and complex relationships a Decision Tree is preferred.

Entropy and Gini impurity are used for the same purpose, but Gini impurity is computationally more efficient than entropy because it avoids computing logarithms.

(Figure: Entropy vs Gini impurity plotted against class probability.)

For a two-class problem, the value of Gini impurity always lies between 0 and 0.5, while the value of entropy lies between 0 and 1.
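Those bounds are easy to verify for a two-class node, where both measures peak at p = 0.5.

import math

p = 0.5  # the worst case for a two-class node
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
gini = 1 - (p ** 2 + (1 - p) ** 2)
print(entropy, gini)  # -> 1.0 0.5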

References :-

  1. Wikipedia
  2. KDnuggets Blogs
  3. Other online sources
