Now the question is: when will this dividing process stop?

1. Either it has divided into classes that are pure (each containing members of only a single class).

2. Some stopping criterion set on the classifier is met (for example, a maximum tree depth).

3. We run out of available features to divide the class upon.
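The stopping conditions above can be sketched as a small check. This is only an illustration: the function name `should_stop` and the `max_depth` limit are assumptions chosen to stand in for "some criterion", not part of any library.

```python
from collections import Counter

def should_stop(labels, remaining_features, depth, max_depth=None):
    """Return True when a node should become a leaf instead of being split."""
    # Pure node: every sample carries the same class label.
    if len(Counter(labels)) <= 1:
        return True
    # A user-chosen limit is reached (max_depth is an assumed example).
    if max_depth is not None and depth >= max_depth:
        return True
    # No features are left to split on.
    if not remaining_features:
        return True
    return False

print(should_stop(["N", "N", "N"], ["Windy"], depth=1))  # True: pure node
print(should_stop(["Y", "Y", "N"], ["Windy"], depth=1))  # False: can still split
print(should_stop(["Y", "Y", "N"], [], depth=1))         # True: no features left
```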

Entropy is the negative sum, over every item x, of the probability of x times the log of that probability.

-(0.5 * log(0.5)) - (0.25 * log(0.25)) - (0.25 * log(0.25))
= 0.45 (with log taken to base 10)
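The definition can be sketched in a few lines of Python. The helper name `entropy` is illustrative, and log base 10 is an assumption chosen because it reproduces the 0.45 value above; base 2 is also common and would give 1.5 bits for the same probabilities.

```python
import math
from collections import Counter

def entropy(items, base=10):
    """Entropy of a collection: -sum(p * log(p)) over the distinct items."""
    total = len(items)
    probabilities = [count / total for count in Counter(items).values()]
    return -sum(p * math.log(p, base) for p in probabilities)

# Four items with probabilities 0.5, 0.25 and 0.25.
print(round(entropy(["A", "A", "B", "C"]), 2))  # 0.45
```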

Information Gain (n) = Entropy(x) - ([weighted average] * entropy(children for feature))

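A little more explanation!

Suppose we have the following class to work with initially

112234445

Suppose we divide them based on the property: divisible by 2. The four items that are not divisible by 2 (1, 1, 3, 5) go to the left child and the five items that are (2, 2, 4, 4, 4) go to the right child.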

Entropy at root level : 0.66

Entropy of left child : 0.45 , weighted value = (4/9) * 0.45 = 0.2

Entropy of right child: 0.29 , weighted value = (5/9) * 0.29 = 0.16

Information Gain = 0.66 - [0.2 + 0.16] = 0.3
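The numbers in this worked example can be checked with a short script. The helper names are illustrative, and base-10 logarithms are assumed because they reproduce the entropies quoted above.

```python
import math
from collections import Counter

def entropy(items, base=10):
    """Entropy of a collection: -sum(p * log(p)) over the distinct items."""
    total = len(items)
    return -sum((c / total) * math.log(c / total, base)
                for c in Counter(items).values())

def information_gain(parent, children, base=10):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    weighted = sum(len(child) / len(parent) * entropy(child, base)
                   for child in children)
    return entropy(parent, base) - weighted

root = [1, 1, 2, 2, 3, 4, 4, 4, 5]
left = [x for x in root if x % 2 != 0]   # not divisible by 2: 1, 1, 3, 5
right = [x for x in root if x % 2 == 0]  # divisible by 2: 2, 2, 4, 4, 4

print(round(entropy(root), 2))                         # 0.66
print(round(entropy(left), 2))                         # 0.45
print(round(entropy(right), 2))                        # 0.29
print(round(information_gain(root, [left, right]), 2)) # 0.3
```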

Suppose we have the following data for playing golf under various conditions.

Now if the weather condition is given as :

Outlook : Rainy, Temperature: Cool, Humidity: High, Windy: False

At the beginning the outcomes are NNYYYNYN (Y = Yes and N = No), taken in the given order. The entropy at this root node is 0.3.

Now try to divide on the various predictors: Outlook, Temperature, Humidity and Windy.

Calculate the information gain in each case. Which one has the highest information gain?

For example, if we divide based on Outlook, we have divisions as Rainy : NNN (entropy = 0), Sunny : YYN (entropy = 0.28, using base-10 logs as in the earlier examples), Overcast : YY (entropy = 0).

So information gain = 0.3 - [0 + (3/8)*0.28 + 0] ≈ 0.2

Try out for other cases.

The information gain is max when divided based on Outlook.

Now the impurity for Rainy and Overcast is 0. We stop for them here.

Next we need to separate Sunny.

If we divide by Windy, we get the maximum information gain: Sunny (YYN) splits on Windy? into Yes : N and No : YY.

So the decision tree looks something like the one shown in the image below.

Now the prediction data is

Outlook : Rainy, Temperature: Cool, Humidity: High, Windy: False

Flowing down the tree according to these values, we first check the Outlook: is it Rainy? It is, and the Rainy branch is a pure leaf (NNN).

So answer is No, we don't play golf.
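The walk down the tree can be sketched in code. The nested-dictionary layout and the names `golf_tree` and `predict` are assumptions made for illustration; the branches themselves follow the splits described above (Rainy: No, Overcast: Yes, Sunny decided by Windy).

```python
# Internal nodes are dicts: {"feature": name, "branches": {value: subtree}}.
# Leaves are plain strings holding the predicted class.
golf_tree = {
    "feature": "Outlook",
    "branches": {
        "Rainy": "No",
        "Overcast": "Yes",
        "Sunny": {
            "feature": "Windy",
            "branches": {True: "No", False: "Yes"},
        },
    },
}

def predict(tree, sample):
    """Walk from the root to a leaf, following the branch for each feature value."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node

sample = {"Outlook": "Rainy", "Temperature": "Cool",
          "Humidity": "High", "Windy": False}
print(predict(golf_tree, sample))  # No
```

Features that the tree never tests (here Temperature and Humidity) are simply ignored during prediction.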

In [42]:

```
import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
```

NumPy arrays and pandas dataframes will help us in manipulating data. As discussed above, sklearn is a machine learning library. The train_test_split() function from sklearn.model_selection (sklearn.cross_validation in older releases) will help us by splitting data into train & test sets.

The tree module will be used to build a Decision Tree Classifier. The accuracy_score function will be used to calculate the accuracy metric from the predicted class variables.

In [43]:

```
balance_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data',
    sep=',', header=None)
```

For importing the data and manipulating it, we are going to use pandas dataframes. The dataset is a plain text file in which all the data values are separated by commas.

We use the pandas read_csv() method to import the data into a pandas dataframe; it can read the file directly from the URL, so no separate download step is needed. Since our data is separated by commas and has no header row, we set the sep parameter to “,” and the header parameter to “None”.

We are saving our data into the “balance_data” dataframe.

For checking the length & dimensions of our dataframe, we can use the len() function & the “.shape” attribute.

In [44]:

```
print("Dataset Length:: ", len(balance_data))
print("Dataset Shape:: ", balance_data.shape)
```

Dataset Length:: 625

Dataset Shape:: (625, 5)

In [45]:

```
print ("Dataset:: ")
balance_data.head()
```

Dataset::

Out[45]:

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | B | 1 | 1 | 1 | 1 |
| 1 | R | 1 | 1 | 1 | 2 |
| 2 | R | 1 | 1 | 1 | 3 |
| 3 | R | 1 | 1 | 1 | 4 |
| 4 | R | 1 | 1 | 1 | 5 |

In [46]:

```
X = balance_data.values[:, 1:5]
Y = balance_data.values[:,0]
```

The above snippet divides the data into a feature set & a target set. The “X” set consists of the predictor variables, taken from the 2nd to the 5th column. The “Y” set consists of the outcome variable, taken from the 1st column. We use the dataframe's “.values” attribute to convert the data into NumPy arrays.

Let’s split our data into training and test set. We will use sklearn’s train_test_split() method.

In [47]:

```
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)
```

The above snippet will split the data into a training and a test set. X_train, y_train are the training data & X_test, y_test belong to the test dataset.

The parameter test_size is given the value 0.3; it means the test set will be 30% of the whole dataset & the training set will be 70% of it. The random_state parameter seeds the pseudo-random number generator used for the random sampling. If you want to replicate our results, then use the same value of random_state.
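To make the roles of test_size and random_state concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-cut split. It is not sklearn's implementation, the name `simple_train_test_split` is hypothetical, and the shuffled order will differ from what train_test_split() produces; only the idea and the resulting set sizes carry over.

```python
import math
import random

def simple_train_test_split(rows, test_size=0.3, seed=100):
    """Shuffle row indices with a seeded generator, then cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed -> same shuffle
    n_test = math.ceil(test_size * len(rows))  # round the test share up
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = list(range(625))  # stand-in for the 625 rows of the dataset
train, test = simple_train_test_split(rows)
print(len(train), len(test))  # 437 188
```

Calling the function twice with the same seed returns identical splits, which is why fixing random_state makes results reproducible.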

DecisionTreeClassifier(): This is the classifier class for decision trees and the main entry point for the algorithm. Some important parameters are:

1. criterion: It defines the function used to measure the quality of a split. Sklearn supports “gini” for the Gini Index & “entropy” for Information Gain. By default, it takes the “gini” value.

2. splitter: It defines the strategy used to choose the split at each node. It supports “best” to choose the best split & “random” to choose the best random split. By default, it takes the “best” value.

3. max_features: It defines the number of features to consider when looking for the best split. We can pass an integer, a float, a string or None.

   a. If an integer is given, that many features are considered at each split.

   b. If a float is given, it is treated as a fraction, and that fraction of the features is considered at each split.

   c. If “auto” or “sqrt” is given, then max_features=sqrt(n_features) (“auto” is only accepted by older scikit-learn releases).

   d. If “log2” is given, then max_features=log2(n_features).

   e. If None, then max_features=n_features. By default, it takes the “None” value.

4. max_depth: It denotes the maximum depth of the tree. It can take any integer value or None. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. By default, it takes the “None” value.

5. min_samples_split: The minimum number of samples required to split an internal node. If an integer is given, it is used directly as that minimum. If a float is given, it is treated as a fraction of the total number of samples. By default, it takes the value “2”.

6. min_samples_leaf: The minimum number of samples required to be at a leaf node. If an integer is given, it is used directly as that minimum. If a float is given, it is treated as a fraction of the total number of samples. By default, it takes the value “1”.

7. max_leaf_nodes: It defines the maximum number of leaf nodes. If None, the number of leaf nodes is unlimited. By default, it takes the “None” value.

8. min_impurity_split: It defines the threshold for stopping tree growth early. A node will split if its impurity is above the threshold; otherwise it is a leaf. This parameter has been deprecated in favor of min_impurity_decrease and is no longer available in recent scikit-learn releases.
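The criterion parameter chooses between two impurity measures. Entropy was worked through earlier; as a rough sketch of the other option, Gini impurity is commonly defined as one minus the sum of squared class proportions. The helper below is illustrative only and is not sklearn's internal code.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_impurity(["Y", "Y", "Y", "Y"]))  # 0.0 for a pure node
print(gini_impurity(["Y", "Y", "N", "N"]))  # 0.5 for an even two-class mix
```

Like entropy, the measure is zero for a pure node and largest when the classes are evenly mixed, so both criteria reward splits that produce purer children.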

In [49]:

```
# Decision Tree Classifier with criterion gini index.
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)
```

Out[49]:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=100, splitter='best')

In [50]:

```
# Decision Tree Classifier with criterion information gain.
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)
```

Out[50]:

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=5, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=100, splitter='best')

In [51]:

```
clf_gini.predict([[4, 4, 3, 3]])
```

Out[51]:

array(['R'], dtype=object)

In [52]:

```
#Prediction for Decision Tree classifier with criterion as gini index.
y_pred = clf_gini.predict(X_test)
y_pred
```

Out[52]:

array(['R', 'L', 'R', 'R', 'R', 'L', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L', 'R', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'L', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'L', 'L', 'R', 'R', 'L', 'L', 'L', 'L', 'L', 'R', 'R', 'L', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'L', 'L', 'R', 'L', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'R', 'R', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'R', 'R', 'R', 'R', 'L', 'R', 'R', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'R', 'R', 'L', 'R', 'R', 'R', 'R', 'R', 'L', 'L', 'R', 'R', 'R', 'R', 'L', 'R', 'R', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'R', 'R', 'L', 'R', 'R', 'L', 'L', 'R', 'R', 'R'], dtype=object)

In [53]:

```
#Prediction for Decision Tree classifier with criterion as information gain.
y_pred_en = clf_entropy.predict(X_test)
y_pred_en
```

Out[53]:

array(['R', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'R', 'R', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'R', 'L', 'L', 'R', 'L', 'L', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'R', 'L', 'L', 'R', 'L', 'L', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'R', 'R', 'R', 'R', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'L', 'R', 'R', 'R', 'L', 'L', 'L', 'L', 'L', 'R', 'R', 'R', 'R', 'L', 'R', 'R', 'R', 'L', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'R', 'L', 'R', 'R', 'R', 'R', 'R', 'L', 'R', 'R', 'R', 'R', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'L', 'R', 'L', 'L', 'L', 'L', 'L', 'R', 'R', 'R', 'L', 'L', 'L', 'R', 'R', 'R'], dtype=object)

In [54]:

```
#Accuracy for Decision Tree classifier with criterion as gini index.
print ("Accuracy is ", accuracy_score(y_test,y_pred)*100)
```

Accuracy is 73.40425531914893

In [55]:

```
#Accuracy for Decision Tree classifier with criterion as information gain.
print ("Accuracy is ", accuracy_score(y_test,y_pred_en)*100)
```

Accuracy is 70.74468085106383
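Accuracy here is simply the share of test rows whose predicted label matches the true label. A minimal sketch of that calculation follows; the helper `accuracy_percent` and the four-item lists are made up for illustration and are not taken from the dataset.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return 100.0 * correct / len(y_true)

# Tiny made-up example: 3 of 4 predictions match.
print(accuracy_percent(["R", "L", "B", "R"], ["R", "L", "L", "R"]))  # 75.0

# The two scores above are consistent with 138 and 133 correct
# predictions out of an inferred 188 test rows (30% of 625, rounded up).
print(round(100.0 * 138 / 188, 2), round(100.0 * 133 / 188, 2))  # 73.4 70.74
```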