Decision Tree, Random Forest & Naive Bayes

I. Decision Tree

Decision Tree
✔ Decision Trees are widely used algorithms for supervised machine learning.
✔ A Decision Tree consists of a series of sequential decisions, or decision nodes, on some
data set's features.
✔ The resulting flow-like structure is navigated via conditional control statements, or if-then
rules, which split each decision node into two or more subnodes.
✔ Leaf nodes, also known as terminal nodes, represent prediction outputs for the model.

Figure 1. Decision Tree Structure Illustration


Entropy
Entropy measures the amount of information in some variable or event. It can be used to identify regions consisting of a large number of similar (pure) or dissimilar (impure) elements.

The entropy can be used to quantify the impurity of a collection of labeled data points: a node containing multiple classes is impure, whereas a node containing only one class is pure.
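
Written out, for a node whose samples fall into C classes with proportions p_i (the fraction of the node's samples belonging to class i), the entropy in bits is:

H = -\sum_{i=1}^{C} p_i \log_2 p_i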

Entropy Properties
✔ Entropy is zero when all data points in a node belong to a single class (a pure node).
✔ Entropy is maximal (1 bit for a two-class problem) when the classes are evenly mixed (maximally impure).

Information Gain
✔ Information gain measures the amount of information we gain by making a split.
✔ The idea is to subtract from the entropy of our data before the split the entropy of each
possible partition thereafter.
✔ Then select the split that yields the largest reduction in entropy, or equivalently, the largest
increase in information, as written out below.
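
Using the entropy defined above and weighting each child node by the fraction of the N parent samples it receives (N_child / N), the gain of a split is:

\Delta IG = H(\text{parent}) - \sum_{\text{children}} \frac{N_{\text{child}}}{N}\, H(\text{child})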

ID3 Algorithm
✔ The core algorithm that builds the tree by maximizing information gain is called ID3.
✔ It is a recursive procedure that starts from the root node of the tree and iterates top-down
over all non-leaf branches in a greedy manner.
✔ At each depth it calculates the difference in entropy, i.e. the information gain ΔIG defined above.

ID3 Algorithm Steps


1. Calculate the entropy associated with every feature of the data set.
2. Partition the data set into subsets using different features and cutoff values. For each,
compute the information gain ΔIG as the difference in entropy before and after the split,
using the formula above. For the total entropy of all child nodes after the split, use the
weighted average, taking into account N_child, i.e. how many of the N samples end up on
each child branch.
3. Identify the partition that leads to the maximum information gain. Create a decision node
on that feature and split value.
4. When no further splits can be done on a subset, create a leaf node and label it with the
most common class of the data points within it if doing classification, or with the average
value if doing regression.
5. Recurse on all subsets. Recursion stops if after a split all elements in a child node are of the
same type. Additional stopping conditions may be imposed, such as requiring a minimum
number of samples per leaf to continue splitting, or finishing when the trained tree has
reached a given maximum depth.
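
As an illustration of steps 1 to 3 above, here is a minimal Python sketch (illustrative only, with hypothetical labels) that computes the entropy of a labeled set and the information gain of a candidate split:

import numpy as np
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_labels, child_label_groups):
    # Entropy before the split minus the weighted average entropy of the children
    n = len(parent_labels)
    children = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - children

# Toy example: 10 samples (6 "yes", 4 "no") split into two child nodes
parent = ["yes"] * 6 + ["no"] * 4
left   = ["yes"] * 5 + ["no"] * 1
right  = ["yes"] * 1 + ["no"] * 3
print(information_gain(parent, [left, right]))   # the split with the largest value is chosen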

The Problem of Perturbations

✔ Decision Trees can be extremely sensitive to small perturbations in the data: a minor change
in the training examples can result in a drastic change in the structure of the tree.
✔ Example: small random Gaussian perturbations on just 5% of the training examples can create
a set of completely different Decision Trees.
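
One way to try this yourself, as a minimal sketch assuming the built-in Iris dataset (how much the tree structure changes will vary with the noise and the data):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Perturb about 5% of the training examples with small Gaussian noise and refit
idx = rng.choice(len(X), size=int(0.05 * len(X)), replace=False)
X_perturbed = X.copy()
X_perturbed[idx] += rng.normal(scale=0.5, size=X_perturbed[idx].shape)

for data in (X, X_perturbed):
    tree = DecisionTreeClassifier(random_state=0).fit(data, y)
    print(export_text(tree, max_depth=2))   # compare the top splits of the two trees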

II. Random Forest

Random Forest
Condorcet's Jury Theorem says that if each person is more than 50% correct, then adding more
people to vote increases the probability that the majority is correct.
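
A quick numerical illustration of the theorem, assuming each voter is independently correct with probability 0.6:

from math import comb

def majority_correct(p, n):
    # Probability that more than half of n independent voters (each correct with
    # probability p) are correct; assumes an odd n so there are no ties
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range((n // 2) + 1, n + 1))

for n in (1, 11, 101):
    print(n, round(majority_correct(0.6, n), 3))   # grows toward 1 as n increases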

Figure 2. Ensemble Principle Illustration

Ensemble learning
✔ Ensemble learning creates a stronger model by aggregating the predictions of multiple
weak models, such as decision trees.

Bagging
✔ Bagging, or bootstrap aggregation, is a technique for reducing the variance of an estimated
prediction function.
✔ For classification, a committee of trees each cast a vote for the predicted class.

Bagging Method
✔ One way to produce multiple models that are different is to train each model using a
different training set.
✔ The Bagging (Bootstrap Aggregating) method randomly draws a fixed number of samples
from the training set with replacement (this means that a data point can be drawn more
than once).
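
A minimal NumPy sketch of drawing one bootstrap sample (scikit-learn's ensemble classes do this internally), using toy data:

import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1)        # 10 toy samples, one feature
y = np.array([0, 1] * 5)                # toy labels

# Draw N indices with replacement: some points appear more than once, some not at all
idx = rng.integers(0, len(X), size=len(X))
X_boot, y_boot = X[idx], y[idx]
print(idx)                              # repeated indices show sampling with replacement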

Random forest classifier


✔ The random forest classifier is an extension of bagging that uses de-correlated trees.

Figure 3. Random Forest Classifier Illustration

Variance in Composition
✔ The inventor of the random forest model, Leo Breiman, says in his paper: "[o]ur results
indicate that better (lower generalization error) random forests have lower correlation
between classifiers and higher strength."

✔ The high variance of the decision tree model can help keep the correlation among trees
low. Bagging and random feature selection (considering only a random subset of features at
each split) are the key innovations that keep this correlation low, as sketched below.
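
A minimal scikit-learn sketch of these two ideas, bootstrap sampling plus a random subset of features at each split, on assumed synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed synthetic data, purely for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# bootstrap=True resamples the training set for every tree (bagging);
# max_features="sqrt" considers only a random subset of features at each split (de-correlation)
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.score(X, y))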

III. Naive Bayes
Naive Bayes
✔ The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem
and used for solving classification problems.
✔ Naïve: It is called naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying it as an apple, without
depending on the others.
✔ Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem
✔ Bayes' theorem is also known as Bayes' Rule or Bayes' Law; it is used to determine the
probability of a hypothesis given prior knowledge, and it depends on conditional probability.
✔ The formula for Bayes' theorem is given as:
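
P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}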

Where,
⮚ P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
⮚ P(B|A) is Likelihood probability: Probability of the evidence B given that hypothesis A
is true.
⮚ P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
⮚ P(B) is Marginal Probability: Probability of Evidence.

Steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability (see the sketch below).
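
A small pandas sketch of these three steps on hypothetical toy data (only an Outlook feature is used, purely for illustration):

import pandas as pd

# Hypothetical toy data: 10 days of weather and whether a match was played
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy",
                "Overcast", "Sunny", "Rainy", "Overcast", "Sunny"],
    "Play":    ["No", "No", "Yes", "Yes", "No",
                "Yes", "Yes", "Yes", "Yes", "No"],
})

# Step 1: frequency table of Outlook vs Play
freq = pd.crosstab(df["Outlook"], df["Play"])

# Step 2: likelihoods P(Outlook | Play) and priors P(Play)
likelihood = freq / freq.sum(axis=0)
prior = df["Play"].value_counts(normalize=True)

# Step 3: posterior P(Play = Yes | Outlook = Sunny) via Bayes' theorem
evidence = (df["Outlook"] == "Sunny").mean()            # P(Sunny)
posterior = likelihood.loc["Sunny", "Yes"] * prior["Yes"] / evidence
print(posterior)                                        # 0.25 for this toy data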

Advantages of Naïve Bayes Classifier


✔ Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
✔ It can be used for binary as well as multi-class classification.
✔ It performs well in multi-class predictions compared to many other algorithms.
✔ It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier


✔ Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.

Applications of Naïve Bayes Classifier


✔ It is used for Credit Scoring.
✔ It is used in medical data classification.
✔ It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
✔ It is used in Text classification such as Spam filtering and Sentiment analysis.

IV. Experiment
- Paste the following code into Google Colab, then run the trials.

Decision Tree
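
A minimal sketch of a Decision Tree trial, assuming scikit-learn's built-in Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, tree.predict(X_test)))

plot_tree(tree, filled=True)    # visualize the decision nodes and leaves
plt.show()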

Random Forest For Classifying Digits
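
A minimal sketch, assuming scikit-learn's built-in digits dataset (8x8 grayscale images of handwritten digits 0 to 9):

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))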

Random Forest example 2
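
A second sketch, here shown as a Random Forest regressor on the petrol_consumption data from reference 2 (the target column name is assumed from that repository; adjust if it differs):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Assumes the CSV from reference 2 with a 'Petrol_Consumption' target column
df = pd.read_csv("petrol_consumption.csv")
X = df.drop("Petrol_Consumption", axis=1)
y = df["Petrol_Consumption"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, reg.predict(X_test)))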

Naive Bayes (Social Network Ads Example)
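
A minimal sketch, assuming the Social_Network_Ads.csv file from reference 3 with Age, EstimatedSalary, and Purchased columns:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

# Assumes Social_Network_Ads.csv (reference 3); adjust column names to the actual file
df = pd.read_csv("Social_Network_Ads.csv")
X = df[["Age", "EstimatedSalary"]].values
y = df["Purchased"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))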

Naive Bayes (Play or No)
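
A minimal sketch of the classic "play or not" example with hypothetical categorical weather data and scikit-learn's CategoricalNB:

from sklearn.preprocessing import OrdinalEncoder, LabelEncoder
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: weather and temperature vs. whether to play
weather = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
temp    = ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
           "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"]
play    = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
           "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

enc = OrdinalEncoder()
X = enc.fit_transform(list(zip(weather, temp)))   # encode categories as integers
le = LabelEncoder()
y = le.fit_transform(play)

model = CategoricalNB()
model.fit(X, y)

pred = model.predict(enc.transform([["Overcast", "Mild"]]))
print(le.inverse_transform(pred))                 # decoded class label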

Naive Bayes (weather)
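
A minimal sketch, assuming the weather.csv file from reference 1 contains numeric features and a categorical RainTomorrow target (adjust the column name to the actual file):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Assumes weather.csv from reference 1 with a categorical 'RainTomorrow' column
df = pd.read_csv("weather.csv").dropna()
X = df.select_dtypes(include="number")          # keep only numeric features
y = df["RainTomorrow"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
nb = GaussianNB()
nb.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, nb.predict(X_test)))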

V. Reference
1. https://www.kaggle.com/datasets/zaraavagyan/weathercsv
2. https://github.com/likarajo/petrol_consumption
3. https://www.kaggle.com/datasets/rakeshrau/social-network-ads?resource=download
4. https://realpython.com/logistic-regression-python/
5. https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python
6. https://www.analyticsvidhya.com/blog/2021/05/machine-learning-with-python-logistic-regression/
7. https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-logistic-regression-using-python/
8. https://mlu-explain.github.io/decision-tree/
9. https://mlu-explain.github.io/random-forest/
