
UNIT 2 SUPERVISED LEARNING: NAÏVE BAYES THEOREM, DECISION TREE (MARKS: 14)
Naive Bayes Classifiers
What is a Naive Bayes Classifier?
The Naïve Bayes algorithm is used for classification problems and is especially popular in text classification, where the data is high-dimensional (each word represents one feature). It is used in spam filtering, sentiment detection, rating classification, and similar tasks. The main advantage of Naïve Bayes is its speed: training and prediction remain fast even with high-dimensional data.
The model predicts the probability that an instance belongs to a class given a set of feature values. It is a probabilistic classifier. It is called "naive" because it assumes that each feature in the model is independent of the existence of any other feature. In other words, each feature contributes to the prediction with no relation to the others. In the real world, this condition is rarely satisfied. The algorithm uses Bayes' theorem for both training and prediction.
Why is it Called Naive Bayes?
The "naive" part of the name refers to the assumption that features are independent of each other given the class, and the "Bayes" part refers to Bayes' theorem, on which the classifier is based.

Bayes’ Theorem
 Based on prior knowledge of conditions that may be related to an event, Bayes' theorem describes the probability of that event.
 Conditional probability can be found this way. Assume we have a hypothesis (H) and evidence (E). According to Bayes' theorem, the relationship between the probability of the hypothesis before seeing the evidence, P(H), and the probability of the hypothesis after seeing the evidence, P(H|E), is:
P(H|E) = P(E|H)*P(H)/P(E)
 Prior probability P(H) is the probability before seeing the evidence; posterior probability P(H|E) is the probability after seeing the evidence.
 In general,
P(class|data) = (P(data|class) * P(class)) / P(data)
Bayes' Theorem Example
Suppose we have to find the probability that a randomly picked card is a King, given that it is a face card.
There are 4 Kings in a deck of 52 cards, which implies that
P(King) = 4/52
All Kings are face cards, so
P(Face|King) = 1
There are 3 face cards in each suit of 13 cards, and there are 4 suits in total, so
P(Face) = 12/52
Therefore,
P(King|Face) = P(Face|King)*P(King)/P(Face) = (1 * 4/52) / (12/52) = 4/12 = 1/3
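As a quick check, the same computation can be done in Python (a minimal sketch using only the probabilities above):
Python
# Probabilities from a standard 52-card deck
p_king = 4 / 52          # P(King)
p_face_given_king = 1.0  # P(Face|King): every King is a face card
p_face = 12 / 52         # P(Face): 3 face cards per suit * 4 suits

# Bayes' theorem: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # 0.3333... = 1/3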

Here are some more details about Naive Bayes classifiers:

Naive Bayes Classifier: A Simple Yet Powerful Algorithm

Naive Bayes is a probabilistic machine learning algorithm that is particularly effective for classification tasks. It is based on Bayes' theorem with a strong independence assumption between features. Despite this "naive" assumption, Naive Bayes often performs surprisingly well in practice, especially with text data.

How Naive Bayes Works

1. Bayes' Theorem:
o The core of Naive Bayes is Bayes' theorem, which relates conditional probabilities:
P(A|B) = P(B|A) * P(A) / P(B)
o In the context of classification, this translates to:
P(Class|Features) = P(Features|Class) * P(Class) / P(Features)
2. Naive Assumption:
o Naive Bayes assumes that all features are independent of each other given the class. This simplifies the calculation of probabilities:
P(Features|Class) = P(Feature1|Class) * P(Feature2|Class) * ... * P(FeatureN|Class)
3. Classification:
o To classify a new instance, Naive Bayes calculates the probability of the instance belonging to each class.
o The class with the highest probability is assigned to the instance (a minimal sketch of this calculation follows the list below).
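To make the calculation concrete, here is a minimal hand-rolled sketch. The priors and per-feature likelihoods are invented toy values for illustration only; in practice they are estimated from training data. Note that P(Features) is skipped, since it does not change which class has the highest score:
Python
# Toy priors and likelihoods (assumed values for illustration only)
priors = {"spam": 0.4, "ham": 0.6}
# P(word | class) for two example word features
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.2, "meeting": 0.7},
}

def classify(features):
    # Score each class: P(class) * product of P(feature|class)
    scores = {}
    for c in priors:
        score = priors[c]
        for f in features:
            score *= likelihoods[c][f]
        scores[c] = score
    # The class with the highest score is the prediction
    return max(scores, key=scores.get)

print(classify(["free"]))  # "spam": 0.4*0.8 = 0.32 beats 0.6*0.2 = 0.12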

Types of Naive Bayes Classifiers

 Gaussian Naive Bayes: Assumes that features are continuous and normally distributed.
 Multinomial Naive Bayes: Suitable for discrete (count) features; often used for text classification.
 Bernoulli Naive Bayes: Similar to Multinomial, but treats features as binary (present or absent).
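In scikit-learn, these three variants map directly onto separate classes; a brief sketch of when each might be chosen:
Python
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

gnb = GaussianNB()     # continuous features, assumed normally distributed
mnb = MultinomialNB()  # count features, e.g. word counts in documents
bnb = BernoulliNB()    # binary features, e.g. word present/absent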

Advantages of Naive Bayes

 Simplicity: Easy to understand and implement.
 Efficiency: Fast training and prediction times.
 Handles high-dimensional data: Works well with many features.
 Effective for text classification: Often achieves high accuracy in tasks like spam filtering and sentiment analysis.

Disadvantages of Naive Bayes

 Naive assumption: The independence assumption may not always hold in real-world data.
 Zero-frequency problem: If a feature value doesn't appear in the training data, its probability will be zero, driving the overall class probability to zero. Smoothing techniques like Laplace smoothing can help mitigate this issue (see the sketch below).
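Laplace (add-one) smoothing replaces a zero count with a small pseudo-count. A minimal sketch of the idea, with invented counts, followed by the corresponding alpha parameter in scikit-learn:
Python
# Laplace smoothing: add 1 to every count so no probability is exactly zero
count_word_in_class = 0     # word never seen with this class in training
total_words_in_class = 100
vocabulary_size = 50

p_unsmoothed = count_word_in_class / total_words_in_class  # 0.0
p_smoothed = (count_word_in_class + 1) / (total_words_in_class + vocabulary_size)
print(p_smoothed)  # 1/150 instead of 0

# scikit-learn exposes this as the alpha parameter (alpha=1.0 is Laplace smoothing)
from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB(alpha=1.0)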

Applications of Naive Bayes

 Text classification: Spam filtering, sentiment analysis, topic modeling.
 Image classification: Facial recognition, object detection.
 Medical diagnosis: Disease prediction.
 Recommendation systems: Product recommendations.

Bayes Theorem Formula

The formula for Bayes' theorem can be written in a variety of ways. The following is the most common version:

P(A ∣ B) = P(B ∣ A)P(A) / P(B)

P(A ∣ B) is the conditional probability of event A occurring, given that B is true.
P(B ∣ A) is the conditional probability of event B occurring, given that A is true.
P(A) and P(B) are the marginal probabilities of A and B occurring on their own.

Bayes Theorem Formula: Solved Examples

Example 1. A certain disease affects 2% of the population. A diagnostic test for the disease has an accuracy rate of 95% (i.e., the probability of testing positive given that the disease is present is 0.95), and the false positive rate is 3% (i.e., the probability of testing positive given that the disease is absent is 0.03). If a randomly selected person tests positive, what is the probability that they actually have the disease?
Example 2. A box contains 3 fair coins and 1 biased coin with two heads. A coin is randomly selected and tossed, and it shows heads. What is the probability that the chosen coin is biased?
Example 3. A certain drug test correctly identifies drug users 98% of the time and gives false positives for 3% of non-drug users. If 1% of the population are drug users, what is the probability that a person who tests positive is actually a drug user?
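Worked solutions (Example 3 is solved assuming the 3% figure is the false positive rate for non-users):

Solution 1. P(Disease|Positive) = P(Positive|Disease)*P(Disease) / P(Positive), where
P(Positive) = 0.95 * 0.02 + 0.03 * 0.98 = 0.019 + 0.0294 = 0.0484
P(Disease|Positive) = 0.019 / 0.0484 ≈ 0.393, i.e. about a 39.3% chance of actually having the disease.

Solution 2. P(Biased) = 1/4, P(Heads|Biased) = 1, P(Heads|Fair) = 1/2, so
P(Heads) = 1 * 1/4 + 1/2 * 3/4 = 1/4 + 3/8 = 5/8
P(Biased|Heads) = (1 * 1/4) / (5/8) = 2/5 = 0.4.

Solution 3. P(User|Positive) = (0.98 * 0.01) / (0.98 * 0.01 + 0.03 * 0.99) = 0.0098 / 0.0395 ≈ 0.248, i.e. about a 24.8% chance that a person who tests positive is actually a drug user.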
Advantages of Naive Bayes Classifier
 Easy to implement and computationally efficient.
 Effective in cases with a large number of features.
 Performs well even with limited training data.
 It performs well in the presence of categorical features.
 For numerical features, the data is assumed to come from a normal distribution (as in Gaussian Naive Bayes).
Disadvantages of Naive Bayes Classifier
 Assumes that features are independent, which may not always hold in real-world data.
 Can be influenced by irrelevant attributes.
 May assign zero probability to unseen events, leading to poor generalization.
Applications of Naive Bayes Classifier
 Spam Email Filtering: Classifies emails as spam or non-spam based on features.
 Text Classification: Used in sentiment analysis, document categorization, and topic
classification.
 Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
 Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
 Weather Prediction: Classifies weather conditions based on various factors.

IMPLEMENTATION OF NAIVE BAYES

Here's a basic implementation of Naive Bayes using Python and the scikit-learn library:
Python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample dataset (replace with your actual data)
X = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3]]
y = [0, 0, 1, 1, 0]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Explanation:

1. Import necessary libraries:
o GaussianNB: For creating a Gaussian Naive Bayes classifier.
o train_test_split: For splitting data into training and testing sets.
o accuracy_score: For evaluating the model's accuracy.
2. Prepare data:
o Replace X and y with your actual data. X should be a 2D array of features, and y should be a 1D array of corresponding class labels.
3. Split data:
o Use train_test_split to divide the data into training and testing sets. This helps in evaluating the model's performance on unseen data.
4. Create and train the model:
o Create an instance of GaussianNB.
o Train the model using the fit method, passing the training data (X_train and y_train).
5. Make predictions:
o Use the trained model to make predictions on the test data using the predict method.
6. Evaluate the model:
o Calculate the accuracy of the model using accuracy_score by comparing the predicted labels (y_pred) with the actual labels (y_test).

Note:

 This is a basic implementation and can be further customized based on your specific needs and data characteristics.
 For text data, you might consider using MultinomialNB or BernoulliNB instead of GaussianNB, as in the sketch below.
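As a sketch of the text case, here is a minimal spam-filtering example using MultinomialNB with CountVectorizer; the tiny corpus is invented for illustration:
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus: 1 = spam, 0 = not spam
texts = ["win free money now", "meeting at noon",
         "free prize claim now", "lunch with team"]
labels = [1, 0, 1, 0]

# Convert text to word-count features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train and predict
clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["free money prize"])))  # likely [1]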

Decision Tree Classification Algorithm

 Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
 In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
 The decisions or tests are performed on the basis of the features of the given dataset.
 It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
 It is called a decision tree because, similar to a tree, it starts with a root node, which expands into further branches and constructs a tree-like structure.
 To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
 A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
 (Figure: the general structure of a decision tree, with a root node branching into decision nodes and leaf nodes.)

Why use Decision Trees?

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are two reasons for using a decision tree:

 Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
 The logic behind a decision tree can be easily understood because it shows a tree-like structure.

Decision Tree Terminologies

 Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after reaching a leaf node.
 Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
 Branch/Sub-Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing unwanted branches from the tree.
 Parent/Child Node: A node that splits into sub-nodes is called the parent node, and its sub-nodes are called child nodes.
How does the Decision Tree algorithm Work?
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the record's attribute and, based on the comparison, follows the corresponding branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the sub-nodes and moves further. It continues this process until it reaches a leaf node. The complete process can be summarized in the following steps:

 Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM), such as information gain or the Gini index (see the sketch after this list).
 Step-3: Divide S into subsets that contain the possible values of the best attribute.
 Step-4: Generate the decision tree node that contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are leaf nodes.
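Common attribute selection measures include information gain (based on entropy) and the Gini index. A minimal sketch of both computations on a toy list of class labels:
Python
from collections import Counter
import math

def entropy(labels):
    # Entropy = -sum(p * log2(p)) over the class proportions p
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini index = 1 - sum(p^2) over the class proportions p
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = [0, 0, 1, 1, 1, 1]
print(entropy(labels))  # ~0.918
print(gini(labels))     # ~0.444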
Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an ASM). The root node splits into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer).
(Figure: decision tree for the job-offer example.)
Advantages of the Decision Tree

 It is simple to understand, as it follows the same process a human follows when making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes of a problem.
 There is less need for data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

 A decision tree can contain many layers, which makes it complex.
 It may have an overfitting issue, which can be mitigated using the Random Forest algorithm.
 With more class labels, the computational complexity of the decision tree may increase.
Why use the Decision Tree algorithm?
The decision tree algorithm is used in machine learning because it is an effective way to make decisions by:

 Laying out possible outcomes: Decision trees help developers analyze the potential consequences of a decision.
 Predicting outcomes: As the algorithm accesses more data, it can predict outcomes for future data.
 Producing accurate models: Decision trees can produce accurate and interpretable models with minimal user intervention.
 Handling data: Decision trees can handle both categorical and numerical data.
 Being fast: The decision tree algorithm is fast at both build time and apply time.

Python Implementation of Decision Tree

Python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, a random forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and combines their outputs to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions.
A greater number of trees in the forest generally leads to higher accuracy and helps prevent overfitting.
(Figure: working of the Random Forest algorithm.)
Why use Random Forest?
Below are some points that explain why we should use the Random Forest algorithm:

<="" li="" style="box-sizing: border-box;">

 It takes less training time compared to other algorithms.
 It predicts output with high accuracy, and it runs efficiently even on large datasets.
 It can maintain accuracy even when a large proportion of the data is missing.
Random forest algorithms are used for many purposes because they are versatile, can handle large datasets, and are robust to overfitting:

 Versatility: Random forests can perform both classification and regression tasks.
 Large datasets: Random forests can handle large datasets with high dimensionality.
 Overfitting: Random forests are robust to overfitting, which is when a model performs well on training data but doesn't generalize to other data.
 Accuracy: Random forests offer a high level of accuracy.
 Training time: Random forests reduce the required training time.
 Missing data: Random forests can estimate missing data and maintain accuracy when a portion of the data is missing.
 Feature importance: Random forests automatically rank the importance of different features (as shown in the sketch below).
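After fitting, scikit-learn exposes these rankings through the feature_importances_ attribute; a brief sketch using the Iris dataset:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(iris.data, iris.target)

# One importance score per feature; higher means more useful for splits
for name, score in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")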

How does the Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions by aggregating over the trees created in the first phase.
The working process can be explained in the following steps:
Step-1: Choose the number N of decision trees you want to build.
Step-2: Select K random data points from the training set.
Step-3: Build a decision tree on the selected data points (subset).
Step-4: Repeat Steps 2 and 3 until N trees have been built.
Step-5: For a new data point, collect the prediction of each decision tree and assign the new data point to the category that wins the majority vote (a minimal sketch of this vote follows the example below).
The working of the algorithm can be better understood through the following example:
Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to the random forest classifier, which divides it into subsets and gives each subset to a decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point arrives, the random forest classifier predicts the final decision based on the majority of the results.
(Figure: random forest majority voting over multiple decision trees.)
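A minimal sketch of the majority-vote step, assuming each tree's prediction for one new data point has already been collected:
Python
from collections import Counter

# Hypothetical predictions from 7 individual trees for one new data point
tree_predictions = ["apple", "banana", "apple", "apple", "banana", "apple", "apple"]

# The forest's prediction is the majority vote
prediction = Counter(tree_predictions).most_common(1)[0][0]
print(prediction)  # "apple"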

Applications of Random Forest

There are four main sectors where random forests are mostly used:

1. Banking: The banking sector mostly uses this algorithm to identify loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks can be identified.
3. Land Use: We can identify areas of similar land use with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest

 Random Forest is capable of performing both classification and regression tasks.
 It is capable of handling large datasets with high dimensionality.
 It enhances the accuracy of the model and reduces the overfitting issue.

Disadvantages of Random Forest

 Although random forest can be used for both classification and regression tasks, it is less suitable for regression tasks.

Python Implementation of the Random Forest Algorithm

Now we will implement the Random Forest algorithm using Python. For simplicity, the example below uses a small inline sample dataset; the same steps apply to a real dataset such as "user_data.csv", used in previous classification models, which allows comparing the Random Forest classifier with other classifiers such as Decision Tree, KNN, SVM, and Logistic Regression.
The implementation steps are given below:
 Data pre-processing step
 Fitting the Random Forest algorithm to the training set
 Predicting the test result
 Testing the accuracy of the result (creation of a confusion matrix, shown after the code)
 Visualizing the test set result
Python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data (replace with your own data)
X = [[0, 0], [1, 1], [1, 0], [0, 1]]  # Features
y = [0, 1, 1, 0]  # Labels

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create a Random Forest Classifier with 100 trees
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
