Naive Bayes Classifiers
What is a Naive Bayes Classifier?
The Naïve Bayes algorithm is used for classification problems. It is widely used in text
classification, where the data is high-dimensional (each word represents one feature). Typical
applications include spam filtering, sentiment detection, and rating classification. The main
advantage of Naïve Bayes is its speed: it is fast to train, and making predictions remains easy
even with high-dimensional data.
The model predicts the probability that an instance belongs to a class, given a set of feature
values. It is a probabilistic classifier: it assumes that each feature in the model is independent
of the existence of every other feature. In other words, each feature contributes to the
prediction with no relation to the others. In the real world, this condition is rarely satisfied.
The algorithm uses Bayes' theorem for both training and prediction.
Why Is It Called Naive Bayes?
It is called naive because it assumes that the features are independent of one another, which
rarely holds in practice, and Bayes because it is built on Bayes' theorem.
Bayes' Theorem
Based on prior knowledge of conditions that may be related to an event, Bayes' theorem
describes the probability of the event. The conditional probability can be found this way:
Assume we have a hypothesis (H) and evidence (E).
According to Bayes' theorem, the relationship between the probability of the hypothesis
before getting the evidence, P(H), and the probability of the hypothesis after getting the
evidence, P(H|E), is:
P(H|E) = P(E|H) * P(H) / P(E)
Prior probability: P(H) is the probability before getting the evidence.
Posterior probability: P(H|E) is the probability after getting the evidence.
In general,
P(class|data) = (P(data|class) * P(class)) / P(data)
Bayes' Theorem Example
Assume we have to find the probability that a randomly picked card is a King, given that
it is a face card.
There are 4 Kings in a deck of 52 cards, which implies that
P(King) = 4/52
All Kings are face cards, so
P(Face|King) = 1
There are 3 face cards in each suit of 13 cards, and there are 4 suits in total, so
P(Face) = 12/52
Therefore,
P(King|Face) = P(Face|King) * P(King) / P(Face) = (1 * 4/52) / (12/52) = 1/3
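As a quick check, the same arithmetic can be done in code. Here is a small sketch of this
card example (the fractions module keeps the result exact):
Python
from fractions import Fraction

p_king = Fraction(4, 52)         # 4 Kings in a 52-card deck
p_face_given_king = Fraction(1)  # every King is a face card
p_face = Fraction(12, 52)        # 3 face cards per suit * 4 suits

# Bayes' theorem: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # 1/3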
Naive Bayes is a simple probabilistic machine learning algorithm used for classification
tasks. It is based on Bayes' theorem with a strong independence assumption between
features. Despite this "naive" assumption, Naive Bayes often performs well in practice.
1. Bayes' Theorem:
o The core of Naive Bayes is Bayes' theorem, which relates conditional
probabilities:
P(Class|Features) = P(Features|Class) * P(Class) / P(Features)
2. Naive Assumption:
o Naive Bayes assumes that all features are independent of each other given the
class, so the likelihood factorises:
P(Features|Class) = P(Feature1|Class) * P(Feature2|Class) * ... * P(FeatureN|Class)
3. Classification:
o To classify a new instance, Naive Bayes calculates the probability of each
class given the instance's features and predicts the class with the highest
probability.
Variants of Naive Bayes:
Gaussian Naive Bayes: Suitable for continuous features, which are assumed to be normally
distributed.
Multinomial Naive Bayes: Suitable for discrete features, often used for text
classification.
Bernoulli Naive Bayes: Suitable for binary features (present or absent).
Strengths and limitations:
Fast: Training and prediction are cheap, even with high-dimensional data.
Effective for text classification: Often achieves high accuracy in tasks like spam filtering
and sentiment analysis.
Naive assumption: Feature independence rarely holds in real-world data, which can limit
accuracy.
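To make the factorised computation in steps 1-3 concrete, here is a small hand-rolled
sketch. The two features and all probabilities are made up for illustration; this is not the
scikit-learn API:
Python
# Toy spam/ham classifier over two binary features, with hypothetical numbers
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": 0.7, "mentions_prize": 0.5},
    "ham": {"has_link": 0.2, "mentions_prize": 0.05},
}

def posterior_scores(features):
    # Score each class as P(class) * product of P(feature|class).
    # The shared denominator P(features) is omitted: it does not change
    # which class scores highest.
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[cls][f]
        scores[cls] = score
    return scores

scores = posterior_scores(["has_link", "mentions_prize"])
print(max(scores, key=scores.get))  # prints "spam" (0.14 vs 0.006)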
Here's a basic implementation of Naive Bayes using Python and the scikit-learn library (a
minimal sketch; X and y stand in for your own feature matrix and labels):
Python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X: 2D array of features, y: 1D array of class labels (your data here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on the test set and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Explanation:
1. Import libraries:
o GaussianNB, train_test_split and accuracy_score come from scikit-learn.
2. Prepare data:
o Replace X and y with your actual data. X should be a 2D array of features, and
y should be a 1D array of corresponding class labels.
3. Split data:
o Use train_test_split to divide the data into training and testing sets.
This helps in evaluating the model's performance on unseen data.
4. Create and train the model:
o Create a GaussianNB instance and fit it on the training data.
5. Predict and evaluate:
o Call predict on the test features and measure performance with accuracy_score.
Note:
This is a basic implementation and can be further customized based on your specific
needs and data characteristics.
For text data, you might consider using MultinomialNB or BernoulliNB
instead of GaussianNB.
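For example, a simple text pipeline might look like this (a sketch; the tiny corpus and its
labels are made up for illustration):
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels
texts = ["win a free prize now", "meeting at noon",
         "free offer, click now", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Turn each document into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

# Multinomial Naive Bayes works directly on these count features
clf = MultinomialNB()
clf.fit(X_counts, labels)

print(clf.predict(vectorizer.transform(["free prize now"])))  # expected: ['spam']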
Decision Tree
Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and each
leaf node represents the outcome.
In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any further
branches.
The decisions or tests are performed on the basis of the features of the given dataset.
It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further
splits the tree into subtrees.
(Diagram: the general structure of a decision tree.)
Decision trees usually mimic human thinking while making a decision, so they are
easy to understand.
The logic behind a decision tree can be easily understood because it shows a tree-like
structure.
How Does the Decision Tree Algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created
in Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; such a final node is called a leaf node.
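In practice, this recursive procedure is available off the shelf. Here is a brief sketch using
scikit-learn's DecisionTreeClassifier (which implements an optimised version of CART),
with X and y again standing in for your data:
Python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# X: 2D array of features, y: 1D array of class labels (your data here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# criterion chooses the attribute selection measure ("gini" or "entropy");
# max_depth limits how deep the recursive splitting goes
tree = DecisionTreeClassifier(criterion="gini", max_depth=4)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on unseen data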
Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not. To solve this problem, the decision tree starts with the root
node (the Salary attribute, chosen by ASM). The root node splits further into the next
decision node (distance from the office) and one leaf node, based on the corresponding
labels. The next decision node further splits into one decision node (cab facility) and one
leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and
Declined offer).
(Diagram: decision tree for the job-offer example.)
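The same tree can be written as nested conditions. Here is a toy sketch of the job-offer tree
(the thresholds are made up for illustration):
Python
def job_offer_decision(salary, distance_km, cab_facility):
    # Root node: Salary (the attribute selected by ASM in this example)
    if salary < 50000:
        return "Declined offer"  # leaf node
    # Decision node: distance from the office
    if distance_km > 30:
        # Decision node: cab facility
        return "Accepted offer" if cab_facility else "Declined offer"
    return "Accepted offer"  # leaf node

print(job_offer_decision(60000, 35, cab_facility=True))  # Accepted offer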
Advantages of the Decision Tree
It is simple to understand, as it follows the same process which a human follows while
making any decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
There is less requirement for data cleaning compared to other algorithms.
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of a disease can
be identified.
3. Land Use: We can identify areas of similar land use with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Random Forest
Although random forest can be used for both classification and regression tasks, it is
not as well suited to regression tasks.
A random forest classifier can be trained and evaluated along the same lines as the earlier
examples (a minimal sketch, reusing the X_train/X_test split and the accuracy_score import
from above):
Python
from sklearn.ensemble import RandomForestClassifier

# Train an ensemble of 100 decision trees on the same split as above
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)