
Multinomial Naive Bayes

Last Updated : 23 Jul, 2025

Multinomial Naive Bayes (MNB) is a variant of the Naive Bayes algorithm. It is a classification algorithm based on Bayes' Theorem that is well suited to discrete data and is typically used in text classification problems. It models word frequencies as counts and assumes each feature (word) follows a multinomial distribution. MNB is widely used for tasks such as classifying documents by their word frequencies, for example in spam email detection.

How Does Multinomial Naive Bayes Work?

In Multinomial Naive Bayes, "Naive" means the method assumes all features, such as the words in a sentence, are independent of each other, and "Multinomial" refers to modelling how many times each word appears. The model works by using word counts to classify text: because the presence of one word is assumed not to affect the presence of any other, the probabilities factorize and the model stays simple to train and use.

The model looks at how many times each word appears in messages from different categories (like "spam" or "not spam"). For example, if the word "free" appears often in spam messages, that count helps predict whether a new message is spam.

To calculate the probability of a message belonging to a certain category, Multinomial Naive Bayes uses the multinomial distribution (a short Python sketch follows the definitions below):

P(X) = \frac{n!}{n_1! n_2! \ldots n_m!} p_1^{n_1} p_2^{n_2} \ldots p_m^{n_m}

Where:

  • n is the total number of trials.
  • n_i is the count of occurrences for outcome i.
  • p_i is the probability of outcome i.
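As an illustration, the sketch below (not part of the original article; the word probabilities are made up) evaluates this multinomial probability directly in Python for a small count vector.

```python
# Hedged sketch: evaluating P(X) = n!/(n_1!...n_m!) * p_1^{n_1} * ... * p_m^{n_m}.
# The probabilities used here are illustrative, not estimated from any data.
from math import factorial

def multinomial_pmf(counts, probs):
    n = sum(counts)                      # total number of trials
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)           # divide by n_1! n_2! ... n_m!
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c                   # multiply in p_i^{n_i}
    return coeff * prob

# Example: a 3-word message with counts (buy=2, cheap=0, now=1)
# and assumed word probabilities (0.5, 0.3, 0.2)
print(multinomial_pmf([2, 0, 1], [0.5, 0.3, 0.2]))   # 0.15
```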

To estimate how likely each word is in a particular class like "spam" or "not spam" we use Maximum Likelihood Estimation (MLE) combined with Laplace (add-one) smoothing, so that words unseen in a class do not get zero probability. This finds the probabilities from actual counts in our data (a short sketch follows the definitions below). The formula is:

\theta_{c,i} = \frac{\text{count}(w_i, c) + 1}{N + V}

Where:

  • \text{count}(w_i, c) is the number of times word w_i appears in documents of class c.
  • N is the total number of words in documents of class c.
  • V is the vocabulary size.
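A minimal sketch of this smoothed estimate, using illustrative counts that are not taken from any real corpus:

```python
# Hedged sketch: theta_{c,i} = (count(w_i, c) + 1) / (N + V) with Laplace smoothing.
def smoothed_theta(word_counts, vocab_size):
    total = sum(word_counts.values())            # N: total words in class c
    return {w: (cnt + 1) / (total + vocab_size)  # add-one smoothing over V words
            for w, cnt in word_counts.items()}

# Illustrative counts for a hypothetical "spam" class
spam_counts = {"buy": 2, "cheap": 1, "now": 1}
print(smoothed_theta(spam_counts, vocab_size=10))
# {'buy': 0.214..., 'cheap': 0.142..., 'now': 0.142...}
```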

Example

To understand how Multinomial Naive Bayes works, here's a simple example that classifies whether a message is "spam" or "not spam" based on the words it contains.

| Message ID | Message Text        | Class    |
|------------|---------------------|----------|
| M1         | "buy cheap now"     | Spam     |
| M2         | "limited offer buy" | Spam     |
| M3         | "meet me now"       | Not Spam |
| M4         | "let's catch up"    | Not Spam |

1. Vocabulary

Extract all unique words from the training data:

\text{Vocabulary} = \{\text{buy, cheap, now, limited, offer, meet, me, let's, catch, up}\}

Vocabulary size V = 10
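As an illustrative sketch in plain Python (separate from the implementation section later in the article), the vocabulary can be extracted like this:

```python
# Hedged sketch: extracting the vocabulary from the four training messages.
messages = [
    ("buy cheap now", "Spam"),
    ("limited offer buy", "Spam"),
    ("meet me now", "Not Spam"),
    ("let's catch up", "Not Spam"),
]

vocabulary = sorted({word for text, _ in messages for word in text.split()})
print(vocabulary)        # ['buy', 'catch', 'cheap', "let's", 'limited', ...]
print(len(vocabulary))   # V = 10
```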

2. Word Frequencies by Class

Spam Class (M1, M2):

  • buy: 2
  • cheap: 1
  • now: 1
  • limited: 1
  • offer: 1

Total words: 6

Not Spam Class (M3, M4):

  • meet: 1
  • me: 1
  • now: 1
  • let's: 1
  • catch: 1
  • up: 1

Total words: 6
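Continuing the sketch above (it reuses the `messages` list defined there), the per-class counts can be reproduced with a `Counter`:

```python
# Hedged sketch: word frequencies per class, reusing `messages` from the previous sketch.
from collections import Counter

class_counts = {"Spam": Counter(), "Not Spam": Counter()}
for text, label in messages:
    class_counts[label].update(text.split())

print(class_counts["Spam"])                     # buy: 2, cheap: 1, now: 1, limited: 1, offer: 1
print(sum(class_counts["Spam"].values()))       # 6 words in the Spam class
print(sum(class_counts["Not Spam"].values()))   # 6 words in the Not Spam class
```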

3. Test Message

Test Message: "buy now"

4. Applying Multinomial Naive Bayes Formula

P(C|d) \propto P(C) \cdot \prod_i P(w_i|C)^{f_i}

Prior Probabilities:

P(\text{Spam}) = 0.5, \quad P(\text{Not Spam}) = 0.5

Apply Laplace Smoothing:

P(w \mid C) = \frac{\text{count}(w, C) + 1}{\text{total words in } C + V}

Spam Class:

  • P(\text{buy} \mid \text{Spam}) = \frac{2 + 1}{6 + 10} = \frac{3}{16}
  • P(\text{now} \mid \text{Spam}) = \frac{1 + 1}{6 + 10} = \frac{2}{16}

P(\text{Spam} \mid d) \propto 0.5 \cdot \frac{3}{16} \cdot \frac{2}{16} = \frac{3}{256}

Not Spam Class:

  • P(\text{buy} \mid \text{Not Spam}) = \frac{0 + 1}{6 + 10} = \frac{1}{16}
  • P(\text{now} \mid \text{Not Spam}) = \frac{1 + 1}{6 + 10} = \frac{2}{16}

P(\text{Not Spam} \mid d) \propto 0.5 \cdot \frac{1}{16} \cdot \frac{2}{16} = \frac{1}{256}

5. Final Classification

Since P(\text{Spam} \mid d) = \frac{3}{256} > \frac{1}{256} = P(\text{Not Spam} \mid d),

\boxed{\text{The message is classified as Spam}}
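Before the library-based implementation in the next section, the hand calculation above can be checked with a short, self-contained sketch. Exact fractions are used so the results match 3/256 and 1/256; this is an illustration, not the implementation that follows.

```python
# Hedged sketch: reproducing the worked example above with exact fractions.
from collections import Counter
from fractions import Fraction

train = [
    ("buy cheap now", "Spam"),
    ("limited offer buy", "Spam"),
    ("meet me now", "Not Spam"),
    ("let's catch up", "Not Spam"),
]
V = len({w for text, _ in train for w in text.split()})   # vocabulary size = 10

counts = {"Spam": Counter(), "Not Spam": Counter()}
for text, label in train:
    counts[label].update(text.split())

def score(message, label):
    prior = Fraction(sum(1 for _, l in train if l == label), len(train))
    total = sum(counts[label].values())                    # total words in the class
    s = prior
    for w in message.split():
        s *= Fraction(counts[label][w] + 1, total + V)     # Laplace-smoothed P(w | C)
    return s

print(score("buy now", "Spam"))       # 3/256
print(score("buy now", "Not Spam"))   # 1/256  ->  classified as Spam
```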

Python Implementation of Multinomial Naive Bayes

Let's understand it with an example of spam email detection. We'll classify emails into two categories: spam and not spam.

1. Importing Libraries: