BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
INSTRUCTION DIVISION
FIRST SEMESTER 2023-2024
BITS F464 – Machine Learning
Assignment #1
Weightage: 10% (20 Marks)
Due Date: 09/11/2023
Bayesian Machine Learning
Introduction*
Bayesian ML is a paradigm for constructing statistical models based on Bayes’
Theorem:
p(θ|x) = p(x|θ) p(θ) / p(x)
Generally speaking, the goal of Bayesian ML is to estimate the posterior distribution
𝑝(𝜃|𝑥) given the likelihood 𝑝(𝑥|𝜃) and the prior distribution, 𝑝(𝜃). The likelihood is
something that can be estimated from the training data.
In fact, that’s exactly what we’re doing when training a regular machine learning
model: we’re performing Maximum Likelihood Estimation (MLE), an iterative process
which updates the model’s parameters in an attempt to maximize the probability of
observing the training data 𝑥 given the model parameters 𝜃.
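As a small illustrative sketch (an assumption of this write-up, not part of the assignment): for a Gaussian likelihood, the MLE of the mean and variance has a closed form, so "maximizing the probability of the data" reduces to two sample statistics.

```python
import numpy as np

# Toy data: samples we assume were drawn from a 1-D Gaussian.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Maximizing the Gaussian likelihood p(x | mu, sigma^2) over (mu, sigma^2)
# yields the sample mean and the (biased) sample variance.
mu_mle = x.mean()
var_mle = ((x - mu_mle) ** 2).mean()
```

For more complex models no closed form exists and the same maximization is carried out iteratively, e.g. by gradient ascent on the log-likelihood.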
So how does the Bayesian paradigm differ? Here things get turned on their head: we
instead seek to maximize the posterior distribution, which takes the training data as
fixed and determines the probability of any parameter setting 𝜃 given that data. We
call this process Maximum a Posteriori (MAP) estimation. It’s easier, however, to think
about it in terms of the likelihood function. By Bayes’ Theorem we can write the
posterior as
p(θ|x) ∝ p(x|θ) p(θ)
Here, we leave out the denominator, 𝑝(𝑥), because we are maximizing with respect to
𝜃, and 𝑝(𝑥) does not depend on 𝜃; it can therefore be ignored in the maximization
procedure. The key piece of the puzzle which leads Bayesian models to differ from
their classical counterparts trained by MLE is the inclusion of the term 𝑝(𝜃). We call
this the prior distribution over 𝜃.
The idea is that the prior encodes our beliefs about the model’s parameters before
we’ve even seen the data. That’s to say, we can often make reasonable
assumptions about the “suitability” of different parameter configurations based simply
on what we know about the problem domain and the laws of statistics. For example,
it’s pretty common to use a Gaussian prior over the model’s parameters. This means
we assume that they’re drawn from a normal distribution having some mean and
variance. This distribution’s classic bell-curved shape consolidates most of its mass
close to the mean while values towards its tails are rather rare.
By using such a prior, we’re effectively stating a belief that most of the model’s weights
will fall in some narrow range about a mean value with the exception of a few outliers,
and this is pretty reasonable given what we know about most real-world phenomena.
It turns out that performing MAP with such prior distributions is equivalent to
performing MLE in the classical sense with the addition of regularization (a Gaussian
prior yields L2 regularization; a Laplace prior yields L1). There’s a fairly simple
mathematical proof of this fact that we won’t go into here, but the gist is that by
constraining the acceptable model weights via the prior we’re effectively imposing a
regularizer.
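A quick numerical sketch of this equivalence (the data, noise variance σ² and prior variance τ² below are illustrative assumptions): the MAP estimate under a zero-mean Gaussian prior is exactly the ridge-regression solution with λ = σ²/τ², and it shrinks the weights relative to plain MLE.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=n)

# MLE (ordinary least squares): maximize the likelihood alone.
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# MAP with a Gaussian prior w ~ N(0, tau2 * I): maximizing
#     log p(y | X, w) + log p(w)
# is the same as minimizing ||y - Xw||^2 + lam * ||w||^2,
# i.e. ridge regression with lam = sigma2 / tau2.
sigma2, tau2 = 0.25, 1.0          # assumed noise and prior variances
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

For λ > 0 the MAP/ridge solution always has a strictly smaller norm than the MLE solution, which is the regularization effect of the prior made concrete.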
* Bayesian machine learning, September 3, 2020 by DataRobot
Implement the following algorithms:
1. Naive Bayes Classifier
2. Bayesian Belief Networks
3. Bayesian Linear Regression
4. Expectation Maximization Clustering
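To make the four algorithms above concrete, here are minimal sketches (these are illustrative toy implementations under stated assumptions, not the assignment solutions — your submissions should use real datasets and report results). First, a Gaussian Naive Bayes classifier, which assumes features are conditionally independent Gaussians given the class:

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: class-conditional features are modeled
    as independent Gaussians (a sketch, not the full assignment solution)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log p(c | x) ∝ log p(c) + sum_j log N(x_j; mu_cj, var_cj)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None]
                          + (X[:, None, :] - self.mu[None]) ** 2
                          / self.var[None]).sum(-1)
        return self.classes[np.argmax(np.log(self.prior) + log_lik, axis=1)]

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
acc = (GaussianNB().fit(X, y).predict(X) == y).mean()
```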
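Second, a Bayesian belief network. The classic Rain/Sprinkler/GrassWet network below (with illustrative probabilities, chosen here for the sketch) answers queries by enumerating the joint distribution, which is the factored product of each node's conditional probability table:

```python
from itertools import product

# Network structure: Rain -> Sprinkler, and (Rain, Sprinkler) -> GrassWet.
# All numbers are illustrative CPT entries for this toy example.
P_R = {True: 0.2, False: 0.8}
P_S_given_R = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}
P_W_given_RS = {(True, True): 0.99, (True, False): 0.8,
                (False, True): 0.9, (False, False): 0.0}

def joint(r, s, w):
    """Factored joint: p(r) * p(s|r) * p(w|r,s)."""
    pw = P_W_given_RS[(r, s)]
    return P_R[r] * P_S_given_R[r][s] * (pw if w else 1 - pw)

# Inference by enumeration: P(Rain=True | GrassWet=True),
# summing out the unobserved Sprinkler variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
p_rain_given_wet = num / den
```

Enumeration is exponential in the number of variables; for larger networks you would use variable elimination or sampling instead.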
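Third, Bayesian linear regression, which yields a full Gaussian posterior over the weights rather than a point estimate. The sketch uses Bishop-style precision parameters α (prior) and β (noise); the toy data and the specific α, β values are assumptions for illustration:

```python
import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=25.0):
    """Posterior over weights for Bayesian linear regression with prior
    w ~ N(0, alpha^-1 I) and Gaussian noise of precision beta."""
    d = Phi.shape[1]
    S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi   # posterior precision
    S = np.linalg.inv(S_inv)                         # posterior covariance
    m = beta * S @ Phi.T @ y                         # posterior mean
    return m, S

# Toy usage: recover y = 1 + 3x from noisy observations.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=200)
y = 1.0 + 3.0 * x + rng.normal(scale=0.2, size=200)  # noise std 0.2 -> beta = 25
Phi = np.column_stack([np.ones_like(x), x])          # bias + linear feature
m, S = blr_posterior(Phi, y)
```

The diagonal of S gives per-weight uncertainty, which is what distinguishes this from ordinary least squares.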
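Finally, Expectation Maximization for clustering, sketched here for a one-dimensional two-component Gaussian mixture (quantile-based initialization and the toy data are assumptions of this sketch):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Bare-bones EM for a 1-D Gaussian mixture model."""
    pi = np.full(k, 1.0 / k)                          # mixture weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)     # spread-out initial means
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibilities r[n, j] = p(component j | x_n)
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Toy usage: two well-separated clusters at -3 and +3.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
pi, mu, var = em_gmm_1d(x)
```

Each iteration provably does not decrease the data log-likelihood, which is why EM converges; in your report you should also discuss initialization sensitivity and choosing k.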
Group Information: You can work in groups of at most THREE (3)! Mark your group in the
following spreadsheet:
[Link]
6OHTxrBJlAXXU6PXQR56VQKWfnu_NP90wT6e6mY/edit?usp=sharing
What do you need to submit?
Submit a report describing the above algorithms, including details of the data used and the
results obtained. You also need to submit code files for each algorithm! Each group will
submit everything in a single ZIP file named GroupXX, where XX is your group number.
Grading: We will evaluate your submitted files and call you for a viva! You will be evaluated
mainly on what you have understood, not on what you have submitted. Merely submitting the
assignment (and not appearing for the viva) does not entitle you to any marks. All members of
the group need to be present for the viva, and there will be differential marking.
Navneet Goyal