Likelihood Frequentist
• Introduction: Likelihood
• Likelihood describes how well a candidate distribution (or parameter
value) explains the data we have already observed, while probability
describes how to find the chance of an outcome given a fixed
distribution. In short: probability fixes the distribution and asks
about the data; likelihood fixes the data and asks about the
distribution.
• Maximum likelihood problems can also be solved, less efficiently, with a
more general optimization algorithm such as stochastic gradient descent.
• In fact, most machine learning models can be framed under the maximum
likelihood estimation framework, providing a useful and consistent way to
approach predictive modeling as an optimization problem.
• An important benefit of the maximum likelihood estimator in machine learning
is that as the size of the dataset increases, the quality of the estimator
continues to improve.
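A minimal sketch of this property (the true mean, noise scale, and sample sizes below are our illustrative assumptions): for a Gaussian, the MLE of the mean is simply the sample average, and its error shrinks as the dataset grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = 5.0

errors = []
for n in (10, 1_000, 100_000):
    sample = rng.normal(loc=true_mu, scale=2.0, size=n)
    mle_mu = sample.mean()  # the sample mean is the MLE of a Gaussian mean
    errors.append(abs(mle_mu - true_mu))
    print(n, errors[-1])  # error tends to shrink as n grows
```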
Fitting a Line using Likelihood
• Linear Regression as Maximum Likelihood
• We can frame the problem of fitting a machine
learning model as the problem of probability density
estimation.
• Specifically, the choice of model and model
parameters is referred to as a modeling hypothesis h,
and the problem involves finding h that best explains
the data X. We can, therefore, find the modeling
hypothesis that maximizes the likelihood function.
• maximize sum i to n log(P(xi ; h))
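The maximization above can be sketched as a grid search over candidate hypotheses h, scoring each by the summed log-likelihood and keeping the argmax (the Gaussian data, fixed sigma, and grid are our illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=500)  # observed x_1..x_n

def log_likelihood(data, mu, sigma=1.0):
    # sum over i of log(P(xi ; h)), for a Gaussian hypothesis h = (mu, sigma)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - (data - mu) ** 2 / (2.0 * sigma**2))

candidates = np.linspace(0.0, 6.0, 601)  # candidate hypotheses for mu
scores = [log_likelihood(data, mu) for mu in candidates]
best = candidates[int(np.argmax(scores))]
print(best)  # near the true mean of 3.0
```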
• Supervised learning can be framed as a conditional
probability problem of predicting the probability of
the output given the input:
• P(y | X)
• As such, we can define conditional maximum
likelihood estimation for supervised machine
learning as follows:
• maximize sum i to n log(P(yi|xi ; h))
• Now we can replace h with our linear regression
model.
• We can make some reasonable assumptions, such as the
observations in the dataset are independent and drawn
from the same probability distribution (i.i.d.), and that the
target variable (y) has statistical noise with a Gaussian
distribution, zero mean, and the same variance for all
examples.
• With these assumptions, we can frame the problem of
estimating y given X as estimating the mean value for y from
a Gaussian probability distribution given X.
• The analytical form of the Gaussian density is as follows:
• f(y) = (1 / sqrt(2 * pi * sigma^2)) * exp(-(y – mu)^2 / (2 * sigma^2))
• Where mu is the mean of the distribution and sigma^2 is
the variance (the standard deviation squared).
• We can use this function as our likelihood function, where mu is defined
as the prediction from the model with a given set of coefficients (Beta)
and sigma is a fixed constant.
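As a sketch, the density above transcribes directly into code (the function name is ours); in the regression setting, mu would be the model prediction h(x, Beta) and sigma a fixed constant.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Likelihood of observing y under a Gaussian with mean mu, std dev sigma."""
    return (1.0 / math.sqrt(2.0 * math.pi * sigma**2)) \
        * math.exp(-((y - mu) ** 2) / (2.0 * sigma**2))

print(gaussian_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, 1/sqrt(2*pi)
```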
• First, we can state the problem as the maximization of the product of the
probabilities for each example in the dataset:
• maximize product i to n (1 / sqrt(2 * pi * sigma^2)) * exp(-1/(2 * sigma^2)
* (yi – h(xi, Beta))^2)
• Where xi is a given example and Beta refers to the coefficients of the
linear regression model. We can transform this to a log-likelihood model
as follows:
• maximize sum i to n log (1 / sqrt(2 * pi * sigma^2)) – (1/(2 * sigma^2) * (yi
– h(xi, Beta))^2)
• Written out observation by observation, the linear regression model is:
y1 = β0 + β1x1 + ϵ1
y2 = β0 + β1x2 + ϵ2
⋮
yn = β0 + β1xn + ϵn
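A minimal sketch of fitting this model by maximum likelihood (the simulated data, learning rate, and sigma fixed at 1 are our assumptions): gradient ascent on the Gaussian log-likelihood in β0, β1, which with fixed sigma is equivalent to minimizing the sum of squared errors, so the result matches the least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
true_b0, true_b1 = 1.5, 0.8
# Simulate yi = β0 + β1*xi + ϵi with standard Gaussian noise
y = true_b0 + true_b1 * x + rng.normal(0.0, 1.0, size=n)

# Gradient ascent on the log-likelihood (equivalently, gradient
# descent on the sum of squared errors, since sigma is fixed)
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    resid = y - (b0 + b1 * x)
    b0 += lr * resid.mean()
    b1 += lr * (resid * x).mean()

# Closed-form least-squares solution for comparison
A = np.column_stack([np.ones_like(x), x])
ols_b0, ols_b1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(b0, b1, ols_b0, ols_b1)  # the two fits agree closely
```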