ML Assignment 1
1. Introduction
Bayes linear regression, a statistical modelling technique, provides a
probabilistic approach to modelling relationships between variables. Unlike
traditional linear regression, which treats parameters as fixed values, Bayes
linear regression views them as random variables, allowing for a more nuanced
understanding of uncertainty and enabling more robust inferences.
2. Motivation
The primary motivation behind Bayes linear regression is to address the
limitations of traditional linear regression. By incorporating prior information
about the model parameters, Bayes linear regression can quantify the
uncertainty in its estimates and predictions, regularize the model to reduce
overfitting on small datasets, and incorporate domain knowledge through the
choice of prior distributions.
3. Methodology
The goal of the Bayesian regression model is to identify the posterior
distribution of the model parameters rather than point estimates of the
parameters themselves. The parameters, like the output y, are treated as
random variables with distributions. The likelihood is
p(y | X, w, α) = N(y | Xw, α)
where α, the noise precision, is a hyper-parameter with a Gamma prior that
is treated as a random variable to be estimated from the data. A Bayesian
Ridge Regression implementation is provided below.
The prior on which Bayesian Ridge Regression is based is a spherical
Gaussian over the weights:
p(w | λ) = N(w | 0, λ^-1 I_p)
where λ is the precision of the weights. Both α and λ are themselves given
Gamma priors (the conjugate prior for the precision of a Gaussian) and are
estimated from the data.
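With α and λ held fixed, the posterior over w implied by the likelihood and prior above is Gaussian in closed form, with covariance Σ = (λI + αXᵀX)⁻¹ and mean m = αΣXᵀy. The following is a minimal NumPy sketch of that computation; the design matrix, targets, and the fixed values of α and λ are all assumed for illustration (scikit-learn's BayesianRidge instead estimates both hyper-parameters from the data):

```python
import numpy as np

# Assumed toy design matrix (bias column + one feature) and targets
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5],
              [1.0, 3.5]])
y = np.array([1.1, 2.0, 3.1, 3.9])

alpha = 2.0  # noise precision, assumed fixed here
lam = 1.0    # weight precision of the N(w | 0, lam^-1 I) prior

# Posterior over w: Sigma = (lam*I + alpha*X^T X)^-1, m = alpha * Sigma X^T y
Sigma = np.linalg.inv(lam * np.eye(X.shape[1]) + alpha * X.T @ X)
m = alpha * Sigma @ X.T @ y

print(m)      # posterior mean of the weights
print(Sigma)  # posterior covariance of the weights
```

Because the prior shrinks the weights toward zero, the posterior mean sits slightly below the ordinary least-squares fit.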
4. Example problem
Predict the closing price of a stock based on its opening price using the
model
Closing Price = β₀ + β₁ · Open Price + ε
where
β₀ and β₁ are the parameters,
ε is the error term,
open price is the independent variable, and
closing price is the dependent variable.
Assuming the priors on the parameters are normally distributed, Bayesian
linear regression can be applied.
The likelihood for each data point is:
P (Closing Price | β₀, β₁, σ²) = N (Closing Price | β₀ + β₁ · Open Price, σ²)
where σ² is the variance of the error term.
Assuming σ² = 10, we can calculate the likelihood for each data point as
follows:
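As a concrete illustration, the Gaussian likelihood above can be evaluated for a single data point; the prices and parameter values below are hypothetical:

```python
import numpy as np

sigma2 = 10.0  # assumed noise variance, as in the text

def likelihood(closing, opening, b0, b1):
    # Gaussian density N(closing | b0 + b1*opening, sigma2)
    mu = b0 + b1 * opening
    return np.exp(-(closing - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Hypothetical data point and parameter values
L = likelihood(closing=111.0, opening=110.0, b0=2.0, b1=1.0)
print(L)  # density of observing a closing price of 111
```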
Samples of β₀ and β₁ can now be obtained using MCMC methods, which draw
samples from the posterior distribution.
Given an opening price of 112, the closing price implied by each posterior
sample can be written as β₀ + β₁ · 112.
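The MCMC step can be sketched from scratch with a simple random-walk Metropolis sampler. The data, the flat priors, and the proposal scales below are all assumed for illustration; the opening prices are centred so the intercept and slope mix well:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): opening and closing prices
x = np.array([100., 105., 110., 115., 120.])
y = np.array([102., 107., 111., 118., 121.])
xc = x - x.mean()  # centring decorrelates intercept and slope
sigma2 = 10.0      # assumed noise variance, as in the text

def log_post(a, b):
    # Flat priors, so the log-posterior is just the Gaussian log-likelihood
    resid = y - (a + b * xc)
    return -0.5 * np.sum(resid ** 2) / sigma2

# Random-walk Metropolis sampler
samples = []
a, b = y.mean(), 0.0
lp = log_post(a, b)
for _ in range(5000):
    a_new, b_new = a + rng.normal(0, 0.5), b + rng.normal(0, 0.1)
    lp_new = log_post(a_new, b_new)
    if np.log(rng.uniform()) < lp_new - lp:  # accept with prob min(1, ratio)
        a, b, lp = a_new, b_new, lp_new
    samples.append((a, b))
samples = np.array(samples)[1000:]  # discard burn-in

# Predicted closing price for an opening price of 112
pred = samples[:, 0] + samples[:, 1] * (112.0 - x.mean())
print(pred.mean(), pred.std())
```

Each retained sample yields one plausible closing price, so the spread of `pred` directly expresses predictive uncertainty.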
5. Python code.
import numpy as np
import pymc3 as pm

# Toy data (assumed for illustration): opening and closing prices
X = np.array([100., 105., 110., 115., 120.])
y = np.array([102., 107., 111., 118., 121.])

with pm.Model():
    # Priors over the regression parameters
    intercept = pm.Normal('intercept', mu=0, sigma=20)
    coef = pm.Normal('coef', mu=0, sigma=10)
    sigma = pm.HalfCauchy('sigma', beta=1)
    # Likelihood
    mu = intercept + coef * X
    likelihood = pm.Normal('likelihood', mu=mu, sigma=sigma, observed=y)
    # Sampling
    trace = pm.sample(draws=1000, tune=1000)

# Posterior predictive closing prices for an opening price of 112
intercept_samples = trace['intercept']
coef_samples = trace['coef']
predicted_prices = intercept_samples + coef_samples * 112.0
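As an alternative to MCMC sampling, scikit-learn's BayesianRidge fits the model from Section 3 by estimating α and λ from the data rather than sampling; the prices below are assumed toy values:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Assumed toy opening/closing prices
X = np.array([100., 105., 110., 115., 120.]).reshape(-1, 1)
y = np.array([102., 107., 111., 118., 121.])

model = BayesianRidge()
model.fit(X, y)

# Predictive mean and standard deviation for an opening price of 112
mean, std = model.predict(np.array([[112.0]]), return_std=True)
print(mean[0], std[0])
```

`return_std=True` exposes the predictive standard deviation, giving the same kind of uncertainty estimate as the posterior samples above without running a sampler.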
6. Merits and Demerits
Merits
1. Handles Uncertainty Well:
- Bayesian linear regression gives you a range of possible values for the model
parameters, not just a single estimate. This helps you understand how confident
you can be about the predictions.
2. Built-in Regularization:
- By using prior information, Bayesian methods automatically control for
overfitting (where the model fits the training data too closely). This can be
especially helpful when you have a small dataset.
3. Flexible with Prior Knowledge:
- You can include your own expertise or previous knowledge about the
parameters in the form of priors. This can improve the model if you have useful
information before seeing the data.
4. Probabilistic Predictions:
- Instead of giving a single prediction, Bayesian linear regression provides a
range of possible outcomes, which helps in understanding and preparing for
different scenarios.
5. Natural Model Averaging:
- It naturally combines different possible models based on their probabilities,
which can often lead to better predictions than using just one model.
Demerits
1. Computationally Intensive:
- Bayesian methods can be slow and require a lot of computing power,
especially with large datasets or complex models. This is because they often use
advanced techniques like Markov Chain Monte Carlo (MCMC) for calculations.
2. Choosing Priors Can Be Hard:
- The results can depend heavily on the choice of prior distributions. Finding
the right priors can be tricky and may need expert knowledge.
3. Scalability Problems:
- Bayesian methods might not scale well with very large datasets. While there
are approximate methods to make it more feasible, they might not be as
accurate.
4. Complex to Implement:
- The process can be more complicated than traditional linear regression,
which might be a barrier if you're looking for something straightforward.
5. Interpreting Results Can Be Tough:
- Understanding the probabilistic outputs and communicating them effectively
can be more challenging compared to straightforward estimates from
traditional linear regression.
7. Conclusion
Bayes linear regression offers a robust and flexible approach to modeling
relationships between variables by incorporating Bayesian principles. Its ability
to manage uncertainty and integrate prior knowledge makes it a valuable tool in
various fields, from finance to environmental science. While it presents
challenges such as computational complexity and sensitivity to prior choices,
the benefits of comprehensive uncertainty quantification and adaptability to
new data make Bayes linear regression a powerful technique in modern
statistical and machine learning practices. As data science continues to
advance, Bayes linear regression will remain a key method for tackling complex
modeling challenges and enhancing decision-making processes.