UNIT 2
UNIT 2
Regression modelling" refers to a statistical technique used to analyse the relationship between a
dependent variable and one or more independent variables, while "multivariate analysis" involves
examining relationships between multiple dependent variables simultaneously, and "Bayesian
modelling" is a statistical approach that incorporates prior knowledge into the analysis by utilizing
probability distributions to estimate parameters, allowing for more robust inferences, especially
when dealing with complex data structures; when combined, these terms essentially describe a
sophisticated statistical method where multiple dependent variables are analysed with a Bayesian
framework, taking into account prior information about the relationships between variables.
1. Predictive Modelling:
Regression analysis is commonly used for predictive modelling, which helps businesses forecast
future outcomes. By examining historical data and identifying relationships between variables,
businesses can make informed predictions about sales, demand, customer behaviour, and other
critical factors. This can assist in inventory management, resource allocation, and strategic planning.
In business, understanding the factors that drive specific outcomes is essential. Regression analysis
can help identify which independent variables significantly impact the dependent variable. For
example, it can determine which marketing channels or advertising strategies influence sales most,
allowing businesses to allocate resources more effectively.
3. Optimizing Decision-Making
Regression analysis provides insights that enable businesses to make data-driven decisions. Whether
it's optimizing pricing strategies, production processes, or marketing campaigns, regression can help
companies allocate resources efficiently and achieve better outcomes.
4. Risk Assessment
Businesses are exposed to various risks, such as economic fluctuations, market changes, and
competitive pressures. Regression analysis-powered risk assessment techniques can be used to
assess how changes in independent variables may affect business performance. This allows for risk
mitigation strategies to be developed, helping companies prepare for potential challenges.
5. Performance Evaluation
Regression analysis can evaluate the effectiveness of different initiatives and strategies. For instance,
it can assess the impact of employee training on productivity or the relationship between customer
satisfaction and repeat purchases. This information is invaluable for making improvements and
optimizing operations.
6. Market Research
In market research, regression analysis can be used to understand consumer behavior and
preferences. By examining demographics, pricing, and product features, businesses can tailor their
products and marketing efforts to specific target audiences.
Where,
2. Multiple regression formula: Multiple regression extends linear regression by considering multiple
independent variables to predict the dependent variable. The relationship is represented as Y = a +
b₁X₁ + b₂X₂ + ... + bₙXₙ
Where,
a is the intercept.
Suppose we want to understand the relationship between a company's stock price (dependent
variable) and the company's quarterly earnings (independent variable). For several quarters, we
collect historical data on the company's earnings and stock prices. And by performing simple linear
regression, we can identify the linear relationship between earnings and stock prices, if any.
In real estate, we can predict the selling price of a house based on various factors such as area,
number of bedrooms, number of floors, and location. This is where multiple linear regression comes
into play.
Logistic regression is often used in healthcare to estimate binary outcomes, like whether a patient
will develop a particular disease. For example, we could use logistic regression to predict the
likelihood of a patient having diabetes based on factors like age, BMI, family history, and blood sugar
levels.
In biology, nonlinear regression is often used to model complex biological processes. For example, we
might want to understand the growth of a population of bacteria over time. The relationship
between time and population growth may not be linear, so a nonlinear regression model can be used
to capture the growth curve accurately.
MULTIVATRIATE ANALYSIS:
Multivariate analysis refers to a set of statistical techniques used to analyse data that involves
multiple variables. It helps understand relationships, patterns, and influences among the variables.
There are many different techniques for multivariate analysis, and they can be divided into two
categories:
Dependence techniques:
Dependence methods are used when one or some of the variables are dependent on others.
Dependence looks at cause and effect .ie These methods help in prediction, classification, and
understanding causal relationships.
Predicts the value of the dependent variable based on multiple independent variables.
Example: Predicting house prices based on size, location, and number of rooms.
2. Logistic Regression
Example: Predicting whether a customer will buy a product (yes/no) based on age, income,
and past purchases.
Example: Studying the effect of teaching methods on student performance across different
subjects.
Cluster Analysis – Groups similar observations into clusters based on their characteristics.
Correspondence Analysis – Used for categorical data to find relationships between rows and
columns of a contingency table.
Benefits:
Can improve predictive power
Bayesian Modelling:
Bayesian Model Selection is an essential statistical method used in the selection of models for data
analysis. Rooted in Bayesian statistics, this approach evaluates a set of statistical models to identify
the one that best fits the data according to Bayesian principles. The approach is characterized by its
use of probability distributions rather than point estimates, providing a robust framework for dealing
with uncertainty in model selection.
1. Bayes' Theorem
Where:
o P(H∣D) P(H|D) P(H∣D) = Posterior Probability (updated belief after seeing data)
We want to classify whether an email is spam or not spam using Bayesian inference.
Step 2: Choose a Prior: From past emails, we know that 40% of emails are spam:
P(Spam)=0.4 , P(Not Spam)=0.6P(Spam) = 0.4
The word "free" appears in 10% of non-spam emails → P(Free∣Not Spam)=0.1P(Free | Not\
Spam) = 0.1P(Free∣Not Spam)=0.1
Time series analysis is a statistical method that analyzes data collected over time to identify patterns
and trends. It can help you understand the structure of the data and predict future events.
How it works
5. Use the model to predict future values and/or impute missing values
Weather records, Economic indicators, Patient health evolution metrics, Server metrics, Application
performance monitoring, Network data, Sensor data, Events, Clicks, and Retail sales.
Time series analysis models Moving average, Exponential smoothing, Seasonal autoregressive
integrated moving average (SARIMA), and ARIMA.
1. Predict Future Trends: Time series analysis enables the prediction of future trends, allowing
businesses to anticipate market demand, stock prices, and other key variables, facilitating
proactive decision-making.
2. Detect Patterns and Anomalies: By examining sequential data points, time series analysis
helps detect recurring patterns and anomalies, providing insights into underlying behaviors
and potential outliers.
3. Risk Mitigation: By spotting potential risks, businesses can develop strategies to mitigate
them, enhancing overall risk management.
4. Strategic Planning: Time series insights inform long-term strategic planning, guiding
decision-making across finance, healthcare, and other sectors.
5. Competitive Edge: Time series analysis enables businesses to optimize resource allocation
effectively, whether it's inventory, workforce, or financial assets.