11, 12. Predictive Analysis

The document provides an overview of linear and multiple regression analysis, including key concepts such as correlation, assumptions, and statistical measures. It discusses the importance of regression analysis in understanding relationships between variables, predicting outcomes, and controlling for other variables. Additionally, it introduces Structural Equation Modeling (SEM) as a multivariate technique used in marketing research to analyze relationships between observed and latent variables.

Linear and Multiple Regression
Dr. Rajarshi Debnath
Marketing Area
FORE School of Management, New Delhi
Agenda

• Linear Regression
• Multiple Regression

Bivariate regression model: Ŷ = a + bX

Product Moment Correlation

• The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.
• It is an index used to determine whether a linear or straight-line relationship exists between X and Y.
• As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
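As a sketch of the computation (hypothetical data, Python/NumPy for illustration rather than the SPSS output discussed later), r is the covariance of X and Y divided by the product of their standard deviations:

```python
# Illustrative only: Pearson's r for two hypothetical metric variables.
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0])   # hypothetical X (e.g., ad spend)
y = np.array([3.0, 5.0, 6.0, 8.0, 10.0])  # hypothetical Y (e.g., sales)

# r = cov(X, Y) / (sd(X) * sd(Y)); np.corrcoef returns the correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))  # → 0.9922
```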
Nonmetric Correlation
• If the nonmetric variables are ordinal and numeric, Spearman's rho, ρs, and Kendall's tau, τ, are two measures of nonmetric correlation that can be used to examine the correlation between them.
• Both measures use rankings rather than the absolute values of the variables, and the basic concepts underlying them are quite similar. Both vary from −1.0 to +1.0.
• In the absence of ties, Spearman's ρs yields a closer approximation to the Pearson product moment correlation coefficient, ρ, than Kendall's τ. In these cases, the absolute magnitude of τ tends to be smaller than Pearson's ρ.
• When the data contain a large number of tied ranks, Kendall's τ seems more appropriate.
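For untied ranks, Spearman's ρs reduces to ρs = 1 − 6Σd²/(n(n² − 1)), where d is the difference between the two ranks of each observation. A minimal sketch with made-up data (Python, for illustration only):

```python
# Illustrative only: Spearman's rho for untied ranks via
# rho_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
import numpy as np

x = np.array([86, 97, 99, 100, 101, 103, 106, 110, 112, 113])  # hypothetical
y = np.array([0, 20, 28, 27, 50, 29, 7, 17, 6, 12])            # hypothetical

def ranks(a):
    # rank 1 = smallest value; assumes no ties
    order = a.argsort()
    r = np.empty_like(order)
    r[order] = np.arange(1, len(a) + 1)
    return r

d = ranks(x) - ranks(y)
n = len(x)
rho_s = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))
print(round(rho_s, 4))  # → -0.1758
```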
Regression Analysis
Regression analysis examines associative relationships between a
metric dependent variable and one or more independent variables
in the following ways:
• Determine whether the independent variables explain a significant variation in the dependent variable: does a relationship exist?
• Determine how much of the variation in the dependent variable can be explained by the independent variables: how strong is the relationship?
• Determine the structure or form of the relationship: what mathematical equation relates the independent and dependent variables?
• Predict the values of the dependent variable.
• Control for other independent variables when evaluating the contributions of a specific variable or set of variables.
• Note that regression analysis is concerned with the nature and degree of association between variables and does not imply or assume any causality.
Conducting Bivariate Regression Analysis
Plot the Scatter Diagram
• A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.

Conducting Bivariate Regression Analysis
Which Straight Line Is Best?
Assumptions
• The error term is normally distributed. For each fixed value of X, the
distribution of Y is normal.
• The means of all these normal distributions of Y, given X, lie on a
straight line with slope b.
• The mean of the error term is 0.
• The variance of the error term is constant. This variance does not
depend on the values assumed by X.
• The error terms are uncorrelated. In other words, the observations
have been drawn independently.
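Under these assumptions, the least-squares estimates for Ŷ = a + bX are b = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)² and a = Ȳ − bX̄. A minimal sketch with hypothetical data (Python, for illustration):

```python
# Illustrative only: fitting Y-hat = a + bX by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical outcome

# b = cov(X, Y) / var(X); a = mean(Y) - b * mean(X)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
y_hat = a + b * x
print(round(a, 3), round(b, 3))  # → 0.14 1.96
```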
Important Tables in Linear Regression
• Model (fit) summary
• ANOVA (choice based)
• Model Coefficients
• Residual Statistics (only if Durbin-Watson is implemented)
• Assumption checks
Model Summary
• r: correlation
• r²: explains % of variance
• Adjusted r²: presents a better estimate of the population value
• Standard error of estimate: standard deviation of expected values for the dependent variable
• Durbin-Watson: a test statistic used to detect the presence of autocorrelation
ANOVA
• Sum of Squares: for regression, the between-group sum of squares; for residuals, the within-group sum of squares.
• DF: for regression, the number of independent variables (1 in this case); for residuals, the number of subjects minus the number of independent variables minus one.
• Mean Square: sum of squares divided by degrees of freedom.
• F: mean square regression divided by mean square residual.
• Sig.: the likelihood that this result could occur by chance.
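The ANOVA quantities can be reproduced by hand for a bivariate regression. The sketch below (hypothetical data, Python for illustration) builds the regression and residual sums of squares, their degrees of freedom, mean squares, and F:

```python
# Illustrative only: ANOVA table quantities for a bivariate regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # hypothetical predictor
y = np.array([2.0, 4.5, 5.0, 7.5, 8.0, 10.5])     # hypothetical outcome

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_reg = ((y_hat - y.mean()) ** 2).sum()   # regression (between) sum of squares
ss_res = ((y - y_hat) ** 2).sum()          # residual (within) sum of squares
df_reg, df_res = 1, len(y) - 1 - 1         # k = 1 predictor; n - k - 1 residual df
ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
F = ms_reg / ms_res                        # mean square regression / mean square residual
```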


Coefficients
• B: coefficient and constant for the linear regression equation.
• Std. Error: standard error of B, a measure of the stability or sampling error of the B values; it is the standard deviation of the B values over a large number of samples drawn from the same population.
• Beta: standardized regression coefficients.
• t: B divided by the standard error of B.
• Sig.: the likelihood that this result could occur by chance.
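For bivariate regression, the standard error of b is √(MSE/Σ(X − X̄)²), and t is simply B divided by that standard error. A small sketch with made-up data (Python, for illustration):

```python
# Illustrative only: standard error of B and the t statistic, t = B / SE(B).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # hypothetical predictor
y = np.array([1.8, 4.1, 5.6, 7.9, 10.2, 12.1])     # hypothetical outcome

sxx = ((x - x.mean()) ** 2).sum()
b = ((x - x.mean()) * (y - y.mean())).sum() / sxx
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
mse = (resid ** 2).sum() / (len(y) - 2)   # residual mean square
se_b = np.sqrt(mse / sxx)                 # standard error of B
t = b / se_b
```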


Case Study:
Medical
Practitioner
Identify the relationship between:

• Job Satisfaction and Burnout


Multicollinearity
• Multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy.
• A variance inflation factor (VIF) > 5 signals problematic multicollinearity.
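The VIF for a predictor is 1/(1 − R²), where R² comes from regressing that predictor on the remaining predictors. A sketch with simulated data (Python; x2 is deliberately constructed to be nearly collinear with x1):

```python
# Illustrative only: computing VIF as 1 / (1 - R^2) from an auxiliary regression.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 * 0.95 + rng.normal(scale=0.3, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                          # independent predictor

def vif(target, others):
    # regress the target predictor on the other predictors (plus intercept)
    X = np.column_stack([np.ones(len(target))] + others)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ coef
    ss_res = ((target - pred) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

print(vif(x2, [x1, x3]))   # large: collinear, flagged under the VIF > 5 rule
print(vif(x3, [x1, x2]))   # near 1: no multicollinearity problem
```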
Linear regression makes several key assumptions:
• Assumption #0: Measurement of variables
• Assumption #1: Linear relationship
• Assumption #2: Multivariate normality
• Assumption #3: No or little multicollinearity
• Assumption #4: No autocorrelation
• Assumption #5: Homoscedasticity
Objectives of Regression Analysis
• Prediction
• Explanation: magnitude, sign, and significance of the coefficients
• Research design: a minimum variable-to-sample ratio of 1:10 (at least 10 cases per variable); variables must be metric
Multiple Regression (1 of 2)
The general form of the multiple regression model is as follows:

Y = β0+ β1X1+ β2X2+ β3X3+…+ βkXk+ e

which is estimated by the following equation:

Ŷ = a + b1X1+ b2X2+ b3X3+…+ bkXk

As before, the coefficient a represents the intercept, but the b's are
now the partial regression coefficients.
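A minimal sketch of the estimation (hypothetical data; NumPy least squares for illustration, rather than the SPSS procedure discussed later):

```python
# Illustrative only: estimating Y-hat = a + b1*X1 + b2*X2 by least squares.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical predictor 1
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])   # hypothetical predictor 2
# outcome built from known coefficients plus a little noise
y = 1.0 + 2.0 * x1 + 0.5 * x2 + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.0])

X = np.column_stack([np.ones(len(y)), x1, x2])  # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef  # intercept and the two partial regression coefficients
```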
Statistics Associated with Multiple Regression
(1 of 2)
• Adjusted R². R², the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much contribution.

• Coefficient of multiple determination. The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

• F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. This is equivalent to testing the null hypothesis that all the partial regression coefficients are zero: H0: β1 = β2 = … = βk = 0. The test statistic has an F distribution with k and (n − k − 1) degrees of freedom.
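Both adjusted R² and the overall F statistic follow directly from R², k, and n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and F = (R²/k)/((1 − R²)/(n − k − 1)). With assumed values:

```python
# Illustrative only: F statistic and adjusted R^2 from assumed summary values.
n, k, r2 = 50, 3, 0.60   # hypothetical sample size, predictors, and R^2

F = (r2 / k) / ((1 - r2) / (n - k - 1))          # overall F, df = (k, n - k - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted R^2
print(round(F, 2), round(adj_r2, 3))  # → 23.0 0.574
```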
Statistics Associated with Multiple Regression
(2 of 2)

• Partial F test. The significance of a partial regression coefficient, βi, of Xi may be tested using an incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares resulting from the addition of the independent variable Xi to the regression equation after all the other independent variables have been included.
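A sketch of the incremental F computation (simulated data, Python for illustration; the reduced model contains only X1, and the full model adds X2):

```python
# Illustrative only: incremental (partial) F test for adding X2 to a model
# that already contains X1.
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=n)  # simulated outcome

def fit_ss(X, y):
    # returns (regression SS, residual SS) for a least-squares fit
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef
    return ((y_hat - y.mean()) ** 2).sum(), ((y - y_hat) ** 2).sum()

ones = np.ones(n)
ssr_red, _ = fit_ss(np.column_stack([ones, x1]), y)           # reduced model
ssr_full, sse_full = fit_ss(np.column_stack([ones, x1, x2]), y)  # full model

k_full = 2
# increment in explained SS (1 added variable) over the full-model residual MS
F_inc = (ssr_full - ssr_red) / (sse_full / (n - k_full - 1))
```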

• Partial regression coefficient. The partial regression coefficient, b1, denotes the change in the predicted value, Ŷ, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.
SPSS Windows
The CORRELATE program computes Pearson product moment correlations
and partial correlations with significance levels. Univariate statistics,
covariance, and cross-product deviations may also be requested. Significance
levels are included in the output. To select these procedures using SPSS for
Windows, click:
Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …
Scatterplots can be obtained by clicking:
Graphs>Scatter >Simple>Define …
REGRESSION calculates bivariate and multiple regression equations,
associated statistics, and plots. It allows for an easy examination of residuals.
This procedure can be run by clicking:
Analyze>Regression>Linear …
Structural Equation
Modelling (SEM)
Introduction to SEM
• SEM is a multivariate statistical technique used to analyze relationships between observed and latent variables.
• It combines factor analysis and regression modeling.
• It is widely used in marketing research for understanding consumer behavior, brand loyalty, etc.
Why SEM in Marketing Research?
• Helps in testing theoretical models.

• Simultaneously examines multiple relationships.

• Accounts for measurement errors.

• Provides deeper insights than traditional regression analysis.


Key Components of SEM
• Observed Variables: Measurable items (survey responses, sales data,
etc.).
• Latent Variables: Unobserved constructs (brand trust, satisfaction,
etc.).
• Path Diagrams: Visual representation of relationships.
• Structural Model vs. Measurement Model: How variables relate vs.
how constructs are measured.
Assumptions of SEM
• Sample Size: Minimum 200-300 recommended.
• Multivariate Normality: Data should be normally distributed.
• No Multicollinearity: High correlation between variables should be
avoided.
• Model Identification: Degrees of freedom should be positive.
• Linearity: Relationships between variables should be linear.
• Measurement Invariance: Measurement scales should be consistent
across groups.
Steps in SEM
1. Model Specification: Define theoretical relationships.
2. Model Identification: Check if enough data is available.
3. Model Estimation: Use Maximum Likelihood (ML) estimation.
4. Model Evaluation: Assess model fit (CFI, RMSEA, Chi-square, etc.).
5. Model Modification: Improve the model by adjusting paths, and
adding/removing variables.
Model Fit Indices
• Chi-Square (χ²): lower is better, but sensitive to sample size.

• CFI (Comparative Fit Index): > 0.90 is acceptable.

• RMSEA (Root Mean Square Error of Approximation): < 0.08 indicates good fit.

• SRMR (Standardized Root Mean Square Residual): < 0.08 is recommended.
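RMSEA can be computed from the model chi-square, its degrees of freedom, and the sample size n as √(max((χ² − df)/(df(n − 1)), 0)). A sketch with assumed fit values (the numbers are hypothetical, not from any model in these slides):

```python
# Illustrative only: RMSEA from assumed SEM fit output.
import math

chi2, df, n = 85.3, 40, 250   # hypothetical chi-square, df, and sample size

# RMSEA = sqrt(max((chi2 - df) / (df * (n - 1)), 0))
rmsea = math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))
print(round(rmsea, 3))  # → 0.067, under the < 0.08 rule of thumb for good fit
```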
Introduction to Jamovi for SEM
• Open-source statistical software for SEM.
• User-friendly interface.
• No coding required.
Demonstration – Running SEM in Jamovi
1. Import dataset.
2. Define observed and latent variables.
3. Specify model using path diagrams.
4. Run analysis and interpret model fit indices.
5. Check significance of paths and modify model if necessary.
Interpreting SEM Results
• Look at path coefficients (significance & direction).
• Assess model fit indices.
• Modify the model if necessary.
• Report findings with theoretical and managerial implications.
Thank you!