FRM Part 1
Book 2 – Quantitative Analysis
REGRESSION WITH MULTIPLE
EXPLANATORY VARIABLES
Learning Objectives
After completing this reading you should be able to:
Distinguish between the relative assumptions of single and
multiple regression.
Interpret regression coefficients in a multiple regression.
Interpret goodness of fit measures for single and multiple
regressions, including R² and adjusted R².
Construct, apply and interpret joint hypothesis tests and
confidence intervals for multiple coefficients in a regression.
Omitted Variable Bias
Omitted variable bias is said to occur when:
1. the omitted variable is correlated with at least one of the
independent variables in the model; and
2. the omitted variable is a determinant of the dependent variable.
When a relevant variable is excluded,
o assumptions of linear regression are violated;
o we underspecify the model;
o OLS estimates are biased and inconsistent.
Example:
o When regressing GDP growth against determinants such as
interest rate, inflation, and exchange rate, leaving out one or
more of these would result in biased estimates as long as the
variable left out is a determinant of the dependent variable.
Addressing Omitted Variable Bias
To find out if omitted variable bias is present in a statistical model,
specification tests such as the Ramsey RESET test are conducted.
If a bias is found, it can be addressed by dividing data into groups
and examining one factor at a time while holding other factors
constant.
Multiple regression analysis helps eliminate the omitted variable
problem by incorporating multiple independent variables in the
model where the significance of each variable can be tested.
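To see the two conditions in action, here is a minimal NumPy simulation sketch (illustrative only, not from the curriculum; the variable names and coefficient values are hypothetical). Omitting a regressor that is both correlated with x1 and a determinant of y pushes the estimated slope on x1 away from its true value:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# x2 is correlated with x1 (condition 1) and helps determine y (condition 2),
# so leaving x2 out of the model biases the estimated slope on x1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Correctly specified model: regress y on [1, x1, x2].
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Underspecified model: omit x2.
X_short = np.column_stack([np.ones(n), x1])
beta_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print("slope on x1, full model:", beta_full[1])   # close to the true 2.0
print("slope on x1, x2 omitted:", beta_short[1])  # close to 2.0 + 3.0 * 0.8 = 4.4
```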
Single Vs. Multiple Regression
Simple regression considers a single explanatory (independent)
variable and response (dependent) variable:
yᵢ = β₀ + β₁xᵢ + εᵢ
Multiple regression simultaneously considers the influence of
multiple explanatory variables on a response variable Y
yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + ⋯ + βₖxₖᵢ + εᵢ
Multiple Linear Regression
yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + ⋯ + βₖxₖᵢ + εᵢ,  i = 1, 2, …, n
A slope coefficient, βⱼ, measures how much the dependent variable,
Y, changes when the independent variable, Xⱼ, changes by one
unit, holding all other independent variables constant.
The intercept term and the slope coefficients are determined by
minimizing the sum of the squared error terms.
o In practice, software programs are used to estimate the multiple
regression model.
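As a sketch of what that looks like in practice, the following uses Python's statsmodels package on simulated data (the data and coefficient values here are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

# Simulated regressors and response, for illustration only.
X = rng.normal(size=(n, 3))
y = 0.5 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(size=n)

# statsmodels requires the intercept column to be added explicitly.
results = sm.OLS(y, sm.add_constant(X)).fit()

print(results.params)        # estimated intercept and slope coefficients
print(results.rsquared)      # R-squared
print(results.rsquared_adj)  # adjusted R-squared
print(results.summary())     # full table: t-stats, F-stat, confidence intervals
```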
OLS Estimators in Multiple
Regression
yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + ⋯ + βₖxₖᵢ + εᵢ,  i = 1, 2, …, n
The ordinary least squares (OLS) method estimates the coefficients
β̂₀, β̂₁, …, β̂ₖ by minimizing the sum of squared error terms, Σᵢ₌₁ⁿ ε̂ᵢ².
The fitted regression is:
ŷᵢ = β̂₀ + β̂₁x₁ᵢ + β̂₂x₂ᵢ + ⋯ + β̂ₖxₖᵢ,  i = 1, 2, …, n
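Under the assumptions listed on the next slide, minimizing the sum of squared errors has the well-known closed-form solution β̂ = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def ols_estimates(X, y):
    """Closed-form OLS coefficients that minimize the sum of squared errors.

    X: (n, k) array of regressors without an intercept column.
    Returns [b0, b1, ..., bk]. Assumes no exact linear relation
    among the columns of X, so X'X is invertible.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    # Solve the normal equations (X'X) beta = X'y.
    return np.linalg.solve(X1.T @ X1, X1.T @ y)
```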
Assumptions of Multiple
Regression
1. The relationship between the dependent variable, Y, and the
independent variables, X1, X2, . . . , Xk, is linear.
2. The independent variables (X1, X2, . . . , Xk) are not random.
3. No exact linear relation exists between two or more of the
independent variables.
4. The expected value of the error term, conditioned on the
independent variables, is 0: E(ε | X₁, X₂, …, Xₖ) = 0.
5. The variance of the error term is the same for all observations.
6. The error term is uncorrelated across observations.
7. The error term is normally distributed.
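Assumption 3 is the one most easily checked mechanically: if one regressor is an exact linear combination of others, XᵀX is singular and OLS has no unique solution. A small sketch of such a check (assuming NumPy; the helper name is mine):

```python
import numpy as np

def has_exact_collinearity(X):
    """True if an exact linear relation exists among the regressors
    (including the intercept), violating assumption 3."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.matrix_rank(X1) < X1.shape[1]

X = np.random.default_rng(1).normal(size=(50, 2))
X_bad = np.column_stack([X, X[:, 0] + X[:, 1]])  # third column is an exact combination

print(has_exact_collinearity(X))      # False
print(has_exact_collinearity(X_bad))  # True
```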
Multiple Regression Model
An Example
An economist tests the hypothesis that GDP growth in a certain
country can be explained by interest rates and inflation.
Using some 30 observations, the analyst formulates the following
regression equation:
GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)
Regression estimates are as follows:
                 Coefficient   Standard error
Intercept            0.03          0.005
Interest rates       0.20          0.05
Inflation            0.15          0.03
How do we interpret these results?
Multiple Regression Model
An Example
GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)
                 Coefficient   Standard error
Intercept            0.03          0.005
Interest rates       0.20          0.05
Inflation            0.15          0.03
Interpretation
Intercept term: If the interest rate is zero and inflation is zero, we would
expect the GDP growth rate to be 3%.
Interest rate coefficient: If the interest rate increases by one percentage point,
we would expect the GDP growth rate to increase by 0.20 percentage points,
holding inflation constant.
Inflation rate coefficient: If the inflation rate increases by one percentage point,
we would expect the GDP growth rate to increase by 0.15 percentage points,
holding the interest rate constant.
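As a quick numerical check (the input values here are hypothetical): with an interest rate of 4% (0.04) and inflation of 2% (0.02), the fitted model predicts
GDP growth = 0.03 + 0.20(0.04) + 0.15(0.02) = 0.041,
i.e., expected growth of 4.1%.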
R²
To determine how accurately the OLS regression line fits the data,
we apply the coefficient of determination and the regression's
standard error.
The coefficient of determination, represented by R², is a measure
of the "goodness of fit" of the regression.
o It is interpreted as the percentage of variation in the dependent
variable explained by the independent variable(s).
R²
R² can be expressed mathematically as the ratio of the explained
sum of squares (ESS) to the total sum of squares (TSS).
For ESS, the squared deviations of the predicted values of Yᵢ,
denoted Ŷᵢ, from their average are summed:
ESS = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²
The sum of squared deviations of 𝐘𝐢 from its average is referred to
as the total sum of squares.
TSS = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²
Therefore:
R² = ESS / TSS
R²
[Figure: decomposition of variation around the fitted line ŷᵢ = β̂₀ + β̂₁xᵢ, showing the total sum of squares Σ(yᵢ − ȳ)², the explained sum of squares Σ(ŷᵢ − ȳ)², and the unexplained/residual sum of squares Σ(yᵢ − ŷᵢ)².]
R²
It is important to note that:
TSS = ESS + SSR
Where SSR is the unexplained (residual) sum of squares.
Therefore:
R² = 1 − SSR / TSS
The square of the correlation coefficient between Y and X gives
the R² of the regression of Y on the single regressor X.
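A short NumPy sketch (simulated data; all values hypothetical) confirming the decomposition TSS = ESS + SSR, the equivalence of the two R² formulas, and the link between R² and the correlation coefficient for a single regressor:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Fit the simple regression y = b0 + b1 * x by least squares.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)

print(np.isclose(TSS, ESS + SSR))            # TSS = ESS + SSR
print(np.isclose(ESS / TSS, 1 - SSR / TSS))  # the two R-squared formulas agree
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(ESS / TSS, r ** 2))         # R-squared equals r^2 with one regressor
```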
Coefficient of Correlation, r
It measures the strength of the linear association between the
independent variable and the dependent variable; in a simple
regression, its absolute value is the square root of R².
r ranges from −1.0 (perfect negative correlation) through 0 (no linear
correlation) to +1.0 (perfect positive correlation), with intermediate
values indicating increasing degrees of negative or positive correlation.
R²
Equivalently, R² can be written in terms of total and unexplained variation:
R² = (Total variation − Unexplained variation) / Total variation
o In a multiple regression, R² is the percentage of variation in the
dependent variable explained by the set of independent variables.
Adjusted R²
However, R² is not a reliable indicator of the explanatory power of
a multiple regression model.
o Why? R² almost always increases as new independent
variables are added to the model, even if the marginal
contribution of the new variable is not statistically significant.
Thus, a high R² may reflect the impact of a large set of independent
variables rather than how well the set explains the dependent variable.
o This problem is solved by using the adjusted R².
Adjusted R²
Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)
n = number of observations
k = number of independent variables
While adding a new independent variable never lowers R², it may
either increase or decrease the adjusted R².
Adjusted R²
Example (part 1)
An analyst runs a regression of monthly value-stock returns on four
independent variables over 48 months.
o The total sum of squares for the regression is 360; and
o The sum of squared errors is 120.
Calculate the R² and adjusted R².
Solution
R² = (360 − 120) / 360 = 66.7%

Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)

Adjusted R² = 1 − [(48 − 1) / (48 − 4 − 1)] × (1 − 0.667) = 63.6%
Adjusted R²
Example (part 2)
The analyst now adds four more independent variables to the
regression.
The new R² increases to 69%.
Which model would the analyst most likely prefer?
Solution
New R² = 69%
New adjusted R² = 1 − [(48 − 1) / (48 − 8 − 1)] × (1 − 0.69) = 62.6%
The analyst would prefer the first model because it has a higher
adjusted 𝐑𝟐 and the model has four independent variables as
opposed to eight.
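Both parts of the example can be reproduced with a few lines of Python (a sketch; the helper name is mine):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared = 1 - (n - 1) / (n - k - 1) * (1 - R-squared)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Part 1: four regressors, R-squared = (360 - 120) / 360
print(adjusted_r2(240 / 360, n=48, k=4))  # about 0.636

# Part 2: eight regressors, R-squared = 0.69
print(adjusted_r2(0.69, n=48, k=8))       # about 0.626, so model 1 is preferred
```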
Hypothesis Test for a Single
Coefficient
We may want to test the significance of a single regression coefficient in a
multiple regression.
As in the simple case, such a test makes use of the t-statistic:
t-statistic = (β̂ⱼ − βⱼ,H₀) / SE(β̂ⱼ)

where β̂ⱼ is the estimated regression coefficient, βⱼ,H₀ is the value of
the estimate under H₀, and SE(β̂ⱼ) is the standard error of the
estimated coefficient.
The t-statistic has n − k − 1 degrees of freedom, where k = number of
independent variables.
Hypothesis Test for a Single
Coefficient
Example
An economist tests the hypothesis that GDP growth in a certain
country can be explained by interest rates and inflation.
Using some 30 observations, the analyst formulates the following
regression equation:
GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)
Regression estimates are as follows:
                 Coefficient   Standard error
Intercept            0.10          0.005
Interest rates       0.20          0.05
Inflation            0.15          0.03
Is the coefficient for interest rates significant at 5%?
Hypothesis Test for a Single
Coefficient
Solution
                 Coefficient   Standard error
Intercept            0.10          0.005
Interest rates       0.20          0.05
Inflation            0.15          0.03
We have:
o GDP growth = 0.10 + 0.20(Interest) + 0.15(Inflation)
Hypothesis: H₀: β₁ = 0 vs H₁: β₁ ≠ 0
o The test statistic = (0.20 − 0) / 0.05 = 4
o The critical value = t(α/2, n − k − 1) = t(0.025, 27) = 2.052
Decision: Since test statistic > t-critical, we reject 𝐻0 .
Conclusion: Interest rate coefficient is significant at the 5% level.
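The same test in Python, using SciPy for the critical value (a sketch of the calculation above):

```python
from scipy import stats

beta_hat, beta_h0, se = 0.20, 0.0, 0.05
n, k = 30, 2
df = n - k - 1                          # 27 degrees of freedom

t_stat = (beta_hat - beta_h0) / se      # 4.0
t_crit = stats.t.ppf(1 - 0.05 / 2, df)  # about 2.052 (two-tailed, 5% level)
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, t_crit, p_value)
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```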
Confidence Intervals for a Single
Coefficient
The confidence interval for a regression coefficient in multiple
regression is calculated and interpreted the same way as it is in
simple linear regression.
CI = β̂ⱼ ± tc × SE(β̂ⱼ)

where β̂ⱼ is the estimated regression coefficient, tc is the critical
t-value, and SE(β̂ⱼ) is the standard error of the regression coefficient.
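Applied to the interest-rate coefficient from the preceding example (β̂₁ = 0.20, SE = 0.05, and tc = 2.052, the 5% two-tailed critical value with 27 degrees of freedom):
CI = 0.20 ± 2.052 × 0.05 = [0.097, 0.303]
Because the interval does not contain zero, it agrees with the earlier rejection of H₀ at the 5% level.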
Joint Hypothesis Tests
In a multiple regression, we cannot test the null hypothesis that all
slope coefficients equal 0 based on t-tests that each individual slope
coefficient equals 0.
o Why? Individual tests do not account for the effects
of interactions among the independent variables.
For this reason, we conduct the F-test.
o The F-test tests the null hypothesis that all of the slope
coefficients in the multiple regression model are jointly equal to
0.
H₀: β₁ = β₂ = β₃ = ⋯ = βₖ = 0
vs
H₁: At least one βⱼ ≠ 0
Joint Hypothesis Tests
The F-test is always a one-tailed test. The F-statistic is calculated as:

F(k, n − k − 1) = (ESS / k) / (SSR / (n − k − 1))
Where
o n = number of observations
o k = number of independent variables
o ESS = explained sum of squares
o SSR = sum of squared residuals
Joint Hypothesis Tests
To determine whether at least one of the coefficients is statistically
significant, the calculated F-statistic is compared with the one-tailed critical
F-value, at the appropriate level of significance.
Decision rule: Reject 𝑯𝟎 if F (test-statistic) > 𝐅𝐜 (critical value)
Rejection of the null hypothesis at a stated level of significance indicates
that at least one of the coefficients is significantly different from zero, i.e., at
least one of the independent variables in the regression model makes a
significant contribution to explaining the dependent variable.
Joint Hypothesis Tests
Example
An analyst runs a regression of monthly value-stock returns on four independent
variables over 48 months.
The total sum of squares for the regression is 360, and the sum of squared
errors is 120.
Test the null hypothesis at the 5% significance level (95% confidence) that all
the four independent variables are equal to zero.
Solution
H₀: β₁ = β₂ = β₃ = β₄ = 0 vs H₁: At least one βⱼ ≠ 0
ESS = TSS – SSR = 360 – 120 = 240
The calculated test statistic = (ESS / k) / (SSR / (n − k − 1))
o = (240 / 4) / (120 / 43) = 21.5
The critical value F(4, 43) is approximately 2.59 at the 5% significance level.
Decision: Reject 𝐻0 .
Conclusion: At least one of the four independent variables is significantly
different from zero.
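The same F-test in Python, with SciPy supplying the critical value (a sketch of the calculation above):

```python
from scipy import stats

TSS, SSR = 360.0, 120.0
n, k = 48, 4
ESS = TSS - SSR                                   # 240

f_stat = (ESS / k) / (SSR / (n - k - 1))          # 21.5
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # about 2.59

print(f_stat, f_crit)
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```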
Book 2 – Quantitative Analysis
REGRESSION WITH MULTIPLE
EXPLANATORY VARIABLES
Learning Objectives Recap
Distinguish between the relative assumptions of single and multiple regression.
Interpret regression coefficients in a multiple regression.
Interpret goodness of fit measures for single and multiple regressions, including
R² and adjusted R².
Construct, apply and interpret joint hypothesis tests and confidence intervals for
multiple coefficients in a regression.