
FRM Part 1

Book 2 – Quantitative Analysis

REGRESSION WITH MULTIPLE EXPLANATORY VARIABLES
Learning Objectives

After completing this reading you should be able to:
• Distinguish between the relative assumptions of single and multiple regression.
• Interpret regression coefficients in a multiple regression.
• Interpret goodness-of-fit measures for single and multiple regressions, including R² and adjusted R².
• Construct, apply and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a regression.
Omitted Variable Bias

• Omitted variable bias is said to occur when:
  1. the omitted variable is correlated with an independent variable included in the model; and
  2. the omitted variable is a determinant of the dependent variable.
• When a relevant variable is excluded:
  o assumptions of linear regression are violated;
  o we underspecify the model;
  o OLS estimates are biased and inconsistent.
• Example:
  o When regressing GDP growth against determinants such as the interest rate, inflation, and the exchange rate, leaving out one or more of these results in biased estimates, as long as the variable left out is a determinant of the dependent variable and is correlated with the included variables.
Addressing Omitted Variable Bias

• To find out if omitted variable bias is present in a statistical model, various tests can be conducted, e.g., the Ramsey RESET test.
• If bias is found, it can be addressed by dividing the data into groups and examining one factor at a time while holding the other factors constant.
• Multiple regression analysis helps eliminate the omitted variable problem by incorporating multiple independent variables in the model, where the significance of each variable can be tested; the simulation sketched below illustrates the bias.
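The effect is easy to reproduce in a small simulation. This is a minimal sketch assuming Python with numpy and statsmodels (the reading prescribes no software, and all data here are synthetic): when x₂ both drives y and is correlated with x₁, dropping it biases the estimated coefficient on x₁ upward.

```python
# Hypothetical illustration of omitted variable bias (not from the reading).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                   # x2 is correlated with x1 ...
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # ... and determines y

# Full model: slope estimates are close to the true values (2.0 and 1.5).
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(full.params)       # approx [1.0, 2.0, 1.5]

# Underspecified model: the x1 slope absorbs x2's effect,
# approx 2.0 + 1.5 * 0.8 = 3.2 -> biased and inconsistent.
short = sm.OLS(y, sm.add_constant(x1)).fit()
print(short.params)      # approx [1.0, 3.2]
```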
Single vs. Multiple Regression

• Simple regression considers a single explanatory (independent) variable X and a response (dependent) variable Y:

    yᵢ = β₀ + β₁xᵢ + εᵢ

• Multiple regression simultaneously considers the influence of multiple explanatory variables X₁, X₂, …, Xₖ on a response variable Y:

    yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₖxₖᵢ + εᵢ
Multiple Linear Regression

    yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₖxₖᵢ + εᵢ,  i = 1, 2, …, n

• A slope coefficient, βⱼ, measures how much the dependent variable, Y, changes when the independent variable, Xⱼ, changes by one unit, holding all other independent variables constant.
• The intercept term and the slope coefficients are determined by minimizing the sum of the squared error terms.
  o In practice, software programs are used to estimate the multiple regression model.
OLS Estimators in Multiple Regression

    yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₖxₖᵢ + εᵢ,  i = 1, 2, …, n

• The ordinary least squares (OLS) method produces the estimates β̂₀, β̂₁, …, β̂ₖ by minimizing the sum of squared error terms:

    Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁x₁ᵢ − … − βₖxₖᵢ)²

• The fitted values are then:

    ŷᵢ = β̂₀ + β̂₁x₁ᵢ + β̂₂x₂ᵢ + … + β̂ₖxₖᵢ,  i = 1, 2, …, n
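As a sketch of what such software does: the minimizer has the standard closed form β̂ = (XᵀX)⁻¹Xᵀy, which a generic least-squares solver computes directly. This assumes Python/numpy and simulated data, neither of which the reading prescribes.

```python
# Sketch: OLS coefficients by minimizing the sum of squared errors (numpy).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta_true = np.array([0.5, 2.0, -1.0, 0.3])        # intercept + k slopes
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])              # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares solution
print(beta_hat)                                    # close to beta_true
```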
Assumptions of Multiple Regression

1. The relationship between the dependent variable, Y, and the independent variables, X₁, X₂, …, Xₖ, is linear.
2. The independent variables (X₁, X₂, …, Xₖ) are not random.
3. No exact linear relation exists between two or more of the independent variables.
4. The expected value of the error term, conditioned on the independent variables, is zero: E(ε | X₁, X₂, …, Xₖ) = 0.
5. The variance of the error term is the same for all observations.
6. The error term is uncorrelated across observations.
7. The error term is normally distributed.
Multiple Regression Model
An Example

• An economist tests the hypothesis that GDP growth in a certain country can be explained by interest rates and inflation.
• Using 30 observations, the analyst formulates the following regression equation:

    GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)

• Regression estimates are as follows:

                      Coefficient    Standard error
    Intercept            0.03             0.5%
    Interest rates       0.20             5%
    Inflation            0.15             3%

• How do we interpret these results?
Multiple Regression Model
An Example

    GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)

                      Coefficient    Standard error
    Intercept            0.03             0.5%
    Interest rates       0.20             5%
    Inflation            0.15             3%

Interpretation
• Intercept term: If the interest rate is zero and inflation is zero, we would expect the GDP growth rate to be 3%.
• Interest rate coefficient: If the interest rate increases by 1%, we would expect the GDP growth rate to increase by 0.20%, holding inflation constant.
• Inflation coefficient: If the inflation rate increases by 1%, we would expect the GDP growth rate to increase by 0.15%, holding the interest rate constant.
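A small sketch (Python; only the coefficients come from the table above, the rate levels plugged in are hypothetical) reproducing these interpretations numerically:

```python
# Sketch: predictions from the estimated model (illustrative values only).
b0, b_int, b_inf = 0.03, 0.20, 0.15      # estimates from the table above

def gdp_growth(interest, inflation):
    """Fitted GDP growth, with all rates expressed as decimals."""
    return b0 + b_int * interest + b_inf * inflation

base = gdp_growth(0.05, 0.02)            # hypothetical: 5% rates, 2% inflation
print(gdp_growth(0.06, 0.02) - base)     # 0.0020 -> +0.20% growth per +1% rates
print(gdp_growth(0.05, 0.03) - base)     # 0.0015 -> +0.15% growth per +1% inflation
```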
R²

• To determine the accuracy with which the OLS regression line fits the data, we apply the coefficient of determination and the regression's standard error.
• The coefficient of determination, R², is a measure of the "goodness of fit" of the regression.
  o It is interpreted as the percentage of variation in the dependent variable explained by the independent variables.
R²

• R² can be expressed mathematically as the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS).
• For ESS, the squared deviations of the predicted values Ŷᵢ from the average Ȳ are summed up:

    ESS = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²

• The sum of squared deviations of Yᵢ from its average is referred to as the total sum of squares:

    TSS = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

• Therefore:

    R² = ESS / TSS
R²

[Figure: scatter plot of y against x with the fitted line ŷᵢ = β̂₀ + β̂₁xᵢ, decomposing the total sum of squares Σ(yᵢ − ȳ)² into the explained sum of squares Σ(ŷᵢ − ȳ)² and the unexplained/residual sum of squares Σ(yᵢ − ŷᵢ)².]
R²

• It is important to note that:

    TSS = ESS + SSR

  where SSR is the unexplained (residual) sum of squares.
• Therefore:

    R² = 1 − SSR / TSS

• The square of the correlation coefficient between Y and X gives the R² of the regression of Y on the single regressor X.
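The two expressions for R² agree whenever the regression includes an intercept, since TSS = ESS + SSR then holds exactly. A minimal sketch (Python/numpy with simulated data, not from the reading):

```python
# Sketch: R^2 via ESS/TSS and via 1 - SSR/TSS give identical results (numpy).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

X = np.column_stack([np.ones_like(x), x])          # intercept + regressor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

tss = np.sum((y - y.mean()) ** 2)                  # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)              # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)                     # residual sum of squares

print(ess / tss, 1 - ssr / tss)                    # the two R^2 expressions agree
```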
Coefficient of Correlation, r

• It measures the strength of the linear association between the independent variable and the dependent variable; its magnitude is the square root of R², with the sign of the estimated slope.

    r = −1.0: perfect negative correlation
    r =  0.0: no linear correlation
    r = +1.0: perfect positive correlation

• Moving from 0 toward −1.0 (or +1.0), the degree of negative (or positive) correlation increases.
R²

• To determine the accuracy with which the OLS regression line fits the data, we apply the coefficient of determination and the regression's standard error.
• The coefficient of determination, R², is a measure of the "goodness of fit" of the regression.
  o It is interpreted as the percentage of variation in the dependent variable explained by the independent variables:

    R² = (Total variation − Unexplained variation) / Total variation
Adjusted R²

• However, R² is not a reliable indicator of the explanatory power of a multiple regression model.
  o Why? R² almost always increases as new independent variables are added to the model, even if the marginal contribution of the new variable is not statistically significant.
• Thus, a high R² may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable.
  o This problem is solved by use of the adjusted R².
Adjusted R²

    Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)

    n = number of observations
    k = number of independent variables

• While adding new independent variables never decreases R², it may either increase or decrease the adjusted R². A helper function is sketched below.
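The formula translates directly into a small helper; a minimal sketch (Python; the function name is ours, not from the reading), reused in the example that follows:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 given R^2, n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)
```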
Adjusted R²

Example (part 1)
• An analyst runs a regression of monthly value-stock returns on:
  o four independent variables;
  o over 48 months.
• The total sum of squares for the regression is 360, and the sum of squared errors is 120.
• Calculate the R² and adjusted R².

Solution

    R² = (360 − 120) / 360 = 66.7%

    Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)
                = 1 − [(48 − 1) / (48 − 4 − 1)] × (1 − 0.667) = 63.6%
Adjusted R²

Example (part 2)
• The analyst now adds four more independent variables to the regression.
• The new R² increases to 69%.
• Which model would the analyst most likely prefer?

Solution

    New R² = 69%
    New adjusted R² = 1 − [(48 − 1) / (48 − 8 − 1)] × (1 − 0.69) = 62.6%

• The analyst would prefer the first model because it has a higher adjusted R² while using four independent variables as opposed to eight (checked numerically below).
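Checking both models with the adjusted_r2 helper sketched earlier:

```python
# Both models from the example; the first wins on adjusted R^2.
print(adjusted_r2(240 / 360, 48, 4))   # approx 0.636 -> first model, 4 regressors
print(adjusted_r2(0.69, 48, 8))        # approx 0.626 -> second model, 8 regressors
```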
Hypothesis Test for a Single Coefficient

• We may want to test the significance of a single regression coefficient in a multiple regression.
• As in the simple case, such a test makes use of the t-statistic:

    t = (β̂ⱼ − βⱼ,₀) / SE(β̂ⱼ)

  where β̂ⱼ is the estimated regression coefficient, βⱼ,₀ is the value of the coefficient under H₀, and SE(β̂ⱼ) is the standard error of the estimated coefficient.
• The t-statistic has n − k − 1 degrees of freedom, where k = number of independent variables.
Hypothesis Test for a Single Coefficient

Example
• An economist tests the hypothesis that GDP growth in a certain country can be explained by interest rates and inflation.
• Using 30 observations, the analyst formulates the following regression equation:

    GDP growth = β̂₀ + β̂₁(Interest) + β̂₂(Inflation)

• Regression estimates are as follows:

                      Coefficient    Standard error
    Intercept            0.10             0.005
    Interest rates       0.20             0.05
    Inflation            0.15             0.03

• Is the coefficient for interest rates significant at 5%?
Hypothesis Test for a Single Coefficient

Solution

                      Coefficient    Standard error
    Intercept            0.10             0.005
    Interest rates       0.20             0.05
    Inflation            0.15             0.03

• We have:
  o GDP growth = 0.10 + 0.20(Interest) + 0.15(Inflation)
• Hypothesis: H₀: β₁ = 0 vs. H₁: β₁ ≠ 0
  o Test statistic = (0.20 − 0) / 0.05 = 4
  o Critical value = t(α/2, n − k − 1) = t(0.025, 27) = 2.052
• Decision: Since the test statistic exceeds the critical value, we reject H₀.
• Conclusion: The interest rate coefficient is significant at the 5% level.
Confidence Intervals for a Single Coefficient

• The confidence interval for a regression coefficient in multiple regression is calculated and interpreted the same way as in simple linear regression:

    CI = β̂ⱼ ± t_c × SE(β̂ⱼ)

  where β̂ⱼ is the estimated regression coefficient, t_c is the critical t-value, and SE(β̂ⱼ) is the standard error of the estimated coefficient.
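A quick numerical check of the preceding example (a sketch assuming Python with scipy, which the reading does not prescribe): the same critical t-value drives both the hypothesis test and the confidence interval.

```python
# Sketch: t-test and confidence interval for the interest-rate coefficient.
from scipy import stats

b, se = 0.20, 0.05               # coefficient estimate and its standard error
df = 30 - 2 - 1                  # n - k - 1 = 27 degrees of freedom

t_stat = (b - 0) / se            # 4.0
t_crit = stats.t.ppf(0.975, df)  # approx 2.052 (two-tailed, 5% significance)
print(t_stat > t_crit)           # True -> reject H0

ci = (b - t_crit * se, b + t_crit * se)
print(ci)                        # approx (0.097, 0.303); excludes 0
```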
Joint Hypothesis Tests

• In a multiple regression, we cannot test the null hypothesis that all slope coefficients equal 0 using separate t-tests of whether each individual slope coefficient equals 0.
  o Why? Individual tests do not account for the effects of interactions among the independent variables.
• For this reason, we conduct the F-test.
  o The F-test tests the null hypothesis that all of the slope coefficients in the multiple regression model are jointly equal to 0:

    H₀: β₁ = β₂ = β₃ = … = βₖ = 0
    vs.
    H₁: at least one βⱼ ≠ 0
Joint Hypothesis Tests

• The F-statistic, which is always compared against a one-tailed critical value, is calculated as:

    F(k, n − k − 1) = (ESS / k) / (SSR / (n − k − 1))

  where:
  o n = number of observations
  o k = number of independent variables
  o ESS = explained sum of squares
  o SSR = sum of squared residuals
Joint Hypothesis Tests

• To determine whether at least one of the coefficients is statistically significant, the calculated F-statistic is compared with the one-tailed critical F-value at the appropriate level of significance.
• Decision rule: Reject H₀ if F (test statistic) > F_c (critical value).
• Rejection of the null hypothesis at a stated level of significance indicates that at least one of the coefficients is significantly different from zero, i.e., at least one of the independent variables in the regression model makes a significant contribution to the explanation of the dependent variable.
Joint Hypothesis Tests

Example
• An analyst runs a regression of monthly value-stock returns on four independent variables over 48 months.
• The total sum of squares for the regression is 360, and the sum of squared errors is 120.
• Test the null hypothesis at the 5% significance level (95% confidence) that all four slope coefficients are equal to zero.

Solution
• H₀: β₁ = β₂ = β₃ = β₄ = 0 vs. H₁: at least one βⱼ ≠ 0
• ESS = TSS − SSR = 360 − 120 = 240
• The calculated test statistic = (ESS / k) / (SSR / (n − k − 1)) = (240 / 4) / (120 / 43) = 21.5
• F(4, 43) is approximately 2.59 at the 5% significance level.
• Decision: Reject H₀.
• Conclusion: At least one of the four slope coefficients is significantly different from zero.
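A numerical check of this example (a sketch assuming scipy, not prescribed by the reading):

```python
# Sketch: joint F-test using the example's sums of squares.
from scipy import stats

n, k = 48, 4
ess, ssr = 240.0, 120.0
f_stat = (ess / k) / (ssr / (n - k - 1))          # (240/4) / (120/43) = 21.5
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)  # approx 2.59, one-tailed
print(f_stat > f_crit)                            # True -> reject H0
```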
Book 2 – Quantitative Analysis

REGRESSION WITH MULTIPLE EXPLANATORY VARIABLES

Learning Objectives Recap
• Distinguish between the relative assumptions of single and multiple regression.
• Interpret regression coefficients in a multiple regression.
• Interpret goodness-of-fit measures for single and multiple regressions, including R² and adjusted R².
• Construct, apply and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a regression.
