Notes Scatterplots

Linear Regression

CHAPTER 3: SCATTERPLOTS AND CORRELATION


Describing a scatterplot

Scatterplots examine the relationship between 2 quantitative variables.

Explanatory Variable:
The independent variable – Median Income

Response Variable:
The dependent variable – Crime rate
Describing a scatterplot

• Scatterplots are described by 3 things: Direction, Form and Strength.
  • Direction: Positive or negative
    • Positive: as the explanatory (independent) variable increases, the response (dependent) variable also tends to increase
    • OR as the explanatory variable decreases, the response variable also tends to decrease
    • Negative: as the explanatory (independent) variable increases, the response (dependent) variable tends to decrease.
  • Form: Linear or non-linear
[Figure: two example scatterplots, one labeled "Linear" and one labeled "Non-linear"]
  • Strength: Strong, Moderate, Weak (with r-value)
    • Correlation coefficient (r-value): the measure of the strength and the direction of the association.
    • r-values are between -1 and 1, with 0 the weakest and 1 and -1 the strongest.
    • The r-value has no units.
[Figure: three example scatterplots labeled "Strong", "Moderate" and "Weak"]
[Figure: example scatterplots with r = -0.65, r = 0.99, r ≈ 0.4 and r ≈ 0]
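The r-value can also be computed directly from paired data. Below is a minimal Python sketch, assuming the usual formula (sum of cross-products divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the data are hypothetical and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2, 4, 5, 8])    # strong, positive
r_down = pearson_r(xs, [8, 6, 4, 2])  # points exactly on a falling line

# r has no units: rescaling x (for example grams to milligrams) leaves it unchanged
r_rescaled = pearson_r([1000 * x for x in xs], [2, 4, 5, 8])

print(round(r_up, 3), round(r_down, 3), round(r_rescaled, 3))
```

The sign of the result gives the direction and the distance from 0 gives the strength, which matches the description above.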
• Give the direction, form and strength for each:

Plot  Direction     Form        Strength
1     Positive      Linear      Strong
2     Positive      Linear      Moderate
3     Neither       Non-linear  Strong
4     Negative      Linear      Strong
5     No direction  No form     No correlation

• Estimate the r-value for each.
• Describe the scatterplot.
There is a moderate, negative, linear relationship between median income and crime rate.
More on Correlation

• Correlation does NOT prove Causation!
• Examples:
  • There is a positive association between the amount of damage done at a fire and the number of firefighters who report to the fire.
    • Does this mean that the firefighters are causing the damage?
    • No, it could be that bigger fires cause more damage and also require more firefighters. Therefore, the size of the fire is a confounding variable.
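A small simulation can make the confounding idea concrete. This is only a sketch under made-up assumptions: every number below (the fire sizes, the multipliers 2 and 10, the noise levels) is invented for illustration and is not data from the slides. Fire size drives both variables, and firefighters have no effect on damage in the simulation.

```python
import math
import random

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
fire_size = [rng.uniform(1, 10) for _ in range(500)]

# Both variables depend on fire size plus independent noise (made-up numbers)
firefighters = [2 * s + rng.gauss(0, 1) for s in fire_size]
damage = [10 * s + rng.gauss(0, 5) for s in fire_size]

r_raw = pearson_r(firefighters, damage)

# Remove the part of each variable that fire size accounts for
ff_left = [f - 2 * s for f, s in zip(firefighters, fire_size)]
dmg_left = [d - 10 * s for d, s in zip(damage, fire_size)]
r_adjusted = pearson_r(ff_left, dmg_left)

print("r, firefighters vs damage:", round(r_raw, 2))
print("r after removing fire size:", round(r_adjusted, 2))
```

In this setup the raw correlation is strong, yet it nearly vanishes once fire size is accounted for, which is the pattern a confounding variable produces.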
  • There is a positive association between the number of AP Classes students take in school and their GPA in college.
    • Does this mean that if more students are encouraged to take AP Classes, they will do better in college?
    • No, it could be that there is something inherently different about students who choose to take AP classes (such as high motivation) which also causes them to want to succeed in college. Students pressured to take AP may not necessarily do better in college.
  • There is a positive association between chocolate sales and car accidents.
    • Does this mean that buying chocolate (or eating chocolate) is causing car accidents?
    • No, it could be that people are more likely to buy chocolate around the holidays and, because more people travel during the holidays, there may be more accidents around that time.
Drawing a Scatterplot

Draw a scatterplot showing the relationship between fat and calories.
[Figure: scatterplot of Calories (axis from 40 to 110) against Fat (grams) (axis from 2 to 9)]
LSRL

• LSRL: “Least Squares Regression Line”
• In algebra the line is written y = mx + b. In statistics it is written $\hat{y} = a + bx$, where a is the y-intercept and b is the slope.
• Put data in L1 and L2
• Stat, Calc, 8: LinReg(a+bx) L1, L2
• If r and r² do not show...
  • Go to Catalog (at the bottom), scroll until you find DiagnosticOn, then hit Enter twice, then go back and do 8: LinReg(a+bx) L1, L2.

$\widehat{\text{calories}} = 22 + 9.143(\text{fat})$

22: Y-intercept – we predict a slice of cheese with 0 grams of fat would have 22 calories.
9.143: Slope – we predict the calories will increase by 9.143 for each 1 gram increase of fat.
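For readers without the calculator, the same line can be found from the least squares formulas: slope b = Sxy / Sxx and intercept a = (mean of y) minus b times (mean of x). The sketch below assumes those standard formulas. The function name `least_squares_line` is hypothetical, and the data points are invented so that they lie exactly on a line; they are not the cheese data behind the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b

# Hypothetical points lying exactly on y = 22 + 9x (not the slide data)
fat = [2, 4, 6, 8]
calories = [40, 58, 76, 94]

a, b = least_squares_line(fat, calories)
print("intercept:", a, "slope:", b)
```

The returned pair reads the same way as the model above: a is the predicted value at x = 0 and b is the predicted change in y per 1-unit increase in x.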
Describe the scatterplot

• Describe the scatterplot (include the r-value)
• There is a strong, positive, linear relationship between grams of fat and calories in cheese slices, with a correlation of r = 0.96.
Coefficient of Determination (r²)

• r²: The percent of variation in y that is explained by the linear model.
• 91.4% of the variation in calories can be explained by the linear model.
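One way to see where r² comes from is to compare the variation left over after fitting the line (the sum of squared residuals) with the total variation in y. The sketch below assumes the identity r² = 1 - SSE / SST for a least squares line. The function name `r_squared` and the data are hypothetical, not the cheese data.

```python
def r_squared(xs, ys):
    """Return r^2 = 1 - SSE/SST for the least squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: variation the line does not explain; SST: total variation in y
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, not taken from the slides
r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 8])
print(f"{100 * r2:.1f}% of the variation in y is explained by the linear model")
```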
Predictions

• Use the model to predict the calories in a slice of American cheese with 6 grams of fat.

$\widehat{\text{calories}} = 22 + 9.143(6) = 76.858$

• Use the model to predict the calories in a slice of American cheese with 12 grams of fat.

$\widehat{\text{calories}} = 22 + 9.143(12) = 131.72$

• This second prediction is unreliable because it is extrapolation: it predicts calories from a fat content beyond the range of fat contents that was used to build the model.
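The two predictions, and the extrapolation check, can be sketched in Python as follows. The intercept and slope come from the model above. The range (2, 9) is an assumption read off the fat axis of the scatterplot, since the underlying data values are not listed here, and the function names are hypothetical.

```python
INTERCEPT = 22.0
SLOPE = 9.143
FAT_RANGE = (2, 9)  # assumption: taken from the plot's fat axis, not from listed data

def predict_calories(fat_grams):
    """Predicted calories from the LSRL: y-hat = 22 + 9.143 * fat."""
    return INTERCEPT + SLOPE * fat_grams

def is_extrapolation(fat_grams, fat_range=FAT_RANGE):
    """True when the x-value lies outside the range used to build the model."""
    low, high = fat_range
    return fat_grams < low or fat_grams > high

for grams in (6, 12):
    note = "extrapolation, treat with caution" if is_extrapolation(grams) else "within range"
    print(f"{grams} g fat -> {predict_calories(grams):.3f} calories ({note})")
```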
Drawing the LSRL

• Plot 2 points
  • Predicted value for 3 grams of fat = 49.43 calories
  • Predicted value for 8 grams of fat = 95.14 calories
• Connect the two points
[Figure: LSRL drawn on the scatterplot of Calories against Fat (grams)]
Regression

$\widehat{\text{crime rate}} = 4027.383 + 327.226(\text{poverty})$

• The relationship between % below poverty and serious crime rate is positive, moderate and linear.
• Slope: As the percentage below poverty increases by 1 percentage point, on average the crime rate will increase by 327.226 per 100,000.
• Intercept: for an area with 0% below poverty, we predict the crime rate will be about 4027.383 per 100,000.
• r²: 54.76% of the variation in crime rate can be explained by the linear model.
Looking at Output

An important factor in the amount of gasoline a car uses is the size of the engine. Called “displacement”, engine size measures the volume of the cylinders in cubic inches. The regression analysis is shown.
[Figure: regression output for fuel economy (mpg) against engine displacement]

a. How many cars were included in this analysis? 89
b. What is the correlation between engine size and fuel economy?
r² = 0.609, so r = ±√0.609 = ±0.78. Because the slope is negative, the correlation must be negative; therefore r = -0.78.
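The step from r² back to r can be sketched in a few lines of Python: take the square root, then give it the sign of the slope. The function name `r_from_r_squared` is hypothetical. The inputs are the values quoted in this example and in the earlier crime-rate example.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, using the sign of the slope for the direction."""
    # copysign returns the magnitude of the first argument with the sign of the second
    return math.copysign(math.sqrt(r_squared), slope)

r_mpg = r_from_r_squared(0.609, -0.0662)     # negative slope, so r is negative
r_crime = r_from_r_squared(0.5476, 327.226)  # positive slope, so r is positive

print(round(r_mpg, 2), round(r_crime, 2))
```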
c. Write the equation of the linear model.
$\widehat{\text{mpg}} = 34.98 - 0.066(\text{displacement})$
In the output, 34.98 is the y-intercept and -0.066 is the slope.
d. A car you are thinking of buying is available with two different size engines, 190 cubic inches or 240 cubic inches. How much difference might this make in your gas mileage? (Show work)
240 - 190 = 50
50(-0.0662) = -3.31
Increasing the displacement by 50 cubic inches is predicted to decrease the gas mileage by 3.31 mpg.
OR 34.9799 - 0.0662(240) = 19.09
34.9799 - 0.0662(190) = 22.40
22.40 - 19.09 = 3.31
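Both routes to the answer can be checked with a short Python sketch. The coefficients are the ones used in the work above, and the function name `predict_mpg` is hypothetical. The two results agree because the intercept cancels when the two predictions are subtracted.

```python
INTERCEPT = 34.9799
SLOPE = -0.0662

def predict_mpg(displacement):
    """Predicted fuel economy (mpg) for an engine of the given displacement (cubic inches)."""
    return INTERCEPT + SLOPE * displacement

# Route 1: slope times the change in displacement
shortcut = SLOPE * (240 - 190)

# Route 2: difference between the two predictions
long_way = predict_mpg(240) - predict_mpg(190)

print(round(shortcut, 2), round(long_way, 2))
```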
Residuals

• The residual (error) is actual - predicted (or observed - expected).
• A positive residual is a point that lies above the prediction (above the linear model); the model underestimates it.
• A negative residual is a point that lies below the prediction; the model overestimates it.
[Figure: scatterplot of Calories against Fat (grams) with the LSRL, with one point labeled "Positive residual" and one labeled "Negative residual"]

Use the data to find the value of the residual for the cheese with 7 grams of fat.
Observed = 80 calories
Predicted = 22 + 9.143(7) = 86 calories
Residual = 80 - 86 = -6 calories
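The residual for the 7-gram cheese can be reproduced with a short sketch. The observed value (80 calories) and the model coefficients are the ones used in this example, and the function name `residual` is hypothetical. Without rounding the prediction to 86, the residual comes out as about -6.001.

```python
def residual(observed, predicted):
    """Residual = observed - predicted; negative means the model overestimated."""
    return observed - predicted

predicted = 22 + 9.143 * 7   # LSRL prediction for 7 grams of fat
res = residual(80, predicted)

print(f"predicted = {predicted:.3f}, residual = {res:.3f}")
```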
Residual Plots

• Go to Stat/Edit. Highlight L3, then go to “list” (2nd Stat). Choose RESID. Hit Enter.
• Go to Stat plot. Choose Scatterplot with L1, L2. Graph.
• Go to Stat plot. Choose Scatterplot with L1, L3. Graph.
• To check linearity, the first graph (the scatterplot) should be linear.
• But also, the second graph (the residual plot) should be random and scattered.
[Figure: scatterplot and residual plot for an example data set]
A linear model is appropriate here, because the scatterplot shows a linear relationship and the residual plot shows random scatter (no pattern).
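To see what a residual plot with a pattern looks like numerically, the sketch below fits a least squares line to data that are actually curved. The data (y = x²) and the function name `lsrl_residuals` are hypothetical and chosen only for illustration.

```python
def lsrl_residuals(xs, ys):
    """Fit the least squares line and return the residuals (observed - predicted)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical curved data: y = x squared
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

residuals = lsrl_residuals(xs, ys)
print(residuals)
```

The residuals run positive, then negative, then positive again. That U-shape is a pattern rather than random scatter, so a linear model would not be appropriate for these data even though the residuals still add up to zero.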
Write the equation of the LSRL

$\widehat{\text{fare}} = 177.215 + 0.0786(\text{distance})$
Outliers and Influential points

• Outliers lie away from the other data in the y-direction.
  • Outliers have large residuals.
  • Outliers typically weaken the correlation (move the r-value closer to 0).
  • Outliers typically have a small impact on the slope of the LSRL.
• Influential points lie away from the other data in the x-direction.
  • Influential points may not have large residuals, but
  • Influential points have a drastic impact on the slope of the LSRL.
[Figure: scatterplot with an influential point labeled]
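The difference between the two kinds of unusual points can be sketched numerically. The data are hypothetical: five points on the line y = 2x, plus either one point far away in the y-direction or one point far away in the x-direction. The function name `lsrl_slope` is also hypothetical.

```python
def lsrl_slope(xs, ys):
    """Slope of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical base data on the line y = 2x
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

slope_base = lsrl_slope(base_x, base_y)
# Outlier: unusual in the y-direction, placed at the mean of x
slope_outlier = lsrl_slope(base_x + [3], base_y + [20])
# Influential point: unusual in the x-direction
slope_influential = lsrl_slope(base_x + [15], base_y + [0])

print(slope_base, round(slope_outlier, 3), round(slope_influential, 3))
```

In this made-up case the outlier leaves the slope unchanged, while the single influential point is enough to turn the slope negative.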
[Figure: scatterplot with one additional point to be placed]
• Moderate, positive linear association
• Where would you place this point to make the slope of the line negative?
Practice

• Do The Practice of Statistics 4th Edition Chapter 3 (See schoolSpace)
• #R3.2, R3.4, T3.2, T3.5, T3.6, T3.7, T3.9
Confidence Intervals for Slope

• Calculate and interpret the 95% confidence interval for the slope of the association between anxiety level and math test score.
P: β1 = the amount of change in math test score for each increase of 1 in anxiety level.
A: Linear (Check scatterplot and residuals)
   Independence (Check residuals for scatter)
   Equal Variance (Check residuals for scatter)
   Normality (Check histogram of residuals)
N: t-interval for slope
I: b1 ± t*(SEb) = -4.486 ± 2.074(1.551) = (-7.70, -1.27)
C: We are 95% confident that, on average, the math test score decreases by between 1.27 and 7.70 points for each increase of 1 in anxiety level.
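The interval arithmetic can be sketched as below, using the slope estimate, critical value and standard error quoted in the I step. The function name `slope_confidence_interval` is hypothetical.

```python
def slope_confidence_interval(b1, t_star, se_b1):
    """Return (low, high) for the interval b1 +/- t* x SE(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

low, high = slope_confidence_interval(b1=-4.486, t_star=2.074, se_b1=1.551)
print(f"({low:.2f}, {high:.2f})")

# The whole interval is below 0, which is consistent with a negative association
print("entire interval below zero:", high < 0)
```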
T-tests for regression

• Is there evidence that there is a relationship between math test scores and anxiety level?
H: H0: β1 = 0 (there is no linear association between math test score and anxiety level)
   Ha: β1 ≠ 0 (there is a linear association between math test score and anxiety level)
A: Look at graphs if data is provided.
N: t-test for regression
T: t = -2.89
O: p-value = 0.0084
M: Because p-value < 0.05, we reject H0.
S: We have evidence that there is a linear relationship between anxiety level and math test score.
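The test statistic can be reproduced from the slope estimate and its standard error. This sketch assumes the same values used in the confidence interval example (b1 = -4.486, SEb = 1.551) and reuses that example's critical value t* = 2.074 as the two-sided 5% cutoff. It does not compute the p-value, which needs a t-distribution table or software.

```python
b1 = -4.486     # estimated slope
se_b1 = 1.551   # standard error of the slope
t_star = 2.074  # two-sided 5% critical value, taken from the interval example

# Under H0 the true slope is 0, so t = (b1 - 0) / SE(b1)
t = (b1 - 0) / se_b1
reject_h0 = abs(t) > t_star

print(f"t = {t:.2f}, reject H0 at the 5% level: {reject_h0}")
```

Rejecting when |t| exceeds t* is the same decision as rejecting when the two-sided p-value is below 0.05.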
Practice

• Do The Practice of Statistics 4th Edition Chapter 12 (See schoolSpace)
• #1, 2, 3, 9, 13
