
Chapter 14

Correlation and Regression

•2


In Chapter 14
14.1 Data
14.2 Scatterplots
14.3 Correlation
14.4 Regression

•3
14.1 Data
• Quantitative response variable Y (“dependent
variable”)
• Quantitative explanatory variable X (“independent
variable”)
• Historically important public health data set used to
illustrate techniques (Doll, 1955)
– n = 11 countries
– Explanatory variable = per capita cigarette consumption in
1930 (CIG1930)
– Response variable = lung cancer mortality per 100,000
(LUNGCA)
•4
Table 14.2 Data used for chapter illustrations. Per capita
cigarette consumption in 1930 (cig1930) and lung cancer
cases per 100,000 in 1950 (lungca) in 11 countries.

Data from Doll, R. (1955). Etiology of lung cancer. Advances in Cancer Research, 3, 1–50. Data stored online in the file doll-ecol.sav.

•5
Figure 14.1 Scatterplot of Doll’s illustration of the correlation
between smoking and lung cancer rates. (Data listed in Table
14.2.) The data point for the United States is highlighted.
Inspect scatterplots
• Form: Can the relation be described with a
straight line or some other type of line?
• Direction: Do points trend upward or
downward?
• Strength of association: Do points adhere
closely to an imaginary trend line?
• Outliers (if any): Are there any striking
deviations from the overall pattern?
(A minimal plotting sketch follows this list.)
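The same inspection can be scripted. A minimal sketch with matplotlib; the data values are illustrative stand-ins, not the actual Doll (1955) figures, which are not reproduced in this transcript:

```python
# Sketch: draw a scatterplot and inspect form, direction, strength, outliers.
# The cig1930/lungca values below are hypothetical stand-ins.
import matplotlib.pyplot as plt

cig1930 = [220, 250, 310, 380, 455, 510, 680, 760, 1100, 1145, 1280]
lungca  = [ 60,  90, 115, 170, 165, 245, 170, 350,  455,  460,  200]

plt.scatter(cig1930, lungca)
plt.xlabel("Per capita cigarette consumption, 1930 (cig1930)")
plt.ylabel("Lung cancer deaths per 100,000, 1950 (lungca)")
plt.title("Inspect form, direction, strength, and outliers")
plt.show()
```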
•7
Judging Correlational Strength
• Correlational strength refers to the degree to
which points adhere to a trend line
• The eye is not a good judge of strength.
• The top plot appears to show a weaker
correlation than the bottom plot. However,
both plots display the same data. (The
perceived difference is an artifact of axis
scaling.)

•8
Figure 14.2
Scatterplots of the
same data with
different axis
scalings. It is difficult
to determine
correlational
strength visually.
§14.3. Correlation
• Correlation coefficient r quantifies linear relationship
with a number between −1 and 1.
• When all points fall on a line with an upward slope, r
= 1. When all data points fall on a line with a
downward slope, r = −1
• When data points trend upward, r is positive; when
data points trend downward, r is negative.
• The closer r is to 1 or −1, the stronger the
correlation.

•10
Figure 14.3
Examples of
different
correlations.

•11
Calculating r
• Formula:

    r = (1 / (n − 1)) Σ (zX · zY)

• The correlation coefficient tracks the degree to which X
and Y "go together."
• Recall that z scores quantify the number of standard
deviations a value lies above or below its mean.
• When the z scores for X and Y track in the same
direction, their products are positive and r is positive
(and vice versa). (A computational sketch follows.)
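A minimal sketch of this formula in Python/numpy; the x and y values are the illustrative stand-ins used earlier, not the actual Doll figures:

```python
# Sketch: compute r from z scores, per the slide's formula
# r = (1/(n-1)) * sum(z_x * z_y). Data are hypothetical stand-ins.
import numpy as np

x = np.array([220, 250, 310, 380, 455, 510, 680, 760, 1100, 1145, 1280], float)
y = np.array([ 60,  90, 115, 170, 165, 245, 170, 350,  455,  460,  200], float)

n = len(x)
zx = (x - x.mean()) / x.std(ddof=1)   # z scores use the sample SD (ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

r = np.sum(zx * zy) / (n - 1)
print(round(r, 3), round(r**2, 3))    # r and the coefficient of determination
```

The result agrees with `np.corrcoef(x, y)[0, 1]`, which is the usual shortcut in practice.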
•12
Table 14.3 Calculation of correlation
coefficient r, illustrative data.

•13
Calculating r
• In practice, we rely on computers and
calculators to calculate r.
– SPSS
– Scientific and graphing calculators
• I encourage my students to use these tools
whenever possible.

•14
Calculating r
• SPSS output for Analyze > Correlate > Bivariate
using the illustrative data:

•15
Interpretation of r
1. Direction. The sign of r indicates the direction of the
association: positive (r > 0), negative (r < 0), or no
association (r ≈ 0).
2. Strength. The closer r is to 1 or −1, the stronger the
association.
3. Coefficient of determination. The square of the
correlation coefficient (r²) is called the coefficient of
determination. This statistic quantifies the proportion
of the variance in Y [mathematically] "explained" by
X. For the illustrative data, r = 0.737 and r² = 0.54.
Therefore, 54% of the variance in Y is explained by X.

•16
Notes, cont.

4. Reversible relationship. With correlation, it does
not matter whether variable X or Y is specified as
the explanatory variable; the calculations come out the
same either way. [This will not be true for
regression.]
5. Outliers. Outliers can have a profound effect on r.
This figure has an r of 0.82 that is fully accounted
for by the single outlier (see next slide).

•17
Figure 14.4 The calculated correlation for this
data set is r = 0.82. The single influential
observation in the upper-right quadrant
accounts for this large r.
Notes, cont.
6. Linear relations only. Correlation applies only to
linear relationships. This figure shows a strong
nonlinear relationship, yet r = 0.00.
•19
Notes, cont.
7. Correlation does not necessarily mean causation.
Beware lurking variables.
• A near-perfect negative correlation (r = −0.987) was
seen between cholera mortality and elevation
above sea level during a 19th-century epidemic.
• We now know that cholera is transmitted by water.
• The observed relationship between cholera and
elevation was confounded by the lurking variable
"proximity to polluted water."
• See next slide

•20
Figure 14.6 Cholera mortality and elevation above sea
level were strongly correlated in the 1850s (r = −0.987),
but this correlation was an artifact of confounding by
the extraneous factor of "water source."

•21
Hypothesis Test
• Random selection from a random scatter can result
in an apparent correlation
• We conduct the hypothesis test to guard against
identifying too many random correlations.

•22
Hypothesis Test
A. Hypotheses. Let ρ represent the population correlation
coefficient.
H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided)
[or Ha: ρ > 0 (right-sided) or Ha: ρ < 0 (left-sided)]
B. Test statistic:

    tstat = r / SEr,  where SEr = sqrt((1 − r²) / (n − 2))
    df = n − 2
C. P-value. Convert tstat to P-value with software or Table C.
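A sketch of steps B and C in Python with scipy, using r = 0.737 and n = 11 from the chapter's illustrative example:

```python
# Sketch: t test for H0: rho = 0 using the slide's formulas.
from math import sqrt
from scipy import stats

r, n = 0.737, 11
se_r = sqrt((1 - r**2) / (n - 2))          # SEr = sqrt((1 - r^2)/(n - 2))
t_stat = r / se_r                          # tstat = r / SEr
df = n - 2
p = 2 * stats.t.sf(abs(t_stat), df)        # two-sided P-value
print(round(se_r, 4), round(t_stat, 2), round(p, 4))  # 0.2253, 3.27, ~0.0097
```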

•23
Hypothesis Test – Illustrative Example
A. H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided)
B. Test statistic:

    SEr = sqrt((1 − 0.737²) / (11 − 2)) = 0.2253
    tstat = 0.737 / 0.2253 = 3.27
    df = 11 − 2 = 9

C. .005 < P < .01 by Table C; P = .0097 by computer. The
evidence against H0 is highly significant.
•24
Confidence Interval for ρ
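The slide's formulas are not reproduced in this transcript. A sketch of the conventional approach, Fisher's z transformation (assumed here, since the slide content is missing), with r = 0.737 and n = 11:

```python
# Sketch of the standard Fisher z method for a CI for rho (assumed method;
# the slide's own formulas are not shown above).
from math import atanh, tanh, sqrt
from scipy import stats

r, n = 0.737, 11
z = atanh(r)                        # Fisher transform: 0.5 * ln((1+r)/(1-r))
se = 1 / sqrt(n - 3)                # standard error on the z scale
zcrit = stats.norm.ppf(0.975)       # 1.96 for 95% confidence
lo, hi = tanh(z - zcrit * se), tanh(z + zcrit * se)
print(round(lo, 2), round(hi, 2))   # back-transformed 95% CI for rho
```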

•25
Illustrative Example

•26
Conditions for Inference
• Independent observations
• Bivariate Normality (r can still be used
descriptively when data are not bivariate
Normal)

•27
Figure 14.8 Bivariate Normality
§14.4. Regression
• Regression describes the relationship in the
data with a line that predicts the average
change in Y per unit X.
• The best-fitting line is found by minimizing the
sum of squared residuals, as shown in this
figure.

•29
Figure 14.9 Fitted regression line and residuals,
smoking and lung cancer illustrative data

•30
Regression Line, cont.
• The regression line equation is:

    ŷ = a + b·x

where ŷ ≡ the predicted value of Y,
a ≡ the intercept of the line, and
b ≡ the slope of the line
• Equations to calculate a and b (the standard least-squares forms):

    SLOPE: b = r · (sY / sX)
    INTERCEPT: a = ȳ − b·x̄

(A computational sketch follows.)
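A sketch of the slope and intercept calculations under the formulas above; the data values are the illustrative stand-ins used earlier:

```python
# Sketch: least-squares slope and intercept from summary statistics,
# b = r * (sY/sX) and a = ybar - b*xbar. Hypothetical stand-in data.
import numpy as np

x = np.array([220, 250, 310, 380, 455, 510, 680, 760, 1100, 1145, 1280], float)
y = np.array([ 60,  90, 115, 170, 165, 245, 170, 350,  455,  460,  200], float)

r = np.corrcoef(x, y)[0, 1]
b = r * (y.std(ddof=1) / x.std(ddof=1))   # SLOPE
a = y.mean() - b * x.mean()               # INTERCEPT
print(round(b, 3), round(a, 3))           # y-hat = a + b*x
```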
•31
Figure 14.10 Components of a
regression model
Slope b is the key statistic produced by the regression analysis.
Regression Line, illustrative example

Here’s the output from SPSS:

•33
Inference
• Let α represent the population intercept, β
represent population slope, and εi represent the
residual “error” for point i.
• The population regression model is

    yi = α + β·xi + εi

• The estimated standard error of the regression is

    sY|x = sqrt( Σ(yi − ŷi)² / (n − 2) )
•34
Inference
• A (1 − α)100% CI for the population slope β is

    b ± t(df = n − 2, 1 − α/2) · SEb
•35
Confidence Interval for β – Example

95% Confidence Interval for B

    Model             Lower Bound    Upper Bound
    1  (Constant)        −4.342         17.854
       cig1930             .007           .039
•36
t Test of Slope Coefficient
A. Hypotheses. H0: β = 0 against Ha: β ≠ 0
B. Test statistic.
    tstat = b / SEb,  where SEb = sY|x / (sqrt(n − 1) · sX)
    df = n − 2

C. P-value. Convert the tstat to a P-value
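A sketch of the slope t test using the SPSS point estimates shown on the next slide (b = .023, SE = .007); the small discrepancy from the printed t = 3.275 reflects rounding of these inputs:

```python
# Sketch: t test of the slope, tstat = b / SEb with df = n - 2.
from scipy import stats

b, se_b, n = 0.023, 0.007, 11
t_stat = b / se_b
df = n - 2
p = 2 * stats.t.sf(abs(t_stat), df)    # two-sided P-value
print(round(t_stat, 2), round(p, 3))   # ~3.29 and ~.009 here; SPSS prints 3.275, .010
```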

•37
t Test: Illustrative Example

                    Unstandardized         Standardized
                     Coefficients          Coefficients
    Model               B      Std. Error     Beta         t       Sig.
    1  (Constant)     6.756      4.906                   1.377     .202
       cig1930         .023       .007         .737      3.275     .010

•38
Analysis of Variance of the
Regression Model
• An ANOVA technique equivalent to the t test
can also be used to test H0: β = 0.
This technique is covered on pp. 321–324 in
the text but is not included in this
presentation.

•39
Conditions for Inference
• Inference about the regression line requires
these conditions
– Linearity
– Independent observations
– Normality at each level of X
– Equal variance at each level of X

•40
Figure 14.12 Population regression model showing
Normality and homoscedasticity conditions

•41
Assessing Conditions
• The scatterplot should be visually inspected
for linearity, Normality, and equal variance
• Plotting the residuals from the model can be
helpful in this regard (a computational sketch
follows this list).
• The table on the next slide lists the residuals for
the illustrative data.
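A sketch of how residuals (observed y minus fitted ŷ) can be computed; np.polyfit provides the least-squares fit, and the data values are the illustrative stand-ins used earlier:

```python
# Sketch: residuals e_i = y_i - y-hat_i for condition checking.
import numpy as np

x = np.array([220, 250, 310, 380, 455, 510, 680, 760, 1100, 1145, 1280], float)
y = np.array([ 60,  90, 115, 170, 165, 245, 170, 350,  455,  460,  200], float)

b, a = np.polyfit(x, y, 1)       # slope, intercept of least-squares line
residuals = y - (a + b * x)      # e_i = y_i - y-hat_i
print(np.round(residuals, 1))    # inspect with a stemplot or residual plot
```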

•42
Table 14.7 Residuals in the smoking and lung cancer
illustrative data set
Assessing Conditions, cont.
• A stemplot of the residuals shows no major
departures from Normality:

    |-1|6
    |-0|2336
    | 0|01366
    | 1|4
    (×10)

• A residual plot shows more variability at
higher X values (but the data are very sparse)
• See next slide
•44
Figure 14.15 Residual plot for
illustrative data set
Residual Plots
• With a little experience, you can get good at
reading residual plots.
• On the next three slides, see:
A. An example of linearity with equal variance
B. An example of linearity with unequal variance
C. An example of non-linearity with equal variance

•46
Figure 14.16 Residual plot demonstrating (A)
linearity with equal variance
Figure 14.16 Residual plot demonstrating
(B) linearity with unequal variance
Figure 14.16 Residual plot
demonstrating (C) nonlinearity.
Dependence and
Independence of Data

•50
Validity Issue

• Question: Is there a positive
association between X and Y?

•51
Your Answer Is …
• It is so clear that there is a
positive association between X
and Y.
• Wait… let's check whether
the data are independent or
dependent (i.e., are there repeated
measurements?)

•52
• Every two points connected by the
same red line are measurements
observed from the same person
(labeled A1/A2, B1/B2, and C1/C2
in the figure).

It is so clear that "Y" will decrease
as "X" increases for nearly all
study subjects.

•53
Failure to consider "dependence" of data
may result in biased study findings.

•54
Test for “Normality”
• Examining the residuals (or
standardized residuals) helps detect
violations of the required conditions.
• Nonnormality
– Use Excel to obtain the standardized
residual histogram (see the sketch below)
– Examine the histogram and look for a
bell-shaped diagram with a mean close
to zero
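A sketch of the same check outside Excel, using numpy/matplotlib; note the standardization here is the crude residual-over-SD form, not the leverage-adjusted standardized residuals SPSS reports:

```python
# Sketch: histogram of (crudely) standardized residuals to check Normality.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([220, 250, 310, 380, 455, 510, 680, 760, 1100, 1145, 1280], float)
y = np.array([ 60,  90, 115, 170, 165, 245, 170, 350,  455,  460,  200], float)

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)
std_resid = resid / resid.std(ddof=2)   # SD with n-2 df; no leverage adjustment

plt.hist(std_resid, bins=np.arange(-2.5, 3.0, 1.0))
plt.xlabel("Standardized residual")
plt.title("Look for a roughly bell-shaped histogram centered near 0")
plt.show()
```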
•55
Cont.
Standardized residuals

[Excel histogram of standardized residuals, binned from −2 to 2 and "More"]

It seems the residuals are Normally distributed
with mean zero.
•56
