Correlation Regression Tutorial
Regression
BIO323
SBASSE, LUMS
Correlation Coefficient, R
• “R” is a measure of the strength of the linear
association between two variables, x and y.
BMI (kg/m²) Birth-weight (kg)
20 2.7
30 2.9
50 3.4
45 3.0
10 2.2
30 3.1
40 3.3
25 2.3
50 3.5
20 2.5
10 1.5
55 3.8
60 3.7
50 3.1
35 2.8
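As a rough illustration of how the correlation coefficient can be computed for the BMI and birth-weight pairs above, here is a minimal Python sketch. The function name `pearson_r` and the variable names are illustrative assumptions, not part of the original slides; the formula is the usual sample correlation, the sum of cross-products divided by the square root of the product of the sums of squares.

```python
import math

# BMI (kg/m^2) and birth weight (kg) pairs from the table above
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(bmi, bw)
print(f"R = {r:.3f}")
```

A value close to +1 or -1 suggests a strong linear association; a value near 0 suggests little or no linear association.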
Scatter Diagram
• A scatter diagram is a graphical method to display the
relationship between two variables
[Figure: scatter plot of birth weight (BW, kg) against BMI (kg/m²)]
Is there a linear
relationship between BMI
and BW?
• Scatter diagrams are important for initial
exploration of the relationship between two
quantitative variables
Least-squares or
regression line
• The vertical distances between the observed y values and
their corresponding estimated values on the line are
called residuals
• The line that fits best, i.e., the one with the smallest sum of
squared residuals, is called the regression line or,
sometimes, the least-squares line
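The idea of residuals and of "best fit" can be sketched in a few lines of Python. The three points and the candidate lines below are illustrative assumptions chosen for this sketch, and the helper names `residuals` and `sum_squared_residuals` are hypothetical. The code computes the sum of squared residuals for each candidate line so they can be compared.

```python
# Three illustrative (x, y) points
points = [(1, 2), (2, 1), (4, 3)]


def residuals(intercept, slope, data):
    """Vertical distances: observed y minus the value on the line."""
    return [y - (intercept + slope * x) for x, y in data]


def sum_squared_residuals(intercept, slope, data):
    """Quantity that the least-squares line makes as small as possible."""
    return sum(e ** 2 for e in residuals(intercept, slope, data))


# Candidate lines given as (intercept, slope)
candidates = {
    "flat line y = 2": (2.0, 0.0),
    "line y = x": (0.0, 1.0),
    "least-squares line y = 1 + (3/7)x": (1.0, 3 / 7),
}

for name, (a, b) in candidates.items():
    print(f"{name}: {sum_squared_residuals(a, b, points):.4f}")
```

Among the candidates, the least-squares line gives the smallest sum of squared residuals, which is what "fits the best" means here.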
Assumption # 2 — Linear and Additive
relationship
Relationship between the independent and dependent variables
must be linear.
Inefficient
models !!!!!
Linearity - The relationship between height and weight must be linear.
Assumption # 3 — Independence of errors
There should not be a relationship between
the residuals and X
Assumption # 5 — Equal Variances
The variance of the residuals is the same for all values of x
No pattern in the residual plot: the variances are equal.
A funnel shape in the residual plot indicates unequal variances.
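A residual plot is the usual way to check this assumption. As a crude numeric stand-in, the sketch below fits a line to the 15 tabulated BMI and birth-weight pairs and compares the spread of the residuals for low and high BMI values. The split at the median and the helper name `fit_line` are assumptions made for illustration only; this is not a formal test of equal variances.

```python
import statistics

bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]


def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(bmi, bw)
resid = [y - (intercept + slope * x) for x, y in zip(bmi, bw)]

# Split the residuals at the median BMI (an arbitrary, illustrative cut)
cut = statistics.median(bmi)
low = [e for x, e in zip(bmi, resid) if x <= cut]
high = [e for x, e in zip(bmi, resid) if x > cut]

print(f"residual SD, BMI <= {cut}: {statistics.stdev(low):.3f}")
print(f"residual SD, BMI >  {cut}: {statistics.stdev(high):.3f}")
```

Roughly similar spreads in the two groups are consistent with equal variances; a much larger spread at one end would correspond to the funnel shape.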
The Least-Squares Line
BMI (kg/m²) Birth-weight (kg)
20 2.7
30 2.9
50 3.4
45 3.0
10 2.2
30 3.1
40 3.3
25 2.3
50 3.5
20 2.5
10 1.5
55 3.8
60 3.7
50 3.1
35 2.8
Estimated Regression Line
for BW Data
$\hat{y} = \hat{\alpha} + \hat{\beta} x = 1.775351 + 0.0330187\,x$
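To show how the estimated line is used, the following sketch plugs a few BMI values into the reported equation to obtain predicted birth weights. The coefficients are copied from the line above; the function name `predict_birth_weight` and the chosen BMI values are illustrative assumptions.

```python
# Coefficients as reported for the estimated regression line
INTERCEPT = 1.775351  # estimated birth weight (kg) at BMI = 0
SLOPE = 0.0330187     # estimated change in birth weight (kg) per unit of BMI


def predict_birth_weight(bmi):
    """Predicted birth weight (kg) for a given BMI (kg/m^2)."""
    return INTERCEPT + SLOPE * bmi


for bmi in (20, 30, 40):
    print(f"BMI {bmi}: predicted BW = {predict_birth_weight(bmi):.3f} kg")
```

Predictions are only meaningful within the range of BMI values observed in the data.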
coefficient of determination
r² is a measure of the closeness of fit of the sample regression line to the sample observations.
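One common way to compute r² is as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for a line fitted to the 15 tabulated BMI and birth-weight pairs only; the function name `r_squared` is a hypothetical helper introduced for illustration.

```python
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]


def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared residuals about the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations of y about its mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


r2 = r_squared(bmi, bw)
print(f"r^2 = {r2:.3f}")
```

In simple linear regression this value equals the square of the correlation coefficient, and it can be read as the proportion of the variation in y explained by the line.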
We wish to know if we can conclude that the slope of the population regression line
describing the relationship between X and Y is zero.
Assumptions: We presume that the simple linear regression model and its underlying
assumptions are applicable.
Hypotheses: H0: β = 0 versus HA: β ≠ 0, where β is the slope of the population regression line.
When the assumptions are met and H0 is true, the test statistic is distributed as
Student’s t with n − 2 degrees of freedom.
Decision rule:
Reject H0 if the computed value of t is either greater than or equal to 1.9826 or less
than or equal to −1.9826.
Calculation of test statistic:
Conclusion: We conclude that the slope of the true regression line is not zero.
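The same test can be sketched in Python for the 15 tabulated BMI and birth-weight pairs. The test statistic is the estimated slope divided by its standard error. Because this sample has n = 15, the critical value used below is the standard table value for 13 degrees of freedom (2.160 at a two-sided 5% level), not the 1.9826 quoted above, which belongs to an example with a different sample size. The function name `slope_t_statistic` is a hypothetical helper.

```python
import math

bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
bw = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5, 1.5, 3.8, 3.7, 3.1, 2.8]


def slope_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for testing H0: slope = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)                # residual variance estimate
    se_slope = math.sqrt(mse / sxx)    # standard error of the slope
    return slope / se_slope, n - 2


t_stat, df = slope_t_statistic(bmi, bw)

# Assumption: critical value taken from a t table (two-sided, alpha = 0.05, 13 df)
T_CRITICAL = 2.160
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For a different sample size the critical value must be looked up again for n − 2 degrees of freedom.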
Multiple regression model:
Logistic regression
Types of Logistic regression
The End!
Example of Computing
Regression Line
• Data: (1,2), (2,1), (4,3)
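For these three points the computation can be sketched as follows, using the standard least-squares formulas: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept is the mean of y minus the slope times the mean of x. The variable names are illustrative assumptions.

```python
# Data from the example: (x, y) pairs
data = [(1, 2), (2, 1), (4, 3)]

n = len(data)
mean_x = sum(x for x, _ in data) / n   # 7/3
mean_y = sum(y for _, y in data) / n   # 2

# Sums of cross-products and squares about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)  # 2
sxx = sum((x - mean_x) ** 2 for x, _ in data)            # 14/3

slope = sxy / sxx                      # 3/7
intercept = mean_y - slope * mean_x    # 1

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
# prints: slope = 0.4286, intercept = 1.0000
```

The fitted line is therefore y = 1 + (3/7)x, which passes through the point of means (7/3, 2).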