Correlation and Regression
Linear Regression
Dr. Ramzan Tahir
Correlation
[Figure: two scatterplots, Plot 1 and Plot 2]
• Both of these data sets have r = 0.01, but they are very different. Plot 1 shows little
or no linear relationship between the x and y variables. Plot 2 shows a
strong non-linear relationship.
• Pearson’s linear correlation coefficient only measures the strength and direction of a
linear relationship.
• Ignoring the scatterplot could result in a serious mistake when describing the
relationship between two variables.
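The point above can be checked numerically. The following is a minimal sketch using made-up data (not the data behind Plot 1 and Plot 2): `pearson_r` is a hypothetical helper that computes r from its definition, applied to a perfect quadratic relationship and to a perfect linear one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: x values symmetric about 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]    # y depends perfectly on x, but not linearly
ys_line = [2 * x + 1 for x in xs]  # y depends perfectly and linearly on x

r_curve = pearson_r(xs, ys_curve)  # essentially 0
r_line = pearson_r(xs, ys_line)    # essentially 1
print(r_curve, r_line)
```

Here r is essentially 0 for the curved data even though y is completely determined by x, which is why the scatterplot has to be inspected alongside r.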
Correlation to Regression
When you investigate the relationship between two variables, always begin
with a scatterplot. This graph allows you to look for patterns (both linear and
non-linear).
The next step is to quantitatively describe the strength and direction of the
linear relationship using “r”.
Once you have established that a linear relationship exists, you can take
the next step in model building.
Simple Linear Regression
Once we have identified two variables that are correlated, we would like
to model this relationship. We want to use one variable as
a predictor or explanatory variable to explain the other variable,
the response or dependent variable. To do this, we need a reasonably strong
linear relationship between our two variables. The model can then be used to
predict changes in our response variable. The stronger the relationship between
the predictor variable and the response variable, the better the model.
The simple linear regression equation is
ŷ = b0 + b1x
where b0 is the y-intercept, b1 is the slope, x is the predictor variable, and ŷ is an estimate of
the mean value of the response variable for any value of the predictor variable.
The y-intercept is the predicted value for the response (y) when x = 0. The slope describes
the predicted change in y for each one-unit increase in x. Let’s look at this example to clarify the
interpretation of the slope and intercept.
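As a small numerical illustration of these two interpretations, here is a sketch with made-up coefficients. The values of `b0` and `b1` and the helper `predict` are assumptions chosen for illustration, not the example from the lecture.

```python
# Hypothetical coefficients, chosen only for illustration.
b0 = 2.0  # y-intercept
b1 = 0.5  # slope

def predict(x):
    """Estimated mean response (y-hat) for a given predictor value x."""
    return b0 + b1 * x

# The intercept is the prediction at x = 0.
at_zero = predict(0)

# The slope is the change in the prediction when x increases by one unit,
# here from x = 10 to x = 11 (any starting x gives the same difference).
one_unit_change = predict(11) - predict(10)

print(at_zero, one_unit_change)
```

The prediction at x = 0 reproduces b0, and moving x up by one unit changes the prediction by exactly b1, whatever the starting value of x.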
Regression Equation
The criterion to determine the line that best describes the relation between
two variables is based on the residuals.
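Residual = Observed – Predicted = y – ŷ
The best-fitting line is the one which gives the minimum residuals overall: it minimizes the
sum of the squared residuals (the least-squares criterion).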
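The residual criterion can be sketched in a few lines. This example assumes the usual least-squares criterion (minimizing the sum of squared residuals) and uses a small made-up data set. `fit_line` and `sum_squared_residuals` are hypothetical helper names.

```python
# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of (observed - predicted)^2 for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = fit_line(xs, ys)

# Residual = observed - predicted, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

sse_fit = sum_squared_residuals(b0, b1, xs, ys)
# Any other line, such as this arbitrarily shifted one, has a larger sum.
sse_other = sum_squared_residuals(b0 + 0.5, b1 - 0.1, xs, ys)

print(b0, b1, sse_fit, sse_other)
```

For the fitted line the residuals sum to zero (up to rounding), and its sum of squared residuals is smaller than that of the shifted comparison line.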
Residual: Linear regression