
Correlation and Regression

Uploaded by

nitikesh31
Copyright
© All Rights Reserved

Correlation and Simple Linear Regression
Dr. Ramzan Tahir
Correlation

• Correlation is defined as the statistical association between two variables.
• A relationship has no correlation when the points on a scatterplot do not
show any pattern.
• A relationship is non-linear when the points on a scatterplot follow a pattern
but not a straight line.
• A relationship is linear when the points on a scatterplot follow a somewhat
straight line pattern. This is the relationship that we will examine.
Types of linear relationship

• Linear relationships can be either positive or negative.
• Positive relationships have points that incline upwards to the right.
As x values increase, y values increase.
• Negative relationships have points that decline downwards to the right.
As x values increase, y values decrease.
Types of Relationship
How to Measure Correlation
• Correlation coefficient (r), also referred to as Pearson’s correlation coefficient
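The formula itself is not reproduced in the text above. As a minimal sketch, assuming the standard textbook definition r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), the coefficient can be computed as follows. The function name `pearson_r` and the data values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for x and for y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)
print(round(r, 3))
```

For this made-up sample the points lie close to a rising straight line, so r comes out close to +1.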


The properties of “r”

• It is always between -1 and +1.
• It is a unitless measure, so “r” would be the same value whether you
measured the two variables in pounds and inches or in grams and
centimeters.
• Positive values of “r” are associated with positive relationships.
• Negative values of “r” are associated with negative relationships.
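These properties can be checked numerically. The sketch below assumes the standard Pearson formula and uses invented weight and height values; it converts pounds to grams and inches to centimeters and compares the resulting coefficients.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (standard definition)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements in pounds and inches
weight_lb = [120, 135, 150, 165, 180]
height_in = [62, 65, 66, 70, 72]

# The same measurements in grams and centimeters
weight_g = [w * 453.59237 for w in weight_lb]
height_cm = [h * 2.54 for h in height_in]

r_imperial = pearson_r(weight_lb, height_in)
r_metric = pearson_r(weight_g, height_cm)

# Reversing the direction of one variable flips the sign of r
r_flipped = pearson_r(weight_lb, [-h for h in height_in])

print(r_imperial, r_metric, r_flipped)
```

Up to floating-point rounding, the two unit systems give the same r, and the flipped data give the same magnitude with a negative sign.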
Examples of Positive Correlation
Examples of Negative Correlation
Importance of Scatter plots

• Both of these data sets have r = 0.01, but they are very different. Plot 1 shows little
or no linear relationship between the x and y variables. Plot 2 shows a
strong non-linear relationship.
• Pearson’s linear correlation coefficient only measures the strength and direction of a
linear relationship.
• Ignoring the scatterplot could result in a serious mistake when describing the
relationship between two variables.
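To see how r can miss a strong non-linear pattern, consider a small invented data set (not the one in the plots) where y is exactly x squared. The sketch assumes the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (standard definition)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the pattern is a parabola, not a line
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_parabola = pearson_r(x, y)
print(r_parabola)
```

Because the x values are symmetric about zero, the positive and negative cross-products cancel and r is zero, even though y depends perfectly on x. Only a scatterplot would reveal the curve.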
Correlation to Regression

• When you investigate the relationship between two variables, always begin
with a scatterplot. This graph allows you to look for patterns (both linear and
non-linear).
• The next step is to quantitatively describe the strength and direction of the
linear relationship using “r”.
• Once you have established that a linear relationship exists, you can take
the next step in model building.
Simple Linear Regression

• Once we have identified two variables that are correlated, we would like
to model this relationship. We want to use one variable as
a predictor or explanatory variable to explain the other variable,
the response or dependent variable. In order to do this, we need a good
relationship between our two variables. The model can then be used to
predict changes in our response variable. A strong relationship between
the predictor variable and the response variable leads to a good model.
• A simple linear regression model is a mathematical equation that allows us
to predict a response for a given predictor value.
Linear Regression Equation

ŷ = b0 + b1x
where b0 is the y-intercept, b1 is the slope, x is the predictor variable, and ŷ is an estimate of
the mean value of the response variable for any value of the predictor variable.
The y-intercept is the predicted value for the response (y) when x = 0. The slope describes
the change in y for each one unit change in x. Let’s look at this example to clarify the
interpretation of the slope and intercept.
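As a hedged illustration with invented data (not the example from the original slides), the sketch below estimates b0 and b1 using the usual least-squares formulas, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, and then predicts a response. The names `fit_line` and `predict` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # y-intercept
    return b0, b1

def predict(b0, b1, x):
    """Predicted mean response for a given predictor value."""
    return b0 + b1 * x

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

b0, b1 = fit_line(hours, score)
print("intercept:", b0, "slope:", b1)
print("predicted score at 6 hours:", predict(b0, b1, 6))
```

For these numbers the slope is about 4.5 and the intercept about 46.9. The slope says that each additional hour is associated with a predicted increase of about 4.5 points, and the intercept is the predicted score at zero hours.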
Regression Equation

There is a relationship between the correlation coefficient and the regression coefficient (the slope b1).
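The formula is not reproduced in the text. Assuming the standard result for simple linear regression, b1 = r · (sy / sx), where sx and sy are the sample standard deviations of x and y, the sketch below checks the identity on invented data.

```python
import math
import statistics

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 70]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
slope = sxy / sxx                  # least-squares regression coefficient b1

s_x = statistics.stdev(x)          # sample standard deviation of x
s_y = statistics.stdev(y)          # sample standard deviation of y
slope_from_r = r * s_y / s_x

print(slope, slope_from_r)
```

The two values agree up to rounding. Since both standard deviations are positive, the slope always has the same sign as r.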


Criteria for best fit line

• The criterion to determine the line that best describes the relation between
two variables is based on the residuals.
Residual = Observed – Predicted
• The best-fit line is the one that gives the minimum sum of squared residuals
(the least-squares criterion).
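A minimal sketch of the criterion, assuming it refers to ordinary least squares (minimizing the sum of squared residuals): the code computes the residuals for the fitted line on invented data and compares its sum of squared residuals with that of a few slightly shifted lines. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def sum_sq_residuals(xs, ys, b0, b1):
    """Sum of squared (observed - predicted) values for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 70]

b0, b1 = fit_line(x, y)

# Residual = Observed - Predicted, one per data point
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

best = sum_sq_residuals(x, y, b0, b1)

# Any other line, here small shifts of the intercept and slope, fits worse
nudged = [
    sum_sq_residuals(x, y, b0 + d0, b1 + d1)
    for d0 in (-1.0, 0.0, 1.0)
    for d1 in (-0.5, 0.0, 0.5)
    if (d0, d1) != (0.0, 0.0)
]

print("residuals:", residuals)
print("least-squares SSE:", best, "smallest nudged SSE:", min(nudged))
```

The raw residuals of the least-squares line sum to zero (up to rounding), which is why the criterion uses squared residuals rather than their plain sum.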
Residual: Linear regression
