Lab 4 DONE New

The document discusses correlation and regression analysis using a dataset on lung function in children. It defines correlation and the coefficient of determination, and examines the linear relationship between age and height, and between height and lung function. Scatter plots and regression lines are used to analyze the data by sex. The analysis finds a positive linear correlation between the variables, with height explaining around 70-80% of the variation in lung function and a correlation of about 0.79 between age and height.

Uploaded by

Luzzuvanna

LAB 04: CORRELATION AND REGRESSION

Introduction
Correlation
Correlation measures the linear association between two continuous variables. The explanatory variable
is x and the response variable is y. The correlation coefficient, r, always falls in the interval [-1, 1].
The closer |r| is to 1, the stronger the linear relationship. The sign (+ or -) indicates whether the
relationship is positive (as x increases, y also increases) or negative (as x increases, y decreases).

Correlation is very sensitive to outliers in a data set. Outliers that follow the trend of the data set (in
trend) tend to inflate the correlation (make it stronger). Outliers that are in the opposite trend from the
data set (out of trend) tend to deflate (weaken) the correlation.
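
The effect of outliers described above can be checked with a small sketch. The data below are hypothetical (not the lab data), and pearson_r is an illustrative helper written for this example rather than a Minitab feature.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a moderate positive trend (assumed for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r_base = pearson_r(x, y)
# In-trend outlier: far from the other points, but along the same upward trend.
r_in_trend = pearson_r(x + [20], y + [20])
# Out-of-trend outlier: a large x paired with a y that does not follow the trend.
r_out_of_trend = pearson_r(x + [20], y + [3])

print(f"no outlier:           r = {r_base:.3f}")
print(f"in-trend outlier:     r = {r_in_trend:.3f}")
print(f"out-of-trend outlier: r = {r_out_of_trend:.3f}")
```

With these made-up points, a single added observation is enough to move r noticeably in either direction, which is why it is worth looking at the scatter plot before trusting the correlation.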

Coefficient of Determination
The coefficient of determination is the r² value (the square of the correlation). It measures the
proportion of the variation in y that is explained by the linear relationship with x. Interpretation is
typically formatted as follows: "[percent]% of the variation in y is explained by x," where the
percentage, x, and y are replaced based on the problem’s context.

Regression
Linear regression is used for predicting y-values (response) for given x-values (explanatory). The
equation of the line is determined by fitting a line of best fit through the scatterplot of the data. The
prediction is only reliable if the data follow a linear trend (the strength of which is measured by the
correlation).
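
As a rough illustration of how a line of best fit and its r² can be computed, here is a minimal sketch using the standard least-squares formulas. The heights and FEV values are hypothetical, and fit_line and r_squared are helper names chosen for this example; Minitab performs the equivalent calculation internally.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical heights (inches) and FEV values (liters), not the lab data.
height = [50, 55, 60, 65, 70]
fev = [1.6, 2.1, 2.5, 3.3, 3.8]

a, b = fit_line(height, fev)
r2 = r_squared(height, fev, a, b)
predicted = a + b * 62  # prediction for a height inside the observed range

print(f"FEV = {a:.3f} + {b:.4f} * height")
print(f"r^2 = {100 * r2:.1f}%")
print(f"predicted FEV at 62 in = {predicted:.3f} L")
```

The prediction is made at a height inside the range of the sample; using the line far outside that range would rest on the unverified assumption that the trend stays linear.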

Correlation is Not Causation:


Correlation is a mathematical measure. Correlation does not prove causation. Causation (cause and
effect between two variables) is typically established through randomized controlled trials. Correlation
can be calculated for any two continuous variables. Just because a strong correlation exists does not
mean that one variable causes the change in the other variable. Even if the correlation were a perfect 1,
we would only know exactly how much y changes for a given change in x (the equation of the line would
tell us), not that x causes y.

Lab
Download the Excel file “Large.FEV” from Blackboard (in the Lab 4 folder). This lab uses the Large.FEV data set.

The forced expiratory volume (FEV, measured in liters) is a primary indicator of lung function and
corresponds to the volume of air that can be forcibly blown out in the first second after full inspiration.
The Large.FEV data file contains the FEV, age, and height of each child in a large sample, along with
some categorical descriptors. We will use the data to study growth patterns in children.

Data Variables Description:

• Age (years)
• Forced Expiratory Volume (liters)
• Height (inches)
• Sex (male = 0, female = 1)
• Smoking status (no = 0, yes = 1)

Correlation
Copy FEV data into the worksheet of Minitab.

o On the drop-down menu select:
▪ Stat > Basic Statistics > Correlation
o Variables: age height
o Click OK
Questions
1. What is the correlation between height and age? 0.792
2. What happens to the correlation if you change the x and y labels (swap height and age)?
The graph looks different; however, the correlation value is the same.
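
The answer to question 2 can be illustrated with a short sketch: swapping the two variables leaves r unchanged, while the slope of the fitted line does change (which is why the graph looks different). The ages and heights below are hypothetical, and the helper functions are written only for this example.

```python
import math

def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy): centered sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def pearson_r(xs, ys):
    """Pearson correlation; the formula treats xs and ys symmetrically."""
    sxx, syy, sxy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    sxx, _, sxy = sums_of_squares(xs, ys)
    return sxy / sxx

# Hypothetical ages (years) and heights (inches), not the lab data.
age = [5, 7, 9, 11, 13]
height = [42, 48, 52, 58, 60]

r_xy = pearson_r(age, height)
r_yx = pearson_r(height, age)
slope_height_on_age = slope(age, height)  # inches per year
slope_age_on_height = slope(height, age)  # years per inch

print(f"r(age, height) = {r_xy:.4f}, r(height, age) = {r_yx:.4f}")
print(f"slopes: {slope_height_on_age:.4f} vs {slope_age_on_height:.4f}")
```

In this sketch the product of the two slopes equals r², which ties the two differently shaped plots back to the same correlation.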

Correlation of Age and Height by Sex – Scatter Plot


Plot the relationship between age in years (explanatory, x-axis) and height in inches (response, y-axis).
Create Scatter plot:

Option 1: Desktop App


o On the drop-down menu select:
▪ Graph > Scatterplot > With Regression and Groups > OK
o Response(Y): height
o Predictor (X): age
o Click: OK

Option 2: Web App


o On the drop-down menu select:
▪ Graph > Scatterplot > Simple
o Y-Variable: height
o X-Variable: age
o Select Option > check Symbols and Fit regression line
o Click: OK > OK

3. Paste the scatter plot below

Plot the relationship between age in years (explanatory, x-axis) and height in inches (response, y-axis) in
boys and girls, using the category of sex (where 0 = boys, 1 = girls).

Option 1: Desktop App


o On the drop-down menu select:
▪ Graph > Scatterplot > With Regression and Groups > OK
o Response(Y): height
o Predictor (X): age
o Categorical Variables for Grouping: sex
o Click: OK

Option 2: Web App


o On the drop-down menu select:
▪ Graph > Scatterplot > Groups Overlaid
o Y-Variable: height
o X-Variable: age
o Group variables: sex
o Select Option > check Symbols and Fit regression line
o Click: OK > OK

4. Paste the scatter plot below:


5) Do the growth patterns look linear (based on the scatter plot)?

Yes. For the most part the growth patterns look linear for both sexes, although the line for sex 0
(boys) appears to have a steeper slope than the line for sex 1 (girls). Both groups show a positive
linear correlation.

6) Do the patterns look similar between boys and girls?

Yes. This is supported by the r values found later in the lab: sex did not seem to play much of a role
in the correlation between height and age.

Regression Line
Plot the relationship between height in inches (explanatory, x-axis) and FEV (response, y-axis; an
indicator of lung function that corresponds to the volume of air), using the category of sex (where
0 = boys, 1 = girls).

Option 1: Desktop App


o On the drop-down menu select:
▪ Graph > Scatterplot > With Regression and Groups > OK
o Response(Y): FEV
o Predictor (X): Height
o Categorical Variables for Grouping: sex
o Click: OK

Option 2: Web App


o On the drop-down menu select:
▪ Graph > Scatterplot > Groups Overlaid
o Y-Variable: FEV
o X-Variable: height
o Group variables: sex
o Select Option > check Symbols and Fit regression line
o Click: OK > OK

7) Paste the scatter plot below:

8) What percent of the variation in FEV is explained by the model for the boys (sex = 0)?
(Hint: Hover over the regression line to see it)
77.9%

9) What percent of the variation in FEV is explained by the model for the girls (sex = 1)?
(Hint: Hover over the regression line to see it)
69.6%
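
The per-group percentages above come from fitting a separate line for each sex. A minimal sketch of that idea follows, assuming a handful of hypothetical rows (not the Large.FEV data) and an illustrative r_squared helper; it will not reproduce the 77.9% and 69.6% reported by Minitab.

```python
from collections import defaultdict

def r_squared(xs, ys):
    """r^2 of the least-squares line of ys on xs (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical rows: (height in inches, FEV in liters, sex), 0 = boys, 1 = girls.
rows = [
    (50, 1.5, 0), (56, 2.2, 0), (62, 2.9, 0), (68, 4.1, 0), (72, 4.4, 0),
    (50, 1.6, 1), (55, 2.3, 1), (60, 2.4, 1), (64, 3.1, 1), (67, 3.0, 1),
]

# Split the rows by sex so each group gets its own regression.
groups = defaultdict(lambda: ([], []))
for height, fev, sex in rows:
    groups[sex][0].append(height)
    groups[sex][1].append(fev)

r2_by_sex = {sex: r_squared(hs, fs) for sex, (hs, fs) in groups.items()}
for sex, r2 in sorted(r2_by_sex.items()):
    label = "boys" if sex == 0 else "girls"
    print(f"{label}: {100 * r2:.1f}% of the variation in FEV is explained by height")
```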

Plot the relationship between height and FEV (an indicator of lung function that corresponds to the
volume of air), and obtain the least-squares regression equation.

o On the drop-down menu select:
▪ Stat > Regression > Fitted line Plot
o Response(Y): FEV
o Predictor (X): height
o Click: OK

10) Paste the regression plot below:


11) What is the least-squares regression equation for FEV and height?

FEV = -5.433 + 0.1320 (height), i.e., intercept A = -5.433 and slope B = 0.1320

12) What percent of the variation in FEV is explained by this model?

The percent of the variation explained = 75.4%

13) What is the correlation between FEV and height? 0.8681

14) Use the least-squares equation to predict the FEV of a child who is 60 inches tall.
FEV = -5.433 + 0.1320(60) = 2.487 liters
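
The arithmetic in the last answers can be double-checked with a few lines of Python. The intercept, slope, and correlation are the values reported in this lab; predict_fev is simply a name chosen for this sketch.

```python
# Values reported in this lab's Minitab output.
intercept = -5.433  # liters
slope = 0.1320      # liters per inch of height
r = 0.8681          # correlation between FEV and height

def predict_fev(height_in):
    """Predicted FEV (liters) from the fitted line for a height in inches."""
    return intercept + slope * height_in

fev_60 = predict_fev(60)
r_squared_pct = 100 * r ** 2  # squaring r gives the coefficient of determination

print(f"Predicted FEV at 60 inches: {fev_60:.3f} liters")  # 2.487
print(f"Variation explained: {r_squared_pct:.1f}%")        # 75.4%
```

Squaring the reported correlation gives back the 75.4% reported for the fitted line, so the two answers are consistent with each other.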
