S1.4 Correlation and regression (3)

The document covers correlation and regression in statistics, focusing on scatter graphs, types of correlation, and the product-moment correlation coefficient. It illustrates how to interpret relationships between variables, the importance of distinguishing correlation from causation, and how to calculate correlation coefficients. Additionally, it discusses the effects of coding on correlation values and provides examples for practical understanding.


AS-Level Maths: Statistics 1 for Edexcel

S1.4 Correlation and regression

1 of 58 © Boardworks Ltd 2005
Contents

Scatter graphs, types of correlation and lines of best fit
Product–moment correlation coefficient
The effects of coding on correlation
Regression
Correlation

There are many situations where people wish to find out whether two (or more) variables are related to each other. Here are some examples:

Is systolic blood pressure related to age?
Is the life expectancy of people in a country related to how wealthy the country is?
Are A-level results related to the number of hours students spend undertaking part-time work?
Is an athlete’s leg length related to the time in which they can run 100 m?

Correlation is a measure of relationship – the stronger the correlation, the more closely related the variables are likely to be.
Scatter graphs

Scatter graphs are a useful visual way of judging whether a relationship appears to exist between two variables.

Example: The table shows the latitude and mean January temperature (°C) for a sample of 10 cities in the northern hemisphere.

City          Latitude   Mean Jan. temp. (°C)
Belgrade      45         1
Bangkok       14         32
Cairo         30         14
Dublin        50         3
Havana        23         22
Kuala Lumpur  3          27
Madrid        40         5
New York      41         0
Reykjavik     30         –1
Tokyo         36         5


Scatter graphs

The data in the table can be presented in a scatter graph:

[Scatter graph showing how January temperatures change with latitude: temperature (°C) plotted against latitude]

This shows that mean January temperature tends to decrease as the latitude of the city increases. We say that the variables are negatively correlated.


Scatter graphs

In this example, a city’s temperature is likely to be dependent upon its latitude – not the other way around. Temperature cannot affect a city’s latitude.

The latitude is called the independent (or explanatory) variable. The temperature is called the dependent (or response) variable.

When plotting scatter graphs, the convention is to always plot the independent variable on the horizontal axis and the dependent variable on the vertical axis.




Correlation

The type of correlation existing between two variables can be described in terms of the gradient of the slope formed by the points, and how close the points lie to a straight line.

Strong positive correlation – the points lie close to a straight line with positive gradient.

Weak positive correlation – the points are more scattered but follow a general upward trend.


Correlation

Strong negative correlation – the points lie close to a straight line with negative gradient.

Weak negative correlation – the points are more scattered but follow a general downward trend.


Correlation

No correlation – the points are scattered across the graph area, indicating no relationship between the variables.


Correlation vs. causation

The following diagram illustrates why it is important to interpret scatter diagrams with caution. The diagram shows life expectancy at birth plotted against annual cigarette consumption for a sample of 9 countries.


Correlation vs. causation

The diagram shows a positive correlation between cigarette consumption and life expectancy. However, it would be wrong to conclude that consuming more cigarettes causes people to live longer. This type of correlation is sometimes referred to as nonsense correlation.

The relationship can be explained because both life expectancy and cigarette consumption for a country are correlated with a third variable – the wealth of the country.


Lines of best fit

When a linear relationship exists between two variables, a line of best fit can be drawn on the scatter graph.


Lines of best fit

It can be shown that a line of best fit always passes through the mean point, (x̄, ȳ).

Example: A line of best fit, passing through the mean point, can be added to the scatter graph showing mean January temperatures and latitude.


Lines of best fit

The line of best fit can be used to make predictions. For example, Los Angeles has a latitude of 34°N. The line of best fit suggests that Los Angeles should have a mean January temperature of about 9°C. The actual mean temperature for Los Angeles is 13°C.


Product–moment correlation coefficient

The product–moment correlation coefficient (r) gives a numerical measure of the strength of the linear association between two variables. This means that it measures how close the points on a scatter graph lie to a straight line.


Product–moment correlation coefficient

The product–moment correlation coefficient works so that:

    –1 ≤ r ≤ 1

• r = 1 indicates perfect linear positive correlation;
• r = –1 indicates perfect linear negative correlation;
• r = 0 indicates that there is absolutely no linear correlation between the variables.


Product–moment correlation coefficient

The product–moment correlation coefficient for n pairs of observations (x₁, y₁), ..., (xₙ, yₙ) is obtained using the formula:

    r = Sxy / √(Sxx Syy)

where:

    Sxy = Σ(xᵢ – x̄)(yᵢ – ȳ) = Σxᵢyᵢ – (Σxᵢ)(Σyᵢ)/n
    Sxx = Σ(xᵢ – x̄)² = Σxᵢ² – (Σxᵢ)²/n
    Syy = Σ(yᵢ – ȳ)² = Σyᵢ² – (Σyᵢ)²/n

Usually, the second version of each formula is used.
Product–moment correlation coefficient

Example: The table shows the average body mass and brain mass of 6 species of animal.

Species   Body mass (kg)   Brain mass (g)
Baboon    11               180
Cat       3.3              30
Fox       4.2              50
Mouse     0.02             0.4
Monkey    10               120
Rabbit    2.5              12

a) Draw a scatter graph showing brain mass plotted against body mass.
b) Describe the relationship that exists between the two variables.
c) Calculate the product–moment correlation coefficient.


Product–moment correlation coefficient

a) [Scatter graph comparing body mass (kg) and brain mass (g)]

b) The scatter graph shows strong positive correlation, meaning that the size of an animal’s brain tends to increase as its body mass increases.


Product–moment correlation coefficient

c) Using the data in the table:

    Σx = 11 + 3.3 + ... + 2.5 = 31.02
    Σy = 180 + 30 + ... + 12 = 392.4
    Σx² = 11² + 3.3² + ... + 2.5² = 255.78
    Σy² = 180² + 30² + ... + 12² = 50 344
    Σxy = (11 × 180) + ... + (2.5 × 12) = 3519
Product–moment correlation coefficient

So: Sxy = Σxᵢyᵢ – (Σxᵢ)(Σyᵢ)/n = 3519 – (31.02 × 392.4)/6 = 1490
    Sxx = Σxᵢ² – (Σxᵢ)²/n = 255.78 – 31.02²/6 = 95.41
    Syy = Σyᵢ² – (Σyᵢ)²/n = 50 344 – 392.4²/6 = 24 681

Therefore: r = Sxy / √(Sxx Syy) = 1490 / √(95.41 × 24 681) = 0.971

Note: The product–moment correlation coefficient can also be found using built-in calculator functions.
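The worked example above can be checked in a few lines of code. This is a minimal sketch (the function and variable names are ours, not from the slides) that computes r from the body mass and brain mass data using the Sxy / √(Sxx Syy) formula:

```python
from math import sqrt

def pmcc(xs, ys):
    """Product-moment correlation coefficient via Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n   # Sxy
    sxx = sum(x * x for x in xs) - sx * sx / n               # Sxx
    syy = sum(y * y for y in ys) - sy * sy / n               # Syy
    return sxy / sqrt(sxx * syy)

body = [11, 3.3, 4.2, 0.02, 10, 2.5]   # body mass (kg)
brain = [180, 30, 50, 0.4, 120, 12]    # brain mass (g)

r = pmcc(body, brain)   # approximately 0.971, agreeing with the slide
```

Running this reproduces r ≈ 0.971, matching the value obtained by hand.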


Product–moment correlation coefficient

Examination style question: A researcher believes there is a relationship between a country’s annual income per head (x, in $1000) and its per capita carbon dioxide emissions (c, in tonnes). He collects data from a random sample of 10 countries and records the following results:

    Σx = 96.5      Σc = 54
    Σx² = 2156.9   Σc² = 383.54
    Σxc = 619.6

Calculate the value of the product–moment correlation coefficient and comment on the implications of your answer.


Product–moment correlation coefficient

So: Sxc = Σxᵢcᵢ – (Σxᵢ)(Σcᵢ)/n = 619.6 – (96.5 × 54)/10 = 98.5
    Sxx = Σxᵢ² – (Σxᵢ)²/n = 2156.9 – 96.5²/10 = 1225.675
    Scc = Σcᵢ² – (Σcᵢ)²/n = 383.54 – 54²/10 = 91.94

Therefore, the product–moment correlation coefficient is:

    r = Sxc / √(Sxx Scc) = 98.5 / √(1225.675 × 91.94) = 0.293

Income shows weak positive correlation with CO2 emissions – emissions are generally higher in wealthier countries. However, as the correlation is low, the result is somewhat inconclusive.


Effect of coding on the correlation

The value of the product–moment correlation coefficient is unaffected by linear transformations of the variables. More specifically, if the variables u and v are related to the variables x and y through the transformations

    u = ax + b
    v = cy + d

then the correlation coefficient between u and v is identical to the correlation coefficient between x and y.

Note: this is only true if a and c are greater than 0.




Effect of coding on the correlation

Example: The heights (in cm) of a sample of 11 men and their adult sons can be summarized as follows:

    Σ(x – 160) = 105      Σ(y – 160) = 142      Σ(x – 160)(y – 160) = 2483
    Σ(x – 160)² = 1959    Σ(y – 160)² = 3326

where x = height of father (in cm) and y = height of son (in cm). Calculate the value of the product–moment correlation coefficient between the fathers’ and sons’ heights.

Solution: Let u = x – 160 and v = y – 160. Then:

    Σu = 105      Σv = 142      Σuv = 2483
    Σu² = 1959    Σv² = 3326


Effect of coding on the correlation

We can find the values of Suv, Suu and Svv:

    Suv = 2483 – (105 × 142)/11 = 1127.55
    Suu = 1959 – 105²/11 = 956.727
    Svv = 3326 – 142²/11 = 1492.909

So: r = 1127.55 / √(956.727 × 1492.909) = 0.943 (to 3 sig. figs.)

As the transformations between (x, y) and (u, v) are linear, the correlation coefficient between the fathers’ and sons’ heights must also be 0.943.
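The coding result can also be checked numerically. This sketch uses made-up paired data (the values and names are ours, purely illustrative) and confirms that the PMCC is unchanged when coded variables u = 2x + 5 and v = 0.5y – 160 are used in place of x and y:

```python
from math import sqrt

def pmcc(xs, ys):
    """Product-moment correlation coefficient via Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    sxx = sum(x * x for x in xs) - sx * sx / n
    syy = sum(y * y for y in ys) - sy * sy / n
    return sxy / sqrt(sxx * syy)

# Hypothetical father/son heights, chosen only for the demonstration.
x = [165, 170, 172, 178, 181, 185]
y = [168, 169, 175, 176, 183, 188]

u = [2 * xi + 5 for xi in x]       # u = 2x + 5   (a = 2 > 0)
v = [0.5 * yi - 160 for yi in y]   # v = 0.5y - 160  (c = 0.5 > 0)

# The correlation coefficient is invariant under these linear codings.
assert abs(pmcc(u, v) - pmcc(x, y)) < 1e-9
```

Any positive multipliers and any shifts give the same result; a negative multiplier would flip the sign of r.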


Limitations of the PMCC

The product–moment correlation coefficient (PMCC) measures the strength of a linear relationship. However:

Outliers can greatly distort the PMCC.
The PMCC is not a suitable measure of correlation if the relationship is non-linear.


Types of variables

Variables can be described as being either random or non-random.

Random variables take values that cannot be predicted with certainty before collecting the data.

Sometimes a variable is controlled by the experimenter – they decide in advance what values that variable should take. If a variable is controlled, then it is non-random.


Types of variables

Example: An experiment is carried out into how fast a mug of coffee cools. The temperature of the coffee is measured every 2 minutes until 10 minutes have passed.

Time (minutes)      0    2    4    6    8    10
Temperature (°C)    95   83   73   64   55   48

The values for the time were chosen by the experimenter. If the experiment is repeated, the values for the time will be the same. Therefore, time is a non-random variable.

Temperature is a random variable. The values for this variable may be different if the experiment is repeated.


Regression – random on random

Linear regression involves finding the equation of the line of best fit on a scatter graph. The equation obtained can then be used to make an estimate of one variable given the value of the other variable.

There are two cases to consider, depending upon whether:
1. we wish to find a value of y given a value for x, or
2. we want to estimate x given y.

We deal first with the situation where both variables (x and y) are random, and where we wish to predict a value for y given a value for x.


Regression – random on random

The best fitting line is the one that minimizes the sum of the squared deviations, Σdᵢ², where dᵢ is the vertical distance between the ith point and the line. The distances dᵢ are sometimes referred to as residuals.
Regression – random on random

As stated previously, the best fitting line should pass through the mean point, (x̄, ȳ).


Regression – random on random

The line that minimizes the sum of squared deviations is formally known as the least squares regression line of y on x. The equation of the least squares regression line of y on x is:

    y = a + bx

where:

    b = Sxy / Sxx
    a = ȳ – b x̄

b is sometimes referred to as the regression coefficient.

Recall: Sxy = Σxy – (Σx)(Σy)/n and Sxx = Σx² – (Σx)²/n


Regression – random on random

Consider again the temperature data presented earlier.

Example: The table shows the latitude, x, and mean January temperature (°C), y, for a sample of 10 cities in the northern hemisphere. Calculate the equation of the regression line of y on x and use it to predict the mean January temperature for the city of Los Angeles, which has a latitude of 34°N.

City          Latitude (x)   Mean Jan. temp. (°C) (y)
Belgrade      45             1
Bangkok       14             32
Cairo         30             14
Dublin        50             3
Havana        23             22
Kuala Lumpur  3              27
Madrid        40             5
New York      41             0
Reykjavik     30             –1
Tokyo         36             5


Regression – random on random

We begin by finding summary statistics for the table:

    Σx = 312       Σy = 108
    Σx² = 11 636   Σy² = 2494
    Σxy = 2000

We then use these to calculate the gradient (b) and y-intercept (a) for the regression line.


Regression – random on random

To find the gradient, we need Sxy and Sxx:

    Sxy = Σxy – (Σx)(Σy)/n = 2000 – (312 × 108)/10 = –1369.6
    Sxx = Σx² – (Σx)²/n = 11 636 – 312²/10 = 1901.6

Therefore:

    b = Sxy / Sxx = –1369.6 / 1901.6 = –0.720 (to 3 sig. figs.)


Regression – random on random

To find the y-intercept we also need x̄ and ȳ:

    x̄ = 312/10 = 31.2
    ȳ = 108/10 = 10.8

So: a = ȳ – b x̄ = 10.8 – (–0.720 × 31.2) = 33.3 (to 3 sig. figs.)

Therefore, the equation of the regression line is:

    y = 33.3 – 0.720x

So, when x = 34, y = 33.3 – 0.720 × 34 = 8.82°C. This is our estimate of the mean January temperature in Los Angeles.
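As a check on the arithmetic, the regression coefficients for the latitude data can be recomputed from the raw values. This is a sketch with helper names of our own choosing; the unrounded prediction comes out near 8.8°C (the slide's 8.82°C results from using the rounded coefficients):

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n   # Sxy
    sxx = sum(x * x for x in xs) - sx * sx / n               # Sxx
    b = sxy / sxx          # gradient (regression coefficient)
    a = sy / n - b * sx / n  # intercept, from a = y_bar - b * x_bar
    return a, b

latitude = [45, 14, 30, 50, 23, 3, 40, 41, 30, 36]
temp = [1, 32, 14, 3, 22, 27, 5, 0, -1, 5]

a, b = regression_y_on_x(latitude, temp)
# b is about -0.720 and a about 33.3, matching the slide.
prediction = a + b * 34   # estimated mean January temperature for Los Angeles
```

With the unrounded a and b, the prediction at latitude 34 is about 8.8°C.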


Regression – random on random

This prediction for the mean January temperature in Los Angeles is based purely on the city’s latitude. There are likely to be additional factors that can affect the climate of a city, for example:

altitude;
proximity to the coast;
ocean currents;
prevailing winds.

The concept of regression we have considered here can be extended to incorporate other relevant factors, producing a new formula. This allows for more accurate prediction.


The dangers of extrapolation

A regression equation can only confidently be used to predict values of y that correspond to x values that lie within the range of the data values available.

It can be dangerous to extrapolate (i.e. to predict), from the graph, a value for y that corresponds to a value of x that lies beyond the range of the values in the data set. This is because we cannot be sure that the relationship between the two variables will continue to be true.

It is reasonably safe to make predictions within the range of the data, but unwise to extrapolate beyond the given data.
Examination style question: regression

Examination style question: The average weight and wingspan of 9 species of British birds are given in the table.

Bird         Weight (g)   Wingspan (cm)
Wren         10           15
Robin        18           21
Great tit    18           24
Cuckoo       57           33
Blackbird    100          37
Pigeon       300          67
Lapwing      220          70
Crow         500          99
Common gull  400          100

a) Plot the data on a scatter graph. Comment on the relationship between the variables.
b) Calculate the regression line of wingspan on weight.
c) Use your regression line to estimate the wingspan of a jay, if its average weight is 160 g.
d) Explain why it would be inappropriate to use your line to estimate the wingspan of a duck, if the average weight of a duck is 1 kg.
Examination style question: regression

a) [Scatter graph of wingspan (cm) against weight (g)]

The graph indicates that there is fairly strong positive correlation between weight and wingspan – this means that wingspan tends to be longer in heavier birds.


Examination style question: regression

b) Summary values for the paired data are (x = weight, y = wingspan):

    Σx = 1623       Σy = 466
    Σx² = 562 397   Σy² = 32 890
    Σxy = 131 541

These can be used to find the gradient of the regression line:

    Sxy = Σxy – (Σx)(Σy)/n = 131 541 – (1623 × 466)/9 = 47 505.67
    Sxx = Σx² – (Σx)²/n = 562 397 – 1623²/9 = 269 716

Therefore: b = Sxy / Sxx = 47 505.67 / 269 716 = 0.176 (to 3 sig. figs.)
Examination style question: regression

To find the y-intercept we also need x̄ and ȳ:

    x̄ = 1623/9 = 180.33
    ȳ = 466/9 = 51.78

So: a = ȳ – b x̄ = 51.78 – (0.176 × 180.33) = 20.04

Therefore, the equation of the regression line is:

    y = 20.0 + 0.176x

where y = wingspan and x = weight.


Examination style question: regression

c) When the weight is 160 g, we can predict the wingspan to be:

    y = 20.0 + 0.176x = 20.0 + (0.176 × 160) = 48.2 cm (to 3 sig. figs.)

d) The average weight of a duck is outside the range of weights provided in the data. It would therefore be inappropriate to use the regression line to predict the wingspan of a duck, as we cannot be certain that the same relationship will continue to be true at higher weights.

Note: The regression coefficient (0.176) can be interpreted here as follows: as the weight increases by 1 g, the wingspan increases by 0.176 cm, on average.


Predicting x from y – random on random

We now turn our attention to the situation where we wish to estimate a value of x when we are given a value of y. We will continue to assume that both variables are random.

To predict x given y (when both variables are random), we use the regression line of x on y. This line has the equation:

    x = a′ + b′y

where:

    b′ = Sxy / Syy
    a′ = x̄ – b′ȳ

This regression line is designed to minimize the sum of the squares of the deviations in the x direction.

Note that both the regression line of x on y and the regression line of y on x pass through the mean point. The two lines won’t in general be equal, unless the points lie on a perfect straight line.
Predicting x from y – random on random

Examination style question: 15 AS-level mathematics students sit papers in C1 and S1. Their results are summarized below, with c representing the percentage mark in C1, and s the percentage mark in S1.

    Σc = 888    Σc² = 58 362    Σs = 943    Σs² = 66 445    Σcs = 61 878

a) Calculate the regression line of s on c and the regression line of c on s.
b) Caroline was absent for her C1 examination, but scored 51% in S1. Use the appropriate regression line to estimate her percentage score in the C1 paper.
c) Calculate the product–moment correlation coefficient between the marks in the two papers. Comment on the implications of this for the accuracy of the estimate found in b).
Predicting x from y – random on random

a) From these summary values we can calculate:

    Scc = 58 362 – 888²/15 = 5792.4
    Sss = 66 445 – 943²/15 = 7161.733
    Scs = 61 878 – (888 × 943)/15 = 6052.4

Also: c̄ = 59.2, s̄ = 62.8667

For the regression line of s on c:

    b = 6052.4 / 5792.4 = 1.0449
    a = 62.8667 – (1.0449 × 59.2) = 1.0086

So, the equation of the regression line of s on c is:

    s = 1.01 + 1.04c
Predicting x from y – random on random

For the regression line of c on s:

    b′ = 6052.4 / 7161.733 = 0.8451
    a′ = 59.2 – (0.8451 × 62.8667) = 6.07

So, the equation of the regression line of c on s is:

    c = 6.07 + 0.845s

b) We wish to estimate the value of c when s = 51. Both variables are random, so we use the regression line of c on s:

    c = 6.07 + 0.845s = 6.07 + (0.845 × 51) = 49.2

So we estimate Caroline to have scored 49% in C1.
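Both regression lines can be reproduced directly from the summary statistics in the question. This is a sketch (variable names are ours) that computes the two lines and the estimate for Caroline's mark:

```python
# Summary statistics from the question.
n = 15
sum_c, sum_s = 888, 943
sum_c2, sum_s2, sum_cs = 58362, 66445, 61878

Scc = sum_c2 - sum_c**2 / n        # 5792.4
Sss = sum_s2 - sum_s**2 / n        # about 7161.73
Scs = sum_cs - sum_c * sum_s / n   # 6052.4

c_bar, s_bar = sum_c / n, sum_s / n

# Regression line of s on c (predict s from c): s = a + b*c
b = Scs / Scc
a = s_bar - b * c_bar

# Regression line of c on s (predict c from s): c = a2 + b2*s
b2 = Scs / Sss
a2 = c_bar - b2 * s_bar

caroline = a2 + b2 * 51   # estimate of Caroline's C1 mark, about 49
```

Note that b × b2 = r², which is one way of seeing why the two lines only coincide when the correlation is perfect.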
Predicting x from y – random on random

c) The PMCC is calculated as follows:

    r = Scs / √(Scc Sss) = 6052.4 / √(5792.4 × 7161.733) = 0.94

The PMCC indicates that there is very strong positive correlation between the marks in C1 and S1 – the points on the scatter graph would lie very close to a straight line. This suggests that the mark estimated in b) is likely to be fairly accurate.


Regression – controlled variables

We will now consider a situation where one of the variables (here assumed to be x) is a controlled variable. This means that the values of x are fixed – they were decided upon when the experiment was planned.

If x is a controlled variable, the regression line of x on y does not have any statistical meaning, since the values of x are not random.

We consequently use only the regression line of y on x, whether we are estimating a y or an x value.


Regression – controlled variables

Examination style question: An agricultural researcher wishes to explore how the yield of a crop is affected by the amount of fertilizer used. She designs an experiment in which she fertilizes a small plot of land with a pre-determined amount of fertilizer. She obtains the following results:

Amount of fertilizer (kg), x   2      4      6      8      10     12
Crop yield (kg), y             8.55   9.34   9.52   10.39  11.42  11.57

a) Calculate the regression line of y on x.
b) The regression line of x on y is: x = –23.8 + 3.04y
   Use the appropriate regression line to estimate how much fertilizer would be needed to achieve a crop yield of 10 kg. Explain how you decided which regression line to use.


Regression – controlled variables

a) From the table:

    Σx = 42       Σx² = 364
    Σy = 60.79    Σy² = 623.20
    Σxy = 447.74

Also: x̄ = 7, ȳ = 10.132

From these we get: Sxx = 70 and Sxy = 22.2

The gradient of the regression line is: b = 22.2 ÷ 70 = 0.317
and the intercept is: a = 10.132 – (0.317 × 7) = 7.91

Therefore the regression line is y = 7.91 + 0.317x.


Regression – controlled variables

b) Since x is a controlled variable, only the regression line of y on x has meaning. Therefore, this equation should be used to estimate x when y = 10:

    y = 7.91 + 0.317x
    10 = 7.91 + 0.317x
    2.09 = 0.317x
    x = 6.59

Note: The intercept (7.91) represents the crop yield that might be expected if no fertilizer were to be applied. The equation of the line also shows that increasing the amount of fertilizer by 1 kg increases the expected crop yield by 0.317 kg.
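The rearrangement in part b) is a one-line calculation; this sketch (the function name is ours) wraps it as a reusable inverse prediction using the regression line found in part a):

```python
# Regression line y = a + b*x from part a): y = 7.91 + 0.317x.
a, b = 7.91, 0.317

def fertilizer_for_yield(y_target):
    """Solve y = a + b*x for x; only meaningful inside the data range (2-12 kg)."""
    return (y_target - a) / b

x_est = fertilizer_for_yield(10)   # about 6.59 kg of fertilizer
```

Because x is controlled, the y-on-x line is inverted rather than using the x-on-y line, exactly as the slide argues.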
