
6 Correlation and Regression

The document discusses correlation and simple linear regression. It defines correlation as measuring the relationship between two variables, and regression as identifying relationships between variables using mathematical equations. It also defines key terms like independent variable, dependent variable, and correlation coefficient.

Uploaded by nithnithya1411
Copyright © All Rights Reserved

AAMS1773 QUANTITATIVE STUDIES

CHAPTER 6:
CORRELATION AND SIMPLE LINEAR REGRESSION
Correlation
▪ measures the strength of the relationship between two variables
▪ involves bivariate data (a bivariate distribution)

Regression
▪ a study to identify the relationship between two or more variables using a mathematical equation
▪ is normally used for estimation purposes

Independent Variable / Explanatory Variable (X)
▪ the variable that is used to explain the variation in the dependent variable
▪ the variable that is used as a basis for prediction

Dependent Variable (Y)
▪ the variable to be predicted or explained

Example:
A study on the relationship between ice cream sales and temperature
▪ temperature is the independent variable since it can be used to explain the sales of ice cream
▪ sales is the dependent variable since sales depend on temperature

Univariate distribution
▪ data on a single characteristic are grouped together
▪ Example: height of students, price of items, etc.

Bivariate distribution
▪ data on two characteristics are grouped together
▪ Example: sales of ice cream and temperature; sales of goods and advertising expenses
Chapter 6 – Page 1
Scatter Diagram
▪ a plot of paired observations (X, Y)
▪ illustrates whether
♦ any relationship between the dependent and independent variables exists
♦ the relationship is positive or negative
♦ the relationship is linear or non-linear
▪ A positive relationship exists when both variables ↑ (or ↓) at the same time
▪ In a negative relationship, as one variable ↑, the other variable ↓, and vice versa

Example:
The data below relates the weekly maintenance cost ($) to the age
(in months) of ten machines of similar type in a manufacturing
company.
Machine     1    2    3    4    5    6    7    8    9   10
Age (X)     5   10   15   20   30   30   30   50   50   60
Cost (Y)  190  240  250  300  310  335  300  300  350  395
Construct a scatter diagram and comment on it.
Solution:
[Figure: Scatter Diagram of Weekly Maintenance Cost and Age of Machine; x-axis: Age of machine (months), y-axis: Maintenance cost ($)]
Comment:
Two types of correlation
1. Linear correlation
✓ correlation is said to be linear if the relationship can be represented by a straight line
2. Non-linear correlation (or curvilinear correlation)
✓ correlation is said to be non-linear if the relationship can be represented by a curve

Positive linear correlation
▪ An increase in the independent variable (X) will result in an increase in the dependent variable (Y)

Negative linear correlation
▪ An increase in the independent variable (X) will result in a decrease in the dependent variable (Y)

Correlation Coefficient (r)
▪ measures the strength of the linear relationship between two variables
▪ has a range of values from −1 to +1, i.e. −1 ≤ r ≤ +1
▪ if r = 0, then there is no linear relationship between the two variables
▪ words of different strength are used to describe the degree of correlation; rough guides are listed in the following table for interpretation purposes

Degree of correlation        Positive linear correlation    Negative linear correlation
Perfect                      r = +1                         r = −1
Strong/high     very         0.9 ≤ r < 1.0                  −1.0 < r ≤ −0.9
                fairly       0.8 ≤ r < 0.9                  −0.9 < r ≤ −0.8
Moderate                     0.6 ≤ r < 0.8                  −0.8 < r ≤ −0.6
Weak            fairly       0.3 ≤ r < 0.6                  −0.6 < r ≤ −0.3
                very         0.0 < r < 0.3                  −0.3 < r < 0.0
Absent                       r = 0                          r = 0

▪ The strength of the relationship does not depend on the sign of the coefficient of correlation.
▪ E.g. coefficients of −0.92 and +0.92 have equal strength; both indicate a very strong correlation between the two variables.
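The rough guide in the table can be written as a small lookup function. This is a minimal Python sketch for illustration only: the helper name `describe_correlation` and the exact label wording are assumptions, and because the guide is only rough, a value sitting exactly on a boundary (e.g. 0.9) is placed in the stronger band, following the ≤ signs in the table. Strength is taken from |r| alone, matching the point that the sign does not affect strength.

```python
def describe_correlation(r: float) -> str:
    """Label r using the rough guide in the table (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absent (no linear correlation)"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on |r|, not on the sign
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.9:
        strength = "very strong"
    elif size >= 0.8:
        strength = "fairly strong"
    elif size >= 0.6:
        strength = "moderate"
    elif size >= 0.3:
        strength = "fairly weak"
    else:
        strength = "very weak"
    return f"{strength} {direction} linear correlation"


# -0.92 and +0.92 get the same strength label; only the direction differs
for value in (0.92, -0.92, 0.7906, 0.0):
    print(value, "->", describe_correlation(value))
```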
Examples of scatter diagrams:
[Figure: five example scatter diagrams of y against x (both axes 0 to 10): perfect positive linear correlation (r = +1); perfect negative linear correlation (r = −1); strong positive linear correlation (x and y strongly linearly related); weak negative linear correlation (x and y somewhat linearly related); no linear correlation (x and y not linearly related)]
Product Moment Correlation Coefficient, r

The product moment correlation coefficient provides a measure of the strength of the linear relationship that exists between two variables, X and Y.

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^{2} - (\sum X)^{2}\right]\left[n\sum Y^{2} - (\sum Y)^{2}\right]}} $$

where n is the number of pairs of bivariate (X, Y) values.

Example:
Calculate the product moment correlation coefficient for the
following data. What does the value of the coefficient indicate?
X 5 6 7 9 8
Y 8 9 9 11 13

Solution:
X    Y    XY    X²    Y²
5    8    40    25    64
6    9    54    36    81
7    9    63    49    81
9   11    99    81   121
8   13   104    64   169
ΣX = 35    ΣY = 50    ΣXY = 360    ΣX² = 255    ΣY² = 516

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^{2} - (\sum X)^{2}][n\sum Y^{2} - (\sum Y)^{2}]}} = \frac{5(360) - (35)(50)}{\sqrt{[5(255) - (35)^{2}][5(516) - (50)^{2}]}} = \frac{50}{\sqrt{(50)(80)}} = 0.7906 $$

r = 0.7906 indicates that there is a moderately high positive linear correlation between X and Y. As X increases, Y would also increase.
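The computational formula above translates directly into code. The following minimal Python sketch (the function name `pearson_r` is hypothetical) builds the same five column totals as the solution table and reproduces r for the example data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product moment correlation coefficient using the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least 2 values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)                  # ΣX, ΣY
    sum_xy = sum(a * b for a, b in zip(x, y))      # ΣXY
    sum_x2 = sum(a * a for a in x)                 # ΣX²
    sum_y2 = sum(b * b for b in y)                 # ΣY²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all X or all Y values are equal")
    return numerator / denominator


X = [5, 6, 7, 9, 8]
Y = [8, 9, 9, 11, 13]
print(round(pearson_r(X, Y), 4))
```

Working from the column totals, rather than from deviations about the means, mirrors the hand calculation, so each intermediate sum can be compared with the table.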

Coefficient of Determination, r²
▪ is the square of the coefficient of correlation
▪ indicates the percentage of the variation in the dependent variable (Y) that is explained by the variation in the given independent variable (X)
▪ since −1 ≤ r ≤ 1, it follows that 0 ≤ r² ≤ 1

Example:
If r = 0.8, then r² = 0.8² = 0.64, so
▪ 64% of the variation in the dependent variable (Y) is explained by the variation in the given independent variable (X)
▪ the rest (36%) is unexplained or unaccounted for by X and may be due to other factors

Example:
Refer to the data given in the previous example, calculate the
product moment correlation coefficient between age and
maintenance cost. Hence, find the coefficient of determination and
comment on the results.
Machine 1 2 3 4 5 6 7 8 9 10
Age (X) 5 10 15 20 30 30 30 50 50 60
Cost (Y) 190 240 250 300 310 335 300 300 350 395

Solution:
X Y XY X2 Y2
5 190 950 25 36100
10 240 2400 100 57600
15 250 3750 225 62500
20 300 6000 400 90000
30 310 9300 900 96100
30 335 10050 900 112225
30 300 9000 900 90000
50 300 15000 2500 90000
50 350 17500 2500 122500
60 395 23700 3600 156025

ΣX =        ΣY =        ΣXY =        ΣX² =        ΣY² =

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^{2} - (\sum X)^{2}][n\sum Y^{2} - (\sum Y)^{2}]}} = \underline{\qquad\qquad} $$

Coefficient of Determination = r² =

Comment:
r = ______ indicates that there is a ______ linear correlation between ______ and ______. As ______ increases, ______ would also increase.

r² = ______ shows that ______ of the variation in ______ can be explained by the variation in ______, and the rest is due to other factors.

Correlation and Causation

✓ Causation ⇒ Correlation
E.g. The age of a machine causes its maintenance cost to increase. Therefore, there is definitely a correlation between the age and the maintenance cost of the machine.

✓ Correlation does not always ⇒ Causation
E.g. There might be a strong positive correlation between ice cream sales and umbrella sales, but neither is the cause of the other. The real cause of the changes in both variables may be the weather.

Thus, care must be taken not to interpret a high correlation between 2 variables as a cause-and-effect relationship unless the relationship is meaningful.
Spearman's Rank Correlation Coefficient, rs
▪ measures the correlation based on the ranks of two sets of data (X and Y)
▪ serves as an approximation to the product moment correlation coefficient
▪ can be used even when the variables to be correlated are not represented in numeric form (qualitative data)
▪ Examples: 1. Discipline and exam marks.
            2. Job performance and qualification.
▪ Ranks are usually allocated in ascending order: rank 1 to the smallest item, rank 2 to the next larger, and so on, although it is perfectly feasible to allocate them in descending order. However, whichever method is selected must be used for both variables.

The procedure for obtaining rs is as follows:
STEP 1 Rank the X values (to give the R1 values).
STEP 2 Rank the Y values (to give the R2 values).
STEP 3 For each pair of ranks, calculate d² = (R1 − R2)², and then calculate Σd².
STEP 4 Find the value of the rank correlation coefficient using the following formula:

$$ r_s = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)}, \qquad -1 \le r_s \le +1 $$

where rs ≡ rank coefficient of correlation
R1 ≡ rank of X
R2 ≡ rank of Y
d ≡ difference between two corresponding ranks (d = R1 − R2)
n ≡ number of pairs of observations
Example (data already ranked)
X and Y were judges at a beauty contest in which there were 10 competitors. Their rankings are shown below.
Competitor   A   B   C   D   E   F   G   H   I   J
X            4   9   2   5   3  10   6   7   8   1
Y            6  10   2   8   1   9   7   4   5   3

Calculate a coefficient of rank correlation between these two sets of rankings and comment briefly on your result.

Solution:
Competitor     A   B   C   D   E   F   G   H   I   J
R1             4   9   2   5   3  10   6   7   8   1
R2             6  10   2   8   1   9   7   4   5   3
d = R1 − R2   −2  −1   0  −3   2   1  −1   3   3  −2
d²             4   1   0   9   4   1   1   9   9   4

n = 10, Σd² = 42

$$ r_s = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)} = 1 - \frac{6(42)}{10(10^{2} - 1)} = 1 - \frac{252}{990} = 0.7455 $$

Spearman's coefficient of rank correlation for the data is 0.7455, indicating that there is a moderately high degree of association between the rankings of X and Y, i.e. the opinions of the 2 judges agree moderately well.
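Steps 3 and 4 of the procedure can be sketched as a short function that takes ranks already allocated (as in the judges' example) and applies the formula. The name `spearman_from_ranks` is hypothetical; this is an illustrative check, not part of the original method.

```python
def spearman_from_ranks(r1: list[float], r2: list[float]) -> float:
    """rs = 1 - 6*sum(d^2) / (n(n^2 - 1)), given two lists of ranks."""
    if len(r1) != len(r2) or len(r1) < 2:
        raise ValueError("need at least 2 pairs of ranks")
    n = len(r1)
    d = [a - b for a, b in zip(r1, r2)]          # STEP 3: d = R1 - R2
    sum_d2 = sum(di ** 2 for di in d)            # ... then sum of d^2
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # STEP 4


judge_x = [4, 9, 2, 5, 3, 10, 6, 7, 8, 1]
judge_y = [6, 10, 2, 8, 1, 9, 7, 4, 5, 3]
print(round(spearman_from_ranks(judge_x, judge_y), 4))
```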

Example (data not yet ranked)
The following data show the average rent and rates (RM per square foot) for a selection of areas.
Rate (X)  1.68  1.46  1.57  13.37  3.18  1.95  1.07  1.71  1.22   6.46
Rent (Y)  3.81  4.19  4.87  22.85  6.47  6.48  2.66  6.49  5.33  15.23

Calculate Spearman's rank correlation coefficient to assess the degree of correlation between rate and rent. Comment on your findings.
Solution:
Rate (X)   Rank of X (R1)   Rent (Y)   Rank of Y (R2)   d = R1 − R2   d²
1.68                        3.81
1.46                        4.19
1.57                        4.87
13.37                       22.85
3.18                        6.47
1.95                        6.48
1.07                        2.66
1.71                        6.49
1.22                        5.33
6.46                        15.23
                                                                   Σd² =

n = 10,
$$ r_s = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)} = \underline{\qquad\qquad} $$

The result shows that there is a fairly strong positive rank correlation between the rankings of rate and the rankings of rent. High rankings of rate are normally paired with high rankings of rent, and vice versa.
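When the data are not yet ranked, Steps 1 and 2 come first. The sketch below is one way to rank in ascending order under the assumption that there are no tied values (true for this rent and rate data); `ascending_ranks` is a hypothetical helper that refuses tied input rather than ranking it incorrectly.

```python
def ascending_ranks(values: list[float]) -> list[int]:
    """Rank 1 for the smallest value, 2 for the next, ... (assumes no ties)."""
    order = sorted(values)
    if len(set(order)) != len(order):
        raise ValueError("tied values need average ranks")
    return [order.index(v) + 1 for v in values]


rate = [1.68, 1.46, 1.57, 13.37, 3.18, 1.95, 1.07, 1.71, 1.22, 6.46]
rent = [3.81, 4.19, 4.87, 22.85, 6.47, 6.48, 2.66, 6.49, 5.33, 15.23]

r1 = ascending_ranks(rate)                           # STEP 1
r2 = ascending_ranks(rent)                           # STEP 2
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # STEP 3
n = len(rate)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))            # STEP 4
print("R1:", r1)
print("R2:", r2)
print("Sum d^2 =", sum_d2, " rs =", round(r_s, 4))
```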

Note:
Sometimes two or more individuals or entries may be tied in rank. In this case, each is given the average of the ranks they would otherwise occupy, as shown by the following example.
Salesman   1    2    3    4    5    6    7    8
Sales      20   35   25   20   35   40   20   10
Ranking

Salesmen 1, 4 and 7 are tied for ranks 2, 3 and 4, so the average of 2, 3 and 4, (2 + 3 + 4)/3 = 3, is assigned to each of these 3 salesmen. Salesmen 2 and 5 are tied for ranks 6 and 7, so the average of 6 and 7, 6.5, is assigned as the rank for each of these 2 salesmen.
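The averaging rule for ties can be sketched as follows: a value occupying rank positions first to last receives (first + last)/2, which equals the average of those consecutive positions. This is an illustrative Python helper (`average_ranks` is a hypothetical name); it uses `list.index` and `list.count`, which is quadratic in the number of items but adequate for small classroom data sets.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Ascending ranks in which tied values share the average of their positions."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1        # first rank position taken by v
        count = order.count(v)            # number of entries tied with v
        last = first + count - 1          # last rank position taken by v
        ranks.append((first + last) / 2)  # average of positions first..last
    return ranks


sales = [20, 35, 25, 20, 35, 40, 20, 10]  # salesmen 1 to 8
print(average_ranks(sales))
```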
Example (data with tied ranks)
The following data relate to the number of vehicles owned per 100 population (X) and road deaths per 100,000 population (Y) for 12 countries. Calculate Spearman's rank correlation coefficient and comment on the result.
X   30  31  32  30  46  30  19  35  40  46  57  30
Y   30  14  30  23  32  26  20  21  23  30  35  26

Solution:
X     R1     Y     R2     d = R1 − R2    d²
30    3.5    30    9      −5.5           30.25
31    6      14    1       5             25
32    7      30    9      −2              4
30    3.5    23    4.5    −1              1
46    10.5   32    11     −0.5            0.25
30    3.5    26    6.5    −3              9
19    1      20    2      −1              1
35    8      21    3       5             25
40    9      23    4.5     4.5           20.25
46    10.5   30    9       1.5            2.25
57    12     35    12      0              0
30    3.5    26    6.5    −3              9
                                   Σd² = 127

n = 12,
$$ r_s = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)} = 1 - \frac{6(127)}{12(12^{2} - 1)} = 1 - \frac{762}{1716} = 0.5559 $$

The result shows that there is a fairly weak positive rank correlation between the rankings of vehicles owned and the rankings of the number of road deaths.
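Combining averaged ranks with the Spearman formula reproduces the tied-rank calculation. This is a self-contained sketch with hypothetical helper names. It applies the same 1 − 6Σd²/[n(n² − 1)] formula used in these notes; when ties are present, this shortcut no longer equals the product moment coefficient of the ranks exactly, so software using the exact method may give a slightly different value.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Ascending ranks; tied values share the average of the positions they occupy."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1          # first rank position held by v
        last = first + order.count(v) - 1   # last rank position held by v
        ranks.append((first + last) / 2)
    return ranks


def spearman_rs(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (sum of d^2, rs) using rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    r1, r2 = average_ranks(x), average_ranks(y)          # STEPS 1 and 2
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # STEP 3
    n = len(x)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # STEP 4


vehicles = [30, 31, 32, 30, 46, 30, 19, 35, 40, 46, 57, 30]  # X
deaths = [30, 14, 30, 23, 32, 26, 20, 21, 23, 30, 35, 26]    # Y
sum_d2, r_s = spearman_rs(vehicles, deaths)
print("R1:", average_ranks(vehicles))
print("R2:", average_ranks(deaths))
print("Sum d^2 =", sum_d2, " rs =", round(r_s, 4))
```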

Comparison of product moment correlation and rank correlation:

Product moment coefficient, r
▪ The standard measure of correlation
▪ Data must be numeric

Spearman's rank coefficient, rs
▪ Only an approximation to the product moment coefficient
▪ Easier to use, with fewer calculations
▪ Can be used with non-numeric data
▪ Can be insensitive to small changes in actual values. This is easily seen using the data values 12.3, 12.4 and 23, say, where the allocated ranks would be 1, 2 and 3. No account is taken of the small difference between the first two values compared with the large difference between the second and third values.
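The insensitivity point can be seen directly by ranking two lists. The first list uses the values from the text; the second is invented purely for illustration, with the large gap moved to between the first and second values. Both receive identical ranks, so rs cannot tell them apart. `ascending_ranks` is a hypothetical helper that assumes no ties.

```python
def ascending_ranks(values: list[float]) -> list[int]:
    """Rank 1 for the smallest value (assumes no ties, as in this example)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


close_gap = [12.3, 12.4, 23]   # values from the notes
wide_gap = [12.3, 22.9, 23]    # hypothetical values with the gap moved
print(ascending_ranks(close_gap), ascending_ranks(wide_gap))
```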

LINEAR REGRESSION
▪ Regression is concerned with obtaining a mathematical equation which describes the relationship between two variables
▪ The equation can be used for comparison or estimation purposes

Simple Linear Regression
▪ the simplest form of linear relationship between two variables
▪ Y = a + bX
where Y ≡ dependent variable
X ≡ independent variable
a ≡ intercept of the line on the y-axis
b ≡ regression coefficient / slope or gradient of the line

Note: 1. b indicates the change in Y for a one-unit change in X
2. b is positive ⇒ positive linear relationship between X and Y
3. b is negative ⇒ negative linear relationship between X and Y

Least squares method
▪ the standard technique for obtaining a linear regression line such that the error sum of squares is a minimum, i.e. the least squares regression line gives the minimum value for the sum of the squares of the vertical deviations of every scatter point from the regression line.
Least squares regression line
▪ The least squares regression line of Y on X is Ŷ = a + bX, where

$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^{2} - (\sum X)^{2}} $$

$$ a = \bar{Y} - b\bar{X} \quad \text{or} \quad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$

and n ≡ total number of observations in a set of bivariate data (X, Y).

Notes:
For any set of bivariate data, the least squares regression line of Y on X
1. is used to estimate a value of Y given a value of X
2. passes through the mean point (X̄, Ȳ) of the data
Example
The following table shows the output at a factory and the costs of production over the past 5 months. Find the equation of the least squares regression line.
Month                   1    2    3    4    5
Output (000's units)   20   16   24   22   18
Costs (RM'000)         82   70   90   85   73

Solution:
Let X = output in 000's units; Y = total costs in RM'000.
X    Y    XY     X²
20   82   1640   400
16   70   1120   256
24   90   2160   576
22   85   1870   484
18   73   1314   324
ΣX = 100    ΣY = 400    ΣXY = 8104    ΣX² = 2040

$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^{2} - (\sum X)^{2}} = \underline{\qquad\qquad} $$

$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \underline{\qquad\qquad} $$

∴ The regression line is Ŷ = a + bX =
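As a check on the blanks, the least squares formulas can be sketched in Python. The helper `least_squares` is a hypothetical name; it applies the same b and a formulas as above and also confirms Note 2, that the fitted line passes through the mean point (X̄, Ȳ).

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (a, b) for the least squares line Y-hat = a + bX of Y on X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


output = [20, 16, 24, 22, 18]   # X, 000's units
costs = [82, 70, 90, 85, 73]    # Y, RM'000
a, b = least_squares(output, costs)
print(f"Y-hat = {a:.4f} + {b:.4f}X")

# Note 2: the fitted line passes through the mean point (X-bar, Y-bar)
x_bar, y_bar = sum(output) / len(output), sum(costs) / len(costs)
print(abs((a + b * x_bar) - y_bar) < 1e-9)
```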

Regression Analysis as a Forecasting Tool
• Two types of estimation using the regression equation:
1. Extrapolation estimate
⇨ Extrapolation ≡ finding the value of Y for a value of X outside the observed range of X
⇨ most commonly used for forecasting using a time series
⇨ may be less accurate and, to a certain extent, unreliable
2. Interpolation estimate
⇨ Interpolation ≡ finding the value of Y for a value of X within the observed range of X
⇨ forecasting using interpolation is more accurate and more reliable than using extrapolation

Example:
The data below relate the weekly maintenance cost ($) to the age (in months) of machines of similar type in a manufacturing company.
Machine     1    2    3    4    5    6    7    8    9   10
Age (X)     5   10   15   20   30   30   30   50   50   60
Cost (Y)  190  240  250  300  310  335  300  300  350  395

(a) Find the least squares regression line of maintenance cost on age.
(b) Plot a scatter diagram and draw the regression line.
(c) Using the regression line, predict the maintenance costs for a machine of this type aged (i) 40 months and (ii) 80 months, respectively. Comment on and compare the accuracy of your estimates.
Solution:

From the previous example, we have
ΣXY =        ΣX =        ΣY =        ΣX² =

(a)
$$ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^{2} - (\sum X)^{2}} = \underline{\qquad\qquad} $$

$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \underline{\qquad\qquad} $$

The least squares regression line of maintenance cost on age is

(b) Scatter diagram and the regression line:
[Figure: Scatter Diagram of Weekly Maintenance Cost and Age of Machine; x-axis: Age of machine (months), y-axis: Maintenance cost ($)]

Plotting the regression line:
X

(c) (i) When X = 40, Ŷ =
Comment: This is an ______ estimate since the value of X = ___ lies ______ the observed range of X: [ , ]. The estimate is ______.

(ii) When X = 80, Ŷ =
Comment: This is an ______ estimate since the value of X = ___ lies ______ the observed range of X: [ , ]. The estimate is ______.

The estimate in part ( ) is more accurate because it is an _______________ estimate.

Interpretation of 'a' and 'b'

In the regression equation Ŷ = a + bX,
✓ a is the estimated value of Y when X = 0, i.e. the Y-intercept value
✓ b indicates the change in Y for a one-unit change in X
✓ b is positive ⇒ positive linear relationship between X and Y
✓ b is negative ⇒ negative linear relationship between X and Y
✓ b will always have the same sign as the coefficient of correlation, r

Example:
If Ŷ = a + bX = 3.33 + 0.47X, where Y = sales ($'000) and X = advertising costs ($'00), interpret the values of 'a' and 'b'.

Solution:

a: The estimated sales are $3,330 if there is no expenditure on advertising.

b: For each $100 increase in advertising expenditure, sales are estimated to increase by $470.
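The unit conversions are where interpretations of a and b usually go wrong. The sketch below uses a hypothetical helper, with the coefficients 3.33 and 0.47 taken from the example: it converts dollars into the model's units and back, showing that X = 0 gives the intercept and that one extra $100 of advertising (one unit of X) adds b × $1,000.

```python
def predicted_sales_dollars(advertising_dollars: float) -> float:
    """Evaluate Y-hat = 3.33 + 0.47X with Y in $'000 and X in $'00, returning $."""
    x = advertising_dollars / 100   # convert $ to the model's unit ($'00)
    y = 3.33 + 0.47 * x             # predicted sales in $'000
    return y * 1000                 # convert $'000 back to $


base = predicted_sales_dollars(0)            # the intercept a, in dollars
step = predicted_sales_dollars(100) - base   # effect of one extra $100 (one unit of X)
print(round(base, 2), round(step, 2))
```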

Example:
If Ŷ = a + bX = 28 + 2.6X, where Y = expenditure in $'000 and X = output in 000's units, interpret the values of 'a' and 'b'.

Solution:

a:

b:

THE ADVANTAGES AND DISADVANTAGES OF REGRESSION ANALYSIS
Advantages:
(a) It can be used to estimate a line of best fit using all the data available. It is likely to provide a more reliable estimate than any other technique of producing a straight line of best fit (for example, estimating by eye).

(b) The reliability of the estimate can be evaluated by calculating the correlation coefficient, r.

Disadvantages:
(a) It assumes a linear relationship between the two variables, whereas a non-linear relationship may exist.

(b) When it is used for forecasting future values, it assumes that what has happened in the past will provide a reliable guide to the future, which may not always be true in real-life situations.

(c) The technique assumes that the value of Y depends solely on the value of X. In reality, the value of Y might depend on several other variables, not just on X.
Computer Application – Using Excel

Example
In Mr. Steve's physical fitness course, several fitness scores were taken. The following sample gives the number of push-ups and sit-ups done by ten randomly selected students:
Student        1   2   3   4   5   6   7   8   9  10
Push-ups (X)  27  22  15  35  30  52  35  55  40  40
Sit-ups (Y)   30  26  25  42  38  40  32  54  50  43

Follow the instructions below:

Step 1: Key in the data in an Excel worksheet as shown in Figure 1.

Step 2: Click Data → Data Analysis, then choose Regression.

Step 3: Input Y-Range → highlight the range of Y values in your worksheet. Input X-Range → highlight the range of X values in your worksheet.

Step 4: Labels in first row → check this box if you entered the variable name in the first cell of each range.

Step 5: Output range → key in the cell where your output will start.

Step 6: OK → when you are done, click OK.

Figure 1

The Summary Output:

1. From the Regression Statistics, you can get the r value by finding the square root of R Square: r = √0.70465805 ≈ 0.8394 (taking the positive root, since b is positive).

2. The Coefficients are the values of a and b:
a = 14.90822536, b = 0.657885317
Hence, Ŷ = 14.9082 + 0.6579X

Scatter Diagram

Step 1: Highlight both columns of data. On the Insert tab, click the Scatter (X, Y)
chart command button. Select the Chart subtype that doesn’t include
any lines as shown in Figure 2.

Figure 2
Step 2: Right-click the x axis or y axis and click Format Axis. On the Format
Axis pane, set the desired Minimum and Maximum bounds as
appropriate. Additionally, you can change the Major units that control
the spacing between the gridlines.

Figure 3

Step 3: Add Axis Titles and a Trendline by clicking the Add Chart Element menu.

Figure 4

A plot of the data points (scatter plot) and the fitted regression line is shown in
Figure 5.

Figure 5: Scatter Diagram of Push-ups and Sit-ups (x-axis: Push-ups; y-axis: Sit-ups)
Tutorial 6 (CORRELATION AND SIMPLE LINEAR REGRESSION)

1. The following shows the number of price quotations issued and the number of sales made by a random sample of 8 salesmen. The figures relate to the same period of time:
Number of price quotations issued   105  213   96  157  114  103  237  185
Number of sales                      78  104   63   83   54   59  137   96

(a) Construct a scatter diagram to illustrate these figures.
(b) Calculate the product moment correlation coefficient.
(c) Without performing any further calculations, interpret the results obtained in parts (a) and (b).

2. An analysis of a number of companies showed the following relationship between expenditure on research & development and profits:
Company   Research and Development Expenditure ($ millions)   Profits ($ millions)
A         35                                                   68
B         46                                                   303
C         15                                                   70
D         80                                                   290
E         35                                                   108
F         61                                                   140
G         52                                                   90

(a) Calculate the product moment correlation coefficient.
(b) Rank the data and calculate Spearman's rank correlation coefficient.
(c) Comment on the difference between, and the accuracy of, the answers obtained in parts (a) and (b).

3. A sample of eight employees is taken from the production department of a light engineering factory. The data which follow relate to the number of weeks of experience in the wiring of components, and the number of components which were rejected as unsatisfactory last week.
Employee               A   B   C   D   E   F   G   H
Weeks of experience    4   5   7   9  10  11  12  14
Number of rejects     21  22  15  18  14  14  11  13

(a) Calculate the product moment correlation coefficient and the coefficient of determination. Interpret your answers.
(b) Find the least squares regression line of rejects on experience. Predict the number of rejects you would expect from an employee with one week of experience.
(c) Interpret the values of the constants 'a' and 'b' of the regression line Ŷ = a + bX in part (b).
(d) Calculate Spearman's rank correlation coefficient.

4. The total monthly cost and the monthly output of an electronic component in a factory for ten months are tabulated below:
Output ('000s)       21   3   5  24  19  15  11   9  14   9
Total cost ($'000)   65  30  31  71  54  52  40  33  45  38
(a) Calculate the product moment correlation coefficient and the coefficient of determination. Interpret your answers.
(b) Find the least squares regression line of total cost on output.
(c) State the fixed cost of the factory and the average variable cost per unit of production.
(d) Estimate the total cost if the production levels were:
(i) 8,500 units
(ii) 25,000 units
Comment on the accuracy of the estimates.

5. The following data, taken from a manufacturing company's budget, relate to the volume of sales and the corresponding expenses.
Sales volume ('000 units)    5   6   7   8   9  10
Total expenses ($'000)      74  77  82  86  92  95

(a) Obtain the product moment correlation coefficient and comment on the result.
(b) Calculate the coefficient of determination and interpret the result.
(c) Draw a scatter diagram and comment on the appropriateness of fitting a straight-line relationship. By the method of least squares, draw the regression line of best fit.
(d) What will be the expected total expenses when the volume of sales is 7,500 units?

6. The number of customer complaints received by a retailer was worrying the management. 8 stores were selected. The area of selling space and the number of complaints against each store are given below:
Store   Selling Area ('00 m²)   Number of complaints
A       12                      42
B       95                      124
C       8                       12
D       45                      102
E       34                      53
F       72                      145
G       60                      88
H       19                      26

(a) Find the least squares regression line of number of complaints on selling area. Interpret the values of the constants 'a' and 'b' of the regression line Ŷ = a + bX.
(b) Hence, using the least squares line, estimate the number of complaints received when the selling area of the store is:
(i) 5000 square meters; (ii) 10000 square meters.
(c) Which estimate obtained in part (b) would be more reliable? Give reasons.

Answers
1. (b) r = 0.9319
2. (a) r = 0.6370 (b) rs = 0.6339
3. (a) r = −0.8714, 75.93% (b) Ŷ = 24.8929 − 0.9881X, 23.9048 (d) −0.9107
4. (a) r = 0.9745, 94.97% (b) Ŷ = 19.5945 + 2.0235X (c) $19,595, $2.02 per unit (d) (i) 36.7943 ($'000) (ii) 70.182 ($'000)
5. (a) r = 0.9963 (b) 99.26% (c) Ŷ = 51.3333 + 4.4X (d) 84.3333 ($'000)
6. (a) Ŷ = 12.9609 + 1.4154X (b) (i) 83.7309 (ii) 154.5009
