Regression - Correlation Worksheet - Student

Uploaded by

pratham.parab24

Simple Linear Regression Analysis

Problem 1

A sales manager collected the following data on annual sales for new customer accounts and the number of years of experience for a sample of 10 salespersons.

Salesperson   Experience in Years   Sales
1             1                     80
2             3                     97
3             4                     92
4             4                     102
5             6                     103
6             8                     111
7             10                    119
8             10                    123
9             11                    117
10            13                    136

a. Develop a scatter diagram for these data with years of experience as the independent variable.

b. Develop an estimated regression equation that can be used to predict annual sales given the years of experience.

Y = 4x + 80

c. Use the estimated regression equation to predict annual sales for a salesperson with 9 years of experience.

Y = 4x + 80 = 4*9 + 80 = 36 + 80 = 116 units

d. What is the value of the coefficient of determination r2? Comment on the goodness of fit.

e. What is the value of the sample correlation coefficient? Does it reflect a strong or weak relationship between years of experience of the salesperson and the sales?

f. Test the significant relationship at the 0.05 level of significance. What is your conclusion?

g. Provide interpretation of slope and intercept of the estimated regression equation.
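The answers to parts b and c of Problem 1 can be checked with a short script. This is a minimal sketch in plain Python, not the worksheet's own method (the worksheet uses Excel); the helper name fit_line is hypothetical and introduced only for illustration.

```python
# Data from Problem 1: years of experience (x) and annual sales (y).
years = [1, 3, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x.

    Hypothetical helper for illustration; it is not part of the worksheet.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(years, sales)
predicted_9_years = intercept + slope * 9

print(f"Y = {slope:.2f}x + {intercept:.2f}")
print(f"Predicted sales at 9 years: {predicted_9_years:.1f}")
print(f"r2 = {r_squared:.4f}")
```

The same three quantities (intercept, slope, r2) are what Excel reports in its regression SUMMARY OUTPUT, so the script can serve as a cross-check for parts b, c and d.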


Problem 2

A sociologist was hired by a large city hospital to investigate the relationship between the number of unauthorized days that employees are absent per year and the distance (miles) between home and work for the employees. A sample of 10 employees was chosen, and the following data were collected.

Distance (x)   Days (Y)
1              8
3              5
4              8
6              7
8              6
10             3
12             5
14             2
14             4
18             2

a. Develop a scatter diagram for these data with distance (miles) between home and work as the independent variable.

b. Develop an estimated regression equation that can be used to predict the expected no. of days employees are absent per year.

Y = 8.10 - 0.34X

c. Use the estimated regression equation to predict the expected no. of days employees are absent per year, if the distance (miles) between home and work is 5 miles.

Y = 8.10 - 0.34*5 = 6.4 = approx. 6 days

d. What is the value of the coefficient of determination r2? Comment on the goodness of fit.

R Square = 0.71. Model is strong enough to be fitted on the given data.
71% of the variation in dep var Y (no. of days employees are absent per year) is explained by ind var X (distance (miles) between home and work).

e. What is the value of the sample correlation coefficient? Does it reflect a strong or weak relationship between the variables?

f. Test the significant relationship at the 0.05 level of significance. What is your conclusion?

H0: There is no significant relationship between distance (miles) between home and work and no. of days employees are absent (unauthorized) per year.
H1: There is a significant relationship between distance (miles) between home and work and no. of days employees are absent (unauthorized) per year.
Since p = 0.002 < 0.05, H0 is rejected. We can conclude there is a significant relationship between distance (miles) between home and work and no. of days employees are absent (unauthorized) per year.

g. Provide interpretation of slope and intercept of the estimated regression equation.

Intercept (a): if x = 0, y = ?
Y = 8.10 - 0.34*0 = 8.1

Slope (b): with one unit increase in X, what will be the change in Y?
b = -0.34. An increase in X (distance) by 1 mile will decrease the expected leaves taken by employees by 0.34 days per year.

SUMMARY OUTPUT (as run on this sheet: Distance is the response and Days is the predictor, so Total SS = 276 is the variation in Distance; the answers above use the regression of Days on Distance)

Regression Statistics
Multiple R          0.843121469436239
R Square            0.710853812224323
Adjusted R Square   0.674710538752363
Standard Error      3.15840837737315
Observations        10

ANOVA
             df   SS                 MS               F          Significance F
Regression   1    196.195652173913   196.1956521739   19.66767   0.002182936
Residual     8    79.8043478260869   9.975543478261
Total        9    276

             Coefficients        Standard Error      t Stat          P-value     Lower 95%     Upper 95%
Intercept    19.3260869565217    2.5335835033841     7.62796526371   6.142E-05   13.48363292   25.168540992211
X Variable   -2.06521739130435   0.465681909495911   -4.4348241776   0.002183    -3.1390818    -0.991352982317
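The significance test in part f of Problem 2 can be sketched as a t test on the slope. This is an illustrative check in plain Python, not the worksheet's Excel procedure. One labeled assumption: the critical value 2.306 (two-tailed, alpha = 0.05, 8 degrees of freedom) is taken from a standard t table rather than computed.

```python
import math

# Data from Problem 2: distance to work in miles (x), unauthorized days absent (y).
distance = [1, 3, 4, 6, 8, 10, 12, 14, 14, 18]
days = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(days) / n
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, days))

# Least-squares estimates for Days = intercept + slope * Distance.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
predicted_5_miles = intercept + slope * 5

# Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(distance, days))
se_slope = math.sqrt(sse / (n - 2) / s_xx)
t_stat = slope / se_slope

# Assumption: table value for a two-tailed test at alpha = 0.05 with 8 df.
T_CRITICAL_DF8 = 2.306
reject_h0 = abs(t_stat) > T_CRITICAL_DF8

print(f"Days = {intercept:.3f} + ({slope:.3f}) * Distance")
print(f"Predicted days at 5 miles: {predicted_5_miles:.2f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Comparing |t| with the table value leads to the same decision as comparing the p-value with 0.05, which is how the worksheet states its conclusion.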
x = Distance, Y = Days

SUMMARY OUTPUT (Days regressed on Distance)

Regression Statistics
Multiple R          0.84
R Square            0.71
Adjusted R Square   0.67
Standard Error      1.29
Observations        10

ANOVA
             df   SS           MS       F        Significance F
Regression   1    32.6992754   32.699   19.668   0.002
Residual     8    13.3007246   1.663
Total        9    46

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   8.098          0.809            10.012   0.000     6.233       9.963
Distance    -0.344         0.078            -4.435   0.002     -0.523      -0.165

Correlation Analysis
           Distance   Days
Distance   1
Days       -0.843     1

Scatter chart: Days against Distance, with linear trendline f(x) = −0.344202898550725 x + 8.09782608695652, R² = 0.710853812224323.
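The ANOVA rows and the 95% interval for the slope in the output for Days on Distance can be reproduced by hand. The sketch below is plain Python and only illustrative. As a labeled assumption, it uses the table value 2.306 for t with 8 degrees of freedom, so the interval ends may differ from Excel's in the fourth decimal place.

```python
import math

# Problem 2 data: distance in miles (x), unauthorized days absent (y).
distance = [1, 3, 4, 6, 8, 10, 12, 14, 14, 18]
days = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]

n = len(days)
x_bar = sum(distance) / n
y_bar = sum(days) / n
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, days))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in distance]

# Sums of squares: Total = Regression + Residual.
sst = sum((y - y_bar) ** 2 for y in days)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(days, fitted))

# With one predictor, MSR = SSR / 1 and MSE = SSE / (n - 2).
mse = sse / (n - 2)
f_stat = ssr / mse

# 95% confidence interval for the slope.
se_slope = math.sqrt(mse / s_xx)
T_CRITICAL_DF8 = 2.306  # assumption: value read from a standard t table
ci_low = slope - T_CRITICAL_DF8 * se_slope
ci_high = slope + T_CRITICAL_DF8 * se_slope

print(f"SS regression = {ssr:.4f}, SS residual = {sse:.4f}, SS total = {sst:.1f}")
print(f"F = {f_stat:.3f}")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the interval for the slope excludes zero, it points to the same conclusion as the F test.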


Weeks   Sales
1       102
2       103
3       111
4       119
5       123
6       117
7       136
8       80
9       97
10      92
11      97
12      99
13      112
14      121
15      122
16      109
17      112
18      98
19      87
20      97

a. Develop a scatter diagram for the data with no. of weeks as the independent variable.

b. Develop an estimated regression equation that can be used to predict weekly sales.

c. Use the estimated regression equation to predict sales for the 21st week.

102

d. What is the value of the coefficient of determination r2? Comment on the goodness of fit.

e. What is the value of the sample correlation coefficient? Can we say sales is increasing with increasing weeks?

r = -0.22. No, sales is not increasing.

f. Test the significant relationship at the 0.05 level of significance. What is your conclusion?

p = 0.34 > 0.05, therefore the relationship is not significant. We fail to reject H0.

g. Provide interpretation of slope and intercept of the estimated regression equation.

Scatter chart: Sales against Weeks, with linear trendline f(x) = −0.523308270676692 x + 112.194736842105, R² = 0.0496188976610239.
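For the weekly sales data, the correlation coefficient in part e and the test in part f can be checked as follows. This is a hedged sketch in plain Python, not the worksheet's method. It tests r with t = r * sqrt(n - 2) / sqrt(1 - r^2) and, as a labeled assumption, takes the critical value 2.101 (two-tailed, alpha = 0.05, 18 degrees of freedom) from a standard t table.

```python
import math

# Weekly sales data: week number (x) and sales (y).
weeks = list(range(1, 21))
sales = [102, 103, 111, 119, 123, 117, 136, 80, 97, 92,
         97, 99, 112, 121, 122, 109, 112, 98, 87, 97]

n = len(weeks)
x_bar = sum(weeks) / n
y_bar = sum(sales) / n
s_xx = sum((x - x_bar) ** 2 for x in weeks)
s_yy = sum((y - y_bar) ** 2 for y in sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, sales))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
predicted_week_21 = intercept + slope * 21

# Sample correlation coefficient and its t statistic.
r = s_xy / math.sqrt(s_xx * s_yy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: table value for a two-tailed test at alpha = 0.05 with 18 df.
T_CRITICAL_DF18 = 2.101
significant = abs(t_stat) > T_CRITICAL_DF18

print(f"Sales = {intercept:.2f} + ({slope:.4f}) * Week")
print(f"Predicted sales for week 21: {predicted_week_21:.1f}")
print(f"r = {r:.3f}, t = {t_stat:.3f}, significant: {significant}")
```

A weak negative r with |t| below the critical value means the data give no evidence of a linear trend in sales over the 20 weeks.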
Problem 3

The regional transit authority for a major metropolitan area wants to determine whether there is any relationship between the age of the bus and the annual maintenance cost. A sample of 10 buses resulted in the following data:

Ind (X)              Dep (Y)
Age of Bus (Years)   Maintenance Cost ($)
1                    350
2                    370
2                    480
2                    520
2                    590
3                    550
4                    750
4                    800
5                    790
5                    950

a. Develop a scatter diagram for these data with age of bus as the independent variable.

b. Develop an estimated regression equation that can be used to predict the expected maintenance cost of the bus.

Y = 131.67x + 220

c. Use the estimated regression equation to predict the expected maintenance cost, if the bus is 4 years old.

If x = 4 years, y = 746.68 dollars.

d. What is the value of the coefficient of determination R2? Comment on the goodness of fit.

R square = 0.8725. The model fits the data provided very strongly. Ind var x is able to explain 87% of the variation in dep var Y.

e. What is the value of the sample correlation coefficient? Does it reflect a strong or weak relationship between the variables?

r = 0.934. Karl Pearson's correlation coefficient is positive, therefore we can say that with the increase in age of bus in years there will be an increase in maintenance cost. Also, the relationship is very strong, as r lies between 0.75 and 1.

f. Test the significant relationship at the 0.05 level of significance. What is your conclusion?

Since p < 0.05 (alpha given was 5%), H0 is rejected, so we can conclude that there is a significant relationship between the two variables, namely age of bus and maintenance cost.

g. Provide interpretation of slope and intercept of the estimated regression equation.

Slope: with an increase of one unit (one year) in x, i.e. age of bus in years, there will be an increase of 131.67 dollars in maintenance cost.

Intercept: if x = 0, i.e. age of bus is zero, then Y (maintenance cost) will be 220 dollars.

Notes:
r = Karl Pearson's coefficient of correlation
R square = coefficient of determination
r = square root of R square (coefficient of determination), taking the sign of the slope
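The note that r is the square root of R square can be illustrated on the bus data. The sketch below is plain Python and only illustrative. It computes r directly and also recovers it from R square, attaching the sign of the slope, since a square root alone loses the direction of the relationship (in Problem 2, for example, r is negative).

```python
import math

# Problem 3 data: age of bus in years (x), annual maintenance cost in dollars (y).
age = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]
cost = [350, 370, 480, 520, 590, 550, 750, 800, 790, 950]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(cost) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in cost)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, cost))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Coefficient of determination and Pearson's r computed directly.
r_squared = s_xy ** 2 / (s_xx * s_yy)
r_direct = s_xy / math.sqrt(s_xx * s_yy)

# r recovered from R square: magnitude from the root, sign from the slope.
r_from_r_squared = math.copysign(math.sqrt(r_squared), slope)

print(f"Cost = {intercept:.2f} + {slope:.2f} * Age")
print(f"R square = {r_squared:.4f}")
print(f"r (direct) = {r_direct:.3f}, r (from R square) = {r_from_r_squared:.3f}")
```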


Problem 4

A Statistics Professor at IESMCRC is interested in the relationship between the hours spent in studying and the total points earned by the students in a course. Data collected for 10 students who took the course last trimester are as follows:

Hours (X)   Points (Y)
45          40
30          35
90          75
60          65
105         90
65          50
90          90
80          80
55          45
75          65

a. Develop an estimated regression equation showing how total points earned is related to hours spent studying.

b. Test the significance of the model (alpha = 0.05).

c. Predict the total points earned by Sidhhesh. He spent 95 hrs studying.

Y (Score) = 5.847 + 0.8295*95 = 84.65 = approx. 85 marks

d. Develop a scatter diagram for these data with hours spent in studying as the independent variable.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9368617073
R Square            0.8777098587
Adjusted R Square   0.862423591
Standard Error      7.5231247521
Observations        10

ANOVA
             df   SS            MS            F             Significance F
Regression   1    3249.720752   3249.720752   57.41819245   6.43959E-05
Residual     8    452.7792483   56.59740603
Total        9    3702.5

            Coefficients   Standard Error   t Stat        P-value       Lower 95%    Upper 95%
Intercept   5.8470089995   7.971730697      0.733467953   0.484209844   -12.535835   24.22985295
Hours       0.8295394389   0.109474326      7.577479294   6.43959E-05   0.57709119   1.081987687

Cell comments:
Lab39: min score a student can get
Lab39: percentage change in independent variable will impact dependent variable

Scatter chart: Points against Hours, with linear trendline f(x) = 0.829539438856538 x + 5.84700899947062, R² = 0.877709858668599.
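The prediction in part c of Problem 4 can be reproduced from the raw data. This is a small sketch in plain Python, not the worksheet's method. The function name predict_points is hypothetical, and the range check is an added illustration: a prediction is usually considered more trustworthy when the input lies inside the observed range of hours.

```python
# Problem 4 data: hours spent studying (x), total points earned (y).
hours = [45, 30, 90, 60, 105, 65, 90, 80, 55, 75]
points = [40, 35, 75, 65, 90, 50, 90, 80, 45, 65]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(points) / n
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, points))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def predict_points(study_hours):
    """Predicted points from the fitted line (hypothetical helper)."""
    return intercept + slope * study_hours


study_hours = 95
predicted = predict_points(study_hours)

# Interpolation check: is 95 hours inside the range seen in the sample?
within_observed_range = min(hours) <= study_hours <= max(hours)

print(f"Points = {intercept:.3f} + {slope:.4f} * Hours")
print(f"Predicted points for {study_hours} hours: {predicted:.2f}")
print(f"Inside observed range of hours: {within_observed_range}")
```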
Problem 5

The Transactional Records Access Clearinghouse at Syracuse University reported data showing the odds of an Internal Revenue Service audit. The following table shows the average adjusted gross income reported and the percent of the returns that were audited for 20 selected IRS districts.

Place           Adjusted gross income   Percent audited
Los Angeles     36,664                  1.3
Sacramento      38,845                  1.1
Atlanta         34,886                  1.1
Boise           32,512                  1.1
Dallas          34,531                  1
Providence      35,995                  1
San Jose        37,799                  0.9
Cheyenne        33,876                  0.9
Fargo           30,513                  0.9
New Orleans     30,174                  0.9
Oklahoma City   30,060                  0.8
Houston         37,153                  0.8
Portland        34,918                  0.7
Phoenix         33,291                  0.7
Augusta         31,504                  0.7
Albuquerque     29,199                  0.6
Greensboro      33,072                  0.6
Columbia        30,859                  0.5
Nashville       32,566                  0.5
Buffalo         34,296                  0.5

Develop the estimated regression equation that could be used to predict the percent audited given the average adjusted gross income reported.

At the 0.05 level of significance, determine whether the adjusted gross income and the percent audited are related.

SUMMARY OUTPUT (response: adjusted gross income; predictor: percent audited)

Regression Statistics
Multiple R          0.457577120583176
R Square            0.209376821281191
Adjusted R Square   0.165453311352368
Standard Error      2493.73811623429
Observations        20

ANOVA
             df   SS                 MS                 F             Significance F
Regression   1    29643757.487525    29643757.487525    4.766850865   0.0424967266173022
Residual     18   111937136.262475   6218729.79235972
Total        19   141580893.75

                  Coefficients       Standard Error     t Stat             P-value       Lower 95%          Upper 95%
Intercept         29070.2375249501   2141.60416038871   13.5740479322162   6.77298E-11   24570.8941429176   33569.58091
Percent audited   5439.17165668662   2491.24811250177   2.18331190273316   0.042496727   205.25358942277    10673.08972

Correlation Analysis
                        Adjusted Gross Income   Percent Audited
Adjusted Gross Income   1
Percent Audited         0.46589218934279        1

Scatter chart: Percent Audited against Adjusted Gross Income, with linear trendline f(x) = 3.86778211976645E-05 x − 0.470953656567224, R² = 0.217055532090618.
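The two questions of Problem 5 ask for percent audited as the response, which is the direction used by the chart trendline. The sketch below fits that line from the 20 listed districts and tests the relationship with a t test on r. It is an illustration in plain Python, not the worksheet's Excel run. As a labeled assumption, the critical value 2.101 (two-tailed, alpha = 0.05, 18 degrees of freedom) comes from a standard t table.

```python
import math

# Problem 5 data, in the order of the table (Los Angeles ... Buffalo).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34918, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
percent = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(percent) / n
s_xx = sum((x - x_bar) ** 2 for x in income)
s_yy = sum((y - y_bar) ** 2 for y in percent)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, percent))

# Percent audited = intercept + slope * adjusted gross income.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# t test for a linear relationship, with n - 2 = 18 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
T_CRITICAL_DF18 = 2.101  # assumption: value read from a standard t table
related = abs(t_stat) > T_CRITICAL_DF18

print(f"Percent audited = {intercept:.4f} + {slope:.3e} * Income")
print(f"r = {r:.4f}, R square = {r_squared:.4f}")
print(f"t = {t_stat:.3f}, related at the 0.05 level: {related}")
```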
x = Distance, Y = Days

Distance   Days
1          8
3          5
4          8
6          7
8          6
10         3
12         5
14         2
14         4
18         2

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.84312146944
R Square            0.71085381222
Adjusted R Square   0.67471053875
Standard Error      1.28941482065
Observations        10

ANOVA
             df   SS         MS         F          Significance F
Regression   1    32.69928   32.69928   19.66767   0.002182935671
Residual     8    13.30072   1.662591
Total        9    46

               Coefficients     Standard Error   t Stat      P-value    Lower 95%        Upper 95%
Intercept      8.09782608696    0.808822         10.01187    8.41E-06   6.232678893618   9.962973
X Variable 1   -0.34420289855   0.077614         -4.434824   0.002183   -0.52318030005   -0.165225
