Regression - Correlation Worksheet - Student
1
Salesperson   Experience in Years   Sales
1             1                     80
2             3                     97
3             4                     92
4             4                     102
5             6                     103
6             8                     111
7             10                    119
8             10                    123
9             11                    117
10            13                    136
A sales manager collected the following data on annual sales for new custom[...]e for a sample of 10 salespersons.
Salesperson   1    2    3    4    5    6    7    8    9    10
Years         1    3    4    4    6    8    10   10   11   13
Sales         80   97   92   102  103  111  119  123  117  136
a. Develop a scatter diagram for these data with years of experience as the independent variable.
b. Y = 4x + 80
c. Use the estimated regression equation to predict annual sales for a salesperson [...] sales.
[...] reflect a strong or weak relationship between years of experience of the salesperson [...]
[...] estimated regression equation.
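The equation Y = 4x + 80 can be checked against the sales data with the usual least-squares formulas, slope b = Sxy / Sxx and intercept a = mean(Y) - b * mean(x). The following is a minimal Python sketch; the function name `fit_line` and the list names are illustrative and not part of the worksheet.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Data copied from the worksheet (years of experience, annual sales)
years = [1, 3, 4, 4, 6, 8, 10, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 111, 119, 123, 117, 136]

slope, intercept = fit_line(years, sales)
print(f"Y = {slope:.2f}x + {intercept:.2f}")  # Y = 4.00x + 80.00
```

Here Sxy = 568 and Sxx = 142, so the slope is exactly 4 and the intercept is 108 - 4 * 7 = 80.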
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.843121469436239
R Square            0.710853812224323
Adjusted R Square   0.674710538752363
Standard Error      3.15840837737315
Observations        10

ANOVA
             df   SS                 MS               F          Significance F
Regression   1    196.195652173913   196.1956521739   19.66767   0.002182936
Residual     8    79.8043478260869   9.975543478261
Total        9    276
A sociologist was hired by a large city hospital to investigate the relationship between the number of unauthorized days that employees are absent per year and the distance (miles) between home and work for the employees. A sample of 10 employees was chosen, and the following data were collected.

Distance (x)   1   3   4   6   8   10   12   14   14   18
Days (Y)       8   5   8   7   6   3    5    2    4    2

a. Develop a scatter diagram for these data with distance (miles) between home and work as the independent variable.
[Scatter chart: Days vs. distance with linear trendline, Linear (Days): [...]x + 8.09782608695652]
b. Develop an estimated regression equation that can be used to predict expected no. of days employees are absent per year.
Y = 8.10 - 0.34X
c. Use the estimated regression equation to predict expected no. of days employees are absent per year, if the distance (miles) between home and work is 5 miles.
Y = 8.10 - 0.34 * 5 = 6.4, approx. 6 days
d. R Square = 0.71: 71% of the variation in the dependent variable Y (no. of days employees are absent per year) is explained by the independent variable X (distance (miles) between home and work).
e. What is the value of the sample correlation coefficient? Does it reflect a strong or weak relationship between the variables?
r = -0.843 (see the correlation analysis in the output)
f. Test the significant relationship at the 0.05 level of significance. What is your conclusion?
Since p = 0.002 < 0.05, H0 is rejected; we can conclude there is a significant relationship between distance (miles) between home and work and no. of days employees are absent per year.
g. Provide interpretation of the slope and intercept of the estimated regression equation.
Slope (b) = the change in Y for a one-unit increase in X.
b = -0.34: an increase in X (distance) of 1 mile will decrease the expected leaves taken by employees by 0.34 days per year.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.84
R Square            0.71
Adjusted R Square   0.67
Standard Error      1.29
Observations        10

ANOVA
             df   SS           MS       F        Significance F
Regression   1    32.6992754   32.699   19.668   0.002
Residual     8    13.3007246   1.663
Total        9    46

            Coefficients   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   8.098          6.233       9.963       6.233         9.963
Distance    -0.344         -0.523      -0.165      -0.523        -0.165

Correlation analysis
           Distance   Days
Distance   1
Days       -0.843     1
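The prediction at 5 miles and the R Square value for the distance/days data can be reproduced by splitting the variation in Y into explained and unexplained parts (SST = SSR + SSE). The sketch below is only an illustration in plain Python; the variable and function names are assumptions, and the data are the ten (distance, days) pairs from the worksheet.

```python
distance = [1, 3, 4, 6, 8, 10, 12, 14, 14, 18]  # x, miles between home and work
days = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]           # Y, days absent per year

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(days) / n
sxx = sum((x - x_bar) ** 2 for x in distance)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, days))

b = sxy / sxx          # slope
a = y_bar - b * x_bar  # intercept


def predict(miles):
    """Expected days absent per year for a given distance in miles."""
    return a + b * miles


# Total, unexplained, and explained variation in Y
sst = sum((y - y_bar) ** 2 for y in days)
sse = sum((y - predict(x)) ** 2 for x, y in zip(distance, days))
ssr = sst - sse
r_square = ssr / sst

print(f"Y = {a:.2f} + ({b:.2f})X")
print(f"Predicted days at 5 miles: {predict(5):.2f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.0f}")
print(f"R Square = {r_square:.2f}")
```

With unrounded coefficients the prediction is about 6.38 days, which agrees with the rounded hand calculation of roughly 6 days.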
[Scatter chart: Sales vs. experience in years, with linear trendline, Linear (Sales)]
3
Age of Bus (Years)   Maintenance Cost ($)
ind (X)              Dep (Y)
1                    350
2                    370
2                    480
2                    520
2                    590
3                    550
4                    750
4                    800
5                    790
5                    950
The regional transit authority for a major metropolitan area wants to [...] between the age of the bus and the annual maintenance cost. A sample of 10 buses resulted in the following data:
a. Develop a scatter diagram for these data with age of bus as the independent variable.
b. Y = 131.67x + 220
c. Use the estimated regression equation to predict expected maintenance [...]
[...]tting into the data provided. The independent variable x is able to explain 87% of the variation in the dependent variable Y.
r = square root of R Square = 0.934. Karl Pearson's correlation coefficient is positive, therefore we can say that with an increase in the age of the bus in years there will be an increase in maintenance cost.
Since p < 0.05 (alpha given was 5%), H0 is rejected, so we can conclude that there is a significant relationship between the two variables, namely age of bus and maintenance cost.
Slope: with an increase of one unit (one year) in x, i.e. age of bus in years, there will be an increase of 131.67 dollars in maintenance cost.
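The relation r = square root of R Square used for the bus data can be checked by computing Pearson's correlation coefficient directly. One detail worth noting is that the square root alone loses the sign, so the sign has to be taken from the slope. This is a small Python sketch with illustrative variable names, using the ten (age, cost) pairs from the worksheet.

```python
import math

age = [1, 2, 2, 2, 2, 3, 4, 4, 5, 5]                         # x, years
cost = [350, 370, 480, 520, 590, 550, 750, 800, 790, 950]    # Y, dollars

n = len(age)
x_bar = sum(age) / n
y_bar = sum(cost) / n
sxx = sum((x - x_bar) ** 2 for x in age)
syy = sum((y - y_bar) ** 2 for y in cost)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, cost))

# Pearson correlation coefficient from the sums of squares and cross-products
r = sxy / math.sqrt(sxx * syy)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r_square = r ** 2

# Taking the root of R Square gives only the magnitude; copy the sign of the slope
r_from_r_square = math.copysign(math.sqrt(r_square), slope)

print(f"Y = {slope:.2f}x + {intercept:.0f}")
print(f"r = {r:.3f}, R Square = {r_square:.2f}")
print(f"r recovered from R Square = {r_from_r_square:.3f}")
```

For a negative slope, as in the distance/days data, the same recovery step would return a negative r.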
ANOVA
             df   SS                 MS                F
Regression   1    3249.72075172049   3249.7207517205   57.4181924471
Residual     8    452.779248279513   56.597406034939
Total        9    3702.5
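Every derived figure in this ANOVA table follows from the two sums of squares and their degrees of freedom: MS = SS / df, F = MS Regression / MS Residual, R Square = SS Regression / SS Total, and the standard error of the estimate is the square root of MS Residual. A short Python sketch of that arithmetic, with the SS values copied from the table and illustrative variable names:

```python
import math

# Values copied from the ANOVA table
ss_regression = 3249.72075172049
ss_residual = 452.779248279513
df_regression = 1
df_residual = 8

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual   # F statistic
r_square = ss_regression / ss_total    # share of variation explained
std_error = math.sqrt(ms_residual)     # standard error of the estimate

print(f"SS Total    = {ss_total:.1f}")
print(f"MS Residual = {ms_residual:.6f}")
print(f"F           = {f_stat:.6f}")
print(f"R Square    = {r_square:.6f}")
print(f"Std. Error  = {std_error:.6f}")
```

The results agree with the R Square (0.8777) and Standard Error (7.5231) reported in the summary output that has the same SS Total of 3702.5.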
Hours studying   [...] 90 60 105 65 90 80 55 75
Points           [...] 75 65 90 50 90 80 45 65
Develop a scatter diagram for these data with hours spent in studying as the independent variable.
[Scatter chart: Points vs. hours spent studying, with linear trendline, Linear (Points): f(x) = 0.829539438856538 x + 5.84700899947062, R² = 0.877709858668599]
Develop an estimated regression equation showing how total points earned is related to hours spent studying.
[...] significance of the model (alpha = 0.05). Significance F = 6.4395946E-05
[...] total points earned by Sidhhesh. He spent 95 hrs studying. Y (Score) = 0.8295 * 95 + 5.847 = 84.65, approx. 85 marks
Note (Lab39): a percentage change in the independent variable will impact the dependent variable.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9368617073
R Square            0.8777098587
Adjusted R Square   0.862423591
Standard Error      7.5231247521
Observations        10

ANOVA
             df   SS            MS            F             Significance F
Regression   1    3249.720752   3249.720752   57.41819245   6.43959E-05
Residual     8    452.7792483   56.59740603
Total        9    3702.5

                t Stat        P-value       Lower 95%    Upper 95%     Lower 95.0%   Upper 95.0%
Intercept       0.733467953   0.484209844   -12.535835   24.22985295   -12.535835    24.22985295
Hours (slope)   7.577479294   6.43959E-05   0.57709119   1.081987687   0.57709119    1.081987687
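The figures for the hours/points model hang together in a way that is easy to verify. In simple linear regression the square of the slope's t statistic equals F, the slope's standard error is the coefficient divided by its t statistic, and the 95% interval is the coefficient plus or minus the critical t value times that standard error. The sketch below uses the trendline coefficients and t statistic reported in the worksheet; the critical value 2.306 (two-sided, 5%, 8 degrees of freedom) is taken from a standard t table and is an assumption not printed in the worksheet.

```python
# Values reported in the worksheet
slope = 0.829539438856538
intercept = 5.84700899947062
t_slope = 7.577479294
f_reported = 57.41819245

# Prediction for 95 hours of study
predicted_points = intercept + slope * 95

# In simple linear regression, t squared for the slope equals F
t_squared = t_slope ** 2

# Standard error of the slope recovered from coefficient / t statistic
se_slope = slope / t_slope

T_CRIT = 2.306  # assumed: two-sided 5% critical t value for 8 degrees of freedom
lower_95 = slope - T_CRIT * se_slope
upper_95 = slope + T_CRIT * se_slope

print(f"Predicted points at 95 hours: {predicted_points:.2f}")
print(f"t^2 = {t_squared:.4f} (reported F = {f_reported})")
print(f"95% interval for the slope: {lower_95:.4f} to {upper_95:.4f}")
```

The interval reproduces the reported Lower 95% and Upper 95% values for the slope to about four decimal places; the small difference comes from rounding the critical value.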
5
[...]aninghouse at Syracuse University reported data showing the odds of an Internal Revenue Service audit. The [...] adjusted gross income reported and the percent of the returns that were audited for 20 selected IRS districts.

Adjusted gross income   [...]4,886 32,512 34,531 35,995 37,799 33,876 30,513 30,174 30,060 37,153 34,918 33,291 31,504 29,199 33,072 30,859 32,566
Percent audited         [...]9 0.9 0.9 0.8 0.8 0.7 0.7 0.7 0.6 0.6 0.5 0.5 0.5

[...] equation that could be used to predict the percent audited given the average adjusted gross income r[...]
[...] determine whether the adjusted gross income and the percent audited are related.

[Scatter chart "Chart Title": Column D with linear trendline, Linear (Column D); horizontal axis from 28000 to 40000]

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.457577120583176
R Square            0.209376821281191
Adjusted R Square   0.165453311352368
Standard Error      2493.73811623429
Observations        20

ANOVA
             df   SS                 MS                 F             Significance F
Regression   1    29643757.487525    29643757.487525    4.766850865   0.0424967266173022
Residual     18   111937136.262475   6218729.79235972
Total        19   141580893.75

                  Coefficients       Upper 95%     Lower 95.0%   Upper 95.0%
Intercept         29070.2375249501   33569.58091   24570.89414   33569.58091
Percent audited   5439.17165668662   10673.08972   205.2535894   10673.08972
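The Adjusted R Square in the audit output is noticeably lower than R Square because the adjustment penalises the fit for the degrees of freedom used: Adjusted R Square = 1 - (1 - R Square)(n - 1) / (n - k - 1), with n observations and k independent variables. A minimal Python sketch of that formula follows; the function name `adjusted_r_square` is illustrative, and the inputs are the R Square and Observations values reported above.

```python
import math


def adjusted_r_square(r_square, n, k=1):
    """Adjusted R Square for n observations and k independent variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


r_square_audit = 0.209376821281191  # R Square from the summary output
n_audit = 20                        # Observations

adj_audit = adjusted_r_square(r_square_audit, n_audit)
multiple_r = math.sqrt(r_square_audit)  # Multiple R is the root of R Square

print(f"Adjusted R Square = {adj_audit:.6f}")
print(f"Multiple R        = {multiple_r:.6f}")
```

Both values agree with the Regression Statistics block (0.165453 and 0.457577).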
Distance (x)   Days (Y)
1              8
3              5
4              8
6              7
8              6
10             3
12             5
14             2
14             4
18             2

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.84312146944
R Square            0.71085381222
Adjusted R Square   0.67471053875
Standard Error      1.28941482065
Observations        10

ANOVA
             df   SS         MS         F          Significance F
Regression   1    32.69928   32.69928   19.66767   0.002182935671
Residual     8    13.30072   1.662591
Total        9    46

               Coefficients     Standard Error   t Stat      P-value    Lower 95%        Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      8.09782608696    0.808822         10.01187    8.41E-06   6.232678893618   9.962973    6.232679      9.962973
X Variable 1   -0.34420289855   0.077614         -4.434824   0.002183   -0.52318030005   -0.165225   -0.52318      -0.165225
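The Standard Error, t Stat, and 95% limits in the coefficient table can be rebuilt from the raw data. With s as the standard error of the estimate, the slope's standard error is s / sqrt(Sxx) and the intercept's is s * sqrt(1/n + mean(x)^2 / Sxx). The Python sketch below works through this for the distance/days data; the variable names are illustrative, and the critical value 2.306 (two-sided, 5%, 8 degrees of freedom) comes from a standard t table rather than from the worksheet.

```python
import math

distance = [1, 3, 4, 6, 8, 10, 12, 14, 14, 18]  # x
days = [8, 5, 8, 7, 6, 3, 5, 2, 4, 2]           # Y

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(days) / n
sxx = sum((x - x_bar) ** 2 for x in distance)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, days))

b = sxy / sxx          # slope (X Variable 1)
a = y_bar - b * x_bar  # intercept

# Standard error of the estimate from the residual sum of squares
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(distance, days))
s = math.sqrt(sse / (n - 2))

se_b = s / math.sqrt(sxx)                       # standard error of the slope
se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept

t_b = b / se_b
t_a = a / se_a

T_CRIT = 2.306  # assumed: two-sided 5% critical t value for n - 2 = 8 df
ci_b = (b - T_CRIT * se_b, b + T_CRIT * se_b)
ci_a = (a - T_CRIT * se_a, a + T_CRIT * se_a)

print(f"Intercept: se = {se_a:.6f}, t = {t_a:.5f}, 95% CI = ({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"Slope:     se = {se_b:.6f}, t = {t_b:.6f}, 95% CI = ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
```

Since |t| for the slope (about 4.43) exceeds 2.306, the slope is significant at the 0.05 level, which is the same conclusion the P-value of 0.002183 gives.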