STA3064 Assignment 1
1.
a.
i. Because we are using the gestation period to predict an animal’s life
expectancy, the gestation period is the explanatory variable and belongs on
the x-axis. The scatter plot shows a positive linear relationship, which
suggests that gestation period may be a useful predictor of life expectancy.
b.
i. Ŷ = 8.02256 + 0.02194X
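The coefficients in a fitted equation like this come from ordinary least squares, where the slope is b1 = Sxy/Sxx and the intercept is b0 = ȳ − b1·x̄. The sketch below computes those closed-form estimates and evaluates the part b equation at one X value. The function names `fit_line` and `predict`, the toy data, and the value X = 100 are hypothetical illustrations, not the assignment's data or software.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope for regressing y on x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sxy and Sxx: sums of cross-products and squares about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Toy data lying exactly on y = 3 + 2x (hypothetical, not the assignment data)
b0, b1 = fit_line([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
print(b0, b1)  # 3.0 2.0

# Part b's fitted equation evaluated at a hypothetical gestation period X = 100
print(predict(8.02256, 0.02194, 100))
```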
c.
i. The line appears to fit reasonably well, although a few points fall
outside the 95% confidence band. Apart from those points, the line fits the
data well.
d.
i. In these two plots the residuals appear to meet the conditions for the
regression model. The residuals are centered at zero, they are scattered
across the top plot with no clustering or pattern, and the bottom plot is
consistent with normally distributed errors. These features are somewhat
difficult to judge with so few observations, so collecting more data points
would make the assessment more reliable.
2.
a.
i. There is a clear positive relationship between temperature and pressure
that looks exponential: as temperature increases, pressure increases at an
increasingly rapid rate.
b.
i. Ŷ = -1956.25845 + 6.68555X
c.
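i. These plots show that the conditions for the regression model are not
met. The residuals in the top plot follow a clear pattern instead of being
randomly scattered, which indicates that a straight line does not capture
the relationship and that the residual variance is not constant (the errors
are not homoscedastic). The bottom plot also shows slight curvature,
suggesting the errors are not quite normal.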
3.
a.
i. The top plot is the scatter plot of passer rating vs. completion
percentage, and the bottom plot is passer rating vs. passing touchdowns.
b.
i. Based on visual inspection, I believe completion percentage would give
the stronger fit. Five of the data points appear to lie almost exactly
where the line of best fit would be, and the points overall seem more
tightly concentrated around that line.
c.
i. Ŷ = -24.07583 + 1.80263X
d.
i. These two plots suggest that the regression model in part c is
adequate. Because the top plot shows no pattern or clustering, the error
terms appear to be independent of each other and scattered about 0. The
bottom plot, although it shows slight curvature, is close to linear,
suggesting that the error terms are approximately normally distributed.
4.
a.
i. Ŷ = 1.07632 + 0.73926X
b.
i. The line appears to be a very good fit. The data points are tightly
concentrated along the line, and few stray far from it.
c.
i. These two plots suggest that the assumptions hold: the residuals in the
top plot are randomly scattered with no pattern, and although the bottom
plot shows slight curvature, the residuals appear approximately normal.
d.
i. Comparing this graph with the line of best fit graph, month ten is
clearly one of the two outlying months. This makes sense because in that
month the NY price is lower than the NE price, which is typically not the
case. The other outlier is month two, where the NY price is much higher
than the predicted value.
e.
i. With the two outliers removed, the regression line fits nearly
perfectly: almost all of the data points fall on the line, and the residual
plots are consistent with the model assumptions.