Sta 250 2022 Session 2
INSTRUCTIONS TO CANDIDATES
2. Answer ALL questions in the Answer Booklet. Start each answer on a new page.
3. Do not bring any material into the examination room unless permission is given by the invigilator.
4. Please check to make sure that this examination pack consists of:
QUESTION 1
Anna and her team conducted a study to determine how the amount of life insurance a person
holds depends on that person's income. They collected information on twelve persons. The
following table lists the annual incomes (in RM '000) and the amounts of the life insurance
policies (in RM '000) for these twelve persons.
Annual income    35   84   36   48   71   86   42   47   79   66   34   80
Life insurance   75  500  100  150  350  550  120  180  380  300   70  400
a) Plot a scatter diagram for the above data. What conclusion can you make from the plot?
(5 marks)
b) Compute the Pearson correlation coefficient and interpret the value obtained.
(5 marks)
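As a way of checking a hand calculation for part b), the following is a minimal Python sketch of the Pearson correlation coefficient, r = Sxy / sqrt(Sxx * Syy), applied to the table above. The function name `pearson_r` and the variable names are illustrative choices, not part of the paper.

```python
import math

# Data copied from the Question 1 table (both variables in RM '000).
income = [35, 84, 36, 48, 71, 86, 42, 47, 79, 66, 34, 80]
insurance = [75, 500, 100, 150, 350, 550, 120, 180, 380, 300, 70, 400]


def pearson_r(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(income, insurance)
print(f"r = {r:.3f}")
```

A value of r close to +1 would indicate a strong positive linear relationship between annual income and the amount of life insurance.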
QUESTION 2
a) In the long jump, it is believed that the distance jumped depends on the height of the
players. To test this belief, a long jump coach selected a sample of ten players and measured
their heights. The information gathered is shown in the following table.
Fit a simple linear regression equation to the above data using the least squares method.
(5 Marks)
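The least squares estimates are b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. The sketch below shows one way to compute them in Python. Because the height and distance table for this question is not reproduced here, the sketch reuses the Question 1 data purely for illustration; the function name `least_squares_fit` is a hypothetical helper, not something defined in the paper.

```python
# Illustrative data only: the Question 1 table (RM '000), used because the
# height/distance table for this question is not reproduced in this paper.
x = [35, 84, 36, 48, 71, 86, 42, 47, 79, 66, 34, 80]
y = [75, 500, 100, 150, 350, 550, 120, 180, 380, 300, 70, 400]


def least_squares_fit(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1


b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```

The same function can be applied to the ten height and distance pairs once they are available, with height as `x` and distance as `y`.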
b) A statistics diagnostic test is given to all new students taking a Statistics course at a college.
A lecturer is interested in studying the relationship between the statistics diagnostic test score
and the mark the student scores in the Statistics final examination. The data were
collected from thirty students, and the following SPSS output was obtained.
ii) Estimate the coefficient of determination and interpret the value obtained.
(3 Marks)
iv) Does the statistics diagnostic test score affect the final grades? Test the variable at
a 5% significance level.
(4 Marks)
QUESTION 3
A researcher conducted a study to determine whether the weight loss (Y, in kilograms) of a
particular compound depends on the amount of time (X, in hours) the compound has been
exposed to air. A simple linear regression model has been fitted, and the output is given
below.
a) Based on the graph provided, justify whether the independence and normality
assumptions of error terms are violated or not.
(4 marks)
b) Plot the residuals against the predicted values. Hence, draw your conclusion.
(4 marks)
c) Perform the lack of fit test to determine the linearity of the regression function at a 5% level
of significance.
(12 marks)
QUESTION 4
The researcher conducted a study to predict the hours per week a husband spends on
housework. The information of twelve families included:
A multiple regression model was fitted to these variables, and the results of the
analysis are given in the following tables.
Coefficientsa
a. Dependent Variable: Y
f) Using the p-value approach, identify whether the number of children in the family and
the husband's years of education are significant factors in the model. Use a 5% significance
level.
(6 marks)
g) Does this model have a multicollinearity problem? Give a reason to support your answer.
Suggest one approach to remedy the multicollinearity problem.
(3 marks)
QUESTION 5
Using the forward selection method, the first variable to enter the model at stage one of the
procedure is X2. Continue the procedure to obtain the most appropriate final model. Use a 5%
significance level.
(6 marks)
QUESTION 6
A health researcher wants to predict VOmax, an indicator of fitness and health. The researcher
recruited 35 participants to perform a VOmax test. The researcher's goal is to predict VOmax
based on four factors: age, weight, heart rate and gender (male = 1). The SPSS output is
given below.
Model    Unstandardized Coefficients    Standardized Coefficients    t
         B          Std. Error          Beta
a) It was claimed that male participants have higher VOmax scores than female participants.
Test this claim at a 5% significance level.
(4 marks)
b) The researcher wants to include the factor 'type of fitness' in the model. 'Type of fitness'
is categorized as running, swimming or cycling. Describe how the researcher would
incorporate the new variable in the regression model.
(2 marks)
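One common way to bring a three-level categorical factor into a regression model is to code it as k - 1 = 2 indicator (dummy) variables, with the remaining level acting as the reference category. The sketch below illustrates that coding in Python. The choice of "running" as the reference category and the names `dummy_code`, `is_swimming` and `is_cycling` are assumptions made for illustration only.

```python
# Category labels come from the question text; using "running" as the
# reference (baseline) category is an assumption for illustration.
CATEGORIES = ["running", "swimming", "cycling"]
REFERENCE = "running"


def dummy_code(value):
    """Map one 'type of fitness' value to k - 1 = 2 indicator variables."""
    if value not in CATEGORIES:
        raise ValueError(f"unknown type of fitness: {value!r}")
    return {
        f"is_{level}": int(value == level)
        for level in CATEGORIES
        if level != REFERENCE
    }


for level in CATEGORIES:
    print(level, dummy_code(level))
```

Under this coding the reference category is the row where both indicators are 0, and each indicator's coefficient is read as the difference in mean response from that reference level, holding the other predictors fixed.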
QUESTION 7
Puan Nora, a real estate negotiator, believes that the price of a house depends on the
size of the house (X1, in square feet) and the tax imposed by the government (X2, in
RM). She recorded the prices (in RM '00) of 117 houses sold by her agency over the last five
years. A second-order model with interaction was then obtained as shown below.
ANOVA (a)
Model            Sum of Squares    df     Mean Square    F         Sig.
1  Regression    13187947.582      5      2637589.516    81.303    .000 (b)
   Residual      3600999.204       111    32441.434
   Total         16788946.786      116
a. Dependent Variable: Y
b. Predictors: (Constant), X1X2, X1^2, X2^2, X1, X2
Coefficients (a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B             Std. Error       Beta                         t         Sig.
1  (Constant)    -197.718      159.256                                       -1.242    .217
   X1            1.072         .183             1.476                        5.855     .000
   X2            -.406         .216             -.877                        -1.884    .062
   X1^2          .001          .000             -1.286                       -4.107    .000
   X2^2          -2.078E-5     .000             -.377                        -2.304    .083
   X1X2          .011          .000             1.578                        3.513     .001
a. Dependent Variable: Y
b) Does the quadratic term for the size of the house affect the house price? Use the p-value
approach at a 5% significance level.
(3 marks)
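The p-value approach compares each term's Sig. value with the chosen significance level and rejects H0: beta_j = 0 when Sig. < 0.05. The sketch below applies that rule to the Sig. column of the Coefficients table. The squared terms are written `X1^2` and `X2^2` here, and the helper `is_significant` is a hypothetical name used for illustration.

```python
ALPHA = 0.05  # 5% significance level stated in the question

# Sig. values copied from the Coefficients table. SPSS prints very small
# p-values as .000, so 0.000 here means "less than 0.0005".
p_values = {
    "X1": 0.000,
    "X2": 0.062,
    "X1^2": 0.000,
    "X2^2": 0.083,
    "X1X2": 0.001,
}


def is_significant(p, alpha=ALPHA):
    """Reject H0: beta_j = 0 when the p-value is below alpha."""
    return p < alpha


for term, p in p_values.items():
    decision = "reject H0" if is_significant(p) else "do not reject H0"
    print(f"{term:5s} Sig. = {p:.3f} -> {decision}")
```

For part b) the relevant row is the squared size term, whose tabulated Sig. value is .000.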
d) Write the final model based on the result obtained in b) and c).
(2 marks)