STAT 301 Lab 11: Multiple Regression
Subject: Mathematics / Statistics
Open the lab dataset in SPSS, either from Blackboard Learn or from a copy you saved previously.
This dataset contains the data for 96 subjects. The following are the variables of interest:
Sleep_Duration = hours of sleep on a school night
OtherActivitiesHours = hours spent on other activities per week
MessagingDuration = total hours spent messaging (before lights out + after lights out)
GPA = grade point average
We are interested in determining whether the subjects’ reported hours of sleep (Sleep_Duration) can be
predicted from the other variables above. To do this, we will use the data set to determine the best
multiple regression model, check all regression assumptions, and report the results. Use a 5%
significance level for this entire problem set.
1. (3 points) In SPSS, create a correlation table and find the correlation of Sleep_Duration with
each of the other 3 variables. Organize the variables (name, r) in the chart below from the
strongest to the weakest correlation with Sleep_Duration. Put a star (*) above all the variables
which have significant correlations with Sleep_Duration.
strongest  ________  ________  ________  weakest
Note: Although there are multiple ways to choose the best multiple regression model, we will start
with the full model and delete variables one at a time. Make sure to print out your ANOVA tables,
model summary and coefficients tables for each model in questions 2-3.
2. (4 points) Perform a regression of Sleep_Duration using the 3 explanatory variables.
a) Complete the table below regarding the regression line:
Model: R² = ________, Standard error = ________
ANOVA table: F-test statistic = ________, P-value for F-test = ________
b) Complete the table below regarding the coefficients information:
Explanatory variables in model | Is the variable significant? Answer yes or no and list the corresponding p-value.
OtherActivitiesHours | ________
MessagingDuration | ________
GPA | ________
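The quantities requested in question 2 all follow from the least-squares fit: R² = 1 - SSE/SST, the standard error is sqrt(MSE), F = MSR/MSE, and each coefficient's t statistic is the estimate divided by its standard error. The following is a rough pure-Python sketch under stated assumptions, not SPSS's own procedure: the data are randomly generated placeholders, and the function names `invert` and `ols` are hypothetical. Because every coefficient shares the same residual degrees of freedom, the predictor with the smallest |t| is also the one with the largest p-value.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept."""
    n, k = len(y), len(columns)
    p = k + 1  # number of coefficients, including the intercept
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - p)
    return {
        "coef": beta,                      # intercept first, then slopes
        "r2": 1 - sse / sst,
        "std_error": math.sqrt(mse),       # standard error of the estimate
        "F": ((sst - sse) / k) / mse,      # MSR / MSE
        "t": [b / math.sqrt(mse * inv[j][j]) for j, b in enumerate(beta)],
    }


# Made-up data standing in for the lab file; NOT the real dataset.
random.seed(11)
n = 96
other = [random.uniform(0, 20) for _ in range(n)]
messaging = [random.uniform(0, 4) for _ in range(n)]
gpa = [random.uniform(1, 4) for _ in range(n)]
sleep = [8 - 0.5 * m + 0.1 * g + random.gauss(0, 1)
         for m, g in zip(messaging, gpa)]

names = ["OtherActivitiesHours", "MessagingDuration", "GPA"]
fit = ols([other, messaging, gpa], sleep)
print(f"R2 = {fit['r2']:.3f}, s = {fit['std_error']:.3f}, F = {fit['F']:.2f}")

# Smallest |t| among the predictors (index 0 is the intercept, so skip it).
weakest = min(range(len(names)), key=lambda j: abs(fit["t"][j + 1]))
print("least significant predictor:", names[weakest])
```

The normal-equations approach is used here only because it needs no external libraries; the SPSS output remains the source for the values you report.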
3. (4 points) Now drop the least significant variable and re-run the regression.
a) Complete the tables below:
Model: R² = ________, Standard error = ________
ANOVA table: F-test statistic = ________, P-value for F-test = ________
Explanatory variables in model | Is the variable significant? Answer yes or no and list the corresponding p-value.
b) Was dropping that variable a good change? Explain why or why not.
4. (2 points) Based on your answer to question 3b, state the regression equation for the ‘better’
model. Do not forget the proper notation.
5. (2 points) Using the regression equation from question 4, predict the sleep duration for the 12th
grade male who reported a GPA of 1.073. In the dataset, find his explanatory variables’ values
and use them with your regression equation to find this number. Show your work and round to
3 decimals. (Hint: see the information for the subject with SubjectID=16 in the dataset.)
6. (2 points) What is the residual for the prediction in the above problem? Show your work.
7. (3 points) Even if the model used in the previous questions was a ‘better’ model, does the
model still have some variables that are not significant? If so, which ones?
Note: Remember that an analysis is not complete unless all assumptions are checked. Although not
performed in this lab, be sure to always check the Normal probability plot and residual plots for
each explanatory variable in order to determine whether the analysis is valid.
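A residual is the observed value minus the predicted value, and a Normal probability plot pairs the sorted residuals with the scores expected from a standard Normal distribution. As an illustration only, the Python sketch below computes both from a handful of invented observed and predicted sleep durations. The numbers, the function names, and the plotting position (i - 0.5)/n are assumptions made for this example; statistical packages may use slightly different plotting positions.

```python
from statistics import NormalDist


def residuals(observed, predicted):
    """Residual = observed value minus predicted value, one per subject."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def normal_plot_points(resid):
    """Pair each sorted residual with its expected standard-Normal score."""
    n = len(resid)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common plotting position; others exist.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, sorted(resid)))


# Invented values for illustration; NOT taken from the lab dataset.
observed = [7.5, 6.0, 8.2, 5.9, 7.1, 6.6]
predicted = [7.1, 6.4, 7.8, 6.3, 7.0, 6.9]

resid = residuals(observed, predicted)
points = normal_plot_points(resid)
for z, e in points:
    print(f"normal score {z:6.3f}   residual {e:6.3f}")
```

If the points fall close to a straight line when plotted, the Normality assumption looks reasonable; plotting the same residuals against each explanatory variable should show no pattern.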