Statistics - Data Analysis #5, ST314

Subject: Mathematics / Statistics

Total: 40 points
Download, complete, and upload this document as a PDF or Word document in Canvas by the due date.
No other format will be accepted. Typed or handwritten answers are accepted as long as the solutions are
neat and the document is uploaded as a PDF. Give your solutions in the space provided.
This analysis covers material from the Week 8 and 9 course materials, Chapters 12 and 13 in the text, and the R code provided in the
instructions on Canvas.

Jager Name__________________________

Part I (22 points)

This analysis is based on a famous dataset from Sir Francis Galton (1888). Galton, whose work spanned many fields,
was a statistician around the turn of the twentieth century and a pioneer in the field of
bivariate regression. This dataset helped Galton formalize the idea of correlation between two quantitative
variables. Specifically, he was looking at the relationship between parents’ heights and their children’s heights.
Below is a multiple regression analysis that looks at how the parents’ height and the child’s gender help predict the
child’s adult height.
midparentHeight = combination of the parents’ heights, (father + 1.08*mother)/2
gender = 1 for male, 0 for female
childHeight = child’s adult height in inches
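As a quick check of the definitions above, the hypothetical helpers below compute midparentHeight from a father’s and a mother’s height and apply the 0/1 gender coding. The function names and the example heights are made up for illustration; they are not part of Galton’s data or the course R code.

```python
"""Hypothetical helpers illustrating the Part I variable definitions."""


def midparent_height(father_in: float, mother_in: float) -> float:
    """Combine the parents' heights (inches) as (father + 1.08 * mother) / 2."""
    return (father_in + 1.08 * mother_in) / 2


def gender_code(gender: str) -> int:
    """Indicator (dummy) coding used in the model: male -> 1, female -> 0."""
    codes = {"male": 1, "female": 0}
    return codes[gender.lower()]


if __name__ == "__main__":
    # Made-up parents: father 70 in, mother 64 in (not from Galton's data).
    print(round(midparent_height(70, 64), 2))  # 69.56
    print(gender_code("Female"))               # 0
```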
Use this information to answer the following questions.

Q1. (3 points) Describe the relationship among the variables according to the
scatterplot. Be thorough and include context. Are there any specific features you notice?

Q2. Use the ANOVA table to answer the following:
a. (2 points) Calculate R². Show work. How much of the variability in the response do the explanatory
variables help explain?
b. Conduct the model utility F test at a significance level of 0.05.
(2 points) State the null and alternative hypotheses.
(0.5 points) State the F statistic, its numerator and denominator degrees of freedom,
and the p-value.
(2.5 points) What can you conclude from the model utility F test? Make a statement comparing
the p-value to the significance level and describing how strong the evidence is in favor of the alternative
hypothesis. Include context.

Q3. (2 points) Based on the residual plot, are the conditions for inference met? Why or why not? Are there any
interesting features?

Q4. Using the regression model output:
a. (2 points) From the software output, state the estimated least squares regression model:
$\hat{\mu}_{Y \mid x_1, x_2} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
b. (2 points) According to the individual t tests, which variables, if any, are significant predictors of the
response? State the test statistic and p-value for each variable.
c. (2 points) Interpret the coefficient of gender.
d. (2 points) Interpret the coefficient of midparentHeight.
e. (2 points) Based on the least squares regression equation, predict the height of a female child whose
parents had a midparentHeight of 70 inches.
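For Part I, the quantities in Q2 and Q4 follow from standard formulas. The block below is written for a model with k = 2 explanatory variables fit to n observations; SSR, SSE, SST, MSR, and MSE are the usual ANOVA-table entries and are assumed to match the labels in the output, and treating $x_1$ as midparentHeight and $x_2$ as gender is an assumption made for illustration. Because gender is coded 0/1, the fitted model splits into two parallel lines that share the slope $\hat{\beta}_1$ and differ in intercept by $\hat{\beta}_2$.

```latex
\begin{aligned}
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSR + SSE \\[4pt]
F &= \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}, \qquad \text{df} = (k,\; n - k - 1) = (2,\; n - 3) \\[4pt]
\text{female } (x_2 = 0):\quad \hat{\mu}_{Y \mid x_1, x_2} &= \hat{\beta}_0 + \hat{\beta}_1 x_1 \\
\text{male } (x_2 = 1):\quad \hat{\mu}_{Y \mid x_1, x_2} &= (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1 x_1 \\[4pt]
\text{female, } x_1 = 70:\quad \hat{y} &= \hat{\beta}_0 + 70\,\hat{\beta}_1
\end{aligned}
```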
Part II (18 points)

Using the student information dataset listed in the instructions for this analysis, choose two
quantitative variables and perform a bivariate regression analysis. A legend describing each variable is
provided in the data analysis instructions. I recommend you follow the steps listed in the SLR
steps handout, the code instructions provided in the course notes, and/or the code examples I have provided in the
Week 8 module. You must use R to complete this analysis. Before you get started, realize that real data can be
ugly, so don’t be surprised if your analysis doesn’t turn out perfectly.
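The correlation coefficient and the least squares line asked for in Q2 and Q3 both come from the same sums of squares. The Python sketch below is only a hand-check under stated assumptions: the data are made up and the function names are hypothetical. The graded analysis must still be done in R, where `cor(x, y)` and `summary(lm(y ~ x))` report these quantities.

```python
"""Sketch: correlation coefficient r and least squares line (made-up data)."""
from math import sqrt


def sums_of_squares(x, y):
    """Return x_bar, y_bar, Sxx, Syy, Sxy for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired with at least 2 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, sxx, syy, sxy


def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    x_bar, y_bar, sxx, _, sxy = sums_of_squares(x, y)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # the line passes through (x_bar, y_bar)
    return b0, b1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
    y = [2, 4, 5, 4, 5]   # hypothetical response values
    print(round(correlation(x, y), 4))  # 0.7746
    print(least_squares_line(x, y))
```

With these made-up values, r ≈ 0.77 and the fitted line is about ŷ = 2.2 + 0.6x; squaring r gives 0.6, the share of the variability in y explained by x.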
Q1. Paste your R code at the end of your analysis. Parts will not be graded if evidence of work is not given.
Code must be in working order.
Q2. Describe the relationship between the two variables.

a. (1 point) Which two variables did you choose? Which variable do you consider to be the explanatory
variable? Which is the response? (Note: It is possible that it does not matter in practice which variable is
which; however, it is important to make the distinction for the rest of this analysis.)
b. (4 points) Make a scatterplot of the two variables; you must include a title and/or axis labels. Paste your
plot and describe the relationship between the two variables.
c. (1 point) Calculate the correlation coefficient r. Based on your value, describe the strength of the relationship
in context.

Q3. (2 points) Using R, calculate the least squares regression line. Provide the R output for the model summary.
State the least squares regression line (model).

Q4. (3 points) From the output, is there evidence that your explanatory variable is a significant predictor of your
response? Use a significance level of 0.05.
a. State the null and alternative hypotheses for the individual t test on the slope.
b. State the test statistic and p-value from the output.
c. Make a conclusion. Include context, a statement in terms of the alternative hypothesis, and whether the null
hypothesis should be rejected at the given significance level.

Q5. (3 points) Interpret the slope of your model and include a 95% confidence interval for $\beta_1$. In context, what
do the slope and its confidence interval tell us? Show work or output.

Q6. (3 points) Plot the residuals from the model. Are the conditions satisfied? Briefly describe the plot and include the
conditions that need to be met. If the conditions do not seem to be met (it’s okay!), please state which are
violated and why.

Q7. (1 point) Do you think your model will do a good job of predicting values of the response from values of
the explanatory variable? Why or why not?
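The slope test, the 95% confidence interval for the slope, and the residual plot in this part all rest on the same fitted line. This Python sketch, using made-up data and a hypothetical function name, computes the slope’s standard error, the t statistic for $H_0: \beta_1 = 0$, the interval $\hat{\beta}_1 \pm t^* \cdot SE(\hat{\beta}_1)$, and the residuals $e_i = y_i - \hat{y}_i$ that a residual plot displays. The Python standard library has no t distribution, so the critical value $t^*$ (df = n - 2) is passed in; in R it can be obtained with `qt(0.975, df = n - 2)`. It is a hand-check only, not a replacement for the required R output.

```python
"""Sketch: slope inference and residuals for simple linear regression (made-up data)."""
from math import sqrt


def slope_inference(x, y, t_crit):
    """Fit y_hat = b0 + b1*x and return slope inference plus residuals.

    t_crit is the t critical value with n - 2 degrees of freedom,
    supplied by the caller (for example from a t table or R's qt()).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals e_i = y_i - y_hat_i: the values shown in a residual plot.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - 2)          # estimate of the error variance, df = n - 2
    se_b1 = sqrt(mse / sxx)      # standard error of the slope
    t_stat = b1 / se_b1          # test statistic for H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": t_stat,
            "ci": ci, "residuals": residuals, "sse": sse}


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    # 3.182 is the 97.5th percentile of the t distribution with df = 3.
    result = slope_inference(x, y, t_crit=3.182)
    print(round(result["t"], 3), [round(v, 3) for v in result["ci"]])
    print([round(e, 2) for e in result["residuals"]])
```

With the example data, df = 3 and t* ≈ 3.182, so the interval is roughly (-0.30, 1.50). It contains 0, which agrees with a t statistic (about 2.12) that falls short of t*.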