Use the dependent variable (labeled Y) and the independent variables

Subject: Mathematics / Statistics
Question

Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in the data file. Use Excel to perform the regression and correlation analysis to answer the following.

Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the “best fit” line. Interpret.
Determine the equation of the “best fit” line, which describes the relationship between the dependent variable and the selected independent variable.
Determine the coefficient of correlation. Interpret.
Determine the coefficient of determination. Interpret.
Test the utility of this regression model. Interpret results, including the p-value.
Based on the findings in Steps 1-5, analyze the ability of the independent variable to predict the designated dependent variable.
Compute the confidence interval for β1 (the population slope) using a 95% confidence level. Interpret this interval.
Using an interval, estimate the average for the dependent variable for a selected value of the independent variable. Interpret this interval.
Using an interval, predict the particular value of the dependent variable for a selected value of the independent variable. Interpret this interval.
What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain.
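
Steps 2 to 4 can be cross-checked outside Excel. The following is a minimal sketch, assuming a small made-up sample (the `calls` and `sales` lists are hypothetical and are not the project data file). It computes the least-squares intercept and slope, the coefficient of correlation r, and the coefficient of determination r squared from the usual sums of squares.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                    # slope of the best fit line
    b0 = y_bar - b1 * x_bar           # intercept
    r = sxy / math.sqrt(sxx * syy)    # coefficient of correlation
    return b0, b1, r, r * r           # r*r is the coefficient of determination

# Hypothetical sample for illustration only, NOT the project data.
calls = [10, 20, 30, 40, 50]
sales = [22, 23, 26, 27, 30]

b0, b1, r, r_squared = fit_line(calls, sales)
print(f"Y = {b0:.3f} + {b1:.3f} X1, r = {r:.4f}, r^2 = {r_squared:.4f}")
```

With the real data, the slope and intercept should agree with the trendline equation Excel shows on the scatterplot, and r squared should agree with the "R Square" value in the Excel regression output.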

In an attempt to improve the model, use a multiple regression model to predict the dependent variable, Y, based on all of the independent variables, X1, X2, and X3.

Using Excel, run the multiple regression analysis using the designated dependent and three independent variables. State the equation for this multiple regression model.
Perform the Global Test for Utility (F-Test). Explain the conclusion.
Perform the t-test on each independent variable. Explain the conclusions and clearly state how the analysis should proceed. In particular, state which independent variables should be kept and which should be discarded. If any independent variables are to be discarded, re-run the multiple regression, including only the significant independent variables, and summarize the results with a discussion of the analysis.
Is this multiple regression model better than the linear model generated in parts 1-10? Explain.
All DeVry University policies are in effect, including the plagiarism policy.
Part C report is due by the end of Week 7.
Part C is worth 100 total points. See grading rubric below.

Summarize your results from Steps 1–14 in a three-page report. The report should explain and interpret the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report and all of the work done in 1–14 (Excel output and interpretations) as an appendix
Format for report:

Summary Report
Points 1–14 should be addressed with appropriate output, graphs, and interpretations. Be sure to number each point 1–14.

Project Part C Instructions:

Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in the data file. Use Excel to perform the regression and correlation analysis to answer the following.

Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the “best fit” line. Interpret.
Determine the equation of the “best fit” line, which describes the relationship between the dependent variable and the selected independent variable.
Determine the coefficient of correlation. Interpret.
Determine the coefficient of determination. Interpret.
Test the utility of this regression model (use a two-tailed test with alpha = .05). Interpret results, including the p-value.
Based on the findings in Steps 1-5, analyze the ability of the independent variable to predict the designated dependent variable.
Compute the confidence interval for β1 (the population slope) using a 95% confidence level. Interpret this interval.
Using an interval, estimate the average for the dependent variable for the selected value of the independent variable. Interpret this interval.
Using an interval, predict the particular value of the dependent variable for the selected value of the independent variable. Interpret this interval.
What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain.
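
The slope test and the three intervals in Steps 5 and 7 to 9 all come from the same standard error of the estimate. The sketch below is one way to compute them by hand, under stated assumptions: the sample is made up (not the project data), the function name `regression_intervals` is hypothetical, and the critical t value is supplied by the caller from a t table (3.182 is the two-tailed 95% value for 3 degrees of freedom, which matches the five-point sample here). The function also flags when the selected X value lies outside the sampled range, which is the situation Step 10 asks about.

```python
import math

def regression_intervals(x, y, x0, t_crit):
    """Slope test and 95%-style intervals for simple linear regression.

    t_crit is the two-tailed critical t value for n - 2 degrees of freedom,
    looked up by the caller (for example from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Standard error of the estimate from the residuals
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(leverage)       # average of Y at x0
    half_pred = t_crit * s * math.sqrt(1 + leverage)   # one particular Y at x0
    t_stat = b1 / se_b1
    return {
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,             # H0: population slope is 0
        "slope_ci": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "mean_ci": (y_hat - half_mean, y_hat + half_mean),
        "pred_int": (y_hat - half_pred, y_hat + half_pred),
        "extrapolating": not (min(x) <= x0 <= max(x)),
    }

# Hypothetical sample for illustration only, NOT the project data.
calls = [10, 20, 30, 40, 50]
sales = [22, 23, 26, 27, 30]

result = regression_intervals(calls, sales, x0=35, t_crit=3.182)
for name, value in result.items():
    print(name, value)
```

The prediction interval is always wider than the confidence interval for the average, because it adds the scatter of an individual observation to the uncertainty in the fitted line. Both intervals widen as the selected X value moves away from the sample mean of X.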

In an attempt to improve the model, use the multiple regression model provided at the end of these instructions to predict the dependent variable, Y, based on all of the independent variables, X1, X2, and X3.

Use the provided Minitab multiple regression analysis using the designated dependent and three independent variables. State the equation for this multiple regression model.
Perform the Global Test for Utility (F-Test). Explain the conclusion.
Perform the t-test on each independent variable (use a test with alpha = .05). Explain the conclusions and clearly state how the analysis should proceed. In particular, state which independent variables should be kept and which should be discarded. If any independent variables are to be discarded, re-run the multiple regression, including only the significant independent variables, and summarize the results with a discussion of the analysis.
Is this multiple regression model better than the linear model generated in parts 1-10? Explain.
All DeVry University policies are in effect, including the plagiarism policy.
Part C report is due by Wednesday, February 22nd.
Part C is worth 100 total points.

Summarize your results from Steps 1–14 in a three-page report. The report should explain and interpret the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report and all of the work done in 1–14 (Excel output and interpretations) as an appendix

Minitab Output:

Regression Analysis: Sales (Y) versus Calls (X1), Time (X2), Years (X3)

The regression equation is

Sales (Y) = 19.9 + 0.172 Calls (X1) – 0.131 Time (X2) – 0.256 Years (X3)
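
To illustrate how the equation above is used, the sketch below evaluates it for one set of inputs. The coefficients are the unrounded values from the Minitab coefficient table; the function name `predict_sales` and the input values (150 calls, 15 for Time, 2 for Years) are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the Minitab output (Coef column).
INTERCEPT = 19.936
B_CALLS = 0.17186
B_TIME = -0.1314
B_YEARS = -0.2564

def predict_sales(calls, time, years):
    """Evaluate the fitted multiple regression equation for Sales (Y)."""
    return INTERCEPT + B_CALLS * calls + B_TIME * time + B_YEARS * years

# Hypothetical input values, not taken from the data file.
predicted = predict_sales(calls=150, time=15, years=2)
print(f"Predicted Sales (Y): {predicted:.2f}")
```

Each coefficient is the estimated change in Sales for a one-unit increase in that variable while the other two are held fixed.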

Predictor       Coef       SE Coef    T        P
Constant        19.936     4.426      4.50     0.000
Calls (X1)      0.17186    0.02017    8.52     0.000
Time (X2)      -0.1314     0.1306    -1.01     0.317
Years (X3)     -0.2564     0.2927    -0.88     0.383
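
The T column in this table is each coefficient divided by its standard error, and the keep-or-discard decision in Step 13 compares each P value with alpha = .05. The sketch below reproduces both from the numbers printed above; it is a check on the printed output rather than a re-run of the regression, and the variable names are chosen for illustration.

```python
# (name, coefficient, standard error, p-value) copied from the Minitab output
predictors = [
    ("Constant", 19.936, 4.426, 0.000),
    ("Calls (X1)", 0.17186, 0.02017, 0.000),
    ("Time (X2)", -0.1314, 0.1306, 0.317),
    ("Years (X3)", -0.2564, 0.2927, 0.383),
]
ALPHA = 0.05

# t ratio = coefficient / standard error
t_ratios = {name: coef / se for name, coef, se, _ in predictors}

# An independent variable is kept when its p-value is below alpha.
keep = [name for name, _, _, p in predictors
        if name != "Constant" and p < ALPHA]
discard = [name for name, _, _, p in predictors
           if name != "Constant" and p >= ALPHA]

for name, t in t_ratios.items():
    print(f"{name:<11} t = {t:6.2f}")
print("keep:", keep)
print("discard:", discard)
```

On these numbers only Calls (X1) is significant at the .05 level, so Time (X2) and Years (X3) are the candidates for removal before the regression is re-run.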

S = 3.46298    R-Sq = 48.1%    R-Sq(adj) = 46.5%

Analysis of Variance

Source            DF    SS         MS        F        P
Regression         3    1066.18    355.39    29.64    0.000
Residual Error    96    1151.26     11.99
Total             99    2217.44
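
The summary statistics in the Minitab output all follow from the sums of squares and degrees of freedom in the Analysis of Variance table. As a consistency check, the sketch below recomputes the mean squares, the F statistic for the Global Test, R-Sq, R-Sq(adj), and S from the printed values. Small differences in the last digit are expected because the printed sums of squares are rounded.

```python
import math

# Values copied from the Analysis of Variance table
ss_reg, ss_err, ss_tot = 1066.18, 1151.26, 2217.44
df_reg, df_err, df_tot = 3, 96, 99

ms_reg = ss_reg / df_reg           # mean square for regression
ms_err = ss_err / df_err           # mean square error
f_stat = ms_reg / ms_err           # Global Test (F-Test) statistic
r_sq = ss_reg / ss_tot             # coefficient of determination
r_sq_adj = 1 - (ss_err / df_err) / (ss_tot / df_tot)
s = math.sqrt(ms_err)              # standard error of the estimate

print(f"MS regression = {ms_reg:.2f}, MS error = {ms_err:.2f}")
print(f"F = {f_stat:.2f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
print(f"S = {s:.5f}")
```

The F statistic has a P value of 0.000, so the model as a whole is useful at alpha = .05, even though the t-tests show that not every individual predictor is. When comparing this model with the one-variable model, R-Sq(adj) is the fairer measure, because plain R-Sq never decreases when variables are added.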