Business statistics
Hi there,
I need help with my assignment for Business statistics. It is to do with regression analysis and hypothesis testing. A good knowledge of Excel is required for this task.
I would like to know how to work out these questions too.
Data on Various Household Financial Variables for Randomly Selected Households
Household Location Family_Size Ownership Income Utilities Debt Monthly_Payment
1 2 2 1 96709 252 5692 1585
2 2 6 0 77470 216 4267 1314
3 4 3 0 65746 207 2903 383
4 1 1 1 56610 249 3896 1002
5 3 3 0 59185 217 3011 743
6 1 4 0 56441 208 3718 991
7 1 1 1 47969 243 5907 849
8 1 1 1 55487 242 2783 752
9 2 3 1 59947 256 6275 1498
10 1 6 0 68808 222 4845 991
11 1 1 1 53113 251 5267 1163
12 4 3 0 48319 209 2256 619
13 1 1 1 48064 250 3918 1434
14 4 2 0 50309 190 879 997
15 4 4 1 25853 254 4606 948
16 1 3 1 49728 272 5427 1378
17 3 2 0 79123 201 3482 902
18 1 4 1 46265 256 6658 1626
19 1 2 1 49809 253 4234 834
20 4 4 1 43640 259 5393 729
21 1 1 1 57395 248 4456 1473
22 3 1 1 52847 245 3999 1038
23 2 2 1 59486 251 2598 933
24 2 4 1 109665 255 6261 903
25 2 1 1 63825 250 5402 1346
26 1 3 0 124842 202 3911 574
27 2 2 1 88270 257 5889 1407
28 4 4 1 33646 264 4275 606
29 2 3 1 112446 254 5737 699
30 1 3 1 56067 264 5538 988
31 4 3 0 31042 214 1852 746
32 1 3 1 84272 264 7020 1510
33 2 4 1 126416 261 7212 1091
34 4 2 0 44395 211 1796 402
35 3 5 0 51533 225 3452 696
36 2 3 1 93507 249 6428 1548
37 1 4 1 75551 258 4469 919
38 2 2 1 58666 245 5516 1669
39 1 3 1 77550 256 5239 1049
40 4 3 1 34760 259 3731 493
41 3 7 1 65830 286 7228 722
42 1 3 1 56375 261 3441 881
43 2 5 1 101743 264 6067 1424
44 1 1 1 29805 253 5462 1161
45 4 1 0 35289 192 1734 457
46 2 5 1 116010 277 6153 1458
47 3 2 0 53739 199 3763 977
48 2 2 1 68806 251 4930 1453
49 2 1 1 66770 257 6025 1658
50 2 1 1 59321 247 5314 1664
51 2 4 1 80712 260 5179 1062
52 2 2 1 93156 253 5768 1660
53 3 5 0 65291 215 2767 699
54 3 5 1 58389 279 5311 573
55 1 3 0 107937 207 3934 748
56 1 2 1 54362 244 4648 1318
57 4 5 0 37741 223 3030 572
58 3 2 1 32848 254 4113 720
59 4 2 0 27791 206 1428 669
60 2 3 1 147155 260 6434 1272
61 3 2 1 75351 247 4428 1084
62 2 3 1 105213 257 6675 1476
63 1 2 0 57571 195 1821 1033
64 1 1 1 43847 244 4264 847
65 3 1 1 39862 250 4809 1472
66 4 2 0 44437 201 3705 520
67 4 2 0 46295 195 3661 501
68 2 3 1 104286 259 7043 1292
69 1 5 0 81276 221 1739 490
70 3 4 1 70304 260 5895 1435
71 2 2 1 66346 247 6584 1914
72 1 4 1 117620 249 6165 1123
73 3 2 1 89772 259 7177 1293
74 4 3 0 63834 212 3949 776
75 2 2 1 151964 246 6309 875
76 2 4 1 101510 254 6956 1534
77 2 4 1 97868 261 7363 1283
78 1 3 1 102142 272 4848 1138
79 3 2 1 33105 249 3634 976
80 1 4 1 62697 251 5235 1302
81 2 1 1 26910 244 3752 1028
82 3 5 1 64085 278 5430 640
83 2 2 1 110645 254 5429 989
84 1 2 1 96192 245 6172 1751
85 3 5 1 48304 265 5131 752
86 4 3 0 68714 207 3187 428
87 1 5 0 118372 218 2615 888
88 1 3 1 93026 254 7415 1469
89 2 2 1 112609 255 4752 943
90 4 3 0 59901 202 983 577
91 4 1 1 27377 254 3233 815
92 2 2 0 78960 199 3285 1134
93 2 3 1 114934 259 4497 1278
94 3 3 1 64753 265 5736 1014
95 2 2 1 64154 259 5048 1088
96 2 2 1 130387 241 6145 1137
97 4 6 1 41335 278 6188 959
98 2 4 1 118377 265 7345 831
99 3 3 0 76648 205 3144 578
100 2 6 1 75512 275 5991 1040
Car demand
Year Sales RP RDI IR
1993 7115274 108 1668 7.9
1994 8676408 112 1728 5.7
1995 9321305 111 1797 5.3
1996 9618508 111 1916 8
1997 7448339 118 1897 10.8
1998 7049843 128 1932 7.9
1999 8606856 136 2001 6.8
2000 9104932 143 2067 6.8
2001 9304247 154 2167 9.11
2002 8316018 166 2212 12.7
2003 6578359 179 2214 15.3
2004 6206688 190 2249 18.9
2005 5756614 198 2262 14.9
2006 6795226 203 2331 10.8
2007 7951786 209 2470 12
2008 8204694 215 2542 9.9
2009 8222475 224 2645 8.3
2010 7080889 233 2676 8.2
ETF2121/ETF5912 Data Analysis in Business
1
ASSIGNMENT 2
Due by Monday 5pm of Week 12
? This assignment comprises 15% of the assessment for ETF2121 and 15% of the
assessment for ETF5912, and it consists of questions on Statistical Inference and
Regression Analysis. The data files for this assignment must be downloaded from
the “Assignments” tab of the ETF2121/ETF5912 Moodle site.
? You must submit a printed “hard copy” of your written work (with an Assignment
Cover Sheet - from the “Assignments” tab on Moodle) by 5pm on the due date.
Submit it to your tutor in your tutorial (or to your tutor’s mailbox, 5th floor H Block).
ENSURE that you
* submit a printed “hard copy” of your assignment to your tutor.
* obtain a receipt for your submitted assignment by uploading it to Moodle.
This receipt will prove that you submitted your work, and that it was on
time. (Go to the “Assignments” tab and click on the “Assignment 2” link to
upload.)
? There is a penalty of 10% of the marks for each day the assignment is overdue.
Extensions beyond the due date will only be allowed in special circumstances.
Contact your lecturer.
? Do not submit your assignment in a folder – stapled pages are easier for the
marker. Save trees! Double-sided printing is encouraged.
? Keep your marked assignment work until after publication of final results for the unit.
? The assignment should be no more than 15 pages. PAGES EXCEEDING THE
PAGE LIMIT WILL NOT BE MARKED.
? Assignments without signed cover sheets WILL NOT BE ACCEPTED.
? For each question
o Use the relevant Excel functions and procedures.
o You may insert text boxes at relevant points in the spreadsheet for any
comments and explanations.
o In point form, give details on how the procedure was implemented. You
may use a text box within the spreadsheet for this or type your details in a
WORD document and insert them at the relevant point among your print
outs. Alternatively, you may cut and paste some of the data and your
output and insert it into the WORD document.
o Do not print out whole spreadsheets, but just sections showing relevant
data and output.
? Some Excel help is available via the “EXCEL” tab on Moodle.
ETF2121/ETF5912 Data Analysis in Business
2
Question 1
Use the relevant Excel functions and procedures to generate your output.
Consider a random sample of 100 households from a middle-class neighbourhood that
was the recent focus of an economic development study conducted by the local
government. Specifically, for each of the 100 households in the sample that is in one of
four local government locations, information was gathered on the gross annual income of
the household (in $) and on each of several other variables. The data are given in the
EXCEL file A2_1. Economic researchers would like to test for existence of a significance
difference between the mean annual income levels of householders in every pair locations
(that is, first and second; first and third; first and fourth; second and third; second and
fourth; third and fourth).
a) Before conducting any hypothesis test on the difference between the various pairs
of mean income levels, perform a test for equal population variances in income
for each pair of locations. Comment on your findings.
b) Given your conclusions in Part (a), perform a test for the existence of a difference
in mean annual income levels in each pair of locations. Comment on your
findings.
Question 2
Use the relevant Excel functions and procedures to generate your output.
Continuing on from Question 1, economic researchers want to understand the relationship
between the size of monthly mortgage or rent payment for household in this particular
neighbourhood and the following set of variables: family size, location of the household
with the neighbourhood, an indication whether those surveyed own or rent their home;
gross annual household income (in $); average monthly expenditure on utilities (in $) and
the total indebtedness (in $ and excluding the value of monthly mortgage or rent
payment) of the household. The data are in the EXCEL file A2_1.
a) Fit a regression model to predict the size of the monthly mortgage or rent
payments using all the explanatory variables. Explain whether multicollinearity is
a problem.
b) Fit the best regression model to predict the size of the monthly mortgage or rent
payments. Explain why this is the best regression model.
c) Comment on the goodness of fit of the model from Part (b).
d) Interpret each of the regression coefficients for the fitted model in Part (b).
e) For the fitted model in Part (b), are there any possible outliers? If there are, take
the necessary action.
f) For the fitted model in Part (b), after taking Part (e) into account, check for
violations of the regression assumptions.
ETF2121/ETF5912 Data Analysis in Business
3
Question 3
Use the relevant Excel functions and procedures to generate your output.
The EXCEL file A2_3 contains the following information for the years 1993 to 2010:
? Domestic Car sales (in thousands)
? Real price index for new car prices (where 1987 = 100 is the base index)
? Real disposable income (in 1992 dollars)
? Interest rate
a) Fit a multiple linear regression model to predict domestic car sales. Analyse the
fitted model.
b) Fit a multiplicative model to predict domestic car sales. Interpret the coefficients.
c) Compare the two estimated models from Parts (a) and (b). Explain which is the
better model.
d) During a year in which the real price index is 250, the interest rate is 12% and the
real disposable income per person is $3,500, and using the multiplicative model,
there is a 5% chance that car sales will be less than what particular value. (Hint:
Use the standard error of prediction and the fact that the errors are approximately
normally distributed)