week 3

I'm having a hard time with week 3. We are using data from week 1.
Attachment Preview:
ID	Sal	Compa	Mid	Age	EES	SER	G	Raise	Deg	Gen1	Gr										
1	58	1.017	57	34	85	8	0	5.7	0	M	E		The ongoing question that the weekly assignments will focus on is:  Are males and females paid the same for equal work (under the Equal Pay Act)?  								
2	27	0.870	31	52	80	7	0	3.9	0	M	B		Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.
3	34	1.096	31	30	75	5	1	3.6	1	F	B										
4	66	1.157	57	42	100	16	0	5.5	1	M	E		The column labels in the  table mean:								
5	47	0.979	48	36	90	16	0	5.7	1	M	D		ID – Employee sample number 			Sal – Salary in thousands     					
6	76	1.134	67	36	70	12	0	4.5	1	M	F		Age – Age in years			EES  – Appraisal rating (Employee evaluation score)					
7	41	1.025	40	32	100	8	1	5.7	1	F	C		SER – Years of service			G – Gender (0 = male, 1 = female)    					
8	23	1.000	23	32	90	9	1	5.8	1	F	A		Mid – salary grade midpoint    			Raise – percent of last raise					
9	77	1.149	67	49	100	10	0	4	1	M	F		Grade – job/pay grade			Deg (0= BS/BA 1 = MS)					
10	22	0.956	23	30	80	7	1	4.7	1	F	A		Gen1 (Male or Female)			Compa - salary divided by midpoint, a measure of salary that removes the impact of grade					
11	23	1.000	23	41	100	19	1	4.8	1	F	A										
12	60	1.052	57	52	95	22	0	4.5	0	M	E		This data should be treated as a sample of employees taken from a company that has about 1,000 								
13	42	1.050	40	30	100	2	1	4.7	0	F	C		employees using a random sampling approach.								
14	24	1.043	23	32	90	12	1	6	1	F	A										
15	24	1.043	23	32	80	8	1	4.9	1	F	A										
16	47	1.175	40	44	90	4	0	5.7	0	M	C		Mac Users: The homework in this course assumes students have Windows Excel, and								
17	69	1.210	57	27	55	3	1	3	1	F	E		can load the Analysis ToolPak into their version of Excel.								
18	36	1.161	31	31	80	11	1	5.6	0	F	B		The Analysis ToolPak has been removed from Excel for Mac, but a free third-party
19	24	1.043	23	32	85	1	0	4.6	1	M	A		tool that can be used (found on a Microsoft Answers site) is:
20	34	1.096	31	44	70	16	1	4.8	0	F	B		http://www.analystsoft.com/en/products/statplusmacle								
21	76	1.134	67	43	95	13	0	6.3	1	M	F		Like the Microsoft site, I cannot guarantee the program, but I do know that
22	57	1.187	48	48	65	6	1	3.8	1	F	D		Statplus is a respected statistical package.				You may use other approaches or tools				
23	23	1.000	23	36	65	6	1	3.3	0	F	A		as desired to complete the assignments.								
24	50	1.041	48	30	75	9	1	3.8	0	F	D										
25	24	1.043	23	41	70	4	0	4	0	M	A										
26	24	1.043	23	22	95	2	1	6.2	0	F	A										
27	40	1.000	40	35	80	7	0	3.9	1	M	C										
28	75	1.119	67	44	95	9	1	4.4	0	F	F										
29	72	1.074	67	52	95	5	0	5.4	0	M	F										
30	49	1.020	48	45	90	18	0	4.3	0	M	D										
31	24	1.043	23	29	60	4	1	3.9	1	F	A										
32	28	0.903	31	25	95	4	0	5.6	0	M	B										
33	64	1.122	57	35	90	9	0	5.5	1	M	E										
34	28	0.903	31	26	80	2	0	4.9	1	M	B										
35	24	1.043	23	23	90	4	1	5.3	0	F	A										
36	23	1.000	23	27	75	3	1	4.3	0	F	A										
37	22	0.956	23	22	95	2	1	6.2	0	F	A										
38	56	0.982	57	45	95	11	0	4.5	0	M	E										
39	35	1.129	31	27	90	6	1	5.5	0	F	B										
40	25	1.086	23	24	90	2	0	6.3	0	M	A										
41	43	1.075	40	25	80	5	0	4.3	0	M	C										
42	24	1.043	23	32	100	8	1	5.7	1	F	A										
43	77	1.149	67	42	95	20	1	5.5	0	F	F										
44	60	1.052	57	45	90	16	0	5.2	1	M	E										
45	55	1.145	48	36	95	8	1	5.2	1	F	D										
46	65	1.140	57	39	75	20	0	3.9	1	M	E										
47	62	1.087	57	37	95	5	0	5.5	1	M	E										
48	65	1.140	57	34	90	11	1	5.3	1	F	E										
49	60	1.052	57	41	95	21	0	6.6	0	M	E										
50	66	1.157	57	38	80	12	0	4.6	0	M	E										


Week 1.	Describing the data.																																				
																																					
																																					
1	Using the Excel Analysis ToolPak function descriptive statistics, generate and show the descriptive statistics for each appropriate variable in the sample data set.																																				
	a.  For which variables in the data set does this function not work correctly?  Why?
																																					
																																					
2	 Sort the data by G or Gen1 (into males and females) and find the mean and standard deviation for each gender for the following variables:
	sal, compa, age, ser and raise.			Use either the descriptive stats function or the Fx functions (average and stdev).
																																					
3	What is the probability for a:																																				
	a.       Randomly selected person being a male in grade E?																																				
	b.      Randomly selected male being in grade E?																																				
	c.     Why are the results different?																																				
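The gap between parts (a) and (b) is the gap between a joint and a conditional probability. A minimal Python sketch, assuming the Gen1 and Gr columns are transcribed from the data table above as strings in ID order (the variable names are illustrative and not part of the assignment):

```python
# Gen1 and Gr columns from the data table, in ID order 1..50.
GENDER = ("MMFMMMFFMF" "FMFFFMFFMF" "MFFFMFMFMM" "FMMMFFFMFM" "MFFMFMMFMM")
GRADE = ("EBBEDFCAFA" "AECAACEBAB" "FDADAACFFD" "ABEBAAAEBA" "CAFEDEEEEE")

employees = list(zip(GENDER, GRADE))
n_total = len(employees)
n_male = sum(1 for g, _ in employees if g == "M")
n_male_e = sum(1 for g, gr in employees if g == "M" and gr == "E")

# (a) joint probability: male AND in grade E, out of everyone in the sample
p_male_and_e = n_male_e / n_total
# (b) conditional probability: in grade E, given that the person is male
p_e_given_male = n_male_e / n_male

print(f"P(male and E) = {p_male_and_e:.2f}")
print(f"P(E | male)   = {p_e_given_male:.2f}")
```

The numerator is the same count in both cases; only the denominator changes (all 50 employees versus the males only), which is why the two answers differ.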
																																					
4	 Find:																																				
a.	 The z score for each male salary, based on only the male salaries.																																				
b.	The z score for each female salary, based on only the female salaries.																																				
c.	The z score for each female compa, based on only the female compa values.																																				
d.	The z score for each male compa, based on only the male compa values.																																				
e.	What do the distributions and spread suggest about male and female salaries?																																				
	Why might we want to use compa to measure salaries between males and females?																																				
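For the z scores, one possible approach outside Excel is sketched below. It assumes the sample standard deviation (what Excel's STDEV returns) and uses only an illustrative subset of the data, the salaries of the first five males by ID; the function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (x - mean) / sample standard deviation for each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, like Excel's STDEV
    return [(x - m) / s for x in values]

# Illustrative subset only: salaries (in thousands) of the first five males by ID.
male_sal_subset = [58, 27, 66, 47, 76]
z = z_scores(male_sal_subset)
for x, zx in zip(male_sal_subset, z):
    print(f"{x:>3} -> z = {zx:+.3f}")
```

To answer the question itself, pass the full list of male (or female) salaries or compa values, so that each group is standardized against its own mean and standard deviation.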
																																					
5	Based on this sample, what conclusions can you make about the issue of male and female pay equality?  																																				
	Are all of the results consistent with your conclusion?  If not, why not?																																				
																																					
																																					
Week 2	Testing means with the t-test														
For questions 2 and 3 below, be sure to list the null and alternate hypothesis statements.  Use .05 for your significance level in making your decisions.															
For full credit, you need to also show the statistical outcomes - either the Excel test result or the calculations you performed.															
															
1	Below are 2 one-sample t-tests comparing male and female average salaries to the overall sample mean.  														
	Based on our sample, how do you interpret the results and what do these results suggest about the population means for male and female salaries?														
	Males				Females										
	Ho: Mean salary = 45				Ho: Mean salary = 45										
	Ha: Mean salary =/= 45				Ha: Mean salary =/= 45										
	Note when performing a one sample test with ANOVA, the second variable (Ho) is listed as the same value for every corresponding value in the data set.														
	t-Test: Two-Sample Assuming Unequal Variances				t-Test: Two-Sample Assuming Unequal Variances										
	Since the Ho variable has Var = 0, variances are unequal; this test defaults to 1 sample t in this situation														
		Male	Ho			Female	Ho								
	Mean	52	45		Mean	38	45								
	Variance	316	0		Variance	334.6666667	0								
	Observations	25	25		Observations	25	25								
	Hypothesized Mean Difference	0			Hypothesized Mean Difference	0									
	df	24			df	24									
	t Stat	1.968903827			t Stat	-1.913206357									
	P(T<=t) >										
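The t Stat values in the output above can be checked by hand from the Mean, Variance, and Observations rows. A small sketch, assuming the usual one-sample formula and the two-tailed critical value 2.064 for df = 24 that the Week 4 notes quote (the function name is illustrative):

```python
import math

def one_sample_t(sample_mean, sample_var, n, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / sqrt(variance / n)."""
    standard_error = math.sqrt(sample_var / n)
    return (sample_mean - hypothesized_mean) / standard_error

t_male = one_sample_t(52, 316, 25, 45)
t_female = one_sample_t(38, 334.6666667, 25, 45)
T_CRIT = 2.064  # two-tailed critical t for df = 24 at alpha = .05

for label, t in (("Male", t_male), ("Female", t_female)):
    decision = "reject Ho" if abs(t) > T_CRIT else "do not reject Ho"
    print(f"{label}: t = {t:.4f} -> {decision}")
```

Both statistics are smaller in absolute value than 2.064, so under these assumptions neither null hypothesis is rejected at the .05 level.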
For questions 3 and 4 below, be sure to list the null and alternate hypothesis statements.  Use .05 for your significance level in making your decisions.																
For full credit, you need to also show the statistical outcomes - either the Excel test result or the calculations you performed.																
																
1.      	Based on the sample data, can the average(mean) salary in the population be the same for each of the grade levels? (Assume equal variance, and use the analysis toolpak function ANOVA.)  															
	Set up the input table/range to use as follows:  Put all of the salary values for each grade under the appropriate grade label.															
	Be sure to include the null and alternate hypotheses along with the statistical test and result.
	A	B	C	D	E	F	Note: Assume equal variances for all grades.									
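If Excel's Anova: Single Factor tool is not available, the same sums of squares can be computed directly. The sketch below assumes the Sal and Gr columns are transcribed from the data table in ID order; it groups salaries by grade (the same layout the input table asks for) and computes the F statistic, leaving the p-value or F crit lookup to Excel or a table.

```python
from collections import defaultdict

# Sal and Gr columns from the data table, in ID order 1..50.
SAL = [58, 27, 34, 66, 47, 76, 41, 23, 77, 22,
       23, 60, 42, 24, 24, 47, 69, 36, 24, 34,
       76, 57, 23, 50, 24, 24, 40, 75, 72, 49,
       24, 28, 64, 28, 24, 23, 22, 56, 35, 25,
       43, 24, 77, 60, 55, 65, 62, 65, 60, 66]
GRADE = ("EBBEDFCAFA" "AECAACEBAB" "FDADAACFFD" "ABEBAAAEBA" "CAFEDEEEEE")

# Put every salary under its grade label.
groups = defaultdict(list)
for sal, grade in zip(SAL, GRADE):
    groups[grade].append(sal)

grand_mean = sum(SAL) / len(SAL)
ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
df_between = len(groups) - 1
df_within = len(SAL) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

for grade in sorted(groups):
    print(grade, groups[grade])
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}, F = {f_stat:.2f}")
```

The printed lists can be pasted under the A to F labels of the input table, and the F value compared with the F crit that Excel reports for df = 5 and 44.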
																
																
2.      	The table and analysis below demonstrate a 2-way ANOVA with replication.  Please interpret the results.															
	Grade															
	Gender	A	B	C	D	E	F									
	M	24	27	40	47	56	76		The salary values were randomly picked for each cell.							
		25	28	47	49	66	77									
	F	22	34	41	50	65	75									
		24	36	42	57	69	77									
																
	Ho: Average salaries are equal for all grades															
	Ha: Average salaries are not equal for all grades															
	Ho: Average salaries by gender are equal															
	Ha: Average salaries by gender are not equal															
	Ho: Interaction is not significant															
	Ha: Interaction is significant															
	Perform analysis:															
	Anova: Two-Factor With Replication															
																
	SUMMARY	A	B	C	D	E	F	Total								
	M															
	Count	2	2	2	2	2	2	12								
	Sum	49	55	87	96	122	153	562								
	Average	24.5	27.5	43.5	48	61	76.5	46.83333333								
	Variance	0.5	0.5	24.5	2	50	0.5	364.5151515								
																
	F															
	Count	2	2	2	2	2	2	12								
	Sum	46	70	83	107	134	152	592								
	Average	23	35	41.5	53.5	67	76	49.33333333								
	Variance	2	2	0.5	24.5	8	2	367.3333333								
																
	Total															
	Count	4	4	4	4	4	4									
	Sum	95	125	170	203	256	305									
	Average	23.75	31.25	42.5	50.75	64	76.25									
	Variance	1.583333333	19.58333333	9.666666667	18.91666667	31.33333333	0.916666667									
																
																
	ANOVA															
	Source of Variation	SS	df	MS	F	P-value	F crit									
	Sample	37.5	1	37.5	3.846153846	0.073483337	4.747225347									
	Columns	7841.833333	5	1568.366667	160.8581197	1.45206E-10	3.105875239		Note: a number with an E after it (E9 or E-6, for example)							
	Interaction	91.5	5	18.3	1.876923077	0.172308261	3.105875239		means we move the decimal point that number of places.							
	Within	117	12	9.75					For example, 1.2E4 becomes 12000; while 4.56E-5 becomes 0.0000456							
																
	Total	8087.833333	23													
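The ANOVA table above can be reproduced from the twelve cells of the salary table, which may help when interpreting each row. A minimal sketch in plain Python, assuming the standard two-factor-with-replication decomposition; the F crit values are copied from the Excel output rather than computed:

```python
# Salary cells from the question 2 table: CELLS[gender][grade] -> two replicates.
GRADES = "ABCDEF"
CELLS = {
    "M": {"A": [24, 25], "B": [27, 28], "C": [40, 47], "D": [47, 49], "E": [56, 66], "F": [76, 77]},
    "F": {"A": [22, 24], "B": [34, 36], "C": [41, 42], "D": [50, 57], "E": [65, 69], "F": [75, 77]},
}

def mean(xs):
    return sum(xs) / len(xs)

rows, cols = list(CELLS), list(GRADES)
reps = 2
all_values = [x for r in rows for c in cols for x in CELLS[r][c]]
grand = mean(all_values)

ss_total = sum((x - grand) ** 2 for x in all_values)
# "Sample" = gender rows, "Columns" = grades, as labeled in the Excel output.
ss_sample = reps * len(cols) * sum(
    (mean([x for c in cols for x in CELLS[r][c]]) - grand) ** 2 for r in rows)
ss_columns = reps * len(rows) * sum(
    (mean([x for r in rows for x in CELLS[r][c]]) - grand) ** 2 for c in cols)
ss_within = sum((x - mean(CELLS[r][c])) ** 2
                for r in rows for c in cols for x in CELLS[r][c])
ss_interaction = ss_total - ss_sample - ss_columns - ss_within

df_sample, df_columns = len(rows) - 1, len(cols) - 1
df_interaction = df_sample * df_columns
df_within = len(rows) * len(cols) * (reps - 1)
ms_within = ss_within / df_within

f_sample = (ss_sample / df_sample) / ms_within
f_columns = (ss_columns / df_columns) / ms_within
f_interaction = (ss_interaction / df_interaction) / ms_within

# F crit values copied from the Excel output above (alpha = .05).
for name, f, f_crit in (("Sample (gender)", f_sample, 4.747225347),
                        ("Columns (grade)", f_columns, 3.105875239),
                        ("Interaction", f_interaction, 3.105875239)):
    verdict = "reject Ho" if f > f_crit else "do not reject Ho"
    print(f"{name}: F = {f:.4f} vs F crit = {f_crit:.4f} -> {verdict}")
```

Comparing each F with its F crit gives the same decision as comparing the P-value column with .05: only the Columns (grade) row exceeds its critical value.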
																
	Do we reject or not reject each of the null hypotheses?  What do your conclusions mean about the population values being tested?															
Interpretation:																
																
																
																
																
3.   	Using our sample results, can we say that the compa values in the population are equal by grade and/or gender, and are independent of each factor?															
	Grade	Be sure to include the null and alternate hypothesis along with the statistical test and result.														
	Gender	A	B	C	D	E	F									
																
																
	Conduct and show the results of a 2-way ANOVA with replication using the completed table above.  The results should look something like those in question 2.															
	Interpret the results. Are the average compas for each gender (listed as sample) equal?  For each grade?  Do grade and gender interaction impact compa values? 															
																
																
4.   	Pick any other variable you are interested in and do a simple 2-way ANOVA without replication.  Why did you pick this variable and what do the results show?															
	Variable name: 		Be sure to include the null and alternate hypothesis along with the statistical test and result.													
	Gender	A	B	C	D	E	F									
	M								Hint: use mean values in the boxes.							
	F															
																
5.  	 Using the results for this week, what are your conclusions about gender equal pay for equal work at this point?
																
Week 4	Confidence Intervals and Chi Square  (Chs 11 - 12)					  Let's look at some other factors that might influence pay.							
For question 3 below, be sure to list the null and alternate hypothesis statements.  Use .05 for your significance level in making your decisions.													
For full credit, you need to also show the statistical outcomes - either the Excel test result or the calculations you performed.													
													
1	One question we might have is whether the distribution of graduate and undergraduate degrees is independent of the employee's grade.
	(Note: this is the same as asking if the degrees are distributed the same way.)												
	Based on the analysis of our sample data (shown below), what is your answer?												
	Ho: The population correlation between grade and degree is 0.
	Ha: The population correlation between grade and degree is > 0												
	Perform analysis:												
OBSERVED	A 	B	C	D	E	F	Total						
COUNT - M or 0	7	5	3	2	5	3	25						
 COUNT - F or 1	8	2	2	3	7	3	25						
total	15	7	5	5	12	6	50						
EXPECTED													
	7.5	3.5	2.5	2.5	6	3	25						
													
	By using either the Excel Chi Square functions or calculating the results directly as the text shows, do we												
	reject or not reject the null hypothesis?  What does your conclusion mean?												
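Calculating the result directly, as the text allows, could look like the sketch below. It assumes the observed counts from the table above and takes the critical value 11.070 (df = 5, alpha = .05) from a standard chi-square table rather than computing it.

```python
# Observed counts by grade (A..F) from the table above; one list per row.
observed = [
    [7, 5, 3, 2, 5, 3],
    [8, 2, 2, 3, 7, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
CHI_CRIT = 11.070  # chi-square critical value for df = 5, alpha = .05 (table value)

print(f"chi-square = {chi_square:.4f}, df = {df}")
print("reject Ho" if chi_square > CHI_CRIT else "do not reject Ho")
```

The first row of `expected` matches the EXPECTED row shown in the table; because both row totals are 25, the second row is identical.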
Interpretation:													
													
2	Using our sample data, we can construct a 95% confidence interval for the population's mean salary for each gender.   												
	Interpret the results.  How do they compare with the findings in the week 2 one sample t-test outcomes (Question 1)?												
	Males	Mean	St error 			Low 	to 	High					
		52	3.658779396			44.44827933		59.55172067		Results are mean +/-2.064*standard error			
	Females	38	3.622754177			30.52263538		45.47736462		2.064 is t value for 95% interval			
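The interval arithmetic in the table (mean +/- 2.064 * standard error) can be sketched as follows. The means and standard errors are the ones listed above; the function name and the overlap check are illustrative additions.

```python
T_95 = 2.064  # t value for a 95% interval with df = 24, as noted in the table

def confidence_interval(sample_mean, standard_error, t_value=T_95):
    """Return (low, high) = mean -/+ t * standard error."""
    margin = t_value * standard_error
    return sample_mean - margin, sample_mean + margin

male_ci = confidence_interval(52, 3.658779396)
female_ci = confidence_interval(38, 3.622754177)

# Two intervals overlap when each one starts before the other ends.
overlap = male_ci[0] <= female_ci[1] and female_ci[0] <= male_ci[1]

print(f"Males:   {male_ci[0]:.4f} to {male_ci[1]:.4f}")
print(f"Females: {female_ci[0]:.4f} to {female_ci[1]:.4f}")
print("Intervals overlap:", overlap)
```

Both intervals contain 45, the overall sample mean used as the hypothesized value in the Week 2 one-sample tests, which is one way to connect the two sets of results.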
													
Interpretation:													
													
													
3	Based on our sample data, can we conclude that males and females are distributed across grades in a similar pattern within the population?  												
													
4	Using our sample data, construct a 95% confidence interval for the population's mean service difference for each gender.    												
	Do they intersect or overlap?  How do these results compare to the findings in week 2, question 2?												
													
5	How do you interpret these results in light of our question about equal pay for equal work?												
													
Week 5 Correlation and Regression																			
For each question involving a statistical test below, list the null and alternate hypothesis statements.  Use .05 for your significance level in making your decisions.																			
For full credit, you need to also show the statistical outcomes - either the Excel test result or the calculations you performed.																			
																			
1	Create a correlation table for the variables in our data set. (Use analysis ToolPak function Correlation.)																		
	a. Interpret the results.  What variables seem to be important in seeing if we pay males and females equally for equal work?																		
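Each cell of the correlation table is a Pearson correlation coefficient (what Excel's CORREL returns for two columns). A rough sketch of that calculation, using only an illustrative subset of the data (Sal and Mid for IDs 1-10); the function name `pearson_r` is hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative subset only: Sal and Mid for IDs 1-10 from the data table.
sal = [58, 27, 34, 66, 47, 76, 41, 23, 77, 22]
mid = [57, 31, 31, 57, 48, 67, 40, 23, 67, 23]
r = pearson_r(sal, mid)
print(f"r(Sal, Mid) on the subset = {r:.3f}")
```

Running the same function over every pair of numeric columns in the full 50-row sample would rebuild the whole table.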
																			
2	Below is a regression analysis for salary being predicted/explained by the other variables in our sample  (Mid,
 	 age, ees, sr, g, raise, and deg variables.) (Note: since salary and compa are different ways of
	 expressing an employee’s salary, we do not want to have both used in the same regression.)
																			
	Ho: The regression equation is not significant.																		
	Ha: The regression equation is significant.																		
	Ho: The regression coefficient for each variable is not significant																		
	Ha: The regression coefficient for each variable is significant																		
																			
	Sal			The analysis used Sal as the y (dependent variable) and															
	SUMMARY OUTPUT			mid, age, ees, sr, g, raise, and deg as the independent 
				variables (entered as a range).															
	Regression Statistics																		
	Multiple R	0.992154976																	
	R Square	0.984371497																	
	Adjusted R Square	0.981766746																	
	Standard Error	2.592776307																	
	Observations	50																	
																			
	ANOVA																		
		df	SS	MS	F	Significance F													
	Regression	7	17783.65546	2540.522209	377.9139269	8.44043E-36													
	Residual	42	282.3445372	6.72248898															
	Total	49	18066																
																			
		Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%										
	Intercept	-4.009	3.775	-1.062	0.294	-11.627	3.609	-11.627	3.609										
	Mid	1.220	0.030	40.674	0.000	1.159	1.280	1.159	1.280										
	Age	0.029	0.067	0.439	0.663	-0.105	0.164	-0.105	0.164										
	EES	-0.096	0.047	-2.020	0.050	-0.191	0.000	-0.191	0.000										
	SR	-0.074	0.084	-0.876	0.386	-0.244	0.096	-0.244	0.096										
	G	2.552	0.847	3.012	0.004	0.842	4.261	0.842	4.261										
	Raise	0.834	0.643	1.299	0.201	-0.462	2.131	-0.462	2.131										
	Deg	1.002	0.744	1.347	0.185	-0.500	2.504	-0.500	2.504										
																			
Interpretation:	 Do you reject or not reject the regression null hypothesis?																		
	Do you reject or not reject the null hypothesis for each variable?																		
	What is the regression equation, using only significant variables if any exist?																		
	What does this result tell us about equal pay for equal work for males and females?
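The Regression Statistics block follows from the ANOVA block of the same output, which can make the interpretation questions easier to work through. A sketch under that assumption, with the sums of squares and p-values copied from the output above; the "borderline" label is an illustrative choice for a p-value that displays as exactly 0.050 after rounding.

```python
import math

# Values copied from the ANOVA block of the regression output above.
ss_regression, df_regression = 17783.65546, 7
ss_residual, df_residual = 282.3445372, 42
ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1

r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
standard_error = math.sqrt(ss_residual / df_residual)

print(f"R Square = {r_square:.6f}, Adjusted R Square = {adjusted_r_square:.6f}")
print(f"F = {f_stat:.4f}, Standard Error = {standard_error:.6f}")

# Coefficient p-values as displayed in the output (rounded to 3 decimals).
p_values = {"Mid": 0.000, "Age": 0.663, "EES": 0.050, "SR": 0.386,
            "G": 0.004, "Raise": 0.201, "Deg": 0.185}
ALPHA = 0.05
for name, p in p_values.items():
    if p < ALPHA:
        label = "significant"
    elif p == ALPHA:
        label = "borderline (rounded value; check the unrounded p-value)"
    else:
        label = "not significant"
    print(f"{name}: p = {p:.3f} -> {label}")
```

Because the displayed p-value for EES is rounded, the decision for that variable is best made from the unrounded Excel cell.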
																			
																			
3	Perform a regression analysis using compa as the dependent variable and the same independent																		
	variables as used in question 2.  Show the result, and interpret your findings by answering the same questions.																		
	Note: be sure to include the appropriate hypothesis statements.																		
																			
4	Based on all of your results to date, is gender a factor in the pay practices of this company?  Why or why not?																		
	Which is the best variable to use in analyzing pay practices - salary or compa?  Why?																		
																			
																			
5	Why did the single factor tests and analysis (such as t and single factor ANOVA tests on salary equality) not provide a complete answer to our salary equality question?																		
	What outcomes in your life or work might benefit from a multiple regression examination rather than a simpler one variable test?