Employee_Salary_Data

See comments at the right of the data set.											
ID	Salary	Compa	Midpoint	Age	Performance Rating	Service	Gender	Raise	Degree	Gender1	Grade
8	23	1.000	23	32	90	9	1	5.8	0	F	A
10	22	0.956	23	30	80	7	1	4.7	0	F	A
11	23	1.000	23	41	100	19	1	4.8	0	F	A
14	24	1.043	23	32	90	12	1	6	0	F	A
15	24	1.043	23	32	80	8	1	4.9	0	F	A
23	23	1.000	23	36	65	6	1	3.3	1	F	A
26	24	1.043	23	22	95	2	1	6.2	1	F	A
31	24	1.043	23	29	60	4	1	3.9	0	F	A
35	24	1.043	23	23	90	4	1	5.3	1	F	A
36	23	1.000	23	27	75	3	1	4.3	1	F	A
37	22	0.956	23	22	95	2	1	6.2	1	F	A
42	24	1.043	23	32	100	8	1	5.7	0	F	A
3	34	1.096	31	30	75	5	1	3.6	0	F	B
18	36	1.161	31	31	80	11	1	5.6	1	F	B
20	34	1.096	31	44	70	16	1	4.8	1	F	B
39	35	1.129	31	27	90	6	1	5.5	1	F	B
7	41	1.025	40	32	100	8	1	5.7	0	F	C
13	42	1.050	40	30	100	2	1	4.7	1	F	C
22	57	1.187	48	48	65	6	1	3.8	0	F	D
24	50	1.041	48	30	75	9	1	3.8	1	F	D
45	55	1.145	48	36	95	8	1	5.2	0	F	D
17	69	1.210	57	27	55	3	1	3	0	F	E
48	65	1.140	57	34	90	11	1	5.3	1	F	E
28	75	1.119	67	44	95	9	1	4.4	1	F	F
43	77	1.149	67	42	95	20	1	5.5	1	F	F
19	24	1.043	23	32	85	1	0	4.6	1	M	A
25	24	1.043	23	41	70	4	0	4	0	M	A
40	25	1.086	23	24	90	2	0	6.3	0	M	A
2	27	0.870	31	52	80	7	0	3.9	0	M	B
32	28	0.903	31	25	95	4	0	5.6	0	M	B
34	28	0.903	31	26	80	2	0	4.9	1	M	B
16	47	1.175	40	44	90	4	0	5.7	0	M	C
27	40	1.000	40	35	80	7	0	3.9	1	M	C
41	43	1.075	40	25	80	5	0	4.3	0	M	C
5	47	0.979	48	36	90	16	0	5.7	1	M	D
30	49	1.020	48	45	90	18	0	4.3	0	M	D
1	58	1.017	57	34	85	8	0	5.7	0	M	E
4	66	1.157	57	42	100	16	0	5.5	1	M	E
12	60	1.052	57	52	95	22	0	4.5	0	M	E
33	64	1.122	57	35	90	9	0	5.5	1	M	E
38	56	0.982	57	45	95	11	0	4.5	0	M	E
44	60	1.052	57	45	90	16	0	5.2	1	M	E
46	65	1.140	57	39	75	20	0	3.9	1	M	E
47	62	1.087	57	37	95	5	0	5.5	1	M	E
49	60	1.052	57	41	95	21	0	6.6	0	M	E
50	66	1.157	57	38	80	12	0	4.6	0	M	E
6	76	1.134	67	36	70	12	0	4.5	1	M	F
9	77	1.149	67	49	100	10	0	4	1	M	F
21	76	1.134	67	43	95	13	0	6.3	1	M	F
29	72	1.074	67	52	95	5	0	5.4	0	M	F

Week 1.	Measurement and Description - chapters 1 and 2																																																							
																																																								
																																																								
1	Measurement issues.  Data, even numerically coded variables, can be one of 4 levels - 																																																							
	nominal, ordinal, interval, or ratio.  It is important to identify which level a variable is, as																																																							
	this impacts the kind of analysis we can do with the data.  For example, descriptive statistics 
	such as means can only be done on interval or ratio level data.																																																							
	Please list under each label, the variables in our data set that belong in each group.																																																							
	Nominal	Ordinal	Interval	Ratio																																																				
																																																								
																																																								
																																																								
																																																								
																																																								
																																																								
b.	For each variable that you did not call ratio, why did you make that decision?																																																							
																																																								
																																																								
																																																								
																																																								
																																																								
2	The first step in analyzing data sets is to find some summary descriptive statistics for key variables.																																																							
	For salary, compa, age, performance rating, and service; find the mean, standard deviation, and range for 3 groups: overall sample, Females, and Males.																																																							
	You can use either the Data Analysis Descriptive Statistics tool or the Fx =average and =stdev functions.
	(The range must be found as the difference between the Fx =max and =min functions.)
	Note: Place data to the right. If you use Descriptive Statistics, place that output to the right as well.
			Salary	Compa	Age	Perf. Rat.	Service																																																	
	Overall	Mean																																																						
		Standard Deviation																																																						
		Range																																																						
	Female	Mean																																																						
		Standard Deviation																																																						
		Range																																																						
	Male	Mean																																																						
		Standard Deviation																																																						
		Range																																																						
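As a cross-check on the Excel functions, here is a minimal Python sketch of the same three statistics for Salary. The salary lists are copied from the data set above and split by the Gender1 column; the helper name `describe` is only an illustration. `statistics.stdev` is the sample standard deviation, the same quantity Excel's =stdev returns.

```python
from statistics import mean, stdev

# Salary column copied from the data set, split by the Gender1 column.
female = [23, 22, 23, 24, 24, 23, 24, 24, 24, 23, 22, 24, 34, 36, 34, 35,
          41, 42, 57, 50, 55, 69, 65, 75, 77]
male = [24, 24, 25, 27, 28, 28, 47, 40, 43, 47, 49, 58, 66, 60, 64, 56,
        60, 65, 62, 60, 66, 76, 77, 76, 72]

def describe(values):
    """Return (mean, sample standard deviation, range) for a list of numbers."""
    return mean(values), stdev(values), max(values) - min(values)

for label, group in [("Overall", female + male), ("Female", female), ("Male", male)]:
    m, s, r = describe(group)
    print(f"{label:8s} mean={m:.2f}  stdev={s:.2f}  range={r}")
```

The same helper can be pointed at the Compa, Age, Performance Rating, and Service columns to fill in the rest of the table.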
																																																								
3	What is the probability for a:							Probability																																																
	a.       Randomly selected person being a male in grade E?																																																							
	b.      Randomly selected male being in grade E?  																																																							
		Note: part b is the same as asking, given a male, what is the probability of being in grade E?
	c.     Why are the results different?																																																							
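Parts a and b differ only in the denominator. Part a is a joint probability, P(male and grade E), taken over everyone in the sample. Part b is a conditional probability, P(grade E | male), taken over males only. A small sketch, with the counts tallied from the Gender1 and Grade columns of the data set:

```python
# Counts of employees by grade, tallied from the Gender1 and Grade columns.
male_by_grade = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 10, "F": 4}
female_by_grade = {"A": 12, "B": 4, "C": 2, "D": 3, "E": 2, "F": 2}

males = sum(male_by_grade.values())
total = males + sum(female_by_grade.values())

# Part a: joint probability, the denominator is everyone in the sample.
p_male_and_e = male_by_grade["E"] / total

# Part b: conditional probability, the denominator is males only.
p_e_given_male = male_by_grade["E"] / males

print(p_male_and_e, p_e_given_male)
```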
																																																								
4	For each group (overall, females, and males) find:								Overall	Female	Male																																													
a.	The value that cuts off the top 1/3 salary in each group.																																																							
b.	The z score for each value:																																																							
c.	The normal curve probability of exceeding this score:																																																							
d.	What is the empirical probability of being at or exceeding this salary value?																																																							
e.	The value that cuts off the top 1/3 compa in each group.																																																							
f.	The z score for each value:																																																							
g.	The normal curve probability of exceeding this score:																																																							
h.	What is the empirical probability of being at or exceeding this compa value?																																																							
i.	How do you interpret the relationship between the data sets?  What do they mean about our equal pay for equal work question?																																																							
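Parts a through d can be sketched in Python as follows, shown here for male salaries only. This is an illustration under one stated assumption: the cutoff is taken as the smallest value among the top ceil(n/3) observations, which may differ slightly from what an Excel percentile function returns. The function name `top_third_summary` is hypothetical.

```python
from math import ceil
from statistics import NormalDist, mean, stdev

def top_third_summary(values):
    """Cutoff for the top 1/3, its z score, and two exceedance probabilities."""
    ordered = sorted(values, reverse=True)
    # Assumption: the cutoff is the smallest value among the top ceil(n/3) observations.
    cutoff = ordered[ceil(len(ordered) / 3) - 1]
    z = (cutoff - mean(values)) / stdev(values)
    normal_p = 1 - NormalDist().cdf(z)  # P(Z > z) under the standard normal curve
    empirical_p = sum(v >= cutoff for v in values) / len(values)
    return cutoff, z, normal_p, empirical_p

# Male salaries copied from the data set.
male_salary = [24, 24, 25, 27, 28, 28, 47, 40, 43, 47, 49, 58, 66, 60, 64, 56,
               60, 65, 62, 60, 66, 76, 77, 76, 72]

cutoff, z, normal_p, empirical_p = top_third_summary(male_salary)
print(f"cutoff={cutoff}  z={z:.3f}  normal P={normal_p:.3f}  empirical P={empirical_p:.2f}")
```

Comparing the normal-curve probability with the empirical one shows how closely the group follows a normal distribution. The same function applies to the compa values in parts e through h.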
																																																								
																																																								
																																																								
5.      	What conclusions can you make about the issue of male and female pay equality?  Are all of the results consistent? 																																																							
	What is the difference between the salary and compa measures of pay?
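The Compa column in the data set is consistent with salary divided by the grade midpoint, so it measures pay relative to grade rather than in absolute terms. The sketch below checks that reading against a few rows copied from the table. The relationship is inferred from the numbers, not stated in the worksheet, and the printed values appear to be cut off at three decimals.

```python
# (salary, midpoint, compa as printed) for IDs 8, 10, 3, 2, and 22 in the data set.
rows = [(23, 23, 1.000), (22, 23, 0.956), (34, 31, 1.096), (27, 31, 0.870), (57, 48, 1.187)]

for salary, midpoint, printed in rows:
    compa = salary / midpoint
    # The printed column looks truncated at three decimals, so compare loosely.
    print(f"{salary}/{midpoint} = {compa:.4f}  (table: {printed:.3f})")
    assert abs(compa - printed) < 0.001
```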
																																																								
																																																								
	Conclusions from looking at salary results:																																																							
																																																								
																																																								
	Conclusions from looking at compa results:																																																							
																																																								
																																																								
	Do both salary measures show the same results?																																																							
																																																								
																																																								
	Can we make any conclusions about equal pay for equal work yet?																																																							
																																																								

Week 2	Testing means																				Q3		
	In questions 2 and 3, be sure to include the null and alternate hypotheses you will be testing.  																	Ho	Female		Male	Female	
	In the first 3 questions use alpha = 0.05 in making your decisions on rejecting or not rejecting the null hypothesis.																	45	34		1.017	1.096	
																		45	41		0.870	1.025	
1	Below are 2 one-sample t-tests comparing male and female average salaries to the overall sample mean.  																	45	23		1.157	1.000	
	(Note: a one-sample t-test in Excel can be performed by selecting the 2-sample unequal variance t-test and making the second variable = Ho value -- see column S)																	45	22		0.979	0.956	
	Based on our sample, how do you interpret the results and what do these results suggest about the population means for male and female average salaries?																	45	23		1.134	1.000	
	Males				Females													45	42		1.149	1.050	
	Ho: Mean salary = 45				Ho: Mean salary = 45													45	24		1.052	1.043	
	Ha: Mean salary =/= 45				Ha: Mean salary =/= 45													45	24		1.175	1.043	
																		45	69		1.043	1.210	
	Note: While both results below are actually from Excel's t-Test: Two-Sample Assuming Unequal Variances, 																	45	36		1.134	1.161	
	having no variance in the Ho variable makes the calculations default to the one-sample t-test outcome - we are tricking Excel into doing a one sample test for us.																	45	34		1.043	1.096	
		Male	Ho			Female	Ho											45	57		1.000	1.187	
	Mean	52	45		Mean	38	45											45	23		1.074	1.000	
	Variance	316	0		Variance	334.6666667	0											45	50		1.020	1.041	
	Observations	25	25		Observations	25	25											45	24		0.903	1.043	
	Hypothesized Mean Difference	0			Hypothesized Mean Difference	0												45	75		1.122	1.119	
	df	24			df	24												45	24		0.903	1.043	
	t Stat	1.968903827			t Stat	-1.913206357												45	24		0.982	1.043	
	P(T<=t) > 0.05?				Is P-value > 0.05?													45	65		1.157	1.140	
	Why do we not reject Ho?				Why do we not reject Ho?																		
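The t Stat values in the output can be reproduced directly from the one-sample formula t = (mean - Ho value) / sqrt(variance / n). A short sketch, using only the means, variances, and counts printed above:

```python
from math import sqrt

def one_sample_t(mean, variance, n, hypothesized):
    """t statistic for Ho: population mean equals `hypothesized`."""
    standard_error = sqrt(variance / n)
    return (mean - hypothesized) / standard_error

# Means, variances, and counts are taken from the Excel output above.
t_male = one_sample_t(52, 316, 25, 45)
t_female = one_sample_t(38, 334.6666667, 25, 45)
print(round(t_male, 4), round(t_female, 4))
```

For df = 24, the two-tailed critical value at alpha = 0.05 is about 2.064 (from a t table), so each result can be judged by comparing its absolute t Stat with that value.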
Interpretation:																							
																							
																							
																							
																							
2	Based on our sample data set, perform a 2-sample t-test to see if the population male and female average salaries could be equal to each other.																						
	(Since we have not yet covered testing for variance equality, assume the data sets have statistically equal variances.)																						
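For reference, the equal-variance (pooled) two-sample t statistic can be sketched as below. The group means, variances, and counts come from the question 1 output; the critical value 2.0106 is a t-table lookup for df = 48 and is hard-coded as an assumption, because the Python standard library has no t distribution. Cohen's d is included as one common effect size measure, and the function name `pooled_t` is hypothetical.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t statistic, its degrees of freedom, and Cohen's d."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    cohens_d = (mean1 - mean2) / sqrt(pooled_var)
    return t, df, cohens_d

# Male and female summaries from the question 1 output.
t, df, d = pooled_t(52, 316, 25, 38, 334.6666667, 25)
T_CRIT = 2.0106  # assumed table value: two-tailed 5% critical t for df = 48
print(f"t = {t:.3f}, df = {df}, |t| > critical value: {abs(t) > T_CRIT}, d = {d:.2f}")
```

Excel's t-Test: Two-Sample Assuming Equal Variances tool reports the same t Stat along with the p-value that this sketch does not compute.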
																							
	Ho: 																						
	Ha: 																						
	Test to use:																						
	Place B43 in Output range box.
																							
 																							
																							
																							
																							
																							
																							
																							
																							
																							
																							
																							
																							
																							
																							
	P-value is:																						
	Is P-value < 0.05?																						
	Reject or do not reject Ho:																						
If the null hypothesis was rejected, what is the effect size value:
	Meaning of effect size measure:																						
																							
	Interpretation:																						
																							
b.	Since the >																															
Interpretation:																																
																																
																																
2	Using our sample data, construct a 95% confidence interval for the mean salary difference between the genders in the population.    																															
	 How does this compare to the findings in week 2, question 2?																															
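A confidence interval for the difference is built as difference ± t value × standard error. The sketch below assumes equal variances (a pooled standard error) and uses a hard-coded t value of 2.0106 for 95% confidence with df = 48, taken from a t table. The group summaries are the ones printed in the week 2, question 1 output.

```python
from math import sqrt

# Group summaries from the week 2, question 1 output.
mean_m, var_m, n_m = 52, 316, 25
mean_f, var_f, n_f = 38, 334.6666667, 25

diff = mean_m - mean_f
# Assumption: equal population variances, so the two sample variances are pooled.
pooled_var = ((n_m - 1) * var_m + (n_f - 1) * var_f) / (n_m + n_f - 2)
std_err = sqrt(pooled_var * (1 / n_m + 1 / n_f))
t_value = 2.0106  # assumed table value for 95% confidence, df = 48

low, high = diff - t_value * std_err, diff + t_value * std_err
print(f"{diff} +/- {t_value * std_err:.2f} -> ({low:.2f}, {high:.2f})")
```

If the interval contains 0, equal population means remain plausible at this confidence level; if it does not, they are not.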
																																
	Difference	St Err.	T value			Low 	to 	High																								
																																
																																
																																
				Yes/No																												
	Can the means be equal?				Why?																											
																																
	How does this compare to the week 2, question 2 result (2-sample t-test)?
																																
a.	Why is using a two sample tool (t-test, confidence interval) a better choice than using 2 one-sample techniques when comparing two samples?																															
																																
																																
3	We found last week that degrees do not impact compa rates within the population.
	This does not mean that degrees are distributed evenly across the grades and genders.
	Do males and females have the same distribution of degrees by grade?
	(Note: while technically the sample size might not be large enough to perform this test, ignore this limitation for this exercise.)																															
																																
	What are the hypothesis statements:																															
	Ho: 																															
	Ha:																															
Note:  You can either use the Excel Chi-related functions or do the calculations manually.																																
	Data input tables - graduate degrees by gender and grade level																															
OBSERVED	A 	B	C	D	E	F	Total		Do manual calculations per cell here (if desired)																							
M Grad									A 	B	C	D	E	F																		
Fem Grad								M Grad																								
Male Und								Fem Grad																								
Female Und								Male Und																								
								Female Und																								
																																
									Sum =																							
EXPECTED																																
M Grad								For this exercise - ignore the requirement for a correction																								
Fem Grad								for expected values less than 5.																								
Male Und																																
Female Und																																
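The OBSERVED and EXPECTED tables can also be tallied programmatically. In this sketch each employee is encoded as one token (Gender1, Degree, Grade) copied from the data set. It assumes that Degree = 1 marks a graduate degree and 0 an undergraduate degree; this excerpt of the worksheet does not state the coding, so check it against the comments beside the data set. Each expected count is row total × column total / grand total.

```python
from collections import Counter

# One token per employee: Gender1, Degree, Grade (copied from the data set).
RECORDS = """
F0A F0A F0A F0A F0A F1A F1A F0A F1A F1A F1A F0A
F0B F1B F1B F1B F0C F1C F0D F1D F0D F0E F1E F1F F1F
M1A M0A M0A M0B M0B M1B M0C M1C M0C M1D M0D
M0E M1E M0E M1E M0E M1E M1E M1E M0E M0E M1F M1F M1F M0F
""".split()

GRADES = "ABCDEF"
# Assumption: Degree = 1 is a graduate degree, Degree = 0 an undergraduate degree.
ROWS = {"M Grad": "M1", "Fem Grad": "F1", "Male Und": "M0", "Female Und": "F0"}

counts = Counter(RECORDS)
observed = {label: [counts[key + g] for g in GRADES] for label, key in ROWS.items()}

n = len(RECORDS)
col_totals = [sum(row[j] for row in observed.values()) for j in range(len(GRADES))]
# Expected count = row total * column total / grand total.
expected = {label: [sum(row) * c / n for c in col_totals] for label, row in observed.items()}

for label in ROWS:
    print(f"{label:11s}", observed[label], [round(e, 2) for e in expected[label]])
```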
																																
																																
																																
																																
																																
Interpretation:																																
				What is the value of the chi square statistic: 																												
				What is the p-value associated with this value: 																												
				Is the p-value <0.05?																												
				Do you reject or not reject the null hypothesis: 																												
				If you rejected the null, what is the Cramer's V correlation:																												
				What does this correlation mean?																												
				What does this decision mean for our equal pay question: 																												
																																
																																
4	Based on our sample data, can we conclude that males and females are distributed across grades in a similar pattern																															
	within the population?																															
																																
	What are the hypothesis statements:																															
	Ho: 																															
	Ha:																															
																																
										Do manual calculations per cell here (if desired)																						
		A 	B	C	D	E	F			A 	B	C	D	E	F																	
	OBS COUNT - m								M																							
	OBS COUNT - f								F																							
																																
									Sum = 																							
	EXPECTED																															
																																
																																
																																
				What is the value of the chi square statistic: 																												
				What is the p-value associated with this value: 																												
				Is the p-value <0.05?																												
				Do you reject or not reject the null hypothesis: 																												
				If you rejected the null, what is the Phi correlation:																												
				What does this correlation mean?																												
																																
				What does this decision mean for our equal pay question: 																												
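The manual calculation for this question can be sketched as follows. The observed counts are tallied from the Gender1 and Grade columns of the data set. The critical value 11.07 (5% level, df = 5) is a chi-square table lookup, hard-coded as an assumption because the Python standard library has no chi-square distribution. The function name `chi_square` is hypothetical.

```python
from math import sqrt

def chi_square(observed):
    """Chi-square statistic, degrees of freedom, and sample size for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return stat, df, n

# Grades A to F, counted from the Gender1 and Grade columns of the data set.
males = [3, 3, 3, 2, 10, 4]
females = [12, 4, 2, 3, 2, 2]

stat, df, n = chi_square([males, females])
CHI2_CRIT = 11.07  # assumed table value: 5% critical value for df = 5
effect = sqrt(stat / n)  # phi; equal to Cramer's V when the table has two rows
print(f"chi-square = {stat:.3f}, df = {df}, "
      f"exceeds critical value: {stat > CHI2_CRIT}, phi = {effect:.3f}")
```

Excel's chi-square functions give the p-value directly; the critical-value comparison here stands in for that step.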
																																
5.      How do you interpret these results in light of our question about equal pay for equal work?																																
																																
Week 5 Correlation and Regression																										
																										
1.    	Create a correlation table for the variables in our data set. (Use analysis ToolPak or StatPlus:mac LE function Correlation.)																									
	a. 	Reviewing the data levels from week 1, what variables can be used in a Pearson's Correlation table (which is what Excel produces)?																								
																										
																										
	b. Place table here (C8 in Output range box):																									
																										
																										
																										
																										
																										
																										
																										
																										
																										
	c.	Using r = approximately .28 as the significant r value (at p = 0.05) for a correlation between 50 values, what variables are
		significantly related to Salary?																								
		To compa?																								
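The .28 threshold in part c follows from the t test for a correlation: with df = n - 2, the smallest significant |r| is t / sqrt(t² + df). The sketch below reproduces that figure and adds a plain Pearson r function for spot-checking individual cells of the Excel table. The critical t of 2.0106 (two-tailed, alpha = 0.05, df = 48) is a table lookup, hard-coded as an assumption, and both function names are hypothetical.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

T_CRIT = 2.0106  # assumed table value: two-tailed 5% critical t for df = 48
print(round(critical_r(T_CRIT, 50), 3))
```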
																										
	d.	Looking at the above correlations - both significant or not - are there any surprises - by that I 
		mean any relationships you expected to be meaningful and are not and vice-versa?																								
																										
	e.	Does this help us answer our equal pay for equal work question?																								
																										
																										
2		Below is a regression analysis for salary being predicted/explained by the other variables in our sample (Midpoint,
		 age, performance rating, service, gender, and degree variables). (Note: since salary and compa are different ways of
		 expressing an employee’s salary, we do not want to have both used in the same regression.)
		Please interpret the findings.
																										
		Ho: The regression equation is not significant.																								
		Ha: The regression equation is significant.																								
		Ho: The regression coefficient for each variable is not significant						  Note: technically we have one for each input variable.																		
		Ha: The regression coefficient for each variable is significant						  Listing it this way to save space.																		
																										
		Sal																								
		SUMMARY OUTPUT																								
																										
		Regression Statistics																								
		Multiple R	0.991559075																							
		R Square	0.983189399																							
		Adjusted R Square	0.980843733																							
		Standard Error	2.657592573																							
		Observations	50																							
																										
		ANOVA																								
			df	SS	MS	F	Significance F																			
		Regression	6	17762.29967	2960.383279	419.1516111	1.81215E-36																			
		Residual	43	303.7003261	7.062798282																					
		Total	49	18066																						
																										
			Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%																
		Intercept	-1.749621212	3.618367658	-0.483538816	0.63116649	-9.046755043	5.547512618	-9.046755043	5.547512618																
		Midpoint	1.216701051	0.031902351	38.13828812	8.66416E-35	1.152363828	1.281038273	1.152363828	1.281038273																
		Age	-0.00462801	0.065197212	-0.070984788	0.943738987	-0.136110719	0.126854699	-0.136110719	0.126854699																
		Performance Rating	-0.056596441	0.034495068	-1.640711097	0.108153182	-0.126162375	0.012969494	-0.126162375	0.012969494
		Service	-0.042500357	0.084336982	-0.503935003	0.616879352	-0.212582091	0.127581377	-0.212582091	0.127581377
		Gender	2.420337212	0.860844318	2.81158528	0.007396619	0.684279192	4.156395232	0.684279192	4.156395232																
		Degree	0.275533414	0.799802305	0.344501901	0.732148119	-1.337421655	1.888488483	-1.337421655	1.888488483																
		Note: since Gender and Degree are expressed as 0 and 1, they are considered dummy variables and can be used in a multiple regression equation.																								
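Several entries in the summary output are simple ratios of other entries, which helps when reading the table: R Square = SS regression / SS total, F = MS regression / MS residual, and each t Stat = coefficient / standard error. A short sketch that recomputes them from the numbers printed above (only three of the coefficients are included, to keep it brief):

```python
# Values copied from the SUMMARY OUTPUT above.
ss_regression, ss_total = 17762.29967, 18066
ms_regression, ms_residual = 2960.383279, 7.062798282

r_square = ss_regression / ss_total   # compare with R Square
f_stat = ms_regression / ms_residual  # compare with F

# (coefficient, standard error) pairs; t Stat = coefficient / standard error.
coefficients = {
    "Midpoint": (1.216701051, 0.031902351),
    "Gender": (2.420337212, 0.860844318),
    "Degree": (0.275533414, 0.799802305),
}
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

print(f"R Square = {r_square:.4f}, F = {f_stat:.2f}")
for name, t in t_stats.items():
    print(f"{name:9s} t = {t:.3f}")
```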
																										
																										
		Interpretation:																								
		For the Regression as a whole:																								
					What is the value of the F statistic: 																					
					What is the p-value associated with this value: 																					
					Is the p-value <0.05?																					
					Do you reject or not reject the null hypothesis: 																					
					What does this decision mean for our equal pay question: 																					
																										
		For each of the coefficients:				Intercept	Midpoint	Age	Perf. Rat.	Service	Gender	Degree														
					What is the coefficient's p-value for each of the variables: 																					
					Is the p-value < 0.05?																					
					Do you reject or not reject each null hypothesis: 																					
					What are the coefficients for the significant variables?																					
					Using only the significant variables, what is the equation?	Salary =																				
					Is gender a significant factor in salary:																					
					If so, who gets paid more with all other things being equal?																					
					How do we know? 																					
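One way to read a dummy-variable coefficient is to predict salary for two employees who are identical except for the Gender code. The sketch below uses the full fitted equation from the output above. The employee profile is hypothetical, and the coding Gender 0 = M, 1 = F is read from the Gender and Gender1 columns of the data set.

```python
# Coefficients copied from the salary regression output above.
COEF = {
    "Intercept": -1.749621212, "Midpoint": 1.216701051, "Age": -0.00462801,
    "Performance Rating": -0.056596441, "Service": -0.042500357,
    "Gender": 2.420337212, "Degree": 0.275533414,
}

def predict_salary(midpoint, age, rating, service, gender, degree):
    """Predicted salary from the full fitted equation (Gender: 0 = M, 1 = F)."""
    return (COEF["Intercept"] + COEF["Midpoint"] * midpoint + COEF["Age"] * age
            + COEF["Performance Rating"] * rating + COEF["Service"] * service
            + COEF["Gender"] * gender + COEF["Degree"] * degree)

# Hypothetical employee profile, identical except for the Gender code.
profile = dict(midpoint=48, age=36, rating=90, service=8, degree=0)
gap = predict_salary(gender=1, **profile) - predict_salary(gender=0, **profile)
print(round(gap, 3))
```

Whatever profile is chosen, the gap equals the Gender coefficient, which is what "all other things being equal" means in a multiple regression.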
																										
																										
3		Perform a regression analysis using compa as the dependent variable and the same independent																								
		variables as used in question 2.  Show the result, and interpret your findings by answering the same questions.																								
		Note: be sure to include the appropriate hypothesis statements.																								
		Regression hypotheses																								
		Ho:																								
		Ha:																								
		Coefficient hypotheses (one to stand for all the separate variables)																								
		Ho:																								
		Ha:																								
		Put C94 in output range box																								
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
																										
		Interpretation:																								
		For the Regression as a whole:																								
					What is the value of the F statistic: 																					
					What is the p-value associated with this value: 																					
					Is the p-value < 0.05?																					
					Do you reject or not reject the null hypothesis: 																					
					What does this decision mean for our equal pay question: 																					
																										
		For each of the coefficients: 				Intercept	Midpoint	Age	Perf. Rat.	Service	Gender	Degree														
					What is the coefficient's p-value for each of the variables: 																					
					Is the p-value < 0.05?																					
					Do you reject or not reject each null hypothesis: 																					
					What are the coefficients for the significant variables?																					
					Using only the significant variables, what is the equation?	Compa = 																				
					Is gender a significant factor in compa:																					
					If so, who gets paid more with all other things being equal?																					
					How do we know? 																					
																										
																										
4		Based on all of your results to date, do we have an answer to the question of whether males and females are paid equally for equal work?
				If so, which gender gets paid more? 																						
				 How do we know?																						
		Which is the best variable to use in analyzing pay practices - salary or compa?  Why?																								
		What is most interesting or surprising about the results we got doing the analysis during the last 5 weeks?																								
																										
																										
																										
5		Why did the single factor tests and analysis (such as t and single factor ANOVA tests on salary equality) not provide a complete answer to our salary equality question?																								
		What outcomes in your life or work might benefit from a multiple regression examination rather than a simpler one variable test?