Pennsylvania ECON 306 – calculations in STATA
ECON 306 – Homework 3
The following two problems will require a lot of calculations in STATA (or in whatever software you choose
to use), and they will generate many pages of output. Here is how you should
organize it. The first pages should contain your answers to all the questions, along with
any key algebraic equations or explanations you need along the way. After
that, include a printout of the output from the regressions you executed in support of your
answers. Highlight any numbers in this output that you used in the first section. (To save paper,
you may print this section double-sided and/or in 2-up format.) Last, include a copy of the DO
file containing the commands you asked STATA to execute. Be sure to organize these in a
way that will be clear to the reader.
1. (52 points total, 4 points each part) With this assignment you will find a STATA data file
called boston.dta. For reference, the variables in this file are:
nox = nitric oxides concentration (parts per 10 million)
rm = average number of rooms per dwelling
age = proportion of owner-occupied units built prior to 1940
dis = weighted distances to five Boston employment centers
ptratio = pupil-teacher ratio by town
lstat = percent lower status of population
medv = median value of owner-occupied homes (in thousands of dollars)
Open this dataset in STATA (the file is in STATA's .dta format). Before you begin answering
the following questions, it's a good idea to have STATA summarize the data using the
summarize command. You should also start a log file to store your results.
a.) Run the following regression: $MEDV = \beta_0 + \beta_1 \cdot RM + u$
b.) Hypothesize the sign of the bias, if any, resulting from excluding age from the
regression. Explain your reasoning. (There is no wrong answer as long as you tell
a sensible story.)
c.) Use the data to verify (or not) your claim from b). Break down the bias into its
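The omitted-variable algebra behind parts (b) and (c) can be sketched as follows. This assumes the standard two-regressor setup. The symbols $\tilde\beta_1$, $\tilde\delta_1$, $u$, and $v$ are our notation, not the assignment's. In STATA, this amounts to three regressions: medv on rm, medv on rm and age, and age on rm.

```latex
\begin{aligned}
\text{Long:}\quad & medv = \beta_0 + \beta_1\, rm + \beta_2\, age + u \\
\text{Auxiliary:}\quad & age = \delta_0 + \delta_1\, rm + v \\
\text{Short:}\quad & \widehat{medv} = \tilde\beta_0 + \tilde\beta_1\, rm \\
\text{Decomposition:}\quad & \tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1
  \quad\Rightarrow\quad \text{bias} = \hat\beta_2\,\tilde\delta_1
\end{aligned}
```

The sign of the bias is the product of two signs:
- the sign of $\hat\beta_2$, the effect of age on medv holding rm fixed;
- the sign of $\tilde\delta_1$, which matches the sign of the sample correlation between rm and age.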
d.) Now, run the regression:
$MEDV = \beta_0 + \beta_1 \cdot NOX + \beta_2 \cdot RM + \beta_3 \cdot AGE + \beta_4 \cdot DIS + \beta_5 \cdot PTRATIO + \beta_6 \cdot LSTAT + u$
e.) At a significance level of $\alpha = .05$, for which values of $i$, if any, would you reject the null
hypothesis that $\beta_i = 0$?
f.) What is the predicted medv with nox=0.5, rm=4, age=60, dis=3, ptratio=20,
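Parts (e) and (f) rest on two standard formulas, sketched here using the coefficient labels of the model in (d).
- **Part (e).** $n$ is the number of observations. The degrees of freedom are $n-7$ because six slopes and an intercept are estimated. Equivalently, reject when the p-value STATA reports in the P>|t| column is below .05.
- **Part (f).** $lstat$ stands for the value specified in part (f), and the prediction is in thousands of dollars.

```latex
\begin{aligned}
t_i &= \frac{\hat\beta_i}{\operatorname{se}(\hat\beta_i)}, \qquad
  \text{reject } H_0\!: \beta_i = 0 \text{ if } |t_i| > t_{0.025,\,n-7} \\
\widehat{medv} &= \hat\beta_0 + 0.5\,\hat\beta_1 + 4\,\hat\beta_2 + 60\,\hat\beta_3
  + 3\,\hat\beta_4 + 20\,\hat\beta_5 + \hat\beta_6\, lstat
\end{aligned}
```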
g.) Redo (f) but with nox=0.6. What is the difference in predicted medv between these
two communities? Compare this with the coefficient of nox.
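For part (g), the two predictions differ only in nox, so every other term cancels. Using the labels of the model in (d), the difference is:

```latex
\widehat{medv}_{(g)} - \widehat{medv}_{(f)}
  = \hat\beta_1\,(0.6 - 0.5) = 0.1\,\hat\beta_1
```

The gap is therefore one-tenth of the nox coefficient (in thousands of dollars), because nox changes by 0.1 units rather than by one unit.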
h.) Ceteris paribus, compared to (f), what is the impact of reducing the pupil-teacher
ratio to 18?
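Part (h) uses the same cancellation logic as part (g). Holding everything else at the values in (f), and with $\hat\beta_5$ the ptratio coefficient from the model in (d), the change is:

```latex
\Delta\widehat{medv} = \hat\beta_5\,(18 - 20) = -2\,\hat\beta_5
```

Since medv is measured in thousands of dollars, multiply by 1,000 to express the impact in dollars.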
i.) What percentage of the variation in medv is explained by the six X-variables?
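The share of variation explained in part (i) is the $R^2$, which STATA prints as R-squared in the header of the regress output. The sums of squares below correspond to the Model, Residual, and Total rows of STATA's ANOVA table.

```latex
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
    = 1 - \frac{SS_{\text{Residual}}}{SS_{\text{Total}}},
\qquad \text{percentage explained} = 100 \times R^2
```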
Now change the measurement of nox. Use the ‘gen’ command:
and then use this in place of nox in the regression command
regress medv noxppm rm age dis ptratio lstat
j.) Compare the coefficient, standard error, and t-ratio for noxppm to that of nox.
Interpret the difference between this model and the previous.
k.) Also compare the and remaining coefficients. Interpret the difference between this model and the original regression model.
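Parts (j) and (k) turn on how OLS responds to rescaling a regressor. This sketch assumes that the gen command defines noxppm by multiplying nox by some positive constant $c$; the symbol $c$ is our notation for whatever factor that command uses.

```latex
\begin{aligned}
& noxppm = c \cdot nox, \quad c > 0 \\
& \beta_1\, nox = \frac{\beta_1}{c}\,(c \cdot nox)
  \;\Rightarrow\;
  \hat\beta_{noxppm} = \frac{\hat\beta_{nox}}{c}, \quad
  \operatorname{se}(\hat\beta_{noxppm}) = \frac{\operatorname{se}(\hat\beta_{nox})}{c}, \quad
  t_{noxppm} = t_{nox}
\end{aligned}
```

Under this assumption, the intercept, the remaining coefficients and their standard errors, and the $R^2$ are unchanged, because the fitted values are identical.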
Now change the variable age to newage
and then use this in place of age in the original regression command. That is,
regress medv nox rm newage dis ptratio lstat
l.) Compare the coefficient, standard error, and t-ratio for newage to that of age.
Interpret the difference between this model and the previous.
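The definition of newage is not reproduced above. As a sketch, suppose it is an affine transformation of age, $newage = a + c \cdot age$ with $c > 0$; the constants $a$ and $c$ are hypothetical. Substituting into the age term of the model in (d) gives:

```latex
\begin{aligned}
\beta_3\, age &= \frac{\beta_3}{c}\, newage - \frac{a\,\beta_3}{c} \\
\Rightarrow\quad \hat\beta_{newage} &= \frac{\hat\beta_{age}}{c}, \qquad
\hat\beta_0^{\,new} = \hat\beta_0 - \frac{a\,\hat\beta_{age}}{c}, \qquad
t_{newage} = t_{age}
\end{aligned}
```

Under this assumption:
- the t-ratio on newage equals the t-ratio on age;
- the other slope coefficients and the $R^2$ do not change;
- a nonzero shift $a$ shows up only in the intercept.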
m.) Also compare the and remaining coefficients. Interpret the difference between this model and the original regression model.
2. (48 points total: 5 points each part, +3 for free.) For the following problem, use the
STATA dataset called crime.dta. This data set was compiled by Christopher Cornwell
and William Trumbull to study factors that influence crime rates. The data set contains
observations for 90 counties in North Carolina for 1981. The definitions of the variables
represented in the data set are:
prbarr=probability of arrest
prbconv=probability of conviction
prbpris=probability of a prison sentence
avgsen=average sentence in days
polpc=number of police per capita
pctymle=percent young males
wmfg=average weekly wage in manufacturing
wcon=average weekly wage in construction
wtuc=average weekly wage in transportation, utilities, and communications
wtrd=average weekly wage in wholesale and retail trade
wfir=average weekly wage in finance, insurance, and real estate
wser=average weekly wage in services
wfed=average weekly wage in federal government
wsta=average weekly wage in state government
wloc=average weekly wage in local government
According to the economic model of crime rates, lower crime rates are associated with
better labor markets (higher wages), more police presence and tougher sentences, and
lower population density. We will use this data set to examine these hypotheses. Use a
significance level of $\alpha = .05$ for all hypothesis tests.
d.) Run a regression of crmrte on all of the other variables. Call this Model 1.
Do any t-statistics indicate that a variable is not statistically significant? If so, which?
Interpret the F-statistic STATA has calculated for Model 1.
Test the hypothesis that the coefficients on wfed and wsta are equal to each other.
Use the t-test method described in the lectures. What transformation do you need to
do here? Be specific.
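Part (d) involves two standard results, sketched below; $k$ is the number of regressors in Model 1 and $\theta$ is our notation.
- **Overall F-statistic.** STATA reports this as F(k, n−k−1) in the regress header. It tests whether all slope coefficients are zero at once.
- **Testing wfed against wsta.** One common transformation substitutes $\beta_{wfed} = \theta + \beta_{wsta}$. You then generate a new variable equal to wfed + wsta and regress crmrte on wfed, that new variable, and every other regressor except wsta. The t-statistic on wfed then tests $\theta = 0$. The lectures may combine the variables the other way around; that version is equivalent.

```latex
\begin{aligned}
F &= \frac{R^2/k}{(1-R^2)/(n-k-1)}
  \qquad (H_0\!: \text{all } k \text{ slope coefficients} = 0) \\[4pt]
\theta &= \beta_{wfed} - \beta_{wsta}
  \;\Rightarrow\; \beta_{wfed} = \theta + \beta_{wsta} \\
\beta_{wfed}\, wfed + \beta_{wsta}\, wsta
  &= \theta\, wfed + \beta_{wsta}\,(wfed + wsta)
\end{aligned}
```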
e.) Test the hypothesis that the coefficients on wfed, wsta and wloc are all equal to
each other. Do this by writing down the formula for the relevant F-statistic.
Calculate it (by running the appropriate restricted regression) and test the hypothesis.
Report these results. This restricted version of the regression will be called Model 2.
f.) Return to Model 1. Now test the hypothesis that the coefficients on pctmin and pctymle both equal
zero. Do this by writing down the formula for the relevant F-statistic. Calculate it
(by running the appropriate restricted regression) and test the hypothesis. Report
these results. This restricted version of the regression will be called Model 3.
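Parts (e) and (f) both use the restricted-versus-unrestricted F-statistic, sketched below. The notation is ours:
- $SSR$ is the residual sum of squares (the Residual row of STATA's ANOVA table);
- $ur$ denotes Model 1 and $r$ the restricted model;
- $k$ is the number of regressors in Model 1 and $q$ the number of restrictions.

Reject $H_0$ when $F$ exceeds the 5% critical value of $F_{q,\,n-k-1}$. The $R^2$ form is valid here because both regressions share the dependent variable crmrte.

```latex
\begin{aligned}
F &= \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
   = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}
   \sim F_{q,\,n-k-1} \\[4pt]
\text{(e)}:&\quad H_0\!: \beta_{wfed} = \beta_{wsta} = \beta_{wloc},\ q = 2,\
  \text{restricted regressor } (wfed + wsta + wloc) \\
\text{(f)}:&\quad H_0\!: \beta_{pctmin} = \beta_{pctymle} = 0,\ q = 2,\
  \text{drop } pctmin \text{ and } pctymle
\end{aligned}
```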
The model could potentially be simplified by replacing all the wage variables with their average. Specifically, let us define
$avgwage = \frac{wmfg + wcon + wtuc + wtrd + wfir + wser + wfed + wsta + wloc}{9}$
Generate this variable.
g.) Return to Model 1 and run it using avgwage in place of the individual wage variables.
Check the validity of this restriction. As before, do this by writing down the formula
for the relevant F-statistic. Calculate it (by running the appropriate restricted
regression) and test the hypothesis. Report these results. This restricted version of the
regression will be called Model 4.
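The restriction that Model 4 imposes in part (g) can be written out explicitly. Here $j$ runs over the nine wage variables and $\gamma$ is our label for the avgwage coefficient. Replacing the wages with avgwage forces all nine wage coefficients to be equal. Nine free coefficients collapse to one, so $q = 9 - 1 = 8$ in the same F formula, with Model 1 as the unrestricted model.

```latex
\begin{aligned}
H_0&\!: \beta_{wmfg} = \beta_{wcon} = \cdots = \beta_{wloc} = \beta \\
\sum_{j} \beta_j\, w_j &= \beta \sum_{j} w_j = 9\beta \cdot avgwage
  \;\Rightarrow\; \gamma = 9\beta, \qquad q = 8
\end{aligned}
```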
h.) Let’s focus our attention on the coefficient on polpc. How do the value of this
coefficient and its statistical significance change as we move from model to model?
To answer this, write down a table containing the results for this coefficient in each
of the four models. In this table, include the coefficient values, the values of the
t-statistic (for the hypothesis that the coefficient equals 0), and whether you would
reject that hypothesis.
i.) What do your results in the last question imply about the relationship between the
number of police and the crime rate? Are you confident in these results based on the
work you have done? Why or why not?