Economics of Discrimination, University of California, Berkeley, Spring 2006, Professor Martha Olney
Thinking about the Iowa data set
The data set contains 347 cases (people) who each answered a series of questions. Some of the responses on the original surveys were blank and were coded as "no response." (Check out http://eh.net/databases/labor/codebooks/io01.asc for a full description of what data were collected for each variable.) Look at the data: sometimes "no response" was probably "no" (especially when the answers were either "no response" or "yes"). Other times "no response" might indicate a case to be omitted from a regression.
Start by asking a seemingly easy question: Did men earn more than women?
The first step: create some summary statistics.
What are the average earnings for men? What are the average earnings for women? What if you distinguish by both occupation and sex? See if you can come up with these results:
| Sample | Average earnings (totear) |
| All, n = 347 | $263.95 |
| All with totear>0, n = 280 | $329.26 |
| Men with totear>0, n = 136 | $414.42 |
| Women with totear>0, n = 144 | $248.83 |
| Teachers with totear>0, n = 260 | $268.90 |
| Principals with totear>0, n = 11 | $1,196.82 |
| Superintendents with totear>0, n = 9 | $1,012.78 |
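One quick sanity check on these figures, sketched in Python (the variable and function names are illustrative, not part of the data set): averages for subgroups, weighted by group size, should reproduce the overall average for the cases with totear > 0. That holds for the split by sex and for the split by occupation.

```python
# Group sizes and average earnings copied from the table above (totear > 0 only).
by_sex = {"men": (136, 414.42), "women": (144, 248.83)}
by_occupation = {
    "teachers": (260, 268.90),
    "principals": (11, 1196.82),
    "superintendents": (9, 1012.78),
}

def pooled_mean(groups):
    """Return (total n, weighted average of the group means)."""
    total_n = sum(n for n, _ in groups.values())
    total_earnings = sum(n * avg for n, avg in groups.values())
    return total_n, total_earnings / total_n

n_sex, mean_sex = pooled_mean(by_sex)
n_occ, mean_occ = pooled_mean(by_occupation)
print(n_sex, round(mean_sex, 2))
print(n_occ, round(mean_occ, 2))
```

Both splits give n = 280 and an average that rounds to $329.26, matching the "All with totear>0" row.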
Create the following dummy variables:
male = 1 if man, 0 if woman
teacher = 1 if teacher, 0 if not
principal = 1 if principal, 0 if not
super = 1 if superintendent, 0 if not
grad = 1 if college grad, 0 if not
You also might want to make the adjustment that I made to one variable:
exper: I changed the value -8 ("first term") to 0 and dropped cases with no response (-9) from regressions that include exper.
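A minimal sketch of these recodes in Python. The raw codings used here (sex stored as "M"/"F", occupation as a text label, college as "yes" or something else) are placeholders invented for illustration; the actual codes are in the codebook. The sketch also assumes that exper is built from the raw TERMS variable. Only the -8 and -9 rules come from the description above.

```python
def recode_exper(terms):
    """Apply the exper adjustment: -8 ("first term") -> 0, -9 (no response) -> None."""
    if terms == -8:
        return 0
    if terms == -9:
        return None  # drop these cases from regressions that include exper
    return terms

def add_dummies(case):
    """Return a copy of one case (a dict) with the 0/1 dummy variables added.

    The raw values tested below ("M", "teacher", "yes", ...) are hypothetical.
    """
    out = dict(case)
    out["male"] = 1 if case["sex"] == "M" else 0
    out["teacher"] = 1 if case["occup"] == "teacher" else 0
    out["principal"] = 1 if case["occup"] == "principal" else 0
    out["super"] = 1 if case["occup"] == "superintendent" else 0
    out["grad"] = 1 if case["college"] == "yes" else 0
    out["exper"] = recode_exper(case["terms"])
    return out

# Two made-up cases, not rows from the Iowa data.
cases = [
    {"sex": "F", "occup": "teacher", "college": "yes", "terms": -8},
    {"sex": "M", "occup": "principal", "college": "no response", "terms": -9},
]
recoded = [add_dummies(c) for c in cases]
usable_for_exper = [c for c in recoded if c["exper"] is not None]
print(len(usable_for_exper))
```

Treating a blank college response as grad = 0 follows the earlier remark that "no response" was probably "no" when the only other answer is "yes"; that is a judgment call, not a rule.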
A second step: look at some plots of the data.
First, graph totear by age.

Hm: got some outliers on age and some 0 values on totear.
Better restrict this to age > 0 and totear > 0.

It looks as if earnings increase with age, vaguely. Is there any difference in the pattern by gender? (0 is female; 1 is male)

The next step: do some regression analysis.
A naive way to start would be to regress earnings (TOTEAR) on sex (SEX). You'll probably get a statistically significant difference. But is it telling you anything? Maybe the men earn more than the women because they are more likely than the women to be principals or superintendents. So, better control for occupation (OCCUP) by including that as an independent variable too. Maybe older workers get paid more, so maybe we should include age (AGE). But be careful: don't include the cases where the age is given as -9. Maybe earnings and age are not linearly related, so better experiment with some non-linear specifications. Maybe experience (TERMS) matters. Maybe having a college education (COLLEGE) increases wages.
See if you can come up with the following regression results.
all, totear > 0, n = 280
adjusted R² = 0.0937
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   165.5895   30.31019     5.46   0.000     105.9228    225.2561
       _cons |   248.8333   21.12414    11.78   0.000     207.2497    290.4169
------------------------------------------------------------------------------
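With a single 0/1 regressor, ordinary least squares just reproduces the two group averages: _cons is the women's average (248.83 in the summary table) and the coefficient on male is the men's average minus the women's (414.42 - 248.83 = 165.59). The sketch below illustrates that property on a handful of made-up earnings figures, not the Iowa data; ols_one_regressor is a helper written for this example.

```python
from statistics import mean

def ols_one_regressor(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up sample: three men (male = 1) and four women (male = 0).
male = [1, 1, 1, 0, 0, 0, 0]
totear = [400.0, 450.0, 380.0, 250.0, 240.0, 260.0, 230.0]

intercept, slope = ols_one_regressor(male, totear)
men_avg = mean(t for m, t in zip(male, totear) if m == 1)
women_avg = mean(t for m, t in zip(male, totear) if m == 0)

print(intercept, women_avg)        # intercept equals the women's average
print(slope, men_avg - women_avg)  # slope equals the difference in averages
```

So this first regression adds a standard error and a t statistic to the comparison of averages, but no new information about why the averages differ.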
all, totear > 0, n=280
adjusted R² = 0.7018
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   87.74763   17.78556     4.93   0.000     52.73505    122.7602
   principal |   880.6714   45.77201    19.24   0.000     790.5648     970.778
       super |   735.6299   49.32827    14.91   0.000     638.5225    832.7374
       _cons |   228.3992   12.19427    18.73   0.000     204.3936    252.4048
------------------------------------------------------------------------------
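The fitted values from this regression are sums of the relevant coefficients, with teachers as the omitted occupation. As a rough sketch (the function and variable names are made up for this example), the code below computes predicted earnings for a few combinations and the share of the raw male-female gap that disappears once occupation is held constant.

```python
# Coefficients copied from the regression above (all cases with totear > 0).
COEF = {"_cons": 228.3992, "male": 87.74763, "principal": 880.6714, "super": 735.6299}
RAW_GAP = 165.5895  # coefficient on male when occupation is not controlled for

def predicted_totear(male, principal=0, super_=0):
    """Fitted earnings for one combination of dummy values."""
    return (COEF["_cons"]
            + COEF["male"] * male
            + COEF["principal"] * principal
            + COEF["super"] * super_)

print("female teacher:  ", round(predicted_totear(0), 2))
print("male teacher:    ", round(predicted_totear(1), 2))
print("female principal:", round(predicted_totear(0, principal=1), 2))
print("male super:      ", round(predicted_totear(1, super_=1), 2))

# Fraction of the raw gap that goes away after adding the occupation dummies.
share_explained = 1 - COEF["male"] / RAW_GAP
print(f"share of raw gap tied to occupation: {share_explained:.0%}")
```

Because the specification has no interaction terms, the male-female difference is the same 87.75 in every occupation.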
teachers, totear > 0, n = 260
adjusted R² = 0.0763
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   81.77202   17.28235     4.73   0.000     47.73959    115.8045
       _cons |   231.1571   11.74105    19.69   0.000     208.0367    254.2776
------------------------------------------------------------------------------
teachers, totear>0, exper >= 0, age>0, n=258
adjusted R² = 0.2974
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   72.04632   16.50058     4.37   0.000     39.55034    104.5423
        grad |   87.51573   24.49493     3.57   0.000     39.27579    135.7557
       exper |   6.732298   1.227631     5.48   0.000      4.31462    9.149977
         age |   -2.19464   1.734191    -1.27   0.207    -5.609928    1.220649
       _cons |    211.493   34.39206     6.15   0.000     143.7618    279.2242
------------------------------------------------------------------------------
Things to think about:
Hmm, earnings decline with age, all else constant (though the coefficient is not statistically significant at conventional levels: p = 0.207). The problem is that we are trying to force a linear specification of the relationship between earnings and age. So try a couple of nonlinear specifications.
Create lntotear = ln(totear); create agesq = age². Then see if you can get
teachers, totear>0, exper >= 0, age>0, n=258
dependent variable: ln(totear)
adjusted R² = 0.2396
------------------------------------------------------------------------------
    lntotear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .2825867   .0690862     4.09   0.000     .1465295    .4186439
        grad |   .2507171   .1025577     2.44   0.015     .0487415    .4526926
       exper |   .0235584     .00514     4.58   0.000     .0134358     .033681
         age |  -.0049082   .0072609    -0.68   0.500    -.0192077    .0093912
       _cons |   5.161052   .1439959    35.84   0.000     4.877468    5.444635
------------------------------------------------------------------------------
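In the log specification, a coefficient on a dummy variable is a difference in log earnings, so the implied proportional difference is exp(b) - 1 rather than b itself. A short sketch using the coefficients reported above (the helper names are made up for this example):

```python
import math

def ln_totear(totear):
    """Build lntotear; the log is only defined for totear > 0."""
    if totear <= 0:
        raise ValueError("ln(totear) needs totear > 0")
    return math.log(totear)

def pct_effect(b):
    """Proportional change in earnings implied by a log-earnings coefficient b."""
    return math.exp(b) - 1.0

male_b = 0.2825867  # coefficient on male
grad_b = 0.2507171  # coefficient on grad

print(f"male: {pct_effect(male_b):.1%}")
print(f"grad: {pct_effect(grad_b):.1%}")
```

With these coefficients, male teachers earn roughly 33 percent more than otherwise similar female teachers, and college graduates roughly 28 percent more than non-graduates. The restriction totear > 0 matters here because the log of zero or of a negative missing-value code is undefined.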
teachers, totear>0, exper >= 0, age>0, n=258
adjusted R² = 0.3349
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   60.68894   16.31503     3.72   0.000     28.55775    92.82013
        grad |   92.26855   23.86309     3.87   0.000     45.27204    139.2651
       exper |    5.70523   1.222988     4.66   0.000      3.29665     8.11381
         age |   22.18697   6.463625     3.43   0.001     9.457361    34.91657
       agesq |  -.3597616   .0920669    -3.91   0.000    -.5410802    -.178443
       _cons |   -141.989   96.45028    -1.47   0.142    -331.9403    47.96233
------------------------------------------------------------------------------
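With age and agesq both in the regression, the effect of one more year of age is no longer a single number: it is b_age + 2 × b_agesq × age, which is positive at young ages and negative at older ones. Setting it to zero gives the age at which predicted earnings peak, all else constant. A sketch using the coefficients reported above (function names are made up for this example):

```python
B_AGE = 22.18697     # coefficient on age
B_AGESQ = -0.3597616  # coefficient on agesq

def age_slope(age):
    """Change in predicted totear per additional year of age, at a given age."""
    return B_AGE + 2 * B_AGESQ * age

def peak_age():
    """Age at which the fitted quadratic in age reaches its maximum."""
    return -B_AGE / (2 * B_AGESQ)

print(round(age_slope(25), 2))  # positive: earnings still rising with age
print(round(age_slope(40), 2))  # negative: earnings falling with age
print(round(peak_age(), 1))
```

The fitted peak comes out a little under age 31, which helps explain why the linear specification produced a small, insignificant negative coefficient on age: it was averaging a rising segment and a falling segment.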
The way things are set up here, the difference between men and women is captured only as a shift in the intercept (the coefficient on male). The returns to age, experience, and education are constrained to be the same for men and women. That is because "grad", "exper", and "age" each enter as a single independent variable, so the partial derivative of totear with respect to any of them is the same whether the teacher is a man or a woman. The way around that is to create some new variables. If "femgrad" = grad for women but = 0 for men, then we can include both grad and femgrad as independent variables, which lets us see whether the return to education is the same for men and women, on average.
Create: femgrad = grad for women, =0 for men
femage = age for women, 0 for men
femagesq = agesq for women, 0 for men
femexp = exper for women, 0 for men
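Each of these variables is the original variable multiplied by a female indicator (1 - male). A minimal sketch in Python, assuming each case is a dict that already holds male, grad, exper and age; the function name and the example case are invented for illustration:

```python
def add_female_interactions(case):
    """Return a copy of one case with agesq and the four fem* variables added."""
    out = dict(case)
    female = 1 - case["male"]  # 1 for women, 0 for men
    out["agesq"] = case["age"] ** 2
    out["femgrad"] = female * case["grad"]
    out["femage"] = female * case["age"]
    out["femagesq"] = female * out["agesq"]
    out["femexp"] = female * case["exper"]
    return out

# Made-up cases, not rows from the Iowa data.
woman = add_female_interactions({"male": 0, "grad": 1, "exper": 4, "age": 30})
man = add_female_interactions({"male": 1, "grad": 1, "exper": 4, "age": 30})
print(woman["femgrad"], woman["femage"], woman["femagesq"], woman["femexp"])
print(man["femgrad"], man["femage"], man["femagesq"], man["femexp"])
```

For the woman the fem* variables repeat her own values; for the man they are all zero, so they drop out of his fitted earnings.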
Regress totear on male, grad, femgrad, exper, femexp, age, femage, agesq, femagesq for teachers with exper>=0 and age>0 (n=258). You should get this result:
      Source |       SS       df       MS              Number of obs =     258
-------------+------------------------------           F(  9,   248) =   15.23
       Model |  1924620.59     9  213846.732           Prob > F      =  0.0000
    Residual |  3481861.62   248  14039.7646           R-squared     =  0.3560
-------------+------------------------------           Adj R-squared =  0.3326
       Total |  5406482.21   257  21036.8958           Root MSE      =  118.49
------------------------------------------------------------------------------
      totear |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   374.3804   217.5006     1.72   0.086    -54.00343    802.7641
        grad |   76.81646   28.55401     2.69   0.008     20.57718    133.0557
     femgrad |   56.28438   52.63502     1.07   0.286    -47.38427     159.953
       exper |   6.405744   1.619435     3.96   0.000     3.216145    9.595342
      femexp |  -1.648958   2.494469    -0.66   0.509    -6.562003    3.264088
         age |   13.98413    8.63981     1.62   0.107    -3.032624    31.00089
      femage |   20.31798   14.75055     1.38   0.170    -8.734345     49.3703
       agesq |  -.2572549   .1172501    -2.19   0.029    -.4881878   -.0263221
    femagesq |  -.2755972   .2272215    -1.21   0.226    -.7231272    .1719327
       _cons |   -318.958   167.4522    -1.90   0.058    -648.7677    10.85179
------------------------------------------------------------------------------
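A note on reading this output: because the fem* variables are zero for men, the coefficient on grad, exper, age or agesq is the men's return, and the women's return is that coefficient plus the matching fem* coefficient. The sketch below applies this to the coefficients above; the function names are made up for this example. Whether each male-female difference is statistically significant is a separate matter, read from the t statistic on the corresponding fem* term.

```python
# Coefficients copied from the regression above.
COEF = {
    "grad": 76.81646, "femgrad": 56.28438,
    "exper": 6.405744, "femexp": -1.648958,
    "age": 13.98413, "femage": 20.31798,
    "agesq": -0.2572549, "femagesq": -0.2755972,
}

def returns(base, interaction):
    """Return (men's return, women's return) for one variable."""
    return COEF[base], COEF[base] + COEF[interaction]

def age_slope(age, female):
    """Change in predicted totear per additional year of age, at a given age."""
    slope = COEF["age"] + 2 * COEF["agesq"] * age
    if female:
        slope += COEF["femage"] + 2 * COEF["femagesq"] * age
    return slope

men_grad, women_grad = returns("grad", "femgrad")
men_exper, women_exper = returns("exper", "femexp")
print("grad: ", round(men_grad, 2), round(women_grad, 2))
print("exper:", round(men_exper, 2), round(women_exper, 2))
print("age slope at 30:", round(age_slope(30, female=False), 2),
      round(age_slope(30, female=True), 2))
```

The return to age depends on the age at which it is evaluated, so the comparison at 30 is one illustrative point, not a summary of the whole age profile.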
Questions to consider, based on the regression results:
Is the return to college education greater for men than for women? Is the difference statistically significant?
Is the return to experience greater for men than for women? Is the difference statistically significant?
Is the return to age greater for men than for women? Is the difference statistically significant?