Economics 154
Economics of Discrimination
University of California, Berkeley
Spring 2006
Professor Martha Olney

Thinking about the Iowa data set

The data set contains 347 cases (people) who each answered a series of questions.  Some of the responses on the original surveys were blank and were coded as "no response."  (Check out  http://eh.net/databases/labor/codebooks/io01.asc for a full description of what data were collected for each variable.)  Look at the data:  sometimes "no response" was probably "no" (especially when the answers were either "no response" or "yes").  Other times "no response" might indicate a case to be omitted from a regression.
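The two ways of treating "no response" can be sketched in Python as below. This is a minimal illustration, not code from the course. It assumes "no response" is stored as -9 (the code given for exper below) and that "no" is coded 0. Both codes and the function names are hypothetical, so check the codebook entry for each variable before relying on them.

```python
NO_RESPONSE = -9  # "no response" code (as used for exper in this handout)
NO = 0            # hypothetical "no" code; check the codebook for each variable

def no_response_as_no(value):
    """For yes/no items where a blank most likely meant "no"."""
    return NO if value == NO_RESPONSE else value

def no_response_as_missing(value):
    """For items where a blank should drop the case from a regression."""
    return None if value == NO_RESPONSE else value

answers = [1, -9, 1, 0, -9]
print([no_response_as_no(a) for a in answers])       # [1, 0, 1, 0, 0]
print([no_response_as_missing(a) for a in answers])  # [1, None, 1, 0, None]
```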

Start by asking a seemingly easy question: Did men earn more than women?

The first step: create some summary statistics. 
What are the average earnings for men? What are the average earnings for women? What if you distinguish by both occupation and sex? See if you can come up with these results:
 

Mean values of TOTEAR
Group                               n         Mean
All cases                         347      $263.95
All, totear > 0                   280      $329.26
Men, totear > 0                   136      $414.42
Women, totear > 0                 144      $248.83
Teachers, totear > 0              260      $268.90
Principals, totear > 0             11    $1,196.82
Superintendents, totear > 0         9    $1,012.78
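Because men and women (and, separately, the three occupations) split the 280 cases with totear > 0 into non-overlapping groups, the n-weighted average of the subgroup means should reproduce the $329.26 overall figure. A minimal Python check, using only the numbers in the table:

```python
def pooled_mean(groups):
    """Combine subgroup means into an overall mean, weighting each by its n.

    groups: list of (n, mean) pairs for non-overlapping subgroups.
    """
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

by_sex = [(136, 414.42), (144, 248.83)]
by_occupation = [(260, 268.90), (11, 1196.82), (9, 1012.78)]

print(round(pooled_mean(by_sex), 2))         # 329.26
print(round(pooled_mean(by_occupation), 2))  # 329.26
```

The n = 347 mean cannot be checked this way, because the 67 additional cases are not in any of the totear > 0 subgroups listed.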

Create the following dummy variables:
   male = 1 if man, 0 if woman
   teacher = 1 if teacher, 0 if not
   principal = 1 if principal, 0 if not
   super = 1 if superintendent, 0 if not
   grad = 1 if college grad, 0 if not
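One way to build these dummies, sketched in Python with each survey record as a dictionary. The numeric codes for SEX, OCCUP and COLLEGE below are hypothetical placeholders, not taken from the codebook; replace them with the real values from io01.asc.

```python
# Hypothetical codes: check the codebook (io01.asc) for the real values.
SEX_MALE = 1
OCCUP_TEACHER, OCCUP_PRINCIPAL, OCCUP_SUPER = 1, 2, 3
COLLEGE_GRAD = 1

def make_dummies(row):
    """Return the 0/1 dummy variables for one survey record (a dict).

    Any code other than SEX_MALE becomes male = 0, so drop cases with a
    no-response SEX code before calling this.
    """
    return {
        "male": int(row["SEX"] == SEX_MALE),
        "teacher": int(row["OCCUP"] == OCCUP_TEACHER),
        "principal": int(row["OCCUP"] == OCCUP_PRINCIPAL),
        "super": int(row["OCCUP"] == OCCUP_SUPER),
        "grad": int(row["COLLEGE"] == COLLEGE_GRAD),
    }

print(make_dummies({"SEX": 1, "OCCUP": 2, "COLLEGE": 0}))
# {'male': 1, 'teacher': 0, 'principal': 1, 'super': 0, 'grad': 0}
```

The regressions that follow include principal and super but not teacher: with a constant term, including all three occupation dummies would make them perfectly collinear (the dummy-variable trap), so teacher serves as the omitted comparison group.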

You might also want to make the following adjustment, which I made to one variable:
     exper: I changed the value -8 ("first term") to 0, and dropped cases with no response (-9) from regressions that include exper.
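The exper adjustment, as a minimal Python sketch (the function name is illustrative):

```python
FIRST_TERM = -8   # exper code for "first term"
NO_RESPONSE = -9  # exper code for "no response"

def recode_exper(exper):
    """Map "first term" to 0 and "no response" to None (case to be dropped)."""
    if exper == FIRST_TERM:
        return 0
    if exper == NO_RESPONSE:
        return None
    return exper

raw = [12, -8, 3, -9]
recoded = [recode_exper(e) for e in raw]
print(recoded)  # [12, 0, 3, None]

# Keep cases usable in regressions with exper. Test "is not None" rather
# than truthiness, so first-term teachers (exper = 0) are not dropped.
usable = [e for e in recoded if e is not None]
print(usable)  # [12, 0, 3]
```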

A second step: look at some plots of the data.

First, graph totear by age. 



Hm: there are some outliers on age and some 0 values on totear. Better to restrict this to age > 0 and totear > 0.

It looks as if earnings increase with age, vaguely.  Is there any difference in the pattern by gender? (0 is female; 1 is male)
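The sample restriction for these plots, and the split by gender, can be sketched as follows. This assumes hypothetical dictionary records with lowercase fields age, totear and male (the dummy defined earlier); the two returned series can be passed to whatever plotting tool you use. The example records are made up.

```python
def plot_sample(rows):
    """Keep cases with age > 0 and totear > 0, split by the male dummy.

    Returns {0: [(age, totear), ...], 1: [...]}: one scatter series for
    women (0) and one for men (1).
    """
    series = {0: [], 1: []}
    for r in rows:
        if r["age"] > 0 and r["totear"] > 0:
            series[r["male"]].append((r["age"], r["totear"]))
    return series

# Made-up records: the first two are excluded by the restriction.
rows = [
    {"age": -9, "totear": 300, "male": 1},
    {"age": 25, "totear": 0, "male": 0},
    {"age": 30, "totear": 250, "male": 0},
    {"age": 40, "totear": 400, "male": 1},
]
print(plot_sample(rows))  # {0: [(30, 250)], 1: [(40, 400)]}
```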



The next step: do some regression analysis.
A naive way to start would be to regress earnings (TOTEAR) on sex (SEX). You'll probably get a statistically significant difference. But is it telling you anything? Maybe the men earn more than the women because they are more likely than the women to be principals or superintendents. So, better control for occupation (OCCUP) by including it as an independent variable too. Maybe older workers get paid more, so maybe we should include age (AGE). But be careful: don't include the cases where age is given as -9. Maybe earnings and age are not linearly related, so better experiment with some nonlinear specifications. Maybe experience (TERMS) matters. Maybe having a college education (COLLEGE) increases wages.
 

See if you can come up with the following regression results:

all, totear > 0, n = 280
adjusted R-squared = 0.0937
------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   165.5895   30.31019     5.46   0.000     105.9228    225.2561
       _cons |   248.8333   21.12414    11.78   0.000     207.2497    290.4169
------------------------------------------------------------------------------
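With a single 0/1 regressor, OLS simply reproduces the group means: the constant is the mean for the omitted group (women) and the coefficient on male is the men-minus-women difference. The table bears this out: 248.8333 is the women's mean from the summary statistics, and 248.8333 + 165.5895 = 414.4228, the men's mean. A small Python sketch of the one-regressor OLS formulas, run on made-up numbers, shows the same pattern:

```python
def ols_one_regressor(x, y):
    """Intercept and slope from regressing y on a constant and one regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: three women (male = 0, mean 250) and two men (male = 1, mean 425).
male = [0, 0, 0, 1, 1]
totear = [200.0, 250.0, 300.0, 400.0, 450.0]
cons, b_male = ols_one_regressor(male, totear)
print(cons, b_male)  # about 250 and 175: women's mean, then men's minus women's
```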

all, totear > 0, n=280
adjusted R-squared = 0.7018
------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   87.74763   17.78556     4.93   0.000     52.73505    122.7602
   principal |   880.6714   45.77201    19.24   0.000     790.5648     970.778
       super |   735.6299   49.32827    14.91   0.000     638.5225    832.7374
       _cons |   228.3992   12.19427    18.73   0.000     204.3936    252.4048
------------------------------------------------------------------------------
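Fitted values from this regression are the constant plus the coefficients on whichever dummies equal 1. A minimal Python sketch using the coefficients in the table (teacher is the omitted category):

```python
# Coefficients copied from the regression above.
COEF = {"_cons": 228.3992, "male": 87.74763,
        "principal": 880.6714, "super": 735.6299}

def predicted_totear(male, principal, sup):
    """Fitted totear for a given combination of the 0/1 dummies."""
    return (COEF["_cons"] + COEF["male"] * male
            + COEF["principal"] * principal + COEF["super"] * sup)

cases = [("female teacher", (0, 0, 0)), ("male teacher", (1, 0, 0)),
         ("female principal", (0, 1, 0)), ("male principal", (1, 1, 0))]
for label, dummies in cases:
    print(f"{label:17s} {predicted_totear(*dummies):9.2f}")
```

This gives about 228.40 for a female teacher and 316.15 for a male teacher. Because male enters only as a shift, the model imposes the same $87.75 male premium in every occupation.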

teachers, totear > 0, n = 260
adjusted R-squared = 0.0763
------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   81.77202   17.28235     4.73   0.000     47.73959    115.8045
       _cons |   231.1571   11.74105    19.69   0.000     208.0367    254.2776
------------------------------------------------------------------------------

teachers, totear>0, exper >= 0, age>0, n=258
adjusted R-squared = 0.2974
------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   72.04632   16.50058     4.37   0.000     39.55034    104.5423
        grad |   87.51573   24.49493     3.57   0.000     39.27579    135.7557
       exper |   6.732298   1.227631     5.48   0.000      4.31462    9.149977
         age |   -2.19464   1.734191    -1.27   0.207    -5.609928    1.220649
       _cons |    211.493   34.39206     6.15   0.000     143.7618    279.2242
------------------------------------------------------------------------------
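These tables come from Stata's regress command. For readers who want to see what regress computes, here is a minimal, stdlib-only Python sketch of the OLS point estimates, solving the normal equations (X'X)b = X'y. It is an illustration, not the code used to produce the tables, and it does not compute standard errors, t statistics or R-squared. To fit the specification above you would restrict to teachers with totear > 0, exper >= 0 and age > 0 and build each row as [1.0, male, grad, exper, age].

```python
def ols(X, y):
    """OLS coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination.

    X: list of rows, each a list of regressors; put 1.0 first for _cons.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular (e.g. the dummy-variable trap)")
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up data generated exactly by y = 10 + 2*x1 + 3*x2.
X = [[1.0, x1, x2] for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]]
y = [10 + 2 * r[1] + 3 * r[2] for r in X]
print(ols(X, y))  # approximately [10.0, 2.0, 3.0]
```

For real work, Stata's regress (or the OLS class in Python's statsmodels) reports the full table.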


Things to think about:
Hmmm: earnings decline with age, all else constant (though the coefficient on age is not statistically significant at conventional levels; p = 0.207). The problem is that we are trying to force a linear specification of the relationship between earnings and age. So try a couple of nonlinear specifications. Create lntotear = ln(totear) and agesq = age^2 (age squared). Then see if you can get these results:
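Creating the two transformed variables, as a minimal Python sketch on a dictionary record (the function name is illustrative). The guard reflects why the sample is restricted first: the log of 0 is undefined, and squaring a -9 age code would produce a spurious value of 81.

```python
import math

def add_nonlinear_terms(row):
    """Return a copy of a record with lntotear and agesq added.

    Only valid for cases already restricted to totear > 0 and age > 0.
    """
    if row["totear"] <= 0 or row["age"] <= 0:
        raise ValueError("restrict the sample to totear > 0 and age > 0 first")
    out = dict(row)
    out["lntotear"] = math.log(row["totear"])  # natural log
    out["agesq"] = row["age"] ** 2
    return out

print(add_nonlinear_terms({"totear": 300.0, "age": 30}))
```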

teachers, totear>0, exper >= 0, age>0, n=258
dependent variable: ln(totear)

adjusted R-squared = 0.2396
------------------------------------------------------------------------------
    lntotear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .2825867   .0690862     4.09   0.000     .1465295    .4186439
        grad |   .2507171   .1025577     2.44   0.015     .0487415    .4526926
       exper |   .0235584     .00514     4.58   0.000     .0134358     .033681
         age |  -.0049082   .0072609    -0.68   0.500    -.0192077    .0093912
       _cons |   5.161052   .1439959    35.84   0.000     4.877468    5.444635
------------------------------------------------------------------------------
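In the log specification the male coefficient is a proportional difference. Reading 0.2826 directly as "about 28% more" is the usual approximation; the gap implied by the log model, holding grad, exper and age constant, is exp(0.2826) - 1. A short Python calculation with the coefficient from the table:

```python
import math

b_male = 0.2825867  # male coefficient from the ln(totear) regression above

approx_gap = b_male                 # small-coefficient approximation
exact_gap = math.exp(b_male) - 1.0  # proportional gap implied by the log model
print(f"{approx_gap:.1%} vs {exact_gap:.1%}")  # 28.3% vs 32.7%
```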


teachers, totear>0, exper >= 0, age>0, n=258
adjusted R-squared = 0.3349
------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   60.68894   16.31503     3.72   0.000     28.55775    92.82013
        grad |   92.26855   23.86309     3.87   0.000     45.27204    139.2651
       exper |    5.70523   1.222988     4.66   0.000      3.29665     8.11381
         age |   22.18697   6.463625     3.43   0.001     9.457361    34.91657
       agesq |  -.3597616   .0920669    -3.91   0.000    -.5410802    -.178443
       _cons |   -141.989   96.45028    -1.47   0.142    -331.9403    47.96233
------------------------------------------------------------------------------
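With agesq in the model, the effect of an extra year of age is no longer a single number: it is the derivative b_age + 2 * b_agesq * age. Taking the point estimates at face value, earnings rise with age at a decreasing rate and peak where that derivative is zero, at -b_age / (2 * b_agesq). A Python sketch using the coefficients from the table:

```python
b_age = 22.18697      # age coefficient from the quadratic regression above
b_agesq = -0.3597616  # agesq coefficient

def marginal_effect_of_age(age):
    """d(totear)/d(age) = b_age + 2 * b_agesq * age."""
    return b_age + 2.0 * b_agesq * age

peak_age = -b_age / (2.0 * b_agesq)  # age at which the marginal effect is zero
print(f"peak at age {peak_age:.1f}")  # peak at age 30.8
for age in (20, 30, 40):
    print(age, round(marginal_effect_of_age(age), 2))
```

The peak falls at roughly age 31, holding male, grad and exper constant. Since it is computed from point estimates, it carries sampling uncertainty that the table does not show directly.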


The way things are set up here, the difference between men and women is captured in the constant term (through the male dummy). But the returns to age, to experience, and to education are all constrained to be the same for men and women. That is because "grad", "exper" and "age" enter only on their own as independent variables, so the partial derivative of totear with respect to any of these variables is the same whether the teacher is a man or a woman. The way around that is to create some new variables. If "femgrad" = grad for women but = 0 for men, then we can include both grad and femgrad as independent variables, and that allows us to see whether the return to education is the same for men and women, on average.

Create:
   femgrad = grad for women, 0 for men
   femage = age for women, 0 for men
   femagesq = agesq for women, 0 for men
   femexp = exper for women, 0 for men
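One way to construct the four interaction variables, sketched in Python on dictionary records that already contain male, grad, exper, age and agesq (the function name is illustrative):

```python
def add_female_interactions(row):
    """Add femgrad, femexp, femage and femagesq to a record.

    Each equals the base variable for women (male = 0) and 0 for men, so its
    coefficient measures how women's return differs from men's.
    """
    female = 1 - row["male"]
    out = dict(row)
    for base, fem in (("grad", "femgrad"), ("exper", "femexp"),
                      ("age", "femage"), ("agesq", "femagesq")):
        out[fem] = female * row[base]
    return out

woman = {"male": 0, "grad": 1, "exper": 5, "age": 30, "agesq": 900}
man = {"male": 1, "grad": 1, "exper": 5, "age": 30, "agesq": 900}
print(add_female_interactions(woman))
print(add_female_interactions(man))
```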

Regress totear on male, grad, femgrad, exper, femexp, age, femage, agesq and femagesq
for teachers with totear > 0, exper >= 0 and age > 0 (n = 258). You should get this result:

     Source |       SS       df       MS              Number of obs =     258
-------------+------------------------------           F(  9,   248) =   15.23
       Model |  1924620.59     9  213846.732           Prob > F      =  0.0000
    Residual |  3481861.62   248  14039.7646           R-squared     =  0.3560
-------------+------------------------------           Adj R-squared =  0.3326
       Total |  5406482.21   257  21036.8958           Root MSE      =  118.49

------------------------------------------------------------------------------
      totear |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   374.3804   217.5006     1.72   0.086    -54.00343    802.7641
        grad |   76.81646   28.55401     2.69   0.008     20.57718    133.0557
     femgrad |   56.28438   52.63502     1.07   0.286    -47.38427     159.953
       exper |   6.405744   1.619435     3.96   0.000     3.216145    9.595342
      femexp |  -1.648958   2.494469    -0.66   0.509    -6.562003    3.264088
         age |   13.98413    8.63981     1.62   0.107    -3.032624    31.00089
      femage |   20.31798   14.75055     1.38   0.170    -8.734345     49.3703
       agesq |  -.2572549   .1172501    -2.19   0.029    -.4881878   -.0263221
    femagesq |  -.2755972   .2272215    -1.21   0.226    -.7231272    .1719327
       _cons |   -318.958   167.4522    -1.90   0.058    -648.7677    10.85179
------------------------------------------------------------------------------
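With these interactions, the slope for men is the coefficient on the base variable and the slope for women is that coefficient plus the matching fem coefficient; for age, the slope also depends on the age at which it is evaluated. A Python sketch using the table's coefficients (evaluating at age 30 is an arbitrary illustrative choice):

```python
# Coefficients copied from the interacted regression above.
b = {
    "grad": 76.81646, "femgrad": 56.28438,
    "exper": 6.405744, "femexp": -1.648958,
    "age": 13.98413, "femage": 20.31798,
    "agesq": -0.2572549, "femagesq": -0.2755972,
}

def slopes(female, age):
    """Implied d(totear)/d(x) for grad, exper and age, for one sex at one age."""
    f = 1 if female else 0
    return {
        "grad": b["grad"] + f * b["femgrad"],
        "exper": b["exper"] + f * b["femexp"],
        # age enters as a quadratic, so its slope depends on age itself
        "age": (b["age"] + f * b["femage"])
               + 2 * (b["agesq"] + f * b["femagesq"]) * age,
    }

for label, female in (("men", 0), ("women", 1)):
    print(label, {k: round(v, 2) for k, v in slopes(female, age=30).items()})
```

The fem coefficients themselves are the women-minus-men differences, so their t statistics speak to whether each difference is statistically significant. For age, the difference involves both femage and femagesq, so a joint test (in Stata, "test femage femagesq" after regress) is the natural check.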


Questions to consider, based on this regression result:
Is the return to college education greater for men than for women?  Is the difference statistically significant?

Is the return to experience greater for men than for women?  Is the difference statistically significant?

Is the return to age greater for men than for women?  Is the difference statistically significant?
 
 


This page prepared by Professor Martha Olney

Last updated 2/15/2006
University of California, Berkeley
Department of Economics
549 Evans Hall #3880
Berkeley CA  94720-3880
phone: 510-642-6083
fax: 510-642-6615