

Dummy variables


Introduction to Econometrics

Francois Cocquemas

March 16, 2010


These slides correspond to Chapter 7 in Wooldridge.


Describing qualitative data

Far from all of the data of interest to econometricians is quantitative.

For instance, the gender of individuals, whether they are married, the industry of firms, and countries or regions are all considered to be qualitative.

How do we include this in a regression?

In many cases, the information can be described as being true or false, or a characteristic as present or absent. In those cases, it is easy to set up a binary variable, or dummy variable, taking values 0 and 1.

Be careful when you define which value corresponds to which characteristic.

For instance, male is usually set to 1 when the individual is male and 0 when female, while if we define female instead we would likely do the opposite. Those names are clearer than a gender variable, which does not say which group is coded 1. The choice does not matter to the results, but it does to their interpretation.
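As a minimal sketch of the coding choice described above, the snippet below builds a female dummy and the opposite male dummy from a qualitative variable. The sample values and variable names are made up for illustration.

```python
# Hypothetical sample of a qualitative variable (made-up values).
gender = ["female", "male", "male", "female", "male"]

# The dummy is named after the category coded as 1, so its meaning is unambiguous.
female = [1 if g == "female" else 0 for g in gender]

# The opposite coding carries exactly the same information.
male = [1 - f for f in female]

print(female)  # [1, 0, 0, 1, 0]
print(male)    # [0, 1, 1, 0, 1]
```

Either dummy can be used in a regression; only the sign of its coefficient changes.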



Describing categories or ranges

Dummy variables are also useful to describe categories.

Indeed, even if the variable is not binary, if it takes a finite number of values then it can be described by a complete set of dummy variables.

For instance, if eye colour can be brown, blue, green or red, we can have four dummy variables, one for each of these colours, taking the value 1 whenever an individual has eyes of this colour. (More complex for Bowie.)

Notice that summing all variables in a complete set should give you 1 for all observations!
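The sum-to-one property can be checked directly. This is a small sketch with made-up eye colours; the names `eyes`, `colours` and `dummies` are illustrative only.

```python
# Hypothetical eye colours for six individuals (made-up values).
eyes = ["brown", "blue", "green", "brown", "red", "blue"]
colours = ["brown", "blue", "green", "red"]

# One dummy per colour: 1 when the individual has eyes of that colour, 0 otherwise.
dummies = {c: [1 if e == c else 0 for e in eyes] for c in colours}

# Summing across the complete set gives 1 for every observation.
row_sums = [sum(dummies[c][i] for c in colours) for i in range(len(eyes))]

print(dummies["brown"])  # [1, 0, 0, 1, 0, 0]
print(row_sums)          # [1, 1, 1, 1, 1, 1]
```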

This technique can also be useful for quantitative data which you do not believe should be considered as one continuous variable. A set of dummy variables for several ranges allows you to distinguish the effects of what you might see as “thresholds”.

Example: in the Mincer equation, we often use dummy variables for high school dropouts, high school graduates, etc.
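A sketch of how range dummies might be built from years of education. The cut-offs (12 and 16 years) and the category names are assumptions chosen for illustration, not values taken from the slides.

```python
# Hypothetical years of education for six individuals (made-up values).
educ = [9, 12, 14, 16, 11, 18]

# Assumed cut-offs, for illustration only: under 12 years, 12 to 15, 16 or more.
hs_dropout = [1 if e < 12 else 0 for e in educ]
hs_graduate = [1 if 12 <= e < 16 else 0 for e in educ]
college_graduate = [1 if e >= 16 else 0 for e in educ]

print(hs_dropout)        # [1, 0, 0, 0, 1, 0]
print(hs_graduate)       # [0, 1, 1, 0, 0, 0]
print(college_graduate)  # [0, 0, 0, 1, 0, 1]
```

Because the ranges do not overlap and cover every value, the three dummies again form a complete set.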


Using a dummy variable in a regression

Including a dummy variable in a standard OLS regression is as simple as including any other variable:

$$wage = \beta_0 + \delta_0 \, female + \beta_1 \, educ + u$$

Coefficient $\delta_0$ is the difference in hourly wage between women and men, given the same amount of education (and the same error term $u$). If it is negative, women earn less than men on average.

This coefficient can be seen as an intercept shift.
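To make the intercept shift concrete, here is a minimal sketch that fits the regression on made-up, noise-free data, so the coefficients used to generate the data are recovered exactly. The data, the coefficient values (4, -2, 1) and the small `ols` helper are all assumptions for illustration; with real data there is an error term and you would use a statistics package.

```python
from fractions import Fraction

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y exactly by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], in exact rational arithmetic.
    A = [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(k)]
         + [Fraction(sum(r[i] * yi for r, yi in zip(X, y)))]
         for i in range(k)]
    for c in range(k):
        # Find a row with a non-zero pivot (raises StopIteration if X'X is singular).
        p = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Made-up data generated as wage = 4 - 2*female + 1*educ, with no error term.
female = [0, 0, 0, 1, 1, 1]
educ = [10, 12, 16, 10, 14, 16]
wage = [4 - 2 * f + e for f, e in zip(female, educ)]

X = [[1, f, e] for f, e in zip(female, educ)]  # columns: intercept, female, educ
beta0, delta0, beta1 = ols(X, wage)

print(float(beta0), float(delta0), float(beta1))  # 4.0 -2.0 1.0
print(float(beta0 + delta0))                      # 2.0, the intercept for women
```

Both groups share the slope on educ; only the intercept differs, by the coefficient on female.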


Intercept shift



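[Figure: wage plotted against educ, two parallel lines with slope = $\beta_1$]

men: $wage = \beta_0 + \beta_1 \, educ$

women: $wage = (\beta_0 + \delta_0) + \beta_1 \, educ$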


Using a set of dummy variables

What happens if we use a complete set of dummy variables?

$$wage = \beta_0 + \delta_0 \, eyesbrown + \delta_1 \, eyesblue + \delta_2 \, eyesgreen + \delta_3 \, eyesred + \beta_1 \, educ + u$$

The four dummies sum to one, hence we have perfect collinearity. The regression will not be able to identify the coefficients properly. It is as if we had a single variable always equal to one (like for the intercept).

One possible way out is then to drop the intercept. Each dummy coefficient will then be interpreted as the intercept for this specific group.

Another (more common) possibility is to drop one variable in the set. This will be the baseline, and the other dummy coefficients will read directly as the difference from this baseline.
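The collinearity problem and the two ways out can be illustrated by checking whether X'X is invertible. This is a sketch on made-up data that leaves out educ for brevity; the helper `gram_det` and the choice of brown as the dropped baseline are assumptions for illustration.

```python
from fractions import Fraction

def gram_det(X):
    """Determinant of X'X, computed exactly; zero means perfect collinearity."""
    k = len(X[0])
    A = [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(k)] for i in range(k)]
    det = Fraction(1)
    for c in range(k):
        p = next((r for r in range(c, k) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det  # a row swap flips the sign of the determinant
        det *= A[c][c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return det

# Hypothetical eye colours (made-up values); every colour appears at least once.
eyes = ["brown", "blue", "green", "red", "brown", "blue"]
colours = ["brown", "blue", "green", "red"]
D = [[1 if e == c else 0 for c in colours] for e in eyes]

with_intercept = [[1] + d for d in D]  # intercept + complete set of dummies
no_intercept = D                       # complete set, intercept dropped
baseline = [[1] + d[1:] for d in D]    # intercept + set without "brown"

print(gram_det(with_intercept) == 0)  # True: coefficients are not identified
print(gram_det(no_intercept) != 0)    # True
print(gram_det(baseline) != 0)        # True
```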



Dummy variables in R

By default, R will automatically remove the last dummy variable if you provide a complete set.

However, you are well-advised to do it yourself as this will help with the interpretation, and also because other software may not be as kind.

There are many methods to create dummy variables from qualitative data. One of the easiest ways is to use ifelse (see



Example from Alesina, Algan, Cahuc and Giuliano (2009)

Table 1: Family ties and Labor regulation

Dependent variable: Firing Cost | State regulation of minimum wage | Ln(Gdp per capita)

(1) (2) (3)

[Most coefficient cells did not survive extraction; the values below are those that remain, and ? marks a digit that could not be recovered.]

Strong family ties: .305**
(.4?2)

Legal origins

Common Law origin: Reference

Civil Law origin: .353***

Scandinavian origin: .0?2

German origin: .242 (.14?)

Observations: 50 45 53

Source: World Values Survey, Alesina and Giuliano (2007), ILO (2007) and Botero et al. (2004)


Fixed effects

Dummy variables are also frequently used as fixed effects.

Typically, we might add time fixed effects to our regression to capture structural changes underlying our regression. For instance, this could be a dummy variable for each year or each period (minus one).

In many cases, it is also useful to define a set of individual fixed effects to capture all unobserved, time-invariant individual characteristics.

This might lead to a potentially large number of dummy variables, which is usually not a problem with modern computers.

However, you must have several observations for each individual or you will not have degrees of freedom!
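The degrees-of-freedom warning is simple arithmetic, sketched below. The function name and the sample sizes are hypothetical; the count assumes an intercept plus one dummy per individual minus one, in addition to the other regressors.

```python
def residual_df(n_obs, n_individuals, n_regressors):
    """Residual degrees of freedom with a full set of individual fixed effects.

    The intercept plus (n_individuals - 1) dummies use n_individuals parameters;
    the other regressors add n_regressors slope coefficients.
    """
    return n_obs - n_individuals - n_regressors

# 100 individuals observed 5 times each, 2 other regressors.
print(residual_df(500, 100, 2))  # 398

# Only one observation per individual: nothing is left to estimate with.
print(residual_df(100, 100, 2))  # -2
```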