A Brief Introduction to Stata(3)

3. Working with data files: changing datasets

3.1. Generating new variables
3.2. Labeling
3.3. Keeping and Dropping Variables and Observations
3.4. Producing Graphs
3.5. Combining Data Sets

3.1. Generating new variables

The command generate (abbreviated gen) creates new variables, while the command replace changes the values of an existing variable:

. gen oldhead=1 if age>32
(4438 missing values generated)
. replace oldhead=0 if age<=32
(4438 real changes made)

The following points should be noted:

If a generate or replace command is issued without any conditions (no if or in qualifier), it applies to all observations in the data file.

When using the generate command, take care to handle missing values properly. Stata treats a missing value (.) as larger than any number, so a condition such as age>32 is also true for observations where age is missing; add & !missing(age) to exclude them.

The right-hand side of the = sign in a generate or replace command can be any expression involving variable names, not just a constant value.

The replace command does not have to follow a generate command. It can change the values of any existing variable, independently of generate.
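A minimal sketch of these points, run on a small invented dataset (the variables age, food and nfood and all values are hypothetical, not taken from the FIES file). It shows that an unguarded age>32 also flags the household whose age is missing, and that the right-hand side of generate can be an expression:

```stata
* Hypothetical toy data: three households, the third with a missing age
clear
input age food nfood
25 100  50
40  80 120
 .  60  30
end

* Unguarded condition: the missing age counts as greater than 32
gen oldhead_naive = 1 if age>32
replace oldhead_naive = 0 if age<=32

* Guarded condition: households with a missing age stay missing
gen oldhead = 1 if age>32 & !missing(age)
replace oldhead = 0 if age<=32

* Any expression of existing variables may appear after the = sign
gen totexp = food + nfood

list
```

The guarded version is the one to prefer whenever the conditioning variable can be missing.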

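Either of the following commands calculates the maximum of the food and non-food expenditures for each household: the first uses generate with the max() function, the second uses egen (extended generate) with its rmax() function (named rowmax() in newer versions of Stata). The two are alternatives; run only one of them, because the second fails if maxexp already exists.

. gen maxexp=max(food,nfood)
. egen maxexp=rmax(food nfood)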
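A small check that the two approaches agree, using invented household data (hhid, food, nfood and their values are hypothetical). The egen result is stored under a second, hypothetical name maxexp2 so both commands can run, and rowmax() is used as the current name of the egen row-maximum function:

```stata
* Hypothetical toy data; household 3 has a missing food value
clear
input hhid food nfood
1 120  80
2  90 150
3   .  70
end

* Function approach: max() ignores missing arguments
gen maxexp = max(food, nfood)

* egen approach, stored under a different name
egen maxexp2 = rowmax(food nfood)

list
```

Both ignore the missing food value for household 3 and return 70.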

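The more powerful feature of the egen command is its ability to create statistics computed over multiple observations. The first command below stores the mean of totex over all observations in every observation; with the by(regn) option, each observation instead receives the mean of totex within its own region (regn):

. egen avgtotex=mean(totex)
. egen avgtea=mean(totex), by(regn)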
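To make the difference concrete, here is a sketch on four invented households in two regions (the values are hypothetical); the variable names follow the commands above:

```stata
* Hypothetical toy data: total expenditure for four households in two regions
clear
input regn totex
1 100
1 200
2 300
2 500
end

* Overall mean, repeated in every observation: (100+200+300+500)/4 = 275
egen avgtotex = mean(totex)

* Mean within each region: 150 for region 1, 400 for region 2
egen avgtea = mean(totex), by(regn)

list
```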

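3.2. Labeling

3.2.1. Labeling variables

The following two commands are equivalent (var is an abbreviation of variable):
. label variable oldhead "HH Head is over 32"
. label var oldhead "HH Head is over 32"

To see the new label, type (des is short for describe):
. des oldhead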
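A short sketch that attaches the label and reads it back; the three-observation dataset is hypothetical and exists only to carry the variable. The extended macro function `: variable label` retrieves the label text:

```stata
* Hypothetical one-variable dataset, just to carry a label
clear
set obs 3
gen oldhead = (_n > 1)

label variable oldhead "HH Head is over 32"
describe oldhead

* Read the label back into a local macro with an extended macro function
local lbl : variable label oldhead
display "`lbl'"
```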

3.2.2. Labeling Data

To attach a label to the entire data set (use straight double quotes around the label):
. label data "FIES 2000"

To see this label, which appears in the header of the describe output:
. des
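The same idea as a sketch on a hypothetical one-observation dataset; the extended macro function `: data label` returns the dataset label so it can be checked in a do-file:

```stata
* Hypothetical dataset; any data in memory can carry a label
clear
set obs 1
gen x = 1

label data "FIES 2000"
describe

* Retrieve the dataset label programmatically
local dlab : data label
display "`dlab'"
```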

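3.2.3. Labeling Values of variables

The example below groups the regions into three major island groups (coded 1 = Luzon, 2 = Visayas, 3 = Mindanao). label define creates a set of text labels for the codes, label values attaches that set to the variable, and the nolabel option shows the numeric codes again:
. gen majisland=1 if regn<=5 | regn==13 | regn==14
. replace majisland=2 if regn>5 & regn<=8
. replace majisland=3 if majisland==.
. tab majisland
. label define majlabel 3 "Mind" 2 "Vis" 1 "Luz"
. label values majisland majlabel
. tab majisland
. tab majisland, nolabel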
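A runnable sketch of the same steps on four invented region codes (hypothetical values, one observation each). The extended macro function `: label majlabel 3` looks up the text attached to code 3:

```stata
* Hypothetical region codes, one per observation
clear
input regn
1
6
10
13
end

gen majisland = 1 if regn<=5 | regn==13 | regn==14
replace majisland = 2 if regn>5 & regn<=8
replace majisland = 3 if majisland==.

label define majlabel 3 "Mind" 2 "Vis" 1 "Luz"
label values majisland majlabel
tab majisland
tab majisland, nolabel
```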

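3.3. Keeping and Dropping Variables and Observations

To keep or drop variables, list them; var1-var3 refers to all variables stored from var1 through var3 in the dataset's order:
. keep var1 var2 var3 (or . keep var1-var3)
. drop var4 var5 var6 (or . drop var4-var6)

To keep or drop observations, add an if or in condition; drop in 1/20 drops the first 20 observations:
. drop if age>=80
. keep if fsize<=6
. drop in 1/20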
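A sketch that applies each of these commands to 30 synthetic observations (id, age and fsize are hypothetical variables built from the observation number), so the effect of each step can be traced:

```stata
* Hypothetical data: 30 observations with an id, age and family size
clear
set obs 30
gen id = _n
gen age = 20 + 2*_n          // ages 22, 24, ..., 80
gen fsize = mod(_n, 10) + 1  // family sizes between 1 and 10

drop if age>=80              // removes id 30
keep if fsize<=6             // keeps ids 1-5, 10-15 and 20-25
drop in 1/5                  // removes the first five remaining observations

keep id age                  // keep a variable list; fsize is dropped
describe
```

Twelve observations remain, starting with id 10.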

You cannot combine a variable list with an if condition in a single drop or keep command:
. keep hhcode fsize if fsize<=6
invalid syntax
r(198);

You have to use two commands to do the job:
. keep if fsize<=6
. keep hhcode fsize
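In a do-file, the two-step version can be sketched as follows on three invented households (hhcode, fsize and age values are hypothetical). capture lets the script continue after the rejected combined form and stores its return code in _rc:

```stata
* Hypothetical data with a household code, family size and age
clear
input hhcode fsize age
101 4 30
102 8 45
103 5 60
end

* The combined form is rejected; capture keeps the do-file running
capture keep hhcode fsize if fsize<=6
display "return code: " _rc

* Two-step version
keep if fsize<=6
keep hhcode fsize
```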

3.4. Producing Graphs

The histogram command shows the distribution of the age of the household head as a bar graph:
. histogram age

The number of bars may be increased, up to a maximum of 50, by adding the option bin(#):
. histogram age, bin(12)
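Since the FIES data are not included here, this sketch uses the auto dataset that ships with Stata and plots mpg instead of age. The saving() option, which writes each graph to a .gph file, and the file names are additions for illustration:

```stata
* Stata's bundled example data stand in for the household survey
sysuse auto, clear

* Default binning, then 12 bins; each graph is saved to a .gph file
histogram mpg, saving(mpg_default, replace)
histogram mpg, bin(12) saving(mpg_bin12, replace)
```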

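A scatter plot of total income (toinc) against age, with a title, saved to the graph file incage.gph:
. scatter toinc age, title("total income by age") msymbol(point) saving(incage, replace)

(The original slide used t1(total income by age) and s(.), options from the older graph syntax; title() and msymbol(point) are their current equivalents.)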
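The same kind of plot, sketched with the bundled auto dataset in place of the income data (price against mpg; the title text and the file name pricempg are hypothetical):

```stata
* Stata's bundled example data stand in for the income data
sysuse auto, clear

* Scatter plot with a title and small point markers, saved as pricempg.gph
scatter price mpg, title("Price by mileage") msymbol(point) saving(pricempg, replace)
```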

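3.5. Combining Data Sets

3.5.1. Appending Data Sets

Appending adds the observations of one dataset to the end of another; both files should use the same variable names. First prepare both files (append itself does not require sorted data, so the sort steps are optional here):
. use popproj1, clear
. sort year
. save popproj1, replace

. use popproj2, clear
. sort year
. save popproj2, replace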

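Then load the first file, append the second, and save the result (save, replace overwrites popproj1 with the combined data):
. use popproj1, clear
. append using popproj2
. save, replace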
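A self-contained sketch of appending, assuming two small population-projection files with invented values; a temporary file (the tempfile name popproj2 is a local macro standing in for popproj2.dta) avoids writing to disk permanently:

```stata
* Hypothetical second file: later years (values invented)
clear
input year pop
2002 79.5
2003 81.0
end
tempfile popproj2
save `popproj2'

* Hypothetical first file, left in memory
clear
input year pop
2000 76.5
2001 78.0
end

* Add the observations of the second file below those in memory
append using `popproj2'
list
```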

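3.5.2. Merging Data Sets

Merging adds variables from one dataset to the matching observations of another, here matching on year. With the syntax used below, both files must first be sorted by the matching variable:
. use popproj1, clear
. sort year
. save, replace

. use mmlaproj, clear
. sort year
. save, replace
. merge year using popproj1
. save, replace

In Stata 11 and later, the recommended form is merge 1:1 year using popproj1, which does not require the files to be sorted beforehand (1:1 assumes that year uniquely identifies observations in both files).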

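merge creates the variable _merge, which records where each observation came from:
. tab _merge

_merge==1 observation from the master data only (here mmlaproj)
_merge==2 observation from the using data only (here popproj1)
_merge==3 observation found in both the master and the using data

To keep only the matched observations:
. keep if _merge==3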
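A sketch of the whole merge with the current 1:1 syntax, using two invented files whose years only partly overlap, so all three _merge codes appear. The tempfile popproj1 stands in for popproj1.dta, and mmlapop is a hypothetical variable name:

```stata
* Hypothetical using data: population by year (values invented)
clear
input year pop
2000 76.5
2001 78.0
2002 79.5
end
tempfile popproj1
save `popproj1'

* Hypothetical master data with its own variable
clear
input year mmlapop
2001 10.1
2002 10.3
2003 10.5
end

* Current merge syntax; year identifies one observation in each file
merge 1:1 year using `popproj1'
tab _merge        // 2003: master only (1), 2000: using only (2), 2001-2002: matched (3)

keep if _merge==3
```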

Review

. gen oldhead=1 if age>32
. replace oldhead=0 if age<=32
. gen maxexp=max(food,nfood) (or . egen maxexp=rmax(food nfood))
. egen avgtotex=mean(totex)
. egen avgtea=mean(totex), by(regn)

. label variable oldhead "HH Head is over 32"

. des oldhead

. label data "FIES 2000"

. des

. label define majlabel 3 "Mind" 2 "Vis" 1 "Luz"

. label values majisland majlabel
. tab majisland

. keep var1 var2 var3 (or . keep var1-var3)
. drop var4 var5 var6 (or . drop var4-var6)
. drop if age>=80
. keep if fsize<=6
. drop in 1/20

. histogram age
. histogram age, bin(12)

. scatter toinc age, title("total income by age") msymbol(point) saving(incage, replace)

. merge
. append

4. Working with .log and .do files

4.1. Keeping track of work
4.2. Batch processing

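4.1. Keeping track of work

To record a session, open a log file before issuing commands and close it when done:
. log using "c:\intropov\logfiles\log1.log"
. log close

All commands issued in between, together with their output, are saved in the log file. By default Stata writes a formatted log in SMCL (Stata Markup and Control Language), with the extension .smcl. Giving the file name the extension .log, as above, produces an ordinary text log instead, here named log1 and stored in the folder c:\intropov\logfiles. If the file already exists, add the replace option, for example: . log using "c:\intropov\logfiles\log1.log", replace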
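A minimal do-file sketch of logging, assuming a hypothetical file name mylog.log in the current working directory; the text option requests a plain-text log explicitly, and the bundled auto dataset supplies something to log:

```stata
* Close any log left open by an earlier run (no error if none is open)
capture log close

* Plain-text log; replace overwrites a previous mylog.log
log using mylog.log, text replace

sysuse auto, clear
summarize price

log close
```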

The main advantage of using a .do file instead of typing commands line by line is repeatability. A .do file is a text file holding a sequence of Stata commands, which can be rerun at any time with do filename. If it takes several steps to obtain the desired output, write those steps in a .do file, because you may need to run them many times.
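As an illustration, a hypothetical do-file (myanalysis.do) that strings together several steps from this chapter: opening a log, loading data, generating and labeling a variable, and tabulating it. The bundled auto dataset and the names highprice and hilab are assumptions for the sketch:

```stata
* myanalysis.do: hypothetical do-file collecting several steps in one place
clear all
capture log close
log using myanalysis.log, text replace

sysuse auto, clear
gen highprice = 1 if price>10000 & !missing(price)
replace highprice = 0 if price<=10000
label variable highprice "Price above 10,000"
label define hilab 0 "No" 1 "Yes"
label values highprice hilab
tab highprice

log close
```

Saved as myanalysis.do, the whole sequence can be rerun with . do myanalysis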

4.2. Batch processing

dofile.doc