Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the...

34
Empirical Methods in Trade: Analyzing Trade Costs and Trade Facilitation June 2015 Bangkok, Thailand Cosimo Beverelli Simon Neumueller (ERSD/WTO) (ERSD/WTO) 1

Transcript of Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the...

Page 1: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Empirical Methods in Trade: Analyzing Trade Costs and Trade Facilitation

June 2015

Bangkok, Thailand

Cosimo Beverelli Simon Neumueller

(ERSD/WTO) (ERSD/WTO)

1

Page 2: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Content

2

a. Resources

b. Stata windows

c. Organization of the “Bangkok_June_2015\Stata” folder

d. The “directory_definition” do file

e. Datasets used in this introduction to Stata

f. Do files

g. Importing data into Stata

h. Basic commands

i. Merging datasets

j. Macros

k. Loops

l. Graphics

Page 3: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

a. Resources

3

1. Stata help and Stata manual

2. A variety of books covering Stata exist

Web resources:

1. Germán Rodríguez’s webpage • Data management, graphics and programming

2. UCLA IDRES’ webpage • Very comprehensive , covering all sorts of topics (data management,

analysis,…) with several examples

• FAQ

3. Statalist • Typically accessed via a google search

Page 4: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

b. Stata windows

4

Page 5: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in
Page 6: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Type commands here

Page 7: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Type commands here

List of variables and labels

Page 8: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Type commands here Some properties of the variables

List of variables and labels

Page 9: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Type commands here Some properties of the variables

List of variables and labels Actions taken in the session

Page 10: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

c. Organization of the “Bangkok_June_2015\Stata” folder

6

• The “Bangkok_June_2015\Stata” folder contains the following sub-folders: • data

• do_files

• results

• Do not change the name of the folder or of the sub-folders

• The first thing to do is to define your own working directory (see slide c.)

Page 11: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

d. The “directory_definition” do file

7

• To make things easy for all of us, we define a “path” for the working directory that can be easily changed, and will be changed only once and for all

• When you open Stata, the first thing to do is to open the “directory_definition” do-file in the do-file editor

• Click here:

• …or Ctrl+9

• The “directory_definition” looks like this and you have to change it to your own path

Page 12: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

e. Datasets and do files used in this introduction to Stata

8

• To apply some of the Stata commands described in this presentation, we will use two datasets:

• WDI.csv – a very small subset of the World Development indicators

• WB_ES.xls – derived from the World Bank Enterprise Surveys

• You can find the datasets in the “data” directory: "$BKK\data\Introduction_Stata"

• There are also 4 do files

• 01_data_intro_stata

• 02_descriptive

• 03_reshape_merge

• 04_loops

• You can find the do files in the directory: "$BKK\do_files\Introduction_Stata“

Page 13: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

f. Do files

9

• A do file is a set of Stata commands typed in a plain text file

• When you work with STATA, always use do files

• E.g. one do file for creating your master dataset and one do file for regressions

• Do files can also be used to set globals and directories or to run a series of different do files after each other

Page 14: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Typical commands at beginning of each do file:

clear all /* removes all data in the current Stata session*/

set more off, perm /* prevents Stata to pause while running a do file */

capture log close /* closes a log file */

cd “directory” /* sets the directory, e.g.. “$BKK\data*/

log using “filename”, replace /* useful for long do files, allows printing */

use “dataset.dta”, replace /* open dataset in Stata format (.dta); “ ”

are not necessarily needed */

Page 15: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Typical commands at beginning of each do file:

clear all /* removes all data in the current Stata session*/

set more off, perm /* prevents Stata to pause while running a do file */

capture log close /* closes a log file */

cd “directory” /* sets the directory, e.g.. “$BKK\data*/

log using “filename”, replace /* useful for long do files, allows printing */

use “dataset.dta”, replace /* open dataset in Stata format (.dta); “ ”

are not necessarily needed */

Notes

• “*” treats everything after it in a line as a comment

• “/* text */” will make Stata treat “text” as a comment (“text” can span over several lines)

Page 16: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

g. Importing data into Stata

11

• insheet using filename.csv, clear

• Typically used for text files that are either comma or tab-separated

• import excel using filename.xls, sheet(“Sheet1”) first clear

• Reads excel files directly into Stata

• Allows to specify variables, cell range and worksheet to import

• Copy paste is strongly discouraged. Watch out. The accuracy of copied numbers depends on:

o How data are formatted in excel, i.e. how many digits are shown

o Your settings in Stata (use set type double before copying)

To save the dataset (in Stata format): • save filename, replace

Page 17: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

h. Basic commands

12

• Identify missing values (represented by . (dot) or empty cell) • inspect varlist

• codebook varlist

• Identify duplicate observations • duplicates (report/drop/tag/list) varlist

• Identify number of unique values

• unique varlist

• Browse the dataset

• browse varlist

Page 18: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Other basic commands

• describe

• generate

• destring/tostring

• replace

• rename

• Alternative: renvars

• keep

• drop

• The list can go on…what is important is to keep in mind that, in case of doubt, you can always use the “help”:

• help command

Page 19: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

List of useful operators commonly used in expressions

Arithmetic Logical Relational

+ add ! not (also ~) == equal

- subtract | or != not equal (also ~=)

* multiply & and < less than

/ divide <= less than or equal

^ raise to power > greater than

+ string concatenation >= greater than or equal

Page 20: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

Commands for descriptive statistics

• summarize varlist

• tabulate var1 var 2

• table rowvar (colvar), content()

• tabstat varlist, statistics() by()

Page 21: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

The egen command

• Often used command to create new variables

• Commonly used egen functions (refer to WB_ES dataset):

• bysort cou sector: egen sales_sec=total(sales), missing

• bysort cou sector: egen sales_sec=mean(sales)

• egen exp_tot=rowtotal(exp_intermediate exp_final)

• egen id_cluster=group(cou sector)

• egen cou_sec=concat(cou sector)

• See 02_descriptive.do

• Further functions include: max, min, count, tag,…

Page 22: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

String functions

• generate newvar =function()

• Some useful functions are:

• abbrev() –> shortens the string the number of indicated characters

• length() –> returns the length of the string, i.e. number of characters

• subinstr() –> allows to replace or delete particular substrings

• substr() –> allows to extract substrings based on its position

• upper (lower) –> Changes the entire string to upper-case (lower- case) strings

• trim() –> removes leading and trailing blanks of the string

Page 23: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

The collapse command

• collapse (mean) varlist (sum) varlist, by(varlist)

• Creates an aggregate dataset by e.g. averaging or summing variables across the dimension identified in by()

• All observations not included in the command are dropped

• Useful in analysis when moving to a higher level of aggregation, e.g. aggregating trade flows from HS 6-digit to HS 2-digit

• Useful for calculating descriptive statistics before exporting them to excel using export excel

Page 24: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

The collapse command

• collapse (mean) varlist (sum) varlist, by(varlist)

• Creates an aggregate dataset by e.g. averaging or summing variables across the dimension identified in by()

• All observations not included in the command are dropped

• Useful in analysis when moving to a higher level of aggregation, e.g. aggregating trade flows from HS 6-digit to HS 2-digit

• Useful for calculating descriptive statistics before exporting them to excel using export excel

• If you do not want to collapse, duplicates drop after bysort (): egen gives the same results as collapse. Example (see 02_descriptive.do): • collapse (mean) sales (sum) dummy_exp, by(cou sector)

is equivalent to:

• bysort cou sector: egen avg_sales = mean(sales)

• bysort cou sector: egen number_exporters =

total(dummy_exp)

• keep cou sector avg_sales number_exporters

• duplicates drop

Page 25: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

The reshape command

• reshape wide (long) ‘stub’, i(var) j(var) options

• Reshapes dataset from long to wide format and vice versa

• Data dimensions such as country, year or sector are normally put in long format

• ‘stub’ are variables in reshape wide and stubs of variables in reshape long

• i(var) are identifying dimensions; j(var) dimension to change

• Exercise: Open WDI.dta and reshape it first long and then wide (see 03_reshape_merge.do)

i j stub

1 1 4.1

1 2 4.5

2 1 3.3

2 2 3.0

i stub1 stub2

1 4.1 4.5

2 3.3 3.0

Long

Wide

Reshape

Page 26: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

i. Merging datasets

20

• To merge datasets you can use joinby or merge

• joinby varlist using filename, unmatched(both)

• The command forms all pairwise combination for varlist

• unmatched can keep unmatched observations from the master dataset, the using dataset or both (see generated variable _merge)

• Exercise: Open WB_ES.dta and merge it with WDI.dta (see 03_reshape_merge.do)

Page 27: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

j. Macros

21

• Macros are names associated with some text • The commands global and local assign strings to global and local macro names

• global mname [=exp | :extended_fcn | [`]"[string]"['] ] • Global macros, once defined, are available anywhere in Stata

• Evaluate it using $mname

• local lclname [=exp | :extended_fcn | [`]"[string]"['] ]

• Simplest example: local c USA JPN

• Evaluate it using `lclname'

• Local macros work only within the do file in which they are defined

• Globals and locals have a variety of uses

• To define the directories for this class, i.e. directory_definition.do

• They are used in loops (see next slides)

• A set of explanatory variables can be grouped under one macro name

Page 28: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

k. Loops

22

• See Stata help and Germán Rodríguez’s webpage

• Two main commands: foreach and forvalues

• foreach loops through strings of text, forvalues loops through numbers

• Syntax (3 alternatives): foreach item in a-list-of-things (e.g. a b d) {

commands referring `item‘ }

foreach varname of varlist list-of-variables {

commands referring to `varname‘ }

forvalues number = first(step)last {

commands referring to `number' }

Page 29: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

23

Examples for loops in WB_ES.dta

Ex1 foreach k in USA JPN { /* Loop over any_list */

egen sales_`k'=total(sales) if cou=="`k'"

}

Ex2 vallist cou, local(c) /* vallist shows values and creates

local */

foreach k of local c { /* Loop over a local macro */

capture drop sales_`k'

egen sales_`k'=total(sales) if cou=="`k'"

}

Ex3 forvalues k=1(1)3 { /* Loop over sector codes. The

range can be defined in different ways*/

egen total_`k'=total(sales) if sector==`k'

}

• foreach can also be used to loop over variables and numbers • foreach k of var varlist; foreach k of num numlist

Page 30: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

l. Graphics

24

• Useful links: official Stata or UCLA IDRES

• Histograms and bar graphs: • histogram var

• graph bar (stat) var, over(var)

Page 31: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

25

More graphics

• Distribution plots: • histogram var, frequency kdensity

• twoway kdensity var

• twoway (kdensity var1 if var2==“”) (kdensity var1 if var2

==“”), by(var)

Page 32: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

26

More graphics

• Scatter plots: • scatter yvar xvar

• graph twoway (scatter yvar xvar) (lfit yvar xvar)

Page 33: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

27

More graphics

• Scatter plots: • scatter yvar xvar

• graph twoway (scatter yvar xvar) (lfit yvar xvar)

Page 34: Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the angkok_June_2015 \Stata folder d. The directory_definition do file e. Datasets used in

28

More graphics

• Line plots: • line yvar year

• graph twoway (line yvar year if…) (line yvar year if …)