Introduction to STATA - Welcome to United Nations … a) Resources b) Stata 11 Interface c) Datasets...

Revision of Stata basics in STATA 11: April, 2016 Dr. Selim Raihan Executive Director, SANEM Professor, Department of Economics, University of Dhaka


Contents
a) Resources
b) Stata 11 Interface
c) Datasets used in this introduction to Stata
d) Do files
e) Importing data into Stata
f) Review of Basic commands
g) Merging datasets
h) Graphics

Resources
• Stata help
  – Command: help command_name
  – For example, to learn about the merge command, type: help merge
• Stata manual
• Web resources:
  – UCLA IDRE’s website: http://www.ats.ucla.edu/stat/stata/
    Covers a variety of topics, from data management to regression analysis
  – Germán Rodríguez’s website: http://data.princeton.edu/stata/
    Covers data management, graphics and programming

Stata 11 Interface

Datasets used in this illustration
• To illustrate some of Stata’s basic commands, we use the following datasets:
  – WDI.csv, a small subset of the World Development Indicators
  – WB_ES_Firms.dta, derived from the World Bank Enterprise Survey
  – some datasets created to illustrate merge (data1, data2 and data3)
  – one of Stata’s built-in datasets to illustrate graphs (sysuse lifeexp)

Do Files
• A do file is a set of Stata commands typed in a plain text file.
• Press Ctrl+8 to open the do-file editor, or open it from the toolbar of the Stata window.
  (Toolbar buttons shown on the slide: New Do file, Data Editor, Data Browser)
• To run commands from the do file, select the command lines and press Ctrl+D, or press the Execute button.
• Always use a do file.

Commonly used commands at the beginning of a do file: Defining the Directory
• It is good practice to define your own working directory first. It helps greatly when collaborating, because each collaborator only has to change the main directory.
• The syntax is as follows:
  – global name = "folder directory"
For example:
• global gravity = "D:\Gravity\stata" /* put your folder path within "" */
Once we have defined the main directory (here gravity), we can easily define the sub-directories:
• global data = "$gravity\data"
• global do_files = "$gravity\do_files"
• global results = "$gravity\results"
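The directory setup above can be sketched as a short do-file fragment. This is only an illustration: the D:/Gravity/stata path is the placeholder from the example, so substitute a folder that exists on your machine. The sketch uses forward slashes, which Stata also accepts on Windows, and the form of global without the = sign, which stores the text exactly as typed.

```stata
* Root folder of the project (placeholder path; change it to your own)
global gravity "D:/Gravity/stata"

* Sub-directories defined relative to the root
global data     "$gravity/data"
global do_files "$gravity/do_files"
global results  "$gravity/results"

* Check what the globals expand to
display "$data"
display "$results"

* List all macros currently defined
macro list
```

Because every sub-directory is built from $gravity, a collaborator only needs to edit the first global line.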

Commonly used commands at the beginning of a do file

clear all /* clears all data in the current Stata session */
set more off, perm /* shows the full result at once without any break */
capture log close /* closes the log file if one is open */
cd "directory" /* sets the working directory, e.g. cd "$data" */
log using "filename", replace /* opens a new log file, overwriting an existing one with the same name */
use dataset.dta, clear /* opens a dataset in Stata format (.dta), replacing the data in memory */

Notes
• Stata treats a line that begins with * as a comment.
• Stata also treats any text between /* and */ as a comment.
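Put together, the opening lines of a do file might look like the sketch below. The log-file name intro_session.log is a hypothetical choice, and the built-in lifeexp dataset stands in for your own .dta file so that the fragment runs without any project folders.

```stata
* Start from an empty session
clear all

* Do not pause long output
set more off

* Close a log file if one is open; capture suppresses the error if none is
capture log close

* Show the current working directory (use cd "folder" to change it)
pwd

* Open a plain-text log in the working directory (hypothetical file name)
log using "intro_session.log", replace text

* Built-in example dataset, standing in for your own .dta file
sysuse lifeexp, clear
describe

* Close the log at the end of the session
log close
```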

Commonly used commands: Defining Memory

• Stata’s default limits are as follows:
  – Matrix size 400, maximum variables 5000, and memory 50 MB.
  – If your dataset is larger than this memory, Stata shows an error message (a screenshot on the slide).
• In this case, you will have to reset Stata’s limits:
  – set matsize 10000, perm
  – set maxvar 20000, perm
  – set memory 500m, perm
• Note: writing perm at the end makes the new setting permanent, so it is kept in future Stata sessions.

Importing data into Stata
• For files in .csv format, we use the command:
  – insheet using filename.csv, clear
• Stata 12 and later can import Excel files directly:
  – import excel using filename.xls, sheet("sheet_name") firstrow clear
• In Stata 11, however, you first have to convert your xls/xlsx file to csv format:
  – Open the Excel file > File > Save As > select CSV (comma delimited) as the file type > Save. You can now open this CSV file in Stata with the insheet command.
• You can also use the software Stat/Transfer to convert the data from Excel to Stata format.
• Copying and pasting from Excel into Stata is strongly discouraged, as its accuracy may depend on the data format in Excel and the data format settings in Stata.
• To save the dataset in Stata format:
  – save filename, replace
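A round trip through a CSV file can be sketched as follows. Since WDI.csv is not distributed with Stata, the sketch first exports the built-in lifeexp dataset to a hypothetical file called lifeexp_demo.csv and then reads it back. insheet and outsheet are the commands available in Stata 11; later versions also offer import delimited and export delimited.

```stata
* Built-in dataset, used here only to have something to export
sysuse lifeexp, clear

* Write a comma-separated file to the current working directory
* (lifeexp_demo.csv is a hypothetical file name)
outsheet using "lifeexp_demo.csv", comma replace

* Read the CSV file back in; clear drops the data currently in memory
insheet using "lifeexp_demo.csv", comma clear

* Check the result, then save it in Stata format
describe
save "lifeexp_demo.dta", replace
```

Running describe right after an import is a quick way to spot numeric columns that were read as strings.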

Review of Basic Commands
• To install new packages:
  – ssc install package_name
  – Example: to install the user-written unique command, type: ssc install unique
• To identify missing values (represented by . (dot) for numeric variables or an empty cell for string variables), you may use either of the following:
  – inspect varlist
  – codebook varlist
• To find the number of unique values in a variable:
  – unique varlist
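As a small illustration of the missing-value checks, the sketch below uses the built-in lifeexp dataset, assuming (as in the version shipped with Stata) that gnppc has some missing values. It relies only on official commands; the user-written unique command is left out so that nothing needs to be installed.

```stata
sysuse lifeexp, clear

* Overview of a numeric variable, including its number of missing values
inspect gnppc

* Type, range, number of unique values and missing values
codebook gnppc region

* Count the observations where gnppc is missing
count if missing(gnppc)

* Missing-value summary for all variables that have any (Stata 11 and later)
misstable summarize
```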

Review of Basic Commands
• To browse the dataset:
  – browse varlist /* short form: br varlist */
• To edit the dataset:
  – edit varlist /* short form: ed varlist */
• To describe the variables:
  – describe varlist /* short form: des varlist */
• To generate a new variable:
  – generate variable_name = expression /* short form: gen */
  – Example: gen ln_sale = ln(sale)
• To generate a dummy variable:
  – Command: gen variable_name = (variable == value) /* put string values within "" */
  – Example: gen d_MN = (country == "Mongolia")
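The generate examples can be tried on the built-in lifeexp dataset, as in the sketch below. The country name "Brazil" and the cut-off of 10000 are illustrative assumptions, not values taken from the slides. The last dummy shows one point worth keeping in mind: Stata treats a missing value as larger than any number, so a comparison such as gnppc > 10000 is true for missing gnppc unless those observations are excluded.

```stata
sysuse lifeexp, clear

* Log of GNP per capita; observations with missing gnppc stay missing
gen ln_gnppc = ln(gnppc)

* Dummy from a string comparison
* (assumes the country is spelled "Brazil" in the data)
gen d_brazil = (country == "Brazil")

* Dummy from a numeric comparison; the if qualifier keeps
* observations with missing gnppc from being coded as 1
gen d_high_gnp = (gnppc > 10000) if !missing(gnppc)

tabulate d_brazil
tabulate d_high_gnp, missing
```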

Review of Basic Commands
• To convert a string (alphanumeric) variable that contains numbers into a numeric variable:
  – destring variable_name, gen(new_variable_name)
  – Or: destring variable_name, replace
  – For a string variable that contains text categories (e.g. country names), use encode instead; it assigns a numeric code to each category and keeps the text as a value label: encode var_name, gen(new_variable_name)
• To rename a variable:
  – rename var_name new_var_name
  – Example: rename d_MN d_Mongolia
• To replace the content of a variable:
  – replace var_name = value
  – Example: replace trade = 1 if trade == . | trade == 0
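The difference between destring and encode is easiest to see side by side. The sketch below uses the built-in lifeexp dataset; the new variable names (lexp_str, lexp_num, country_code, ctry_id, lexp_group) and the threshold of 70 are made up for the illustration.

```stata
sysuse lifeexp, clear

* Make a string copy of a numeric variable so there is something to convert
tostring lexp, gen(lexp_str)

* destring: the string holds numbers, so convert it back to numeric
destring lexp_str, gen(lexp_num)
assert lexp_num == lexp

* encode: the string holds names, so give each name a numeric code
* and keep the name as a value label
encode country, gen(country_code)

* rename a variable
rename country_code ctry_id

* replace the content of a variable for selected observations
gen lexp_group = 0
replace lexp_group = 1 if lexp >= 70

describe lexp_str lexp_num ctry_id lexp_group
```

Using encode on a string that holds numbers would assign arbitrary category codes instead of the numbers themselves, which is why destring is the right tool in that case.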

Review of Basic Commands
• To keep only selected observations, use the keep command:
  – keep if variable_name == some_value
  – Example: keep if year == 2012 /* only the observations for 2012 will be kept */
• You can use the drop command too:
  – drop if year != 2012
  – drop variable_list /* drops all the variables mentioned in the list */
• To re-order the variables in the Variables window:
  – order variable_list
  – Example: order reporter partner year gdp gdppc /* moves these variables, in this order, to the front of the variable list */
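A short sketch of keep, drop and order on the built-in lifeexp dataset follows. The cut-off of 70 years is an arbitrary choice for the illustration.

```stata
sysuse lifeexp, clear

* Keep only observations with life expectancy of at least 70 years
keep if lexp >= 70

* Drop observations with missing GNP per capita
drop if missing(gnppc)

* Drop variables that are not needed
drop popgrowth safewater

* Put the remaining variables in a chosen order
order country region gnppc lexp
describe
```

keep and drop change the data in memory only; the file on disk is untouched until you save.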

Review of Basic Commands
• To sort the observations by the values of one or more variables:
  – sort variable_name(s)
  – Example: sort i_country /* observations will be sorted alphabetically by exporter country */
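For illustration, the sketch below sorts the built-in lifeexp dataset in a few ways. gsort is an official Stata command that also allows descending order; it is not mentioned on the slides and is shown here only as a possible complement to sort.

```stata
sysuse lifeexp, clear

* Ascending sort on a string variable
sort country
list country lexp in 1/5

* Descending sort on life expectancy (the minus sign means descending)
gsort -lexp
list country lexp in 1/5

* Sort by region, and by country within region
sort region country
```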

Review of Basic Commands

• Basic Operators:
  – Relational:
    Symbol                    Operator
    ==                        equal to
    != (alternatively, ~=)    not equal to
    <                         less than
    >                         greater than
    <=                        less than or equal to
    >=                        greater than or equal to

Review of Basic Commands

• Basic Operators:
  – Logical:
    Symbol    Operator
    !         not
    |         or
    &         and
  – Arithmetic:
    Symbol    Operator
    +         addition
    -         subtraction
    *         multiplication
    /         division
    ^         raise to the power

Review of Basic Commands
• Example of the use of operators:
  – gen bbin_dummy = 0
  – replace bbin_dummy = 1 if i_country == "Bangladesh" | i_country == "Bhutan" | i_country == "India" | i_country == "Nepal"
• Note: as i_country is a string variable, we have to put its values within "".
• Also keep in mind that Stata is case sensitive. If the spelling of a country name in the command differs from the one in the data, the condition will not match those observations and the dummy will stay 0 for them.
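The same pattern can be tried on the built-in lifeexp dataset, as sketched below. The three country names are illustrative and assume these spellings appear in the data. The inlist() function is an official Stata function that is not used on the slides; it is shown as a shorter equivalent of a chain of | conditions.

```stata
sysuse lifeexp, clear

* Dummy for a group of countries, built with the | (or) operator
gen group_dummy = 0
replace group_dummy = 1 if country == "Bolivia" | country == "Ecuador" | country == "Peru"

* Same dummy with inlist(), which is shorter for long lists
gen group_dummy2 = inlist(country, "Bolivia", "Ecuador", "Peru")
assert group_dummy == group_dummy2

* Combining & (and) with relational operators
count if lexp >= 70 & gnppc < 5000
```

Tabulating the dummy after creating it is a quick way to confirm that the expected number of countries was matched.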

Review of Basic Commands
• Descriptive Statistics:
• To summarize variables:
  – summarize variable_list /* short form: sum variable_list */
• To get detailed summary statistics of the variables:
  – sum variable_list, detail /* also returns the median and other percentiles, skewness, kurtosis, etc. */
• To tabulate variables of interest:
  – tab variable_name /* one-way frequency table */
  – tab variable1 variable2, cell /* two-way table with cell percentages */
  – table row_variable column_variable, contents(statistic variable_name) /* table of summary statistics, e.g. contents(mean gdp) */
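A possible way to try these commands on the built-in lifeexp dataset is sketched below. For group-wise statistics the sketch uses tabulate with the summarize() option and tabstat, two official commands not shown on the slides, because they should behave the same way across Stata versions.

```stata
sysuse lifeexp, clear

* Basic summary statistics
summarize lexp gnppc

* Detailed statistics; results are also stored in r(), e.g. the median in r(p50)
summarize lexp, detail
display "Median life expectancy: " r(p50)

* One-way frequency table
tabulate region

* Mean and standard deviation of lexp within each region
tabulate region, summarize(lexp)

* Several statistics for several variables, by region
tabstat lexp gnppc, by(region) statistics(mean sd n)
```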

Merging Datasets
• Merge adds variables to a dataset by joining two datasets together.
• In merge, we join two datasets:
  – Master file: the data file in memory, to which we merge the other file
  – Using file: the data file on disk that we merge into the master file
• To merge a using file with a master file:
  – They must have at least one common variable (the key) on which to merge.
  – The key variable must have the same storage format (numeric or string) in both files.
  – If it is in string (alphanumeric) format in both files, the spelling of its values (e.g. country names) must be the same.
  – The key variables must have the same name in both files.

Merging Datasets
• Merge can be of four types:
  – 1:1 merge /* each observation of the master file is matched with at most one observation of the using file */
  – Command:
    • merge 1:1 var_list using "file path of the using file"
    • Example:
      – cd "$data\stata_introduction"
      – use data1.dta /* master file */
      – sort country year
      – merge 1:1 country year using data2
  – If the master file and the using file are not located in the same directory, we have to specify the full path of the using file:
    – merge 1:1 country year using "F:\Gravity Data\merge example\data2.dta"
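Since data1 and data2 are not distributed with Stata, the 1:1 merge can be sketched with two tiny datasets typed in by hand. All numbers below are made up for the illustration, and the using file is kept in a temporary file so that nothing is written to your project folders.

```stata
* Using file: made-up population figures
clear
input str10 country year pop
"India"  2010 1231
"India"  2011 1247
"Nepal"  2010 27
end
tempfile using_data
save "`using_data'"

* Master file: made-up GDP figures
clear
input str10 country year gdp
"India"  2010 1676
"India"  2011 1823
"Bhutan" 2010 1.5
end

* Each country-year pair appears at most once in both files
merge 1:1 country year using "`using_data'"

* _merge records where each observation came from:
* 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list, sepby(country)
```

Here the two India rows are matched, while Bhutan (master only) and Nepal (using only) are kept with missing values for the variable that comes from the other file.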

Merging Datasets: Example 1:1 merge
(Slide shows three tables: the master file data1, the using file data2, and the merged dataset.)

Merging Datasets: Example 1:1 merge

Observations that were present only in the master file or only in the using file are kept in the merged dataset, but the variables that come from the other file are missing for them.

Merging Datasets

• m:1 merge: we use it when one observation of the using file has to be merged with many observations of the master file.
• Command:
  – merge m:1 var_list using "file path of the using file"
  – Example:
    • merge m:1 country using "F:\Gravity Data\merge example\data3.dta"
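An m:1 merge can be sketched in the same hand-typed style. The master file is a small country-year panel and the using file has one row per country; all values, including the landlocked indicator, are made up for the illustration.

```stata
* Using file: one observation per country (made-up indicator)
clear
input str10 country landlocked
"India" 0
"Nepal" 1
end

* Check that country uniquely identifies the observations
isid country

tempfile country_info
save "`country_info'"

* Master file: several observations per country (made-up GDP figures)
clear
input str10 country year gdp
"India" 2010 1676
"India" 2011 1823
"Nepal" 2010 16
"Nepal" 2011 19
end

* Many master observations share one using observation
merge m:1 country using "`country_info'"
list, sepby(country)
```

Running isid on the using file before an m:1 merge is a cheap way to confirm that the key really is unique there.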

Merging Datasets: Example m:1

merge m:1 country using "F:\Gravity Data\merge example\data3.dta"

(Slide shows three tables: the master file data1, the using file data3, and the merged dataset.)

Merging Datasets
• 1:m merge: we use it when one observation of the master file has to be merged with many observations of the using file.
  – Command:
    • merge 1:m var_list using "file path of the using file"
• m:m merge: the key variables identify observations uniquely in neither the master file nor the using file. Within each key group, Stata pairs the observations in order (first with first, second with second, and so on); it does not form every combination of master and using observations.
  – Command:
    • merge m:m var_list using "file path of the using file"
  – Be careful with m:m merge, as it may change your data in unexpected ways. If you need every pairwise combination within a group, see the joinby command.
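A 1:m merge is the mirror image of m:1: the file with one row per key is now the master. The sketch below reuses the same kind of made-up, hand-typed data; none of the values come from the slides.

```stata
* Using file: several observations per country (made-up GDP figures)
clear
input str10 country year gdp
"India" 2010 1676
"India" 2011 1823
"Nepal" 2010 16
"Nepal" 2011 19
end
tempfile panel
save "`panel'"

* Master file: one observation per country (made-up indicator)
clear
input str10 country landlocked
"India" 0
"Nepal" 1
end

* One master observation is matched with many using observations
merge 1:m country using "`panel'"
list, sepby(country)
```

The merged dataset has one row per country-year, with the country-level variable repeated across years.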

Graphics
• Scatter plot:
  – Let’s use one of Stata’s built-in datasets for this purpose:
    • sysuse lifeexp
  – Now, let’s look at the relationship between per capita GNP and life expectancy at birth.
  – Command:
    • twoway (scatter y_var x_var) (lfit y_var x_var)
  – Example:
    • twoway (scatter lexp gnppc) (lfit lexp gnppc)
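A slightly more polished version of this plot might look like the sketch below. Taking the log of GNP per capita, the axis titles, the legend labels and the file name lexp_gnppc.gph are all choices made for this illustration rather than part of the original example.

```stata
sysuse lifeexp, clear

* Log of GNP per capita (missing where gnppc is missing)
gen ln_gnppc = ln(gnppc)

* Scatter plot with a fitted regression line, plus titles and a legend
twoway (scatter lexp ln_gnppc) (lfit lexp ln_gnppc), ///
    ytitle("Life expectancy at birth") ///
    xtitle("Log of GNP per capita") ///
    legend(order(1 "Countries" 2 "Linear fit"))

* Save the graph in Stata's own format (hypothetical file name)
graph save "lexp_gnppc.gph", replace
```

The /// at the end of a line continues a long command on the next line; it works in do files, not in the Command window.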

Graphics

• Histogram:

• command:

– histogram var

• Example:

– histogram lexp

Graphics

• Distribution plots:

• Commands:

– histogram var, frequency kdensity

– twoway kdensity var

• Example:

– twoway kdensity lexp

• Note: kdensity refers to kernel density estimation.
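The histogram and density commands can be combined in a few ways, as sketched below on the built-in lifeexp dataset. The number of bins and the normal-curve overlays are illustrative choices, not settings taken from the slides.

```stata
sysuse lifeexp, clear

* Histogram in frequencies with a kernel density curve on top
histogram lexp, frequency kdensity

* Histogram with a chosen number of bins and a normal curve for comparison
histogram lexp, bin(10) normal

* Kernel density estimate on its own, with a normal curve for comparison
kdensity lexp, normal

* Histogram and kernel density overlaid through twoway (both on the density scale)
twoway (histogram lexp) (kdensity lexp)
```

Each graph command replaces the previous graph window, so run them one at a time to look at each plot.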