Introduction to STATA - Welcome to United Nations … a) Resources b) Stata 11 Interface c) Datasets...
Revision of Stata basics in STATA 11: April, 2016
Dr. Selim Raihan Executive Director, SANEM
Professor, Department of Economics,
University of Dhaka
Contents
a) Resources
b) Stata 11 Interface
c) Datasets used in this introduction to Stata
d) Do files
e) Importing data into Stata
f) Review of Basic commands
g) Merging datasets
h) Graphics
Resources
• Stata help
– Command: help command_name
– For example, to learn about the merge command, type: help merge
• Stata manual
• Web resources:
– UCLA IDRE’s website: http://www.ats.ucla.edu/stat/stata/ (covers a variety of issues from data management to regression analysis)
– Germán Rodríguez’s website: http://data.princeton.edu/stata/ (covers data management, graphics and programming)
Stata 11 Interface
Datasets used in this illustration
• To illustrate some of the basic Stata commands, we use the following datasets:
– WDI.csv: a small subset of the World Development Indicators
– WB_ES_Firms.dta: derived from the World Bank Enterprise Survey
– some created datasets for illustrating merge (data1, data2 and data3)
– Stata’s built-in dataset for illustrating graphs (sysuse lifeexp)
Do Files
• A do file is a set of Stata commands typed in a plain text file.
• Press Ctrl+8 to open the Do-file Editor, or open it from the toolbar of the Stata window.
[Screenshot: toolbar buttons New Do-file, Data Editor, Data Browser]
• To run commands from the do file, select the command lines and press Ctrl+D, or press the Execute button.
• Always use a do file, so that every step of your work is recorded and can be repeated.
Commonly used commands at the beginning of a do file: Defining the Directory
• It is good practice to define your own working directory first. It helps greatly when collaborating, because only the main path has to be changed on another computer.
• The syntax is as follows:
– global name = "folder directory"
• global gravity = "D:\Gravity\stata" /* Put your file destination within "" */
Once we have defined the main directory (here gravity), we can define the sub-directories easily:
• global data = "$gravity\data"
• global do_files = "$gravity\do_files"
• global results = "$gravity\results"
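The directory setup above can be sketched as a short do-file fragment. This is only an illustration: it uses the same example folder D:\Gravity\stata, and it writes the globals without the = sign, which stores the text exactly as typed (both forms work for a quoted path).

```stata
* Define the project root once; on another computer only this line changes
global gravity  "D:\Gravity\stata"

* Sub-directories are built from the root
global data     "$gravity\data"
global do_files "$gravity\do_files"
global results  "$gravity\results"

* Check what each global expands to before using it
display "$data"
display "$do_files"
display "$results"
```

Once the folders exist on your computer, cd "$data" moves Stata to the data folder. Stata also accepts forward slashes in Windows paths, which avoids trouble in the cases where a backslash comes immediately before a macro such as $data.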
Commonly used commands at the beginning of a do file
clear all /* clears all data in the current Stata session */
set more off, perm /* shows the full result at once without any break */
capture log close /* closes any log file that is still open */
cd "directory" /* sets the working directory, e.g. cd "$data" */
log using "filename", replace /* opens a new log file, overwriting an existing one of the same name */
use dataset.dta, clear /* opens a dataset in Stata format (.dta), replacing the data in memory */
Notes
• Stata treats a line that begins with * as a comment.
• Text enclosed in /* ... */ is also treated as a comment, and it may follow a command on the same line.
Commonly used commands: Defining Memory
• Stata’s default settings are as follows:
– Matrix size 400, maximum variables 5000, and memory 50 MB.
– If your dataset is larger than this memory, Stata stops with an error message.
• In that case, you will have to increase Stata’s memory settings:
– set matsize 10000, perm
– set maxvar 20000, perm
– set memory 500m, perm
• Note: writing perm at the end makes the new setting permanent, so it is remembered in future Stata sessions.
Importing data into Stata
• For files in .csv format, we use the command:
– insheet using filename.csv, clear
• Stata 12 and onwards can import Excel files directly:
– import excel using filename.xls, sheet("sheet_name") firstrow clear
• In Stata 11, however, you first have to convert the xls/xlsx file into a csv file:
– Open the Excel file > File > Save As > select CSV (comma delimited) as the file type > Save. You can then open this CSV file in Stata using the insheet command.
• You can also use the software Stat/Transfer to convert the data from Excel to Stata format.
• Copying and pasting from Excel into Stata is strongly discouraged, as its accuracy depends on the data format in Excel and the data format settings in Stata.
• To save the dataset in Stata format:
– save filename, replace
• Note: Stata commands are case sensitive and must be typed in lowercase (insheet, save), not Insheet or Save.
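The import steps above can be put together as a minimal sketch. It assumes that WDI.csv (the file named earlier in this illustration) sits in the current working directory and that its first row holds the variable names; the output name WDI.dta is chosen here for illustration.

```stata
* Read the comma-separated file into memory
insheet using "WDI.csv", clear

* Check variable names and storage types (string or numeric)
describe

* Save in Stata format so later do files can start from the .dta file
save "WDI.dta", replace

* Reload the saved file to confirm that it opens
use "WDI.dta", clear
```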
Review of Basic Commands
• To install new packages:
– ssc install package_name
– Example: to install the unique command, type: ssc install unique
• To identify missing values (shown as . (dot) for numeric variables or as an empty cell for string variables), you may use either of the following:
– inspect varlist
– codebook varlist
• To count the number of unique values in a variable:
– unique varlist
• To browse the dataset:
– browse varlist /* short form: br varlist */
• To edit the dataset:
– edit varlist /* short form: ed varlist */
• To describe the variables:
– describe varlist /* short form: des varlist */
• To generate a new variable:
– generate variable_name = expression
– Example: gen ln_sale = ln(sale)
• To generate a dummy variable:
– Command: gen variable_name = (existing_variable == some_value or "text")
– Example: gen d_MN = (country == "Mongolia")
• To convert a string (alphanumeric) variable into a numeric variable:
– destring variable_name, gen(new_variable_name)
– Or: destring variable_name, replace
– When the string holds text rather than digits, you can use the encode command instead, which assigns a numeric code to each distinct text value: encode var_name, gen(new_var_name)
• To rename a variable:
– rename var_name new_var_name
– Example: rename d_MN d_Mongolia
• To replace the contents of a variable:
– replace var_name = value
– Example: replace trade = 1 if trade == . | trade == 0
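A small sketch of the generate, destring, encode, rename and replace commands on a made-up dataset. The three rows and the variable names sale_txt, sale and country_id are hypothetical and only serve to make the example self-contained.

```stata
clear
* Hypothetical data: sale_txt holds numbers stored as text
input str10 country str6 sale_txt
"Mongolia" "120.5"
"Nepal"    "80"
"India"    "300"
end

* Digits stored as text: destring converts them to a numeric variable
destring sale_txt, generate(sale)

* Text categories: encode assigns an integer code with a value label
encode country, generate(country_id)

* New variables from expressions
generate ln_sale = ln(sale)
generate d_MN = (country == "Mongolia")

* Rename, then change selected values
rename d_MN d_Mongolia
replace sale = 100 if sale < 100

list
```

The dummy is 1 for Mongolia and 0 for the other rows, because the expression in parentheses evaluates to 1 when true and 0 when false.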
• If you want to keep only selected observations, use the keep command:
– keep if variable_name == some_value
– For example: keep if year == 2012 /* only the observations of 2012 will be kept */
• You can use the drop command too:
– drop if year != 2012 /* has the same effect as keep if year == 2012 */
– drop variable_list /* drops all the variables mentioned in the list */
• To re-order the list of variables in the variables window:
– order variable_list
– Example: order reporter partner year gdp gdppc
/* this command places these variables, in the order given, at the front of the variable list shown in Stata’s Variables window */
• To sort the observations:
– sort variable_name(s)
– Example: sort i_country
/* Observations will be sorted alphabetically by exporter country */
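The keep, drop, order and sort commands can be tried on Stata’s built-in lifeexp dataset. This is a minimal sketch; the cut-off of 70 years is an arbitrary value chosen for illustration.

```stata
sysuse lifeexp, clear

* Keep only observations with life expectancy of at least 70 years
keep if lexp >= 70

* Drop variables that are not needed
drop safewater popgrowth

* Put the main variables first in the Variables window
order country lexp gnppc

* Sort observations by life expectancy (ascending) and show the first five
sort lexp
list country lexp gnppc in 1/5
```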
• Basic Operators:
– Relational:
== equal to
!= (alternatively, ~=) not equal to
< less than
> greater than
<= less than or equal
>= greater than or equal
– Logical:
& and
| or
! (alternatively, ~) not
– Arithmetic:
^ raise to the power
• Example of the use of operators:
– gen bbin_dummy = 0
– replace bbin_dummy = 1 if i_country == "Bangladesh" | i_country == "Bhutan" | i_country == "India" | i_country == "Nepal"
• Note: as i_country is a string variable, the values must be enclosed in "".
• Also keep in mind that Stata is case sensitive. If the country name in the data differs from the country name in the command, even only in capitalization, the condition will not match and those observations will not be changed.
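A short sketch of relational and logical operators on the built-in lifeexp dataset. The thresholds (10000 and 75) are arbitrary values chosen for illustration. It also shows one point that often causes mistakes: Stata treats a missing numeric value as larger than any number, so a condition such as gnppc > 10000 is true for missing gnppc.

```stata
sysuse lifeexp, clear

* This count includes observations where gnppc is missing
count if gnppc > 10000

* Excluding missing values gives the intended count
count if gnppc > 10000 & !missing(gnppc)

* Dummy that is 1 when both conditions hold (&), left missing when gnppc is missing
generate rich_long = (gnppc > 10000 & lexp >= 75) if !missing(gnppc)

* Dummy that is 1 when at least one condition holds (|)
generate rich_or_long = (gnppc > 10000 | lexp >= 75) if !missing(gnppc)

tabulate rich_long rich_or_long, missing
```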
• Descriptive statistics:
• To obtain summary statistics of variables:
– summarize variable_list /* short form: sum variable_list */
• To obtain detailed summary statistics of the variables:
– sum variable_list, detail /* returns the mean, median, kurtosis, skewness etc. */
• To tabulate variables of interest:
– tab variable_name
– tab variable1 variable2, cell
– table row_variable column_variable, contents(statistics) /* the column variable is optional */
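The descriptive commands above can be tried on the built-in lifeexp dataset. This is a minimal sketch; the last line uses the summarize() option of tabulate as one way to get group means, which is a choice made here rather than something prescribed above.

```stata
sysuse lifeexp, clear

* Mean, standard deviation, minimum and maximum
summarize lexp gnppc

* Adds percentiles, skewness and kurtosis
summarize lexp, detail

* Results are stored in r(); for example the median after the detail option
display "Median life expectancy: " r(p50)

* Frequency table of a categorical variable
tabulate region

* Mean of lexp within each region
tabulate region, summarize(lexp)
```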
Merging Datasets
• Merge adds variables to a dataset by joining two datasets together.
• In merge, we join two datasets:
– Master file: the data file currently in memory, to which the other file will be merged
– Using file: the data file that will be merged into the master file
• To merge a using file with a master file, they must have:
– At least one common variable on which to merge.
– The common variable must be of the same type in both the master file and the using file.
• If it is a string (alphanumeric) variable in both files, the spelling of its values must be the same (e.g. country names).
• The common variables must have the same name.
• Merge can be of four types:
– 1:1 merge /* each observation of the master file is matched with the corresponding observation of the using file */
– Command:
• merge 1:1 var_list using "file_directory of the using file"
• Example:
– cd "$data\stata_introduction"
– use data1.dta /* master file */
– sort country year
– merge 1:1 country year using data2
– If the master file and the using file are not located in the same directory, we have to specify the full path of the using file:
– merge 1:1 country year using "F:\Gravity Data\merge example\data2.dta"
Merging Datasets: Example 1:1 merge (master file: data1; using file: data2)
• Observations that are present only in the master file or only in the using file are kept in the merged data, but the variables coming from the other file are set to missing for them.
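A self-contained sketch of a 1:1 merge. The contents of data1 and data2 are not reproduced here, so the two small datasets below, with the variables gdp and pop, are hypothetical and are held in temporary files.

```stata
* Hypothetical using file: population by country and year
clear
input str10 country year pop
"India" 2011 1200
"Nepal" 2011 27
"Nepal" 2012 28
end
tempfile using_data
save `using_data'

* Hypothetical master file: GDP by country and year
clear
input str10 country year gdp
"India" 2011 100
"India" 2012 110
"Nepal" 2011 10
end

* country and year together identify one row in each file
merge 1:1 country year using `using_data'

* _merge records where each observation came from:
* 1 = master only, 2 = using only, 3 = matched in both
tabulate _merge
list, sepby(country)
```

India 2012 exists only in the master data, so its pop is missing; Nepal 2012 exists only in the using data, so its gdp is missing. The variable _merge, which Stata creates automatically, makes these cases easy to find.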
• m:1 merge: we use it when one observation of the using file has to be merged with many observations of the master file.
• Command:
– merge m:1 var_list using "file_directory"
– Example: • merge m:1 country using "F:\Gravity Data\merge
Merging Datasets: Example m:1 merge (master file: data1; using file: data3; merged on country)
• 1:m merge: we use it when one observation of the master file has to be merged with many observations of the using file.
• merge 1:m var_list using "file_directory of the using file"
• m:m merge: used when the common variable(s) do not uniquely identify observations in either file. Within each group of matching values, Stata pairs the first observation of the master file with the first of the using file, the second with the second, and so on; it does not form every possible combination.
• merge m:m var_list using "file_directory of the using file"
– Be careful when you apply m:m merge, as the result depends on the sort order and may change your data in an unexpected way.
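Before choosing between 1:1, m:1, 1:m and m:m, it helps to check whether the common variables uniquely identify observations. The sketch below uses the commands isid and duplicates report for this check and then runs an m:1 merge; the two small datasets and the variable region are hypothetical.

```stata
* Hypothetical using file: one row per country
clear
input str10 country str12 region
"India" "South Asia"
"Nepal" "South Asia"
end
tempfile regions
save `regions'

* Hypothetical master file: several years per country
clear
input str10 country year gdp
"India" 2011 100
"India" 2012 110
"Nepal" 2011 10
end

* country alone repeats in the master file, so the master is the "m" side
duplicates report country
capture isid country
display "Return code of isid country (nonzero means not unique): " _rc

* country and year together are unique, so this line runs without error
isid country year

* Many master rows per country, one using row per country
merge m:1 country using `regions'
tabulate _merge
```

If isid fails for the common variables in both files, it is usually a sign that a variable is missing from the merge key, rather than a reason to use m:m.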
Graphics
• Scatter plot:
• Let’s use the following built-in Stata dataset for this purpose:
– sysuse lifeexp
• Now, let’s look at the relationship between per capita GNP and life expectancy at birth:
– twoway (scatter y_var x_var) (lfit y_var x_var)
– twoway (scatter lexp gnppc) (lfit lexp gnppc)
• Histogram:
– histogram var
– histogram lexp
• Distribution plots:
– histogram var, frequency kdensity
– twoway kdensity var
– twoway kdensity lexp
• Note: kdensity refers to the kernel density estimate of the variable’s distribution.
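The graph commands above can be collected in one do-file fragment. This is a minimal sketch: the axis titles and the graph names (scatter_fit, hist_lexp, kd_lexp) are choices made here for illustration. Giving each graph a name keeps it open; otherwise each new graph replaces the previous one in the Graph window.

```stata
sysuse lifeexp, clear

* Scatter plot with a fitted regression line
twoway (scatter lexp gnppc) (lfit lexp gnppc), ///
    ytitle("Life expectancy at birth") xtitle("GNP per capita") ///
    name(scatter_fit, replace)

* Histogram in frequencies with a kernel density curve on top
histogram lexp, frequency kdensity name(hist_lexp, replace)

* Kernel density estimate on its own
twoway kdensity lexp, name(kd_lexp, replace)
```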