STATA for S-052
M. Shane Tutwiler
Your Friendly S-040 Lecturer
William Johnston
IT Services
Harvard Graduate School of Education
Getting the files
The do-file used in this workshop as well as all data files are in the Stata Help tab of the course iSite.
– Download SATdata.csv, auto.dta, and Stata for S-052.do, and save them to a new folder called Stata_Workshop on your desktop or on a USB drive.
Contact Information
• Office: Gutman 324
• Email: – [email protected]
• Want to set up a consultation? – hgse.service-now.com/ess/research.do
• Want to learn more on your own? – itservices.gse.harvard.edu/its/services/research-online-resources/stata
Agenda: Overview
I. Overview of Stata
II. Getting Started
III. ‘Do’ files
IV. Basic data cleaning
V. Basic data management
VI. Beginning analysis
VII. Questions
Getting Help in Stata
• Many pathways to getting help in Stata:
. help command
. search command
. findit command
• Use the help menu
• Look online with a web browser
• Set up an appointment ([email protected])!
Some notes
• A word about programming in and using Stata
• Stata is case sensitive, so Myvar is different from myvar
• All commands in Stata are lower-case
• and = "&", or = "|", not = "!"
• Assignment is "=", value equivalence is "=="
• Numeric missing values are displayed as a . and are treated as larger than any number; missing string values are empty ("")
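
The operator and missing-value notes above can be tried out with a short sketch. It uses Stata's bundled auto dataset rather than the workshop data, and the variable name expensive is made up for illustration.

```stata
* Illustration on the auto dataset that ships with Stata (an assumption;
* this is not the workshop's SATdata file)
sysuse auto, clear

* rep78 has some missing values; "." is treated as larger than any number,
* so this count also picks up cars whose rep78 is missing
count if rep78 > 3

* Exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)

* "=" assigns, "==" tests equality
generate byte expensive = (price > 6000)
count if expensive == 1 & foreign == 1

* "|" is or, "!" negates
count if !(foreign == 1 | price > 6000)
```

Comparing the first two counts shows how many observations slipped in only because they were missing.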
How to Begin a Session?
• Specify your directory
– cd "_______"
• Begin using a log file
– log using "______.log"
• Open your data and look at it
– insheet using "SATdata.csv", comma
– browse
– describe
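
Put together, the start of a session might look like the sketch below. The folder path and log file name are placeholders (assumptions), so substitute wherever you saved the workshop files.

```stata
* Placeholder path: point this at your own Stata_Workshop folder
cd "~/Desktop/Stata_Workshop"

* Close any log left open by an earlier run, then start a fresh one
capture log close
log using "workshop_session.log", text replace

* Read the comma-separated file and inspect it
insheet using "SATdata.csv", comma clear
describe
browse
```

The capture prefix keeps the do-file from stopping with an error when no log is open.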
Anatomy of a Stata Command
• Stata commands follow a pattern:
• [prefix:] command [varlist] [if] [in] [weight] [, options]
• For example:
– bysort region: summarize expense, detail
– mean csat if income >= 30000 & region != .
– list state in 1/10, nolabel
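
The same pattern can be run on any dataset. This sketch mirrors the three examples using the bundled auto dataset, so the variable names differ from the SAT examples above.

```stata
* Bundled example data (an assumption; not the workshop file)
sysuse auto, clear

* prefix: command varlist, options
bysort foreign: summarize price, detail

* command varlist if-condition
mean mpg if price >= 5000 & !missing(rep78)

* command varlist in-range, options
list make foreign in 1/10, nolabel
```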
Getting Started
• Opening Data
– Stata-formatted data (.dta): use "file name"
– Comma-separated values: insheet using "file name", comma
– Tab-delimited values: insheet using "file name", tab
– Example datasets hosted on the Stata website: webuse "dataset name" (for a .dta file at a web address, use "URL")
– Flat files: create a dictionary (beyond the scope of this workshop)
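
A few of these variants side by side, as a sketch. File names are placeholders assumed to sit in the current working directory, and webuse needs an internet connection.

```stata
* Stata-formatted file in the working directory (placeholder name)
use "auto.dta", clear

* Comma-separated text file
insheet using "SATdata.csv", comma clear

* In Stata 13 and later, import delimited supersedes insheet
import delimited using "SATdata.csv", clear

* Example data installed with Stata, and the copy hosted by Stata Press
sysuse auto, clear
webuse auto, clear
```

The clear option drops whatever data are in memory before loading the new file.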
Looking at Data
• Look at your data – did our data import correctly?
• How are our data measured?
• What kinds of variables do we have?
• Editor
. edit
• Browser
. browse
• Other commands
. codebook
. describe
Examining Data
• There are several ways to look at our data in Stata
• How would we describe the distribution of our data?
• Graphs of distribution
– Histograms
. histogram
– Scatterplots
. scatter
• Charts/Tables of frequency and distribution
– Frequency tables
. table
– Cross-tabs
. tabulate
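
A quick sketch of these commands, again on the bundled auto dataset (an assumption, chosen so the example runs without the workshop files):

```stata
sysuse auto, clear

* Variable types, labels, and a summary of each variable
describe
codebook price rep78

* Distribution of one variable, relationship between two
histogram price, frequency
scatter mpg weight

* Frequencies for one variable, then a cross-tab of two
table foreign
tabulate rep78 foreign
```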
Basic Data Operations, part 1
• Generating a new variable
. gen newvarname = expression
• Subsetting
. keep varlist
. drop varlist
– the if qualifier, e.g. keep if condition
• Joining Two Datasets
. merge
• Note: this is covered in detail in the Data Management Workshop!
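
A minimal sketch of generating and subsetting, on the bundled auto dataset. The new variable name price_k is hypothetical.

```stata
sysuse auto, clear

* New variable from an expression
generate price_k = price / 1000

* Keep a subset of variables, then drop one of them
keep make price price_k mpg foreign
drop price

* Keep or drop observations with the if qualifier
keep if foreign == 1
drop if mpg < 20
count
```

Note that keep and drop change the data in memory only; the file on disk is untouched until you save.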
Basic Data Operations, part 2
• Labeling
• To label a variable:
. label variable varname "label text"
• To label values:
. label define labelname 1 "high" 0 "low"
. label values varname labelname
• Renaming
. rename varname1 varname2
• Replacing values of an already generated variable
. replace newvarname=expression
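
These operations chain together naturally. The sketch below uses the bundled auto dataset; the variable and label names (heavy, heavylbl, is_heavy) and the 3,000 lb cutoff are made up for illustration.

```stata
sysuse auto, clear

* Start from a constant, then replace values that meet a condition
generate byte heavy = 0
replace heavy = 1 if weight > 3000

* Label the variable itself
label variable heavy "Weight above 3,000 lbs"

* Define a set of value labels, then attach it to the variable
label define heavylbl 0 "light" 1 "heavy"
label values heavy heavylbl

* Rename, then check the result
rename heavy is_heavy
tabulate is_heavy
```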
Apply Your Knowledge
• Use the SATdata dataset
• Generate a dichotomous variable called hi_score from the csat variable, where 1 indicates a score greater than 922 and 0 indicates a score less than or equal to 922.
• Label it as 0=low and 1=high.
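
One possible approach, as a sketch rather than the official solution. It assumes SATdata.csv is in the working directory and that csat was read in as a numeric variable; the label name hi_score_lbl is made up.

```stata
insheet using "SATdata.csv", comma clear

* 1 if csat > 922, 0 otherwise; the if qualifier leaves hi_score missing
* where csat is missing instead of coding it as 1
generate byte hi_score = (csat > 922) if !missing(csat)

label define hi_score_lbl 0 "low" 1 "high"
label values hi_score hi_score_lbl

* Check the result, including any missing values
tabulate hi_score, missing
```

Without the if qualifier, a missing csat would count as greater than 922 and be coded as high.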
Beginning Analysis
• Useful commands
• Looking at Distributions
– table, histogram, summarize
• Testing the Normality Assumption
– sktest, ladder, gladder
• Beginning to Look at Relationships
– tabulate, pwcorr, ttest, anova
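
A sketch of how these commands are typically called, using the bundled auto dataset (an assumption; swap in the SAT variables for the exercises):

```stata
sysuse auto, clear

* Distributions
summarize price, detail
histogram price

* Normality: skewness/kurtosis test, then candidate transformations
sktest price
ladder price
gladder price

* Relationships
tabulate rep78 foreign, chi2
pwcorr price mpg weight, sig
ttest mpg, by(foreign)
anova mpg rep78
```

The sig option on pwcorr prints the significance level under each correlation.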
Apply Your Knowledge
• Generate a histogram of the expense variable.
• Generate a two-way table to see whether the distribution of expense differs across the values of your newly created hi_score variable.
• If you have time, see if there is a significant correlation between SAT scores (csat) and the average amount of money that each state spends on education (expense).
Building Regression Models
• Regression models
• Linear regression
– regress depvar indepvar1 indepvar2 …
• Logistic regression
– logit depvar indepvar1 indepvar2 …
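
Both commands take the dependent variable first, followed by the predictors. A minimal sketch on the bundled auto dataset (an assumption; not the workshop data):

```stata
sysuse auto, clear

* Linear regression: continuous outcome
regress price mpg weight

* Logistic regression: 0/1 outcome, coefficients in log-odds
logit foreign mpg weight

* Same model reported as odds ratios
logit foreign mpg weight, or
```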
Apply Your Knowledge
• Generate two scatterplots – one to look at the relationship between expense and csat, one to look at expense and hi_score.
• Depending on your assessment of each relationship (linear or not), run the appropriate regression to test the effect of expense on either csat or hi_score.
Saving data, code, and output
• Saving your newly transformed data
. save "pathname\filename.dta"
. outsheet using "pathname\filename"
• Saving your code
– SAVE YOUR DO-FILE!!!!!
• Saving your output
– create a log file
. log using "pathname\filename"
. log close (!!!!) Not closing = not saving!
• Saving graphs
. graph save
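
A sketch of a session that saves all of these pieces. Every file name here is a placeholder, and the files are written to the current working directory.

```stata
sysuse auto, clear

* Start a log, closing any that is still open
capture log close
log using "workshop_output.log", text replace

summarize price mpg

* Save the graph in Stata's format and export an image copy
histogram price
graph save "price_hist.gph", replace
graph export "price_hist.png", replace

* Save the data as .dta and as comma-separated text
save "auto_modified.dta", replace
outsheet using "auto_modified.csv", comma replace

log close
```

The replace option overwrites an existing file of the same name, so leave it off if you want Stata to stop rather than overwrite.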
Thanks!
Questions?
Gutman Library, room 323a
http://itservices.gse.harvard.edu/its/services/research