Teaching with Stata

Peter A. Lachenbruch & Alan C. Acock
Oregon State University
peter.lachenbruch@oregonstate.edu
alan.acock@oregonstate.edu


WCSUG Presentation

First Course Requirement—Data Entry

• I want a first course to cover the things I want students to be able to do:
  – Enter and edit data--must be a “want to know” topic
  – Students can do a small survey to get data on topics of interest to them.
    • Voter poll
    • Attitudes toward diversity issues on campus
    • Beliefs about regulating the internet
  – Learn how to create a codebook; use codebook and codebook, compact
• Where possible use “real” data
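
As a rough illustration of the data entry and codebook steps above, here is a minimal do-file sketch. The survey variables (id, vote, support) and their values are invented for the example and are not from the presentation.

```stata
* Hypothetical five-response campus survey, typed in directly
clear
input byte id byte vote byte support
1 1 4
2 0 2
3 1 5
4 . 3
5 0 1
end

label variable vote "Plans to vote (1 = yes)"
label variable support "Support for regulating the internet (1-5)"
label define yesno 0 "No" 1 "Yes"
label values vote yesno

codebook vote support   // full codebook entry for each variable
codebook, compact       // one line per variable
```

The compact form is usually the easier one to print on a handout.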


First Course Requirement—Data Management

• Balance statistical content with proper data management content: a hard decision
• Storing the original dataset and creating a working dataset
• Keeping a record of every data modification they make, using a do-file
  – Menu system is an aid
  – Do-files are the requirement
• Missing values--distinguish types
• Variable names, labels, and value labels
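
One way these habits might look in a do-file is sketched below. It assumes Stata's shipped auto dataset as a stand-in for the original data, and the two reasons for missingness (.a and .b) are hypothetical labels chosen only to show extended missing values.

```stata
* Work on a copy so the original dataset is never modified
sysuse auto, clear            // stands in for the original dataset
tempfile working              // a real project would use a named file
save `working'
use `working', clear

* Extended missing values distinguish reasons for missingness
replace rep78 = .a if rep78 == . & foreign == 0   // hypothetical: not recorded
replace rep78 = .b if rep78 == . & foreign == 1   // hypothetical: refused
label define rep78lbl .a "Not recorded" .b "Refused"
label values rep78 rep78lbl
label variable rep78 "Repair record 1978"
tabulate rep78, missing
```

Because every change is a line in the do-file, the working dataset can be rebuilt from the original at any time.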


First Course Requirements—Data Management

• Transformations – log, , exp
• Logical editing – beware of logical transformations when missing values are present (gen y = x < 10 leads to “.” transforming to 0, because Stata treats missing as larger than any number)
• Appending
  – Append student generated datasets
• Merging
  – Merging two waves of data
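
The missing-value trap in logical transformations can be shown with three observations. This is a small sketch with made-up values; the variable names y_naive and y_safe are ours.

```stata
clear
input x
5
12
.
end

generate y_naive = x < 10                  // missing x becomes 0
generate y_safe  = x < 10 if !missing(x)   // missing x stays missing
list
```

Adding the `if !missing(x)` qualifier is one common guard; students should see both columns side by side.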


First Course Requirements—Data Management

• Constructing Measures
  – When to use egen newvar = rowtotal(var1 var2 var3)
  – When to use egen newvar = rowmean(var1 var2 var3)
  – When to use the misschk command, and what it does
• Suppose the variable category is 0 or 1
• If there are missing values in category, there is a difference between
  – gen y = 1 if category
  – gen y = 1 if (category==1)
  – gen y = 1 if (category>0)
  – The first and third give a score of 1 for missing values, because missing counts as nonzero and as larger than any number. The second leaves y missing for missing values - BEWARE
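
The points above can be checked on a three-row toy dataset. The item names q1 to q3 and all values are hypothetical; note that egen's row functions take a varlist without commas.

```stata
clear
input category q1 q2 q3
1 4 5 3
0 2 . 4
. . . .
end

egen total = rowtotal(q1 q2 q3)   // missing items count as 0; row 3 totals 0
egen avg   = rowmean(q1 q2 q3)    // mean of nonmissing items; row 3 is missing

generate y1 = 1 if category        // true for 1 and for missing
generate y2 = 1 if category == 1   // true only for 1
generate y3 = 1 if category > 0    // true for 1 and for missing
list
```

Row 3 is the one to discuss in class: rowtotal reports 0, rowmean reports missing, and y1 and y3 are both 1.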


First Course Requirements—Data Management

• edit command, insheet, input, infile (csv files)
• gen newvar = ln(oldvar)
• Rarely use replace oldvar = sqrt(oldvar)
  – only when correcting an error – don’t replace data
• merge ptid assessment using file, update (the data need to be sorted)
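
For readers on Stata 11 or later, a merge on the same two keys might be sketched as follows. The merge 1:1 syntax is the newer form of the command shown on the slide and sorts the data itself; the two small files and the score variables are invented for the example.

```stata
* Hypothetical second-wave file keyed on ptid and assessment
clear
input ptid assessment score2
1 1 10
1 2 12
2 1 9
end
tempfile wave2
save `wave2'

* Hypothetical first-wave file with the same keys
clear
input ptid assessment score1
1 1 7
1 2 8
2 1 6
2 2 5
end

merge 1:1 ptid assessment using `wave2'
tabulate _merge     // shows which rows matched
```

Tabulating _merge after every merge is a cheap check that the match went as intended.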


First Course Requirement (2)

– Data presentation, numerical summary measures – summarize, detail; list; browse; edit; describe; codebook; codebook, compact

– Graphic presentation--bar chart, histogram, box plot seem minimum

– Probability computations – binomial, binomialtail, chi2, chi2tail, F, Ftail, normal – use of the inverse functions for these.
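
A minimal sketch of the summary and graph commands named above, assuming Stata's shipped auto dataset rather than any data from the course:

```stata
sysuse auto, clear
summarize mpg, detail                  // percentiles, skewness, kurtosis
describe m*                            // storage types and labels
graph bar (mean) mpg, over(foreign)    // bar chart of group means
histogram mpg
graph box mpg, over(foreign)           // box plot by group
```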


Examples

• summarize sp, detail; list sp; describe s*; codebook s*
• display binomial(10,3,0.1) for cumulative or display Binomial(10,3,.1) for reverse cumulative
  – Note disp 1-binomial(10,2,.1) gives the same result (also binomialtail(10,3,.1))
• display normal(1.2)
• gen y = invnormal(uniform())*5+20
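
The probability functions can be tried directly with display. The sketch below uses runiform(), the current name for the uniform() generator on the slide; the seed and sample size are arbitrary choices.

```stata
display binomial(10, 3, 0.1)        // P(X <= 3) for n = 10, p = 0.1
display binomialtail(10, 3, 0.1)    // P(X >= 3)
display 1 - binomial(10, 2, 0.1)    // same value as the line above
display normal(1.2)                 // standard normal CDF at 1.2
display invnormal(0.975)            // inverse CDF, about 1.96
display chi2tail(1, 3.84)           // upper tail, about 0.05
display invFtail(2, 20, 0.05)       // upper 5% point of F(2, 20)

* Draws with mean 20 and standard deviation 5 via the inverse CDF
clear
set obs 1000
set seed 12345
generate y = invnormal(runiform())*5 + 20
summarize y
```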


First Course Requirement (3)

• Confidence intervals
  – Binomial – ci: ci variable
  – Normal – ci: ci variable
  – Poisson – ci: ci variable, poisson
• Percentiles
  – summarize,d
  – centile price, c(10(10)90)


Examples

• cii 20 4
  – cii 20 4, agresti
    • Sometimes we want to use the Agresti formulation. The exact interval is usually preferable
• ci varname, level(99)
• summarize weakness, detail
  – Can use su weakn,d (i.e. abbreviate commands, options and variables)
• centile weakness,c(20,40,60,80)
  – Or centile weakness,c(20(20)80)
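
Readers on Stata 14.1 or later will find that ci and cii now take a subcommand (means, proportions); the slide syntax is the earlier form. A sketch under that assumption, using the shipped auto dataset in place of the weakness variable:

```stata
sysuse auto, clear

* Stata 14.1 and later; earlier releases use: cii 20 4, agresti
cii proportions 20 4             // exact interval, 4 successes in 20 trials
cii proportions 20 4, agresti    // Agresti-Coull interval

ci means mpg, level(99)          // normal-theory interval for a mean
summarize mpg, detail
centile mpg, centile(20(20)80)   // 20th, 40th, 60th, 80th percentiles
```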


First Course Requirements (4)

• Hypothesis Testing:
  – Normal r.v.s
    • One sample (including paired data) -
    • Two sample - ttest
    • K samples – ANOVA
  – Binomial variables
    • One sample – proportion
    • Two samples – tabulate, chi2


Examples

• ttest sp = 120 [one-sample]
• ttest spmen = spfem [paired]
• ttest spmen = spfem, unpaired unequal welch
• ttest sp, by(sex) [unequal welch etc.]
• Also immediate form – see help
• anova sp agegrp
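
The same forms of ttest can be run on data that ships with Stata. In this sketch the auto dataset replaces the sp variables, and the before/after readings are made up to illustrate the paired form.

```stata
sysuse auto, clear
ttest mpg == 20                          // one-sample
ttest mpg, by(foreign)                   // two-sample, equal variances
ttest mpg, by(foreign) unequal welch     // Welch's approximation
anova mpg rep78                          // k samples

* Hypothetical paired readings
clear
input before after
120 115
130 128
125 119
140 133
135 136
end
ttest before == after                    // paired
ttest before == after, unpaired unequal  // same columns treated as two samples
```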


Examples

• bitest success = 0.8 [one sample binomial]

• tabulate success group, chi2 row col

• prtest success, by(group) [two sample binomial]
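
A runnable version of these three commands, assuming the shipped auto dataset and a binary outcome we construct ourselves (success = mpg above 20, a purely illustrative cutoff):

```stata
sysuse auto, clear
generate byte success = mpg > 20         // hypothetical outcome; mpg has no missing values

bitest success == 0.5                    // one-sample binomial test
tabulate success foreign, chi2 row col   // 2 x 2 table with chi-squared test
prtest success, by(foreign)              // two-sample test of proportions
```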


First Course Requirements (5)

• Hypothesis Testing (cont.)
  – Power considerations – sampsi (or a spreadsheet – a nice exercise for the stronger students)
  – Nonparametric methods – sign, signrank, ranksum
• Contingency tables – tabulate, epitab


Examples

• sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)
• ranksum sp, by(survive)
• signrank before = after
• When should we supplement Stata with other software, such as G*Power 3, which is free and more flexible than sampsi, or commercial packages such as PASS or nQuery Advisor?
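
A sketch of these commands in do-file form follows. It assumes a release in which sampsi is still available (it has been superseded by power, added in Stata 13, which uses a different default test, so the two sample sizes may differ slightly). The paired readings are invented, and the auto dataset stands in for the sp data.

```stata
* Sample size for two means, as on the slide
sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)

* Stata 13 or later: the power command covers the same design
power twomeans 132.86 127.44, sd1(15.34) sd2(18.23) power(0.8) nratio(2)

* Hypothetical paired readings for the signed-rank and sign tests
clear
input before after
120 115
130 128
125 119
140 133
135 136
end
signrank before = after
signtest before = after

* Two-sample rank-sum test on shipped data
sysuse auto, clear
ranksum mpg, by(foreign)
```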


First Course Requirements (6)

• Simple linear regression – regress, rvfplot, other diagnostics

• Correlation – corr, spearman, ktau – I tend not to use corr because of the sensitivity to the normality assumption for tests and confidence intervals

• Only pwcorr, not corr, provides a test of significance


Examples

• regress mpg weight
• rvfplot
• Stata’s “type a little, get a little” is very different from other packages
• correlate mpg weight or pwcorr mpg weight (especially when you have more than 2 variables – can specify sig and obs; note that these options only work with pwcorr)
• spearman mpg weight – would be nice to have Stata produce a Spearman correlation matrix
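
These examples run as written on the auto dataset that ships with Stata. The sketch below adds price as a third variable to show the pwcorr options; the yline(0) reference line is our addition.

```stata
sysuse auto, clear
regress mpg weight
rvfplot, yline(0)                   // residuals against fitted values

pwcorr mpg weight price, sig obs    // p-values and pairwise sample sizes
spearman mpg weight                 // rank correlation
ktau mpg weight                     // Kendall's tau
```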


Examples

• It’s easy to use permutation tests

. permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)

Monte Carlo permutation results                     Number of obs = 97

      command:  ttest ald7, by(anyhcq)
            t:  r(t)
  permute var:  anyhcq

------------------------------------------------------------------------------
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   1.648305      13     100  0.1300  0.0336  .071073   .2120407
------------------------------------------------------------------------------
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}

• One can do similar things with the bootstrap
• These are easy to use and intuitive for students
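
A comparable permutation test and bootstrap can be sketched on the shipped auto dataset, since the data behind the output above are not available. The seed, the number of replications, and the name diff are arbitrary choices for the example.

```stata
sysuse auto, clear

* Permutation test: shuffle group membership, recompute the t statistic
permute foreign t = r(t), reps(100) seed(2718) nodots: ///
    ttest mpg, by(foreign)

* Bootstrap the difference in group means from the same command
bootstrap diff = (r(mu_1) - r(mu_2)), reps(200) seed(2718) nodots: ///
    ttest mpg, by(foreign)
```

Setting the seed makes the handout reproducible when students rerun the do-file.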


Use of Stata in the Classroom

• Use Stata sparingly
  – It’s not easy to follow commands typed or used from menus – students will get confused
  – Have handouts of what you do – make spacing large enough that students can annotate – even if only to write nasty things about the instructor
  – Balancing coverage of Stata, e.g. data management, with coverage of statistics is a constant issue
  – Remember – it’s a course in statistics, not in Stata


Data Sets

• Place data sets on a LAN or common drive, or make them available for copying to flash drive or CD
• Use real data
  – Not too many variables
  – May have missing values – but these should not affect main analyses – unless you want to demonstrate the problems with missing values


In the Classroom

• Using CD rather than flash drive is better(?)
  – Many desktops have USB port located inconveniently (darn you Dell!)
  – Sometimes newer PCs have USB port on monitor, and laptops usually have an easy slot for the flash drive
  – Light level in the room should allow students to read easily
  – Days of dim projectors are over


In the Classroom (2)

• Enlarge the Stata font by using right mouse button
  – I have found that 14 point is pretty good
  – Be careful about wraparound of output – if needed, reduce point size temporarily
  – Don’t ever use red on blue font
  – See what I mean? It’s more difficult to read
• Show how to move and fix windows


In the Classroom (3)

• Optimizing visibility with projector
  – Use rich color background
  – Edit > Preferences > General preferences. The blue background option is good, but it relies on red for errors and green for standard text, and doesn’t bold fonts.
  – Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind.


Virtual Lab

• A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive.

• A department can require students to have laptops or provide a cart with enough laptops

• Because the laptops are really “dumb” terminals connected to a server, they can be cheap and need not be updated very often
• Any room becomes a lab
• Students should have 24/7 access to the server


Handouts and Data Sets

• Have handouts of your lecture notes
• Have handouts of your data analysis demonstrations
  – Include commands as well as output!
• Data sets
  – On line – LAN or CD or floppy disk -- lots of laptops don’t have floppy drives any more, and flash drives are inexpensive
• Include
  – Student generated datasets
  – Datasets with large Ns and relatively few variables


Emphasis in Course

• Lectures devoted to statistics

• Labs devoted to learning Stata, working on homework, and discussion
• Proper printing of output
  – Don’t split output between two pages if possible (at least, find a good break point)
  – Always use a monospaced font (such as Courier New)


Some Final Issues

• Multiple testing can distort inference (e.g. doing 100 tests at the 5% level makes some significant results almost certain even when every null hypothesis is true, and they may be meaningless) – Worry about this

• Controlling the digits in the output. Use outreg, estout, esttab
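
outreg, estout and esttab are user-written and must be installed first. As a built-in alternative for controlling displayed digits, estimates table accepts display formats. The sketch below assumes the shipped auto dataset; the model and the stored name m1 are illustrative.

```stata
sysuse auto, clear
regress mpg weight foreign
estimates store m1

* Built-in: three decimals for coefficients and standard errors
estimates table m1, b(%7.3f) se(%7.3f) stats(N r2)

* With the user-written estout package installed (ssc install estout):
* esttab m1, b(3) se(3) r2
```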


The End