Lab 1 - Introduction to Stata


Spring 2018

Contents

1 Stata

1.1 What is Stata?

1.2 How to get Stata

1.3 Stata’s window

2 Use of directories on AppsStorage

2.1 Managing Directories: creating a directory “EC2227”

2.2 Working Directories: make “EC2227” your working directory

3 Accessing Stata Off Campus

4 Loading Data

5 Exploring your data

6 Browse/Edit

7 Save and open your own dataset

8 Stata help

1 Stata

1.1 What is Stata?

• Stata is a statistical programming language that runs on Windows, Mac, and Unix. You will be using it in this class to analyze data, create graphs, and run statistical tests.

• Technically, you can use Stata by pointing and clicking on the menu options, but in order to conduct reproducible, scientifically sound social research, in this class we will be learning how to write “do-files”: scripts that automatically run all the calculations you want!
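
To make the idea of a do-file concrete, here is a minimal sketch of what one might look like. The file name lab1.do is hypothetical, and sysuse auto is an assumption made for illustration (it loads the copy of the auto dataset that ships with Stata). The commands themselves are explained in the sections that follow.

```stata
* lab1.do (hypothetical file name): a minimal do-file sketch
* Lines that start with * are comments and are ignored by Stata.

* Start from a clean session
clear all

* Load the example dataset that ships with Stata (assumption: any dataset would do)
sysuse auto, clear

* Every step below is rerun, in order, each time the do-file is executed
describe
summarize price mpg
```

If such a file were saved in your working directory, typing do lab1 in the Command window would run every line in order.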

1.2 How to get Stata

• You can purchase a copy of Stata if you want, but BC lets you use it for free on the BC Applications Server, either in the computer lab or on your laptop

• If this is your first time using the Applications Server, you will have to install the Citrix Receiver (instructions can be found at http://www.bc.edu/offices/help/teaching/app_server.html)

• Stata runs exactly the same on Mac OS X or Windows, so you can use either

1.3 Stata’s window

You will see five windows when you start Stata 15 (see Figure 1 below):

Figure 1: Stata screen

1. Results: the big window in the middle, labeled “Results”, is where all output is displayed.

2. Review: the window on the left, labeled “Review”, shows all past commands you ran during the current Stata session. If you need to go back and redo something you did earlier, just click on your previous command in the Review window.

3. Command: the window on the bottom, labeled “Command”, is where you tell Stata what to do.

4. Variables: the window on the bottom right corner, labeled “Variables”, shows all the variables that are currently in memory. When you start up Stata, it will be empty. When data is loaded, if you click on a variable’s name, the variable name will show up in the Command window.

5. Properties: the window on the top right corner, labeled “Properties”, gives specific information on a variable, data and memory usage.

Note: The Results, Review, and Command windows are the most useful!

2 Use of directories on AppsStorage

• When working within any of the applications on the BC Application Server, your files are saved by default to AppsStorage.bc.edu.

• Using Citrix Receiver, Stata launches in the L: directory on the AppsStorage. Note: this is not your computer, but the AppsStorage space.

• ITS strongly recommends that while working with Stata on the Apps server, you save your files to the AppsStorage.

• To easily access/copy/move files from/to the L: drive as if it were a hard drive on your laptop you need to:

– “map” the L: drive on your computer (PC users)

– connect to the L: drive on your computer (Mac users)

Note: today you learned how to map or connect to the L: drive. For future reference, see the instructions at: http://www.bc.edu/content/bc/offices/help/teaching/app_server/apps-files/files-map.html

• Once you map/connect to the L: drive, you can see \\appsstorage.bc.edu on your computer under SHARED for Macs, and COMPUTER → NETWORK LOCATION for PCs (see Figures 2 and 3).

Figure 2: Viewing the L: drive (Mac)

Figure 3: Viewing the L: drive (PC)

2.1 Managing Directories: creating a directory “EC2227”

Let’s create a directory “EC2227” on the AppsStorage server. You are encouraged to save all your work from the labs there.

There are several ways to create a folder on the AppsStorage server. For example, you can:

• open the AppsStorage directory on your computer and create a folder there.

• create a folder through the Stata interface by selecting: File → Change Working Directory... → Computer → your_username(\\appsstorage.bc.edu)(L:). Then select the Make New Folder button and change the name of the folder from “New Folder” to “EC2227” (see figures below).

Figure 4: Organizing files on the L: drive

Figure 5: Create a folder on the L: drive through Stata

2.2 Working Directories: make “EC2227” your working directory

• It is important that you set your working directory every time you use Stata so you know where your work is saved. Note also that Stata will save all plots, worksheets, etc. by default to the working directory you selected.

• To change your working directory to the “EC2227” folder you can use the following command:

cd L:\EC2227

cd stands for Change Directory

The command will be very handy once we start using Do-Files.

• Or you can select File → Change Working Directory...

• To change your working directory to a folder on your personal computer, select File → Change Working Directory... From the new window select “Local Disk(C: on BC-your username)” and the folder you wish on your computer, say your desktop (see Fig. 6 below)

• NOTE: Whether you decide to save files on the AppsStorage (recommended) or your own computer, always make sure you know where you save your work!
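
The directory steps above can also be done entirely with commands, which is handy inside a do-file. The following is only a sketch: it assumes the L: drive is available as described in this section and that “EC2227” is the folder name you chose. The capture prefix is used so that mkdir does not stop with an error when the folder already exists.

```stata
* Show the current working directory
pwd

* Create the folder if it does not exist yet.
* capture suppresses the error Stata would raise if EC2227 is already there.
capture mkdir "L:\EC2227"

* Make it the working directory, then confirm the change
cd "L:\EC2227"
pwd
```

Putting the path in double quotes is not required here, but it keeps the command working if a folder name ever contains spaces.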

Figure 6: Change working directory to personal computer

3 Accessing Stata Off Campus

• As long as you do not need to access files on the L: drive, following the regular procedures as if you were on campus should work.

• However, if you need to access files in your L: drive folder, e.g. to download the solution to a problem set to your computer, you should download the Cisco AnyConnect VPN software (Eagle VPN). It is available for download at the BC software web page http://www.bc.edu/software/applications/network.html.

• Click on “Learn how to configure Eagle VPN software”. Follow the instructions on the web page above to install and configure the software (this is straightforward). You will need to have your Agora portal login and password handy to download Eagle VPN.

• After you have installed Eagle VPN, run Cisco AnyConnect Secure Mobility Client. Click Connect.

Figure 7: Starting Cisco AnyConnect Secure Mobility Client to access L: drive off campus.

• The software will ask you to log in. Use your Agora login and password. Click OK.

Figure 8: Connecting to Cisco AnyConnect Secure Mobility Client

• Now you can access your files on the L: drive after mapping/connecting to the server as described in section 2.

4 Loading Data

• Stata keeps data in its own format, which has a .dta extension

• Never try to print out a .dta file... you’ll get garbage.

There are three different commands to load data into Stata:

• webuse will load in data that is in Stata format and stored online

• use will load in data (in Stata format) from your computer (see example in Section 7)

• bcuse is a new command written for the BC Economics labs; it loads datasets from the internet as webuse does

Note: if you are using your own version of Stata you will need to download the command. Find info and download link at http://ideas.repec.org/c/boc/bocode/s457508.html

Try the following command in the Command window:

webuse auto, clear

• A useful option with the use/webuse/bcuse commands is clear, which means that you are allowing Stata to clear the memory (the previous dataset) in order to load the new one. Without it, if the dataset in memory has unsaved changes, Stata will stop with an error instead of loading the new one.

• If you want to load in new data, or you want to start from scratch, type clear all into the Command window.

IMPORTANT: When you start writing do-files, you should start your files with this command, so that when you run the file, any earlier work doesn’t interfere with what you’re running!
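
To see what the clear option does, you can try the short sketch below. It uses sysuse auto (the copy of the auto dataset that ships with Stata) instead of webuse so that no internet connection is needed; that choice, and the variable name price_thousands, are assumptions made only for this illustration.

```stata
* Load the example data, then change it so that memory holds unsaved work
sysuse auto, clear
generate price_thousands = price / 1000

* Without clear, Stata refuses to overwrite unsaved data.
* capture noisily shows the error message but lets the next commands run.
capture noisily sysuse auto

* A nonzero return code means the load failed
display _rc

* With clear, the data in memory are discarded and the new data are loaded
sysuse auto, clear
```

After the last command the extra variable is gone, because the original dataset was loaded again from scratch.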

Figure 9: Loading dataset auto

5 Exploring your data

• describe (des): Stata datasets often come with information about the variables. To get this information, type describe or just des.

This reports the number of observations, the number of variables, the way your data are sorted, the size of your data set, variable descriptions, and the type of storage of your data.

. des

Contains data from http://www.stata-press.com/data/r12/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2011 17:45
 size:         3,182                          (_dta has notes)
----------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
----------------------------------------------------------------------------
Sorted by: foreign

NOTE: Always check the units of the variable you are using so you can interpret the coefficients of the regression output. For all datasets from Wooldridge you can check that at: http://fmwww.bc.edu/ec-p/data/wooldridge/datasets.list.html (or just Google “Wooldridge datasets”, it should be the first result).
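
Beyond a plain describe, a few related commands help when checking variables and their units. The sketch below is one possible way to do it, again assuming the auto dataset that ships with Stata (sysuse auto); the value label name origin is taken from the describe output above.

```stata
sysuse auto, clear

* Describe only the variables you care about
describe price mpg foreign

* codebook reports the range, units, and number of missing values of a variable
codebook rep78

* foreign carries the value label "origin"; list what its codes mean
label list origin
```

The last command shows which numeric code of foreign stands for which car type, which matters once you start writing conditions on that variable.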

6 Browse/Edit

• If you want to look at your dataset, type browse or simply br. This will open up the following window where you can look at your data as if you were using an Excel spreadsheet.

Figure 10: Browser/Editor

• By clicking on the “Edit Mode” button (next to the “Paste” button) you will access the Editor. From here you can directly modify the data, as if you were working in Excel.

• Once you are done with the browser/editor, simply close it by clicking on the X button at the top-right corner.

We are still going to use the 1978 Automobile data from last time. To load the data, type:

webuse auto, clear

• summarize: The basic descriptive statistics command in Stata is summarize or just summ.

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        make |         0
       price |        74    6165.257     2949.496      3291      15906
         mpg |        74     21.2973     5.785503        12         41
       rep78 |        69    3.405797     .9899323         1          5
    headroom |        74    2.993243     .8459948       1.5          5
-------------+--------------------------------------------------------
       trunk |        74    13.75676     4.277404         5         23
      weight |        74    3019.459     777.1936      1760       4840
      length |        74    187.9324     22.26634       142        233
        turn |        74    39.64865     4.399354        31         51
displacement |        74    197.2973     91.83722        79        425
-------------+--------------------------------------------------------
  gear_ratio |        74    3.014865     .4562871      2.19       3.89
     foreign |        74    .2972973     .4601885         0          1

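Notice that rep78 has only 69 observations: its value is missing for 5 of the 74 cars. We will talk about missing observations below.

• Also note that make has 0 observations. As the describe output showed, make is a string variable (type str18), so summarize cannot compute statistics for it.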

• Type:

. summ price

to obtain the same summary statistics for the variable price only.

• Try also:

. summ price, detail

to obtain detailed summary statistics, including the median and other percentiles. The output from the above command is:

                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188

The first column gives the 1st to the 99th percentiles for the variable (What is a percentile?). For example, it shows you that the 50th percentile for price is 5006.5. Note that this is also the median for the variable. The second column shows the 4 smallest and 4 largest values of the variable price. The third column in the output presents a number of descriptive statistics, such as number of observations, mean, standard deviation, variance, skewness, etc.
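
The numbers in this table do not have to be copied by hand. After summarize runs, Stata keeps its results in r(), and you can reuse them in later commands. The sketch below shows one way to do that, assuming the auto dataset that ships with Stata (sysuse auto); the scalar name iqr is hypothetical.

```stata
sysuse auto, clear
summarize price, detail

* summarize leaves its results behind in r(); list everything that is stored
return list

* Pick out individual results
display "Median price: " r(p50)
display "Mean price:   " r(mean)

* Use stored results in a calculation: the interquartile range (75th minus 25th percentile)
scalar iqr = r(p75) - r(p25)
display "Interquartile range: " iqr
```

With the values shown in the output above, the interquartile range is 6342 - 4195 = 2147.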

• How about looking at car prices for foreign cars only? The dataset includes the variable foreign such that foreign = 1 if the car is imported and foreign = 0 if the car is made in the US. To summarize car prices for imported cars only use the following command:

. summ price if foreign==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        22    6384.682     2621.915      3748      12990

• Also try

. summ price if foreign==0

. summ price if foreign!=1

Note: in an if condition, equality is always tested with a double == sign, not a single =. The not operator is ! (so != means “not equal to”). You can also use the & (and) and | (or) operators in your if conditions. For example:

. summ price if foreign==0 & price > 5000

will present summary statistics for domestic cars which cost more than $5000.
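
One thing to watch for when writing if conditions is missing data, such as the 5 cars without a value for rep78. The following sketch, which assumes the auto dataset that ships with Stata (sysuse auto), illustrates the issue and one common way to guard against it using the missing() function.

```stata
sysuse auto, clear

* How many cars have no repair record? (74 - 69 = 5 in the summary table above)
count if missing(rep78)

* Caution: Stata treats a missing value as larger than any number,
* so this condition also picks up the cars with missing rep78
count if rep78 > 3

* Excluding missing values explicitly gives the intended count
count if rep78 > 3 & !missing(rep78)

* The same guard combined with another condition
summarize price if foreign == 0 & rep78 > 3 & !missing(rep78)
```

Comparing the second and third counts shows how many observations would otherwise have slipped into the condition unnoticed.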

• We can also look at the frequency distribution of a variable using the tabulate command.

. tab mpg

• We can tabulate more than one variable at a time.

. tab mpg foreign

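The above command gives us the frequency distribution of miles per gallon by whether the car is foreign or domestic.

• When you tabulate two variables at a time, sometimes the tab command will result in an error saying “too many values”. This happens when the second variable, which forms the columns of the table, has too many distinct values; Stata allows more rows than columns. Try switching the places of the two variables and compare:

. tab foreign mpg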
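
The tabulate command also accepts options that are often useful in practice. The sketch below shows three of them on the auto dataset that ships with Stata (sysuse auto is an assumption made so the example runs without an internet connection).

```stata
sysuse auto, clear

* One-way table that also shows missing values as their own category
tabulate rep78, missing

* Two-way table with row percentages: within each car type,
* what share of cars has each repair record?
tabulate foreign rep78, row

* Mean, standard deviation, and frequency of price within each category of foreign
tabulate foreign, summarize(price)
```

Note that without the missing option, observations with a missing value are silently left out of the table.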

7 Save and open your own dataset

• At some point you may modify an existing dataset or create your own. You can save your dataset in your current directory using the command

save dataset_name

• Always make sure your current directory is set correctly before saving in order to be able to locate the file afterwards!

• If you keep working on the same dataset and you want to overwrite an existing file with the dataset you are currently using, type instead

save dataset_name, replace

• Finally, when you want to load a dataset from your hard drive (or from the L: drive), first set your current directory to the directory where your dataset is saved and then type

use dataset_name, clear

• The option clear clears Stata’s memory before loading your dataset.
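
Put together, a save-and-reload cycle might look like the sketch below. The file name my_auto and the variable name price_thousands are hypothetical, and the file is written to whatever your current working directory is.

```stata
sysuse auto, clear

* Modify the data: add the price expressed in thousands of dollars
generate price_thousands = price / 1000

* Save to the working directory; replace allows overwriting an earlier copy
save my_auto, replace

* Start over and reload the saved file
clear all
use my_auto, clear

* The new variable is still there, so the modified data were saved
describe price_thousands
```

Running pwd before save is a quick way to check where the file will end up.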

8 Stata help

• To get help on using a Stata command, type help followed by the command name, e.g. help describe. Note: use help only if you know the name of the Stata command, otherwise you will get nothing. If you don’t know the name of the command you need, you can search for it. Stata has a search command with a few options (type help search to learn more), but I prefer findit, which searches the Internet as well as your local machine and shows results in the Viewer.

• Stata also has an online help manual. So, if you Google any of these commands, you should get a link to the relevant online Stata page. For more help, you can consult StataCorp’s “Frequently Asked Questions” http://www.stata.com/support/faqs/

• If you are stuck with a programming problem, the Statalist archives may be useful. http://www.stata.com/statalist/archive/

• Finally, UCLA has an excellent website for those trying to learn to use Stata: http://www.ats.ucla.edu/stat/stata/sk/default.htm

– You may find it useful to start a Google search with “stata ucla ...”
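
As a quick illustration of the help commands described in this section, you could try the lines below in the Command window. The keyword percentiles is just an example of a search term.

```stata
* You know the command name: open its help file
help summarize

* You do not know the command name: search by keyword
search percentiles

* Learn about the options of search itself
help search
```

Each of these opens its results in the Viewer or the Results window, so nothing in your data is changed by running them.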
