Ann Arbor ASA Up and Running Series: SAS
Sponsored by the Ann Arbor Chapter of the American Statistical Association and the Department of Statistics of the University of Michigan
Contents
• Starting SAS
• User Interface
• Libraries
• Syntax
• Getting Data into SAS
• Examining Data
• Manipulating Data
• Descriptive Statistics
• Graphing Data
• Statistics in SAS
Up and Running Series: SAS2
Starting SAS
Start SAS 9.3 (English)
User Interface
• Log: comments, warnings, etc.
• Program Editor: write and submit commands
• Output (not seen)
• Explorer / Results
Libraries
• SAS stores data sets in Libraries, which point to folders on disk
– Libraries are assigned with the LIBNAME statement
• Four Libraries are defined by default at the start of SAS
– MAPS
– SASHELP: holds help info and sample datasets
– SASUSER: holds settings, etc.
– WORK: default temporary Library for each session
• All data stored in the WORK Library is deleted at the end of each SAS session
• Creating permanent files/Libraries is recommended for any data you want to keep
Libraries
• Create a folder called ‘my_files’ on your desktop.
• Run this command in SAS: LIBNAME a "C:\Users\uniquename\Desktop\my_files";
• Refer to datasets in that folder with the library prefix, e.g., ‘a.datasetname’.
• TIP: Use memorable names for libraries, rather than ‘a’ (e.g., ‘raw’, ‘final’, ‘time1’, etc)
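The difference between a permanent library and WORK can be sketched as follows. The libref raw, the data set names class_copy and class_temp, and the use of the sashelp.class sample data are illustrative assumptions; the path is the placeholder from the slide and must be replaced with a real folder.

```sas
/* Assumption: replace the placeholder path with your own folder */
libname raw "C:\Users\uniquename\Desktop\my_files";

/* Two-level name: saved as a permanent file in the 'raw' folder */
data raw.class_copy;
    set sashelp.class;
run;

/* One-level name: stored in WORK and deleted when the session ends */
data class_temp;
    set sashelp.class;
run;
```

After the session is closed and SAS is restarted, re-running the LIBNAME statement should make raw.class_copy available again, while class_temp would have to be recreated.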
Syntax
• SAS divides commands into two groups
– DATA step: creates/alters datasets
– PROC (Procedures): perform statistical analyses or generate reports
• Some exceptions to the rule:
– DATA step can be used to generate reports
– PROC IMPORT creates a data set
– PROC SORT alters data sets (without telling you!)
Getting data into SAS
• PROC IMPORT
– Allows the reading of standard file types
– Allows the reading of plain text, with user-specified delimiters (i.e., the characters which separate the data)
– WARNING: SAS changed PROC IMPORT for Excel and Access files in 64-bit SAS
• DATA step
– Allows the reading of non-standard file types, complex file structures, and unusual delimiters
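As a minimal sketch of the PROC IMPORT route for a comma-separated text file: the file name class2.csv and its location are hypothetical, and the library a is assumed to have been assigned with LIBNAME already.

```sas
/* Assumption: class2.csv is a hypothetical file with variable names in row 1 */
proc import datafile="C:\Users\uniquename\Desktop\my_files\class2.csv"
    out=a.class2      /* data set to create in library 'a' */
    dbms=csv          /* file type: comma-separated values */
    replace;          /* overwrite a.class2 if it already exists */
    getnames=yes;     /* take variable names from the first row */
run;
```

It is worth checking the Log after running this, since PROC IMPORT guesses each variable's type from the data.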
DATA step
• SAS syntax can be used to read in raw data files (.txt, .csv files), specifying which variables to read in, which ones are text/numeric, combining multiple rows into one case, etc.
• However, this is a more advanced topic.
– Follow up with an Intro class from CSCAR, or by going through examples from the literature (e.g., ‘The Little SAS Book’).
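For orientation only, a DATA step that reads a simple comma-separated file might look like the sketch below. The file name, the variable list, and the character lengths are assumptions chosen to match the class data used later in these slides; real files usually need more care.

```sas
/* Assumption: hypothetical file with a header row and five columns */
data a.class2;
    infile "C:\Users\uniquename\Desktop\my_files\class2.csv"
           dlm=','        /* comma is the delimiter */
           dsd            /* treat consecutive commas as missing values */
           firstobs=2     /* skip the header row */
           truncover;     /* do not read past the end of a short line */
    /* :$20. and :$1. read character variables; the rest are numeric */
    input name :$20. sex :$1. age height weight;
run;
```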
Examining Data
• VIEWTABLE Window
– Select dataset icon in Explorer
• PROC CONTENTS
– Produces a listing of data set information, including the variables and their properties
• PROC PRINT
– Prints a subset of variables or cases to the output window
PROC CONTENTS
• In the Editor window, type:
PROC CONTENTS data=a.class2;
run;
• Highlight the syntax
• Submit for processing
– Click on the ‘running-man’ icon, or
– Right-click on the selected syntax and choose Submit Selection
PROC PRINT
• In the Editor window, type:
PROC PRINT data=a.class2;
run;
• Submit for processing
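To print only a subset of variables or cases, PROC PRINT accepts VAR and WHERE statements and the OBS= data set option. The sketch below assumes a.class2 contains the variables name, age, and height, as in the other examples in these slides; the cutoff of 13 is arbitrary.

```sas
proc print data=a.class2 (obs=5);   /* obs=5 limits how many rows are printed */
    var name age height;            /* print only these variables */
    where age >= 13;                /* print only cases meeting this condition */
run;
```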
Manipulating Data
• Usually done within a data step
– Match data sets using a shared key variable
– Create new variables, or drop/rename existing variables
– Take one or more subsets of the data
– Sort the data by specific variable(s)
• Overwrite existing or create new datasets
– PROC SORT
– Adding/Removing variables
– Merging Datasets
PROC SORT
• In the Editor window, type:
PROC SORT data=a.class2 out=a.class2sorted;
  by age descending weight height;
run;
• Submit for processing
• WARNING: PROC SORT alters data; without out=, the input dataset is overwritten with the sorted version
– Store the result in a new dataset with out=newdatasetname
Adding/Removing variables
• Create new data set, compute new variables, remove unwanted variables
DATA a.class2metric (drop=weight height sex age);
  set a.class2;
  height_cm = height*2.54;
  weight_kg = weight/2.2;
  label height_cm = 'Height in CM'
        weight_kg = 'Weight in Kilograms';
run;
PROC PRINT data=a.class2metric;
run;
• Submit for processing
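Taking a subset and renaming a variable follow the same pattern. This is a sketch under assumptions: the output name class2teens, the new variable name gender, and the age cutoff are all hypothetical choices.

```sas
/* rename= on the output data set changes the name as the data are written */
data a.class2teens (rename=(sex=gender));
    set a.class2;
    if age >= 13;     /* subsetting IF: keep only rows where this is true */
    drop weight;      /* remove a variable that is not needed */
run;

proc print data=a.class2teens;
run;
```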
Merging Datasets
• Data sets must be sorted by the same key variable(s)
proc sort data=a.class2;
  by name;
run;
proc sort data=a.class2metric;
  by name;
run;
data a.classmerged;
  merge a.class2 a.class2metric;
  by name;
run;
• Submit for processing
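One way to check that a merge matched as expected is the IN= data set option, which flags the source of each row. The sketch below assumes both data sets have already been sorted by name; the names merge_check, in_orig, in_metric, and matched are hypothetical.

```sas
data merge_check;
    merge a.class2       (in=in_orig)
          a.class2metric (in=in_metric);
    by name;
    /* 1 if the name appears in both data sets, 0 otherwise */
    matched = (in_orig and in_metric);
run;

/* Count matched and unmatched rows */
proc freq data=merge_check;
    tables matched;
run;
```

Any rows with matched=0 come from names present in only one of the two data sets.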
Descriptive Statistics
• PROC FREQ
– Produces a table of counts and percentages
– For cross-tabulations, statistical tests can also be performed; e.g., independence testing
• PROC MEANS
– Produces descriptive statistics such as mean, standard deviation, minimum, maximum
PROC FREQ
• In the Editor window, type:
proc freq data=a.class2;
  tables age*sex;
run;
• Submit for processing
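The independence testing mentioned for cross-tabulations can be requested with the CHISQ option on the TABLES statement. This is a minimal sketch; with a small data set, SAS may warn that the chi-square test is unreliable because of low expected cell counts.

```sas
proc freq data=a.class2;
    tables age*sex / chisq;   /* adds chi-square tests of independence */
run;
```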
PROC MEANS
• In the Editor window, type:
proc means data=a.class2;
  var age weight height;
run;
• Submit for processing
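PROC MEANS can also report statistics separately for each level of a grouping variable. The sketch below assumes sex is the grouping variable of interest; the statistic keywords and maxdec=2 are one possible choice, not the only one.

```sas
proc means data=a.class2 n mean std min max maxdec=2;
    class sex;                /* one set of statistics per level of sex */
    var age weight height;
run;
```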
Graphing Data: PROC GPLOT
• Simple bivariate scatterplot
• Separate lines
• Multiple variables scatterplot
• Options
PROC GPLOT
• Simple bivariate scatterplot:
proc gplot data=a.class2;
  symbol1 value=dot interpol=rl;
  plot weight*height;
run;
• Submit for processing
PROC GPLOT
• To graph separate lines for each level of a categorical variable, type:
proc gplot data=a.class2;
  symbol1 value=dot interpol=rl;
  plot weight*height = sex;
run;
• Submit for processing
PROC GPLOT
• Multiple variables on the same graph:
proc gplot data=a.class2;
  symbol1 value=dot interpol=rl color=blue;
  symbol2 value=dot interpol=rl color=red;
  plot weight * age;
  plot2 height * age;
run; quit;
• Submit for processing
PROC GPLOT
value=___
• Any character enclosed in single quotes
• Special characters
– dot
– plus sign
– star
– square
– ...and many others
interpol=___
• RL / RQ / RC
– linear, quadratic, and cubic regression curves
• JOIN
– connects consecutive points (line graph)
• BOX
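A short sketch of how these options combine: the statements below plot weight against height with star markers and a quadratic regression curve. The colour and the choice of RQ are arbitrary; goptions reset=symbol is included on the assumption that SYMBOL statements from earlier examples may still be in effect.

```sas
goptions reset=symbol;   /* clear SYMBOL definitions left over from earlier plots */

symbol1 value=star       /* marker shape */
        interpol=rq      /* quadratic regression curve */
        color=green;

proc gplot data=a.class2;
    plot weight*height;
run;
quit;
```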
Statistics in SAS
• PROC CORR
– Correlational analyses
• PROC REG
– Statistical Regression
• PROC UNIVARIATE
– To assess normality of regression residuals
PROC CORR
• Compute bivariate correlation coefficients
proc corr data=a.class2;
  var age;
  with height weight;
run;
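If a rank-based measure is wanted alongside the default Pearson coefficients, PROC CORR accepts the PEARSON and SPEARMAN options. The sketch below omits the WITH statement, which should produce the full correlation matrix among the listed variables.

```sas
proc corr data=a.class2 pearson spearman;
    var age height weight;   /* all pairwise correlations among these variables */
run;
```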
PROC REG
• Run a regression on merged ‘class’ dataset
– Save residuals and predicted values in an output dataset
– Request residual plot
proc reg data=a.classmerged;
  model height_cm=age weight / partial;
  output out=reg_data p=predict r=resid rstudent=rstudent;
  plot rstudent. * height_cm;
run; quit;
• Notes
– The quit command terminates the regression procedure; otherwise it keeps running
– The output data set will be in the WORK library, since no library was specified
PROC UNIVARIATE
• Assess normality of regression residuals stored in the output dataset from PROC REG:
proc univariate data=reg_data;
  var rstudent;
  histogram;
  qqplot / normal (mu=est sigma=est);
run; quit;
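Formal tests can complement the plots. As a sketch, the NORMAL option on the PROC UNIVARIATE statement requests tests for normality (including Shapiro-Wilk), and the NORMAL option on HISTOGRAM overlays a fitted normal curve. It assumes the reg_data data set created by the PROC REG step still exists in WORK.

```sas
proc univariate data=reg_data normal;   /* normal: add tests for normality */
    var rstudent;
    histogram rstudent / normal;        /* overlay a fitted normal density */
run;
```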
QUESTIONS
Winter 2013 Training from CSCAR
http://cscar.research.umich.edu/workshops/
Introduction to SAS® - January 28,30, February 1,4,6,8, 2013
Intermediate Topics in SPSS: Data Management and Macros - February 5,7, 2013
Intermediate Topics in SPSS: Advanced Statistical Models - February 12,14, 2013
Intermediate SAS® - February 25,27, March 1, 2013
Regression Analysis - March 11,13,15, 2013
Applications of Hierarchical Linear Models - March 18,20,22, 2013
Statistical Analysis with R - March 19,21, 2013
Introduction to NVivo - April 3, 2013
Applied Structural Equation Modeling - April 10,11,12, 2013
Further Resources
• The Little SAS Book: A Primer
• UCLA site: http://www.ats.ucla.edu/stat/
– Software tutorials, classes and lectures on statistical methods; an incredible site!
• SAS Documentation: http://support.sas.com/documentation/
– Documentation also found in ‘SAS help’ files.
Other Winter 2013 Workshops from Ann Arbor ASA
R - January 31, 1-3 PM Angell Hall Computing Classroom B (also known as MH444-B)
For more information go to: http://community.amstat.org/annarbor/home
Chapter Meetings open to all
PLACE: Starbucks State & Liberty, lower level
TIME: 6:00pm – 6:45pm
DATE and TOPIC:
• 24-JAN: Business Meeting
• 1-APR: Business Meeting and Election of Officers
For more information go to: http://community.amstat.org/annarbor/home