Source: faculty.smu.edu/ngh/stat3304/class_sasintro.pdf

STAT 3304/5304

Introduction to Statistical Computing

Introduction to SAS

What is SAS?

• SAS (originally an acronym for Statistical Analysis System; the name no longer stands for anything) is a software system designed to analyze large sets of numerical and character data.

• Pronounced “sass”, not spelled out as three letters.

• Developed in the early 1970s at North Carolina State University.

• In 1976, SAS Institute Inc., a privately held corporation, was formed. The software grew in popularity and capability and was widely used in academic groups.

• SAS can be used without knowing much about programming, but it is also a sophisticated language, and learning it lets you do much more.

• SAS was first developed as a programming language for statisticians and data analysts.

• It was originally intended for the management and analysis of agricultural field experiments.

• SAS Institute has grown into the world’s largest privately held software company.

• The company is now headquartered in Cary, North Carolina.

• It is a worldwide company with business in Asia Pacific, Latin America, Europe, the Middle East, and Africa.

• SAS Institute also has a good employee retention rate of 96%. It is a family-oriented company and is friendly to working women.

• SAS is now one of the most widely used statistical software packages.

• Continual product line expansion and diversification of clientele have resulted in SAS products being used at over 40,000 customer sites in 50 countries.

• There are 3.5 million users of SAS products.

• Part of the reason for the continual growth is that SAS Institute works with end users to improve its products.

• It offers solutions for data warehousing, data mining, data visualization, and applications development.

• The SAS System is an applications system that can be used as
– a statistical package
– a database management system
– a high-level programming language

• An applications system is software that gives you the tools you need to make the data useful and meaningful.

• In order to be useful, an applications system should
– give you total control of your data,
– facilitate applications that run in more than one computing environment, and
– accommodate varying skill levels of potential users.

• SAS runs on a variety of platforms and is portable across computing environments.

• A computing environment is determined by the HARDWARE and the host OPERATING SYSTEM running on it.

• SAS can be used on IBM mainframes, UNIX-based machines, and personal computers running Windows.

• “Portability” means that SAS applications:
– Function the same
– Look the same
– Produce the same results

• You can develop SAS applications in one environment and run them in other environments without rewriting the programs.

Modes for Running SAS

• SAS can be run in a variety of styles, or ‘modes’, depending on the operating system it is running on. The most commonly used modes include:

– Batch mode:
∗ the user writes whole SAS programs, saves them in a file, then runs SAS from a command-line prompt.

– Interactive line mode:
∗ the user enters commands line by line in response to prompts issued by the SAS System.

– Interactive window mode (SAS Display Manager System):
∗ the user interacts with SAS through windows, using pull-down menus, dialog boxes, and icons.
∗ this is the mode used on Windows and Macintosh.

– SAS Enterprise Guide:
∗ SAS Enterprise Guide software runs only under Windows.
∗ It can write SAS code for you through its extensive menu system.

How does SAS work?

• With any body of data, you must perform four basic tasks to make it useful and meaningful.

– ACCESS: First, you access the data through the SAS System.

– MANAGE: Update, rearrange, combine, edit, or subset the data before analyzing it.

– ANALYZE: Analyses range from simple descriptive statistics to more advanced or specialized analyses for econometrics and forecasting, statistical design, computer performance evaluation, and operations research.

– PRESENT: Presentation capabilities range from simple lists and tables to multidimensional plots to elaborate full-color graphics, both on paper and on your display.
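The four tasks can be seen together in one short program. This is only a minimal sketch: it assumes the SASHELP.CLASS sample data set that ships with standard SAS installations, and the data set name teens and the variable bmi are made up for illustration.

```sas
/* ACCESS: read an existing SAS data set
   (SASHELP.CLASS is assumed to be available) */
data teens;
    set sashelp.class;
    /* MANAGE: keep a subset and derive a new variable */
    if age >= 13;
    bmi = 703 * weight / height**2;   /* weight in pounds, height in inches */
run;

/* ANALYZE: simple descriptive statistics by group */
proc means data=teens n mean std min max;
    class sex;
    var height weight bmi;
run;

/* PRESENT: list the observations in a report */
proc print data=teens noobs;
    var name sex age bmi;
    format bmi 5.1;
    title 'Students aged 13 and over';
run;
title;   /* clear the title so it does not carry over to later output */
```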

• A SAS program is a sequence of statements executed in order.

• A statement gives information or instructions to SAS and must be appropriately placed in the program.

• SAS is very lenient about the format of its input: statements can be broken up across lines, multiple statements can appear on a single line, and blank spaces and lines can be added to make the program more readable.

• The most effective strategy for learning SAS is to concentrate on the details of the DATA step, and to learn the details of each procedure as you need them.
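To illustrate how lenient SAS is about layout, the sketch below writes the same DATA step twice, once neatly and once crammed together. The data set names demo and demo2 are hypothetical; both steps should create a single observation with the same values of x and y.

```sas
/* Layout 1: one statement per line, indented for readability */
data demo;
    x = 2;
    y = x * 3;
run;

/* Layout 2: the same step with several statements on one line
   and one statement split across two lines */
data demo2; x = 2; y =
    x * 3; run;

/* Several statements per line work for PROC steps too */
proc print data=demo; run; proc print data=demo2; run;
```

The first layout is the one to prefer in practice; the second only shows that semicolons, not line breaks, separate statements.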

SAS Windows

• There are five basic SAS windows: the Results and Explorer windows, and three programming windows (Editor, Log, and Output).

• There are also many other SAS windows that you may use for tasks such as getting help, changing SAS system options, and customizing your SAS session.

• Results: The Results window is like a table of contents for your Output window; the results tree lists each part of your results in outline form.

• Explorer: The Explorer window gives you easy access to your SAS files and libraries.

• Editor: In the Editor window you use the text editor to type in, edit, and submit SAS programs, as well as to edit other text files such as raw data files.

• Log: The Log window contains notes about your SAS session. After you submit a SAS program, any notes, errors, or warnings associated with your program, as well as the program statements themselves, appear in the Log window.

• Output: If your program generates any printable results, they appear in the Output window.

• In Windows operating environments, the default editor is the Enhanced Editor.

• The Enhanced Editor is syntax sensitive and color codes your programs, making them easier to read and mistakes easier to find.
– Green: Comments
– Dark Blue: Keywords in major SAS commands
– Blue: Keywords that have special meaning as SAS commands
– Yellow Highlight: Data
– Red: Statements that SAS does not understand

• The Enhanced Editor also allows you to collapse and expand the various steps in your program.

• For other operating environments, the default editor is the Program Editor, whose features vary with the version of SAS and the operating environment.

General Syntax and Rules

• SAS statements may be in upper or lower case and may begin in any column.

• SAS statements always end with a semicolon (;).

• SAS statements may also extend across lines, and more than one SAS statement may appear on a single line.

• SAS variable names must be 32 characters or fewer, constructed of letters, digits, and the underscore character.

• The first character must be an English letter (A, B, C, ..., Z) or an underscore (_). Subsequent characters can be letters, numeric digits (0, 1, ..., 9), or underscores. Characters such as dashes and spaces are not allowed.

• It’s a good idea not to start variable names with an underscore, because special system variables are named that way.

• Data set names follow rules similar to those for variable names, but they have a different name space.

• There are virtually no reserved keywords in SAS; it’s very good at figuring things out by context.

• SAS is not case sensitive, except inside quoted strings.

• Missing values are handled consistently in SAS; a missing numeric value is represented by a period (.).

• Each statement in SAS must end in a semicolon (;).
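The naming, case, and missing-value rules can be tried out with a small made-up data set. This is a sketch only: the data set scores and all of its variables and values are hypothetical.

```sas
data scores;
    /* Valid names: letters, digits, and underscores,
       starting with a letter (or underscore) */
    input student_id Test_1 test_2;

    /* Names are not case sensitive: TEST_1 and Test_1 are the same variable */
    diff  = test_2 - TEST_1;       /* missing (.) when either score is missing */
    total = sum(Test_1, test_2);   /* the SUM function skips missing arguments */

    /* Case does matter inside quoted strings */
    length grade $ 4;
    if total > 150 then grade = 'High';
    else grade = 'low';

    /* In the data lines, a period stands for a missing numeric value */
    datalines;
1 80 90
2 . 75
3 60 .
;
run;

proc print data=scores;
run;
```

A name such as total-score would not work here, because SAS would read the dash as a minus sign between two variables.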

• To make your programs more understandable, you can insert comments into your programs.

• Comments are usually used to annotate the program, making it easier for someone to read your program and understand what you have done and why.

• It doesn’t matter what you put in your comments; SAS will not look at it.

• There are two styles of comments you can use: one starts with an asterisk (*) and ends with a semicolon (;). The other style starts with a slash asterisk (/*) and ends with an asterisk slash (*/).
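Both comment styles are shown in the sketch below; the data set squares is made up for illustration. One practical caveat worth keeping in mind: an asterisk-style comment ends at the first semicolon, so it cannot itself contain one, while the slash-asterisk style can.

```sas
* Statement-style comment: it ends at the first semicolon;

/* Block-style comment: it may span several lines
   and may contain semicolons; SAS skips all of it. */

data squares;              /* a block comment can follow a statement */
    do n = 1 to 5;
        sq = n**2;         * a statement-style comment on the same line;
        output;
    end;
run;

proc print data=squares noobs;
run;
```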

Getting Help

• The bulk of SAS documentation is available online at
http://support.sas.com/documentation/onlinedoc/

• A catalog of printed documentation available from SAS can be found at
http://support.sas.com/publishing/

• Online help: Type help in the SAS display manager input window.

• Sample programs, distributed with SAS on all platforms.

• SAS Institute Home Page:
http://www.sas.com

• SAS Institute Technical Support:
http://support.sas.com/resources/

• Searchable index to SAS-L, the SAS mailing list:
http://www.listserv.uga.edu/archives/sas-l.html

• Michael Friendly’s Guide to SAS Resources on the Internet:
http://www.math.yorku.ca/SCS/StatResource.html#SAS

• Brian Yandell’s Introduction to SAS:
http://www.stat.wisc.edu/~yandell/software/sas/intro.html

Two Parts of a SAS Program

• There are two main components to most SAS programs:
– DATA steps: create SAS data sets by reading in, manipulating, and editing data.
– PROC steps: process SAS data sets (creating reports and graphs, editing data, sorting data, etc.) and can also create data sets.

• A typical program starts with a DATA step to create a SAS data set and then passes the data to a PROC step for processing.

• For example: raw data and/or a pre-existing SAS data set are read into a SAS DATA step and turned into a SAS data set, which is then altered or analyzed by a PROC step, and the results are displayed in a report.
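A sketch of that flow, using invented data: a DATA step builds the data set trial, one PROC step analyzes it and creates a second data set, and another PROC step prints the result. All data set names, variable names, and values here are hypothetical.

```sas
/* DATA step: read raw data lines into a SAS data set */
data trial;
    input subject group $ response;
    datalines;
1 A 12.1
2 A 10.4
3 B 15.8
4 B 14.9
;
run;

/* PROC step: analyze the data set and also create a new one */
proc means data=trial noprint;
    class group;
    var response;
    output out=group_means mean=mean_response;
run;

/* PROC step: display the new data set as a report */
proc print data=group_means;
run;
```

The printed data set also contains the variables _TYPE_ and _FREQ_, which PROC MEANS adds automatically; they are an instance of the underscore-named system variables mentioned earlier.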

DATA steps: Getting data into SAS

There are three ways of getting data into a SAS data set.

1. Include the data in the SAS command stream.

– The data are like a card deck placed into the stream of SAS commands.

– Use an INPUT statement to list the variables and a CARDS statement right before the data to be read in. A line containing only a semicolon marks the end of the data.

– Example:

    DATA CARDSIN;
    INPUT IDNUM SEX AGE;
    CARDS;
    1 1 25
    2 2 33
    4 1 55
    ;
    RUN;

2. Read the data in from a disk file.

– Use the INFILE statement to name the file on disk that holds the data.

– Then use the INPUT statement to list the variables.

– Example:

    DATA DISKIN;
    INFILE 'RAWDATA.DAT';
    INPUT IDNUM SEX AGE;
    RUN;

3. Create a new data set from an existing SAS data set.

– Here, the SET statement is used to name the existing SAS data set.

– Example: create two new SAS data sets from an existing SAS data set:

    DATA FATHERS MOTHERS;
    SET DISKIN;
    IF SEX=1 THEN OUTPUT FATHERS;
    ELSE OUTPUT MOTHERS;
    RUN;

PROC steps: Data Management

• PROC SORT
Sorts a data set by one or more variables. PROC SORT; BY ID; will sort the data set by the values of the variable ID.

• PROC CONTENTS
Displays the contents of the data set: its variable names, types, lengths, and other attributes rather than the data values.

• PROC DATASETS
Manages SAS data set libraries.

• PROC RANK
Rank orders one or more variables.

• PROC STANDARD
Rescales variables to a specified mean and/or standard deviation.

• PROC SCORE
Generates linear scores for certain procedures like factor analysis and discriminant analysis.

• PROC TRANSPOSE
Transposes a data set.
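Several of the data management procedures can be chained, each writing a new data set with OUT=. The sketch below assumes the SASHELP.CLASS sample data set that ships with standard SAS installations; the output data set names and the variable height_rank are made up for illustration.

```sas
/* Sort into a new data set so SASHELP.CLASS is left untouched */
proc sort data=sashelp.class out=class_sorted;
    by sex descending height;
run;

/* Variable names, types, and lengths (not the data values) */
proc contents data=class_sorted;
run;

/* Rank heights within each sex; the ranks go into a new variable */
proc rank data=class_sorted out=class_ranked descending;
    by sex;
    var height;
    ranks height_rank;
run;

/* Rescale weight to mean 0 and standard deviation 1 */
proc standard data=class_ranked out=class_std mean=0 std=1;
    var weight;
run;

/* Turn the height and weight columns into rows, one column per student */
proc transpose data=class_std out=class_wide;
    id name;
    var height weight;
run;
```

The BY statement in PROC RANK relies on the data already being sorted by sex, which is why PROC SORT comes first.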

PROC steps: Descriptive Statistics

• PROC FREQ
Simple frequencies and contingency tables for categorical variables.

• PROC MEANS
Number of observations, mean, standard deviation, and minimum and maximum values for continuous variables.

• PROC UNIVARIATE
More detailed descriptive statistics for continuous variables.

• PROC TABULATE
Produces tables of frequencies and/or descriptive statistics.

• PROC SUMMARY
Descriptive statistics broken down by groups; particularly useful for generating a data set of descriptive statistics for input into other procedures.

• PROC CORR
Parametric and nonparametric correlations.
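As a rough illustration of how these procedures are called, the sketch below runs most of them on the SASHELP.CLASS sample data set, which is assumed to be installed. The output data set height_by_sex and the variable mean_height are hypothetical names.

```sas
/* One-way frequencies for sex and a two-way table of age by sex */
proc freq data=sashelp.class;
    tables sex age*sex;
run;

/* N, mean, standard deviation, minimum, and maximum */
proc means data=sashelp.class n mean std min max;
    var height weight;
run;

/* More detail: quantiles, extreme values, and related statistics */
proc univariate data=sashelp.class;
    var height;
run;

/* Group means written to a data set for use by later steps */
proc summary data=sashelp.class nway;
    class sex;
    var height;
    output out=height_by_sex mean=mean_height;
run;

/* Pearson (parametric) and Spearman (nonparametric) correlations */
proc corr data=sashelp.class pearson spearman;
    var age height weight;
run;
```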

PROC steps: Regression

• PROC REG
General purpose linear regression and multivariate regression.

• PROC GLM
General linear models, including regression, analysis of variance/covariance, and multivariate analysis of variance/covariance.

• PROC RSQUARE
All-possible-subsets regression.

• PROC RSREG
Quadratic response surface regression.

• PROC LOGISTIC
Logistic regression.

• PROC PROBIT
Probit regression.
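A minimal sketch of two of these procedures, again assuming the SASHELP.CLASS sample data set is available. The choice of weight as the response and height, age, and sex as predictors is arbitrary and only for illustration.

```sas
/* Linear regression of weight on height and age */
proc reg data=sashelp.class;
    model weight = height age;
run;
quit;

/* General linear model: a classification effect plus a covariate */
proc glm data=sashelp.class;
    class sex;
    model weight = sex height;
run;
quit;
```

PROC REG and PROC GLM are interactive procedures, so a QUIT statement is used to end each one.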

PROC steps: ANOVA, Graphics

• Analysis of Variance

– PROC ANOVA
Analysis of variance for orthogonal data.

– PROC GLM
General linear models, including regression, analysis of variance, and multivariate analysis of variance.

– PROC NESTED
Nested analysis of variance.

– PROC VARCOMP
Variance components.

• Low Resolution Graphics

– PROC CHART
Pie, bar, and star charts.

– PROC PLOT
Two-dimensional plots.
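The sketch below shows a one-way analysis of variance and the two low-resolution graphics procedures, assuming the SASHELP.CLASS sample data set is available. The variables chosen are only for illustration.

```sas
/* One-way analysis of variance: does mean height differ by sex? */
proc anova data=sashelp.class;
    class sex;
    model height = sex;
run;
quit;

/* Text-based vertical bar chart with one bar per distinct age */
proc chart data=sashelp.class;
    vbar age / discrete;
run;

/* Text-based scatter plot of weight against height */
proc plot data=sashelp.class;
    plot weight*height;
run;
quit;
```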

PROC steps: Multivariate Analysis

• Discriminant Analysis

– PROC DISCRIM
General purpose parametric and nonparametric discriminant analysis.

– PROC CANDISC
Canonical discriminant analysis.

• Principal Components and Factor Analysis

– PROC PRINCOMP
Principal components.

– PROC FACTOR
Factor analysis.

• Cluster Analysis

– PROC CLUSTER
Clustering observations.

– PROC FASTCLUS
Disjoint clustering for large data sets.

– PROC VARCLUS
Clustering variables.

• Survival Analysis

– PROC LIFETEST
Nonparametric survival estimates and life tables.

– PROC LIFEREG
Parametric survival analysis.
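Two of the multivariate procedures are sketched below on the SASHELP.CLASS sample data set, which is assumed to be available. The output data set names and the choice of two clusters are arbitrary and only for illustration.

```sas
/* Principal components of three numeric variables;
   the component scores are saved in class_pc */
proc princomp data=sashelp.class out=class_pc;
    var age height weight;
run;

/* Disjoint clustering of the students into two clusters;
   cluster membership is saved in class_clusters */
proc fastclus data=sashelp.class maxclusters=2 out=class_clusters;
    var height weight;
run;
```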
