How to start using SAS SARBAJIT MUKHERJEE. WHAT IS SAS? SAS stands for Statistical Analysis System....


How to start using SAS

SARBAJIT MUKHERJEE

WHAT IS SAS?

SAS stands for Statistical Analysis System.

SAS is useful for the following types of task:

1. Data entry, retrieval, and management
2. Report writing and graphics
3. Statistical and mathematical analysis

SAS programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets (that is, to generate reports and graphs, edit data, sort data, and analyze data).
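To make the two kinds of step concrete, here is a minimal sketch of a complete program: one DATA step that creates a small temporary data set, followed by two PROC steps that sort and print it. The data set name `scores`, its variables, and the values are hypothetical, made up for illustration; the records are supplied in-stream with DATALINES so no external file is needed.

```sas
/* DATA step: creates a temporary data set named scores
   (hypothetical names and values, for illustration only) */
DATA scores;
   INPUT Name $ Score;
   DATALINES;
Ann 88
Bob 72
Cara 95
;
RUN;

/* PROC step: sorts the data set so the highest Score comes first */
PROC SORT DATA=scores;
   BY DESCENDING Score;
RUN;

/* PROC step: prints a report of the sorted data set */
PROC PRINT DATA=scores;
   TITLE 'Scores, highest first';
RUN;
```

Each step ends with RUN;, and SAS executes the steps in the order they are submitted.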

SAS Data Libraries

A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

A SAS data set is one type of SAS file stored in a data library.

The Work library is a temporary library: when SAS is closed, all the data sets in the Work library are deleted. To create a permanent SAS data set, store it in your own library.
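A minimal sketch of the difference, assuming a hypothetical folder `C:\temp\mylib` that already exists on your machine (the LIBNAME statement that assigns `mylib` is covered under SAS Data Libraries). A one-level data set name goes to the temporary Work library; a two-level name of the form `libref.name` stores the data set in the folder behind that libref, so it survives after SAS is closed.

```sas
/* Hypothetical folder: replace with a folder that exists on your machine */
LIBNAME mylib 'C:\temp\mylib';

/* One-level name: stored in the temporary Work library
   (equivalent to WORK.tempdata); deleted when SAS is closed */
DATA tempdata;
   x = 1;
RUN;

/* Two-level name: stored in the permanent library mylib,
   so it remains in C:\temp\mylib after SAS is closed */
DATA mylib.keepdata;
   x = 1;
RUN;
```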

SAS Data Libraries

Identify SAS data libraries by assigning each one a library reference name (libref) with the LIBNAME statement:

LIBNAME libref 'file-folder-location';

E.g.: LIBNAME readData 'C:\temp\sas class\readData';

Rules for naming a libref:

1. The name must be 8 characters or fewer.
2. The name must begin with a letter or underscore.
3. The remaining characters must be letters, numbers, or underscores.
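A few illustrative librefs checked against these rules. The librefs and folder paths are hypothetical; point them at folders that exist on your machine. The invalid forms are kept inside a comment so the snippet still runs.

```sas
/* Valid librefs: 8 characters or fewer, starting with a letter or underscore */
LIBNAME sales24 'C:\temp\sales';   /* 7 characters, starts with a letter */
LIBNAME _raw    'C:\temp\raw';     /* starts with an underscore */

/* Invalid librefs (commented out, they would not be accepted):
   LIBNAME 2024sales   'C:\temp\sales';   starts with a digit
   LIBNAME salesdata24 'C:\temp\sales';   11 characters, more than 8
   LIBNAME sales-24    'C:\temp\sales';   hyphen is not allowed        */
```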

Reading a raw data file into SAS

To create a SAS data set from a raw data file, you must:

Start a DATA step and name the SAS data set being created (DATA statement)

Identify the location of the raw data file to read (INFILE statement)

Describe how to read the data fields from the raw data file (INPUT statement)

Example 1

Reading raw data separated by spaces

/* Create a permanent SAS data set named HighLow1;
   read the data file temperature1.dat using list input */
DATA readData.HighLow1;
   INFILE 'C:\sas class\readData\temperature1.dat';
   INPUT City $ State $ NormalHigh NormalLow RecordHigh RecordLow;
RUN;

/* The PROC PRINT step creates a listing report of the
   readData.HighLow1 data set */
PROC PRINT DATA = readData.HighLow1;
   TITLE 'High and Low Temperatures for July';
RUN;

temperature1.dat:

Nome AK 55 44 88 29
Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
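If you want to try Example 1 before creating temperature1.dat, one option (an alternative to INFILE, not what the example above uses) is to put the same records in the program itself with DATALINES. This sketch writes to the temporary Work library, so no libref is needed.

```sas
/* Same records as temperature1.dat, supplied in-stream instead of via INFILE.
   One-level name, so the data set goes to the temporary Work library. */
DATA HighLow1;
   INPUT City $ State $ NormalHigh NormalLow RecordHigh RecordLow;
   DATALINES;
Nome AK 55 44 88 29
Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
;
RUN;

PROC PRINT DATA=HighLow1;
   TITLE 'High and Low Temperatures for July';
RUN;
```

List input (values separated by spaces) works here because no city name contains a blank.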

Reading Delimited or PC Database Files with the IMPORT Procedure

If your data file has the proper extension, use the simplest form of the IMPORT procedure:

PROC IMPORT DATAFILE = 'filename' OUT = data-set;

| Type of File | Extension | DBMS Identifier |
|---|---|---|
| Comma-delimited | .csv | CSV |
| Tab-delimited | .txt | TAB |
| Excel | .xls | EXCEL |
| Lotus files | .wk1, .wk3, .wk4 | WK1, WK3, WK4 |
| Delimiters other than commas or tabs | | DLM |

Examples:

1. PROC IMPORT DATAFILE='c:\temp\sale.csv' OUT=readData.money; RUN;

2. PROC IMPORT DATAFILE='c:\temp\bands.xls' OUT=readData.music; RUN;
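When the extension does not identify the file type, or the delimiter is something other than a comma or tab, you can give the DBMS identifier from the table explicitly. A sketch, assuming a hypothetical pipe-delimited file `c:\temp\sale.dat` whose first row holds the variable names, and that the `readData` libref has already been assigned:

```sas
/* Hypothetical pipe-delimited file: the .dat extension does not tell
   SAS the file type, so DBMS= is given explicitly */
PROC IMPORT DATAFILE='c:\temp\sale.dat'
            OUT=readData.money
            DBMS=DLM       /* delimiter other than comma or tab */
            REPLACE;       /* overwrite readData.money if it already exists */
   DELIMITER='|';          /* character that separates the fields */
   GETNAMES=YES;           /* read variable names from the first row */
RUN;
```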

SAS or R?

I think there are several issues (in ascending order of possible validity):

1. Tradition / habit: people are used to SAS and don't want to have to learn something new. (Making it more difficult, the way you think in SAS and R is different.) This can apply to anyone who might have to send you code, or read or use your code, including managers and colleagues.
2. Distrust of freeware: several people say they aren't willing to accept results from R because there is no for-profit company vetting the code to ensure it gives correct results before it goes out to customers, lest they end up losing business.
3. Big data: R performs operations with everything held in memory, whereas SAS doesn't necessarily. Thus, if your data approaches the limits of your memory, R will run into problems.
4. Better documentation: R is getting better at this, but its documentation, especially the official documentation, is often kind of terrible and opaque.

Usage of SAS and other analytics software

Why use SAS?

SAS is very efficient at data manipulation if you know what you're doing. It was designed to work with sequential tapes, so it is built on the assumption that data access is expensive. This works wonders when you work with truly massive data sets.

SAS is good at opening gigantic data sets even on computers that do not have a lot of computing power. Essentially, data sets that would crash most programs on a given computer in a heartbeat can load in SAS.

SAS as a company is smart and designs its products for corporate cost centers. This includes things like company-wide installations, and setting up its platform in a way that makes it easy for corporate IT departments to set up a company-wide SAS infrastructure.

Industry Usage

SAS is really pricey!

Well, there is a solution to that too! SAS provides a free University Edition of its software that runs on a virtual machine.

All the details about the installation are in the documentation.

Why the University Edition?

DEMO

QUESTIONS?

THANK YOU