How to start using SAS
SARBAJIT MUKHERJEE
WHAT IS SAS?
SAS stands for Statistical Analysis System.
Useful for the following types of tasks:
1. Data entry, retrieval, and management
2. Report writing and graphics
3. Statistical and mathematical analysis
SAS programs
A SAS program is a sequence of steps that the user submits for execution.
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, to generate reports and graphs, edit data, sort data, and analyze data).
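To make the two kinds of step concrete, here is a minimal sketch. The data set name work.scores and its three records are hypothetical. The DATA step builds the data set from in-stream lines (DATALINES), and two PROC steps then sort and print it.

```sas
/* DATA step: create a SAS data set from in-stream data */
DATA work.scores;            /* hypothetical data set name */
  INPUT Name $ Score;        /* $ marks Name as a character variable */
  DATALINES;
Ana 91
Ben 78
Cal 85
;
RUN;

/* PROC step 1: sort the data set, highest score first */
PROC SORT DATA=work.scores;
  BY DESCENDING Score;
RUN;

/* PROC step 2: produce a listing report */
PROC PRINT DATA=work.scores;
RUN;
```

Each step ends with RUN; SAS executes the steps in the order they are submitted.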
SAS Data Libraries
A SAS data library is a collection of SAS files that are recognized as a unit by SAS.
A SAS data set is one type of SAS file stored in a data library.
WORK is a temporary library: when SAS is closed, all the data sets in the WORK library are deleted. To create a permanent SAS data set, store it in your own library.
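A minimal sketch of the difference. By default, a one-level data set name is stored in WORK, so the first two DATA steps below write the same temporary data set; a two-level name that uses your own libref (assigned with the LIBNAME statement) makes the data set permanent. The libref mylib and the folder C:\temp\mylib are hypothetical, and the folder is assumed to exist already.

```sas
/* One-level name: stored as WORK.TEMP, deleted when the SAS session ends */
DATA temp;
  x = 1;
RUN;

/* Equivalent two-level name for the same temporary data set */
DATA work.temp;
  x = 1;
RUN;

/* Two-level name with a user-defined libref: kept after SAS closes */
LIBNAME mylib 'C:\temp\mylib';   /* hypothetical folder; must already exist */
DATA mylib.perm;
  SET work.temp;
RUN;
```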
Identify SAS data libraries by assigning each one a library reference name (libref) with the LIBNAME statement:
LIBNAME libref 'file-folder-location';
Example: LIBNAME readData 'C:\temp\sas class\readData';
Rules for naming a libref:
1. The name must be 8 characters or less.
2. The name must begin with a letter or underscore.
3. The remaining characters must be letters, numbers, or underscores.
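A short sketch of the naming rules in practice. The readData path is the one from the LIBNAME example; the _proj1 libref and its folder are hypothetical. The LIBREF function returns 0 when a libref is currently assigned, which is a quick way to confirm that a LIBNAME statement worked.

```sas
/* Valid librefs: 8 characters or fewer, first character a letter or underscore */
LIBNAME readData 'C:\temp\sas class\readData';
LIBNAME _proj1 'C:\temp\project1';               /* hypothetical folder */

/* Invalid librefs, shown only as comments because SAS would reject them:
     sasclass2   (9 characters, too long)
     1data       (begins with a digit)
     my-lib      (contains a hyphen)                                     */

/* Writes 0 to the log if READDATA is assigned, a nonzero value otherwise */
%PUT readData check: %SYSFUNC(LIBREF(readData));
```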
Reading raw data set into SAS system
In order to create a SAS data set from a raw data file, you must:
1. Start a DATA step and name the SAS data set being created (DATA statement).
2. Identify the location of the raw data file to read (INFILE statement).
3. Describe how to read the data fields from the raw data file (INPUT statement).
Example 1
Reading raw data separated by spaces
/* Create a permanent SAS data set named HighLow1;
   read the data file temperature1.dat using list input */
DATA readData.HighLow1;
  INFILE 'C:\sas class\readData\temperature1.dat';
  INPUT City $ State $ NormalHigh NormalLow RecordHigh RecordLow;
RUN;

/* The PROC PRINT step creates a listing report of the
   readData.HighLow1 data set */
PROC PRINT DATA = readData.HighLow1;
  TITLE 'High and Low Temperatures for July';
RUN;
temperature1.dat:
Nome AK 55 44 88 29
Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
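If you want to try Example 1 without creating temperature1.dat or a permanent library first, a sketch like the following reads the same three records from in-stream data (DATALINES) into the temporary WORK library. Storing the result as work.HighLow1 is an assumption for illustration; the INPUT statement is the same list input as above.

```sas
/* Same list input as Example 1, but the raw data is typed in-stream */
DATA work.HighLow1;
  INPUT City $ State $ NormalHigh NormalLow RecordHigh RecordLow;
  DATALINES;
Nome AK 55 44 88 29
Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
;
RUN;

PROC PRINT DATA=work.HighLow1;
  TITLE 'High and Low Temperatures for July';
RUN;
```

List input needs at least one space between fields and, by default, reads character values of up to 8 characters, which is enough for these city names.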
Reading Delimited or PC Database Files with the IMPORT Procedure
If your data file has one of the extensions listed below, SAS can tell the file type from the extension, so you can use the simplest form of the IMPORT procedure:
PROC IMPORT DATAFILE = 'filename' OUT = data-set;
RUN;
Type of file                           Extension          DBMS identifier
Comma-delimited                        .csv               CSV
Tab-delimited                          .txt               TAB
Excel                                  .xls               EXCEL
Lotus files                            .wk1, .wk3, .wk4   WK1, WK3, WK4
Delimiters other than commas or tabs                      DLM
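For the DLM case the extension cannot tell SAS which character separates the fields, so you name the DBMS explicitly and add a DELIMITER statement. A minimal sketch, assuming a pipe-delimited file; to keep it self-contained it first writes a small hypothetical file (sales.txt) into the WORK folder.

```sas
/* Hypothetical pipe-delimited file, written to the WORK folder for the demo */
%LET demofile = %SYSFUNC(PATHNAME(WORK))/sales.txt;

DATA _NULL_;
  FILE "&demofile";
  PUT 'Region|Amount';
  PUT 'East|120';
  PUT 'West|95';
RUN;

/* DBMS=DLM plus a DELIMITER statement handles a non-comma, non-tab delimiter */
PROC IMPORT DATAFILE="&demofile"
            OUT=work.sales
            DBMS=DLM
            REPLACE;             /* overwrite work.sales if it already exists */
  DELIMITER='|';
  GETNAMES=YES;                  /* take variable names from the first row */
RUN;
```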
Examples:
1. PROC IMPORT DATAFILE='c:\temp\sale.csv' OUT=readData.money; RUN;
2. PROC IMPORT DATAFILE='c:\temp\bands.xls' OUT=readData.music; RUN;
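Because PROC IMPORT guesses variable names and types, it is worth checking what it produced. A self-contained sketch of the simplest form followed by a quick inspection; it first writes a small hypothetical CSV file (sale_demo.csv) into the WORK folder, then relies on the .csv extension instead of a DBMS= option.

```sas
/* Hypothetical CSV file, written to the WORK folder for the demo */
%LET csvfile = %SYSFUNC(PATHNAME(WORK))/sale_demo.csv;

DATA _NULL_;
  FILE "&csvfile";
  PUT 'Item,Price';
  PUT 'Pen,1.5';
  PUT 'Notebook,3.25';
RUN;

/* Simplest form: the file type is inferred from the .csv extension */
PROC IMPORT DATAFILE="&csvfile" OUT=work.sale_demo;
RUN;

/* List the variables and the types PROC IMPORT assigned */
PROC CONTENTS DATA=work.sale_demo;
RUN;

/* Look at the first few rows */
PROC PRINT DATA=work.sale_demo (OBS=5);
RUN;
```

PROC IMPORT does not overwrite an existing data set by default, so add the REPLACE option if you rerun the step.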
SAS or R?
I think there are several issues (in ascending order of possible validity):
1. Tradition / habit: people are used to SAS and don't want to learn something new. (Making it more difficult, the way you think in SAS and R is different.) This can apply to anyone who might have to send you code or read / use your code, including managers and colleagues.
2. Distrust of freeware: Several people say they aren't willing to accept results from R because there is no for-profit company vetting the code to ensure it gives correct results before it goes out to customers, lest they end up losing business.
3. Big data: R performs operations with everything in memory, whereas SAS doesn't necessarily. Thus, if your data approaches the limits of your memory, there will be problems in R.
4. Better documentation: R is getting better at this, but its documentation, especially the official documentation, is often kind of terrible and opaque.
Usage of SAS and other analytics software
Why use SAS?
SAS is very efficient at data manipulation if you know what you're doing. It was designed to work with sequential tape storage, so it is built on the assumption that data access is expensive. This works wonders when you work with truly massive data sets.
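The sequential design is visible in how a DATA step runs: it loops over its input, holding one observation at a time in the program data vector. A minimal sketch using the SASHELP.CLASS sample data set that ships with SAS; the sum statement keeps a running total across iterations without any explicit loop.

```sas
/* The DATA step reads SASHELP.CLASS one observation per iteration */
DATA work.running;
  SET sashelp.class;
  /* Sum statement: TotalWeight starts at 0, is retained across
     iterations, and a missing Weight is treated as 0 */
  TotalWeight + Weight;
RUN;

PROC PRINT DATA=work.running;
  VAR Name Weight TotalWeight;
RUN;
```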
SAS is good at opening gigantic data sets even on computers that do not have a lot of computing power. Data sets that would crash most programs on a given computer in a heartbeat can load in SAS.
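With very large data sets it also helps to pass only the columns and rows you need into the DATA step, using the KEEP= and WHERE= data set options. A sketch, again on SASHELP.CLASS; how much this saves on a particular machine depends on the data and the engine, so treat it as an illustration of the idea rather than a benchmark.

```sas
/* Only Name, Age and Height, and only rows with Age >= 13,
   reach the DATA step */
DATA work.teens;
  SET sashelp.class (KEEP=Name Age Height
                     WHERE=(Age >= 13));
RUN;
```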
SAS as a company is smart and designs its products for corporate cost centers. This includes things like company-wide installations and setting up its platform in a way that makes it easy for corporate IT departments to set up a company-wide SAS infrastructure.
Industry Usage
SAS is really pricey!
Well, there is a solution to that too! SAS provides a free University Edition of its software that runs on a virtual machine.
All the details about the installation are in the documentation.
Why the University Edition?
DEMO
QUESTIONS?
THANK YOU