Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of...

51
Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida

description

Scenario Creating new variables Eliminating variables Writing detail report Writing summary report Subsetting report

Transcript of Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of...

Page 1: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Chapter 6 Concatenating SAS Data Sets and

Creating Summary Reports

Xiaogang Su

Department of Statistics

University of Central Florida

Page 2: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Section 6.1 Introduction

For P&E Web Site, University of Central Florida needs to

• combine three data sets to create a detailed three year report

• report entrance age and SAT score by semester and year

• subset the data to create a report for the fall semester.

Page 3: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

• Creating new variables• Eliminating variables• Writing detail report• Writing summary report• Subsetting report

Page 4: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Section 6.2 Concatenating SAS Data Set

• Objectives

• Define concatenation.

• Use the SET statement in a DATA step to concatenate two SAS data sets.

• Use the INT function, the SUBSTR function and the SUM function in an assignment statement to create new variables.

• Use the KEEP= option (or DROP = option) to write only selected variables to the output data set.

• Use the REPORT procedure to create a detail report.

Page 5: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

For its 2000-01 academic year report, University of Central Florida must include the enrolment data for all three semester.

Page 6: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

It will be necessary to

• combine the two SAS data sets

• create new variables based on calculated values

• eliminate unwanted variables from the newly created data set

• create a presentation-quality list report.

Page 7: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

Sales and Passenger Data for 1999 Sales Total TotalDate Flight Passengers Revenue_____________________________________________________

01JAN1999 IA00100 207 $123,948.0002JAN1999 IA00100 166 $117,695.0003JAN1999 IA00201 186 $127,197.0004JAN1999 IA00300 186 $140,170.0005JAN1999 IA00400 194 $141,826.0006JAN1999 IA00502 230 $15,110.0007JAN1999 IA00504 237 $15,516.0008JAN1999 IA00500 227 $14,700.0009JAN1999 IA00502 253 $16,326.0010JAN1999 IA00604 267 $15,570.00

Page 8: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Concatenating SAS Data Sets

A concatenation

combines two or more SAS data sets one after the other, into a single SAS data set

uses the SET statement

is one of several methods for combining SAS data sets.

Page 9: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Concatenating SAS Data Sets

Use the SET statement in a DATA step to concatenate SAS data sets.

General form of a DATA step concatenation:

DATA SAS-data-set; SET SAS-data-set-1 SAS-data-set-2;RUN;

Page 10: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Concatenating SAS Data Sets

Example program to concatenate several SAS data sets:

data sta4102.ch6data;

set sta4102.summer2000

sta4102.fall2000

sta4102.SPRING2001;

More SAS Statements;

run;

Page 11: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Concatenating Two SAS Data SetsConcatenate STA4102.summer2000 and STA4102.fall2000 :

Summer FallSTA4102.summer2000Date FlightID .. .. TotPassCap

STA4102.fall2000Date FlightID .. .. TotPassCap

STA4102.ch6data

Summer

Fall

Date FlightID .. .. TotPassCap

Page 12: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Variables in STA4102.CH6DATA

-----Alphabetic List of Variables and Attributes-----

# Variable Type Len Pos Format Informat Labelƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 3 ACT_SCORE Char 2 7 $2. $2. ACT_SCORE 4 BIRTH_YEAR Char 4 9 $4. $4. BIRTH_YEAR 5 CLASSIF_LEVEL Char 1 13 $1. $1. CLASSIF_LEVEL 6 COUNTY_RESIDENCY Char 4 14 $4. $4. COUNTY_RESIDENCY 7 ETHNIC_ORIGIN Char 1 18 $1. $1. ETHNIC_ORIGIN 8 EXCEPTION_FLAG Char 1 19 $1. $1. EXCEPTION_FLAG 9 FINAL_ADM_ACTION Char 1 20 $1. $1. FINAL_ADM_ACTION10 QUANT_SCORE Char 3 21 $3. $3. QUANT_SCORE11 REG_CODE Char 1 24 $1. $1. REG_CODE12 SEX Char 1 25 $1. $1. SEX 2 STU_TYPE_APPL Char 1 6 $1. $1. STU_TYPE_APPL 1 TERM_CYM Char 6 0 $6. $6. TERM_CYM13 TEST_INDICATOR Char 1 26 $1. $1. TEST_INDICATOR14 UCF_COLLEGE Char 2 27 $2. $2. UCF_COLLEGE15 UCF_ENTRY_CLASIF Char 2 29 $2. $2. UCF_ENTRY_CLASIF16 UCF_FULL_PART_TIM Char 1 31 $1. $1. UCF_FULL_PART_TIM17 UCF_MAJOR Char 6 32 $6. $6. UCF_MAJOR18 VERBAL_OTHER_SCR Char 3 38 $3. $3. VERBAL_OTHER_SCR

Page 13: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Use the SET statement in a DATA step to perform the concatenation. Name the new data set STA4102.ch6data.

data sta4102.ch6data;

set sta4102.summer2000

sta4102.fall2000

sta4102.SPRING2001;

More SAS Statements;

run;

Concatenating Three SAS Data Sets

Page 14: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Using the SUBSTR Function in an Assignment Statement

Use the SUBSTR function in an assignment statement to return a value for the year. Assign those values to the new variable YEAR.

data sta4102.ch6data; set sta4102.summer2000

sta4102.fall2000

sta4102.SPRING2001;

year=substr(term_cym,1,4);More SAS Statements;

run;

Page 15: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Use the SUM function in an assignment statement to sum the values of the Verbal and the Quantitative portion to create the new variable SAT.

data sta4102.ch6data;

set sta4102.summer2000

sta4102.fall2000

sta4102.SPRING2001;

sat=sum(verbal,quant);

More SAS Statements;

run;

Using the SUM Function in an Assignment Statement

Page 16: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

data sta4102.ch6data;

set sta4102.summer2000

sta4102.fall2000

sta4102.SPRING2001;

verbal=int(substr(verbal_other_scr,1,3));

if verbal=999 then verbal=.;

More SAS Statements;

run;

Using the INT and SUBSTR Function in an Assignment Statement

Page 17: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Eliminating Variables with the DROP ststement

Use the DROP statement to drop the unwanted variables.

drop a b term_cym stu_type_Appl act_Score Birth_year Classif_level County_Residency ethnic_origin Exception_Flag Final_adm_action quant_score reg_code sex test_indicator ucf_college verbal_other_scr ucf_entry_clasif ucf_major ucf_full_part_tim;

Page 18: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Adding Options to Enhance Report Appearance

Selected PROC REPORT options:

HEADLINE underlines all column headers and the spaces between them.

HEADSKIP writes a blank line beneath allcolumn headers.

Page 19: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Write a PROC REPORT step. Use the

• HEADLINE and HEADSKIP options to underline the column headers and insert a blank line

• COLUMN statement to specify which variables to include.

proc report data=sta4102.ch6data nowd headline headskip;

column year semester gender sat ;

More SAS Statements;run;

Writing a PROC REPORT Step

Page 20: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Writing a PROC REPORT Step

Use DEFINE statements to define the variables as display variables. • Add column headers and specify column width.• Add formats and specify alignment. Add titles.

proc report data=sta4102.ch6data nowd headline headskip; column year semester gender sat; define year / display center 'Year'; define semester / display center 'Semester'; define gender / display center 'Gender' ; define sat / display center 'Average SAT Score';

title 'Academic 2000-2001 Applicants';run;

Page 21: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Concatenating Two SAS Data Sets and Reporting the ResultsFile: pg1-ch6-ex02.sas

Concatenate two SAS data sets and use SAS functions in assignment statements to create new variables. Use the DROP statement to eliminate unwanted variables. Use a PROC REPORT step to create a detail report.

Page 22: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Section 6.3 Creating a Summary Report with Report Procedure

Objectives• Explain the GROUP option used in the DEFINE

statement in a PROC REPORT step.

• Explain the MEAN option used in the DEFINE statement in a PROC REPORT step.

• Use the MEAN and GROUP options and other PROC REPORT features to create a summary report of presentation quality.

Page 23: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

Some faculty claim that the SAT score is lower for the applicants in Summer semester.

Page 24: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

The Goal Report

Academic 2000-2001 Applicants

Average Year Semester Gender SAT Score ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

2000 Fall Female 977.51 Male 1026.85 Not Report 960.83 Summer Female 923.87 Male 973.67 Not Report 730.00 2001 Spring Female 886.23 Male 936.86 Not Report 635.00

Page 25: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Defining Group Variables

Use the REPORT procedure to create a summary report by defining variables as group variables.

All observations whose group variables have the same values are collapsed into a single observation in the report.

Page 26: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Defining Group Variables

Semester as Group VariableSemester as Group Variable

Before GroupingSemester Avg. SAT

Fall 1080Fall 1260Summer 1210Summer 850

After GroupingSemester Avg. SAT

Fall 977.51Spring 886.23Summer 923.87

Page 27: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Defining Group Variables

You can define more than one variable as a group variable, but all group variables must precede other types of variables.

Nesting is determined by the order of the variables in the COLUMN statement.

Page 28: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Selecting the Variables

Use the COLUMN statement in a PROC REPORT step to specify the variables to be included in the report.

proc report data=sta4102.ch6data nowd headline headskip;

column semester sat;

other SAS statements

run;

Page 29: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Defining the Variable Attributes

Use the DEFINE statement to define Semester as a group variable and assign it a column header.

proc report data=sta4102.ch6data nowd headline headskip;

column semester sat;

define semester / group order=internal 'Semester‘;

other SAS statements

run;

Page 30: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Use the DEFINE statement to define SAT as mean variables and assign their attributes.

proc report data=sta4102.ch6data nowd headline headskip;

column semester sat;

define semester / group order=internal 'Semester‘; define sat / 'Average SAT Score' width=10

format=8.2 mean;

other SAS statements

run;

Defining the Variable Attributes

Page 31: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Use the HEADLINE option to underline the column headers and the HEADSKIP option to add a blank line between the column headers. Add titles.

Controlling Report Appearance

Page 32: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Creating a Summary ReportFile: pg1-ch6-ex03.sas

This demonstration illustrates how to create a summary report.

Page 33: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Section 6.4 Subsetting Observations from a SAS Data Set

• Objectives

• Subset data using the WHERE statement with comparison and logical operators.

• Define special operators and explain how they are used in the WHERE statement.

• Define SAS date constants.

• Define the YEARCUTOFF= option.

• Use a WHERE statement with a special operator to subset a SAS table.

Page 34: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

University wants to create a separate report for accepted students and enrolled students.

Page 35: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Scenario

Holiday Sales Report  Sales Total Total Date Flight Seats Passengers Revenue___________________________________________________________

24NOV1999 IA00300 207 186 $143,754.0025NOV1999 IA00301 207 190 $143,070.0026NOV1999 IA00300 207 176 $133,704.0027NOV1999 IA00301 207 194 $145,354.0028NOV1999 IA00300 207 191 $142,983.0029NOV1999 IA00301 207 169 $133,809.0030NOV1999 IA00300 207 193 $142,977.0001DEC1999 IA00301 207 167 $130,231.0002DEC1999 IA00300 207 207 $156,039.0003DEC1999 IA00301 207 190 $146,094.0004DEC1999 IA00300 207 175 $136,143.00

Page 36: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Subsetting Your Data with the WHERE StatementThe WHERE statement enables you to select observations

that meet a certain condition before SAS brings the observation into the PROC REPORT step.

proc report data=sta4102.ch6data nowd headline headskip; where action in ('A', 'P', 'X');

run;

Page 37: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Subsetting Data with the WHERE Statement

General syntax of the WHERE statement:

where-expression is a sequence of operands and operators. Operands include

• variables

• functions

• constants.

WHERE where-expression;

Page 38: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Subsetting Your Data with the WHERE Statement

The WHERE statement can be used with

• comparison operators

• logical operators.

You can also use the WHERE statement with special operators.

Page 39: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Comparison Operators

EQ = equal toNE ^= not equal toGT > greater thanLT < less thanGE >= greater than or equal toLE <= less than or equal toIN equal to one of a list

Mnemonic Symbol Definition

Page 40: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Comparison Operators

Examples:

where Salary>25000;

where EmpID='0082';

where Salary=.;

where LastName=' ';

Page 41: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Comparison Operator

You can use the IN comparison operator to see if a variable is equal to one of the values in a list.

Example:

where action in ('A', 'P', 'X'); Character comparisons are case sensitive.

Page 42: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Logical Operators

Logical operators include

AND if both expressions are true, then the compound expression is true

OR if either expression is true, then the compound expression is true

NOT can be combined with other operators to reverse the logic of a comparison.

Page 43: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Logical Operators

Examples:

where JobCode='FLTAT3' and Salary>50000;

where JobCode='PILOT1' or JobCode='PILOT2' or JobCode='PILOT3';

Page 44: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Special Operators

The following are special operators :

LIKE selects observations by comparing character values to

specified patterns. A percent sign (%) replaces any number

of characters and an underscore (_) replaces one character.

where Code like 'E_U%';

(E, a single character, U, followed by any characters.)

continued...

Page 45: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Special Operators

• The sounds-like (=*) operator selects observations that contain a spelling variation of the word or words specified.

where Name=*'SMITH';(Selects the names Smythe, Smitt, and so on.)

• CONTAINS or ? selects observations that include the specified substring.

where Word ? 'LAM';(BLAME, LAMENT, and BEDLAM are selected.)

continued...

Page 46: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

• IS NULL or IS MISSING selects observations in which the value of the variable is missing.

where Flight is missing;• BETWEEN-AND selects observations in which the value

of the variable falls within a range of values.

where Date between '01mar1999'd and '01apr1999'd;

Special Operators

Page 47: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Using SAS Date Constants

The constant 'ddMMMyyyy'd creates a SAS date value from the dates enclosed in quotes where

dd is a one- or two-digit value for the day

MMM is a three-letter abbreviation for the month (JAN, FEB, MAR, and so on)

yyyy is a two- or four-digit value for the year.

Page 48: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Using the YEARCUTOFF= Option

If you use a two-digit value for year, SAS will use the value of the YEARCUTOFF= system option to determine the complete year value.

YEARCUTOFF= specifies the first year of a 100-year span used to determine the century of a two-digit year.

options yearcutoff=1930;

...1900...1930…1980…2029.2030…2050...

100 year span

Page 49: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Subsetting a Report

Use a WHERE statement with the special operator BETWEEN-AND to create a subset based on dates. proc report data=sta4102.ch6data nowd headline headskip;

column semester sat;

where action in ('A', 'P', 'X') and apptype = '1';

define semester / group order=internal 'Semester' width=8 format=$semfmt.

style(header) = {just=center} style(column) = {just=left};

define sat / 'Average SAT Score' width=10 format=8.2 mean

style(header) = {just=center} style(column) = {just=right};

title 'Accepted FTIC Students SAT Summary Report';

title2 '(2000-2001)';

title3 'Gropu by Semester';

run;

Page 50: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

WHERE or Subsetting IF?

Step and Usage WHEREStatement

IFStatement

PROC step Yes NoDATA step (source of variable)

INPUT statement No YesAssignment statement No Yes

SET statement (single data set) Yes YesSET/MERGE

(multiple data sets)Variable in ALL data sets Yes Yes

Variable not in ALL data sets No Yes

Page 51: Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Subsetting Report Data with a WHERE StatementFile: pg1-ch6-ex04.sas

This demonstration illustrates how to use special operators in a WHERE statement to subset the data based on a range of dates.