Creating Summary Data Sets Ron Cody, Ed.D. Robert Wood Johnson Medical School.

Test data set (CLINIC)

SUBJECT  GENDER  AGE_GROUP  BLOOD_TYPE   HR  SBP  DBP
   1       M         1          A        80  130   80
   2       M         1          B        68  128   70
   3       M         2          O         .  120   72
   4       M         1          A        48  140   86
   5       F         2          A        56  160   94
   6       F         1          B        60  109   64
   7       F         2          O        82  118   70
   8       F         2          O        64    .   76
   9       F         1          A        56    .   88
  10       F         1          B        88  188  110
  11       M         1          B        64  120   80
  12       M         2          B        62  120   76
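
To follow along, the listing above can be read into a SAS data set with a short DATA step. This is a minimal sketch, not necessarily how the original CLINIC data set was built. It assumes GENDER and BLOOD_TYPE are character variables and AGE_GROUP is numeric (the later listings show a missing AGE_GROUP as a period, which suggests a numeric variable).

```sas
/* Sketch: build the CLINIC test data set from the listing above.  */
/* Assumption: GENDER and BLOOD_TYPE are character, the rest are   */
/* numeric. A period in the data lines is read as a missing value. */
DATA CLINIC;
   INPUT SUBJECT GENDER $ AGE_GROUP BLOOD_TYPE $ HR SBP DBP;
DATALINES;
1 M 1 A 80 130 80
2 M 1 B 68 128 70
3 M 2 O . 120 72
4 M 1 A 48 140 86
5 F 2 A 56 160 94
6 F 1 B 60 109 64
7 F 2 O 82 118 70
8 F 2 O 64 . 76
9 F 1 A 56 . 88
10 F 1 B 88 188 110
11 M 1 B 64 120 80
12 M 2 B 62 120 76
;
RUN;
```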

PROC MEANS vs. PROC SUMMARY

PROC MEANS DATA=data_set_name NOPRINT;

is equivalent to

PROC SUMMARY DATA=data_set_name;

Creating a SUMMARY Data Set Containing MEANS

PROC MEANS DATA=CLINIC NOPRINT;
/****************************************
Equivalent to PROC SUMMARY DATA=CLINIC;
*****************************************/
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
 1             0       12    66.1818  133.300  80.5000
 2     F       1        6    67.6667  143.750  83.6667
 3     M       1        6    64.4000  126.333  77.3333

Using a BY Statement Instead of a CLASS Statement

PROC SORT DATA=CLINIC;
   BY GENDER;
RUN;

PROC MEANS DATA=CLINIC NOPRINT;
   BY GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
 1     F       0        6    67.6667  143.750  83.6667
 2     M       0        6    64.4000  126.333  77.3333

Creating a SUMMARY Data Set Containing MEANS Broken Down by GENDER and AGE_GROUP

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
            .        0       12    66.1818  133.300  80.5000
            1        1        7    66.2857  135.833  82.5714
            2        1        5    66.0000  129.500  77.6000
  F         .        2        6    67.6667  143.750  83.6667
  M         .        2        6    64.4000  126.333  77.3333
  F         1        3        3    68.0000  148.500  87.3333
  F         2        3        3    67.3333  139.000  80.0000
  M         1        3        4    65.0000  129.500  79.0000
  M         2        3        2    62.0000  120.000  74.0000

Explaining the _TYPE_ Variable

CLASS GENDER AGE_GROUP;

  Class Variables       Representation
GENDER  AGE_GROUP      Binary  Decimal
  0         0            00       0
  0         1            01       1
  1         0            10       2
  1         1            11       3
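
One practical use of the numeric _TYPE_ values is to pick out a single level of the breakdown with a WHERE statement. The sketch below assumes OUT1 was created with CLASS GENDER AGE_GROUP and without the CHARTYPE option, so that _TYPE_ is numeric.

```sas
/* _TYPE_ = 2 is binary 10: GENDER is used, AGE_GROUP is not, */
/* so only the two by-gender rows are printed.                */
PROC PRINT DATA=OUT1 NOOBS;
   WHERE _TYPE_ = 2;
RUN;

/* _TYPE_ = 3 is binary 11: both class variables are used, */
/* which gives the full GENDER by AGE_GROUP breakdown.     */
PROC PRINT DATA=OUT1 NOOBS;
   WHERE _TYPE_ = 3;
RUN;
```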

Demonstrating the NWAY Option

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_   M_HR     M_SBP   M_DBP
  F         1        3        3    68.0000   148.5  87.3333
  F         2        3        3    67.3333   139.0  80.0000
  M         1        3        4    65.0000   129.5  79.0000
  M         2        3        2    62.0000   120.0  74.0000

Outputting More than One Statistic

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   = M_HR M_SBP M_DBP
                   N      = N_HR N_SBP N_DBP
                   MAX    = MAX_HR MAX_SBP MAX_DBP
                   MEDIAN = MED_HR MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP   N_HR  N_SBP
          0       12    66.1818  133.300  80.5000   11     10
  F       1        6    67.6667  143.750  83.6667    6      4
  M       1        6    64.4000  126.333  77.3333    5      6

N_DBP  MAX_HR  MAX_SBP  MAX_DBP  MED_HR  MED_SBP  MED_DBP
  12     88      188      110      64      124       78
   6     88      188      110      62      139       82
   6     80      140       86      64      124       78

Partial List of Some Available Statistics

Keyword   Description
________________________________
MEAN      Mean
N         Number of non-missing values
NMISS     Number of missing values
MIN       Smallest non-missing value
MAX       Largest value
MEDIAN    Median
RANGE     Range - difference between the minimum and maximum values
Q1        25th percentile
Q3        75th percentile
QRANGE    Interquartile range (difference between 25th and 75th percentile)
STD       Standard deviation
STDERR    Standard error
UCLM      Upper bound of the 95% confidence interval
LCLM      Lower bound of the 95% confidence interval
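
As an illustration of how these keywords are used, the sketch below requests several of them for HR in a single OUTPUT statement. The output data set name HR_STATS and the variable names to the right of each equal sign are made up for this example.

```sas
/* Sketch: several of the listed statistics for HR, by GENDER. */
/* HR_STATS and the output variable names are illustrative.    */
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER;
   VAR HR;
   OUTPUT OUT=HR_STATS
      N      = N_HR
      NMISS  = NMISS_HR
      MEAN   = M_HR
      STD    = SD_HR
      STDERR = SE_HR
      LCLM   = LOWER_HR
      UCLM   = UPPER_HR;
RUN;
```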

Demonstrating the AUTONAME OUTPUT Option

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN   =
                   N      =
                   MAX    =
                   MEDIAN = / AUTONAME;
RUN;

GENDER  _TYPE_  _FREQ_  HR_Mean  SBP_Mean  DBP_Mean  HR_N  SBP_N
          0       12    66.1818  133.300   80.5000    11     10
  F       1        6    67.6667  143.750   83.6667     6      4
  M       1        6    64.4000  126.333   77.3333     5      6

DBP_N  HR_Max  SBP_Max  DBP_Max  HR_Median  SBP_Median  DBP_Median
  12     88      188      110       64         124          78
   6     88      188      110       62         139          82
   6     80      140       86       64         124          78

Another Way of Naming Output Variables

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP  _TYPE_  _FREQ_     HR      SBP     DBP
  F         1        3        3    68.0000   148.5  87.3333
  F         2        3        3    67.3333   139.0  80.0000
  M         1        3        4    65.0000   129.5  79.0000
  M         2        3        2    62.0000   120.0  74.0000

Dropping Unneeded Variables in the Output Data Set

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1(DROP=_:) MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP   M_HR     M_SBP   M_DBP
  F         1      68.0000   148.5  87.3333
  F         2      67.3333   139.0  80.0000
  M         1      65.0000   129.5  79.0000
  M         2      62.0000   120.0  74.0000

Demonstrating the CHARTYPE Procedure Option

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
            .        00      12    66.1818  133.300  80.5000
            1        01       7    66.2857  135.833  82.5714
            2        01       5    66.0000  129.500  77.6000
  F         .        10       6    67.6667  143.750  83.6667
  M         .        10       6    64.4000  126.333  77.3333
  F         1        11       3    68.0000  148.500  87.3333
  F         2        11       3    67.3333  139.000  80.0000
  M         1        11       4    65.0000  129.500  79.0000
  M         2        11       2    62.0000  120.000  74.0000

Demonstrating the CHARTYPE Procedure Option

PROC PRINT DATA=OUT1 NOOBS;
   TITLE "Demonstrating CHARTYPE Option";
   WHERE _TYPE_ EQ "10";
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
  F         .        10       6    67.6667  143.750  83.6667
  M         .        10       6    64.4000  126.333  77.3333

Another Way to Name Variables (instead of using a VAR statement)

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   ***VAR STATEMENT OPTIONAL;
   OUTPUT OUT=OUT1 MEAN(HR)         = M_HR
                   N(HR SBP DBP)    = N_HR N_SBP N_DBP
                   MAX(SBP)         = MAX_SBP
                   MEDIAN(SBP DBP)  = MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_   M_HR    N_HR  N_SBP  N_DBP  MAX_SBP  MED_SBP  MED_DBP
          0       12    66.1818   11     10     12     188      124       78
  F       1        6    67.6667    6      4      6     188      139       82
  M       1        6    64.4000    5      6      6     140      124       78

Multi-way Breakdowns Using a TYPES Statement

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES GENDER AGE_GROUP*GENDER BLOOD_TYPE*GENDER;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
  F         .                   100       6    67.6667  143.750  83.6667
  M         .                   100       6    64.4000  126.333  77.3333
  F         .          A        101       2    56.0000  160.000  91.0000
  F         .          B        101       2    74.0000  148.500  87.0000
  F         .          O        101       2    73.0000  118.000  73.0000
  M         .          A        101       2    64.0000  135.000  83.0000
  M         .          B        101       3    64.6667  122.667  75.3333
  M         .          O        101       1       .     120.000  72.0000
  F         1                   110       3    68.0000  148.500  87.3333
  F         2                   110       3    67.3333  139.000  80.0000
  M         1                   110       4    65.0000  129.500  79.0000
  M         2                   110       2    62.0000  120.000  74.0000

Using the _TYPE_ Values to Create Multiple Data Sets

DATA GENDER AGE_BY_GENDER BLOOD_BY_GENDER;
   SET OUT1;
   IF _TYPE_ = "100" THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = "110" THEN OUTPUT AGE_BY_GENDER;
   ELSE IF _TYPE_ = "101" THEN OUTPUT BLOOD_BY_GENDER;
RUN;

Listing of Data Set GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_   M_HR     M_SBP    M_DBP
  F         .                   100       6    67.6667  143.750  83.6667
  M         .                   100       6    64.4000  126.333  77.3333

Listing of Data Set AGE_BY_GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_   M_HR     M_SBP   M_DBP
  F         1                   110       3    68.0000   148.5  87.3333
  F         2                   110       3    67.3333   139.0  80.0000
  M         1                   110       4    65.0000   129.5  79.0000
  M         2                   110       2    62.0000   120.0  74.0000

Examples of TYPES Statements

TYPES A A*C D*C;
TYPES A*(B C D);
TYPES () A A*C*D;
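
Applied to the CLINIC data set, these patterns might look like the sketch below. The parenthesized form is shorthand: A*(B C) requests A*B and A*C, and the empty parentheses request the overall summary across all observations. The output data set name TYPES_DEMO is made up for this example.

```sas
/* Sketch: TYPES shorthand on the CLINIC data (TYPES_DEMO is illustrative). */
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   /* ()                            -> overall summary, _TYPE_ = "000" */
   /* GENDER*(AGE_GROUP BLOOD_TYPE) -> GENDER*AGE_GROUP,  _TYPE_ "110" */
   /*                                  GENDER*BLOOD_TYPE, _TYPE_ "101" */
   TYPES () GENDER*(AGE_GROUP BLOOD_TYPE);
   OUTPUT OUT=TYPES_DEMO MEAN=M_HR M_SBP M_DBP;
RUN;
```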

Using PROC FREQ to Count Frequencies

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER;
RUN;

Listing of Data Set NUMBER

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667

Renaming the COUNT Variable

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER(RENAME=(COUNT=N_AGE) DROP=PERCENT);
RUN;

Listing of Data Set NUMBER

AGE_GROUP  N_AGE
    1        7
    2        5

Using PROC MEANS to Count Frequencies

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS AGE_GROUP;
   VAR HR; /* ANY NUMERIC VARIABLE */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE) DROP=_TYPE_ DUMMY) N=DUMMY;
RUN;

Listing of Data Set COUNTS

AGE_GROUP  N_AGE
    1        7
    2        5

Using PROC FREQ to Count Frequencies in a Two-way Table

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES GENDER*BLOOD_TYPE / OUT=FREQOUT(DROP=PERCENT
                                          RENAME=(COUNT=NUMBER));
RUN;

Listing of Data Set FREQOUT

GENDER  BLOOD_TYPE  NUMBER
  F         A          2
  F         B          2
  F         O          2
  M         A          2
  M         B          3
  M         O          1

Using PROC FREQ to Output More than One Data Set

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=OUT1;
   TABLES GENDER / OUT=OUT2;
   TABLES GENDER*AGE_GROUP / OUT=OUT3;
RUN;

Listing of Data Set OUT1

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667
----------------------------------------------------------------
Listing of Data Set OUT2

GENDER  COUNT  PERCENT
  F       6       50
  M       6       50
----------------------------------------------------------------
Listing of Data Set OUT3

GENDER  AGE_GROUP  COUNT  PERCENT
  F         1        3    25.0000
  F         2        3    25.0000
  M         1        4    33.3333
  M         2        2    16.6667