

Edps/Soc 584, Psych 594          C.J. Anderson

SAS for Graphics in Multivariate Analysis

The goal of this “lab” is to produce graphics, mostly scatter plots in which different groups are identified. These may be useful for exploratory analysis, after a principal components analysis, or to display the results of a discriminant analysis.

Scatter Plot of Psych Test Data Showing all univariate and bivariate distributions:

The data are on the course web-site.

DATA SCORES;
   INPUT sex Test1 Test2 Test3 Test4;
   datalines;
1 15 17 24 14
1 17 15 32 26
1 15 14 29 23
1 13 12 10 16
;

title 'Univariate and Bivariate Distribution of Scores';
proc sgscatter data=scores;
   matrix test1 test2 test3 test4;
run;
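The MATRIX statement by default puts only the variable names on the diagonal. A minimal sketch of one way to show the univariate distributions as well, and to mark the groups: it assumes the data set scores created above, and uses the DIAGONAL= and GROUP= options of the MATRIX statement. The title text is only illustrative.

```sas
/* Sketch: scatter plot matrix with a histogram of each test on the */
/* diagonal and points marked by sex. Assumes data set scores.      */
title 'Scatter Plot Matrix with Histograms, by Sex';
proc sgscatter data=scores;
   matrix test1 test2 test3 test4 / group=sex diagonal=(histogram);
run;
title;   /* clear the title so it does not carry over to later plots */
```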


Graph Post Principal Components Analysis

Example 1:

proc princomp data=scores cov out=prins;
   var test1 test2 test3 test4;
run;

/* Plot of first two component scores */
goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate
         hpos=0 vpos=0 htext=2.25 ftext=swiss ctext= target= gaccess= gsfmode=;
goptions device=win;
axis2 label=(angle=90 'Score on Component 2');
axis1 label=('Score on Component 1');
legend1 position=(top center outside) frame cshadow=pink label=none
        value=('Male' 'Female');
proc gplot data=prins;
   symbol1 v=dot i=none height=3;
   plot prin2*prin1=sex / legend=legend1 frame haxis=axis1 vaxis=axis2
                          href=0 vref=0;
   title 'PCA of Covariance Matrix of 4 Psychological Tests';
run;
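The same plot can probably be drawn with less set-up using PROC SGPLOT. This is only a sketch of an alternative, not a replacement for the GPLOT code: it assumes the data set prins produced by PROC PRINCOMP above, and the legend shows the raw values of sex rather than 'Male' and 'Female', because the coding of sex is not stated here.

```sas
/* Sketch: first two component scores with SGPLOT instead of GPLOT. */
/* Assumes data set prins with variables prin1, prin2 and sex.      */
title 'PCA of Covariance Matrix of 4 Psychological Tests';
proc sgplot data=prins;
   scatter x=prin1 y=prin2 / group=sex
           markerattrs=(symbol=circlefilled size=9);
   refline 0 / axis=x;                    /* vertical line at 0   */
   refline 0 / axis=y;                    /* horizontal line at 0 */
   xaxis label='Score on Component 1';
   yaxis label='Score on Component 2';
   keylegend / position=top location=outside title='Sex';
run;
title;
```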


Example 2: Using text to identify observations, and getting a scree plot and the variance explained from a PCA. This uses the European Jobs Data from 1979.

data eurojobs;
   input Country $ Agr Min Man PS Con Ser Fin Soc TC;
   label Country= 'Name of country'
         Agr= 'Percent agriculture'
         Min= 'Percent mining'
         Man= 'Percent manufacturing'
         PS=  'Percent power supply industries'
         Con= 'Percent construction'
         Ser= 'Percent service industries'
         Fin= 'Percent finance'
         Soc= 'Percent social and personal services'
         TC=  'Percent transport and communications';
   datalines;
Belgium 3.3 0.9 27.6 0.9 8.2 19.1 6.2 26.6 7.2
Denmark 9.2 0.1 21.8 0.6 8.3 14.6 6.5 32.2 7.1
France 10.8 0.8 27.5 0.9 8.9 16.8 6.0 22.6 5.7
;

ods graphics on;
title 'PCA of Correlation Matrix(European Jobs)';
proc princomp data=eurojobs out=compscor;
   var Man Ser Soc;
run;
ods graphics off;
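If only the scree and variance-explained plots are wanted, they can be requested explicitly. A minimal sketch, assuming the data set eurojobs from above and a SAS release in which PROC PRINCOMP supports the PLOTS= option:

```sas
/* Sketch: ask PRINCOMP for just the scree / variance-explained plot. */
/* Assumes data set eurojobs; no output data set is saved here.       */
ods graphics on;
proc princomp data=eurojobs plots=scree;
   var Man Ser Soc;
run;
ods graphics off;
```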

/* Create Annotate Data Set */


data coor;
   set compscor;
   x = prin1;
   y = prin2;
   xsys = '2';
   ysys = '2';
   text = Country;
   size = 1.3;
   label x = 'Score on Component 1'
         y = 'Score on Component 2';
   keep x y text xsys ysys size;
run;

/* Plot of first two component scores */
goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate
         hpos=0 vpos=0 htext=2.25 ftext=swiss ctext= target= gaccess= gsfmode=;
goptions device=win;
axis1 label=('Score on Component 1');
axis2 label=(angle=90 'Score on Component 2');
proc gplot data=coor;
   symbol1 v=none i=none;
   plot y*x=1 / annotate=coor frame haxis=axis1 vaxis=axis2
                href=0 vref=0;
   title1 'Principal Components Analysis of European Jobs Data';
   title2 'Correlation Matrix of 3 measures';
run;
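An annotate data set is one way to put country names on the plot. Another possibility, sketched here under the assumption that the data set compscor from PROC PRINCOMP above is available, is the DATALABEL= option of the SCATTER statement in PROC SGPLOT, which labels each point directly.

```sas
/* Sketch: label each point with the country name, no annotate data set. */
/* Assumes data set compscor with variables prin1, prin2 and Country.    */
title 'Principal Components Analysis of European Jobs Data';
proc sgplot data=compscor;
   scatter x=prin1 y=prin2 / datalabel=Country
           markerattrs=(symbol=circlefilled);
   refline 0 / axis=x;
   refline 0 / axis=y;
   xaxis label='Score on Component 1';
   yaxis label='Score on Component 2';
run;
title;
```

Unlike the annotate version, this keeps a marker at each point, with the label placed next to it.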


Display Discriminant Functions with Class Means and Data

We will use the GMAT data (from the text) for this.

data gmat;
   input decide $ person gpa gmat;
   datalines;
admit 1 2.96 596
admit 2 3.14 473
admit 3 3.22 482.
;

ods graphics / imagefmt=jpg;
ods output LinearDiscFunc=line;
proc candisc data=gmat distance anova out=outcan;
   class decide;
   var gpa gmat;
run;


proc template;
   define statgraph scatter;
      begingraph;
         entrytitle 'GMAT';
         layout overlayequated / equatetype=fit
                xaxisopts=(label='Canonical Variable 1')
                yaxisopts=(label='Canonical Variable 2');
            scatterplot x=Can1 y=Can2 / group=Decide name='Decide';
            layout gridded / autoalign=(topleft);
               discretelegend 'Decide' / border=false opaque=false;
            endlayout;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=outcan template=scatter;
run;
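The template above plots the observations only. One possible way to add the class means is sketched below. It assumes the data set outcan from PROC CANDISC above; the names canmeans, canplot, mCan1 and mCan2 are made up for this illustration.

```sas
/* Sketch: overlay class means on the canonical variable scores. */
/* Assumes data set outcan with variables Can1, Can2 and decide. */

/* Class means of the canonical variables, one row per class */
proc means data=outcan noprint nway;
   class decide;
   var Can1 Can2;
   output out=canmeans mean=mCan1 mCan2;
run;

/* Stack observations and means so one SGPLOT call can draw both */
data canplot;
   set outcan canmeans(keep=decide mCan1 mCan2);
run;

proc sgplot data=canplot;
   scatter x=Can1 y=Can2 / group=decide name='obs';
   /* Means drawn as larger stars, labelled with the class name */
   scatter x=mCan1 y=mCan2 / group=decide datalabel=decide
           markerattrs=(symbol=starfilled size=14);
   xaxis label='Canonical Variable 1';
   yaxis label='Canonical Variable 2';
   keylegend 'obs' / title='Decision';
run;
```

Rows that come from canmeans have missing Can1 and Can2, and rows from outcan have missing mCan1 and mCan2, so each SCATTER statement draws only its own points.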

Compare the discriminant values with the data and with PCA.

* compare to--- data;
proc sgplot data=gmat;
   scatter y=gmat x=gpa / group=decide;
run;

* PCA;
proc princomp data=gmat out=comscores;
   var gmat gpa;
run;
proc sgplot data=comscores;
   scatter y=prin2 x=prin1 / group=decide;
run;


Taking a random sample: With large data sets there may be too many cases to “see” what is what in a graph. In such cases, you can take a random sample and plot the sample. The code below takes a random sample of, for example, 100 cases from a data set “comscores” that contains principal components and saves them in a data set named “selected”.

title 'A sample of 100 cases is randomly selected';
proc surveyselect data=comscores method=srs n=100 out=selected;
run;
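To get the same sample each time the program is run, a seed can be supplied. A minimal sketch: the seed value 584 is arbitrary, and the code assumes that comscores holds more than 100 cases and contains prin1 and prin2.

```sas
/* Sketch: reproducible simple random sample, then plot the sample. */
/* Assumes comscores has more than 100 cases; seed=584 is arbitrary. */
proc surveyselect data=comscores method=srs n=100 seed=584 out=selected;
run;

proc sgplot data=selected;
   scatter x=prin1 y=prin2;
run;
```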

General SAS Graphics commands:

ods graphics / imagefmt=jpg;   /* or other format */
ods html path="C:\Users\cja\Documents\edps584\SAS graphics";   /* where to send output */
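Put together, these commands might be used as in the sketch below. The folder C:\temp, the file name pca_plots.html and the image name pcaplot are hypothetical, and the plot assumes the data set prins from the first PCA example.

```sas
/* Sketch: send one plot to an HTML file with a PNG image.      */
/* Folder, file name and image name are made up for this sketch. */
ods html path="C:\temp" file="pca_plots.html";
ods graphics on / reset imagefmt=png imagename="pcaplot"
                  width=6in height=4in;

proc sgplot data=prins;
   scatter x=prin1 y=prin2;
run;

ods html close;      /* finish writing the HTML file */
ods graphics off;
```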

Plot for use with Profile Analysis or Similar Method

There are two examples: a basic one, and a nicer one that shows standard errors as well as means.

data IQ;
   input sub dementia $ information simiarlities arithmetic picture;
   a1=1;
   a2=0;
   if dementia='n' then a2=1;
   datalines;
1 n 7 5 9 8
2 n 8 8 5 6
3 n 16 18 11 9
4 n 8 3 7 9
;


proc sort data=IQ;
   by dementia;
run;

* Need the means (& standard errors) to graph them;
proc means data=IQ noprint;
   by dementia;
   var information simiarlities arithmetic picture;
   output out=mforplot
          mean=information similarties arithmetic picture
          stderr=seinformation sesimilarties searithmetic sepicture;
run;

* The data came in a wide format, but we need a long format so transpose data;
proc transpose data=mforplot out=long;
   by dementia;
   var information similarties arithmetic picture;
run;

* Optional: give new variables a name;
data forplot;
   set long (rename=(COL1=mean _NAME_=subtest));
run;


* Graph looks goofy without this;
proc sort data=forplot;
   by subtest;
run;

* This I got from using SAS/Solutions/ASSIST/GRAPHS/PLOTS & then edited;
goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate
         hpos=0 vpos=0 htext=2 ftext=swiss ctext= target= gaccess= gsfmode=;
goptions device=WIN ctext=blue364 graphrc interpol=join;
axis1 color=blue width=2.0 label=("WAIS Sub-test");
axis2 color=blue width=2.0 label=(angle=90 "Sub-Test Means");
legend1 label=('Dementia:') frame value=("No" "Yes")
        position=(inside top right);

title 'WAIS Sub-test means by Dementia Present/Absent';
proc gplot data=WORK.FORPLOT;
   plot mean*subtest=dementia / haxis=axis1 vaxis=axis2 frame legend=legend1;
run;

/****************************************************************
 Plot with se bars: requires more data manipulation
 ****************************************************************/


* Change data from a wide to a long format;
proc transpose data=mforplot out=long2;
   by dementia;
   var seinformation sesimilarties searithmetic sepicture;
run;

* Make sure order will be the same as in long;
proc sort data=long2;
   by _NAME_;
run;

* Put means and stderrors in single file;
data forplot2;
   merge forplot long2 (rename=(COL1=stderr));
   yvar=mean;        output;
   yvar=mean-stderr; output;
   yvar=mean+stderr; output;
run;

axis1 color=blue width=2.0 label=("WAIS Sub-test");
axis2 color=blue width=2.0 label=(angle=90 "Sub-Test Means") order=0 to 15 by 5;

symbol1 interpol=hiloctj color=vibg line=1;
symbol2 interpol=hiloctj color=depk line=2;
symbol3 interpol=none color=vibg value=dot height=1.5;
symbol4 interpol=none color=depk value=dot height=1.5;

title 'WAIS Sub-test means +/- 1 Standard Error';
proc gplot data=forplot2;
   plot yvar*subtest=dementia / haxis=axis1 vaxis=axis2 legend=legend1;
   plot2 mean*subtest=dementia / vaxis=axis2 noaxis nolegend;
run;
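A similar plot can probably be made with PROC SGPLOT, which draws error bars directly from lower and upper limits stored on the same row as the mean. This is a sketch only. It assumes the data sets forplot and long2 from above, already sorted so that their rows line up; the names forplot3, lower and upper are made up for this illustration.

```sas
/* Sketch: profile plot of means with +/- 1 standard error bars.   */
/* One row per group and sub-test; relies on forplot and long2     */
/* being in the same row order (see the two PROC SORT steps above). */
data forplot3;
   merge forplot long2(rename=(COL1=stderr) drop=_NAME_);
   lower = mean - stderr;
   upper = mean + stderr;
run;

title 'WAIS Sub-test means +/- 1 Standard Error';
proc sgplot data=forplot3;
   series  x=subtest y=mean / group=dementia name='m';   /* lines joining the means */
   scatter x=subtest y=mean / group=dementia
           yerrorlower=lower yerrorupper=upper;          /* markers plus error bars */
   xaxis label='WAIS Sub-test';
   yaxis label='Sub-Test Means';
   keylegend 'm' / title='Dementia:';
run;
title;
```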

--------

See the “Tips and Trick” document on the course web-site.