Edps/Soc 584, Psych 594
C.J. Anderson
SAS for Graphics in Multivariate Analysis
The goal of this “lab” is to be able to produce graphics, generally scatter plots in which different groups are identified. These may be useful for exploratory analysis, after doing a principal components analysis, or to display the results of a discriminant analysis.
Scatter Plot of Psych Test Data Showing all univariate and bivariate distributions:
The data are on the course web site.
DATA SCORES;
  INPUT sex Test1 Test2 Test3 Test4;
  datalines;
1 15 17 24 14
1 17 15 32 26
1 15 14 29 23
1 13 12 10 16
title 'Univariate and Bivariate Distribution of Scores';
proc sgscatter data=scores;
  matrix test1 test2 test3 test4;
run;
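Since the goal of the lab is scatter plots in which groups are identified, here is a minimal sketch of two variations on the matrix plot. It assumes the SCORES data set above; the GROUP= and DIAGONAL= options are additions that are not part of the original handout code, and the titles are made up for illustration.

```sas
/* Sketch only: assumes the SCORES data set created above. */

/* Variation 1: put a histogram of each test on the diagonal,
   so the univariate distributions are shown as well. */
title 'Scores: histograms on the diagonal';
proc sgscatter data=scores;
  matrix test1 test2 test3 test4 / diagonal=(histogram);
run;

/* Variation 2: use a different marker for each value of sex. */
title 'Scores: bivariate plots by sex';
proc sgscatter data=scores;
  matrix test1 test2 test3 test4 / group=sex;
run;
```

The legend in the second plot shows the raw codes stored in sex.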
Graph Post Principal Components Analysis
Example 1:
proc princomp data=scores cov out=prins;
  var test1 test2 test3 test4;
run;
/* Plot of first two component scores */
goptions reset=(axis, legend, pattern, symbol, title, footnote)
         norotate hpos=0 vpos=0 htext=2.25 ftext=swiss
         ctext= target= gaccess= gsfmode=;
goptions device=win;
axis2 label=(angle=90 'Score on Component 2');
axis1 label=('Score on Component 1');
legend1 position=(top center outside) frame cshadow=pink
        label=none value=('Male' 'Female');
proc gplot data=prins;
  symbol1 v=dot i=none height=3;
  plot prin2*prin1=sex / legend=legend1 frame haxis=axis1 vaxis=axis2
                         href=0 vref=0;
  title 'PCA of Covariance Matrix of 4 Psychological Tests';
run;
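A similar plot of the component scores can be sketched with PROC SGPLOT, which needs no GOPTIONS, AXIS, or LEGEND statements. This is an alternative, not the handout's method; it assumes the PRINS data set written by PROC PRINCOMP above.

```sas
/* Sketch only: assumes PRINS (out= from PROC PRINCOMP) contains
   prin1, prin2, and the original variable sex. */
title 'PCA of Covariance Matrix of 4 Psychological Tests';
proc sgplot data=prins;
  scatter x=prin1 y=prin2 / group=sex
          markerattrs=(symbol=circlefilled);
  refline 0 / axis=x;   /* vertical reference line at 0   */
  refline 0 / axis=y;   /* horizontal reference line at 0 */
  xaxis label='Score on Component 1';
  yaxis label='Score on Component 2';
run;
```

The legend here shows the stored values of sex rather than the 'Male' and 'Female' labels supplied through LEGEND1 in the GPLOT version.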
Example 2: Using text to identify observations, and getting a scree plot and the variance explained from a PCA. This uses the European Jobs data from 1979.
data eurojobs;
  input Country $ Agr Min Man PS Con Ser Fin Soc TC;
  label Country = 'Name of country'
        Agr = 'Percent agriculture'
        Min = 'Percent mining '
        Man = 'Percent manufacturing '
        PS  = 'Percent power supply industries '
        Con = 'Percent construction '
        Ser = 'Percent service industries '
        Fin = 'Percent finance '
        Soc = 'Percent social and personal services '
        TC  = 'Percent transport and communications ';
  datalines;
Belgium 3.3 0.9 27.6 0.9 8.2 19.1 6.2 26.6 7.2
Denmark 9.2 0.1 21.8 0.6 8.3 14.6 6.5 32.2 7.1
France 10.8 0.8 27.5 0.9 8.9 16.8 6.0 22.6 5.7
ods graphics on;
title 'PCA of Correlation Matrix(European Jobs)';
proc princomp data=eurojobs out=compscor;
  var Man Ser Soc;
run;
ods graphics off;
/* Create Annotate Data Set */
data coor;
  set compscor;
  x = prin1;
  y = prin2;
  xsys = '2';
  ysys = '2';
  text = Country;
  size = 1.3;
  label x = 'Score on Component 1'
        y = 'Score on Component 2';
  keep x y text xsys ysys size;
run;
/* Plot of first two component scores */
goptions reset=(axis, legend, pattern, symbol, title, footnote)
         norotate hpos=0 vpos=0 htext=2.25 ftext=swiss
         ctext= target= gaccess= gsfmode=;
goptions device=win;
axis1 label=('Score on Component 1');
axis2 label=(angle=90 'Score on Component 2');
proc gplot data=coor;
  symbol1 v=none i=none;
  plot y*x=1 / annotate=coor frame haxis=axis1 vaxis=axis2
               href=0 vref=0;
  title1 'Principal Components Analysis of European Jobs Data';
  /* PROC PRINCOMP was run without COV, so this is the correlation matrix */
  title2 'Correlation Matrix of 3 measures';
run;
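Country labels can also be attached without an Annotate data set by using the DATALABEL= option of the SCATTER statement in PROC SGPLOT. The sketch below is an alternative to the GPLOT approach, not part of the original handout; it assumes the COMPSCOR data set from PROC PRINCOMP above.

```sas
/* Sketch only: assumes COMPSCOR contains prin1, prin2, and Country. */
title 'Principal Components Analysis of European Jobs Data';
proc sgplot data=compscor;
  scatter x=prin1 y=prin2 / datalabel=Country;  /* label each point */
  refline 0 / axis=x;
  refline 0 / axis=y;
  xaxis label='Score on Component 1';
  yaxis label='Score on Component 2';
run;
```

Unlike the GPLOT version, which suppresses the plotting symbol with v=none, this version draws a marker at each point and places the country name next to it.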
Display Discriminant Functions with Class Means and Data
We will use the GMAT data (from the text) for this.
data gmat;
  input decide $ person gpa gmat;
  datalines;
admit 1 2.96 596
admit 2 3.14 473
admit 3 3.22 482.
ods graphics / imagefmt=jpg;
ods output LinearDiscFunc=line;
proc candisc data=gmat distance anova out=outcan;
  class decide;
  var gpa gmat;
run;
proc template;
  define statgraph scatter;
    begingraph;
      entrytitle 'GMAT';
      layout overlayequated / equatetype=fit
             xaxisopts=(label='Canonical Variable 1')
             yaxisopts=(label='Canonical Variable 2');
        scatterplot x=Can1 y=Can2 / group=Decide name='Decide';
        layout gridded / autoalign=(topleft);
          discretelegend 'Decide' / border=false opaque=false;
        endlayout;
      endlayout;
    endgraph;
  end;
run;
proc sgrender data=outcan template=scatter;
run;
Compare the discriminant values with the data and with the PCA.
* compare to--- data;
proc sgplot data=gmat;
  scatter y=gmat x=gpa / group=decide;
run;
* PCA;
proc princomp data=gmat out=comscores;
  var gmat gpa;
run;
proc sgplot data=comscores;
  scatter y=prin2 x=prin1 / group=decide;
run;
Taking a random sample: With large data sets there may be too many cases to “see” what is what in a graph. In such cases, you can take a random sample and plot the sample. The code below takes a random sample of, for example, 100 cases from a data set “comscores” that contains principal component scores and saves them in a data set named “selected”.
title 'A Sample of 100 cases are randomly selected';
proc surveyselect data=comscores method=srs n=100 out=selected;
run;
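The sampled cases can then be plotted like any other data set. The sketch below assumes that “comscores” has at least 100 cases (PROC SURVEYSELECT cannot draw more cases than the data set contains) and that it holds prin1, prin2, and a grouping variable named decide, as in the GMAT example. The SEED= value is an arbitrary number added here so that the same sample is drawn on every run.

```sas
/* Sketch only: seed=584 is an arbitrary choice made for reproducibility;
   the variable names follow the GMAT example and are assumptions. */
title 'A Sample of 100 cases are randomly selected';
proc surveyselect data=comscores method=srs n=100 seed=584 out=selected;
run;

/* Plot only the sampled cases */
proc sgplot data=selected;
  scatter x=prin1 y=prin2 / group=decide;
run;
```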
General SAS Graphics commands:
ods graphics / imagefmt = jpg;   /* or other format */
ods html path="C:\Users\cja\Documents\edps584\SAS graphics";   /* where to send output */
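As a rough sketch of how these statements fit around a plotting step: the folder, file name, image name, and image size below are all hypothetical values chosen for illustration, and the plot uses the SCORES data set from the first example.

```sas
/* Hypothetical output folder: replace with a folder that exists on your machine */
%let outdir = C:\temp\sasplots;

/* Send the HTML page and the image files to the same folder */
ods html path="&outdir" gpath="&outdir" file="graphs.html";

/* Image format, base file name, and size are illustrative choices */
ods graphics on / imagefmt=png imagename="scoresplot" width=6in height=4in;

proc sgplot data=scores;
  scatter x=test1 y=test2 / group=sex;
run;

ods graphics off;
ods html close;   /* finish writing the HTML file */
```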
Plot for use with Profile Analysis or Similar Method
There are two examples: a basic one, and a nicer one that shows standard errors as well as means.
data IQ;
  input sub dementia $ information simiarlities arithmetic picture;
  a1 = 1;
  a2 = 0;
  /* the group variable read above is dementia (there is no variable named senile) */
  if dementia = 'n' then a2 = 1;
  datalines;
1 n 7 5 9 8
2 n 8 8 5 6
3 n 16 18 11 9
4 n 8 3 7 9
proc sort data=IQ;
  by dementia;
run;
* Need the means (& standard errors) to graph them;
proc means data=IQ noprint;
  by dementia;
  var information simiarlities arithmetic picture;
  output out=mforplot
         mean=information similarties arithmetic picture
         stderr=seinformation sesimilarties searithmetic sepicture;
run;
*The data came in a wide format, but we need a long format so Transpose data;
proc transpose data=mforplot out=long;
  by dementia;
  var information similarties arithmetic picture;
run;
* Optional: give new variables a name;
data forplot;
  set long (rename=(COL1=mean _NAME_=subtest));
run;
* Graph looks goofy without this;
proc sort data=forplot;
  by subtest;
run;
* This I got from using SAS/Solutions/ASSIST/GRAPHS/PLOTS & then edited;
goptions reset=(axis, legend, pattern, symbol, title, footnote)
         norotate hpos=0 vpos=0 htext=2 ftext=swiss
         ctext= target= gaccess= gsfmode=;
goptions device=WIN ctext=blue364 graphrc interpol=join;
axis1 color=blue width=2.0 label=("WAIS Sub-test");
axis2 color=blue width=2.0 label=(angle=90 "Sub-Test Means");
legend1 label=('Dementia:') frame value=("No" "Yes")
        position=(inside top right);
title 'WAIS Sub-test means by Dementia Present/Absent';
proc gplot data=WORK.FORPLOT;
  plot mean*subtest=dementia / haxis=axis1 vaxis=axis2 frame
                               legend=legend1;
run;
/****************************************************************
 Plot with se bars: requires more data manipulation
****************************************************************/
* Change data from a wide to a long format;
proc transpose data=mforplot out=long2;
  by dementia;
  var seinformation sesimilarties searithmetic sepicture;
run;
* Make sure order will be the same as in long;
proc sort data=long2;
  by _NAME_;
run;
* Put means and stderrors in single file;
data forplot2;
  merge forplot long2 (rename=(COL1=stderr));
  yvar = mean;          output;
  yvar = mean - stderr; output;
  yvar = mean + stderr; output;
run;
axis1 color=blue width=2.0 label=("WAIS Sub-test");
axis2 color=blue width=2.0 label=(angle=90 "Sub-Test Means")
      order=0 to 15 by 5;
symbol1 interpol=hiloctj color=vibg line=1;
symbol2 interpol=hiloctj color=depk line=2;
symbol3 interpol=none color=vibg value=dot height=1.5;
symbol4 interpol=none color=depk value=dot height=1.5;
title 'WAIS Sub-test means +/- 1 Standard Error';
proc gplot data=forplot2;
  plot yvar*subtest=dementia / haxis=axis1 vaxis=axis2 legend=legend1;
  plot2 mean*subtest=dementia / vaxis=axis2 noaxis nolegend;
run;
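For comparison, a similar means plot with error bars can be sketched in PROC SGPLOT, which draws the bars directly from lower and upper limit variables rather than from three stacked rows per mean. This is an alternative to the GPLOT approach, not the handout's method. It assumes the FORPLOT and LONG2 data sets created above, both already sorted as shown; the data set name forband and the variables lower and upper are made up for this sketch.

```sas
/* Sketch only: one row per group and sub-test, with limits as columns.
   Like the handout's merge, this one-to-one merge relies on FORPLOT
   and LONG2 being sorted into the same order. */
data forband;
  merge forplot long2 (rename=(COL1=stderr));
  lower = mean - stderr;
  upper = mean + stderr;
run;

title 'WAIS Sub-test means +/- 1 Standard Error';
proc sgplot data=forband;
  /* lines joining the sub-test means within each group */
  series x=subtest y=mean / group=dementia name='means';
  /* markers with +/- 1 standard error bars */
  scatter x=subtest y=mean / group=dementia
          yerrorlower=lower yerrorupper=upper;
  keylegend 'means' / title='Dementia:';
  xaxis label='WAIS Sub-test';
  yaxis label='Sub-Test Means';
run;
```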
--------
See the “Tips and Trick” document on the course web site.