Stat 342 - Wk 7

(short-ish lecture today, I'm sick and jetlagged :( )

proc iml

Defining matrices

Computations with matrices

Bringing in other datasets into IML

Optimization with IML

The mechanics of 'call's

Stat 342 Notes. Week 3, / 48

About the 'aggregation' SQL question on the midterm, which used the maximum() function.

The INTENDED solution output was to have proc sql return the line corresponding to the row of data which had the maximum for the variable in maximum()


The ACTUAL output would have been an error (native SQL), or nonsense (proc sql) because I forgot to aggregate the other variables involved.

As such, everyone gets a free 5/5 for that question.

Sorry about the confusion.


The IML (Interactive Matrix Language) procedure, called by 'proc iml', resembles the data step and proc sql in two ways that make it highly flexible.

- It uses highly flexible scripting instead of a recipe like proc means, proc print, and proc freq. (Closer to data step than proc SQL)

- It is designed to create/manipulate data as well as print output.


Unlike the data step, in which each variable only represents a single value at any given time, a variable in IML could represent an entire vector or matrix.

Here, x is a vector of length 6 (or a 6x1 matrix if you prefer),

and y is a 2x2 matrix.

X = {10,20,30,50,100,200}

Y = {2 4 , 6 8}


X = {10 20 30 50 100 200}

10 20 30 50 100 200

Y = {2 4 , 6 8}

2 4

6 8


Each comma represents the creation of a new row. So, if we define that vector X with all commas, it becomes arranged vertically.

X = {10,20,30,50,100,200}

10

20

30

50

100

200
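
To check the row/column layout without reading the printed output, the dimensions can be queried directly. This is a small sketch using the nrow() and ncol() functions; the names rowvec, colvec, dims_row and dims_col are made up for illustration and are not from the lecture.

```sas
proc iml;

rowvec = {10 20 30 50 100 200};   /* spaces only: one row, six columns */
colvec = {10,20,30,50,100,200};   /* commas only: six rows, one column */

/* || glues two values side by side, giving (rows, columns) */
dims_row = nrow(rowvec) || ncol(rowvec);   /* 1 6 */
dims_col = nrow(colvec) || ncol(colvec);   /* 6 1 */

print dims_row, dims_col;

quit;
```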


Overhead camera: 'By row' vs 'By column'.


SAS demonstration: Ragged Arrays


Among all the procedures, this one is the most R-like. R operates on arrays in a very similar fashion, where a single variable name can represent an entire array or matrix.

It also allows ranges of values to be defined with the colon (:), just like R does.

X = 3:7

3 4 5 6 7
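
A short sketch of a few range variations follows. The decreasing range and the do() function (start, stop, step) are not covered in the lecture; they are included only as a comparison with R's seq(), and the variable names are hypothetical.

```sas
proc iml;

up       = 3:7;              /* 3 4 5 6 7 */
down     = 7:3;              /* 7 6 5 4 3 */
quarters = do(0, 1, 0.25);   /* 0 0.25 0.5 0.75 1 */

print up, down, quarters;

quit;
```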


Missing data can be included as well. This is especially useful if you need to define the dimensions of a matrix before you know what its values are going to be.

X = {-4 -12 . . 3}

-4 -12 . . 3


Let's start the procedures.

The very simplest IML procedure should have


1) A matrix definition

2) A print command

proc iml;

X = {4 5, 6 7};

print X;

quit;


Print can also refer to multiple matrices, in which case they are output in the order listed.

proc iml;

X = {4 5, 6 7};

Y = {1 2 3, 700 888 .};

print Y, X;

quit;


Without a comma, print will combine the two tables in a single one, putting the second one on the right of the first one.

(Try this out with a few cases in SAS before moving on.)

proc iml;

X = {4 5, 6 7};

Y = {1 2 3, 700 888 .};

print Y X;

quit;


Like the 'output' command, if 'print' is called twice, it's run twice. (Compare this to calls like 'call streaminit()', which only run once, even inside a do loop.)

proc iml;

X = {4 5, 6 7};

print X;

print X;

quit;


Also take note of the 'quit' ending. This is used instead of 'run' because IML doesn't have to be compiled before use.

This is yet another similarity between R and IML.

proc iml;

X = {4 5, 6 7};

print X;

quit;


Calculations involving more than one matrix are done on every element within the matrix. There are different ways to define what this means.

There are two styles used in IML:

- Element-wise, a computationally straightforward method.

- Matrix-wise, a method following common mathematical operations.


Let there be two matrices, X and Y, of the same dimensions. 'Element-wise' means...

- The first element of X and the first element of Y are calculated together.

- The result is put in the first element of the output, which is also a matrix.

- The second element of X and the second element of Y are calculated together.

- Repeat for each element.


Addition and subtraction are the same in element-wise and matrix-wise methods. Multiplication and division are not.

Method            Operation        Symbol in SAS/IML   Symbol in R
Element, Matrix   Addition         +                   +
Element, Matrix   Subtraction      -                   -
Element           Multiplication   #                   *
Element           Division         /                   /
Matrix            Multiplication   *                   %*%
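
To see the difference between the # and * symbols in the table, here is a minimal sketch with two small 2x2 matrices. The matrices and variable names are chosen only for illustration.

```sas
proc iml;

A = {1 2, 3 4};
B = {5 6, 7 8};

elem_prod = A # B;   /* element-wise: {5 12, 21 32} */
elem_div  = A / B;   /* element-wise: each A value over the matching B value */
mat_prod  = A * B;   /* matrix product: {19 22, 43 50} */

print elem_prod, elem_div, mat_prod;

quit;
```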


Document camera: Element-wise operations (space 1/2)

Addition, multiplication, and scalars.


Space left for manipulating the matrix (and break).


For element-wise operations, missing values in one element only affect the results for that element.

Also, you can chain together more than two operations.

proc iml;

height = {14,10,8,6,4};

width = {8,2,-3,.,6};

depth = {4,3,5,2,.};

volume = height # width # depth;

print height width depth volume;

quit;


For matrix operations, things are a lot more complex. In matrix multiplication, each element of the result is the inner product of a row from one matrix and a column of the other.

1x1 + 4x2 + 6x9 + 10x3 = 93

1 + 8 + 54 + 30 = 93
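
The arithmetic above can be reproduced in IML as a sketch. Here row1 and col1 are hypothetical names for the first row of A and the first column of B from the diagram; t() is the transpose function, and sum() adds up all elements of its argument.

```sas
proc iml;

row1 = {1 4 6 10};     /* first row of A */
col1 = {1, 2, 9, 3};   /* first column of B */

inner = row1 * col1;              /* (1x4) times (4x1) gives a 1x1 result: 93 */
same  = sum(row1 # t(col1));      /* multiply element-wise, then add up: 93 */

print inner same;

quit;
```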


Each element in A and B is used in multiple elements of the resulting product, so even a single missing value can proliferate into several missing values in the product.

Try for yourself and see. (These are the matrices from the previous diagram.)


proc iml;

A = {1 4 6 10 , 2 . 5 3};

B = {1 4 6 , 2 7 5 , 9 0 11 , 3 1 0};

product = A * B;

print A, B, product;

quit;

For element-wise operations, the only requirement is that the matrices have the same dimensions.

Matrix multiplication has a different requirement: the number of columns in the first matrix must match the number of rows in the second matrix.

In other words,

product = A * B;

and

product = B * A;

produce very different answers and have different requirements.
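
The dimension requirement can be checked before multiplying. This is only a sketch: A is the earlier matrix with its missing value replaced by 0 (an assumption, made so the product is fully numeric), and AB_ok and BA_ok are hypothetical names.

```sas
proc iml;

A = {1 4 6 10 , 2 0 5 3};               /* 2 x 4 */
B = {1 4 6 , 2 7 5 , 9 0 11 , 3 1 0};   /* 4 x 3 */

AB_ok = (ncol(A) = nrow(B));   /* 1: A * B is defined (4 = 4) */
BA_ok = (ncol(B) = nrow(A));   /* 0: B * A is not defined (3 vs 2) */

print AB_ok BA_ok;

/* only multiply when the dimensions conform */
if AB_ok then do;
   product = A * B;            /* result is 2 x 3 */
   print product;
end;

quit;
```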

There are other matrix operations too, which are functions in SAS/IML. The t() function transposes a matrix, which is akin to flipping it across its main diagonal so that rows become columns (the bottom-left element ends up in the top-right).

proc iml;


A = {1 4 6 10 , 2 . 5 3};

B = {1 4 6 , 2 7 5 , 9 0 11 , 3 1 0};

AT = t(A); BT = t(B);

print A, B, AT, BT;

quit;

The inv() function can be used to find the inverse of a square matrix, if one exists. This is also how you do a division between two matrices; the matrix version of A/B is A*inv(B).

proc iml;


A = {1 2 , 3 4};

B = {0 1 , 2 3};

Ainv = inv(A); BoverA = inv(A) * B;

identity = inv(A) * A;

print A, B, Ainv, BoverA, identity;

quit;

The inverse function only makes sense for square matrices, and sometimes there simply is no inverse. It isn't a fault of the code.

proc iml;


A = {1 2 3, 3 4 5};

B = {1 1 , 1 1};

Ainv = inv(A); Binv = inv(B);

print A, B, Ainv, Binv;

quit;
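
One way to avoid the error for a square matrix is to look at its determinant first, since a determinant of zero means there is no inverse. The sketch below uses the det() function on the B from the example above; the exact comparison with zero is for illustration only and may be too strict for matrices with non-integer values.

```sas
proc iml;

B = {1 1 , 1 1};

d = det(B);          /* 1*1 - 1*1 = 0, so B has no inverse */
print d;

if d ^= 0 then do;
   Binv = inv(B);
   print Binv;
end;
else print "B is singular, so inv(B) is skipped";

quit;
```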

Other functions include

solve(A,B) for matrices A and B, which returns the vector x, where Ax = B.
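
A minimal sketch of solve() on a small system follows. The numbers are chosen for illustration; multiplying A by the solution should give back the right-hand side.

```sas
proc iml;

A = {1 2 , 3 4};
b = {5 , 6};

x = solve(A, b);     /* solves A*x = b; here x is {-4, 4.5} */
check = A * x;       /* should reproduce b */

print x, check;

quit;
```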


The details of matrix operations are much better covered in a linear algebra / matrix algebra course.

There are also hybrid-wise operations, which we won't cover in detail. These are element-wise operations where the same value is repeated several times.

These are really just computational and coding shortcuts for doing element-wise computations.


Example, document camera, then SAS:

proc iml;

xy = {-4 10,

2 6,

8 8};

row_mean = {3, 4, 8};

col_mean = {2 8};

center_x = xy - row_mean;

center_y = xy - col_mean;

print xy center_x center_y;

quit;


In the previous example, row_mean and col_mean could have been

row_mean = {3 3 ,

4 4 ,

8 8};

col_mean = {2 8 ,

2 8 ,

2 8};

and produced the same results.
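
Rather than typing the means in by hand, they can be computed from the data. The sketch below uses IML's subscript reduction operator, where a colon inside the brackets means "take the mean over this dimension". The names center_by_row and center_by_col are hypothetical.

```sas
proc iml;

xy = {-4 10,
       2  6,
       8  8};

row_mean = xy[ , :];     /* 3x1: mean across the columns of each row */
col_mean = xy[: , ];     /* 1x2: mean down the rows of each column */

/* hybrid operations: the vector is repeated to match the shape of xy */
center_by_row = xy - row_mean;
center_by_col = xy - col_mean;

print row_mean, col_mean;
print xy center_by_row center_by_col;

quit;
```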


Mind your commas, hybrids can get messy.


The shape() function.

shape() is used to quickly set the dimensions of a matrix without having to set the commas yourself. It works with any rectangular matrix.

shape() outputs a new matrix; it does not change the input matrix A. You can save the new one under a new name, which lets you try different dimensions for A without changing the original data.


shape(A,n) formats (shapes) the values of A into n rows and m columns, where m is the number of elements in A divided by n.

proc iml;

x = 1:6;

onerow = shape(x,1);

onecol = shape(x,6);

other = shape(x,3);

print onerow onecol other;

quit;


shape(A,n,m) formats (shapes) the values of A into n rows and m columns, where m is user-defined.

proc iml;

x = 1:6;

m1 = shape(x,2,3);

m2 = shape(x,3,2);

m3 = shape(x,2);

print m1, m2, m3;

quit;


SAS Demo 1: So what happens when n*m is more than the number of elements in x? Or less?

proc iml;

x = 1:6;

more = shape(x,2,4);

less = shape(x,2,2);

print more, less;

quit;


SAS Demo 2: Can this be used to define a matrix before it's populated (Before it's filled with numbers)?

proc iml;

just0 = {0}; justNA = {.};

all0 = shape(just0,3,5);

allNA = shape(justNA,3,5);

print all0, allNA;

quit;
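
As an aside not covered in the lecture, the j() function builds a constant matrix directly from a row count, a column count and a fill value. A sketch that should give the same two matrices as the shape() demo:

```sas
proc iml;

all0  = j(3, 5, 0);   /* 3x5 matrix of zeros */
allNA = j(3, 5, .);   /* 3x5 matrix of missing values */

print all0, allNA;

quit;
```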


Reading other datasets.

If you want to use an existing dataset as a matrix, rather than setting your own manually, you can do this with the 'use' and 'close' commands, combined with the 'read' command.

'Use' tells the system to bring a certain dataset into memory. 'Close' wipes it from memory (useful for keeping things running smoothly).

Read is used to take that dataset in memory, and save it into a matrix.


The general syntax is

use <DATASET>;

read all var <VARIABLE NAMES> into <MATRIX NAME>;

close <DATASET>;


Use (and close) use the same syntax for specifying a dataset as other SAS procedures. The dataset is specified by libname.dataset, and additional options like (obs = ) can be used to specify which rows to load into memory.

use ds1;

use work.ds1;

use somelib.cars;

use somelib.cars(OBS = 10);


Close is the same, except that the options are not necessary in the close statement.

close ds1;

close work.ds1;

close somelib.cars;

close somelib.cars;


With the read command, references to entire classes of variable names still work, including _ALL_ , _NUM_ , and _CHAR_ for all variables, numeric ones, and character ones, respectively.

proc iml;

use SASHELP.Cars(OBS=5);

read all var _NUM_ into m1;

read all var _NUM_ into m2[colname=NumericNames];

close SASHELP.Cars;

print m1, m2[colname=NumericNames];

quit;
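
The read command also accepts an explicit list of variable names instead of a class like _NUM_. The sketch below assumes the MPG_City and MPG_Highway variables of SASHELP.Cars; mpg, mpgNames and first5 are hypothetical names.

```sas
proc iml;

use SASHELP.Cars;

/* read only the two named columns; mpgNames stores the column names */
read all var {"MPG_City" "MPG_Highway"} into mpg[colname=mpgNames];

close SASHELP.Cars;

first5 = mpg[1:5, ];              /* keep the first five rows */
print first5[colname=mpgNames];

quit;
```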


Matrices are saved as SAS datasets with the create command and the append command.

'Create' makes a new SAS data set of whatever library and name you specify. However, the create command alone only makes a blank dataset (with whatever formatting you specify).

'Append' takes data from some matrix in the IML environment and adds (appends) it to the particular dataset, such as the one you just made with 'create'.


The general syntax is...

create <DATASET> from <MATRIX>;

append from <MATRIX>;

close <DATASET>;

Like when loading, it's a good idea to close your datasets after you're done with them.


Example of saving data, and seeing it with a proc print:

proc iml;

m = {1 2 3 . , 5 6 7 999};

m = t(m);

create newDS from m[colname={"x","y"}];

append from m;

close work.newDS;

quit;

proc print data=newDS; run;
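
To confirm the round trip, the saved dataset can be read back into IML with the same use/read/close pattern from earlier. This sketch assumes the example above has already been run so that work.newDS exists; back and names are hypothetical names.

```sas
proc iml;

use work.newDS;

read all var _NUM_ into back[colname=names];

close work.newDS;

/* should show the same values as m, under the column names x and y */
print back[colname=names];

quit;
```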


Readings on proc iml

Textbook source: Pages 62-68 of 'SAS and R'.

Additional source for interest:

SAS Whitepaper 144-2013 - Getting Started with the SAS/IML Language, by Rick Wicklin

https://support.sas.com/resources/papers/proceedings13/144-2013.pdf



