Stat 342 - Wk 7


Transcript of Stat 342 - Wk 7

Page 1: Stat 342 - Wk 7

Stat 342 - Wk 7

(short-ish lecture today, I'm sick and jetlagged :( )

proc iml

Defining matrices

Computations with matrices

Bringing in other datasets into IML

Optimization with IML

The mechanics of 'call's

Stat 342 Notes. Week 3, Page 1 / 48

Page 2: Stat 342 - Wk 7

About the 'aggregation' SQL question on the midterm, which used the maximum() function.

The INTENDED solution output was to have proc sql return the row of data which had the maximum value for the variable in maximum().


Page 3: Stat 342 - Wk 7

The ACTUAL output would have been an error (native SQL), or nonsense (proc sql) because I forgot to aggregate the other variables involved.

As such, everyone gets a free 5/5 for that question.

Sorry about the confusion.


Page 4: Stat 342 - Wk 7

The IML (Interactive Matrix Language) procedure, called with 'proc iml', is highly flexible, like the data step and proc sql, in two ways.

- It uses highly flexible scripting instead of a recipe like proc means, proc print, and proc freq. (Closer to data step than proc SQL)

- It is designed to create/manipulate data as well as print output.


Page 5: Stat 342 - Wk 7

Unlike the data step, in which each variable only represents a single value at any given time, a variable in IML could represent an entire vector or matrix.

Here, X is a vector of length 6 (or a 1x6 matrix if you prefer),

and Y is a 2x2 matrix.

X = {10 20 30 50 100 200}

Y = {2 4 , 6 8}


Page 6: Stat 342 - Wk 7

X = {10 20 30 50 100 200}

10 20 30 50 100 200

Y = {2 4 , 6 8}

2 4

6 8


Page 7: Stat 342 - Wk 7

Each comma represents the creation of a new row. So, if we define that vector X with all commas, it becomes arranged vertically.

X = {10,20,30,50,100,200}

10

20

30

50

100

200
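As a quick check of the row-versus-column rule, the following sketch defines the same six values both ways and asks IML for the dimensions. The names rowvec, colvec, dims_row, and dims_col are made up for this illustration; nrow() and ncol() are standard IML functions.

```sas
proc iml;

/* Spaces separate columns: one row, six columns */
rowvec = {10 20 30 50 100 200};

/* Commas separate rows: six rows, one column */
colvec = {10, 20, 30, 50, 100, 200};

/* nrow() and ncol() report the dimensions; || joins values side by side */
dims_row = nrow(rowvec) || ncol(rowvec);   /* 1 6 */
dims_col = nrow(colvec) || ncol(colvec);   /* 6 1 */

print dims_row, dims_col;

quit;
```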


Page 8: Stat 342 - Wk 7

Overhead camera: 'By row' vs 'By column'.


Page 9: Stat 342 - Wk 7

SAS demonstration: Ragged Arrays


Page 10: Stat 342 - Wk 7

Among all the procedures, this one is the most R-like. R operates on arrays in a very similar fashion, where a single variable name can represent an entire array or matrix.

It also allows ranges of values to be defined with the colon (:), just like R does.

X = 3:7

3 4 5 6 7
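The colon only steps by 1. For other step sizes, one option is the IML do() function, sketched here with made-up variable names (up, steps); treat it as an illustration rather than part of the lecture material.

```sas
proc iml;

up    = 3:7;              /* 3 4 5 6 7 */

/* do(start, stop, increment) builds a sequence with a custom step */
steps = do(0, 1, 0.25);   /* 0 0.25 0.5 0.75 1 */

print up, steps;

quit;
```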


Page 11: Stat 342 - Wk 7

Missing data can be included as well. This is especially useful if you need to define the dimensions of a matrix before you know what its values are going to be.

X = {-4 -12 . . 3}

-4 -12 . . 3


Page 12: Stat 342 - Wk 7

Let's start the procedures.

The very simplest IML procedure should have


Page 13: Stat 342 - Wk 7

1) A matrix definition

2) A print command

proc iml;

X = {4 5, 6 7};

print X;

quit;


Page 14: Stat 342 - Wk 7

Print can also refer to multiple matrices, in which case they are output in the order listed.

proc iml;

X = {4 5, 6 7};

Y = {1 2 3, 700 888 .};  /* every row needs the same number of elements; the second row is padded with a missing value */

print Y, X;

quit;
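Print also accepts options in square brackets after a matrix name. The sketch below uses the label=, rowname=, and colname= options; the label text and the row and column names are invented for the example.

```sas
proc iml;

X = {4 5, 6 7};

/* Options in brackets change how this one matrix is displayed */
print X[label="X with labels" rowname={"r1" "r2"} colname={"c1" "c2"}];

quit;
```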


Page 15: Stat 342 - Wk 7

Without a comma, print will combine the two tables in a single one, putting the second one on the right of the first one.

(Try this out with a few cases in SAS before moving on.)

proc iml;

X = {4 5, 6 7};

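Y = {1 2 3, 700 888 .};  /* every row needs the same number of elements; the second row is padded with a missing value */

print Y X;

quit;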


Page 16: Stat 342 - Wk 7

Like the 'output' command, if 'print' is called twice, it's run twice. (Compare this to calls like 'call streaminit()', which only run once, even inside a do loop.)

proc iml;

X = {4 5, 6 7};

print X;

print X;

quit;


Page 17: Stat 342 - Wk 7

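Also take note of the 'quit' ending. This is used instead of 'run' because IML is interactive: each statement executes as soon as it is submitted, so there is no step waiting to be run, and 'quit' simply ends the IML session.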

This is yet another similarity between R and IML.

proc iml;

X = {4 5, 6 7};

print X;

quit;


Page 18: Stat 342 - Wk 7

Calculations involving more than one matrix are done on every element within the matrix. There are different ways to define what this means.

There are two styles used in IML:

- Element-wise, a computationally straightforward method.

- Matrix-wise, a method following common mathematical operations.


Page 19: Stat 342 - Wk 7

Let there be two matrices, X and Y, of the same dimensions. 'Element-wise' means...

- The first element of X and the first element of Y are calculated together.

- The result is put in the first element of the output, which is also a matrix.

- The second element of X and the second element of Y are calculated together.

- Repeat for each element.
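The steps above can be written out as an explicit loop. This is only a sketch to show what element-wise multiplication is doing; the matrices and the names Z, direct, r, and c are made up, and in practice the one-line operator does the same job.

```sas
proc iml;

X = {1 2, 3 4};
Y = {10 20, 30 40};

/* Start with a matrix of missing values with the same dimensions as X */
Z = j(nrow(X), ncol(X), .);

/* Visit each position and combine the matching elements of X and Y */
do r = 1 to nrow(X);
   do c = 1 to ncol(X);
      Z[r, c] = X[r, c] # Y[r, c];
   end;
end;

/* The element-wise operator does the whole loop in one statement */
direct = X # Y;

print Z direct;   /* both are {10 40, 90 160} */

quit;
```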


Page 20: Stat 342 - Wk 7

Addition and subtraction are the same in element-wise and matrix-wise methods. Multiplication and division are not.

Method            Operation        Symbol in SAS/IML   Symbol in R
Element, Matrix   Addition         +                   +
Element, Matrix   Subtraction      -                   -
Element           Multiplication   #                   *
Element           Division         /                   /
Matrix            Multiplication   *                   %*%
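To see the difference between the two multiplication symbols, here is a small sketch with two made-up 2x2 matrices. The results in the comments were worked out by hand.

```sas
proc iml;

A = {1 2, 3 4};
B = {5 6, 7 8};

elementwise = A # B;   /* {5 12, 21 32}: each element times its partner */
matrixwise  = A * B;   /* {19 22, 43 50}: rows of A times columns of B */
ratio       = A / B;   /* element-wise division */

print elementwise, matrixwise, ratio;

quit;
```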


Page 21: Stat 342 - Wk 7

Document camera: Element-wise operations (space 1/2)

Addition, multiplication, and scalars.


Page 22: Stat 342 - Wk 7

Space left for manipulating the matrix (and break).


Page 23: Stat 342 - Wk 7

For element-wise operations, missing values in one element only affect the results for that element.

Also, you can chain together more than two operations.

proc iml;

height = {14,10,8,6,4};

width = {8,2,-3,.,6};

depth = {4,3,5,2,.};

volume = height # width # depth;

print height width depth volume;

quit;


Page 24: Stat 342 - Wk 7

For matrix operations, things are a lot more complex. In matrix multiplication, each element of the result is the inner product of a row from one matrix and a column of the other.

1x1 + 4x2 + 6x9 + 10x3 = 93

1 + 8 + 54 + 30 = 93
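The same inner product can be checked in IML by multiplying a 1x4 row by a 4x1 column, which gives a single number. The names row1, col1, and ip are chosen just for this sketch.

```sas
proc iml;

row1 = {1 4 6 10};     /* a row taken from the first matrix     */
col1 = {1, 2, 9, 3};   /* a column taken from the second matrix */

/* (1x4) * (4x1) gives a 1x1 result: 1*1 + 4*2 + 6*9 + 10*3 */
ip = row1 * col1;

print ip;   /* 93 */

quit;
```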


Page 25: Stat 342 - Wk 7

Each element in A and B is used in multiple elements of the resulting product, so even a single missing value can proliferate into several missing values in the product.

Try for yourself and see. (These are the matrices from the previous diagram.)


Page 26: Stat 342 - Wk 7

proc iml;

A = {1 4 6 10 , 2 . 5 3};

B = {1 4 6 , 2 7 5 , 9 0 11 , 3 1 0};

product = A * B;

print A, B, product;

quit;

For element-wise operations, the only requirement is that the matrices have the same dimensions.

Matrix multiplication has a different requirement: the number of columns in the first matrix must match the number of rows in the second matrix.

Page 27: Stat 342 - Wk 7

In other words, 'product = A * B;' and 'product = B * A;' produce very different answers and have different requirements.
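A minimal sketch of both points, using made-up matrices. C and D are square, so both orders are allowed but give different answers. P and Q have the same dimensions as a 2x4 and a 4x3 matrix, so only one order is allowed; the flags PQ_ok and QP_ok just compare dimensions and are not a built-in feature.

```sas
proc iml;

C = {1 2, 3 4};
D = {0 1, 1 0};

CD = C * D;   /* {2 1, 4 3} */
DC = D * C;   /* {3 4, 1 2} */

print CD, DC;

/* Dimension check: columns of the first must equal rows of the second */
P = j(2, 4, 1);   /* 2x4 matrix of ones */
Q = j(4, 3, 1);   /* 4x3 matrix of ones */

PQ_ok = (ncol(P) = nrow(Q));   /* 1: P * Q is allowed     */
QP_ok = (ncol(Q) = nrow(P));   /* 0: Q * P is not allowed */

print PQ_ok QP_ok;

quit;
```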

There are other matrix operations too, which are functions in SAS/IML. The t() function transposes a matrix, which is akin to flipping it across its main diagonal so that rows become columns (the bottom-left entry moves to the top-right).

Page 28: Stat 342 - Wk 7

proc iml;

A = {1 4 6 10 , 2 . 5 3};

B = {1 4 6 , 2 7 5 , 9 0 11 , 3 1 0};

AT = t(A); BT = t(B);

print A, B, AT, BT;

quit;

The inv() function can be used to find the inverse of a square matrix, if one exists. This is also how you do a division between two matrices; the matrix version of A/B is A*inv(B).

Page 29: Stat 342 - Wk 7

proc iml;

A = {1 2 , 3 4};

B = {0 1 , 2 3};

Ainv = inv(A); BoverA = B * inv(A);  /* B/A, following the A*inv(B) pattern */

identity = inv(A) * A;

print A, B, Ainv, BoverA, identity;

quit;

The inverse function only makes sense for square matrices, and sometimes there simply is no inverse. It isn't a fault of the code.

Page 30: Stat 342 - Wk 7

proc iml;

A = {1 2 3, 3 4 5};

B = {1 1 , 1 1};

Ainv = inv(A); Binv = inv(B);

print A, B, Ainv, Binv;

quit;

Other functions include

solve(A,B) for matrices A and B, which returns the vector x, where Ax = B.
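A small sketch of solve() on a made-up 2x2 system (2x + y = 3 and x + 3y = 5). The names A, b, x, and check are only for this example; multiplying A by the solution should give back b.

```sas
proc iml;

A = {2 1, 1 3};
b = {3, 5};

/* Solve A*x = b without forming inv(A) explicitly */
x = solve(A, b);    /* {0.8, 1.4} */

/* Multiplying back should reproduce b */
check = A * x;

print x, check;

quit;
```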


Page 31: Stat 342 - Wk 7

The details of matrix operations are much better covered in a linear algebra / matrix algebra course.

There are also hybrid-wise operations, which we won't cover in detail. These are element-wise operations where the same value is repeated several times.

These are really just computational and coding shortcuts for doing element-wise computations.


Page 32: Stat 342 - Wk 7

Example, document camera, then SAS:

proc iml;

xy = {-4 10,

2 6,

8 8};

row_mean = {3, 4, 8};

col_mean = {2 8};

center_x = xy - row_mean;

center_y = xy - col_mean;

print xy center_x center_y;

quit;

Page 33: Stat 342 - Wk 7

In the previous example, row_mean and col_mean could have been

row_mean = {3 3 ,

4 4 ,

8 8};

col_mean = {2 8 ,

2 8 ,

2 8};

and produced the same results.
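Rather than typing the means by hand, they can be computed from the data. The sketch below uses IML's subscript reduction operator ':' (mean), which is not covered in these notes, so treat it as an optional shortcut. The names by_row and by_col are made up.

```sas
proc iml;

xy = {-4 10,
       2  6,
       8  8};

/* ':' inside a subscript takes a mean along that dimension */
row_mean = xy[ , :];   /* one mean per row: a 3x1 column    */
col_mean = xy[:, ];    /* one mean per column: a 1x2 row    */

/* Hybrid subtraction: the vector is reused across the other dimension */
by_row = xy - row_mean;
by_col = xy - col_mean;

print row_mean, col_mean, by_row by_col;

quit;
```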


Page 34: Stat 342 - Wk 7

Mind your commas; hybrids can get messy.


Page 35: Stat 342 - Wk 7

The shape() function.

shape() is used to quickly set the dimensions of a matrix without having to set the commas yourself. It works with any rectangular matrix.

shape() outputs a new matrix; it does not change the input matrix A. You can save the new matrix under its own name, which lets you try different dimensions for A without changing the original data.


Page 36: Stat 342 - Wk 7

shape(A,n) formats (shapes) the values of A into n rows and m columns, where m is the number of elements in A divided by n.

proc iml;

x = 1:6;

onerow = shape(x,1);

onecol = shape(x,6);

other = shape(x,3);

print onerow onecol other;

quit;


Page 37: Stat 342 - Wk 7

shape(A,n,m) formats (shapes) the values of A into n rows and m columns, where m is user-defined.

proc iml;

x = 1:6;

m1 = shape(x,2,3);

m2 = shape(x,3,2);

m3 = shape(x,2);

print m1, m2, m3;

quit;


Page 38: Stat 342 - Wk 7

SAS Demo 1: So what happens when n*m is more than the number of elements in x? Or less?

proc iml;

x = 1:6;

more = shape(x,2,4);

less = shape(x,2,2);

print more, less;

quit;
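A related sketch, going slightly beyond the demo: as far as I know, shape() reuses the values of x from the start when there are too few of them, and it accepts an optional fourth argument giving a pad value to use instead. The names recycled and padded are made up; run it to confirm the behaviour on your own installation.

```sas
proc iml;

x = 1:6;

/* No pad value: extra cells are filled by cycling back through x */
recycled = shape(x, 2, 4);

/* Fourth argument: extra cells are filled with this value instead */
padded = shape(x, 2, 4, 0);

print recycled, padded;

quit;
```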


Page 39: Stat 342 - Wk 7

SAS Demo 2: Can this be used to define a matrix before it's populated (Before it's filled with numbers)?

proc iml;

just0 = {0}; justNA = {.};

all0 = shape(just0,3,5);

allNA = shape(justNA,3,5);

print all0, allNA;

quit;
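An alternative sketch for the same task: the j() function builds an n-by-m matrix filled with one value directly, so no one-element starting matrix is needed.

```sas
proc iml;

/* j(rows, columns, value) fills every cell with the same value */
all0  = j(3, 5, 0);
allNA = j(3, 5, .);

print all0, allNA;

quit;
```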


Page 40: Stat 342 - Wk 7

Reading other datasets.

If you want to use an existing dataset as a matrix, rather than setting your own manually, you can do this with the 'use' and 'close' commands, combined with the 'read' command.

'Use' tells the system to bring a certain dataset into memory. 'Close' wipes it from memory (useful for keeping things running smoothly).

'Read' takes that dataset in memory and saves it into a matrix.


Page 41: Stat 342 - Wk 7

The general syntax is

use <DATASET>;

read all var <VARIABLE NAMES> into <MATRIX NAME>;

close <DATASET>;
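A minimal sketch of the pattern above, assuming the SASHELP.Class sample dataset that ships with SAS is available. It reads two named variables into a matrix called hw (a name chosen for this example) and counts the rows.

```sas
proc iml;

use sashelp.class;

/* Variable names are listed as a character vector */
read all var {"Height" "Weight"} into hw;

close sashelp.class;

/* One row per observation, one column per variable read */
nobs = nrow(hw);

print nobs, hw;

quit;
```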


Page 42: Stat 342 - Wk 7

Use (and close) use the same syntax for specifying a dataset as other SAS procedures. The dataset is specified by libname.dataset, and additional options like (obs = ) can be used to specify which rows to load into memory.

use ds1;

use work.ds1;

use somelib.cars;

use somelib.cars(OBS = 10);


Page 43: Stat 342 - Wk 7

Close is the same, except that the options are not necessary in the close statement.

close ds1;

close work.ds1;

close somelib.cars;

close somelib.cars;


Page 44: Stat 342 - Wk 7

With the read command, references to entire classes of variable names still work, including _ALL_ , _NUM_ , and _CHAR_ for all variables, numeric ones, and character ones, respectively.

proc iml;

use SASHELP.Cars(OBS=5);

read all var _NUM_ into m1;

read all var _NUM_ into m2[colname=NumericNames];

close SASHELP.Cars;

print m1, m2;

quit;
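The colname= option on read stores the variable names in a vector, which can then be handed to print so the columns are labelled. A short sketch, with the matrix name m chosen for this example:

```sas
proc iml;

use sashelp.cars(obs=5);

/* NumericNames receives the names of the numeric variables read */
read all var _NUM_ into m[colname=NumericNames];

close sashelp.cars;

/* Reuse the saved names as column headers in the output */
print m[colname=NumericNames];

quit;
```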


Page 45: Stat 342 - Wk 7

Matrices are saved as SAS datasets with the create command and the append command.

'Create' makes a new SAS data set of whatever library and name you specify. However, the create command alone only makes a blank dataset (with whatever formatting you specify).

'Append' takes data from some matrix in the IML environment and adds (appends) it to a particular dataset, such as the one you just made with 'create'.


Page 46: Stat 342 - Wk 7

The general syntax is...

create <DATASET> from <MATRIX>;

append from <MATRIX>;

close <DATASET>;

Like when loading, it's a good idea to close your datasets after you're done with them.
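There is also a variant of this syntax that writes separate vectors as separate variables, using 'var' in place of 'from'. It is sketched here with made-up vectors x and y and a made-up dataset name xyDS.

```sas
proc iml;

x = {1, 2, 3};
y = {10, 20, 30};

/* Each named vector becomes one variable in the new dataset */
create work.xyDS var {"x" "y"};

/* With the var form, append takes no 'from' clause */
append;

close work.xyDS;

quit;

proc print data=work.xyDS; run;
```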


Page 47: Stat 342 - Wk 7

Example of saving data, and seeing it with a proc print:

proc iml;

m = {1 2 3 . , 5 6 7 999};

m = t(m);

create newDS from m[colname={"x","y"}];

append from m;

close work.newDS;

quit;

proc print data=newDS; run;


Page 48: Stat 342 - Wk 7

Readings on proc iml

Textbook source: Pages 62-68 of 'SAS and R'.

Additional source for interest:

SAS Whitepaper 144-2013 - Getting Started with the SAS/IML Language, by Rick Wicklin

https://support.sas.com/resources/papers/proceedings13/144-2013.pdf
