Are the Folded F-test and Bartlett’s equivalent?

I can think of two ways to solve this problem: (1) Look up what they actually are. (2) Simulate under many conditions and look closely at the results.

Looking at a UCLA website at http://www.ats.ucla.edu/stat/sas/output/ttest.htm, the folded F test is defined by

$$F' = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}$$

where $s_i^2$ refers to the sample variance for group $i$. For $k$ groups, the Bartlett test statistic is (http://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm)

$$T = \frac{(N-k)\log s_p^2 - \sum_{i=1}^{k}(N_i-1)\log s_i^2}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_i-1} - \frac{1}{N-k}\right)}$$

SAS Programming, November 20, 2014


As test statistics they look quite different, but it’s more difficult to tell whether they will generate different p-values because they are compared to different reference distributions. Bartlett’s test is compared to a $\chi^2$ distribution with $k-1$ degrees of freedom. The Folded F test is based on an F distribution. It is called folded because the larger variance is always placed in the numerator, thus making the F statistic constrained to be greater than or equal to 1.

Why are the p-values so close, you ask?



The F and $\chi^2$ distributions are related: the F distribution has numerator and denominator degrees of freedom, $m$ and $n$, and $m F_{m,n} \xrightarrow{d} \chi^2_m$ as $n \to \infty$. The arrow means “convergence in distribution”, something slightly different from the usual limits you study in Calculus, but something you’ll study in Casella and Berger. Basically this means that $P(F_{m,n} > x) \approx P(\chi^2_m > mx)$ for large $n$, which also means that they generate very similar p-values, but they aren’t equivalent for finite $n$, even if they are close.

If you are graphing, they might appear to be identical due to limited resolution, so it is good to look at the numbers themselves to see how close they are.
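To look at the numbers themselves, here is a minimal sketch in PROC IML that compares the upper-tail probability of an F distribution with its chi-square limit. The values of m and x and the list of denominator degrees of freedom are arbitrary choices for illustration, not taken from the slides.

```sas
proc iml;
m = 2;                              /* numerator df (arbitrary) */
x = 3;                              /* cutoff for the F statistic (arbitrary) */
n = {5, 20, 100, 1000, 100000};     /* denominator df to try */
pF = j(nrow(n), 1, .);
do i = 1 to nrow(n);
   pF[i] = 1 - probf(x, m, n[i]);   /* P(F(m,n) > x) */
end;
pChi = 1 - probchi(m*x, m);         /* P(chi-square(m) > m*x), the limit as n grows */
print n pF, pChi;
quit;
```

The pF column should approach pChi as n grows, without ever matching it exactly.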



How to investigate this with simulation? We might guess that as the sample size increases, the tests are more likely to be similar, so we’ll try with a small sample size. Unequal sample sizes might also have an effect, so I’ll try three observations in one group and 10 in the other.

Both definitions look like they easily generalize to more than two groups; for example, you could use

$$\frac{\max(s_1^2, \ldots, s_k^2)}{\min(s_1^2, \ldots, s_k^2)}$$

as a test statistic. If all variances are the same, this should still be close to 1. The distribution is now not clear, so you could either use theory to derive the distribution (might be hard!) or simulate the distribution.



Let’s simulate to see how different the p-values are for the Folded F versus Bartlett’s.
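One way such a simulation could be set up is sketched below in PROC IML. This is not the code from the original slides. It assumes normal data with equal variances, group sizes 3 and 10, 1000 iterations, an arbitrary seed, and a two-sided folded F p-value computed by doubling the upper tail (capped at 1). All variable names are made up for the illustration.

```sas
proc iml;
call randseed(579);                    /* arbitrary seed */
n1 = 3; n2 = 10; nsim = 1000;
k = 2; ntot = n1 + n2;
pF = j(nsim, 1, .);                    /* folded F p-values */
pB = j(nsim, 1, .);                    /* Bartlett p-values */
do i = 1 to nsim;
   x1 = j(n1, 1, .);  x2 = j(n2, 1, .);
   call randgen(x1, "Normal", 0, 1);   /* both groups share the same variance */
   call randgen(x2, "Normal", 0, 1);
   v1 = var(x1);  v2 = var(x2);
   /* folded F: larger sample variance goes in the numerator */
   if v1 >= v2 then do; f = v1/v2; df1 = n1-1; df2 = n2-1; end;
   else do; f = v2/v1; df1 = n2-1; df2 = n1-1; end;
   pF[i] = min(1, 2*(1 - probf(f, df1, df2)));
   /* Bartlett statistic for k = 2 groups */
   sp2 = ((n1-1)*v1 + (n2-1)*v2) / (ntot-k);          /* pooled variance */
   num = (ntot-k)*log(sp2) - ((n1-1)*log(v1) + (n2-1)*log(v2));
   den = 1 + (1/(3*(k-1))) * (1/(n1-1) + 1/(n2-1) - 1/(ntot-k));
   pB[i] = 1 - probchi(num/den, k-1);
end;
rejF = sum(pF < 0.05) / nsim;          /* estimated type 1 error, folded F */
rejB = sum(pB < 0.05) / nsim;          /* estimated type 1 error, Bartlett */
maxdiff = max(abs(pF - pB));           /* largest disagreement in p-values */
print rejF rejB maxdiff;
quit;
```

Printing the largest difference in p-values, and not only the rejection rates, gives more resolution on how far apart the two tests are.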



So the type 1 error rate was different between the two tests. Note that although the Folded F was slightly elevated above 0.05, this was not significant, since the lower confidence limit .056 − 1.96 ∗ sqrt(.05 ∗ .95/1000) = .042 is below .05.

I also tried equal sample sizes of 10 per group, and there the p-values disagree only in the 4th decimal place. In 1000 iterations, it wouldn’t be surprising if they always agreed on whether or not to reject the null. Looking at the actual p-values instead of the power gives you more resolution for what is going on.



If I use 3 observations in each group, then the tests appear to be closer, but still not identical.


HW hints: How to plot two identical curves on top of each other?

Metallica recurrence data.


HW hints

Rolling Stones recurrence data.


HW hints

Megadeth recurrence data (much easier colors to read...)


HW hints: how to plot two nearly identical curves on top of each other


PROC IML! PROC IML! PROC IML!

SAS/IML is the Interactive Matrix Language. It is probably the last topic of programming to cover in the course. After this, we will look at other topics using more statistical procedures from SAS.

SAS/IML, like macros, can be used To Boldly Go Where No SAS Proc Has Gone Before...that is, it is partly used to give SAS programmers more flexibility and to implement new techniques that haven’t found their way into SAS procedures yet.


PROC IML! PROC IML! PROC IML!

Another reason to learn SAS/IML and to program with matrices is to improve your knowledge of statistics. This can be done with R or MATLAB just as easily as (or perhaps more easily than) with SAS. It is often too difficult to work out multiple regression estimates by hand, for instance, but instead of just relying on a canned procedure to get the answers for you, you can implement the matrices involved yourself and see if you can duplicate the results from programs like SAS or R or STATA. This can give you a better appreciation for the methods involved and deeper knowledge of what’s going on under the hood of these programs.


PROC IML basics

We’ll start by just doing some of the basics of how to use PROC IML. Despite the “interactive” title, it works pretty much like other SAS procedures, with typing in the Code editor and viewing the results in the Results viewer.

The syntax for entering vectors and matrices is more similar to MATLAB than R. Here are some basic examples that show assigning values, vectors, and matrices and displaying their properties.

Something that might be annoying is that you have to tell it PROC IML; reset log print; each time you run it. If you just highlight your latest bit of code and run it, it will not know that you are using IML or know the previously defined variables.
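A small sketch of the kind of assignments described above (the values and names are arbitrary):

```sas
proc iml;
reset log print;          /* show each result as it is assigned */
a = 5;                    /* scalar */
v = {1 2 3};              /* 1x3 row vector: spaces separate columns */
w = {1, 2, 3};            /* 3x1 column vector: commas separate rows */
m = {1 2 3, 4 5 6};       /* 2x3 matrix */
nr = nrow(m);             /* number of rows of m */
nc = ncol(m);             /* number of columns of m */
quit;
```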


PROC IML basics

You can perform operations on matrices such as concatenation either horizontally or vertically....
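For instance, a minimal sketch with two arbitrary 2x2 matrices, using || for horizontal and // for vertical concatenation:

```sas
proc iml;
a = {1 2, 3 4};
b = {5 6, 7 8};
h = a || b;      /* horizontal concatenation: a 2x4 matrix */
v = a // b;      /* vertical concatenation: a 4x2 matrix */
print h, v;
quit;
```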


PROC IML basics

The transpose of a vector or matrix is handled by the backquote character (`), sometimes described as the left single-quote, which looks a little weird. Some references say you can use the transpose function t(), which is how R does it, but it has generated errors for me.
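A quick way to check both forms on your own installation is sketched below. The T function is documented in SAS/IML, so if it fails, the problem may be specific to the release or environment.

```sas
proc iml;
a = {1 2 3, 4 5 6};        /* 2x3 matrix */
at1 = a`;                  /* transpose with the backquote operator */
at2 = t(a);                /* transpose with the T function */
same = all(at1 = at2);     /* 1 if the two results agree element by element */
print at1, at2, same;
quit;
```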


PROC IML basics

You can create arrays of numbers similarly to R, and even arrays of strings that have consecutive suffixes. Try the following in the Code window and then look at the Results viewer.

proc iml;
reset log print;
index = 1:10;
rows = (1:3)`;           /* parentheses needed: the transpose is applied before the colon */
rindex = 10:1;
series = do(0,100,20);   /* 0, 20, 40, ..., 100 */
strings = "x1":"x8";     /* creates a vector of the strings "x1", "x2", ..., "x8" */


PROC IML basics

Try these also to set up special matrices used often in statistics.

proc iml;
reset log print;
b=j(6,1); *6x1 matrix of ones;
c=j(2,3); *2x3 matrix of ones;
a = I(6); *6x6 identity matrix;
d = diag( {1 2 4} ); * diagonal matrix;
b = 1:3;
d2 = diag(b);
d3 = diag(1:5);


PROC IML basics

Some other functions and operators that can be useful:

B = A##2;            /* squares A elementwise */
C = A <> B;          /* element-wise maximum */
C = A < B;           /* element-wise 0-1 test */
B = sqrt(A);         /* element-wise square root */
D = block(A,B,C);    /* creates a block diagonal matrix using A, B, and C as matrices */
c = NCOL(A);         /* number of columns of A */
d = NROW(A);         /* number of rows of A */
E = exp(A);          /* element-wise exponentiation */
F = log(A);          /* element-wise log */
d = det(A);          /* determinant */
B = inv(A);          /* inverse of A */
v = vecdiag(A);      /* v is a vector with the diagonal entries of A */


PROC IML basics

You can extract individual elements or submatrices from a matrix:

d1_12 = d1[1,2];       /* extract first row, second column */
d1_ = d1[1,];          /* extract first row */
d_2 = d1[,2];          /* extract second column */
d = d1[1:2,{1 3}];     /* extract a 2x2 submatrix */
mycols = {1 3};
d = d1[1:2,mycols];    /* alternative gives the same results */

Remember that to run this code you also need to run proc iml; reset log print; each time.


PROC IML basics

To get column sums, row sums, and column sums of squares, there is special notation:

csum = d1[+,];       /* column sums: a row vector */
rsum = d1[,+];       /* row sums: a column vector */
rsum2 = d1[,+]`;     /* row sums transposed into a row vector */
css = d1[##,];       /* column sums of squares: a row vector */
a = d1[+,+];         /* sum of all elements in the matrix */
b = sum(d1);         /* a and b are the same but b is more efficient */


PROC IML: searching matrices

Often we want an index of where certain values occur. For example, we might want to know which values in a vector or a matrix are less than 0, or where the maximum value occurs. This can be done using the LOC function. The LOC function is easier to understand with vectors. Here it returns a vector of indices of the vector satisfying the condition.

For example, with d_2 = {-4 0 -4}, trying

m = max(d_2);
negvalue = loc(d_2 < 0);

stores the maximum, 0, in m and the positions of the negative values, {1 3}, in negvalue.

PROC IML: searching matrices

If you apply the LOC function to an entire r × c matrix, it counts the cells from 1 to r × c and gives a single number indexing the cells rather than the row and column number, and counts cells row-wise. Thus, for a 2 × 3 matrix like d1, d1[5] extracts element (2,2).
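The row-wise counting can be checked with a small sketch. The matrix below is made up, and the conversion from a single index back to a row and column number is done here with plain arithmetic.

```sas
proc iml;
v = {-4 0 -4};
neg = loc(v < 0);               /* positions 1 and 3 */
m = {3 -1 2, 0 5 -7};           /* 2x3 matrix with negatives in cells (1,2) and (2,3) */
idx = loc(m < 0);               /* single indices counted row-wise: 2 and 6 */
nc = ncol(m);
row = ceil(idx / nc);           /* row number of each index */
col = idx - (row - 1) * nc;     /* column number of each index */
print neg, idx, row col;
quit;
```

If no element satisfies the condition, LOC returns an empty matrix, so it is worth checking ncol(idx) > 0 before using the result as a subscript.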


PROC IML: missing values

Matrices can be defined with missing values, which will sometimes appropriately create missing values when operated on, and other times missing values are ignored.
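A minimal sketch of both behaviors, using an arbitrary vector with one missing value:

```sas
proc iml;
x = {1 . 3};
y = x + 1;             /* element-wise arithmetic propagates the missing value: 2 . 4 */
s = sum(x);            /* SUM skips the missing value: 4 */
nmiss = sum(x = .);    /* number of missing entries: 1 */
print y s nmiss;
quit;
```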


PROC IML: formatting and labels

You can format values to have dollar signs, commas or any other format you like and can also label rows and columns.
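As a sketch, the PRINT statement accepts options in square brackets after the matrix name. The numbers and labels below are invented for illustration.

```sas
proc iml;
sales = {1234.5 20000, 315 4100.25};    /* made-up values */
r = {"North" "South"};                  /* row labels */
c = {"Q1" "Q2"};                        /* column labels */
print sales[rowname=r colname=c format=dollar12.2 label="Quarterly sales"];
quit;
```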


PROC IML: Reading data to and from SAS data sets

Typically, you’ll want to read in data from a SAS data set instead of starting from scratch and entering data into your matrix. You also might want to manipulate data using PROC IML then output it again to a SAS data set to use usual SAS PROCs.


PROC IML: Creating a SAS data set

Note that I’ve changed the variable name from temperature to temp and also changed the 5th observation from age 73 to age 67. I needed the three statements, CREATE, APPEND, and CLOSE to create the data set. If you don’t CLOSE, then the data set is created but is still empty.
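The three statements fit together roughly as in the sketch below. The data values and the data set name mytemps are made up and are not the ones from the slide.

```sas
proc iml;
age  = {25, 31, 40, 47, 67};              /* made-up values */
temp = {98.6, 98.4, 98.2, 98.3, 98.0};
create mytemps var {age temp};   /* open a new data set with two variables */
append;                          /* write the current values of age and temp */
close mytemps;                   /* close the data set so the rows are saved */
quit;

proc print data=mytemps;
run;
```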

Instead of creating a new data set, you can instead edit an existing data set using the EDIT statement. However, you can only input one data set at a time into PROC IML.


PROC IML: more on reading in data

You can choose to read in less than all of the data in a data set. The syntax for the READ statement is

READ <range> <VAR operand> <WHERE(expression)> <INTO name <[ROWNAME=row-name COLNAME=column-name]>>;

where items within angled brackets are options. Typically for the range you put all if you want to read in all of the data. You can also read in the next n observations using NEXT n, or a list of specific observations using point {3,10,11} (for example, to read in the 3rd, 10th, and 11th observations only).


PROC IML: more on reading in data

VAR allows you to specify a list of which variables to read in if you don’t want to read them all in. The default is to only read in numeric variables, however you can use CHARACTER to specify that you only want to read in character variables.

The WHERE option works similarly to WHERE statements in data steps and PROCs. A common example might be to only read in observations without missing data, e.g. where(var1 ^= .).

An alternative to using the READ statement to control what is read in is to create a new data set using data steps to first create the subset of the data you want to manipulate in PROC IML, although this could be less efficient than going directly through PROC IML for large data sets.
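The options can be tried on a small made-up data set, as in this sketch (the data set temps and its values are hypothetical):

```sas
data temps;                       /* hypothetical example data */
   input age temp sex $;
   datalines;
25 98.6 F
31 98.4 M
40 .    F
47 98.3 M
52 98.0 F
;
run;

proc iml;
use temps;                                          /* open the data set for reading */
read all var {age temp} into m1;                    /* all observations, two numeric variables */
read all var {age temp} where(temp ^= .) into m2;   /* only rows where temp is not missing */
read point {1 4} var {age} into m3;                 /* observations 1 and 4 only */
read all var {sex} into s;                          /* a character variable */
close temps;
print m1, m2, m3, s;
quit;
```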


PROC IML: logical expressions, loops

Logical expressions using IF THEN, ELSE, and loops using DO follow the same syntax as in data steps. They allow you to loop through a matrix in a way that can be a bit easier than looping through a data set in a data step. If your loop is indexing observation (row) i, then you can examine row i − 1 without having to use lag functions and retain statements. You also have access to observation i + 1.
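For example, a sketch that flags the rows where a value increased relative to the previous row (the data are arbitrary):

```sas
proc iml;
x = {3, 7, 4, 9, 6};          /* arbitrary values */
n = nrow(x);
up = j(n, 1, 0);              /* up[i] = 1 if x[i] is larger than x[i-1] */
do i = 2 to n;
   if x[i] > x[i-1] then up[i] = 1;
   else up[i] = 0;
end;
print x up;
quit;
```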


PROC IML: simple linear regression

As an exercise, let’s use PROC IML to find the regression coefficients when we do a simple linear regression of temperature on age. For now, we’ll ignore the sex of the individuals. We want to use the model

$$Y = X\beta + e, \quad \text{or} \quad y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \ldots, 130$$

First, we’ll read in the data, separate the data into the observations $Y$ and the predictor, and create the design matrix. The design matrix will have a column of 1s and a column for the predictor. Note that $X$ is 130 × 2 and the vector of coefficients in the regression model is $\beta = (\beta_0\ \beta_1)'$, which is a 2 × 1 matrix (or column vector). If we set $Y = X\beta + e$, where we use these matrix expressions, then the $i$th row of $Y$ is $y_i = x_{i1}\beta_0 + x_{i2}\beta_1 + e_i$.



The idea is to solve for $\hat\beta$ using the equation $Y = X\hat\beta$. To do this we first multiply both sides by the transpose of $X$, so

$$\begin{aligned}
X'Y &= X'X\hat\beta \\
\Rightarrow (X'X)^{-1}X'Y &= (X'X)^{-1}X'X\hat\beta \\
\Rightarrow \hat\beta &= (X'X)^{-1}X'Y
\end{aligned}$$

Now we can use PROC IML to do these matrix calculations and see if they result in $\hat\beta$ that matches the output of SAS procedures.
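A minimal sketch of these calculations is below. It uses six made-up observations in place of the 130 in the temperature data, so the numbers will not match the course output.

```sas
proc iml;
age  = {25, 31, 40, 47, 52, 60};                /* made-up predictor values */
temp = {98.6, 98.4, 98.2, 98.3, 98.0, 97.9};    /* made-up responses */
n = nrow(age);
X = j(n, 1, 1) || age;                 /* design matrix: column of 1s, then the predictor */
xpx = X` * X;                          /* the 2x2 matrix X'X */
betahat = inv(xpx) * X` * temp;        /* (X'X)^{-1} X'Y, as in the derivation */
betahat2 = solve(xpx, X` * temp);      /* same estimate without forming the inverse */
print xpx, betahat betahat2;
quit;
```

Using SOLVE avoids computing the inverse explicitly, which is usually the more stable choice numerically; both columns should agree here.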



output from PROC IML and PROC REG



How do you interpret the matrix $X'X$? Think about how to multiply this. The matrix is $p \times p$ where $p$ is the number of parameters ($\beta$ terms), so it is 2 × 2 in this case. The (1,1) entry is

$$1\cdot 1 + 1\cdot 1 + \cdots + 1\cdot 1 \quad (130 \text{ times}),$$

so it is 130. You can also think of this as the sample size. The (1,2) entry is

$$1\cdot x_1 + 1\cdot x_2 + \cdots + 1\cdot x_{130} = \sum_{i=1}^{130} x_i$$

The (2,1) entry is

$$x_1\cdot 1 + x_2\cdot 1 + \cdots + x_{130}\cdot 1 = \sum_{i=1}^{130} x_i$$

The (2,2) entry is

$$x_1\cdot x_1 + x_2\cdot x_2 + \cdots + x_{130}\cdot x_{130} = \sum_{i=1}^{130} x_i^2$$


PROC REG

You can also output the $X'X$ matrix from PROC REG using

model temperature = age / xpx;

which is an abbreviation for “x prime x”. Of course, PROC REG outputs much more than this, and you can use PROC IML to see if you can reproduce residuals, fitted values, F statistics, and so on.


Getting other quantities for regression

Other quantities of interest for a linear model include the hat matrix, $X(X'X)^{-1}X'$, the predicted values, $X'X$, etc., all of which can easily be obtained from PROC IML. Some of these can also be obtained from PROC REG. If a new diagnostic test is developed that is not implemented in PROC REG or PROC GLM, then it would be beneficial to have access to these matrices directly from PROC IML.
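A sketch of how these quantities might be computed, again with six made-up observations and not the course data:

```sas
proc iml;
age  = {25, 31, 40, 47, 52, 60};                /* made-up predictor values */
temp = {98.6, 98.4, 98.2, 98.3, 98.0, 97.9};    /* made-up responses */
n = nrow(age);
X = j(n, 1, 1) || age;             /* design matrix */
p = ncol(X);
H = X * inv(X` * X) * X`;          /* hat matrix, n x n */
yhat = H * temp;                   /* predicted (fitted) values */
resid = temp - yhat;               /* residuals */
leverage = vecdiag(H);             /* diagonal of the hat matrix */
mse = ssq(resid) / (n - p);        /* residual mean square */
print yhat resid leverage, mse;
quit;
```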


Comparing shapes

An example where you might need something like PROC IML is if you have to rotate your data, which can be accomplished through matrix multiplication of your original data. This comes up in statistics when you want to compare two sets of points (usually either in 2D or 3D, but could be higher-dimensional), and the points are not oriented the same way or scaled the same way.

This can come up particularly in shape analysis, where you want to determine whether two shapes are roughly equivalent, or you want to compare two photographs taken from slightly different positions. In addition to statistical testing, sometimes you just want to best visualize the difference between two sets of points, and this is best accomplished by lining up the points as nearly as possible.


Comparing shapes

As an example, consider two photographs of hands.


Comparing shapes

We have reference points on the hands, and we want to line up the reference points as closely as possible by rotating the images, rescaling if necessary (suppose you have photos that are cropped or zoomed in and still want to compare the shapes). In general we might also allow mirror images. In this case, we assume the points are two-dimensional so that they each have just an x and y coordinate. This would typically be the case for analyzing photographic images, although in general you can imagine having three-dimensional data as well.


Comparing shapes

Applications for this are widespread. In medical imaging, you might take an x-ray of a patient over time to compare how their spine is changing with osteoporosis. The x-rays won’t be taken at identical distances, angles and so forth, so you need to align the images by stretching and rotating.

Other examples include MRI scans of the brain, where you might want to either compare the same individual at different time points, the left versus the right hemisphere to look for asymmetries, or two separate individuals to see how closely aligned two brains are. Here we want to ignore the fact that one brain might be slightly larger than the other.

If eyes or fingerprints are used for ID, again it will be easiest to compare two images by rotation and rescaling.


Comparing shapes

If you have satellite images of regions on earth, you might want to measure things like habitat loss. Successive photos of the same region won’t be exactly the same, so you might try to align two photos using certain geographical reference points. Once the photos are aligned, you can use differences in the area that is green as a measure of vegetation loss, for example.

“The name Procrustes refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off.” (Wikipedia)


Procrustes illustrations

You can find some interesting illustrations online....


Comparing shapes

Back to the hand example. Here we have two sets of coordinates. We might call them

$$X = \{(x_{11}, x_{12}), (x_{21}, x_{22}), \ldots, (x_{n1}, x_{n2})\}$$

and

$$Y = \{(y_{11}, y_{12}), \ldots, (y_{n1}, y_{n2})\}$$

How should we align the points? If we use the distances between corresponding points, we can minimize the distance between points over all possible angles of rotation, rescalings, and reflections. To deal with reflections, it might help to center the points so that $(\bar{x}_1, \bar{x}_2) = (0, 0)$ and $(\bar{y}_1, \bar{y}_2) = (0, 0)$. For many problems (like with satellite photographs or the same patient over time), reflections won’t matter.


Comparing shapes

The distance between two individual points $x_i = (x_{i1}, x_{i2})$ and $y_i = (y_{i1}, y_{i2})$ is naturally defined as

$$d(x_i, y_i) = \sqrt{(x_{i1} - y_{i1})^2 + (x_{i2} - y_{i2})^2}$$

This is the Euclidean distance between two points in the plane. We might define the overall squared distance from the set of points $X$ to the set of points $Y$ as

$$d^2(X, Y) = \sum_{i=1}^{n} d^2(x_i, y_i)$$


Comparing shapes

We can then minimize the sum of squared distances between points (this is equivalent to minimizing the distances; why?). This is fairly similar as a criterion to what we do in regression, so hopefully doesn’t seem too weird. In other words, we want to minimize the sum over $i$ of

$$d^2(x_i, y'_i) = (x_{i1} - y'_{i1})^2 + (x_{i2} - y'_{i2})^2$$

over all choices of $\theta$ and $c$. To do this, we need to write $y'_i$ as a function of $y_i$, $\theta$, and a scaling factor $c$.


Example

[Figure: plot with both axes running from −3 to 3.]

Example

Lines connecting corresponding points.

[Figure: plot with both axes running from −3 to 3.]

Example

Rotating π/8 radians = 22.5 degrees clockwise, we get...

[Figure: plot with both axes running from −3 to 3.]

Example

How the squares were shifted by π/8 radians = 22.5 degrees

[Figure: plot with both axes running from −3 to 3.]

Example

Shifting by another π/8 radians (45 degrees total), we get...

[Figure: plot with both axes running from −3 to 3.]

Rotating your data

To rotate 2-dimensional data, you can use a rotation matrix. If a set of points is in an n × 2 matrix, then we need a 2 × 2 matrix to multiply this matrix. The rotation matrix (from linear algebra) is

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
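As a sketch, the rotation can be applied in PROC IML as follows. The points (corners of a square) and the angle are arbitrary. Because the points are stored as rows of an n × 2 matrix, the matrix is multiplied on the right by the transpose of R.

```sas
proc iml;
theta = constant("pi") / 8;              /* 22.5 degrees; positive angles rotate counterclockwise */
R = (cos(theta) || -sin(theta)) //
    (sin(theta) ||  cos(theta));         /* 2x2 rotation matrix */
pts = {1 1, -1 1, -1 -1, 1 -1};          /* corners of a square, one point per row */
rotated = pts * R`;                      /* each row p is replaced by (R * p`)` */
print R, rotated;
quit;
```

Using a negative theta gives a clockwise rotation.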


Minimizing squared distances

Let $X$ be the $n \times 2$ matrix for the first data set and $Y$ the $n \times 2$ matrix for the second data set. To simplify the problem, we’ll consider only doing optimal rotations without worrying about scaling. We can think of rotating the observations $Y$ to match $X$ using a rotation matrix $R$. Thus

$$X' = RY'$$

This usually can’t be solved exactly, so we want to find $R$ for which

$$X' - RY' \approx 0$$

To minimize the sum of squared distances, we minimize

$$\mathrm{tr}[(X' - RY')'(X' - RY')]$$

where tr is the trace, or sum of the diagonals. After some matrix algebra, this expression equals $\mathrm{tr}(X'X) + \mathrm{tr}(Y'Y) - 2\,\mathrm{tr}(RY'X)$, and the first two terms do not depend on $R$, so minimizing it is equivalent to maximizing

$$\mathrm{tr}(RY'X)$$


Minimizing squared distances

A technique for solving this problem involves the singular value decomposition, which again comes from linear algebra and which we won’t review, but which can be used in SAS, Matlab, and other matrix-oriented languages. The matrix $Y'X$ can be decomposed into $UDV'$, where $D$ is diagonal and $U$ and $V$ are orthogonal. The solution is

$$R = VU'$$

which minimizes the sum of the squared distances. If you have software which can do the singular value decomposition, then you can use it to get the optimal rotation.
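The recipe can be sketched in PROC IML as follows. This is not the code from the original slide. The points are made up, and Y is built as a rotated and shifted copy of X so that the recovered R can be compared with the known rotation R0.

```sas
proc iml;
X = {0 0, 2 0, 2 1, 0 1, 1 3};           /* made-up reference points, one per row */
theta = constant("pi") / 2;              /* known rotation used to build Y */
R0 = (cos(theta) || -sin(theta)) // (sin(theta) || cos(theta));
Y = X * R0 + {5 -2};                     /* rotated and shifted copy of X */

Xc = X - X[:,];                          /* center each data set at its column means */
Yc = Y - Y[:,];
call svd(U, D, V, Yc` * Xc);             /* Y'X = U * diag(D) * V' */
R = V * U`;                              /* optimal rotation */
Yrot = Yc * R`;                          /* rotated Y, to be compared with Xc */
ss = ssq(Xc - Yrot);                     /* remaining sum of squared distances */
print R0, R, ss;
quit;
```

In this constructed example R should reproduce R0 up to rounding and ss should be essentially zero. With real data, the SVD solution can also return a reflection (determinant −1), which is worth checking if reflections are not wanted.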


Using PROC IML for Procrustes Rotation



In this case, the rotation matrix has values near 1 and -1 for sin θ and −sin θ for the (1,2) and (2,1) entries, respectively, suggesting that the optimal rotation is near 90 degrees or π/2 radians. This means 90 degrees counterclockwise, and if you look at the photos, moving the right-hand photo 90 degrees counterclockwise will indeed line up the wrists (from top left to bottom left) and the rest of the hand. Since these photos look identical, the slight discrepancy might be due to truncating measurements in the positions of the points.



Most of the work of the Procrustes rotation in terms of code was for centering the data. Once the data was centered, there were only three lines of code needed. It is possible to do the optimal rotation without using matrices, but this would involve more work in terms of coding.

For the centering itself, this could have been done outside of PROC IML, and PROC IML could have read in a SAS dataset with X and Y already centered. If you had the original data in a SAS data set, how would you center the data using data step programming?


Centering the data

Often if something is very tedious, there might be a procedure to help you out. Googling “z-score SAS” quickly reveals PROC STANDARD, which can center your data. If you have a data set called temperature, you can use something like

proc standard data=temperature mean=0 std=1 out=ztemp;
var degrees;
run;

Here mean=0 centers the variable and std=1 also rescales it to unit standard deviation; omit std=1 to center without rescaling.


Minimizing squared distances using calculus

To minimize this distance, we can use calculus, but in this case, we need to minimize over rotations (angles) and rescalings. It helps to think of one data set as fixed, say $X$, and we rotate and rescale $Y$ to match $X$ as closely as possible. If we rotate the $Y$ data by an angle $\theta$ and stretch their values by $c$, then

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} c\,y_{i1} \\ c\,y_{i2} \end{bmatrix} = c \begin{bmatrix} y_{i1}\cos\theta - y_{i2}\sin\theta \\ y_{i1}\sin\theta + y_{i2}\cos\theta \end{bmatrix}$$

So we use $y'_{i1} = c(y_{i1}\cos\theta - y_{i2}\sin\theta)$ and $y'_{i2} = c(y_{i1}\sin\theta + y_{i2}\cos\theta)$ and plug these values into

$$d^2(X, Y) = \sum_{i=1}^{n} (x_{i1} - y'_{i1})^2 + (x_{i2} - y'_{i2})^2$$

Then the idea is to minimize with respect to $\theta$ and $c$. This can be done by taking partial derivatives with respect to $\theta$ and $c$ and using the usual calculus techniques of optimization.


The calculus approach

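I’ll give the formulas in terms of summations for the optimal values for $\theta$ and $c$. This will be equivalent whether you use the calculus approach or the matrix approach. The optimal rotation angle is

$$\theta = \arctan\frac{D}{B} + k\pi$$

for integer $k$, and

$$c = \frac{B}{A}\cos\theta + \frac{D}{A}\sin\theta$$

where $k \in \{0, 1\}$ should be chosen to let $c > 0$. Here

$$A = \sum_{i=1}^{n} \left(y_{i1}^2 + y_{i2}^2\right), \qquad B = \sum_{i=1}^{n} \left(x_{i1}y_{i1} + x_{i2}y_{i2}\right), \qquad D = \sum_{i=1}^{n} \left(x_{i2}y_{i1} - x_{i1}y_{i2}\right)$$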
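A short derivation sketch shows where these sums come from. It assumes the rotation convention used above, with $y'_{i1} = c(y_{i1}\cos\theta - y_{i2}\sin\theta)$ and $y'_{i2} = c(y_{i1}\sin\theta + y_{i2}\cos\theta)$, and writes $D$ for the cross-product sum that appears in the formula for $\theta$.

```latex
\begin{aligned}
d^2(X,Y) &= \sum_{i=1}^{n}\left[(x_{i1}-y'_{i1})^2 + (x_{i2}-y'_{i2})^2\right] \\
         &= \sum_{i=1}^{n}\left(x_{i1}^2 + x_{i2}^2\right) + c^2 A - 2c\left(B\cos\theta + D\sin\theta\right), \\
A &= \sum_{i=1}^{n}\left(y_{i1}^2 + y_{i2}^2\right), \quad
B = \sum_{i=1}^{n}\left(x_{i1}y_{i1} + x_{i2}y_{i2}\right), \quad
D = \sum_{i=1}^{n}\left(x_{i2}y_{i1} - x_{i1}y_{i2}\right), \\
\frac{\partial d^2}{\partial\theta} = 0 &\;\Rightarrow\; -B\sin\theta + D\cos\theta = 0 \;\Rightarrow\; \tan\theta = \frac{D}{B}, \\
\frac{\partial d^2}{\partial c} = 0 &\;\Rightarrow\; c = \frac{B\cos\theta + D\sin\theta}{A}.
\end{aligned}
```

The $c^2 A$ term appears because a rotation does not change the length of each $y_i$, so $y'^2_{i1} + y'^2_{i2} = c^2(y_{i1}^2 + y_{i2}^2)$.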



Part of the advantage of the matrix approach to Procrustes rotations is that it generalizes more easily than the calculus approach. In particular, you can compare three-dimensional (or higher dimensional) shapes or patterns in the data as well as two-dimensional, which makes the rotations more complicated.

We just illustrated rotations rather than scaling. Optimizing the scaling factor as well as rotations is sometimes called “extended Procrustes Analysis”. Generalized Procrustes also allows more than two data sets to be used at a time. For Generalized Procrustes, the “mean shape” is the set of points that minimizes the sum of the Procrustes distances to each of the input data sets.


Comparing data sets and Outlier detection

In addition to rotating your data, Procrustes rotations give you a way to quantify how different two shapes are, for example using the minimum sum of squared distances. You might or might not want to rescale (stretch) the data depending on the application. Thus, given three data sets, you can look at all pairwise distances to determine which two data sets are most similar.

The Procrustes rotation can also be used to look for outliers. Try removing one observation at a time and recomputing the squared distances each time. (This means you will do Procrustes rotations with n − 1 instead of n data points each time.) This gives you a measure of which observations have the biggest effect on one data set not being able to be rotated to match the other data set.
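The leave-one-out idea can be sketched with a small IML module. The module name ProcSS and the data are hypothetical: Y is a rotated copy of X in which point 4 has been deliberately moved, so that point plays the role of the outlier.

```sas
proc iml;
/* hypothetical helper: minimum sum of squared distances after the optimal rotation */
start ProcSS(X, Y);
   Xc = X - X[:,];                   /* center both sets of points */
   Yc = Y - Y[:,];
   call svd(U, D, V, Yc` * Xc);      /* Y'X = U * diag(D) * V' */
   R = V * U`;                       /* optimal rotation */
   return( ssq(Xc - Yc * R`) );
finish;

X = {0 0, 2 0, 2 1, 0 1, 1 3, 3 3};  /* made-up points */
theta = constant("pi") / 6;
R0 = (cos(theta) || -sin(theta)) // (sin(theta) || cos(theta));
Y = X * R0;                          /* exact rotated copy of X */
Y[4, ] = Y[4, ] + {1.5 -1};          /* move point 4 to create an outlier */

n = nrow(X);
ssAll = ProcSS(X, Y);                /* distance using all n points */
ssDrop = j(n, 1, .);                 /* distance with point i removed */
do i = 1 to n;
   idx = setdif(1:n, i);             /* all row numbers except i */
   ssDrop[i] = ProcSS(X[idx, ], Y[idx, ]);
end;
print ssAll, ssDrop;
quit;
```

The entry of ssDrop for point 4 should be essentially zero, since the remaining points are an exact rotation of each other, while dropping any other point leaves a clearly positive distance.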


Writing functions in PROC IML

