
AMcur (Fall 2006 course): Practicals with R, fast version

Albert Satorra

November 27, 2006

Contents

1 Introduction
2 Principal Component Analysis
    2.0.1 Reading the function ’leverage’ (file: leverage.txt)
3 Principal Coordinate Analysis
    3.1 Reading the function ’mds’ (file: mds.txt)
    3.2 We do MDS using the isoMDS function
    3.3 Inserting directions on the map
    3.4 The case of a similarity matrix
    3.5 with our mds function
4 Cluster Analysis
    4.1 segmentation of cars
5 Segmentation of a sample: ratings of cars
    5.1 hierarchical cluster analysis
    5.2 K-means cluster analysis
6 Correspondence Analysis
    6.1 A particular example: votes and neighbourhoods in the city of Barcelona
    6.2 Using our own program
7 Factor Analysis
8 Discriminant Analysis
    8.1 Quadratic discriminant analysis

1 Introduction

This document lives at the new address of the R folder. In this document (AAAA) we carry out multivariate analysis using ready-made R functions. This is the fast version of the computations; see the document AM1.Rnw for more detail. Everything is in

...Albert/A_A_A_Web/AnalisiMultivariant /SeaveAM/

2 Principal Component Analysis


Albert Satorra, Analisi Multivariant, tardor 2006 2

> library(MASS)

> root = "http://www.econ.upf.es/~satorra/dades/"

> data = read.table(paste(root, "DataDeure1.txt", sep = ""), header = T)

> ana = princomp(data[, -1], cor = T)

> biplot(ana, xlabs = data[, 1], ylabs = names(data[-1]), cex = c(0.6,

+ 0.4), arrow.len = 0)

[Plot: biplot of the first two principal components; axes Comp.1 and Comp.2. Observations: Austria, Belgica, Dinamar, Finland, France, Aleman, Greece, Irlanda, Italia, Luxemburg, Holanda, Noruega, Portugal, Espanya, Suecia, Suiza, UK, USA, Japo. Variables: CREIX, INF, ATUR, INT1, INT2.]

Figure 1: PC biplot


2.0.1 Reading the function ’leverage’ (file: leverage.txt)

The function leverage is applied to the data matrix X. Figure 2 shows the resulting leverage of each observation.
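The file leverage.txt is not reproduced in this document, so the following is only a minimal sketch of what such a function might compute. It assumes the usual definition of leverage as the diagonal of the hat matrix of X with an intercept; the name leverage_sketch and the use of the built-in USArrests data are illustrative assumptions, not the course's code or data.

```r
# Hypothetical stand-in for the course's 'leverage' function (leverage.txt is not shown).
# Leverage of observation i = i-th diagonal element of the hat matrix
# H = Z (Z'Z)^{-1} Z', where Z is X with a column of ones prepended.
leverage_sketch <- function(X) {
  Z <- cbind(1, as.matrix(X))            # add the intercept column
  H <- Z %*% solve(crossprod(Z), t(Z))   # hat (projection) matrix
  diag(H)
}

X <- as.matrix(USArrests)                # built-in data, used only for illustration
lev <- leverage_sketch(X)
n <- nrow(X)

# Index plot of the leverages, in the style of Figure 2
if (interactive()) plot(1:n, lev, xlab = "1:n", ylab = "lev")
```

The leverages always sum to the number of columns of Z, so observations with values well above the average (number of columns divided by n) are the ones worth inspecting.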

[Plot: leverage (lev, vertical axis, roughly 0.2 to 0.8) against the observation index 1:n (horizontal axis, 1 to 19).]

Figure 2: Leverage of each observation


3 Principal Coordinate Analysis

3.1 Reading the function ’mds’ (file: mds.txt)

3.2 We do MDS using the isoMDS function

> root = "http://www.econ.upf.es/~satorra/dades/"

[1] "http://www.econ.upf.es/~satorra/dades/"

> D = read.table("http://www.econ.upf.edu/~satorra/dades/Distciutats.txt",

+ header = T)

   Albac Alican Alme Avil Badaj Barcel Bilbao Burgos Cacer Cadiz Madrid
1      0    171  369  366   525    540    646    488   504   617    251
2    171      0  294  537   696    515    817    659   675   688    422
3    369    294    0  663   604    809    958    800   651   484    563
4    366    537  663    0   318    717    401    243   229   618    115
5    525    696  604  318     0   1022    694    538    89   342    401
6    540    515  809  717  1022      0    620    583   918  1284    621
7    646    817  958  401   694    620      0    158   605  1058    395
8    488    659  800  243   538    583    158      0   447   900    237
9    504    675  651  229    89    918    605    447     0   389    297
10   617    688  484  618   342   1284   1058    900   389     0    663
11   251    422  563  115   401    621    395    237   297   663      0

> library(MASS)

[1] "MASS"      "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"

> sol = isoMDS(as.matrix(D))

initial value 3.623227
final value 3.621119
converged
$points
         [,1]       [,2]
1    22.33494 -195.71261
2    27.88977 -392.83924
3  -231.49562 -401.25954
4    13.62934  178.27395
5  -302.16148  217.55384
6   614.37757 -274.12558
7   384.30438  338.52610
8   253.38963  236.66515
9  -204.09263  236.10710
10 -642.38814  -12.22566
11   64.21225   69.03649

$stress
[1] 3.621119

> she = Shepard(as.dist(D), sol$points)

$x
 [1] 89 115 158 171 229 237 243 251 294 297 318 342 366 369 389
[16] 395 401 401 422 447 484 488 504 515 525 537 538 540 563 583
[31] 604 605 617 618 620 621 646 651 659 663 663 675 688 694 696
[46] 717 800 809 817 900 918 958 1022 1058 1284

$y
 [1] 99.80843 120.38045 165.87443 197.20488 225.27212 252.75966
 [7] 246.76817 268.04066 259.52203 316.06976 318.22437 410.55182
[13] 374.08787 326.61828 503.75799 418.42995 403.83256 395.33164
[19] 463.30175 457.48260 565.84452 490.24156 487.58354 598.38187
[25] 525.43989 571.29120 555.87973 597.21274 555.53713 625.47539
[31] 622.83518 597.24425 689.58265 683.11714 654.42780 648.41506
[37] 645.31610 637.95545 668.67478 629.24186 711.25779 670.36507
[43] 770.80424 697.04352 693.91176 752.03974 801.28747 855.37389
[49] 813.58874 929.71202 964.48476 962.54477 1040.09253 1084.95360
[55] 1283.76462

$yf
 [1] 99.80843 120.38045 165.87443 197.20488 225.27212 249.76391
 [7] 249.76391 263.78134 263.78134 316.06976 318.22437 370.41932
[13] 370.41932 370.41932 430.33804 430.33804 430.33804 430.33804
[19] 460.39218 460.39218 514.55654 514.55654 514.55654 561.91088
[25] 561.91088 563.58547 563.58547 576.37493 576.37493 615.18494
[31] 615.18494 615.18494 657.09135 657.09135 657.09135 657.09135
[37] 657.09135 657.09135 657.09135 657.09135 690.81143 690.81143
[43] 720.58651 720.58651 720.58651 752.03974 801.28747 834.48131
[49] 834.48131 929.71202 963.51476 963.51476 1040.09253 1084.95360
[55] 1283.76462

We represent the two-dimensional configuration of the objects. This is shown in Figure 3.
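The plotting commands behind this configuration are not echoed in the document. The sketch below shows one way such a map could be drawn, assuming the distance table printed earlier in this section is typed in by hand instead of being read from the URL; the axis labels follow those of Figure 3, and the remaining choices (trace = FALSE, text size) are assumptions.

```r
library(MASS)

cities <- c("Albac", "Alican", "Alme", "Avil", "Badaj", "Barcel",
            "Bilbao", "Burgos", "Cacer", "Cadiz", "Madrid")

# Distances copied from the table printed in this section (Distciutats.txt)
D <- matrix(c(
    0,  171,  369,  366,  525,  540,  646,  488,  504,  617,  251,
  171,    0,  294,  537,  696,  515,  817,  659,  675,  688,  422,
  369,  294,    0,  663,  604,  809,  958,  800,  651,  484,  563,
  366,  537,  663,    0,  318,  717,  401,  243,  229,  618,  115,
  525,  696,  604,  318,    0, 1022,  694,  538,   89,  342,  401,
  540,  515,  809,  717, 1022,    0,  620,  583,  918, 1284,  621,
  646,  817,  958,  401,  694,  620,    0,  158,  605, 1058,  395,
  488,  659,  800,  243,  538,  583,  158,    0,  447,  900,  237,
  504,  675,  651,  229,   89,  918,  605,  447,    0,  389,  297,
  617,  688,  484,  618,  342, 1284, 1058,  900,  389,    0,  663,
  251,  422,  563,  115,  401,  621,  395,  237,  297,  663,    0),
  nrow = 11, byrow = TRUE, dimnames = list(cities, cities))

# Non-metric MDS in two dimensions (starting from the classical solution)
sol <- isoMDS(D, k = 2, trace = FALSE)

# Map of the cities: empty plot first, then the labels at the fitted points
if (interactive()) {
  plot(sol$points, type = "n", xlab = "pcoord. 1", ylab = "pcoord. 2")
  text(sol$points, labels = cities, cex = 0.8)
}
```

An MDS configuration is only determined up to rotation and reflection, so a map drawn this way may appear flipped relative to a geographic map.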

[Plot: two-dimensional MDS configuration; axes pcoord. 1 and pcoord. 2. Points labelled Albac, Alican, Alme, Avil, Badaj, Barcel, Bilbao, Burgos, Cacer, Cadiz, Madrid.]

Figure 3: Two dimensional configuration


Now we assess the fit (the stress) of this two-dimensional configuration.
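As a rough illustration of what the stress measures, the sketch below recomputes it from the output of Shepard. It uses the built-in eurodist distances instead of the course data (an assumption made so that the example is self-contained), and it relies on the documented fact that isoMDS reports the stress in percent.

```r
library(MASS)

d <- eurodist                          # built-in road distances, for illustration only
sol <- isoMDS(d, k = 2, trace = FALSE)
she <- Shepard(d, sol$points)

# she$x : original dissimilarities (sorted)
# she$y : distances between points in the fitted configuration
# she$yf: monotone (isotonic) fit of y on x
# Kruskal stress in percent: size of the departure of y from the monotone fit
stress <- 100 * sqrt(sum((she$y - she$yf)^2) / sum(she$y^2))

# Shepard diagram: distances against dissimilarities, with the monotone fit as a step line
if (interactive()) {
  plot(she, pch = 20, xlab = "dissimilarity", ylab = "distance")
  lines(she$x, she$yf, type = "S")
}
```

The value of stress computed this way should agree with sol$stress up to numerical tolerance; points far from the step line in the diagram are the pairs that the two-dimensional map represents worst.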

[Plot: Shepard diagram of the two-dimensional configuration; both axes run from about 200 to 1200.]

Figure 4: Stress of the configuration


3.3 Inserting directions on the map

Now we interpret directions in the space of the representation. We consider the variables North-South and East-West, which we draw on the graph using abline.
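The commands used to draw the directions are not shown. One common way to do it, sketched below, is to regress an external variable on the two coordinates and draw the line through the origin along the fitted gradient. Everything in this sketch is simulated: pts is a made-up configuration and ns a hypothetical north-south variable, so it illustrates the technique only, not the course's map.

```r
set.seed(1)

# Simulated stand-in for an MDS configuration (NOT the course data)
pts <- cbind(rnorm(11, sd = 300), rnorm(11, sd = 300))

# Hypothetical external variable, built to increase along the direction (1, 2)
ns <- as.vector(pts %*% c(1, 2)) + rnorm(11, sd = 10)

# Regress the external variable on the two map coordinates
fit <- lm(ns ~ pts[, 1] + pts[, 2])
b <- coef(fit)[2:3]                 # gradient: direction in which 'ns' increases fastest
slope <- unname(b[2] / b[1])        # slope of that direction on the map

if (interactive()) {
  plot(pts, type = "n", asp = 1, xlab = "PrinCoord1", ylab = "PrinCoord2")
  text(pts, labels = seq_len(nrow(pts)))
  abline(a = 0, b = slope, col = "blue")   # direction line through the origin
}
```

Using asp = 1 keeps the two axes on the same scale, which matters because angles on the map are only meaningful when both coordinates share a unit.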

[Plot: map of the cities with directions added; axes PrinCoord1 and PrinCoord2. Points labelled Albac, Alican, Alme, Avil, Badaj, Barcel, Bilbao, Burgos, Cacer, Cadiz, Madrid.]

Figure 5: The Map with directions

3.4 The case of a similarity matrix

The MDS analysis is as follows. We read the similarity matrix among political parties and transform it into a distance matrix.

> root = "http://www.econ.upf.es/~satorra/dades/"

[1] "http://www.econ.upf.es/~satorra/dades/"

> D = read.table("http://www.econ.upf.edu/~satorra/dades/SemblancesPartits.txt",

+ header = T)

  PSOE PP IU ERC CU PNV VERDS
1   30 14 17   2  4   5     2
2   14 30  4   2 15  10     1
3   17  4 30   4  6   6     9
4    2  2  4  30 10  15    12
5    4 15  6  10 30  21     4
6    5 10  6  15 21  30     6
7    2  1  9  12  4   6    30

> library(MASS)

[1] "MASS"      "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"

> D = as.matrix(D)

  PSOE PP IU ERC CU PNV VERDS
1   30 14 17   2  4   5     2
2   14 30  4   2 15  10     1
3   17  4 30   4  6   6     9
4    2  2  4  30 10  15    12
5    4 15  6  10 30  21     4
6    5 10  6  15 21  30     6
7    2  1  9  12  4   6    30

> n = dim(D)[1]

[1] 7

> cii = matrix(diag(D), n, 1) %*% matrix(1, 1, n)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   30   30   30   30   30   30   30
[2,]   30   30   30   30   30   30   30
[3,]   30   30   30   30   30   30   30
[4,]   30   30   30   30   30   30   30
[5,]   30   30   30   30   30   30   30
[6,]   30   30   30   30   30   30   30
[7,]   30   30   30   30   30   30   30

> cjj = matrix(1, n, 1) %*% matrix(diag(D), 1, n)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   30   30   30   30   30   30   30
[2,]   30   30   30   30   30   30   30
[3,]   30   30   30   30   30   30   30
[4,]   30   30   30   30   30   30   30
[5,]   30   30   30   30   30   30   30
[6,]   30   30   30   30   30   30   30
[7,]   30   30   30   30   30   30   30

> D = sqrt(cii + cjj - 2 * D)

      PSOE       PP       IU      ERC       CU      PNV    VERDS
1 0.000000 5.656854 5.099020 7.483315 7.211103 7.071068 7.483315
2 5.656854 0.000000 7.211103 7.483315 5.477226 6.324555 7.615773
3 5.099020 7.211103 0.000000 7.211103 6.928203 6.928203 6.480741
4 7.483315 7.483315 7.211103 0.000000 6.324555 5.477226 6.000000
5 7.211103 5.477226 6.928203 6.324555 0.000000 4.242641 7.211103
6 7.071068 6.324555 6.928203 5.477226 4.242641 0.000000 6.928203
7 7.483315 7.615773 6.480741 6.000000 7.211103 6.928203 0.000000
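The transformation used here is d_ij = sqrt(c_ii + c_jj - 2 c_ij), where c_ij is the similarity between objects i and j. As a small sketch, the same matrix can be obtained with outer, which avoids building cii and cjj by matrix products; the similarity values are copied from the table printed in this section, and the object names (C, cd, D2) are illustrative.

```r
parties <- c("PSOE", "PP", "IU", "ERC", "CU", "PNV", "VERDS")

# Similarities copied from the table printed in this section (SemblancesPartits.txt)
C <- matrix(c(
  30, 14, 17,  2,  4,  5,  2,
  14, 30,  4,  2, 15, 10,  1,
  17,  4, 30,  4,  6,  6,  9,
   2,  2,  4, 30, 10, 15, 12,
   4, 15,  6, 10, 30, 21,  4,
   5, 10,  6, 15, 21, 30,  6,
   2,  1,  9, 12,  4,  6, 30),
  nrow = 7, byrow = TRUE, dimnames = list(parties, parties))

cd <- diag(C)                        # self-similarities c_ii
D2 <- outer(cd, cd, "+") - 2 * C     # squared distances c_ii + c_jj - 2 c_ij
D  <- sqrt(D2)

# Example: PSOE vs PP gives sqrt(30 + 30 - 2 * 14) = sqrt(32)
round(D["PSOE", "PP"], 6)            # 5.656854
```

The larger the similarity between two parties, the smaller the resulting distance, and every diagonal entry becomes zero.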

> sol = isoMDS(as.matrix(D))

initial value 6.365064
iter 5 value 2.306125
iter 10 value 1.616697
iter 15 value 1.491503
iter 20 value 1.422813
final value 1.398928
converged
$points
       [,1]       [,2]
1 -3.788557 -0.7553591
2 -1.873949 -3.6434079
3 -2.818111  1.6521321
4  3.682817  1.5792189
5  1.262559 -2.0939000
6  1.680810 -1.1221010
7  1.854431  4.3834170

$stress
[1] 1.398928

> she = Shepard(as.dist(D), sol$points)

$x
 [1] 4.242641 5.099020 5.477226 5.477226 5.656854 6.000000 6.324555 6.324555
 [9] 6.480741 6.928203 6.928203 6.928203 7.071068 7.211103 7.211103 7.211103
[17] 7.211103 7.483315 7.483315 7.483315 7.615773

$y
 [1] 1.057982 2.595723 3.498379 3.362315 3.465047 3.347614 4.358130 4.398801
 [9] 5.412261 5.539371 5.285514 5.508255 5.481648 5.225463 5.379051 6.501337
[17] 6.504302 7.827623 7.632190 7.625843 8.850465

$yf
 [1] 1.057982 2.595723 3.418339 3.418339 3.418339 3.418339 4.358130 4.398801
 [9] 5.404509 5.404509 5.404509 5.404509 5.404509 5.404509 5.404509 6.501337
[17] 6.504302 7.695219 7.695219 7.695219 8.850465

We represent the two-dimensional configuration of the objects. This is shown in Figure 6.

[Plot: two-dimensional MDS configuration of the parties; axes pcoord. 1 and pcoord. 2. Points labelled PSOE, PP, IU, ERC, CU, PNV, VERDS.]

Figure 6: Two dimensional configuration


Now we assess the fit (the stress) of this two-dimensional configuration.

[Plot: Shepard diagram for the parties; horizontal axis from about 4.5 to 7.5, vertical axis from about 2 to 8.]

Figure 7: Stress of the configuration


3.5 with our mds function

We now do MDS with our mds function.
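The file mds.txt is not reproduced here, so the following is only a guess at what a metric MDS function of this kind might look like: classical scaling by double centring and an eigendecomposition. The function name mds_sketch is hypothetical, the component name PrincipCoordin is borrowed from the call sol$PrincipCoordin that appears later in this section, and the built-in eurodist data are used purely for illustration.

```r
# Hypothetical sketch of a classical (metric) MDS function; not the course's mds.txt
mds_sketch <- function(D, k = 2) {
  D <- as.matrix(D)
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)       # centring matrix
  B <- -0.5 * J %*% (D^2) %*% J            # double-centred inner-product matrix
  e <- eigen(B, symmetric = TRUE)
  # Principal coordinates: eigenvectors scaled by the square roots of the eigenvalues
  # (assumes the first k eigenvalues are positive)
  X <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), k)
  rownames(X) <- rownames(D)
  list(PrincipCoordin = X, eigenvalues = e$values)
}

sol <- mds_sketch(eurodist, k = 2)         # built-in distances, for illustration only
ref <- cmdscale(eurodist, k = 2)           # R's own classical MDS, for comparison

if (interactive()) {
  plot(sol$PrincipCoordin, type = "n", xlab = "PrinCoord1", ylab = "PrinCoord2")
  text(sol$PrincipCoordin, labels = rownames(sol$PrincipCoordin), cex = 0.8)
}
```

The coordinates should match those of cmdscale up to the sign of each axis, since eigenvectors are only defined up to sign.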

[Plot: metric MDS configuration of the parties; axes PrinCoord1 and PrinCoord2. Points labelled PSOE, PP, IU, ERC, CU, PNV, VERDS.]

Figure 8: MDS with the mds function.


> she = Shepard(as.dist(D), sol$PrincipCoordin)

> plot(she, xlab = "dissimilarity", ylab = "distance")

> lines(she$x, she$yf, type = "S")

[Plot: Shepard diagram for the metric MDS solution; horizontal axis from about 4.5 to 7.5, vertical axis from about 1 to 7.]

Figure 9: Stress of the configuration (metric MDS)


4 Cluster Analysis


> plot(hclust(dist(data[, -1])), labels = data[, 1])

[Plot: Cluster Dendrogram of the countries, hclust (*, "complete") on dist(data[, -1]); vertical axis Height, from 0 to 20. Leaves, left to right: Espanya, Greece, Noruega, USA, Austria, Aleman, Luxemburg, Suiza, Japo, Italia, Portugal, Suecia, Finland, Irlanda, Dinamar, France, UK, Belgica, Holanda.]

Figure 10: Hierarchical clustering


4.1 segmentation of cars

5 Segmentation of a sample: ratings of cars

We have a sample of car ratings given by a population of students. We first consider a hierarchical cluster analysis of the data set.

5.1 hierarchical cluster analysis

We first read the data.

The following object(s) are masked from data ( position 4 ) :

X1 X1.1 X4 X4.1 X4.2 X5 X6 X7 X7.1 X8 X8.1

 [1] "BMW328i"      "FORDExplore"  "Infiniti"     "JeepCheroke"  "Lexus"
 [6] "ChrislerTown" "Mercedes"     "Saab"         "Porsche"      "Volvo"

 [1] "BMW328i"      "FORDExplore"  "Infiniti"     "JeepCheroke"  "Lexus"
 [6] "ChrislerTown" "Mercedes"     "Saab"         "Porsche"      "Volvo"

[Plot: Cluster Dendrogram of the 150 respondents (leaves labelled by observation number), hclust (*, "complete") on D; vertical axis Height, from 0 to 15.]

Figure 11: Dendrogram

[1] 5

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  1   1   1   2   3   2   2   1   2   2   1   1   1   2   1   2   2   1   1   4
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  3   2   1   1   3   2   5   1   3   3   3   2   2   1   1   3   3   2   1   1
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  1   1   1   1   5   1   5   4   1   2   3   1   4   4   1   3   3   1   2   3
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  1   2   3   2   1   5   3   1   3   1   2   5   1   4   3   2   1   1   2   1
 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
  1   3   1   3   3   3   1   3   3   3   2   1   2   1   2   1   3   3   1   1
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  1   1   1   1   1   1   1   1   2   2   1   3   1   4   5   1   1   2   1   4
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
  1   1   1   3   1   3   1   1   1   1   4   1   3   5   3   1   2   2   1   3
141 142 143 144 145 146 147 148 149 150
  4   3   4   1   3   3   1   1   3   3

  Group.1  BMW328i FORDExplore Infiniti JeepCheroke    Lexus ChrislerTown
1       1 7.144928    4.362319 4.202899    5.043478 5.492754     1.492754
2       2 3.964286    4.892857 3.964286    5.678571 4.857143     1.535714
3       3 6.777778    4.888889 3.194444    5.888889 3.722222     1.416667
4       4 7.400000    7.200000 5.600000    7.200000 7.100000     4.200000
5       5 4.428571    3.857143 2.285714    6.142857 5.000000     2.428571
  Mercedes     Saab  Porsche    Volvo
1 6.202899 5.884058 5.231884 3.724638
2 4.357143 3.392857 4.714286 3.607143
3 5.972222 3.861111 7.694444 2.250000
4 7.000000 5.100000 7.200000 5.300000
5 6.285714 5.428571 2.428571 7.285714

group
 1  2  3  4  5
69 28 36 10  7
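The commands that produced the output above are not echoed. A plausible sequence, sketched here on the built-in USArrests data because the car ratings are not reproduced in this document, is: build the dendrogram, cut it into five groups, compute the group means and count the group sizes. The standardization step is an assumption that suits USArrests; ratings on a common scale may not need it.

```r
X <- scale(USArrests)               # built-in data, standardized; illustration only
D <- dist(X)                        # Euclidean distances between observations
hc <- hclust(D)                     # complete linkage is the default method

group <- cutree(hc, k = 5)          # cut the tree into 5 groups

# Mean of every variable within each group (first column is named Group.1)
means <- aggregate(as.data.frame(X), by = list(group), FUN = mean)

# Number of observations in each group
sizes <- table(group)

if (interactive()) plot(hc, labels = rownames(X), cex = 0.6)
```

Comparing the rows of means is what gives each segment its interpretation, in the same way as the table of mean car ratings above.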

5.2 K-means cluster analysis

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  5   1   3   4   4   2   3   5   3   5   5   1   1   3   2   4   3   1   3   3
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  4   4   1   4   4   3   3   5   4   4   4   2   4   2   3   4   4   4   3   5
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  2   3   1   1   2   2   5   5   1   4   4   5   5   5   2   3   4   2   2   4
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
  2   2   4   3   1   3   5   5   4   1   4   2   2   5   4   5   2   3   3   1
 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
  1   4   1   4   2   1   5   4   4   4   5   5   4   1   4   1   4   4   2   2
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  1   3   2   5   5   1   1   1   4   3   3   4   2   5   2   2   3   3   5   5
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
  3   5   2   4   5   4   1   2   2   1   5   1   4   2   4   4   4   3   2   4
141 142 143 144 145 146 147 148 149 150
  5   4   5   1   4   4   2   5   4   4

  Group.1  BMW328i FORDExplore Infiniti JeepCheroke    Lexus ChrislerTown
1       1 7.791667    2.666667 5.125000    3.041667 5.666667     1.416667
2       2 5.750000    4.142857 3.178571    5.928571 4.714286     1.357143
3       3 5.833333    5.666667 3.583333    6.708333 5.125000     1.750000
4       4 6.088889    4.755556 3.066667    5.533333 3.977778     1.466667
5       5 6.586207    6.310345 5.241379    6.379310 6.379310     2.620690
  Mercedes     Saab  Porsche    Volvo
1 6.833333 5.125000 4.916667 3.041667
2 5.428571 7.500000 4.571429 5.321429
3 5.125000 2.666667 2.416667 4.083333
4 5.311111 3.911111 7.444444 2.044444
5 6.931034 5.379310 7.586207 4.517241
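The K-means commands are likewise not shown. The sketch below illustrates one way to obtain a five-group partition and its group means with kmeans, again on the built-in USArrests data as a stand-in for the car ratings; the seed and the choice nstart = 20 are assumptions made so that the example is reproducible.

```r
set.seed(123)                        # K-means starts from random centres
X <- scale(USArrests)                # built-in data, illustration only

km <- kmeans(X, centers = 5, nstart = 20)   # best of 20 random starts
group <- km$cluster                  # group membership of each observation

# Group means computed as in the hierarchical analysis; they coincide with km$centers
means <- aggregate(as.data.frame(X), by = list(group), FUN = mean)

table(group)                         # group sizes
```

Unlike the hierarchical solution, the group labels of K-means are arbitrary and can change from one run to another, so segments should be compared through their mean profiles, not through their numbers.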


[Plot: "perfils dels grups" (group profiles); axes x and y, both from 0 to 10; one profile line per group, legend 1 to 5; horizontal positions labelled BMW328i, FORDExplore, Infiniti, JeepCheroke, Lexus, ChrislerTown, Mercedes, Saab, Porsche, Volvo.]

Figure 12: Profiles of the segments

Here we use a function that cuts a variable into several intervals.

(0.993,2.75]   (2.75,4.5]   (4.5,6.25]  (6.25,8.01]
          32           63           40           15

  [1] 2 2 2 2 1 3 3 2 2 3 3 4 4 3 2 2 2 4 2 3 1 1 3 3 2 3 1 3 2 2 2 2 1 2 3 3 1
 [38] 3 1 3 1 2 3 2 2 2 1 2 4 3 2 3 3 4 1 1 2 3 2 1 3 1 3 2 2 1 2 3 2 1 2 1 1 4
 [75] 1 3 3 2 4 4 2 2 3 2 2 4 2 1 1 1 4 4 1 3 1 3 1 2 2 2 1 2 2 3 3 2 3 4 2 4 1
[112] 2 2 3 2 1 1 1 2 3 2 3 3 2 3 2 3 2 2 2 2 4 2 2 2 1 2 2 2 3 3 1 3 2 2 1 2 4
[149] 2 3
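The interval labels above have the form produced by R's cut when it is asked for four equal-width intervals on a variable that ranges from 1 to 8, so cut is presumably the function meant. A minimal sketch on a few made-up values (x is illustrative, not the course data):

```r
x <- c(1, 2, 3, 5, 7, 8)        # made-up values on a 1 to 8 scale

f <- cut(x, breaks = 4)         # four equal-width intervals over the range of x
levels(f)                       # interval labels
table(f)                        # how many values fall in each interval
as.integer(f)                   # interval number (1 to 4) of each value
```

With breaks given as a single number, cut widens the outer limits slightly (by 0.1% of the range) so that the minimum and the maximum fall inside the first and last intervals; this is why the labels start at 0.993 and end at 8.01 instead of 1 and 8.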

6 Correspondence Analysis

6.1 A particular example: votes and neighbourhoods in the city of Barcelona

We read the data (a contingency matrix) to be analyzed.

NULL

       [,1]  [,2]  [,3] [,4] [,5]  [,6]
 [1,] 20295 22770 12474 5346 3168 21582
 [2,] 56727 27621 33561 9702 7821 29106
 [3,] 28215 27918 16137 8217 4554 21879
 [4,] 15345  7920 12177 2277 1881  7623
 [5,] 28611  8217 23760 2772 3069 11682
 [6,] 25839 13662 11682 4950 4257 13563
 [7,] 27918 36630 19701 9405 4356 23958
 [8,] 17028 40986 16929 9504 2376 22770
 [9,] 22077 25740 13959 7227 3564 17325
[10,] 25443 33264 17622 9009 4158 22374

 [1] "CiuVella"      "Eixample"      "SantsMon"      "Corts"         "SarriaSG"
 [6] "Gracia"        "HortGuin"      "NBarris"       "\nSanTAndreu"  "SantMarti"

[1] "cu"  "psc" "pp"  "ic"  "erc" "abs"

[1] "cu"  "psc" "pp"  "ic"  "erc" "abs"

 [1] "CiuVella"      "Eixample"      "SantsMon"      "Corts"         "SarriaSG"
 [6] "Gracia"        "HortGuin"      "NBarris"       "\nSanTAndreu"  "SantMarti"


See Venables & Ripley (2002). If nf is two or more the biplot method is called, which plots the second and third columns of the matrices A = Dr^(-1/2) U L and B = Dc^(-1/2) V L where the singular value decomposition is U L V. Thus the x-axis is the canonical correlation times the row and column scores. Although this is called a biplot, it does not have any useful inner product relationship between the row and column scores. Think of this as an equally-scaled plot with two unrelated sets of labels.
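As a sketch of how the corresp analysis reported in this section might be run, the code below types in the contingency table printed in section 6.1 and calls corresp from MASS with two factors. The object names (barris, partits, N, ca) are illustrative, and the label SanTAndreu is entered without the stray "\n" that appears in the printed names.

```r
library(MASS)

barris  <- c("CiuVella", "Eixample", "SantsMon", "Corts", "SarriaSG",
             "Gracia", "HortGuin", "NBarris", "SanTAndreu", "SantMarti")
partits <- c("cu", "psc", "pp", "ic", "erc", "abs")

# Contingency table copied from the matrix printed in section 6.1
N <- matrix(c(
  20295, 22770, 12474, 5346, 3168, 21582,
  56727, 27621, 33561, 9702, 7821, 29106,
  28215, 27918, 16137, 8217, 4554, 21879,
  15345,  7920, 12177, 2277, 1881,  7623,
  28611,  8217, 23760, 2772, 3069, 11682,
  25839, 13662, 11682, 4950, 4257, 13563,
  27918, 36630, 19701, 9405, 4356, 23958,
  17028, 40986, 16929, 9504, 2376, 22770,
  22077, 25740, 13959, 7227, 3564, 17325,
  25443, 33264, 17622, 9009, 4158, 22374),
  nrow = 10, byrow = TRUE, dimnames = list(barris, partits))

ca <- corresp(N, nf = 2)   # correspondence analysis with two factors

ca$cor                     # canonical correlations
ca$rscore                  # row (neighbourhood) scores
ca$cscore                  # column (party) scores

if (interactive()) biplot(ca)
```

The two canonical correlations should match the values 0.22635929 and 0.07721567 reported in this section; the signs of the scores may be reversed, since each axis is only defined up to sign.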

6.2 Using our own program

[1] 10

[1] 6

      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1

     [,1]
[1,]    1
[2,]    1
[3,]    1
[4,]    1
[5,]    1
[6,]    1

[1] 989703

                     cu        psc         pp         ic        erc        abs
CiuVella     0.02050615 0.02300690 0.01260378 0.00540162 0.00320096 0.02180654
Eixample     0.05731720 0.02790837 0.03391017 0.00980294 0.00790237 0.02940882
SantsMon     0.02850855 0.02820846 0.01630489 0.00830249 0.00460138 0.02210663
Corts        0.01550465 0.00800240 0.01230369 0.00230069 0.00190057 0.00770231
SarriaSG     0.02890867 0.00830249 0.02400720 0.00280084 0.00310093 0.01180354
Gracia       0.02610783 0.01380414 0.01180354 0.00500150 0.00430129 0.01370411
HortGuin     0.02820846 0.03701110 0.01990597 0.00950285 0.00440132 0.02420726
NBarris      0.01720516 0.04141242 0.01710513 0.00960288 0.00240072 0.02300690
\nSanTAndreu 0.02230669 0.02600780 0.01410423 0.00730219 0.00360108 0.01750525
SantMarti    0.02570771 0.03361008 0.01780534 0.00910273 0.00420126 0.02260678

                   [,1]
CiuVella     0.08652596
Eixample     0.16624987
SantsMon     0.10803241
Corts        0.04771431
SarriaSG     0.07892368
Gracia       0.07472242
HortGuin     0.12323697
NBarris      0.11073322
\nSanTAndreu 0.09082725
SantMarti    0.11303391

          [,1]
cu  0.27028108

[1] "MASS"      "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"

First canonical correlation(s): 0.22635929 0.07721567

 Row scores:
                   [,1]        [,2]
CiuVella     -0.4923746 -0.69004664
Eixample      1.0079623 -0.52043297
SantsMon     -0.2704543 -0.76007414
Corts         1.1843971  1.31176435
SarriaSG      2.0058709  1.90593764
Gracia        0.7271610 -1.92782558
HortGuin     -0.6286068  0.23228825
NBarris      -1.5331946  1.31973267
\nSanTAndreu -0.4879330 -0.23514799
SantMarti    -0.6489242  0.05284114

 Column scores:
          [,1]       [,2]
cu   1.0399453 -0.7269588
psc -1.3563040  0.3550647
pp   0.9260450  1.8163692
ic  -0.8443450 -0.5210486
erc  0.6482703 -2.0080104
abs -0.4104489 -0.5284274

                   [,1]        [,2]
CiuVella     -0.4923746 -0.69004664
Eixample      1.0079623 -0.52043297
SantsMon     -0.2704543 -0.76007414
Corts         1.1843971  1.31176435
SarriaSG      2.0058709  1.90593764
Gracia        0.7271610 -1.92782558
HortGuin     -0.6286068  0.23228825
NBarris      -1.5331946  1.31973267
\nSanTAndreu -0.4879330 -0.23514799
SantMarti    -0.6489242  0.05284114

          [,1]       [,2]
cu   1.0399453 -0.7269588
psc -1.3563040  0.3550647
pp   0.9260450  1.8163692
ic  -0.8443450 -0.5210486
erc  0.6482703 -2.0080104
abs -0.4104489 -0.5284274

       Length Class  Mode
cor     2     -none- numeric
rscore 20     -none- numeric
cscore 12     -none- numeric
Freq   60     -none- numeric

[1] 0.22635929 0.07721567

                   [,1]        [,2]
CiuVella     -0.4923746 -0.69004664
Eixample      1.0079623 -0.52043297
SantsMon     -0.2704543 -0.76007414
Corts         1.1843971  1.31176435
SarriaSG      2.0058709  1.90593764
Gracia        0.7271610 -1.92782558
HortGuin     -0.6286068  0.23228825
NBarris      -1.5331946  1.31973267
\nSanTAndreu -0.4879330 -0.23514799
SantMarti    -0.6489242  0.05284114

          [,1]       [,2]
cu   1.0399453 -0.7269588
psc -1.3563040  0.3550647
pp   0.9260450  1.8163692
ic  -0.8443450 -0.5210486
erc  0.6482703 -2.0080104
abs -0.4104489 -0.5284274

NULL

[Plot: biplot produced by corresp, with two sets of axis scales. Row labels: CiuVella, Eixample, SantsMon, Corts, SarriaSG, Gracia, HortGuin, NBarris, SanTAndreu, SantMarti. Column labels: cu, psc, pp, ic, erc, abs.]

Figure 13: CA: biplot produced by corresp

psc 0.24727418
pp  0.17985396
ic  0.06912074
erc 0.03961188
abs 0.19385816

             cu       psc        pp         ic        erc       abs
 [1,] 0.2369942 0.2658960 0.1456647 0.06242775 0.03699422 0.2520231
 [2,] 0.3447653 0.1678700 0.2039711 0.05896510 0.04753309 0.1768953
 [3,] 0.2638889 0.2611111 0.1509259 0.07685185 0.04259259 0.2046296
 [4,] 0.3249476 0.1677149 0.2578616 0.04821803 0.03983229 0.1614256
 [5,] 0.3662864 0.1051965 0.3041825 0.03548796 0.03929024 0.1495564
 [6,] 0.3493976 0.1847390 0.1579652 0.06693440 0.05756359 0.1834003
 [7,] 0.2288961 0.3003247 0.1615260 0.07711039 0.03571429 0.1964286
 [8,] 0.1553749 0.3739837 0.1544715 0.08672087 0.02168022 0.2077687
 [9,] 0.2455947 0.2863436 0.1552863 0.08039648 0.03964758 0.1927313
[10,] 0.2274336 0.2973451 0.1575221 0.08053097 0.03716814 0.2000000

[1] "perfils fila"

         cu   psc    pp    ic   erc   abs
 [1,] 0.237 0.266 0.146 0.062 0.037 0.252
 [2,] 0.345 0.168 0.204 0.059 0.048 0.177
 [3,] 0.264 0.261 0.151 0.077 0.043 0.205
 [4,] 0.325 0.168 0.258 0.048 0.040 0.161
 [5,] 0.366 0.105 0.304 0.035 0.039 0.150
 [6,] 0.349 0.185 0.158 0.067 0.058 0.183
 [7,] 0.229 0.300 0.162 0.077 0.036 0.196
 [8,] 0.155 0.374 0.154 0.087 0.022 0.208
 [9,] 0.246 0.286 0.155 0.080 0.040 0.193
[10,] 0.227 0.297 0.158 0.081 0.037 0.200

                  [,1]
CiuVella     0.2941529
Eixample     0.4077375
SantsMon     0.3286828
Corts        0.2184361
SarriaSG     0.2809336
Gracia       0.2733540
HortGuin     0.3510512
NBarris      0.3327660
\nSanTAndreu 0.3013756
SantMarti    0.3362052

         [,1]
cu  0.5198856
psc 0.4972667
pp  0.4240919
ic  0.2629082
erc 0.1990273
abs 0.4402933

            [,1]       [,2]      [,3]       [,4]       [,5]       [,6]
 [1,] 0.13409208 0.15728799 0.1010340 0.06984683 0.05467570 0.16837265
 [2,] 0.27039362 0.13764627 0.1961053 0.09144744 0.09737870 0.16381548
 [3,] 0.16683621 0.17258895 0.1169717 0.09607872 0.07033935 0.15275784
 [4,] 0.13653055 0.07367270 0.1328162 0.04006172 0.04371664 0.08008564
 [5,] 0.19793230 0.05943132 0.2015013 0.03792106 0.05545946 0.09542598
 [6,] 0.18371200 0.10155342 0.1018185 0.06959382 0.07906068 0.11386320
 [7,] 0.15456141 0.21201771 0.1337066 0.10296254 0.06299408 0.15661492
 [8,] 0.09945164 0.25026626 0.1212069 0.10976361 0.03624848 0.15702801
 [9,] 0.14237026 0.17354264 0.1103523 0.09215967 0.06003603 0.13192231
[10,] 0.14707919 0.20103692 0.1248780 0.10298243 0.06278595 0.15271874

                       cu          psc          pp           ic           erc
CiuVella     -0.018833814  0.011015515 -0.02371387 -0.007488404 -3.868783e-03
Eixample      0.058416743 -0.065108015  0.02318712 -0.015750108  1.622779e-02
SantsMon     -0.004041283  0.009145919 -0.02242001  0.009665293  4.922479e-03
Corts         0.022968775 -0.034948287  0.04017924 -0.017366916  2.418946e-04
SarriaSG      0.051878963 -0.080267601  0.08235966 -0.035938687 -4.540092e-04
Gracia        0.041599176 -0.034376432 -0.01410869 -0.002273199  2.465576e-02
HortGuin     -0.027945085  0.037451615 -0.01517138  0.010668277 -6.874716e-03
NBarris      -0.073548630  0.084792795 -0.01991645  0.022276691 -2.998105e-02
\nSanTAndreu -0.014310589  0.023678586 -0.01745867  0.012925547  5.404864e-05
SantMarti    -0.027709047  0.033853286 -0.01770389  0.014591330 -4.128069e-03
                       abs
CiuVella      0.0388590904
Eixample     -0.0157085992
SantsMon      0.0080409999
Corts        -0.0160902879
SarriaSG     -0.0282671853
Gracia       -0.0064927319
HortGuin      0.0020494227
NBarris       0.0105133804
\nSanTAndreu -0.0007713361
SantMarti     0.0046898723

 [1] "CiuVella"      "Eixample"      "SantsMon"      "Corts"         "SarriaSG"
 [6] "Gracia"        "HortGuin"      "NBarris"       "\nSanTAndreu"  "SantMarti"

[1] "cu" "psc" "pp" "ic" "erc" "abs"

[1] "contrast de independencia, ji-quadrat = "

[1] 58242.17

[1] "graus de llibertat"

[1] 45

             [,1]        [,2]          [,3]         [,4]        [,5]
 [1,] -0.14483344 -0.20297925 -0.9031405822  0.110175953  0.04124512
 [2,]  0.41098402 -0.21220004  0.0889659211 -0.517476567  0.36672954
 [3,] -0.08889368 -0.24982333  0.0214532382 -0.355066899 -0.36903907
 [4,]  0.25871504  0.28653664  0.0007902774  0.385571269 -0.02732200
 [5,]  0.56351650  0.53544189 -0.1308609961  0.003414407 -0.18434354
 [6,]  0.19877237 -0.52697886  0.2243113252  0.410549354  0.18362577
 [7,] -0.22067320  0.08154508  0.1501097458  0.462070399  0.25235176
 [8,] -0.51019505  0.43916218  0.0730028116 -0.248121526  0.40647200
 [9,] -0.14705109 -0.07086787  0.2500882242  0.020561037 -0.13566630
[10,] -0.21817166  0.01776546  0.1345906783  0.035819338 -0.64176690
              [,6]
 [1,]  0.006713044
 [2,]  0.237640664
 [3,]  0.351145031
 [4,] -0.180296023
 [5,]  0.163690382
 [6,] -0.094143105
 [7,]  0.724248177
 [8,] -0.088601683
 [9,] -0.429507402
[10,]  0.187490284

[1] 2.263593e-01 7.721567e-02 3.990056e-02 5.665915e-03 4.814842e-03
[6] 7.197060e-17

           [,1]       [,2]        [,3]        [,4]       [,5]       [,6]
[1,]  0.5406526 -0.3779354  0.19568627 -0.12495975  0.4905817 -0.5198856
[2,] -0.6744449  0.1765619  0.23835629  0.31090900  0.3364487 -0.4972667
[3,]  0.3927282  0.7703075 -0.03489001  0.04735557 -0.2628253 -0.4240919
[4,] -0.2219852 -0.1369880  0.45544752 -0.64740238 -0.4860801 -0.2629082
[5,]  0.1290235 -0.3996490  0.17219813  0.66329173 -0.5607260 -0.1990273
[6,] -0.1807179 -0.2326630 -0.81645037 -0.16245720 -0.1623800 -0.4402933

             [,1]         [,2]          [,3]          [,4]          [,5]
 [1,] -0.03278439 -0.015673180 -3.603582e-02  6.242475e-04  0.0001985888
 [2,]  0.09303005 -0.016385169  3.549790e-03 -2.931978e-03  0.0017657449
 [3,] -0.02012191 -0.019290277  8.559962e-04 -2.011779e-03 -0.0017768650
 [4,]  0.05856255  0.022125120  3.153251e-05  2.184614e-03 -0.0001315511
 [5,]  0.12755719  0.041344506 -5.221427e-03  1.934574e-05 -0.0008875851
 [6,]  0.04499397 -0.040691028  8.950148e-03  2.326138e-03  0.0008841292
 [7,] -0.04995143  0.006296558  5.989463e-03  2.618051e-03  0.0012150339
 [8,] -0.11548739  0.033910204  2.912853e-03 -1.405835e-03  0.0019570986
 [9,] -0.03328638 -0.005472110  9.978660e-03  1.164971e-04 -0.0006532119
[10,] -0.04938518  0.001371772  5.370243e-03  2.029493e-04 -0.0030900065
            [,6]
 [1,] -0.2941529
 [2,] -0.4077375
 [3,] -0.3286828
 [4,] -0.2184361
 [5,] -0.2809336
 [6,] -0.2733540
 [7,] -0.3510512
 [8,] -0.3327660
 [9,] -0.3013756
[10,] -0.3362052

             [,1]         [,2]          [,3]          [,4]          [,5] [,6]
 [1,] -0.11145356 -0.053282416 -0.1225070679  2.122187e-03  0.0006751208   -1
 [2,]  0.22816162 -0.040185583  0.0087060671 -7.190847e-03  0.0043305923   -1
 [3,] -0.06121984 -0.058689637  0.0026043228 -6.120729e-03 -0.0054060168   -1
 [4,]  0.26809928  0.101288769  0.0001443558  1.000116e-02 -0.0006022409   -1
 [5,]  0.45404751  0.147168260 -0.0185859841  6.886232e-05 -0.0031594126   -1
 [6,]  0.16459964 -0.148858352  0.0327419651  8.509616e-03  0.0032343741   -1
 [7,] -0.14229099  0.017936294  0.0170615066  7.457747e-03  0.0034611299   -1
 [8,] -0.34705283  0.101904048  0.0087534573 -4.224696e-03  0.0058813055   -1
 [9,] -0.11044816 -0.018157111  0.0331103796  3.865512e-04 -0.0021674345   -1
[10,] -0.14689002  0.004080164  0.0159731144  6.036472e-04 -0.0091908359   -1

           [,1]       [,2]        [,3]       [,4]       [,5] [,6]
[1,]  1.0399453 -0.7269588  0.37640253 -0.2403601  0.9436339   -1
[2,] -1.3563040  0.3550647  0.47933289  0.6252359  0.6765961   -1
[3,]  0.9260450  1.8163692 -0.08226992  0.1116635 -0.6197366   -1
[4,] -0.8443450 -0.5210486  1.73234410 -2.4624653 -1.8488583   -1
[5,]  0.6482703 -2.0080104  0.86519834  3.3326664 -2.8173314   -1
[6,] -0.4104489 -0.5284274 -1.85433311 -0.3689750 -0.3687995   -1

[1] "coordenades principal fila"

[1] "cu" "psc" "pp" "ic" "erc" "abs"

 [1] "CiuVella"      "Eixample"      "SantsMon"      "Corts"         "SarriaSG"
 [6] "Gracia"        "HortGuin"      "NBarris"       "\nSanTAndreu"  "SantMarti"

              [,1]  [,2]  [,3]  [,4]  [,5] [,6]
CiuVella     -0.11 -0.05 -0.12  0.00  0.00   -1
Eixample      0.23 -0.04  0.01 -0.01  0.00   -1
SantsMon     -0.06 -0.06  0.00 -0.01 -0.01   -1
Corts         0.27  0.10  0.00  0.01  0.00   -1
SarriaSG      0.45  0.15 -0.02  0.00  0.00   -1
Gracia        0.16 -0.15  0.03  0.01  0.00   -1
HortGuin     -0.14  0.02  0.02  0.01  0.00   -1
NBarris      -0.35  0.10  0.01  0.00  0.01   -1
\nSanTAndreu -0.11 -0.02  0.03  0.00  0.00   -1
SantMarti    -0.15  0.00  0.02  0.00 -0.01   -1

[1] "coordenades estandard columna"

     [,1]  [,2]  [,3]  [,4]  [,5] [,6]
cu   1.04 -0.73  0.38 -0.24  0.94   -1
psc -1.36  0.36  0.48  0.63  0.68   -1
pp   0.93  1.82 -0.08  0.11 -0.62   -1
ic  -0.84 -0.52  1.73 -2.46 -1.85   -1
erc  0.65 -2.01  0.87  3.33 -2.82   -1
abs -0.41 -0.53 -1.85 -0.37 -0.37   -1

           [,1]        [,2]         [,3]          [,4]         [,5]
cu   0.23540127 -0.05613261  0.015018672 -0.0013618596  0.004543448
psc -0.30701201  0.02741656  0.019125651  0.0035425332  0.003257704
pp   0.20961887  0.14025217 -0.003282616  0.0006326756 -0.002983934
ic  -0.19112532 -0.04023312  0.069121500 -0.0139521180 -0.008901961
erc  0.14674199 -0.15504987  0.034521899  0.0188826030 -0.013565007
abs -0.09290892 -0.04080288 -0.073988930 -0.0020905807 -0.001775712
            [,6]
cu  -7.19706e-17
psc -7.19706e-17
pp  -7.19706e-17
ic  -7.19706e-17
erc -7.19706e-17
abs -7.19706e-17


Note that the biplot of corresp (the R function in library(MASS)) uses the standard coordinates for both rows and columns, as seen below.

[1] "compare"

[,1] [,2]CiuVella -0.11145356 -0.053282416Eixample 0.22816162 -0.040185583SantsMon -0.06121984 -0.058689637Corts 0.26809928 0.101288769SarriaSG 0.45404751 0.147168260Gracia 0.16459964 -0.148858352HortGuin -0.14229099 0.017936294NBarris -0.34705283 0.101904048\nSanTAndreu -0.11044816 -0.018157111SantMarti -0.14689002 0.004080164

[1] "with"

                    [,1]         [,2]
CiuVella     -0.11145356 -0.053282416
Eixample      0.22816162 -0.040185583
SantsMon     -0.06121984 -0.058689637
Corts         0.26809928  0.101288769
SarriaSG      0.45404751  0.147168260
Gracia        0.16459964 -0.148858352
HortGuin     -0.14229099  0.017936294
NBarris      -0.34705283  0.101904048
\nSanTAndreu -0.11044816 -0.018157111
SantMarti    -0.14689002  0.004080164

[1] "also, comparare"

          [,1]       [,2]
cu   1.0399453 -0.7269588
psc -1.3563040  0.3550647
pp   0.9260450  1.8163692
ic  -0.8443450 -0.5210486
erc  0.6482703 -2.0080104
abs -0.4104489 -0.5284274

[1] "with"

          [,1]       [,2]
cu   1.0399453 -0.7269588
psc -1.3563040  0.3550647
pp   0.9260450  1.8163692
ic  -0.8443450 -0.5210486
erc  0.6482703 -2.0080104
abs -0.4104489 -0.5284274
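The relation between the standard and the principal coordinates compared above can be sketched from the singular value decomposition of the standardized residuals. This is only an illustration under stated assumptions: the 4 x 3 table N below is hypothetical (it is not the Barcelona vote table), and the names r_mass, c_mass, row_std, etc. are introduced here for the example. The components rscore and cscore returned by corresp hold the standard coordinates.

```r
library(MASS)  # corresp(); MASS ships with standard R installations

# Hypothetical 4 x 3 contingency table (NOT the Barcelona data)
N <- matrix(c(20, 10,  5,
               8, 15, 12,
               4,  9, 25,
              10, 10, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:3)))

P      <- N / sum(N)               # correspondence matrix
r_mass <- unname(rowSums(P))       # row masses
c_mass <- unname(colSums(P))       # column masses

# Standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j) and their SVD
E  <- outer(r_mass, c_mass)
sv <- svd((P - E) / sqrt(E))
k  <- 2                            # number of dimensions kept

row_std <- sv$u[, 1:k] / sqrt(r_mass)          # standard row coordinates
col_std <- sv$v[, 1:k] / sqrt(c_mass)          # standard column coordinates
row_pri <- sweep(row_std, 2, sv$d[1:k], "*")   # principal row coordinates
col_pri <- sweep(col_std, 2, sv$d[1:k], "*")   # principal column coordinates

# Compare with corresp(); singular vectors are defined up to sign,
# so absolute values are compared.
ca <- corresp(N, nf = k)
print(all.equal(abs(unname(ca$rscore)), abs(row_std)))
print(all.equal(abs(unname(ca$cscore)), abs(col_std)))
```

In this scaling the standard coordinates have weighted variance 1 on each axis (weights are the masses), while the principal coordinates have weighted variance equal to the squared singular value.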

7 Factor Analysis

Now we consider factor analysis using R. We consider the data of homework 1 (deure 1), which we analyzed using principal component analysis for the purpose of a two-dimensional representation of the data set. We will now assume that we do not have the raw data, only the covariance (correlation) matrix. With R we can still reproduce the graph of variables, which is the following.


> xx = CPr[, 1]

> yy = CPr[, 2]

> min1 = min(c(xx, yy))

> max1 = max(c(xx, yy))

> plot(xx, yy, type = "n", xlim = c(min1, max1), ylim = c(min1,

+ max1), xlab = "Cp1", ylab = "Cp2")

> text(xx, yy, rownames(N), cex = 0.8)

> abline(v = 0, col = "blue")

> abline(h = 0, col = "blue")

> xx = CPc[, 1]

> yy = CPc[, 2]

> text(xx, yy, colnames(N), cex = 1.2)

Figure 14: Districts (barris) and parties (partits) in principal coordinates (horizontal axis Cp1, vertical axis Cp2, both from about −0.2 to 0.4; the ten districts and the parties cu, psc, pp, ic, erc, abs are plotted as text labels)


> plot(hclust(dist(CPr[, 1:2])), labels = rownames(N))

Figure 15: Hierarchical clustering (cluster dendrogram from hclust (*, "complete") on dist(CPr[, 1:2]); vertical axis Height, from 0.0 to 0.8; leaves from left to right: NBarris, HortGuin, SantMarti, SantsMon, CiuVella, SanTAndreu, Eixample, Gracia, Corts, SarriaSG)
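The dendrogram can be turned into an explicit partition with cutree. The following is a minimal sketch, assuming the first two principal row coordinates printed earlier (rounded here to three decimals); the object names CPr2, hc and groups are introduced only for this example, and the choice of two clusters is an assumption.

```r
# First two principal row coordinates, rounded from the printed output
CPr2 <- matrix(c(-0.111, -0.053,
                  0.228, -0.040,
                 -0.061, -0.059,
                  0.268,  0.101,
                  0.454,  0.147,
                  0.165, -0.149,
                 -0.142,  0.018,
                 -0.347,  0.102,
                 -0.110, -0.018,
                 -0.147,  0.004),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("CiuVella", "Eixample", "SantsMon", "Corts",
                                 "SarriaSG", "Gracia", "HortGuin", "NBarris",
                                 "SanTAndreu", "SantMarti"),
                               c("Cp1", "Cp2")))

hc     <- hclust(dist(CPr2), method = "complete")  # complete linkage (the hclust default)
groups <- cutree(hc, k = 2)                        # cut the tree into two clusters
print(split(names(groups), groups))                # district names per cluster
```

With these coordinates the two-cluster cut separates Eixample, Gracia, Corts and SarriaSG from the remaining six districts, in line with the two main branches of the dendrogram.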


[1] "http://www.econ.upf.es/~satorra/dades/"

        pais CREIX  INF ATUR  INT1  INT2
1    Austria   1.5  3.1  6.1  5.21  5.74
2    Belgica   2.5  2.0 10.4  5.44  6.25
3    Dinamar   3.5  2.0 11.7  6.25  7.40
4    Finland   1.2  1.7 16.4  5.83  7.13
5     France   1.5  1.6 12.7  6.06  6.94
6     Aleman   1.5  2.7  8.2  5.46  6.86
7     Greece   0.0 11.9  4.6 17.63 15.43
8    Irlanda   2.2  2.5 15.0  6.31  7.44
9     Italia   0.7  3.8 12.5  9.19 10.69
10 Luxemburg   0.0  2.1  2.5  5.44  6.25
11   Holanda   1.7  2.5  9.6  5.43  5.94
12   Noruega   5.2  1.8  5.8  5.92  6.47
13  Portugal  -0.4  4.8  7.1 10.56 11.16
14   Espanya  -1.0  4.3 24.1  8.27  9.56
15    Suecia   1.4  2.2  8.1  8.15  9.21
16     Suiza   1.5  0.6  4.4  4.19  4.69
17        UK   3.5  2.6  9.4  6.37  7.75
18       USA   3.8  2.7  5.8  6.25  7.50
19      Japo   0.2  1.1  3.0  2.36  2.68

           CREIX         INF        ATUR        INT1       INT2
CREIX  1.0000000 -0.37541964 -0.15157719 -0.32767954 -0.3057296
INF   -0.3754196  1.00000000 -0.03680024  0.94061306  0.8692789
ATUR  -0.1515772 -0.03680024  1.00000000  0.04858309  0.1871245
INT1  -0.3276795  0.94061306  0.04858309  1.00000000  0.9730311
INT2  -0.3057296  0.86927894  0.18712448  0.97303115  1.0000000


$values
[1] 3.034857173 1.064308060 0.792513136 0.100548589 0.007773042

$vectors
           [,1]         [,2]        [,3]       [,4]          [,5]
[1,]  0.2794084  0.379427906 0.876517523 -0.0983969 -0.0006928348
[2,] -0.5452106  0.184856962 0.006543058 -0.7788244  0.2489238495
[3,] -0.0750025 -0.897644955 0.392678918 -0.1759944 -0.0586294199
[4,] -0.5620776  0.126849115 0.143138063  0.1736668 -0.7857031645
[5,] -0.5505631  0.002281196 0.238723654  0.5679934  0.5632668688

         [,1]     [,2]      [,3]      [,4]        [,5]
[1,] 3.034857 0.000000 0.0000000 0.0000000 0.000000000
[2,] 0.000000 1.064308 0.0000000 0.0000000 0.000000000
[3,] 0.000000 0.000000 0.7925131 0.0000000 0.000000000
[4,] 0.000000 0.000000 0.0000000 0.1005486 0.000000000
[5,] 0.000000 0.000000 0.0000000 0.0000000 0.007773042


           [,1]         [,2]        [,3]       [,4]          [,5]
[1,]  0.2794084  0.379427906 0.876517523 -0.0983969 -0.0006928348
[2,] -0.5452106  0.184856962 0.006543058 -0.7788244  0.2489238495
[3,] -0.0750025 -0.897644955 0.392678918 -0.1759944 -0.0586294199
[4,] -0.5620776  0.126849115 0.143138063  0.1736668 -0.7857031645
[5,] -0.5505631  0.002281196 0.238723654  0.5679934  0.5632668688

[1] 2

           [,1]         [,2]       [,3]        [,4]          [,5]
[1,]  0.4867530  0.391437964 0.78030401 -0.03120107 -6.108368e-05
[2,] -0.9498028  0.190708253 0.00582484 -0.24696051  2.194634e-02
[3,] -0.1306607 -0.926058175 0.34957537 -0.05580675 -5.169054e-03
[4,] -0.9791864  0.130864279 0.12742609  0.05506868 -6.927141e-02
[5,] -0.9591273  0.002353403 0.21251945  0.18010728  4.966034e-02

           [,1]         [,2]
[1,]  0.4867530  0.391437964
[2,] -0.9498028  0.190708253
[3,] -0.1306607 -0.926058175
[4,] -0.9791864  0.130864279
[5,] -0.9591273  0.002353403
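The printed matrices are consistent with the following steps: eigen-decomposition of the correlation matrix, then scaling each eigenvector by the square root of its eigenvalue and keeping the first m = 2 columns. The code that produced them is not shown in the handout, so the lines below are a sketch under that assumption, with the correlation matrix typed in from the printed values; the name vars is introduced here.

```r
vars <- c("CREIX", "INF", "ATUR", "INT1", "INT2")

# Correlation matrix copied from the printed output (upper triangle, by columns)
R <- diag(5)
R[upper.tri(R)] <- c(-0.37541964,
                     -0.15157719, -0.03680024,
                     -0.32767954,  0.94061306, 0.04858309,
                     -0.30572960,  0.86927890, 0.18712450, 0.97303110)
R <- R + t(R) - diag(5)            # mirror the upper triangle into the lower one
dimnames(R) <- list(vars, vars)

e <- eigen(R, symmetric = TRUE)    # eigenvalues and eigenvectors of R
m <- 2                             # number of components kept
A <- e$vectors %*% diag(sqrt(e$values))   # loadings: eigenvectors times sqrt(eigenvalues)
rownames(A) <- vars

print(round(e$values, 4))
print(round(A[, 1:m], 4))          # whole columns may differ in sign from the printout
```

Because the variables are standardized, each entry of A is the correlation between a variable and a component, which is why the variable plot is drawn inside the square from −1 to 1.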


> ma = max(abs(A))

> plot(A[, 1], A[, 2], xlim = c(-ma, ma), ylim = c(-ma, ma), type = "n")

> arrows(rep(0, nrow(A)), rep(0, nrow(A)), A[, 1], A[, 2], length = 0)

> abline(h = 0, lty = 3)

> abline(v = 0, lty = 3)

> text(A[, 1], A[, 2], rownames(S), cex = 0.7)

Figure 16: Variable plot (horizontal axis A[, 1], vertical axis A[, 2], both from −1.0 to 1.0; the variables CREIX, INF, ATUR, INT1 and INT2 are drawn as labelled segments from the origin)


We now consider the possibility of estimating the factor model

$X_j = \alpha_{j1} F_1 + \alpha_{j2} F_2 + E_j$

(see the help page of 'factanal' in R). We use the function factanal, first without rotation.

Call:
factanal(factors = 2, covmat = R, n.obs = 19, rotation = "none")

Uniquenesses:
CREIX   INF  ATUR  INT1  INT2
0.883 0.060 0.636 0.005 0.005

Loadings:
      Factor1 Factor2
CREIX -0.322   0.115
INF    0.916  -0.317
ATUR   0.114   0.593
INT1   0.993
INT2   0.991   0.118

               Factor1 Factor2
SS loadings      2.924   0.487
Proportion Var   0.585   0.097
Cumulative Var   0.585   0.682

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.77 on 1 degree of freedom.
The p-value is 0.0959
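The uniquenesses and the loadings in this output are tied together by the factor model: for standardized variables and uncorrelated unit-variance factors, the communality of variable j is the sum of its squared loadings and the uniqueness is one minus that sum. A small check using only the printed (three-decimal) values, so agreement is only up to rounding; INT1 is left out because its Factor2 loading is blank in the printout (the print method hides loadings below 0.1 in absolute value).

```r
# Unrotated loadings as printed; INT1 omitted (Factor2 loading not printed)
L <- matrix(c(-0.322,  0.115,
               0.916, -0.317,
               0.114,  0.593,
               0.991,  0.118),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("CREIX", "INF", "ATUR", "INT2"),
                            c("Factor1", "Factor2")))

communality <- rowSums(L^2)       # variance explained by the two common factors
uniqueness  <- 1 - communality    # variance left to the specific term E_j
printed     <- c(CREIX = 0.883, INF = 0.060, ATUR = 0.636, INT2 = 0.005)

print(round(cbind(communality, uniqueness, printed), 3))
```

For example, for CREIX: 1 − (0.322² + 0.115²) ≈ 0.883, the value printed under Uniquenesses.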



> W = fa$loadings

> m = 2

> A = W[, 1:m]

> ma = max(abs(A))

> plot(A[, 1], A[, 2], xlim = c(-ma, ma), ylim = c(-ma, ma), type = "n")

> arrows(rep(0, nrow(A)), rep(0, nrow(A)), A[, 1], A[, 2], length = 0)

> abline(h = 0, lty = 3)

> abline(v = 0, lty = 3)

> text(A[, 1], A[, 2], rownames(S), cex = 0.7)

Figure 17: Variable plot from ML factor analysis, no rotation (horizontal axis A[, 1], vertical axis A[, 2], both from −1.0 to 1.0; variables CREIX, INF, ATUR, INT1, INT2)


Now we do the same, but applying a varimax rotation.

Call:
factanal(factors = 2, covmat = R, n.obs = 19, rotation = "varimax")

Uniquenesses:
CREIX   INF  ATUR  INT1  INT2
0.883 0.060 0.636 0.005 0.005

Loadings:
      Factor1 Factor2
CREIX -0.338
INF    0.959  -0.139
ATUR           0.603
INT1   0.993
INT2   0.951   0.302

               Factor1 Factor2
SS loadings      2.926   0.486
Proportion Var   0.585   0.097
Cumulative Var   0.585   0.682

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.77 on 1 degree of freedom.
The p-value is 0.0959
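The uniquenesses and the chi-square test are the same as without rotation because a varimax rotation is orthogonal: it changes the individual loadings but not the communalities. A minimal sketch with stats::varimax, assuming the unrotated loadings printed earlier; INT1 is omitted because its Factor2 loading is not printed, so the rotated values here are not expected to reproduce the output above.

```r
# Unrotated printed loadings (INT1 omitted, see the note above)
L <- matrix(c(-0.322,  0.115,
               0.916, -0.317,
               0.114,  0.593,
               0.991,  0.118),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("CREIX", "INF", "ATUR", "INT2"),
                            c("Factor1", "Factor2")))

rot  <- varimax(L)               # Kaiser-normalized varimax (the default)
Lrot <- unclass(rot$loadings)    # rotated loadings as a plain matrix

print(round(Lrot, 3))
print(round(crossprod(rot$rotmat), 6))            # identity matrix: rotation is orthogonal
print(all.equal(rowSums(L^2), rowSums(Lrot^2)))   # communalities are unchanged
```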



> W = fa$loadings

> m = 2

> A = W[, 1:m]

> ma = max(abs(A))

> plot(A[, 1], A[, 2], xlim = c(-ma, ma), ylim = c(-ma, ma), type = "n")

> arrows(rep(0, nrow(A)), rep(0, nrow(A)), A[, 1], A[, 2], length = 0)

> abline(h = 0, lty = 3)

> abline(v = 0, lty = 3)

> text(A[, 1], A[, 2], rownames(S), cex = 0.7)

Figure 18: Variable plot from ML factor analysis, with varimax rotation (horizontal axis A[, 1], vertical axis A[, 2], both from −1.0 to 1.0; variables CREIX, INF, ATUR, INT1, INT2)


8 Discriminant Analysis

NULL

      A     B   C   D Grou
1  1692  4968  29 139    1
2  3244  6710  31  85    1
3  2551  6895  41 121    1
4  2363  7164  28 100    1
5  1762  6734  14  58    1
6  1376  5241  16  80    1
7   739  3087  20  61    1
8  1323  4418   3  60    1
9  1002 13270  77 210    2
10 1038 11245  83 154    2
11  623 12338  93 122    2
12  903 11987 112 146    2
13 1068 11583  87 103    2
14  810 11691  85  92    2
15 1994  7569  55 133    2
16  604 13614 119 131    2
17 1828  9769  26  60    3
18  822  9283  13 139    3
19  962  6368  18  88    3
20 1708 10896  25  71    3
21 1247  8040  21  76    3
22 1450  6760  10 121    3
23 1085  8110  19  77    3
24 1300  8461  19  90    3

      A     B   C   D
8  1323  4418   3  60
16  604 13614 119 131
24 1300  8461  19  90

      A     B   C   D Grou
1  1692  4968  29 139    1
2  3244  6710  31  85    1
3  2551  6895  41 121    1
4  2363  7164  28 100    1
5  1762  6734  14  58    1
6  1376  5241  16  80    1
7   739  3087  20  61    1
9  1002 13270  77 210    2
10 1038 11245  83 154    2
11  623 12338  93 122    2
12  903 11987 112 146    2
13 1068 11583  87 103    2
14  810 11691  85  92    2
15 1994  7569  55 133    2
17 1828  9769  26  60    3
18  822  9283  13 139    3
19  962  6368  18  88    3
20 1708 10896  25  71    3
21 1247  8040  21  76    3
22 1450  6760  10 121    3
23 1085  8110  19  77    3

[1] 24 5

[1] 21 5

The following object(s) are masked from ndata ( position 4 ) :

A B C D Grou

<environment: 0x18c673c>
attr(,"name")
[1] "ndata"

[1] "MASS"      "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"

Call:
lda(Grou ~ A + B + C + D, data = ndata, prior = c(1/3, 1/3, 1/3))

Prior probabilities of groups:
        1         2         3
0.3333333 0.3333333 0.3333333

Group means:
         A         B        C         D
1 1961.000  5828.429 25.57143  92.00000
2 1062.571 11383.286 84.57143 137.14286
3 1300.286  8460.857 18.85714  90.28571

Coefficients of linear discriminants:
            LD1           LD2
A  3.088050e-04 -0.0010990240
B  6.440719e-05  0.0006804682
C -8.528876e-02 -0.0517612426
D -8.552957e-03 -0.0015027848

Proportion of trace:
   LD1    LD2
0.8364 0.1636

Grou 1 2 3
   1 6 0 1
   2 1 6 0
   3 0 0 7


> plot(predict(ld)$x, type = "n")

> text(predict(ld)$x, as.character(Grou), col = Grou)

> predict(ld, datagdesconegut)$class

[1] 1 2 3
Levels: 1 2 3

Figure 19: Discriminant space (horizontal axis LD1, from about −6 to 2; vertical axis LD2, from about −2 to 2; each of the 21 training observations is plotted with its group label 1, 2 or 3)
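The handout shows only the output of the discriminant analysis, not the commands that set it up. The steps can be sketched as follows, under the assumption that observations 8, 16 and 24 are set aside as the "unknown" cases (they are the rows missing from the 21-row data set) and the remaining rows are used for fitting. The data are typed in from the table above; the names dat, held_out, train and new are introduced for this example.

```r
library(MASS)  # lda() and its predict() method

dat <- data.frame(
  A = c(1692, 3244, 2551, 2363, 1762, 1376, 739, 1323,
        1002, 1038, 623, 903, 1068, 810, 1994, 604,
        1828, 822, 962, 1708, 1247, 1450, 1085, 1300),
  B = c(4968, 6710, 6895, 7164, 6734, 5241, 3087, 4418,
        13270, 11245, 12338, 11987, 11583, 11691, 7569, 13614,
        9769, 9283, 6368, 10896, 8040, 6760, 8110, 8461),
  C = c(29, 31, 41, 28, 14, 16, 20, 3,
        77, 83, 93, 112, 87, 85, 55, 119,
        26, 13, 18, 25, 21, 10, 19, 19),
  D = c(139, 85, 121, 100, 58, 80, 61, 60,
        210, 154, 122, 146, 103, 92, 133, 131,
        60, 139, 88, 71, 76, 121, 77, 90),
  Grou = factor(rep(1:3, each = 8)))

held_out <- c(8, 16, 24)                       # cases treated as unknown
train    <- dat[-held_out, ]                   # 21 observations used for fitting
new      <- dat[held_out, c("A", "B", "C", "D")]

ld <- lda(Grou ~ A + B + C + D, data = train, prior = c(1, 1, 1) / 3)
print(ld$means)                                # group means of the training data
print(table(train$Grou, predict(ld)$class))    # resubstitution classification table
print(predict(ld, new)$class)                  # predicted groups of the held-out cases
```

The resubstitution table classifies the same observations that were used to fit the rule, so it tends to be optimistic; the three held-out cases give a small independent check.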


Call:
   manova(cbind(A, B, C, D) ~ Grou)

Terms:
                     Grou Residuals
A                 1527902   7623070
B                24253881 132389541
C                     158     20658
D                      10     29411
Deg. of Freedom         1        19

Residual standard error: 633.4147 2639.672 32.97384 39.34393
Estimated effects may be unbalanced

 Response A :
            Df  Sum Sq Mean Sq F value Pr(>F)
Grou         1 1527902 1527902  3.8082 0.0659 .
Residuals   19 7623070  401214
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response B :
            Df    Sum Sq  Mean Sq F value Pr(>F)
Grou         1  24253881 24253881  3.4808 0.0776 .
Residuals   19 132389541  6967871
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response C :
            Df  Sum Sq Mean Sq F value Pr(>F)
Grou         1   157.8   157.8  0.1451 0.7075
Residuals   19 20658.2  1087.3

 Response D :
            Df  Sum Sq Mean Sq F value Pr(>F)
Grou         1    10.3    10.3  0.0066 0.9359
Residuals   19 29411.0  1547.9
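In the output above Grou has 1 degree of freedom, which is what R reports when the grouping variable is stored as a number (the codes 1, 2, 3 then enter as a single linear term). If the intention is to compare the three group mean vectors, Grou can be declared a factor, which gives 2 degrees of freedom. The lines below are a sketch under that assumption, with the 21 training observations typed in from the table above; the names dat, fit_num and fit_fac are introduced here.

```r
dat <- data.frame(
  A = c(1692, 3244, 2551, 2363, 1762, 1376, 739, 1323,
        1002, 1038, 623, 903, 1068, 810, 1994, 604,
        1828, 822, 962, 1708, 1247, 1450, 1085, 1300),
  B = c(4968, 6710, 6895, 7164, 6734, 5241, 3087, 4418,
        13270, 11245, 12338, 11987, 11583, 11691, 7569, 13614,
        9769, 9283, 6368, 10896, 8040, 6760, 8110, 8461),
  C = c(29, 31, 41, 28, 14, 16, 20, 3,
        77, 83, 93, 112, 87, 85, 55, 119,
        26, 13, 18, 25, 21, 10, 19, 19),
  D = c(139, 85, 121, 100, 58, 80, 61, 60,
        210, 154, 122, 146, 103, 92, 133, 131,
        60, 139, 88, 71, 76, 121, 77, 90),
  Grou = rep(1:3, each = 8))          # numeric group codes, as in the printout
ndata <- dat[-c(8, 16, 24), ]         # the 21 training observations

# Grou numeric: one linear term, 1 df for Grou and 19 residual df
fit_num <- manova(cbind(A, B, C, D) ~ Grou, data = ndata)

# Grou as a factor: three groups, 2 df for Grou and 18 residual df
fit_fac <- manova(cbind(A, B, C, D) ~ factor(Grou), data = ndata)

print(summary(fit_fac, test = "Wilks"))   # multivariate test of equal mean vectors
print(summary.aov(fit_fac))               # one univariate ANOVA per response
```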

8.1 Quadratic discriminant analysis

Call:
qda(Grou ~ A + B + C + D, data = ndata, prior = c(1/3, 1/3, 1/3))

Prior probabilities of groups:
        1         2         3
0.3333333 0.3333333 0.3333333

Group means:
         A         B        C         D
1 1961.000  5828.429 25.57143  92.00000
2 1062.571 11383.286 84.57143 137.14286
3 1300.286  8460.857 18.85714  90.28571


Grou 1 2 3
   1 6 0 1
   2 1 6 0
   3 0 0 7

[1] 1 2 3
Levels: 1 2 3
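As with the linear case, only the output of the quadratic analysis is shown. A minimal sketch of the corresponding calls, assuming the same split as before (observations 8, 16 and 24 held out) and the data typed in from the table in the previous section; dat, train, new and qd are names introduced for this example. Quadratic discriminant analysis estimates one covariance matrix per group, so with 7 observations and 4 variables per group the estimates rest on very little data.

```r
library(MASS)  # qda() and its predict() method

dat <- data.frame(
  A = c(1692, 3244, 2551, 2363, 1762, 1376, 739, 1323,
        1002, 1038, 623, 903, 1068, 810, 1994, 604,
        1828, 822, 962, 1708, 1247, 1450, 1085, 1300),
  B = c(4968, 6710, 6895, 7164, 6734, 5241, 3087, 4418,
        13270, 11245, 12338, 11987, 11583, 11691, 7569, 13614,
        9769, 9283, 6368, 10896, 8040, 6760, 8110, 8461),
  C = c(29, 31, 41, 28, 14, 16, 20, 3,
        77, 83, 93, 112, 87, 85, 55, 119,
        26, 13, 18, 25, 21, 10, 19, 19),
  D = c(139, 85, 121, 100, 58, 80, 61, 60,
        210, 154, 122, 146, 103, 92, 133, 131,
        60, 139, 88, 71, 76, 121, 77, 90),
  Grou = factor(rep(1:3, each = 8)))

train <- dat[-c(8, 16, 24), ]                        # 21 training observations
new   <- dat[c(8, 16, 24), c("A", "B", "C", "D")]    # held-out cases

qd  <- qda(Grou ~ A + B + C + D, data = train, prior = c(1, 1, 1) / 3)
tab <- table(train$Grou, predict(qd)$class)          # resubstitution classification table
print(tab)
print(predict(qd, new)$class)                        # predicted groups of the held-out cases
```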