
Dimension Reduction Methods

• statistical methods that provide information about point scatters in multivariate space
• “factor analytic methods”
  – simplify complex relationships between cases and/or variables
  – make it easier to recognize patterns
• identify and describe ‘dimensions’ that underlie the input data
  – these may be more fundamental than those directly measured, yet hidden from view
• reduce the dimensionality of the research problem
  – benefit = simplification; reduces the number of variables you have to worry about
• identify sets of variables with similar “behaviour”

How?

Basic Ideas

• imagine a point scatter in multivariate space:
  – the specific values of the numbers used to describe the variables don’t matter
  – we can do anything we want to the numbers, provided we don’t distort the spatial relationships that exist among cases
• some kinds of manipulations help us think about the shape of the scatter in more productive ways
• imagine a two-dimensional scatter of points that show a high degree of correlation…

[Figure: a correlated two-dimensional scatter of x and y values, centred on (x̄, ȳ), with an orthogonal regression line running through the long axis of the point cloud]
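
To make the idea concrete, here is a minimal numpy sketch (the data are synthetic, purely for illustration) showing that the direction of maximum variance in a centred two-dimensional scatter is the orthogonal regression line:

```python
import numpy as np

# Synthetic correlated scatter; illustrative only, not data from the slides.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 200)
y = 0.8 * x + rng.normal(0.0, 0.5, 200)
pts = np.column_stack([x, y])

# Centring on (x-bar, y-bar) moves the origin without distorting
# the spatial relationships among cases.
centred = pts - pts.mean(axis=0)

# Eigenvectors of the covariance matrix give the principal axes.
# The first axis minimizes perpendicular (not vertical) distances,
# i.e. it is the orthogonal regression line.
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
first_axis = eigvecs[:, -1]          # eigh sorts eigenvalues ascending
print("orthogonal regression slope:", first_axis[1] / first_axis[0])
```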

Why bother?

• a more “efficient” description
  – the 1st new variable captures the maximum variance
  – the 2nd captures the maximum amount of residual variance, at right angles (orthogonal) to the first
• the 1st variable may capture so much of the information content in the original data set that we can ignore the remaining axis

• other advantages…

• you can score original cases (and variables) in new space, and plot them…

• spatial arrangements may reveal relationships that were hidden in higher dimension space

• may reveal subsets of variables based on correlations with new axes…

[Figure: length and width measurements re-expressed on new “size” and “shape” axes]

[Figure: vessel categories (Storage/Cooking, Cooking, Ritual, Service?, candelero) arranged along PUBLIC–PRIVATE and DOMESTIC–RITUAL axes]

Principal Components Analysis (PCA)

why:
• clarify relationships among variables
• clarify relationships among cases

when:
• significant correlations exist among variables

how:
• define new axes (components)
• examine correlation between axes and variables
• find scores of cases on new axes

[Figure: scatters illustrating r = 1, r = −1, and r = 0; variables x1–x4 shown against new component axes pc1 and pc2, with the correlation between a variable and a component labelled as its component loading]

eigenvalue: sum of all squared loadings on one component

eigenvalues

• the sum of all eigenvalues = 100% of the variance in the original data
• the proportion accounted for by each eigenvalue = ev/n (n = # of vars.)
• with a correlation matrix, the variance in each variable = 1
  – if an eigenvalue < 1, it explains less variance than one of the original variables
  – but 0.7 may be a better threshold…
• ‘scree plots’ show the trade-off between loss of information and simplification (see the sketch below)
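
A minimal scikit-learn sketch of these ideas (the data are synthetic; any cases-by-variables table could stand in for X):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic cases-by-variables table; illustrative only.
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 1))
X = np.hstack([base + rng.normal(scale=0.5, size=(100, 1)) for _ in range(5)])

# Standardizing first means PCA works on the correlation matrix,
# so every original variable contributes a variance of 1.
Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_                 # variance on each component
loadings = pca.components_.T * np.sqrt(eigenvalues)   # variable-component correlations
scores = pca.transform(Z)                             # cases scored on the new axes

print("proportion per component:", pca.explained_variance_ratio_)  # scree-plot values
print("components passing the eigenvalue-1 rule:", int((eigenvalues > 1).sum()))
```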

Mandara Region knife morphology

J. Yellen – San ethnoarchaeology (1977)

• CAMP: the camp identification number (1–16)
• LENGTH: the total number of days the camp was occupied
• INDIVID: the number of individuals in the principal period of occupation of the camp (note that not all individuals were at the camp for the entire LENGTH of occupation)
• FAMILY: the number of families occupying the site
• ALS: the absolute limit of scatter; the total area (m²) over which debris was scattered
• BONE: the number of animal bone fragments recovered from the site
• PERS_DAY: the actual number of person-days of occupation (not the product of INDIVID × LENGTH, since not all individuals were at the camp for the entire time)
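
Before running PCA on a table like this, one would check the “when” condition above. A hedged pandas sketch (the file name is hypothetical; only the column names come from Yellen’s variable list):

```python
import pandas as pd

# Hypothetical file; the columns follow the variable list above.
camps = pd.read_csv("yellen_camps.csv")

# PCA pays off when significant correlations exist among variables,
# so inspect the correlation matrix first (CAMP is just an identifier).
print(camps.drop(columns="CAMP").corr().round(2))
```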

Correspondence Analysis (CA)

• like a special case of PCA — transforms a table of numerical data into a graphic summary
• hopefully a simplified, more interpretable display → deeper understanding of the fundamental relationships/structure inherent in the data
• a map of basic relationships, with much of the “noise” eliminated
• usually reduces the dimensionality of the data…
• derived from methods of contingency table analysis
  – most suited to the analysis of categorical data: counts, presence-absence data
• possibly better to use PCA for continuous (i.e., ratio) data
• but CA makes no assumptions about the distribution of the input variables…

CA – basic ideas

• simultaneously an R-mode and Q-mode analysis
• derives two sets of eigenvalues and eigenvectors (→ CA axes; analogous to PCA components)
• input data are scaled so that both sets of eigenvectors occupy very comparable spaces
• can reasonably compare both variables and cases in the same plots

CA output

• CA (factor) scores
  – for both cases and variables
• percentage of total inertia per axis
  – like variance in PCA; relates to the dispersal of points around an average value
  – the inertia accounted for by the plotted axes indicates how much distortion a graphic display involves
• loadings
  – correlations between rows/columns and axes
  – which of the original entities are best accounted for by which axis?

“mass”

• as in PCA, new axes maximize the spread of observations in rows/columns
  – spread is measured as inertia, not variance
  – based on a “chi-squared” distance, assessed separately for cases and variables (rows and columns)
• contributions to the definition of CA axes are weighted on the basis of row/column totals
  – e.g., with pottery counts from different assemblages, larger collections will have more influence than smaller ones

“Israeli political economic concerns”

residential codes:

As/Af (Asia or Africa)

Eu/Am (Europe or America)

Is/AA (Israel, dad lives in Asia or Africa)

Is/EA (Israel, dad lives in Europe or America)

Is/Is (Israel, dad lives in Israel)


“worry” codes:
ENR  Enlisted relative
SAB  Sabotage
MIL  Military situation
POL  Political situation
ECO  Economic situation
OTH  Other
MTO  More than one worry
PER  Personal economics

      As/Af  Eu/Am  Is/AA  Is/EA  Is/Is
ENR      61    104      8     22      5
SAB      70    117      9     24      7
MIL      97    218     12     28     14
POL      32    118      6     28      7
ECO       4     11      1      2      1
OTH      81    128     14     52     12
MTO      20     42      2      6      0
PER     104     48     14     16      9
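
As a concrete illustration, here is a from-scratch numpy sketch of the CA decomposition run on the worry-by-residence table above (the linear algebra follows the standard chi-squared/SVD formulation of CA; it is not tied to any particular package):

```python
import numpy as np

# Worry (rows) by residence (columns) contingency table from the slide.
rows = ["ENR", "SAB", "MIL", "POL", "ECO", "OTH", "MTO", "PER"]
cols = ["As/Af", "Eu/Am", "Is/AA", "Is/EA", "Is/Is"]
N = np.array([[ 61, 104,  8, 22,  5],
              [ 70, 117,  9, 24,  7],
              [ 97, 218, 12, 28, 14],
              [ 32, 118,  6, 28,  7],
              [  4,  11,  1,  2,  1],
              [ 81, 128, 14, 52, 12],
              [ 20,  42,  2,  6,  0],
              [104,  48, 14, 16,  9]], dtype=float)

P = N / N.sum()                 # correspondence matrix
r = P.sum(axis=1)               # row masses (bigger rows weigh more)
c = P.sum(axis=0)               # column masses

# Standardized residuals: chi-squared distances weighted by mass.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

inertia = sv**2                 # CA "eigenvalues"
print("share of total inertia per axis:", inertia / inertia.sum())

# Principal coordinates: rows and columns occupy comparable spaces,
# so both can be drawn on the same CA plot.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
```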

Ksar Akil – Up. Pal., Lebanon

Data > Frequency > COUNT

Statistics > Data Reduction > CA

[Correspondence plots of the Ksar Akil data, Dim(1) × Dim(2): cases 1–10 (rows) plotted together with artefact classes PC, BT, NC, BL, FB (columns)]

[2D plot of row and column coordinates, dimensions 1 × 2; input table (rows × columns): 10 × 5; standardization: row and column profiles. Dimension 1: eigenvalue .07609 (59.41% of inertia); Dimension 2: eigenvalue .04095 (31.97% of inertia). Row points: cases 1–10; column points: partCor, nonCor, flakeBd, blade, bladelet]

Multidimensional Scaling (MDS)

• aim: define a low-dimension space that preserves the distances between cases in the original high-dimension space…
• closely related to CA/PCA, but uses an iterative location-shifting procedure… (see the sketch below)
  – may produce a lower-dimension solution than CA/PCA
  – not simultaneously Q and R mode…
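
A minimal scikit-learn sketch contrasting metric and non-metric MDS (the data are synthetic; any cases-by-variables table could stand in for X):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Synthetic cases-by-variables table; illustrative only.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 6))
D = squareform(pdist(X))        # distances in the original 6-D space

# Metric MDS tries to preserve the distances themselves; non-metric MDS
# preserves only their rank order, iteratively shifting point locations.
metric = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
nonmetric = MDS(n_components=2, metric=False, dissimilarity="precomputed",
                random_state=0)

coords_metric = metric.fit_transform(D)
coords_nonmetric = nonmetric.fit_transform(D)

# Stress measures how badly the 2-D map distorts the original distances.
print("metric stress:", metric.stress_)
print("non-metric stress:", nonmetric.stress_)
```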

[Figure: four points A–D plotted on EAST × NORTH coordinates, with their orderings (A B C D) used to contrast ‘non-metric’ and ‘metric’ MDS]

[Tree diagram for 21 cases (Ward’s method, Euclidean distances; linkage distance 0–300): Atsinna, LPescadoSpr, Cienega, PuebloMuerto, PescadoW, RainbowSpr, Tinaja, NAtsinna, MirabalRuin, Gigantes, DayRanch, JacksLake, UpperSoldado, BoxS, ScribeS, MillerCanyon, Spier61, UPescadoSpr, Spier81, YellowHouse, HeshYalawa]

[2D plot of row coordinates, dimensions 1 × 2; input table (rows × columns): 21 × 10; standardization: row and column profiles. Dimension 1: eigenvalue .43072 (45.49% of inertia); Dimension 2: eigenvalue .23744 (25.08% of inertia). Row points are the 21 sites listed above]

[Scatterplot 2D: MDS final configuration, dimension 1 vs. dimension 2, for the same 21 sites]

[Shepard diagram: original distances (“Data”) plotted against the fitted configuration distances/D-hats]
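
A Shepard diagram is easy to build from any fitted MDS configuration; a matplotlib sketch (synthetic data again, standing in for any distance table):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Synthetic table; illustrative only.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 6))
D = squareform(pdist(X))

coords = MDS(n_components=2, metric=False, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# Original dissimilarities ("Data") vs. distances in the fitted
# configuration ("D-hats"); a tight, monotone scatter = little distortion.
plt.scatter(pdist(X), pdist(coords), s=12)
plt.xlabel("Data (original dissimilarities)")
plt.ylabel("Distances / D-hats")
plt.title("Shepard diagram")
plt.show()
```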

Discriminant Function Analysis (DFA)

• aims:
  – calculate a function that maximizes the ability to discriminate among 2 or more groups, based on a set of descriptive variables
  – assess variables in terms of their relative importance and relevance to discrimination
  – classify new cases not included in the original analysis

[Figure: two groups plotted on var A × var B, separated by a discriminant function]

DFA

• number of DFs = number of groups − 1
  – each subsequent function is orthogonal to the last
  – each is associated with an eigenvalue that reflects how much ‘work’ the function does in discriminating between groups
• stepwise vs. complete DFA (see the sketch below)
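
A minimal scikit-learn sketch of these ideas, using LinearDiscriminantAnalysis as a stand-in for DFA (the three groups and four variables are synthetic):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three synthetic groups described by four variables; illustrative only.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in (0.0, 1.0, 2.0)])
groups = np.repeat([0, 1, 2], 30)

# With 3 groups there are at most groups - 1 = 2 discriminant functions.
dfa = LinearDiscriminantAnalysis(n_components=2).fit(X, groups)

scores = dfa.transform(X)     # cases scored on the discriminant functions
print("discriminating 'work' per function:", dfa.explained_variance_ratio_)

# Classify a new case that was not part of the original analysis.
new_case = rng.normal(loc=1.0, size=(1, 4))
print("predicted group:", int(dfa.predict(new_case)[0]))
```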

Figure 6.5: Factor structure coefficients. These values show the correlation between Miccaotli ceramic categories and the first two discriminant functions. Categories exhibiting high positive or negative values are the most important for discriminating among A-clusters.

[Plot: case scores on Function 1 × Function 2 for Miccaotli A-clusters MC:acls4/1 through MC:acls4/4, annotated with the ceramic categories loading on each function: + outcurving bowl, + cazuela/crater, (comales, other fine-wares), + olla, + outcurving bowl (ollas, other fine-wares), + cazuela/crater]

Figure 6.4: Case scores calculated for the first two functions generated by discriminant analysis, using Miccaotli A-cluster membership as the grouping variable and posterior estimates of ceramic category proportions as discriminating variables.

Figure 6.6: Factor structure coefficients generated by four separate DFA analyses using binary grouping variables derived from Miccaotli A-cluster memberships. A single discriminant function is associated with each A-cluster.