Principal Component Analysis: Preliminary Studies
Émille E. O. Ishida
IF - UFRJ
First Rio-Saclay Meeting: Physics Beyond the Standard Model
Rio de Janeiro - Dec/2006
The main objective of Physics, Statistics and Science: Simplification
Statistics is the art of extracting simple comprehensible facts that tell us what we want to know for practical reasons
Principal Component Analysis (PCA) is a tool for simplifying one particular class of data......
–astro-ph/9905079
For example... n objects and p things we know about them...
For each object: -height; -n° publications; -flier miles; -fuel consumption;
n = 6 objects and p = 4 things we know about them...
How are these parameters related to each other?
–astro-ph/9905079
For example...
Do people who spend most of their lives in airports publish more?
Do people with inefficient cars fly more..... or just the ones with lots of publications do?
Do these correlations represent any real causal connection?
Or... once you buy a car, do you stop publishing and give lots of talks in exotic foreign locations?
–astro-ph/9905079
First try: Plot everything against everything else...
...as the number of parameters increases this becomes impossibly complicated!
PCA looks for sets of parameters that always correlate together
The first application of PCA was in social science....
Ex: give a sample of n people a set of p exams testing their creativity, memory, math skills... and look for correlations.
Result: nearly all tests correlate with each other, indicating that one underlying variable could predict the performance in all tests.
IQ... an infamous beginning!
–astro-ph/9905079
General Idea:
Given a sample of:
n objects;
p measured quantities $x_i$ (i = 1, 2, 3, ..., p)
Find a new set of p orthogonal variables $\xi_1, \ldots, \xi_p$, each a linear combination of the original ones:
$$\xi_i = a_{i1} x_1 + \cdots + a_{ij} x_j + \cdots + a_{ip} x_p$$
Determine aij such that the smallest number of new variables account for as much of the sample variance as possible.
Principal Components
–astro-ph/9905079
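Basic Statistics
Sample 1: $x = \{x_1, \ldots, x_n\}$
Sample 2: $\{x_1, y_1\}, \ldots, \{x_n, y_n\}$
Mean Value:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Variance:
$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Covariance:
$$\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$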
http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
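The sample mean, variance and covariance can be checked numerically. The following is a minimal sketch that assumes the n - 1 normalisation for variance and covariance; the arrays `x` and `y` are made-up toy values, not data from the talk.

```python
import numpy as np

# Hypothetical toy sample: n = 6 objects, two measured quantities each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0, 7.0])
n = len(x)

# Mean value: sum of the measurements divided by n.
mean_x = x.sum() / n
mean_y = y.sum() / n

# Variance and covariance with the (n - 1) normalisation assumed here.
var_x = ((x - mean_x) ** 2).sum() / (n - 1)
cov_xy = ((x - mean_x) * (y - mean_y)).sum() / (n - 1)

print(mean_x, var_x, cov_xy)
```

The hand-written sums agree with `np.var(x, ddof=1)` and `np.cov(x, y)`, which use the same normalisation.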
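Covariance Matrix in 2-D
$$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) \end{pmatrix}$$
Eigenvectors → new axes (new uncorrelated variables $\xi_i$)
Eigenvalues → variances in the direction of the Principal Components
The largest eigenvalue → First Principal Component
$$\xi_i = \sum_{j=1}^{p} b_{ij} x_j$$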
http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
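The eigenvector recipe can be sketched in a few lines of NumPy. This is an illustrative sketch only: the data set is randomly generated (n = 200 objects, p = 2 strongly correlated quantities), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two quantities driven by one underlying variable t.
t = rng.normal(size=200)
data = np.column_stack([t + 0.1 * rng.normal(size=200),
                        2.0 * t + 0.1 * rng.normal(size=200)])

# Centre the data and build the p x p covariance matrix.
centered = data - data.mean(axis=0)
C = np.cov(centered, rowvar=False)

# Eigenvectors are the new axes; eigenvalues are the variances along them.
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the new axes: the principal components.
pcs = centered @ eigvecs
explained = eigvals / eigvals.sum()           # fraction of variance per component
print(explained)
```

Because both columns follow the same underlying variable, almost all of the variance should end up in the first principal component, and the covariance matrix of `pcs` is diagonal.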
But..... that's not our case....
We want to make inferences about a model using a sample of data....
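Parameter Estimation
Estimator: $\hat{\theta}$
Consistency:
$$\lim_{\text{data} \to \infty} \hat{\theta} = \theta$$
Bias:
$$b = E[\hat{\theta}] - \theta$$
Efficiency:
$$\sigma^2_{\min} = \frac{\left(1 + \partial b/\partial\theta\right)^2}{I(\theta)}, \qquad I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\sum_{i=1}^{n} \ln f(x_i;\theta)\right)^2\right]$$
Robustness: independence of initial assumptions in the pdf (noise)
http://pdg.lbl.gov/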
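The Method of Maximum Likelihood
Sample: $x = (x_1, \ldots, x_m)$
Parameters: $\theta = (\theta_1, \ldots, \theta_n)$
pdf: $f(x_i; \theta)$
Likelihood Function:
$$L(\theta) = \prod_{i=1}^{m} f(x_i; \theta)$$
$\hat{\theta}_i$: values of $\theta_i$ that maximize the Likelihood Function
http://pdg.lbl.gov/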
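As an illustration of maximizing a likelihood, the sketch below scans ln L on a grid for a one-parameter exponential pdf, f(x; tau) = exp(-x/tau)/tau. The pdf, the simulated sample and the grid are all assumptions chosen for the example; for this pdf the maximum is known to sit at the sample mean, which gives a simple check.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample drawn from an exponential pdf with true tau = 2.
x = rng.exponential(scale=2.0, size=500)

def log_likelihood(tau, x):
    # ln L(tau) = sum_i ln f(x_i; tau), with f(x; tau) = exp(-x / tau) / tau
    return np.sum(-np.log(tau) - x / tau)

# Scan ln L on a grid of tau values (step 0.001) and take the maximum.
taus = np.linspace(0.5, 5.0, 4501)
lnL = np.array([log_likelihood(t, x) for t in taus])
tau_hat = taus[np.argmax(lnL)]

print(tau_hat, x.mean())
```

A grid scan is only practical for one or two parameters; with more parameters a numerical optimiser would normally replace it.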
For an unbiased estimator....
$$(\hat{C}^{-1})_{ij} = -\left.\frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j}\right|_{\hat{\theta}}$$
We can calculate the covariance between the parameters of the theory
Fisher Matrix
$$F_{ij} = (C^{-1})_{ij} = -\frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j}$$
http://pdg.lbl.gov/
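The relation between the second derivatives of ln L and the parameter covariance can be tried out numerically. The sketch below assumes a Gaussian pdf with parameters (mu, sigma) and a simulated sample, and evaluates the second derivatives by central finite differences at the maximum-likelihood point; the function name `fisher_matrix` and the step size `h` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical Gaussian sample; the parameters of the "theory" are (mu, sigma).
x = rng.normal(loc=1.0, scale=0.5, size=1000)

def log_likelihood(theta, x):
    mu, sigma = theta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

# Maximum-likelihood estimates for a Gaussian: sample mean and std (ddof=0).
theta_hat = np.array([x.mean(), x.std()])

def fisher_matrix(theta, x, h=1e-3):
    # F_ij = -d^2 ln L / (d theta_i d theta_j), by central finite differences.
    k = len(theta)
    F = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            d2 = (log_likelihood(theta + ei + ej, x)
                  - log_likelihood(theta + ei - ej, x)
                  - log_likelihood(theta - ei + ej, x)
                  + log_likelihood(theta - ei - ej, x)) / (4 * h * h)
            F[i, j] = -d2
    return F

F = fisher_matrix(theta_hat, x)
C = np.linalg.inv(F)      # covariance matrix of the parameters
print(F)
print(C)
```

For this pdf the analytic result at the maximum is diagonal, with entries n/sigma^2 and 2n/sigma^2, so the numerical matrix can be compared against it.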
What about Cosmology?
Direct evidence for an accelerated expansion:
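Can we get information out of SN Ia observations without the assumption of General Relativity?
$$ds^2 = dt^2 - a^2(t)\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right]$$
$$q(z) \equiv -\frac{\ddot{a}}{aH^2} = \frac{d}{dt}\left(\frac{1}{H}\right) - 1$$
$$H(z) = H_0 \exp\left[\int_0^z \left[1 + q(u)\right] d\ln(1+u)\right]$$
Distance modulus:
$$\mu(z) = 25 + 5\log_{10}\left\{\frac{1+z}{H_0\,\mathrm{Mpc}}\int_0^z du\, \exp\left[-\int_0^u \left[1 + q(v)\right] d\ln(1+v)\right]\right\}$$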
Definitions....
$$\mu(z) = m_B(z) - M_B = 5\log_{10}\left(\frac{d_L}{\mathrm{Mpc}}\right) + 25$$
$$d_L(z) = (1+z)\int_0^z \frac{du}{H(u)}$$
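As proposed by Shapiro & Turner (2006)...
$$\mu(z) = 25 + 5\log_{10}\left\{\frac{1+z}{H_0\,\mathrm{Mpc}}\int_0^z du\, \exp\left[-\int_0^u \left[1 + q(v)\right] d\ln(1+v)\right]\right\}$$
$$q(z) = \sum_{i=1}^{N} \beta_i\, c_i(z), \quad \text{where } c_i(z) = \begin{cases} 1, & z \text{ inside the } i\text{-th redshift bin} \\ 0, & \text{otherwise} \end{cases}$$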
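The distance modulus for a piecewise-constant q(z) can be evaluated by direct numerical integration. The sketch below is written under several assumptions that are not in the slides: H0 = 70 km/s/Mpc, the speed of light restored explicitly (the slide formula uses c = 1), equal-width bins starting at z = 0, and a simple trapezoidal rule. The names `q_piecewise`, `cumtrapz` and `distance_modulus` are hypothetical helpers.

```python
import numpy as np

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # km/s/Mpc, an assumed value for illustration only

def q_piecewise(z, betas, dz=0.05):
    # q(z) = sum_i beta_i c_i(z): equal to beta_i inside the i-th bin of width dz.
    idx = np.minimum((np.asarray(z) / dz).astype(int), len(betas) - 1)
    return np.asarray(betas)[idx]

def cumtrapz(y, x):
    # Cumulative trapezoidal integral of y(x), starting from 0 at x[0].
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate([[0.0], np.cumsum(steps)])

def distance_modulus(z_max, betas, dz=0.05, n_grid=20001):
    u = np.linspace(0.0, z_max, n_grid)
    # Inner integral: int_0^u [1 + q(v)] d ln(1 + v)
    inner = cumtrapz(1.0 + q_piecewise(u, betas, dz), np.log1p(u))
    inv_E = np.exp(-inner)                                  # H0 / H(u)
    d_L = (1.0 + u) * (C_KM_S / H0) * cumtrapz(inv_E, u)    # in Mpc
    with np.errstate(divide="ignore"):                      # d_L = 0 at z = 0
        mu = 25.0 + 5.0 * np.log10(d_L)
    return u, mu

# Hypothetical coefficients: 20 bins of width 0.05, all with q = -0.5.
u, mu = distance_modulus(1.0, betas=[-0.5] * 20)
print(mu[-1])
```

Two limiting cases give a check on the integration: for q = 0 everywhere the luminosity distance reduces to (1+z)(c/H0) ln(1+z), and for q = -1 everywhere to (1+z)(c/H0) z.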
Distance Modulus
Δz = 0.05; data from Gold Sample (Riess et al.)
Gaussian probability distribution in each bin...
$$f(x_i; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$$
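The Fisher Matrix
$$F_{kl} = \sum_{j=1}^{N} \frac{1}{\sigma_j^2} \sum_{i=1}^{N_j} \left[\frac{\partial\mu_j}{\partial\beta_k}\frac{\partial\mu_j}{\partial\beta_l} - (x_{ji} - \mu_j)\frac{\partial^2\mu_j}{\partial\beta_k\,\partial\beta_l}\right]$$
where the derivatives are taken with respect to the coefficients $\beta_k$ of $q(z)$, $x_j = (x_1, x_2, \ldots, x_{N_j})$ and $N_j$ denotes the number of SN inside the $j$-th bin.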
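Observation about $\sigma_j$: the uncertainty assigned to the $j$-th bin is computed from the individual uncertainties $\sigma_i$ of the $N$ SN inside it.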
[Figure: principal components PC1 to PC6]
Reconstruction of q(z)
We need more data!
–arXiv:astro-ph/0512586
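A common way to obtain principal components from a Fisher matrix, as in the Huterer & Starkman approach listed in the references, is to diagonalise it: the eigenvectors give uncorrelated combinations of the bin coefficients, and each eigenvalue is the inverse variance of the corresponding combination. The sketch below uses a made-up 3 x 3 Fisher matrix, not one computed from supernova data.

```python
import numpy as np

# Hypothetical Fisher matrix for three redshift bins
# (symmetric and positive definite; not derived from real SN data).
F = np.array([[40.0, 12.0, 3.0],
              [12.0, 10.0, 2.0],
              [3.0, 2.0, 1.5]])

# Diagonalise: the columns of W are the principal components.
eigvals, W = np.linalg.eigh(F)
order = np.argsort(eigvals)[::-1]      # best-constrained mode first
eigvals, W = eigvals[order], W[:, order]

# Each eigenvalue is an inverse variance, so sigma = 1 / sqrt(eigenvalue).
sigmas = 1.0 / np.sqrt(eigvals)

for k in range(len(sigmas)):
    print(f"PC{k + 1}: sigma = {sigmas[k]:.3f}, weights = {W[:, k]}")
```

Modes with large `sigmas` are poorly constrained, which is one way to read the remark that more data are needed.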
Next Steps....
Small corrections in the present code (optimization);
Change the observable;
Get used to this procedure and be able to handle large data sets in a model independent way
References
– D. Huterer and G. Starkman, Parametrization of dark energy properties: A Principal-Component Approach, Physical Review Letters 90 (3), January 2003
– C. Shapiro and M. S. Turner, What do we really know about cosmic acceleration?, arXiv:astro-ph/0512586
– G. Cowan, Statistical Data Analysis, Clarendon Press, Oxford (1998)
– P. J. Francis and B. J. Wills, Introduction to Principal Component Analysis, arXiv:astro-ph/9905079
– W.-M. Yao et al., Journal of Physics G 33, 1 (2006), available on the PDG WWW pages (URL: http://pdg.lbl.gov/)
–arXiv:astro-ph/0512586
Shapiro & Turner (2006) Principal Components