machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf ·...

120
Machine Learning SS: Kyoto U. Information Geometry and Its Applications to Machine Learning Shun-ichi Amari RIKEN Brain Science Institute

Transcript of machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf ·...

Page 1: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Machine Learning SS: Kyoto U.

Information Geometryand Its Applications toMachine Learning

Shun-ichi AmariRIKEN Brain Science Institute

Page 2: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information Geometry

-- Manifolds of Probability Distributions

{ ( )}M p= x

Page 3: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information GeometryInformation Geometry

Systems Theory Information Theory

Statistics Neural Networks

Combinatorics PhysicsInformation Sciences

Riemannian ManifoldDual Affine Connections

Manifold of Probability Distributions

Math. AIVision

Optimization

Page 4: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ){ } ( ) ( )2

2

1; , ; , exp22

xS p x p x

μμ σ μ σ

σπσ

⎧ ⎫−⎪ ⎪= = −⎨ ⎬⎪ ⎪⎩ ⎭

Information Geometry ?Information Geometry ?

( ){ }p xσ

μ

( ){ };S p x= θ

Gaussian distributions

( , )μ σ=θ

Page 5: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Manifold of Probability DistributionsManifold of Probability Distributions

( )1 2 3 1 2 3

1, 2,3 ={ ( )}, , 1

nx S p xp p p p p p

=

= + + =

3p

2p1p

p

( ){ };M p x= θ

Page 6: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

InvarianceInvariance ( ){ },S p x= θ

Invariant under different representation

( ) ( ), ,=y y x p ysufficient statistics

θ( ) ( )

2

1 2

21 2

, ,

| ( , ) ( , ) |

p x p x dx

p y p y dy

θ θ

θ θ

≠ −

∫∫

Page 7: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Two Geometrical StructuresTwo Geometrical StructuresRiemannian metric affine connection --- geodesic

( )2ij i jds g d dθ θ= ∑ θ

Fisher information

log logiji j

g E p pθ θ

⎡ ⎤∂ ∂= ⎢ ⎥

∂ ∂⎢ ⎥⎣ ⎦

Orthogonality: innner product

1 2 1 2, Td d d Gdθ θ θ θ< >=

Page 8: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Affine Connectioncovariant derivative; parallel transport

,

geodesic X=X X=X(t)

( )

X c

i jij

Y X Y

s g d dθ θ θ

∇ Π =

Π

= ∑∫minimal distance

& &

straight line

Page 9: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Duality: two affine connectionsDuality: two affine connections

, , , i jijX Y X Y X Y g X Y∗= Π Π < >= ∑

Riemannian geometry: ∗∏ = ∏

X

Y

X

{ , , , *}S g ∇ ∇

Page 10: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Dual Affine Connections

e-geodesic

m-geodesic

( ) ( ) ( ) ( ) ( )log , log 1 logr x t t p x t q x c t= + − +

( ) ( ) ( ) ( ), 1r x t tp x t q x= + −

( ), ∗∇ ∇

( )q x

( )p x

*( , )Π Π

( ) 0* ( ) 0x

x

x tx t

∇ =∇ =

&

&

&

&

Page 11: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Mathematical structure of ( ){ },S p x= ξ

( )( )

ij i j

ijk i j k

g E l l

T E l l l

ξ

ξ

⎡ ⎤= ∂ ∂⎣ ⎦⎡ ⎤= ∂ ∂ ∂⎣ ⎦

( )log , ; i il p xξ∂

= ∂ =∂

ξ

-connection

{ }, ;ijk ijki j k Tα αΓ = −

α α−∇ ↔ ∇ : dually coupled

, , ,X XX Y Z Y Z Y Z∗= ∇ + ∇

α

{M,g,T}

Page 12: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Divergence: [ ]:D z y

[ ]

[ ]

[ ]

: 0

: 0, iff

: ij i j

D

D

D d g dz dz

= =

+ = ∑

z y

z y z y

z z z

positive‐definite

Z

Y

M

Page 13: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Kullback-Leibler Divergencequasi-distance

( )[ ( ) : ( )] ( ) log( )

[ ( ) : ( )] 0 =0 iff ( ) ( )[ : ] [ : ]

x

p xD p x q x p xq x

D p x q x p x q xD p q D q p

=

≥ =≠

Page 14: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

[ ]: 0if i

i

qD p fp

⎛ ⎞= ≥⎜ ⎟

⎝ ⎠∑ %

% % %%

p q [ ]: 0fD = ⇔ =% % % %p q p q

( ) ( ) ( )not invariant under 1f u f u c u= − −%

divergence of f S%

{ }, 0 : ( 1 holds)i iS p p nn= > =∑% % % %p

Page 15: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

divergence

α

1 12 21 1[ : ] { }

2 2i i i iD p q p q p qα α

αα α − +− +

= + −∑% % % % % %

[ : ] { log }ii i i

i

pD p q p p qq

= + −∑ %% % % % %

%

KL-divergence

α

Page 16: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( , ) divergenceα β −

, [ : ] { }i i i iD p q p q p qα β α β α βα β

α βα β α β

+ += + −+ +∑

: divergence1: -divergence

β α αα β

= − −=

Page 17: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Metric and Connections Induced by Divergence(Eguchi)

( ) [ ] [ ] ( )

( ) [ ]

( ) [ ]

'

' '

1: : : = 2

:

:

=

=

∗=

= ∂ ∂ +

Γ = −∂ ∂ ∂

Γ = −∂ ∂ ∂

ij i j ij i j

ijk i j k

ijk i j k

g D D d g dz dz

D

D

y z

y z

y z

z z y z z z z

z z y

z z y

*

'

{ , }

, i ii iz y

∇ ∇

∂ ∂∂ = ∂ =

∂ ∂

Riemannian metric

affine connections

Page 18: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Duality:

{ }, ,

k ij kij kji

ijk ijk ijk

g

T

M g T

∂ = Γ + Γ

Γ = Γ −

*, , ,X XX Y Z Y Z Y Z< >=< ∇ > + < ∇ >

Page 19: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Dually flat manifoldexponential family; mixture family; {p(x); x discrete}

-coordinates : affine coordinates, flat, geodesics

-coordinates: (dual) affine coordinates, flat, geodesics

canonical divergence D(P: P') :

θ

η

b

KL divergencenot Riemannian flat

( ) ( ){ }, exp : exponential familyθ θ ψ θ= −∑ i ip x x

Page 20: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ) ( )

( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( ){ }

2 2

potential functions ,

; :

0

, exp : exponential family

: cumulant generating function: negative en

ψ θ ϕ η

η ψ θ θ ϕ θθ η

ψ θ ϕ η θη

θ ψ θ ϕ θθ θ η η

θ θ ψ θ

ψϕ

∂ ∂= =

∂ ∂

+ − =

∂ ∂= =

∂ ∂ ∂ ∂

= −

L

  i ii i

i i

ijij

i j i j

i i

g g

p x x

Legendre transformation

( ) ( )tropy

canonical divergence D(P: P')= ' 'ψ θ ϕ η θη+ − ∑ i i

Dually flat manifold

Page 21: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Manifold with Convex Function

S : coordinates ( )1 2, , , nθ θ θ= Lθ

( )ψ θ : convex function

negative entropy ( ) ( ) ( )logp p x p x dxϕ = ∫

( ) ( )212

iψ θ= ∑θ

Page 22: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Riemannian metric and flatness(affine structure)Bregman divergence

( ) ( ) ( ) ( ), grad D ψ ψ′ ′ ′= − − ⋅θ θ θ θ θ θ

( ) ( )1,2

i jijD d g d dθ θ+ = ∑θ θ θ θ

( ) , ij i j i ig ψθ∂

= ∂ ∂ ∂ =∂

θ

θ : geodesic (not Levi-Civita)Flatness (affine)

{ , ( ), }S ψ θ θ

Page 23: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Legendre Transformation

( )i iη ψ= ∂ θ

↔θ η one-to-one

( ) ( ) 0iiϕ ψ θ η+ − =η θ

( ) ,i i i

i

θ ϕη∂

= ∂ ∂ =∂

η

( ) ( ) ( ),D ψ ϕ′ ′ ′= + − ⋅θ θ θ η θ η

( ) max { ( )}iiθϕ η θ η ψ θ= −

Page 24: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Two affine coordinate systems ( ),θ η

θ : geodesic (e-geodesic)

η : dual geodesic (m-geodesic)

“dually orthogonal”,

,

j ji i

ii i

i

δ

θ η

∂ ∂ =

∂ ∂∂ = ∂ =

∂ ∂

*, , ,X XX Y Z Y Z Y Z< >=< ∇ > + < ∇ >

Page 25: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Pythagorean Theorem (dually flat manifold)

[ ] [ ] [ ]: : :D P Q D Q R D P R+ =

Euclidean space: self-dual =θ η

( ) ( )212 iψ θ θ= ∑

Page 26: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Projection Theorem

[ ]min :Q M

D P Q∈

Q = m-geodesic projection of P to M

[ ]min :Q M

D Q P∈

Q’ = e-geodesic projection of P to M

Page 27: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Two Types of DivergenceInvariant divergence (Chentsov, Csiszar)

f-divergence: Fisher- structure

Flat divergence (Bregman) – convex function

KL-divergence belongs to both classes: flat and invariant

∫q(x)D[p : q] = p(x)f{ }dxp(x)

α

Page 28: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

dually flat space

convex functionsBregman

divergence

invariance

invariant divergence Flat divergence

KL‐divergenceF-divergenceFisher inf metricAlpha connection

: space of probability distributions}{p=S

log∫p(x)D[p : q] = p(x) { }dxq(x)

Page 29: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

{ }, 0 : ( 1 holds)i iS p p nn= > =∑% % % %p

Space of positive measures :  vectors, matrices, arrays

f‐divergence

α‐divergence

Bregman divergence

Page 30: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Applications of Information Geometry

Statistical InferenceMachine Learning and AIComputer VisionConvex ProgrammingSignal Processing (ICA; Sparse)Information Theory, Systems TheoryQuantum Information Geometry

Page 31: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Applications to Statisticscurved exponential family:

( ) ( ) ( )( ){ }, expp x u u uθ ψ θ= ⋅ −x

1

1=

= ∑n

kk

x xn

: estimator

u

ˆ xη =

1, 2( , ) ,... np x u x x x ( , ) exp{ ( )}p x xθ θ ψ θ= ⋅ −

1ˆ( ,..., )nu x x

Page 32: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

x : discrete X = {0, 1, …, n}

0 1

0 0

{ ( ) | }:

( ) ( ) exp[ ( )]

log log ; ( ); ( ) log

( , )

n

n ni

i i ii i

ii i i

S p x x X

p x p x x

p p x x p

p x u

δ θ ψ θ

θ δ ψ θ= =

= ∈

= = −

= − = = −

∑ ∑

exponential family

statistical model :

Page 33: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

High-Order AsymptoticsHigh-Order Asymptotics

( )( )

1

1

, (u) : , ,

u u , ,n

n

p x x x

x x=

L

L

θ

( )( )ˆ ˆ Te E u u u u⎡ ⎤= − −⎣ ⎦

1 22

1 1e G Gn n

= +

11G G−≥ :Cramér-Rao: linear theory

( ) ( ) ( )2 2 2

2e m m

M AG H H= + + Γ

:

u

ˆ xη =

quadratic approximation

Page 34: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information Geometryof

Belief Propagation

• Shun-ichi Amari (RIKEN BSI)• Shiro Ikeda (Inst. Statist. Math.)• Toshiyuki Tanaka (Kyoto U.)

Page 35: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Stochastic Reasoning

( , , , , )p x y z r s

( , , | , ), , ,... 1, 1p x y z r sx y z = −

Page 36: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Stochastic Reasoningq(x1,x2,x3,…| observation)

X= (x1 x2 x3 …..) x = 1, -1

X= argmax q(x1, x2 ,x3 ,…..) maximum likelihood

Xi = sgn E[xi] least bit error rate estimator

Page 37: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Mean Value Marginalization: projection to independent distributions

0 1 1 2 2 0( ) ( ) ( )... ( ) ( )n nq q x q x q x qΠ = =x x

1, 1( ) ( ..., ) .. ..i i n i nq x q x x dx dx dx= ∫(

0[ ] [ ]q qη = =E x E x

Page 38: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ) ( )

( ) ( ){ } ( )

( ) { }

1

1

1

1 2

exp

,

1, 1

e p

,

x

s

ij

L

i j i

i i r qr

r r i i s

i

i

q k x c

c c x x r

q w x x h x

i i

x r i i

ψ

ψ

=

⎧ ⎫= ⋅ + −⎨ ⎬⎩ ⎭

=

= −

=

+

=

= −

L L

x

x x

x

Boltzmann machine, spin glass, neural networksTurbo Codes, LDPC Codes

Page 39: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Computationally DifficultComputationally Difficult

( ) [ ]( ) ( ){ }exp r q

q E

q c

η

ψ

→ =

= −∑x x

x x

mean-field approximation

belief propagation

tree propagation, CCCP (convex-concave)

Page 40: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information Geometry ofMean Field Approximation

• m-projection• e-projection

D[q:p]=

q = argmin D[q:p]q = argmin D[p:q]

( )( )log( )x

q xq xp x∑

0eΠ

0mΠ

0 { ( )}i i iM p x= Π0( ) ∈p x M

Page 41: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information GeometryInformation Geometry

( ){ } { }( ) ( ){ }{ }

0 0 0, exp

, exp

ψ

ψ

= = ⋅ −

= = + ⋅ −r r r r r r

M p

M p cξ ξ

x x

x x x

θ θ

1, ,r L= L

( )q x

rM

'rM

0Mθ

( ) exp{ ( )rq x c x φ= − }∑

Page 42: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

0

1 t0

1 1

( , )

( , ) : belief for ( )+

+ +

Π

= Π −

= ∑

tr r

t tr r r r r

t tr

p x

p x c x

ξ

θ ξ ξ

θ θ

Belief PropagationBelief Propagation

( ) ( ){ }: , exp ψ= + ⋅ −r r r r r rM p cξ ξx x x

Page 43: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Belief Prop Algorithm

0M

rM

'rM

'rς

rςΠ

'rς

Page 44: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Equilibrium of BPEquilibrium of BP ( ),∗ ∗rξθ

1) m-condition

( )*0 ,∗ = Π r rp ξθ x

( )-flat submanifold m M θ∗

rM

'rM

0M

rM

'rM

0M

q2) e-condition

*11

∗ ∗=− ∑ r rL

θ ξ

( ) -flat submanifold q e∈x

ξ1( ')ξ θ

Page 45: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ) [ ] [ ]1 0 0, , , : :L rF D p q D p pθ ζ ζ = −∑L

critical point

0 : -condition

0 : -conditionr

F e

F m

∂=

∂∂

=∂

θ

ζnot convex

Free energy:Free energy:

Page 46: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Belief Propagatione-condition OK

CCCP m-condition OK

1 2

1 2 1 2

1( ; , , , ' '1

( , , ) ( ' , ' , ' )

, ... ) =−

, ... → ,...

∑L r

L L

Lθ ξ ξ ξ θ ξ

ξ ξ ξ ξ ξ ξ

( ) ( )1 1 1

0 0

'( '), ( '),..., ( ')' , ' : ' , '

= Π = Πr r rp p

θ θξ θ ξ θ ξ θθ ξ ξ θx x

Page 47: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( )1 1 10

1 1

,

ξ

+ + +

+ +

= Π = Π

= − ∑

t t tr r r

t t tr

p

L

ξ θ θ

θ θ

x

Page 48: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Convex-Concave ComputationalProcedure (CCCP) Yuille

1 21

1 2

( ) ( ) ( )

( ) ( )

θ θ θ

θ θ+

= −

∇ = ∇t t

F F F

F FElimination of double loops

Page 49: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Boltzmann Machine

( ) ( )1i ij j ip x w x hϕ= = −∑

( ) ( ){ }exp ij i j i ip x w x x h x wψ= − −∑ ∑

2x

3x

4x

1x

( )q x

( )p x B

Page 50: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Boltzmann machine---hidden units

• EM algorithm• e-projection• m-projection

D

M

Page 51: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

EM algorithm

hidden variables

( ), ;p x y u

{ }1, , ND = Lx x

( ){ }, ;M p= x y u

( ) ( ) ( ){ },M DD p p p= =x y x x

( )ˆmin , :KL p p M∈⎡ ⎤⎣ ⎦x y m-projection to M

De-projection to( )ˆmin : , ;KL p D p∈⎡ ⎤⎣ ⎦x y u

Page 52: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

SVM : support vector machine

Embedding

Kernel

Conformal change of kernel

( )( ) ( ) ( , )

i i

i i i i i

z xf x w x y K x x

φ

φ α

=

= =∑ ∑

( , ') ( ) ( ')i iK x x x xφ φ= ∑

2

( , ') ( ) ( ') ( , ')( ) exp{ | ( ) | }

ρ ρ

ρ κ

⎯⎯→

= −

K x x x x K x xx f x

Page 53: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Signal ProcessingICA : Independent Component Analysis

t t t tA= →x s x s

sparse component analysis

positive matrix factorization

Page 54: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

mixture and unmixtureof independent signals

2x

1s

ns2smx

1x1

n

i ij jj

x A s=

=

=

x As

Page 55: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Independent Component Analysis

1

i ij jA x A s

W W A−

= =

= =

∑x s

y x

s A W y

x

observations: x(1), x(2), …, x(t)recover: s(1), s(2), …, s(t)

Page 56: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information
Page 57: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Space of Matrices : Lie group

-1d d=X WW

( ) ( )2 1tr trT T T

T

d d d d d

ll

− −= =

∂∇ =

W X X WW W W

W WW

:dX

I I d+ X

Wd+W W

non-holonomic basis

1W −

Page 58: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Information Geometry of ICA

natural gradientestimating functionstability, efficiency

S ={p(y)}

1 1 2 2{ ( ) ( )... ( )}n nI q y q y q y=

{ ( )}p Wx

r q

( ) [ ( ; ) : ( )] ( )

l KL p qr

=W y W yy

Page 59: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Semiparametric Statistical Model

( ; , ) | | ( )p r r=x W W Wxunknown

x(1), x(2), …, x(t)

ir r= Π, ( ) :r s−= 1W A

Page 60: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Natural Gradient

( ), Tlη

∂Δ = −

∂y W

W W WW

Page 61: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Basis Given: overcomplete caseSparse Solution

many solutionsmany 0

ˆ

i i

i

t t

A s

s

A

= =

=

∑x s a

x s

ˆ ˆ ˆ: =A x Assparse

Page 62: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

generalized inverse

ˆmin Σ 2is

sparse solution

ˆmin ii

∑ s

ˆ ˆ ˆ: x =A Assparse

2 :-normL

1 :-normL

Page 63: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information
Page 64: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Overcomplete Basis and Sparse Solution

1

'

min

min

i i

i

p p

s A

s

A α

= =

=

− +

∑∑

x a s

s

s x s

non-linear denoising

Page 65: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Sparse Solution( )

( )

min

penalty : Bayes priorpp iF

ϕ

= ∑

β

β β

( )

( )

( )

( )

0

1 1

22

#1[ 0] :

:

: 0 1

:

i

i

p

i

F

F L

F p

F

β

β

β

= ≠

=

≤ ≤

=

sparsest solution

solution

generalized inverse solution

β

β

β

β

Sparse solution: overcomplete case

Page 66: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Optimizationunder Spasity Condition

( )( )

min : convex functionconstraint F c

ϕ⎧⎪⎨ ≤⎪⎩

ββ

typical case: ( )

( )

21 1 ( *) ( *)2 21 ;

ϕ

β

= − =

= ∑

T

pi

X G

Fp

β β β − β β − β

β

y

2, 1, 1/ 2p p p= = =

Page 67: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

L1-constrained optimizationLASSO

LARS

( ) ( )( )

min under

solution : 0

0c

F c

c c

ϕ∗

∗ ∗

= → ∞

= →

β β

β

β β

( ) ( )( )

min

solution 0

ϕ λ

λ λ∗

∗ ∗

+

= ∞ →

= →

Fβ β

β

β β

( )( )

, ,

: ,

≥* *c λsolutions β and β : coincide λ = λ c p 1

p < 1 λ = λ c multiple noncontinuous stability different

 λP Problem

 cP Problem

Page 68: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Projection from to F = c (information geometry)*β

Page 69: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Convex Cone Programming

P : positive semi-definite matrix

convex potential function

dual geodesic approach

, minA = ⋅x b c x

Support vector machine

Page 70: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

a) : 2, 1cR n p= > b) : 2, 1cR n p= =

c) : 2, 1cR n p= <

Fig. 1

non-convex

Page 71: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ) ( ) : ϕ ⎡ ⎤= ⎣ ⎦*min β

orthogonal projection, dual

D β :β , F β = c dual geodesic

projec

projec

tion

tion

η η η∗ ∗ ∗− ∝ ∇ ( )c cF

dual

Page 72: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

n

F= ∇n

Fig. 5 subgradient

η η∗ ∗∝ ∇ ( )c cF

Page 73: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

LASSO path and LARS path(stagewise solution)

( ) ( )

( ) ( )

min :

min

F c

F

ϕ

ϕ λ

=

+

β β

β β

( ) ( ),c λ∗ ∗ ⇔c λ correspondenceβ β

Page 74: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Active set and gradient

( ) { }

( )( ) ( )

( )[ ]

1

0

sgn ,, ,

1,1

i

pi i

p

A i

i AF i A

β

β β − −

= ≠

⎧ ∈⎪

∇ = −∞ ∞ ∉⎨⎪ −⎩

β

β

Page 75: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Solution path( ) ( )

( ) ( ){ } ( )

( )1

0,

;

ϕ λ

λ

λϕ λ

∗ ∗ ∗

∗ ∗

− ∗

∇ + ∇ =

∇ ∇ + ∇ ∇ ⋅ = − ∇

= − ∇ =&

&&

&&c c

A c c A c c

A A c c A A c c A c

c cA

c

cK

F

F F

ddc

F

β β β

β β β

ββ β

β

β

( ) ( )c c cK G Fλ∗ ∗= + ∇∇β β

( )1 1 10; (sgn ) : β∇∇ = ∇ = iF F L

Page 76: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Solution pathin the subspace of the active set

( ) ( )

( )1

0 : active directionλ λ

λ λ

ϕ λ∗ ∗

∗ − ∗

∇ + ∇ = ∇

= − ∇&

A A A

A A

F

K F

β β

β β

′→turning point A A

Page 77: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Gradient Descent Method

{ ( )}: covariant

{ ( )}: contravariant

i

ji

i

L L xx

L g L xx

∂∇ =

∂∂

∇ =∂∑%

2min L(x+a): i jijg a a ε=

1 ( )t t tx x c L x+ = − ∇

Page 78: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Extended LARS (p = 1) and Minkovskian grad

( )

( )

( ) { }

( )

11

norm

max under 1

1

sgn , max , ,

0, otherwise

pip

p

p

i i NA

a

p

ψ ε

ψ ε λ

η η η ηψ

ψ

+

=

+ =

+ −

=

⎧ =⎪∇ = ⎨⎪⎩

= ∇

1L

a

a a

a a

β

β

β

η β

Page 79: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

arg max ii f∗ =

max i i jf f f∗ ∗= =

( ) 1, for and ,0 otherwise.i

i i jF

∗ ∗⎧ =∇ = ⎨

⎩%

1t t Fη+ = − ∇%β β LARS

Page 80: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

F f∇ = ∇%

( )1

1sgn pi iF c f f −∇ =%

( )

0

1sgn

0

0

iF c f ∗

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥

∇ = ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

M

%

M

Euclidean case

1α →

Page 81: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

λ c-trajectory and-trajectory

Ex. 1-dim ( ) ( )21 *2

ϕ β β β= −

( ) ( )21 2 22λ β φ λ β λ β= + = − +f F

L1/2 constraint: non-convex optimization

Page 82: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( )2: min , cP cβ β β∗− ≤

c cβ =

: 0P fλ λ∇ = 0λβ ββ

∗− + = ( )ˆ : Xu Zongben's operatorλβ β ∗= R

( )c c cλ β ∗= −

0 c β ∗β

β ∗

( )Rλ β ∗

c

λ

Page 83: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

ICCN-Huangshan(黄山)

Sparse Signal AnalysisShun-ichi Ammari (甘利俊一)

RIKEN Brain Science Institute

(Collaborator: Masahiro Yukawa, Niigata University)

Page 84: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Solution Path :

not continuous, not-monotonejump

cλ ↔

cλβ β⇔

Page 85: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

An Example of the greedy path

β1

β2

Page 86: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Linear Programming

( ) ( )inner met

max

lo

ho

g

d

ij j i

i i

ij j ii

A x b

c x

A x bψ

= −

∑∑

∑ ∑x

Page 87: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Convex Programming━ Inner Method

: , 0LP A ≥ ⋅ ≥x b c x

min ⋅c x

( ) ( )log

logij j i

i

A x b

x

ψ = −

+

∑ ∑∑

x

( )iψ= ∂ xη

Simplex method ; inner method

Page 88: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Polynomial-Time Algorithm

curvature : step-size( ) 2mH

( ) ( )min : geodesict tψ ∗⋅ + = ∇ −c x x x δ

Page 89: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Neural Networks

Higher-order correlations

Synchronous firing

Multilayer Perceptron

Page 90: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Multilayer Perceptrons

( )i iy v nϕ= ⋅ +∑ w x

( ) ( )( )

( ) ( )

21; exp ,2

, i i

p y c y f

f v ϕ

⎧ ⎫= − −⎨ ⎬⎩ ⎭

= ⋅∑

x x

x w x

θ θ

θ

x y

1 2( , ,..., )nx x x x=

1 1( ,..., ; ,..., )m mw w v vθ =

Page 91: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Multilayer Perceptron

( )( )

( )1 1,

,

, ; ,i i

m m

y f

v

v v

ϕ

=

= ⋅

=

∑L L

x θ

w x

θ w w

neuromanifold ( )xψ

space of functions

Page 92: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

singularities

Page 93: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Geometry of singular model

( )y v nϕ= ⋅ +w x

v| | 0v =w

Page 94: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Backpropagation ---gradient learningBackpropagation ---gradient learning

( ) ( )

( ) ( )

1 1

2

examples : , , ,1 , log , ;2

t ty y

E y f p y= − = −

L

θ θ

x x

x x

( ) ( ),

t t

i i

E

f v

η

ϕ

∂Δ = −

∂= ⋅∑x w x

θθ

θ

1

natural gradient (Riemannian)

--steepest descentE G E−∇ = ∇%

Page 95: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information
Page 96: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information
Page 97: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

q ‐Fisher information

( )( ) ( )( )

q Fij ij

q

qg p g ph p

=

conformal transformation

11[ ( ): ( )] (1 ( ) ( ) )(1 ) ( )

q qq

q

q divergence

D p x r x p x r x dxq h p

−= −−

Page 98: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Total Bregman Divergence and its Applications to

Shape Retrieval

•Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010

Page 99: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Total Bregman Divergence

[ ] [ ]2

::

1

DTD

f=

+ ∇

x yx y

•rotational invariance

•conformal geometry

Page 100: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Total Bregman divergence (Vemuri)

2( ) ( ) ( ) ( )TBD( : )

1 | ( ) |

p q q p qp qq

ϕ ϕ ϕ

ϕ

− − ∇ ⋅ −=

+ ∇

Page 101: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Clustering : t-center

{ }1, , mE x x= L

[ ]arg min , ii

TD∗ = ∑x x x

E

y

T-center of E

∗x

Page 102: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

t-center

( ) ( )

( ) 2

1

1

i i

i

i

i

w ff

w

wf

∗ ∇∇ =

=+ ∇

∑∑

xx

x

∗x

Page 103: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

q ‐super‐robust estimator (Eguchi)

( ) ( )

( ) ( ) ( ){ }

( )

( ) ( )

1

1

1 1

0 1

,ˆmax , max

bias-corrected -estimating function

ˆ, , log

1 log1

1 ˆ, 0 max ,

q

q i q

q q

N

q i ii q

p xp x

h

q

s x p x p c

c hq

s x p xh

ξξ

ξ ξ ξ

ξ

ξ ξ

+

+

+ +

= +

= ∂ −

= ∂+

= ⇔∑ ∑

Page 104: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Conformal change of divergence

( ) ( ) [ ]: :D p q p D p qσ=%

( )ij ijg p gσ=%

( )

logijk ijk k ij j ik i jk

i i

T T s g s g s g

s

σ

σ

= + + +

= ∂

%

Page 105: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

t-center is robust{ }

( )

1, , ;

1; ,

nE

nε ε

∗ ∗ ∗

=

= + =

L

%

x x y

x x z x y

( )influence fun ;ction ∗z x y

robust as : c< → ∞z y

Page 106: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

( ) ( ) ( )( )

( ) ( )

1,

1i i

f fG

w

G w fn

∗∗ −

∇ − ∇=

= ∇∇∑

y xz x y

y

x x

( )( )

( )( )

( )

21

1

f fw f

w

∇ ∇= < ∞

+ ∇

>

y yy y

y

Robust: is boundedz

21Euclidean case 2

f = x

( )

( )

1

2,

1

,

G∗ −

=+

=

yz x yy

z x y y

Page 107: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

MPEG7 database• Great intraclass variability, and small

interclass dissimilarity.

Page 108: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Other TBD applicationsDiffusion tensor imaging (DTI) analysis

[Vemuri]• Interpolation• Segmentation

Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and its Applications to DTI Analysis, IEEE TMI, to appear

Page 109: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

TBD application-shape retrieval

• Using MPEG7 database;• 70 classes, with 20 shapes each class

(Meizhu Liu)

Page 110: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Multiterminal Information & Statistical InferenceMultiterminal Information & Statistical Inference

: nX x

: nY y

XR

YRθ

( ), ;

2 2X YR n R nX r

p x y

M M

θ

= =

Page 111: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

marginal

correlation

0-rate Slepian-WolfM CG G G= +

p(x,y)

p(x,y,z)

Page 112: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Linear SystemsLinear Systems

ARMAARMA

( )( )

11

1 11

1 1

11

11

, , : , ,

,

pq

t tpp

p q

t t

b z b zx u

a z a z

a a b b

f z u

θ

− −

+ − −

−+

+ + +=

+ + +

=

=

L

L

L L

x θ,

tu tx

AR---e-flat

MA---m-flat

Page 113: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Machine LearningBoosting : combination of weak learners

( ) ( ) ( ){ }1 1 2 2, , , , , ,N ND y y y= Lx x x

1iy = ± −

( ) ( ) ( ), : , sgn ,f y h f= =x u x u x u

Page 114: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Weak Learners

( ) ( )( )sgn t tH hα= ∑x x

( ){ }: Prob t t i i th y Wε ≠x

( ) ( ) ( ){ }1 expt t t i t iW i cW i y h xα+ = −

weight distribution

Page 115: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Boosting ━generalization

( ) ( ) ( ){ }{ }1 expt t t t tQ Q y x Q y x yh x fα−= = − %

( ) ( ){ }, constt tF P y x Eyh x= =

: min :t tD P Qα ⎡ ⎤⎣ ⎦%

( ) ( )1, ,t tD P Q D P Q+ <% %

Page 116: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Integration of evidences:

arithmetic meangeometric meanharmonic mean

-meanα

1 2, ,... mx x x

Page 117: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Various Means

2a b+

ab2

1 1a b

+:

arithmetic geometric

Any other mean?

:

harmonic

Page 118: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Generalized mean: f-meanf(u): monotone; f-representation of u

1 ( ) ( )( , ) { }2f

f a f bm a b f − +=

12( ) , 1

log ,

( , )

1

( ,

)

=

f fm ca cb cm a

f u u

b

u

α

α αα

= ≠

=scale free

α-representation

Page 119: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

-mean : α

2

1 :

1: 2

10 : ( )4 2

min( , ) max( , )

aba b

a ba b ab

m a bm a b

α

α

α

α

α

αα

=+

= −

+= + = +

= ∞ == −∞ =

1 2( ( ), ( ))m p s p sα

Page 120: machine learn IG.ppt [互換モード]yosinski.com/mlss12/...Amari-Information-Geometry.pdf · Systems Theory Information Theory Statistics Neural Networks Combinatorics Information

Family of Distributions{ }1( ), , ( )kp s p sL

1

( ) ( ), 1k

mix i i ii

p s t p s t=

= =∑ ∑

explog ( ) log ( )i ip s t p s ψ= −∑

α −

mixture family :

exponential family :

1( ; ) { ( ( ))}i ip x f f p xα αθ θ−= ∑