Motivation · G/H-method · Examples and related work
Exponential family by representation theory
Koichi Tojo (1), joint work with Taro Yoshino (2)
(1) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
(2) Graduate School of Mathematical Science, The University of Tokyo
July 29, 2020
Demonstration
Our aim
One of our aims is to suggest “good” exponential families on important spaces.
We proposed a method to construct exponential families by using representation theory in [TY18].
First, we demonstrate a family of distributions on the upper half plane with the Poincaré metric, obtained by our method.
[TY18]: K. Tojo, T. Yoshino, A method to construct exponential families by representation theory, arXiv:1811.01394v3.
Contents
1 Motivation
    Exponential family
    Background
2 G/H-method
    Method to construct families
    G-invariance of our family
    Classification of G-invariant families
3 Examples and related work
    Sphere
    Related work
    Hyperbolic space
A family of probability measures and machine learning
Learning with a family of probability measures is one of the important methods in machine learning.
Learning = optimizing the parameters within the family of probability measures.
[Figure: exponential families shown as a subset of the families of probability measures]
Exponential family
Exponential families are an important subject in information geometry.
Exponential families are useful for Bayesian inference.
Exponential families include many widely used families.
[Figure: exponential families shown as a subset of the families of probability measures]
Examples (exponential families)
Table: Examples of exponential families
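Distribution                 | Sample space X
Normal                       | $\mathbb{R}$
Multivariate normal          | $\mathbb{R}^n$
Bernoulli                    | $\{\pm 1\}$
Categorical                  | $\{1, \dots, n\}$
Gamma                        | $\mathbb{R}_{>0}$
Inverse gamma                | $\mathbb{R}_{>0}$
Wishart                      | $\mathrm{Sym}^+(n, \mathbb{R})$
Von Mises                    | $S^1$
Generalized inverse Gaussian | $\mathbb{R}_{>0}$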
Exponential family
$X$: a manifold, $\mathcal{R}(X)$: the set of all Radon measures on $X$.
Definition 1.1 (exponential family).
A nonempty subset $\mathcal{P} \subset \mathcal{R}(X)$ is an exponential family on $X$ if there exists a triple $(\mu, V, T)$ such that
1 $\mu \in \mathcal{R}(X)$,
2 $V$ is a finite-dimensional vector space over $\mathbb{R}$,
3 $T : X \to V$, $x \mapsto T(x)$ is a continuous map,
4 for any $p \in \mathcal{P}$, there exists $\theta \in V^\vee$ such that
\[ dp(x) = \exp(-\langle \theta, T(x) \rangle - \varphi(\theta))\, d\mu(x), \]
where $\varphi(\theta) = \log \int_{x \in X} \exp(-\langle \theta, T(x) \rangle)\, d\mu(x)$ (log normalizer).
We call the triple $(\mu, V, T)$ a realization of $\mathcal{P}$.
Example: a family of normal distributions
Example 1.2.
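The following family of normal distributions is an exponential family:
\[ \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right) dx \right\}_{(\sigma, m) \in \mathbb{R}_{>0} \times \mathbb{R}} \]
A realization is given by
1 $\mu$ = Lebesgue measure,
2 $V = \mathbb{R}^2$,
3 $T : X = \mathbb{R} \to \mathbb{R}^2$, $x \mapsto \begin{pmatrix} x^2 \\ x \end{pmatrix}$.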
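To see why this triple works, one can expand the exponent of the normal density. The following is a sketch only: the coordinates $(\theta_1, \theta_2)$ on $V^\vee \simeq \mathbb{R}^2$ and the standard pairing $\langle \theta, T(x) \rangle = \theta_1 x^2 + \theta_2 x$ are assumptions made for illustration.

```latex
\begin{align*}
-\frac{(x-m)^2}{2\sigma^2} - \tfrac12 \log(2\pi\sigma^2)
  &= -\Bigl( \frac{1}{2\sigma^2}\, x^2 - \frac{m}{\sigma^2}\, x \Bigr)
     - \Bigl( \frac{m^2}{2\sigma^2} + \tfrac12 \log(2\pi\sigma^2) \Bigr) \\
  &= -\langle \theta, T(x) \rangle - \varphi(\theta),
\qquad
\theta = \Bigl( \frac{1}{2\sigma^2},\ -\frac{m}{\sigma^2} \Bigr),
\quad
\varphi(\theta) = \frac{\theta_2^2}{4\theta_1} + \tfrac12 \log\frac{\pi}{\theta_1}.
\end{align*}
```

With these coordinates the parameter $\theta$ ranges over $\mathbb{R}_{>0} \times \mathbb{R}$.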
Remark on exponential family
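We can make too many exponential families. In fact, given
1 a manifold $X$ with a measure $\mu$,
2 a finite-dimensional real vector space $V$,
3 a continuous map $T : X \to V$,
we obtain an exponential family $\mathcal{P} := \{p_\theta\}_{\theta \in \Theta}$ on $X$:
\[ d\tilde{p}_\theta(x) := \exp(-\langle \theta, T(x) \rangle)\, d\mu(x), \]
\[ \Theta := \Bigl\{ \theta \in V^\vee \Bigm| \int_X d\tilde{p}_\theta < \infty \Bigr\}, \]
\[ \varphi(\theta) := \log \int_X d\tilde{p}_\theta \quad (\theta \in \Theta), \]
\[ p_\theta := e^{-\varphi(\theta)}\, \tilde{p}_\theta. \]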
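The construction above can be tried numerically. The following is a minimal sketch, assuming $X = \mathbb{R}$ with the Lebesgue measure and $V = \mathbb{R}^k$ with the standard pairing; the function names, the integration window and the quartic statistic are hypothetical choices for illustration and are not taken from the talk.

```python
import math

def log_normalizer(theta, T, lo=-30.0, hi=30.0, n=60000):
    """Approximate phi(theta) = log of the integral of exp(-<theta, T(x)>) dx.

    Uses a midpoint rule on the window [lo, hi] (an assumption: the window
    must contain essentially all of the mass, and theta must lie in Theta,
    otherwise the returned number is meaningless).
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        pairing = sum(t * s for t, s in zip(theta, T(x)))
        total += math.exp(-pairing)
    return math.log(total * h)

def T_normal(x):
    """Sufficient statistic of the normal family: x -> (x^2, x)."""
    return (x * x, x)

def T_quartic(x):
    """A hypothetical statistic giving a less familiar family: x -> (x^4, x^2)."""
    return (x ** 4, x * x)

# theta = (1/2, 0) corresponds to the standard normal distribution.
phi = log_normalizer((0.5, 0.0), T_normal)
# theta = (1, -1) gives a double-well density proportional to exp(-(x^4 - x^2)).
phi_quartic = log_normalizer((1.0, -1.0), T_quartic)
```

Any continuous `T` can be plugged in the same way, which illustrates how easily new exponential families are produced.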
Background
Remark 1.3.
By definition, there are too many exponential families.
Only a small part of them is widely used.
[Figure: the widely used exponential families (normal, gamma, von Mises, Poisson and Bernoulli distributions) drawn as a small region inside the set of all exponential families]
Motivation
We can expect that there exist “good” exponential families.
We want a framework to understand “good” exponential families systematically.
[Figure: the widely used exponential families (Gauss, gamma, von Mises, Poisson and Bernoulli distributions) inside the set of all exponential families, with an arrow labeled “good”]
Observation 1.4.
Useful exponential families have the same symmetry as their sample spaces.
Sample space: homogeneous space G/H
Family: invariant under the induced G-action
Idea: use representation theory (the theory of symmetry).
“X = G/H” is a description that focuses on the symmetry of the space X.
[Figure: the widely used exponential families (Gauss, gamma, von Mises, Poisson and Bernoulli distributions) inside the set of all exponential families]
Our method (G/H-method)
We proposed a method to construct exponential families.
The method generates many well-known families.
Families obtained by the method can be classified.
[Figure: the region of families obtained by the G/H-method inside the set of all exponential families, overlapping the widely used families (normal, gamma, von Mises, Poisson and Bernoulli distributions); new distributions in the G/H-method region are expected to be “good”]
G/H-method: overview
G/H-method = a method to construct a family of probability measures on G/H from
a finite-dimensional real representation ρ : G → GL(V),
a nonzero H-fixed vector v0.
Step 1 Take Lie groups G ⊃ H and put X := G/H (sample space).
Step 2 Inputs: a finite-dimensional real representation (ρ, V) and an H-fixed vector v0.
Step 3 Consider the set Ω(G, H) of all relatively G-invariant measures on X.
Step 4 Make measures on X parameterized by V^∨ × Ω(G, H).
Step 5 Normalize them.
Examples obtained by our method
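Table: Examples and inputs (G, H, V, v0) for them
Distribution | Sample space X | G | H | V | v0
Normal | $\mathbb{R}$ | $\mathbb{R}^\times \ltimes \mathbb{R}$ | $\mathbb{R}^\times$ | $\mathrm{Sym}(2, \mathbb{R})$ | $E_{22}$
Multivariate normal | $\mathbb{R}^n$ | $GL(n, \mathbb{R}) \ltimes \mathbb{R}^n$ | $GL(n, \mathbb{R})$ | $\mathrm{Sym}(n+1, \mathbb{R})$ | $E_{n+1,n+1}$
Bernoulli | $\{\pm 1\}$ | $\{\pm 1\}$ | $\{1\}$ | $\mathbb{R}_{\mathrm{sgn}}$ | $1$
Categorical | $\{1, \dots, n\}$ | $S_n$ | $S_{n-1}$ | $W$ | $w$
Gamma | $\mathbb{R}_{>0}$ | $\mathbb{R}_{>0}$ | $\{1\}$ | $\mathbb{R}$ | $1$
Inverse gamma | $\mathbb{R}_{>0}$ | $\mathbb{R}_{>0}$ | $\{1\}$ | $\mathbb{R}_{-1}$ | $1$
Wishart | $\mathrm{Sym}^+(n, \mathbb{R})$ | $GL(n, \mathbb{R})$ | $O(n)$ | $\mathrm{Sym}(n, \mathbb{R})$ | $I_n$
Von Mises | $S^1$ | $SO(2)$ | $\{I_2\}$ | $\mathbb{R}^2$ | $e_1$
Generalized inverse Gaussian | $\mathbb{R}_{>0}$ | $\mathbb{R}_{>0}$ | $\{1\}$ | $\mathbb{R}^2$ | $\frac{1}{2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$
Here $W = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid \sum_{i=1}^n x_i = 0\}$ and $w = (-(n-1), 1, \dots, 1) \in W$.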
G/H-method Step 1: sample space
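Representation theory: take a symmetry G of the sample space.
Setting 2.1.
G: a Lie group, H: a closed subgroup of G, X := G/H: homogeneous space.
We construct a family $\mathcal{P} := \{p_\theta\}_{\theta \in \Theta}$ of probability measures on the homogeneous space X.
Normal distributions on $\mathbb{R}$: $\mathbb{R}$ has the symmetry of scaling and translation.
Setting 2.2.
$G = \mathbb{R}^\times \ltimes \mathbb{R}$ (scaling, translation), $H = \mathbb{R}^\times$, $X = G/H \simeq \mathbb{R}$.
We identify G/H with $\mathbb{R}$ by
\[ G/H \ni (t, x)H \mapsto x \in \mathbb{R}. \]
[Figure: the real line $\mathbb{R}$ with the origin 0 and a point x]
We construct
\[ \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right) dx \right\}_{(\sigma, m) \in \mathbb{R}_{>0} \times \mathbb{R}}. \]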
G/H-method Step 2: inputs
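Representation theory: inputs
1 $V$: a finite-dimensional real vector space,
2 $\rho : G \to GL(V)$: a representation,
3 $v_0 \in V^H$: an H-fixed vector.
Normal distributions: inputs
1 $V := \mathrm{Sym}(2, \mathbb{R})$,
2 $\rho : \mathbb{R}^\times \ltimes \mathbb{R} \to GL(\mathrm{Sym}(2, \mathbb{R}))$,
\[ \rho(t, x) S := \begin{pmatrix} t & x \\ 0 & 1 \end{pmatrix} S \begin{pmatrix} t & 0 \\ x & 1 \end{pmatrix}, \]
3 $v_0 := \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \in V^H$.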
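The inputs for the normal distributions can be checked mechanically. The sketch below assumes that the group element $(t, x)$ corresponds to the matrix with rows $(t, x)$ and $(0, 1)$, so that the group law is $(t_1, x_1)(t_2, x_2) = (t_1 t_2, t_1 x_2 + x_1)$; the helper names are hypothetical. It checks that $\rho$ is compatible with this group law, that $v_0$ is fixed by $H = \{(t, 0)\}$, and it computes the orbit map $x \mapsto \rho(1, x) v_0$.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def embed(t, x):
    """Matrix attached to the group element (t, x) (assumed convention)."""
    return [[t, x], [0.0, 1.0]]

def rho(t, x, S):
    """rho(t, x) S = g S g^T with g = embed(t, x), as on the slide."""
    g = embed(t, x)
    return matmul(matmul(g, S), transpose(g))

def compose(g1, g2):
    """Group law of the semidirect product under the assumed convention."""
    (t1, x1), (t2, x2) = g1, g2
    return (t1 * t2, t1 * x2 + x1)

V0 = [[0.0, 0.0], [0.0, 1.0]]        # v0 = E22

fixed = rho(2.0, 0.0, V0)            # an element of H applied to v0
orbit_point = rho(1.0, 3.0, V0)      # the point x = 3 of G/H applied to v0
```

Under these assumptions `orbit_point` is the matrix with rows $(x^2, x)$ and $(x, 1)$ at $x = 3$, which is where the sufficient statistic $(x^2, x)$ of the normal family comes from.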
G/H-method Step 3: relatively G -invariant measures
We consider measures compatible with the symmetry.
Definition 2.3.
A measure $\mu \in \mathcal{R}(X)$ is relatively G-invariant if there exists a continuous map $\chi : G \to \mathbb{R}_{>0}$ such that $d\mu(gx) = \chi(g)\, d\mu(x)$.
Representation theory: we put
\[ \Omega(G, H) := \{ \mu \in \mathcal{R}(X) \mid \mu \text{ is relatively } G\text{-invariant} \} / \mathbb{R}_{>0}. \]
Normal distributions: $\Omega(G, H) = \{dx\}$. Here $dx$ is the Lebesgue measure, which is relatively G-invariant.
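For the normal-distribution case the map $\chi$ can be computed directly. The sketch assumes that $g = (t, x_0) \in \mathbb{R}^\times \ltimes \mathbb{R}$ acts on $\mathbb{R}$ by $x \mapsto t x + x_0$, which is the action compatible with the identification $(t, x)H \mapsto x$ of Setting 2.2.

```latex
\[
d\mu(g x) = d(t x + x_0) = |t|\, dx = \chi(g)\, d\mu(x),
\qquad \chi(t, x_0) = |t| .
\]
```

So the Lebesgue measure is relatively G-invariant but not G-invariant, since $\chi$ is not identically 1.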
G/H-method Step 4: measure pθ on X
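Representation theory: define measures on X parameterized by $(\xi, \mu) \in V^\vee \times \Omega(G, H)$ by
\[ d\tilde{p}_\theta(x) = d\tilde{p}_{\xi, \mu}(x) := \exp(-\langle \xi, x v_0 \rangle)\, d\mu(x). \]
Here $x v_0 := \rho(g) v_0$ for $x = gH$, which is well defined because $v_0$ is H-fixed.
Normal distributions: by taking an inner product on $V = \mathrm{Sym}(2, \mathbb{R})$, we obtain
\[ d\tilde{p}_\theta(x) = \exp(-(\theta_1 x^2 + 2\theta_2 x + \theta_3))\, dx. \]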
G/H-method Step 5: normalizing the measure pθ
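Representation theory:
\[ \Theta := \Bigl\{ \theta = (\xi, \mu) \in V^\vee \times \Omega(G, H) \Bigm| \int_X d\tilde{p}_\theta < \infty \Bigr\}, \]
\[ p_\theta := e^{-\varphi(\theta)}\, \tilde{p}_\theta, \qquad \varphi(\theta) := \log \int_X d\tilde{p}_\theta \quad \text{(log normalizer)}. \]
We obtain a family of probability measures $\{p_\theta\}_{\theta \in \Theta}$.
Normal distributions:
\[ \Theta = \Bigl\{ \theta = \begin{pmatrix} \theta_1 & \theta_2 \\ \theta_2 & \theta_3 \end{pmatrix} \in \mathrm{Sym}(2, \mathbb{R}) \Bigm| \theta_1 > 0 \Bigr\}, \]
\[ dp_\theta = \sqrt{\frac{\theta_1}{\pi}} \exp\Bigl( -\theta_1 \Bigl( x + \frac{\theta_2}{\theta_1} \Bigr)^2 \Bigr) dx, \]
\[ \varphi(\theta) = \frac{\theta_2^2 - \theta_1 \theta_3}{\theta_1} + \frac{1}{2} \log \frac{\pi}{\theta_1}. \]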
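The closed form of the log normalizer follows from completing the square and the Gaussian integral; a short derivation, using only the unnormalized density $\exp(-(\theta_1 x^2 + 2\theta_2 x + \theta_3))\,dx$ from Step 4 with $\theta_1 > 0$:

```latex
\begin{align*}
\theta_1 x^2 + 2\theta_2 x + \theta_3
  &= \theta_1 \Bigl( x + \frac{\theta_2}{\theta_1} \Bigr)^2
     + \theta_3 - \frac{\theta_2^2}{\theta_1}, \\
\int_{\mathbb{R}} e^{-(\theta_1 x^2 + 2\theta_2 x + \theta_3)}\, dx
  &= e^{\theta_2^2/\theta_1 - \theta_3} \sqrt{\frac{\pi}{\theta_1}}, \\
\varphi(\theta)
  &= \frac{\theta_2^2 - \theta_1 \theta_3}{\theta_1}
     + \frac{1}{2} \log \frac{\pi}{\theta_1}.
\end{align*}
```

The factor depending on $\theta_3$ cancels after normalization, which is why the normalized density depends only on $(\theta_1, \theta_2)$.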
G/H-method Step 5: Normal distributions
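We obtain a family of probability measures on $\mathbb{R}$:
\[ \left\{ \sqrt{\frac{\theta_1}{\pi}} \exp\Bigl( -\theta_1 \Bigl( x + \frac{\theta_2}{\theta_1} \Bigr)^2 \Bigr) dx \right\}_{(\theta_1, \theta_2) \in \mathbb{R}_{>0} \times \mathbb{R}}. \]
By the change of variables
\[ m = -\frac{\theta_2}{\theta_1}, \qquad \sigma = \frac{1}{\sqrt{2\theta_1}}, \]
we obtain the family of normal distributions
\[ \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right) dx \right\}_{(\sigma, m) \in \mathbb{R}_{>0} \times \mathbb{R}}. \]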
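The change of variables and the log normalizer can be verified numerically. This is a small sketch with hypothetical function names; the integration window and step count are arbitrary choices that are adequate for the sample parameters used here.

```python
import math

def p_theta(x, theta1, theta2):
    """Normalized density obtained by the G/H-method for the normal case."""
    return math.sqrt(theta1 / math.pi) * math.exp(-theta1 * (x + theta2 / theta1) ** 2)

def normal_pdf(x, m, sigma):
    """Usual normal density with mean m and standard deviation sigma."""
    return math.exp(-(x - m) ** 2 / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def to_mean_std(theta1, theta2):
    """Change of variables m = -theta2/theta1, sigma = 1/sqrt(2 theta1)."""
    return -theta2 / theta1, 1.0 / math.sqrt(2.0 * theta1)

def phi_closed_form(theta1, theta2, theta3):
    """Log normalizer formula from Step 5."""
    return (theta2 ** 2 - theta1 * theta3) / theta1 + 0.5 * math.log(math.pi / theta1)

def phi_numeric(theta1, theta2, theta3, lo=-40.0, hi=40.0, n=80000):
    """Midpoint-rule approximation of log of the integral of the unnormalized density."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += math.exp(-(theta1 * x * x + 2.0 * theta2 * x + theta3))
    return math.log(total * h)

theta = (0.8, -0.3, 0.1)                 # sample parameters with theta1 > 0
m, sigma = to_mean_std(theta[0], theta[1])
gap = max(abs(p_theta(x, theta[0], theta[1]) - normal_pdf(x, m, sigma))
          for x in (-2.0, -0.5, 0.0, 0.375, 1.0, 3.0))
```

Here `gap` measures the largest difference between the two parameterizations at a few sample points.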
P is a G -invariant exponential family
Theorem 2.4.
Any family obtained by our method is a G-invariant exponential family on G/H.
[Figure: the $\mathbb{R}^\times \ltimes \mathbb{R}$-action (scaling and translation) maps a normal distribution to a normal distribution]
We obtain a family with the symmetry of G/H!
Question
Conversely,
Question 2.5.
Is every G-invariant exponential family on G/H obtained by our method?
Yes, under a mild assumption. Roughly speaking,
G-invariant exponential families on G/H “=” families obtained by the G/H-method.
Answer to the question
Setting 2.6.
$\mathcal{P} := \{p_\theta\}_{\theta \in \Theta}$ is a G-invariant exponential family on G/H. Here $\Theta$ is the parameter space.
Theorem 2.7.
Assume
1 G/H admits a nonzero relatively G-invariant measure,
2 $\Theta$ is open.
Then $\mathcal{P}$ is a subfamily of a certain family obtained by the G/H-method.
For the details, see our paper, to appear in Information Geometry, Affine Differential Geometry and Hesse Geometry: A Tribute and Memorial to Jean-Louis Koszul.
Classification problem of G -invariant exponential families
Let us consider an important homogeneous space G/H such as a sphere or a hyperbolic space, or more generally a symmetric space.
Aim
One of our aims is to classify “good” exponential families on G/H.
Problem 2.8.
Classify G-invariant exponential families on G/H.
By Theorem 2.7, the problem above is reduced to the following:
Question 2.9.
Classify the families on G/H obtained by the G/H-method.
First step for the classification
Question 2.10.
When do distinct pairs $(V, v_0)$ and $(V', v'_0)$ generate the same family?
This leads to an equivalence relation on the pairs of a representation and an H-fixed vector.
Proposition 2.11.
Equivalent elements in $\mathcal{V}(G, H)$ generate the same family by our method.
Definition 2.12.
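\[ \mathcal{V}(G, H) := \{ (V, v_0) \mid V \text{ is a finite-dimensional real representation of } G,\ v_0 \in V^H \text{ is cyclic} \}. \]
Here $v_0 \in V$ is said to be cyclic if
\[ \mathrm{span}\{ g \cdot v_0 \mid g \in G \} = V. \]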
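Cyclicity can be checked by hand in the normal-distribution example. The sketch below uses the representation $\rho(t, x) S = g S\, {}^t g$ on $\mathrm{Sym}(2, \mathbb{R})$ with $v_0 = E_{22}$, for which $\rho(1, x) v_0$ is the symmetric matrix with entries $x^2$, $x$, $x$, $1$; the coordinates $(S_{11}, S_{12}, S_{22})$ and the function names are choices made here for illustration.

```python
def orbit_vector(x):
    """Coordinates (S11, S12, S22) of rho(1, x) v0 = [[x^2, x], [x, 1]]."""
    return (x * x, x, 1.0)

def det3(rows):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Three orbit points; if they are linearly independent they span the
# 3-dimensional space Sym(2, R), so v0 is cyclic.
rows = [orbit_vector(x) for x in (0.0, 1.0, -1.0)]
determinant = det3(rows)
is_cyclic = determinant != 0.0
```

A nonzero determinant means the orbit of $v_0$ already spans $V$, so the pair $(\mathrm{Sym}(2, \mathbb{R}), E_{22})$ satisfies the cyclicity condition.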
Definition 2.13.
$(V, v_0) \sim (V', v'_0)$ if there exists a G-linear isomorphism $\phi : V \to V'$ satisfying $\phi(v_0) = v'_0$.
For the details, see “On a method to construct exponential families by representation theory”, Geometric Science of Information, GSI2019, Lecture Notes in Computer Science, vol 11712, 147–156 (2019).
Second step for the classification
Question 2.14.
Determine the set of equivalence classes $\mathcal{V}(G, H)/\sim$.
We are writing a paper about this question.
Message
Aim
We want to suggest new useful families of distributions for applications by using the G/H-method.
Take-home message
If you want a good family of distributions on some space, please tell us the space! We will try to propose good families of distributions on it.
Examples
1 Sphere $S^n$: von Mises–Fisher distributions, Fisher–Bingham distributions.
2 Upper half plane $\mathbb{H}$.
3 Hyperbolic space $H^n$: hyperboloid distributions.
The named distributions above were obtained heuristically.
Remark 3.1.
Machine learning using the hyperbolic space $H^n$ has been very active. Good exponential families on $H^n$ have been desired.
\[ H^n := \{ x \in \mathbb{R}^{n+1} \mid x_1^2 + x_2^2 + \dots + x_n^2 - x_{n+1}^2 = -1 \}. \]
Example 1: Sphere
n-dimensional sphere:
\[ S^n := \{ x \in \mathbb{R}^{n+1} \mid x_1^2 + \dots + x_{n+1}^2 = 1 \}. \]
$G = SO(n+1)$, $H = SO(n)$, $X := G/H \simeq S^n$.
Low-dimensional representations:
$\rho : SO(n+1) \to GL(\mathbb{R}^{n+1})$, the natural representation, gives the von Mises–Fisher distributions.
$\rho : SO(n+1) \to GL(\mathbb{R}^{n+1} \oplus \mathrm{Sym}(n+1, \mathbb{R}))$, $\rho(g)(v, S) = (gv, gS\,{}^t g)$ for $(v, S) \in \mathbb{R}^{n+1} \times \mathrm{Sym}(n+1, \mathbb{R})$, gives the Fisher–Bingham distributions.
Higher-dimensional representations:
We obtain families on $S^n$ by the G/H-method that are not yet well known but appeared in the work of T. S. Cohen and M. Welling [CW15].
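The lowest-dimensional case $n = 1$ can be checked numerically. With $G = SO(2)$, $V = \mathbb{R}^2$ and $v_0 = e_1$, the orbit map sends the rotation by angle $a$ to $(\cos a, \sin a)$, and the normalizing constant of $\exp(-\langle \theta, (\cos a, \sin a) \rangle)\, da$ is the classical von Mises constant $2\pi I_0(|\theta|)$. The sketch below is an illustration only; the function names and the use of the arc-length measure $da$ as the invariant measure are assumptions made here.

```python
import math

def bessel_i0(k, terms=40):
    """Modified Bessel function I_0(k) from its power series."""
    return sum((k / 2.0) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))

def circle_normalizer(theta, n=2000):
    """Integral over S^1 of exp(-<theta, (cos a, sin a)>) da.

    Uses the trapezoid rule on a periodic grid of n angles.
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        a = i * h
        total += math.exp(-(theta[0] * math.cos(a) + theta[1] * math.sin(a)))
    return total * h

theta = (1.5, -2.0)                       # sample parameter, |theta| = 2.5
numeric = circle_normalizer(theta)
closed_form = 2.0 * math.pi * bessel_i0(math.hypot(theta[0], theta[1]))
```

The two values agree to high accuracy, which matches the Von Mises row of the table of inputs.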
Related work: [CW15]
[CW15] T. S. Cohen, M. Welling, “Harmonic exponential families on manifolds”, in Proceedings of the 32nd International Conference on Machine Learning (ICML), volume 37 (2015), 1757–1765.
The authors
1 suggest “harmonic exponential families” on compact groups or compact homogeneous spaces such as $S^1 \simeq SO(2)$ and $S^2 \simeq SO(3)/SO(2)$,
2 use the Fast Fourier Transform (FFT) on $S^1$, $S^2$ to perform gradient-based optimization of the log-likelihood for maximum likelihood estimation (MLE),
3 apply them to the problem of modelling the spatial distribution of significant earthquakes on the surface of the earth.
Relation between G/H-method and [CW15]
The method in [CW15]
[CW15] gives a method to construct exponential families on G/H by using
a unitary representation of a compact Lie group G,
a G-invariant measure on G/H.
Remark 3.2.
The method in [CW15] is the special case of the G/H-method where G is compact.
Fact 3.3.
If G is compact, every relatively G-invariant measure is automatically a G-invariant measure.
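A short argument for this fact, sketched under the assumption that $\mu$ is nonzero (so that $\chi$ is determined by $\mu$):

```latex
\begin{align*}
\chi(gh)\, d\mu(x) &= d\mu(ghx) = \chi(g)\, d\mu(hx) = \chi(g)\chi(h)\, d\mu(x)
  && \Rightarrow\ \chi : G \to \mathbb{R}_{>0} \text{ is a continuous homomorphism}, \\
\chi(G) &\subset \mathbb{R}_{>0} \text{ is a compact subgroup}
  && \Rightarrow\ \chi(G) = \{1\}.
\end{align*}
```

The last step uses that the powers $c^n$ $(n \in \mathbb{Z})$ of any $c \neq 1$ leave every compact subset of $\mathbb{R}_{>0}$.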
Example 2: Upper half plane
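The upper half plane $\mathbb{H} := \{ z = x + iy \in \mathbb{C} \mid y > 0 \}$ admits the action of $SL(2, \mathbb{R})$ by linear fractional transformations.
$G = SL(2, \mathbb{R})$, $H = SO(2)$, $X := G/H \simeq \mathbb{H}$.
[Figure: geodesics in the upper half plane]
Low-dimensional representation:
$\rho : SL(2, \mathbb{R}) \to GL(\mathrm{Sym}(2, \mathbb{R}))$, $\rho(g) S = g S\, {}^t g$ ($S \in \mathrm{Sym}(2, \mathbb{R})$), gives
\[ \left\{ \frac{D e^{2D}}{\pi} \exp\left( -\frac{a(x^2 + y^2) + 2bx + c}{y} \right) \frac{dx\, dy}{y^2} \right\}_{\begin{pmatrix} a & b \\ b & c \end{pmatrix} \in \mathrm{Sym}^+(2, \mathbb{R})} \]
Here $D = \sqrt{ac - b^2}$.
Higher-dimensional cases:
We obtain new families by the G/H-method.
This is a special case of hyperbolic space.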
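The normalizing constant $D e^{2D} / \pi$ of the family on the upper half plane can be sanity-checked by brute-force integration. This is a rough sketch with hypothetical function names; the truncation of the domain to $|x| \le 15$, $0 < y \le 15$ and the grid step are assumptions that are adequate only for parameters of moderate size such as the ones used below.

```python
import math

def half_plane_density(x, y, a, b, c):
    """Density of the family with respect to dx dy.

    The factor 1/y^2 comes from the invariant measure dx dy / y^2.
    Requires a > 0 and a*c - b*b > 0 (positive definite parameter).
    """
    D = math.sqrt(a * c - b * b)
    weight = math.exp(-(a * (x * x + y * y) + 2.0 * b * x + c) / y)
    return D * math.exp(2.0 * D) / math.pi * weight / (y * y)

def total_mass(a, b, c, x_max=15.0, y_max=15.0, h=0.05):
    """Midpoint-rule approximation of the integral of the density."""
    nx = int(round(2.0 * x_max / h))
    ny = int(round(y_max / h))
    total = 0.0
    for j in range(ny):
        y = (j + 0.5) * h
        for i in range(nx):
            x = -x_max + (i + 0.5) * h
            total += half_plane_density(x, y, a, b, c)
    return total * h * h

mass = total_mass(1.0, 0.0, 1.0)   # parameter = identity matrix, D = 1
```

A value of `mass` close to 1 indicates that the stated constant does normalize the measure for this parameter.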
Example 3: Hyperbolic space
n-dimensional hyperbolic space:
\[ H^n := \{ x \in \mathbb{R}^{n+1} \mid x_1^2 + \dots + x_n^2 - x_{n+1}^2 = -1 \}. \]
$G = SO_0(n, 1)$, $H = SO(n)$, $X := G/H \simeq H^n$.
Low-dimensional representation:
$\rho : G \to GL(\mathbb{R}^{n+1})$, the natural representation, gives the hyperboloid distributions (O. E. Barndorff-Nielsen [BN78], J. L. Jensen [J81]).
Higher-dimensional cases:
We obtain new families by the G/H-method.
References
[BN78] O. E. Barndorff-Nielsen, Hyperbolic distributions and distributions on hyperbolae, Scand. J. Statist. 5 (1978), 151–157.
[CW15] T. S. Cohen, M. Welling, Harmonic exponential families on manifolds, in Proceedings of the 32nd International Conference on Machine Learning (ICML), volume 37 (2015), 1757–1765.
[J81] J. L. Jensen, On the hyperboloid distribution, Scand. J. Statist. 8 (1981), 193–206.
[TY18] K. Tojo, T. Yoshino, A method to construct exponential families by representation theory, arXiv:1811.01394v3.
[TY19] K. Tojo, T. Yoshino, On a method to construct exponential families by representation theory, GSI2019, Lecture Notes in Computer Science, vol 11712, 147–156 (2019).
[TY20] K. Tojo, T. Yoshino, Harmonic exponential families on homogeneous spaces, Information Geometry, Springer, to appear.