Efficient Bounds for the Softmax Function: Applications to Inference in Hybrid Models
Guillaume Bouchard
Xerox Research Centre Europe
December 7, 2007
Deterministic Inference in Hybrid Graphical Models
- Discrete variables with continuous* parents: no sufficient statistic, no conjugate distribution, hence intractable inference; approximate deterministic inference is needed.
- Approaches: local sampling; deterministic approximations (Gaussian quadrature, delta method, Laplace approximation); maximizing a lower bound: the variational free energy.

[Figure: hybrid graphical model with nodes X0-X5 and Y1-Y3; legend: discrete vs. continuous variables, observed vs. hidden variables.]

*or a large number of discrete parents
Variational inference
- Focus: Bayesian multinomial logistic regression.
- Mean-field approximation: Q belongs to an approximation family; maximizing the resulting lower bound requires an upper bound on the softmax log-partition function.

[Figure: plate model with covariates X1i, X2i, parameters beta1, beta2, and labels Yi, repeated over data points i; legend as before.]
Bounding the log-partition function (1)
- Binary case: the classical bound of [Jaakkola and Jordan].
- We propose its multiclass extension.
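As a concrete sketch of both bounds (a reconstruction from the standard Jaakkola-Jordan form and from the multiclass construction sketched later in these slides; the helper names and the particular values of alpha and xi_k below are illustrative assumptions, not the talk's notation):

```python
import math

def lam(xi):
    # Jaakkola-Jordan variational coefficient lambda(xi) = tanh(xi/2) / (4 xi)
    return math.tanh(xi / 2.0) / (4.0 * xi)

def jj_upper(y, xi):
    # Quadratic upper bound on log(1 + e^y), tight at y = +/- xi
    return (y - xi) / 2.0 + lam(xi) * (y * y - xi * xi) + math.log1p(math.exp(xi))

def softmax_log_partition(x):
    # Numerically stable log sum_k e^{x_k}
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def multiclass_upper(x, alpha, xis):
    # Multiclass extension: log sum_k e^{x_k} <= alpha + sum_k jj_upper(x_k - alpha, xi_k)
    return alpha + sum(jj_upper(xk - alpha, xi) for xk, xi in zip(x, xis))

x = [0.3, -1.2, 2.0]
lse = softmax_log_partition(x)
bound = multiclass_upper(x, alpha=0.5, xis=[1.0, 1.0, 1.0])
assert bound >= lse
```

In the binary case with alpha = 0 this reduces to the classical quadratic bound; alpha and the xi_k are free variational parameters to be optimized.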
Bounding the log-partition function (2)
[Figure: the bound on log sum_k e^{x_k} as a function of x, for K=2 (top) and K=10 (bottom); curves: worst curvature, optimal tight, optimal average (parameter = 2, 1, 0.1).]
Other upper bounds
- Concavity of the log [e.g. Blei et al.]
- Worst curvature [Bohning]
- Bound using hyperbolic cosines [Jebara]
- Local approximation [Gibbs] (not proved to be an upper bound)
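Two of the cited bounds can be sketched numerically. The forms below are the standard ones from the literature, written from memory and hedged accordingly (Jebara's hyperbolic-cosine bound and Gibbs's local approximation are omitted): the concavity-of-log bound uses log(u) <= u/a + log(a) - 1, and Bohning's bound replaces the softmax Hessian by the fixed matrix A = (1/2)(I - 11^T/K), which dominates it everywhere.

```python
import math

def lse(x):
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def log_concavity_upper(x, a):
    # log(u) <= u/a + log(a) - 1 applied to u = sum_k e^{x_k}, for any a > 0
    return sum(math.exp(v) for v in x) / a + math.log(a) - 1.0

def bohning_upper(x, psi):
    # Quadratic bound around psi with curvature A = (1/2)(I - 11^T/K)
    K = len(x)
    g = softmax(psi)
    d = [xv - pv for xv, pv in zip(x, psi)]
    lin = sum(gi * di for gi, di in zip(g, d))
    quad = 0.25 * (sum(di * di for di in d) - (sum(d) ** 2) / K)
    return lse(psi) + lin + quad

x = [0.5, -0.3, 1.1]
assert log_concavity_upper(x, a=3.0) >= lse(x)
assert bohning_upper(x, psi=[0.0, 0.0, 0.0]) >= lse(x)
```

Bohning's bound is tight at x = psi; the concavity bound is tight when a equals sum_k e^{x_k}.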
Proof
- Idea: expand the product of inverted sigmoids.
- The result is upper-bounded by K quadratic upper bounds.
- It is lower-bounded by a linear function (by log-convexity of f); proof: apply Jensen's inequality.
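The product-of-inverted-sigmoids step rests on the elementary inequality sum_k e^{y_k} <= prod_k (1 + e^{y_k}): the product expands to 1 + sum_k e^{y_k} plus nonnegative cross terms. With y_k = x_k - alpha this gives log sum_k e^{x_k} <= alpha + sum_k log(1 + e^{x_k - alpha}) for any alpha. A quick numerical check (an illustrative sketch, not the talk's code):

```python
import math, random

def lse(x):
    # Numerically stable log sum_k e^{x_k}
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

random.seed(0)
for _ in range(100):
    K = random.randint(2, 6)
    x = [random.uniform(-3, 3) for _ in range(K)]
    alpha = random.uniform(-2, 2)
    # lse(x) <= alpha + sum_k log(1 + e^{x_k - alpha}) for any alpha
    prod_bound = alpha + sum(math.log1p(math.exp(v - alpha)) for v in x)
    assert prod_bound >= lse(x) - 1e-12
```

Each log(1 + e^{x_k - alpha}) term is then upper-bounded by the quadratic Jaakkola-Jordan bound, yielding the K quadratic upper bounds mentioned above.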
Bounds on the Expectation
- Exponential bound
- Quadratic bound

[Figure: simulations comparing the bounds on the expectation.]
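Because the quadratic bound is quadratic in x, its expectation under a Gaussian is available in closed form from the first two moments. A sketch under the assumption of independent Gaussian components (function names and parameter choices are illustrative):

```python
import math, random

def lam(xi):
    return math.tanh(xi / 2.0) / (4.0 * xi)

def lse(x):
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def expected_quadratic_bound(mu, var, alpha, xis):
    # E[quadratic bound] under independent x_k ~ N(mu_k, var_k): only
    # E[x_k] and E[(x_k - alpha)^2] = (mu_k - alpha)^2 + var_k are needed.
    total = alpha
    for m_k, v_k, xi in zip(mu, var, xis):
        total += ((m_k - alpha - xi) / 2.0
                  + lam(xi) * ((m_k - alpha) ** 2 + v_k - xi * xi)
                  + math.log1p(math.exp(xi)))
    return total

mu, var = [0.2, -0.5, 1.0], [0.5, 0.5, 0.5]
closed = expected_quadratic_bound(mu, var, alpha=0.3, xis=[1.0, 1.0, 1.0])

# Monte Carlo estimate of E[log sum_k e^{x_k}] for comparison
random.seed(1)
N = 20000
mc = sum(lse([random.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)])
         for _ in range(N)) / N
assert closed >= mc  # the bound holds pointwise, hence in expectation
```

This closed form is what makes the quadratic bound attractive for variational updates, whereas the exponential bound's expectation must be handled differently.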
Bayesian multinomial logistic regression
- Exponential bound: cannot be maximized in closed form; requires gradient-based optimization or a fixed-point equation (unstable!).
- Quadratic bound: analytic update available.
Numerical experiments
- Iris dataset: 4 dimensions, 3 classes; prior: unit variance.
- Experiment: batch learning updates, compared to an MCMC estimate based on 100K samples.
- Error: Euclidean distance between the mean and variance parameters.
- Results: the "worst curvature" bound is both faster and more accurate.

[Figure: error (left) and variational free energy (right) vs. number of iterations, for the worst-curvature and sigmoid-product bounds.]
Conclusion
- Multinomial links in graphical models are feasible.
- Existing bounds work well; we can expect further improvements.
- Remark: better bounds are only needed in the Bayesian setting; for MAP estimation, even a loose bound converges.
- Future work: application to discriminative learning; mixture-based mean-field approximation.
Numerical experiments (backup)

[Figure: error vs. number of iterations for the worst-curvature and sigmoid-product bounds; same Iris setup as above.]
[Backup figures: the bound on log sum_k e^{x_k} as a function of x, for K=3 and K=100; curves: worst curvature, optimal tight, optimal average (parameter = 2, 1, 0.1).]