Slide 1
Graphical Models in Data Assimilation Problems
Alexander Ihler
UC Irvine
Collaborators: Sergey Kirshner Andrew Robertson Padhraic Smyth
Slide 2

Outline
• Graphical models
  – A convenient description of structure among random variables
• Use this structure to
  – Organize inference computations
    • Finding optimal (ML, etc.) estimates
    • Calculating the data likelihood
    • Simulation / drawing samples
  – Suggest sub-optimal (approximate) inference computations
    • e.g., when the optimal computations are too expensive
• Some examples from data assimilation
  – Markov chains, Kalman filtering
  – Rainfall models
    • Mixtures of trees
    • Loopy graphs
  – Image analysis (de-noising, smoothing, etc.)
Slide 3

Graphical Models
• An undirected graph G = (V, E) is defined by
  – V: a set of nodes
  – E: a set of edges connecting the nodes
• Nodes are associated with random variables
• Graph separation ↔ conditional independence
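The separation property can be stated compactly; a minimal statement in standard notation (the set symbols A, B, C are our choice):

```latex
% Global Markov property of an undirected graphical model G = (V, E):
% if the node set C separates A from B in G (every path from A to B
% passes through C), then the corresponding variables are conditionally
% independent given x_C.
\[
  C \text{ separates } A \text{ and } B \text{ in } G
  \quad\Longrightarrow\quad
  x_A \perp\!\!\!\perp x_B \mid x_C .
\]
```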
Slide 4

Graphical Models: Factorization
• Sufficient condition
  – The distribution factors into a product of "potential functions" defined on the cliques of G
  – The condition is also necessary if the distribution is strictly positive
• Examples
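Written out, the clique factorization takes the familiar Hammersley–Clifford form (notation is ours):

```latex
% Clique factorization: each psi_C is a nonnegative potential function on a
% clique C of G, and Z is the normalizing constant (partition function).
\[
  p(x_1, \dots, x_n) \;=\; \frac{1}{Z} \prod_{C \in \mathrm{cliques}(G)} \psi_C(x_C),
  \qquad
  Z \;=\; \sum_x \prod_{C \in \mathrm{cliques}(G)} \psi_C(x_C).
\]
% Example: for the chain x_1 - x_2 - x_3 the cliques are the two edges, so
%   p(x) \propto \psi_{12}(x_1, x_2) \, \psi_{23}(x_2, x_3).
```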
Slide 5

Graphical Models: Inference
• Many possible inference goals
  – Given a few observed RVs, compute:
    • Marginal distributions
    • Joint or maximum a-posteriori (MAP) values
    • The data likelihood of the observed variables
    • Samples from the posterior
• Use the graph structure to do these computations efficiently
  – Example: compute the posterior marginal p(x2 | x5 = x̄5), as sketched below
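If, for illustration, we assume the graph is the chain x1 – x2 – x3 – x4 – x5, the sums distribute over the factorization and each piece stays small:

```latex
% Posterior marginal on a 5-node chain: a naive sum over (x1, x3, x4) costs
% exponential time, but pushing each sum inside the product makes the cost
% linear in the chain length -- each bracketed term is a BP message.
\[
  p(x_2 \mid \bar{x}_5) \;\propto\;
  \Bigl[\sum_{x_1} \psi_{12}(x_1, x_2)\Bigr]
  \sum_{x_3} \psi_{23}(x_2, x_3)
  \sum_{x_4} \psi_{34}(x_3, x_4)\,\psi_{45}(x_4, \bar{x}_5).
\]
```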
Slide 6

Finding Marginals via Belief Propagation
(aka sum-product; other goals have similar algorithms)
• Combine the observations from all nodes in the graph through a series of local message-passing operations
• Γ(s): the neighborhood of node s (its adjacent nodes)
• m_ts(x_s): the message sent from node t to node s (a "sufficient statistic" of t's knowledge about s)
Slide 7

BP Message Updates
I. Message Product: Multiply the incoming messages (from all nodes but s) with the local observation to form a distribution over x_t
II. Message Propagation: Transform that distribution from node t to node s using the pairwise interaction potential; integrate over x_t to form a distribution summarizing node t's knowledge about x_s
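The two steps combine into the standard sum-product update (notation as on the previous slide; writing the local observation potential as ψ_t is our assumption about the slide's symbols):

```latex
% Sum-product (BP) message from node t to node s:
%  - the product collects messages from every neighbor of t except s,
%  - psi_t(x_t) is the local observation potential at node t,
%  - psi_st(x_s, x_t) is the pairwise interaction potential, and
%  - the integral marginalizes out x_t.
\[
  m_{ts}(x_s) \;=\; \int \psi_{st}(x_s, x_t)\,\psi_t(x_t)
  \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)\, dx_t .
\]
% The posterior marginal at node s is then
%   p(x_s \mid \text{observations}) \propto \psi_s(x_s) \prod_{t \in \Gamma(s)} m_{ts}(x_s).
```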
Slide 8

Example: Sequential Estimation
• Well-known example
  – Markov chain
  – Jointly Gaussian uncertainty
    • Gives the integrals a simple, closed form
  – Optimal inference (in many senses) is given by the Kalman filter (a code sketch follows this list)
  – Converts one large (length-T) problem into a collection of smaller ones
  – "Exact" non-Gaussian analogues: particle and ensemble filtering and their extensions
  – The same general results hold for any tree-structured graph
    • Partial elimination ordering of the nodes
    • Complexity limited by the dimension of each variable
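As a concrete sketch, the Kalman filter recursion is exactly sequential BP on a linear-Gaussian chain. A minimal version, assuming the model x_t = A x_{t-1} + w_t, y_t = C x_t + v_t; the matrices and toy data below are illustrative, not from the talk:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, mu0, P0):
    """Filtered means/covariances p(x_t | y_1..t) for a linear-Gaussian chain:
    x_t = A x_{t-1} + w_t, w_t ~ N(0, Q);  y_t = C x_t + v_t, v_t ~ N(0, R)."""
    mu, P = mu0, P0
    means, covs = [], []
    for y in ys:
        # "Message propagation": push the belief through the dynamics.
        mu_pred = A @ mu
        P_pred = A @ P @ A.T + Q
        # "Message product": fold in the local observation.
        S = C @ P_pred @ C.T + R               # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
        mu = mu_pred + K @ (y - C @ mu_pred)
        P = (np.eye(len(mu)) - K @ C) @ P_pred
        means.append(mu)
        covs.append(P)
    return means, covs

# Toy 1-D random walk observed in noise (illustrative values only).
A = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.1]]); R = np.array([[0.5]])
ys = [np.array([0.9]), np.array([1.2]), np.array([0.7])]
means, covs = kalman_filter(ys, A, C, Q, R, np.array([0.0]), np.array([[1.0]]))
```

Each step costs O(d³) in the state dimension d, which is the "complexity limited by the dimension of each variable" point above.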
Slide 9

Exact Estimation in Non-Trees
• Often our variables aren't so well behaved
  – We may be able to convert the problem using variable augmentation
• This is often the case in Bayesian parameter estimation
  – Treat the parameters as variables and include them in the graph
  – (This increases the nonlinearities!)
• But there is a dimensionality problem
  – Computation increases (maybe a lot!)
    • Jointly Gaussian: O(d³)
    • Otherwise, often exponential in d
  – We can trade off graph complexity against dimensionality…
Slide 10

Example: Rainfall Data
• 41 stations in India
• Rainfall occurrence and amounts for ~30 years
• Some stations/days missing
• Tasks
  – Impute missing entries
  – Simulate realistic rainfall
  – Short-term predictions
  – …
• Can't deal with the joint distribution directly – too large to even manipulate
• Conditional independence structure?
  – Unlikely to be tree-structured
Slide 11

Example: Rainfall Data
• "True" relationships
  – Not tree-like at all
  – High tree-width
• Need some approximations
  – Approximate model, exact inference
  – Correct model, approximate inference
• Even harder:
  – May get multiple observation modalities (satellite data, etc.)
  – These have their own statistical structure and relationships to the stations
Slide 12

Example: Rainfall Data
• Consider a single time-slice
• Option 1: mixtures of trees (a fitting sketch follows this list)
  – Add a "hidden" variable indicating which of several trees is in effect
  – (Generally) marginalize over this variable
• Option 2: use a loopy graph and ignore the loops during inference
  – Utility depends on the task:
    • Works well for filling in missing data
    • Perhaps less well for other tasks
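For the tree-based option, here is a rough sketch of fitting a single tree over the stations via the Chow–Liu construction (maximum-weight spanning tree under pairwise mutual information); a mixture of trees adds a hidden selector over several such trees. Function names, the missing-data encoding, and the toy data are all illustrative:

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical MI between two binary station records (entries < 0 = missing)."""
    m = (x >= 0) & (y >= 0)
    x, y = x[m], y[m]
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_edges(data):
    """Max-weight spanning tree over stations (Kruskal on MI edge weights)."""
    n = data.shape[1]
    weights = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                      for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = []
    for w, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:                  # adding (i, j) creates no cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy data: 100 days x 5 stations of 0/1 rainfall occurrence.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(100, 5))
print(chow_liu_edges(data))
```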
Slide 13

Multi-Scale Models
• Another example of graph structure
• Efficient computation if tree-structured
• Again, we don't really believe any particular tree
  – Perhaps average over (use a mixture of) several
• (See e.g. Willsky 2002)
• (Also works with loops, similar to multi-grid)
Slide 14

Summary
• Explicit structure among variables
  – Prior knowledge / learned from data
  – Structure organizes computation and suggests approximations
  – Can provide computational efficiency
  – (Often the naïve joint distribution is too large to represent / estimate)
• Offers some choices
  – Where to put the complexity?
  – Simple graph structure with high-dimensional variables
    • Approximate structure, exact computations
  – Complex graph structure with more manageable variables
    • Improved structures, approximate computations