Graphical Models

Graphical Models: A Brief Introduction

Reference: Pattern Recognition and Machine Learning by C.M. Bishop, Springer, Chapter 8.2

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/Bishop-PRML-sample.pdf

[Figure: a Probabilistic Model and Real World Data, linked in two directions. From model to data: P(Data | Parameters), the generative model (probability). From data to model: P(Parameters | Data), inference (statistics).]

Notation and Definitions

• X is a random variable
– Lower-case x is some possible value for X
– “X = x” is a logical proposition: that X takes value x
– There is uncertainty about the value of X
• e.g., X is the Hang Seng index at 5pm tomorrow

• p(X = x) is the probability that the proposition X = x is true
– often shortened to p(x)

• If the set of possible x’s is finite, we have a probability distribution and $\sum_x p(x) = 1$

• If X is continuous (the set of possible x’s is uncountably infinite), p(x) is a density function, and p(x) integrates to 1 over the range of X

Multiple Variables

• p(x, y, z)
– Probability that X = x AND Y = y AND Z = z
– Possible values: cross-product of the values of X, Y, and Z

– e.g., X, Y, Z each take 10 possible values
• (x, y, z) can take $10^3$ possible values
• p(x, y, z) is a 3-dimensional array/table
– Defines $10^3$ probabilities
• Note the exponential increase as we add more variables

– e.g., X, Y, Z are all real-valued
• (x, y, z) lives in a 3-dimensional vector space
• p(x, y, z) is a non-negative function defined over this space that integrates to 1
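The discrete case can be sketched as a lookup table with one entry per (x, y, z) triple. This is a minimal illustration, not part of the original slides: the random weights, the seed, and the names `K`, `weights`, and `joint` are all assumptions chosen for the example.

```python
import itertools
import random

K = 10  # values per variable, as in the slide's example
random.seed(0)  # fixed seed so the sketch is reproducible

# Unnormalized, made-up weights for every (x, y, z) triple:
# the cross-product of the values of X, Y, and Z.
weights = {xyz: random.random() for xyz in itertools.product(range(K), repeat=3)}
total = sum(weights.values())

# Normalize so the table is a valid joint distribution p(x, y, z).
joint = {xyz: w / total for xyz, w in weights.items()}

print(len(joint))  # number of table entries, K**3
```

Adding a fourth variable with the same K would multiply the table size by another factor of K, which is the exponential growth the slide points out.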

Conditional Probability

• p(x | y, z)
– Probability of x given that Y = y and Z = z
– Could be
• hypothetical, e.g., “if Y = y and if Z = z”
• observational, e.g., we observed values y and z
– can also have p(x, y | z), etc.
– “all probabilities are conditional probabilities”

• Computing conditional probabilities is the basis of many prediction and learning problems, e.g.,
– p(DJI tomorrow | DJI index last week)
– expected value of [DJI tomorrow | DJI index last week]
– most likely value of a parameter given observed data

Computing Conditional Probabilities

• Variables A, B, C, D
– All distributions of interest related to A, B, C, D can be computed from the full joint distribution p(a, b, c, d)

• Examples, using the Law of Total Probability
– $p(a) = \sum_{b,c,d} p(a, b, c, d)$
– $p(c, d) = \sum_{a,b} p(a, b, c, d)$
– $p(a, c \mid d) = \sum_{b} p(a, b, c \mid d)$, where $p(a, b, c \mid d) = p(a, b, c, d) / p(d)$

• These are standard probability manipulations; however, we will see how to use them to make inferences about parameters and unobserved variables, given data
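The marginalization and conditioning steps above can be sketched on a small joint table. This is an illustrative sketch only: the four variables are assumed binary, the table entries are random made-up numbers, and the helper `marginal` is a hypothetical name introduced here.

```python
import itertools
import random
from collections import defaultdict

random.seed(1)
names = ("a", "b", "c", "d")

# Made-up joint distribution p(a, b, c, d) over four binary variables.
states = list(itertools.product((0, 1), repeat=4))
raw = [random.random() for _ in states]
joint = {s: r / sum(raw) for s, r in zip(states, raw)}

def marginal(joint, keep):
    """Sum the joint over every variable not listed in `keep`."""
    idx = [names.index(v) for v in keep]
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in idx)] += p
    return dict(out)

p_a = marginal(joint, ["a"])             # p(a)   = sum over b, c, d
p_cd = marginal(joint, ["c", "d"])       # p(c,d) = sum over a, b
p_d = marginal(joint, ["d"])
p_acd = marginal(joint, ["a", "c", "d"])

# p(a, c | d) = p(a, c, d) / p(d)
p_ac_given_d = {(a, c, d): p / p_d[(d,)] for (a, c, d), p in p_acd.items()}
```

For each fixed value of d, the entries of `p_ac_given_d` sum to 1, as a conditional distribution should.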

Two Practical Problems

(Assume for simplicity each variable takes K values)

• Problem 1: Computational Complexity
– Conditional probability computations scale as $O(K^N)$
• where N is the number of variables being summed over

• Problem 2: Model Specification
– To specify a joint distribution we need a table of $O(K^N)$ numbers
– Where do these numbers come from?

Two Key Ideas

• Problem 1: Computational Complexity
– Idea: Graphical models
• Structured probability models lead to tractable inference

• Problem 2: Model Specification
– Idea: Probabilistic learning
• General principles for learning from data

Conditional Independence

• A is conditionally independent of B given C iff
p(a | b, c) = p(a | c)
(also implies that B is conditionally independent of A given C)

• In words, B provides no information about A if the value of C is known

• Example:
– a = “reading ability”
– b = “height”
– c = “age”
– Across children of all ages, height and reading ability are correlated; among children of the same age, height says nothing more about reading ability

• Note that conditional independence does not imply marginal independence
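A small numeric check can make the last point concrete. The sketch below assumes binary versions of the three variables and made-up probability tables; the names `p_c`, `p_a_given_c`, `p_b_given_c`, and `prob` are hypothetical and not from the slides.

```python
import itertools

# Hypothetical binary variables: c = age group, a = reading ability, b = height.
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_b_given_c[c][b]

# Build a joint in which A and B are conditionally independent given C.
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def prob(**fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    return sum(p for (a, b, c), p in joint.items()
               if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items()))

# Given C, knowing B changes nothing about A ...
p_a1_given_b1_c1 = prob(a=1, b=1, c=1) / prob(b=1, c=1)
p_a1_given_c1 = prob(a=1, c=1) / prob(c=1)

# ... but without C, B is informative about A.
p_a1_given_b1 = prob(a=1, b=1) / prob(b=1)
p_a1 = prob(a=1)
```

With these numbers the first pair agrees, while p(a=1 | b=1) differs noticeably from p(a=1): A and B are conditionally independent given C but marginally dependent.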

Graphical Models

• Represent dependency structure with a directed graph
– Node <-> random variable
– Edges encode dependencies
• Absence of edge -> conditional independence
– Directed and undirected versions

• Why is this useful?
– A language for communication
– A language for computation

Examples of 3-way Graphical Models

Marginal independence (nodes A, B, C with no edges): p(A, B, C) = p(A) p(B) p(C)

Conditionally independent effects: p(A, B, C) = p(B | A) p(C | A) p(A)
B and C are conditionally independent given A
e.g., A is a disease, and we model B and C as conditionally independent symptoms given A

Independent causes: p(A, B, C) = p(C | A, B) p(A) p(B)

Markov dependence (chain A -> B -> C): p(A, B, C) = p(C | B) p(B | A) p(A)

Directed Graphical Models

p(A,B,C) = p(C|A,B)p(A)p(B)

In general, $p(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} p(X_i \mid \text{parents}(X_i))$

• Probability model has simple factored form

• Directed edges => direct dependence

• Absence of an edge => conditional independence

• Also known as belief networks, Bayesian networks, causal networks
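The factored form can be sketched for the three-node example p(A, B, C) = p(C | A, B) p(A) p(B). This is a minimal illustration under stated assumptions: all variables are binary, the numbers in `cpt` are made up, and the names `parents`, `cpt`, and `joint_probability` are hypothetical.

```python
import itertools

# Hypothetical network from the slide: A -> C <- B, all variables binary.
parents = {"A": (), "B": (), "C": ("A", "B")}

# cpt[node][parent_values] = probability that node == 1 (made-up numbers).
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95},
}

def joint_probability(assignment):
    """p(x_1..x_N) = product over nodes of p(x_i | parents(x_i))."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing the factored joint over all assignments should give 1.
total = sum(joint_probability(dict(zip(parents, values)))
            for values in itertools.product((0, 1), repeat=len(parents)))
```

Here the model is specified by 1 + 1 + 4 = 6 numbers, compared with the 7 free entries of an unstructured table over three binary variables; the saving grows quickly as sparse graphs get larger.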


Reminders from Probability

• Law of Total Probability
$P(a) = \sum_b P(a, b) = \sum_b P(a \mid b) P(b)$
– Conditional version:
$P(a \mid c) = \sum_b P(a, b \mid c) = \sum_b P(a \mid b, c) P(b \mid c)$

• Factorization or Chain Rule
– $P(a, b, c, d) = P(a \mid b, c, d) P(b \mid c, d) P(c \mid d) P(d)$, or
$= P(b \mid a, c, d) P(c \mid a, d) P(d \mid a) P(a)$, or
$= \ldots$

Probability Calculations on Graphs

• General algorithms exist, beyond trees
– Complexity is typically $O(m^{\text{number of parents}})$, where m = arity of each node
– If single parents (e.g., tree) -> O(m)
– The sparser the graph, the lower the complexity

• Technique can be “automated”
– i.e., a fully general algorithm for arbitrary graphs
– For continuous variables:
• replace sum with integral
– For identification of most likely values:
• replace sum with max operator
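As a rough sketch of why structure helps, consider a chain A -> B -> C -> D. The code below is an assumption-laden illustration, not the general algorithm the slide refers to: the arity `m`, the shared transition table `T`, and the function names `forward` and `brute_force` are all invented for this example. It passes a message along the chain one edge at a time and compares the result with brute-force enumeration of the full joint; swapping `sum` for `max` gives the "most likely values" variant.

```python
import itertools

m = 3  # arity of each node (an assumed value)
p_first = [0.5, 0.3, 0.2]        # p(a), made-up numbers
T = [[0.7, 0.2, 0.1],            # T[parent][child] = p(child | parent)
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]]
steps = 3                        # chain A -> B -> C -> D has three edges

def forward(combine=sum):
    """Eliminate one parent at a time; each step touches one m-by-m table."""
    msg = list(p_first)
    for _ in range(steps):
        msg = [combine(msg[i] * T[i][j] for i in range(m)) for j in range(m)]
    return msg

def brute_force(combine=sum):
    """Enumerate all m**(steps + 1) joint configurations."""
    buckets = [[] for _ in range(m)]
    for cfg in itertools.product(range(m), repeat=steps + 1):
        p = p_first[cfg[0]]
        for parent, child in zip(cfg, cfg[1:]):
            p *= T[parent][child]
        buckets[cfg[-1]].append(p)
    return [combine(b) for b in buckets]

p_d = forward(sum)     # marginal p(d)
best_d = forward(max)  # for each d, probability of the most likely (a, b, c, d)
```

The forward pass does work proportional to the number of edges, while the enumeration grows exponentially with the length of the chain; both return the same values here.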


The Likelihood Function

• Likelihood = p(data | parameters)
$= p(D \mid \theta) = L(\theta)$

• Likelihood tells us how likely the observed data are, conditioned on a particular setting of the parameters

• Details
– Constants that do not involve $\theta$ can be dropped in defining $L(\theta)$
– Often easier to work with $\log L(\theta)$

Comments on the Likelihood Function

• Constructing a likelihood function $L(\theta)$ is the first step in probabilistic modeling

• The likelihood function implicitly assumes an underlying probabilistic model M with parameters $\theta$

• $L(\theta)$ connects the model to the observed data

• Graphical models provide a useful language for constructing likelihoods

Binomial Likelihood

• Binomial model
– n memoryless trials, 2 outcomes
– $\theta$ = probability of success at each trial

• Observed data
– r successes in n trials
– Defines a likelihood:
$L(\theta) = p(D \mid \theta) = p(\text{successes}) \, p(\text{non-successes}) = \theta^{r} (1 - \theta)^{n - r}$
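The binomial likelihood is easy to evaluate numerically. The sketch below works with the log-likelihood and scans a grid of candidate values; the data `r = 7`, `n = 20`, the grid resolution, and the function name are assumptions made for illustration.

```python
import math

def binomial_log_likelihood(theta, r, n):
    """log L(theta) = r log(theta) + (n - r) log(1 - theta), for 0 < theta < 1."""
    return r * math.log(theta) + (n - r) * math.log(1.0 - theta)

r, n = 7, 20  # hypothetical data: 7 successes in 20 trials

# Evaluate log L on a grid of candidate parameter values and keep the best.
grid = [i / 1000 for i in range(1, 1000)]
theta_best = max(grid, key=lambda t: binomial_log_likelihood(t, r, n))
```

On this grid the maximizer coincides with the observed success fraction r / n, which is where the binomial likelihood peaks.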

[Figure: Binomial Likelihood Examples]

Graphical Models

• Represent using a graphical model:
– Left: data points are conditionally independent given $\theta$
– Right: plate notation (same model as left); repeated nodes are inside a box (plate), and the number in the lower right-hand corner specifies the number of repetitions of the node

• Assume each data case was generated independently but from the same distribution

• Data cases are only independent conditional on the parameters $\theta$

• Marginally, the data cases are dependent

• The order in which the data cases arrive makes no difference to the beliefs about $\theta$ (all orderings have the same sufficient statistics): the data is exchangeable
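The difference between conditional and marginal independence of data cases can be checked numerically. The sketch below assumes binary data and a made-up two-point prior on the parameter; `prior` and `p_data` are hypothetical names used only here.

```python
# Hypothetical prior: the success probability theta is 0.2 or 0.8, equally likely.
prior = {0.2: 0.5, 0.8: 0.5}

def p_data(xs):
    """Marginal p(x_1..x_n) = sum_theta p(theta) * prod_i p(x_i | theta)."""
    total = 0.0
    for theta, p_theta in prior.items():
        lik = 1.0
        for x in xs:
            lik *= theta if x == 1 else 1.0 - theta
        total += p_theta * lik
    return total

p_x1 = p_data((1,))                     # p(x1 = 1)
p_x2_given_x1 = p_data((1, 1)) / p_x1   # p(x2 = 1 | x1 = 1)

# Exchangeability: reordering the same observations leaves the probability unchanged.
orderings = [p_data((1, 1, 0)), p_data((1, 0, 1)), p_data((0, 1, 1))]
```

With these numbers p(x2 = 1) is 0.5 but p(x2 = 1 | x1 = 1) is 0.68: seeing one success makes the larger value of the parameter more plausible, so the data cases are marginally dependent even though they are independent given the parameter.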

Graphical Models

• Avoid visual clutter: use a form of syntactic sugar, called plates

• Draw a little box around the repeated variables

• With the convention that nodes within the box are repeated when the model is unrolled

• Bottom right corner of the box: number of copies or repetitions

• The corresponding joint distribution has the form:
$p(\theta, D) = p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)$

Plate Notation

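Multinomial Likelihood

• Multinomial model
– n memoryless trials, K outcomes
– Probability vector $\theta = (\theta_1, \ldots, \theta_K)$ for outcomes at each trial

• Observed data
– $n_j$ occurrences of outcome j in n trials
– Defines a likelihood:
$L(\theta) = p(D \mid \theta) = \prod_{j=1}^{K} \theta_j^{n_j}$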
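A multinomial log-likelihood can be sketched in a few lines, assuming the usual form in which each outcome j contributes its count times the log of its probability (constants that do not involve the parameters are dropped). The counts and candidate parameter vectors below are made up, and the function name is hypothetical.

```python
import math

def multinomial_log_likelihood(theta, counts):
    """log L(theta) = sum_j n_j * log(theta_j); zero counts contribute nothing."""
    return sum(n_j * math.log(t_j) for t_j, n_j in zip(theta, counts) if n_j > 0)

counts = [5, 3, 2]                 # hypothetical n_j for K = 3 outcomes
theta_a = [0.5, 0.3, 0.2]          # candidate matching the observed fractions
theta_b = [1 / 3, 1 / 3, 1 / 3]    # uniform candidate

ll_a = multinomial_log_likelihood(theta_a, counts)
ll_b = multinomial_log_likelihood(theta_b, counts)
```

Here `ll_a` is larger than `ll_b`, and with K = 2 the function reduces to the binomial log-likelihood.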

Graphical Model for Multinomial

Parameters: $\theta = [\, p(w_1), p(w_2), \ldots, p(w_K) \,]$

Observed data

“Plate” Notation

Data = D = {w_1, …, w_n}

Model parameters $\theta$

Plate (rectangle) indicates replicated nodes in a graphical model

Variables within a plate are replicated in a conditionally independent manner given the parent

Learning in Graphical Models

Data = D = {w_1, …, w_n}

Model parameters $\theta$

• Can view learning in a graphical model as computing the most likely value of the parameter node given the data nodes

Maximum Likelihood (ML) Principle

$L(\theta) = p(\text{Data} \mid \theta) = \prod_i p(y_i \mid \theta)$

Maximum Likelihood: $\theta_{ML} = \arg\max_\theta \{ \text{Likelihood}(\theta) \}$

Select the parameters that make the observed data most likely

Data = {w1,…wn}

Model parameters
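For a multinomial model over discrete symbols, the maximum likelihood estimate has a well-known closed form: the relative frequency of each symbol. The sketch below illustrates this on a made-up data set; the data, the comparison vector `theta_alt`, and the function name are assumptions for illustration.

```python
import math
from collections import Counter

data = ["a", "b", "a", "c", "a", "b"]  # hypothetical observations w_1..w_n
counts = Counter(data)
n = len(data)

# Closed-form maximizer of prod_j theta_j^{n_j} subject to sum_j theta_j = 1.
theta_ml = {w: c / n for w, c in counts.items()}

def log_likelihood(theta, data):
    """log L(theta) = sum_i log p(w_i | theta) for independent draws."""
    return sum(math.log(theta[w]) for w in data)

# Any other valid parameter vector should score no higher.
theta_alt = {"a": 0.4, "b": 0.4, "c": 0.2}
gap = log_likelihood(theta_ml, data) - log_likelihood(theta_alt, data)
```

The gap is positive here: the relative frequencies make the observed data more likely than the alternative vector does.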

The Bayesian Approach to Learning

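Fully Bayesian: $p(\theta \mid \text{Data}) = p(\text{Data} \mid \theta) \, p(\theta) / p(\text{Data})$

Maximum A Posteriori: $\theta_{MAP} = \arg\max_\theta \{ \text{Likelihood}(\theta) \times \text{Prior}(\theta) \}$

$\text{Prior}(\theta) = p(\theta)$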
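These quantities can be sketched for the binomial model. The choice of a Beta(alpha, beta) prior is an assumption made here for illustration (the slides do not name a prior), as are the data, the grid, and the function name. The posterior is approximated on a grid, so the posterior mean below is a numerical approximation.

```python
import math

def log_posterior_unnorm(theta, r, n, alpha, beta):
    """log[Likelihood(theta) * Prior(theta)] up to a constant, Beta(alpha, beta) prior."""
    log_lik = r * math.log(theta) + (n - r) * math.log(1.0 - theta)
    log_prior = (alpha - 1.0) * math.log(theta) + (beta - 1.0) * math.log(1.0 - theta)
    return log_lik + log_prior

r, n = 7, 20              # hypothetical data: 7 successes in 20 trials
alpha, beta = 3.0, 3.0    # assumed prior, mildly favoring theta near 0.5
grid = [i / 1000 for i in range(1, 1000)]

# MAP: maximize Likelihood x Prior. A flat prior (alpha = beta = 1) recovers ML.
theta_map = max(grid, key=lambda t: log_posterior_unnorm(t, r, n, alpha, beta))
theta_ml = max(grid, key=lambda t: log_posterior_unnorm(t, r, n, 1.0, 1.0))

# Fully Bayesian: normalize over the grid, then summarize (here, the posterior mean).
weights = [math.exp(log_posterior_unnorm(t, r, n, alpha, beta)) for t in grid]
z = sum(weights)
posterior_mean = sum(t * w for t, w in zip(grid, weights)) / z
```

The prior pulls the MAP estimate from the ML value r / n toward 0.5, and the posterior mean is a different point estimate again; which summary to report depends on the application.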

Summary of Bayesian Learning

• Can use graphical models to describe relationships between parameters and data

• P(data | parameters) = Likelihood function

• P(parameters) = prior
– In applications such as text mining, prior can be “uninformative”, i.e., flat
– Prior can also be optimized for prediction (e.g., on validation data)

• We can compute P(parameters | data, prior) or a “point estimate” (e.g., posterior mode or mean)

• Computation of posterior estimates can be computationally intractable
– Monte Carlo techniques often used
