1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

37
1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003

Transcript of 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

Page 1: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

1

Structure and Uncertainty

Peter Green, University of Bristol, 10 July 2003

Page 2: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

3

“If your experiment needs statistics, you ought to have done a better experiment”

Statistics and science

Ernest Rutherford (1871-1937)

Page 3: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

5

Graphical models

Modelling

Inference

Mathematics

Algorithms

Page 4: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

6

Markov chains

Graphical models

Contingencytables

Spatial statistics

Sufficiency

Regression

Covariance selection

Statisticalphysics

Genetics

AI

Page 5: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

8

1. Modelling

Modelling

Inference

Mathematics

Algorithms

Page 6: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

9

Structured systems

A framework for building models, especially probabilistic models, for empirical data

Key idea - – understand complex system– through global model– built from small pieces

• comprehensible• each with only a few variables • modular

Page 7: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

12

Mendelian inheritance - a natural structured model

A

O

AB

A

O

AB AO

AO OO

OO

Mendel

Page 8: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

13

Ion channelmodel

levels &variances

modelindicator

transitionrates

hiddenstate

data

binarysignal

Hodgson and Green, Proc Roy Soc Lond A, 1999

Page 9: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

14

levels &variances

modelindicator

transitionrates

hiddenstate

data

binarysignal

O1 O2

C1 C2 C3

** *

*******

*

Page 10: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

15

Gene expression using Affymetrix chips

20µm

Millions of copies of a specificoligonucleotide sequence element

Image of Hybridised Array

Approx. ½ million differentcomplementary oligonucleotides

Single stranded, labeled RNA sample

Oligonucleotide element

**

**

*

1.28cm

Hybridised Spot

Slide courtesy of Affymetrix

Expressed genes

Non-expressed genes

Zoom Image of Hybridised Array

Page 11: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

16

Gene expression is a hierarchical process

• Substantive question• Experimental design• Sample preparation• Array design & manufacture• Gene expression matrix• Probe level data• Image level data

Page 12: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

20

Mapping of rare diseases using Hidden Markov model

G & Richardson, 2002

Larynx cancer in females in France,1986-1993 (standardised ratios)

Posterior probabilityof excess risk

Page 13: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

22

Probabilistic expert systems

Page 14: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

23

2. Mathematics

Modelling

Inference

Mathematics

Algorithms

Page 15: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

24

Graphical models

Use ideas from graph theory to• represent structure of a joint

probability distribution• by encoding conditional

independencies

D

EB

C

A

F

Page 16: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

25

• Genetics– pedigree (family connections)

• Lattice systems– interaction graph (e.g. nearest

neighbours)• Gaussian case

– graph determined by non-zeroes in inverse variance matrix

Where does the graph come from?

Page 17: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

26

1000

0300

0020

0001

A B C D

A B C D

A

B

C

D

Inverse of (co)variance matrix:

independent case

Page 18: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

27

3210

2410

1121

0012

A B C D

non-zero

non-zero),|,cov( CADB

A B C D

A

B

C

D

Inverse of (co)variance matrix:

dependent case

Few links implies few parameters - Occam’s razor

Page 19: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

29

Conditional independence

• X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: p(X|Y,Z) = p(X|Y)

• X Z Y

X Y Z

Page 20: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

30

Conditional independence

as seen in data on perinatal mortality vs. ante-natal care….

Does survival depend on ante-natal care?

.... what if you know the clinic?

Page 21: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

31

ante

clinic

survival

survival and clinic are dependent

and ante and clinic are dependent

but survival and ante are CI given clinic

Conditional independence

Page 22: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

32

D

EB

C

A

F

Conditional independence provides a mathematical basis for splitting up a large system into smaller components

Page 23: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

33

B

C

E

D

A

B

F

D

E

Page 24: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

34

3. Inference

Modelling

Inference

Mathematics

Algorithms

Page 25: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

36

Bayesian paradigm in structured modelling• ‘borrowing strength’• automatically integrates out all sources

of uncertainty• properly accounting for variability at all

levels• including, in principle, uncertainty in

model itself• avoids over-optimistic claims of certainty

Page 26: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

38

Bayesian structured modelling• ‘borrowing strength’• automatically integrates out all

sources of uncertainty

• … for example in forensic statistics with DNA probe data…..

Page 27: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

39 (thanks to J Mortera)

Page 28: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

40

Page 29: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

42

4. Algorithms

Modelling

Inference

Mathematics

Algorithms

Page 30: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

43

Algorithms for probability and likelihood calculations

Exploiting graphical structure:• Markov chain Monte Carlo• Probability propagation (Bayes nets)• Expectation-Maximisation• Variational methods

Page 31: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

44

Markov chain Monte Carlo

• Subgroups of one or more variables updated randomly,– maintaining detailed balance with

respect to target distribution

• Ensemble converges to equilibrium = target distribution ( = Bayesian posterior, e.g.)

Page 32: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

45

Markov chain Monte Carlo

?

Updating ? - need only look at neighbours

Page 33: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

46

Probability propagation

7 6 5

2 3 41

12

267 236 345626 36

2

form junction tree

Page 34: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

47

rootroot

Message passing in junction tree

Page 35: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

48

rootroot

Message passing in junction tree

Page 36: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

52

Structured systems’ success stories include...

• Genomics & bioinformatics– DNA & protein sequencing,

gene mapping, evolutionary genetics

• Spatial statistics– image analysis, environmetrics,

geographical epidemiology, ecology

• Temporal problems– longitudinal data, financial time series,

signal processing

Page 37: 1 Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003.

53

http://www.stats.bris.ac.uk/~peter

[email protected]

…thanks to many