Recent developments in imprecise probabilities and probabilistic graphical models
Gert de Cooman
Ghent University, SYSTeMS
[email protected]
users.UGent.be/~gdcooma
gertekoo.wordpress.com
ECAI 2012, 31 August 2012
What would I like to achieve and convey?
IMPRECISE PROBABILITIES
PROBABILISTIC GRAPHICAL MODELS
IMPRECISE PROBABILITY MODELS
Credal sets
Mass functions and expectations
Assume we are uncertain about:
– the value of a variable X
– in a finite set of possible values X.
This is usually modelled by a probability mass function p on X:
p(x) ≥ 0 and ∑_{x∈X} p(x) = 1.
With p we can associate a prevision/expectation operator Pp:
Pp(f) := ∑_{x∈X} p(x) f(x), where f : X → R.
If A ⊆ X is an event, then its probability is given by
Pp(A) = ∑_{x∈A} p(x) = Pp(IA).
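As a quick illustration of these definitions, here is a minimal Python sketch (the mass function and gamble are made-up toy values, not from the talk):

```python
# Toy example of a mass function p on X = {'a','b','c'} and its
# prevision/expectation operator P_p (all values illustrative).

def prevision(p, f):
    """P_p(f) = sum of p(x) * f(x) over x in X."""
    return sum(p[x] * f(x) for x in p)

p = {'a': 0.5, 'b': 0.25, 'c': 0.25}   # p(x) >= 0 and sums to 1

def f(x):                               # a gamble f : X -> R
    return {'a': 1.0, 'b': 0.0, 'c': 4.0}[x]

print(prevision(p, f))                  # P_p(f) = 1.5

# The probability of an event A is the prevision of its indicator I_A:
A = {'b', 'c'}
print(prevision(p, lambda x: 1.0 if x in A else 0.0))   # P_p(A) = 0.5
```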
The simplex of all probability mass functions
Consider the simplex ΣX of all mass functions on X:
ΣX := {p ∈ R^X_+ : ∑_{x∈X} p(x) = 1}.
[Figure: two views of the simplex ΣX for X = {a,b,c}, with vertices (1,0,0), (0,1,0) and (0,0,1), and a marked mass function pu.]
Credal sets
Definition: A credal set M is a convex closed subset of ΣX.
[Figure: four examples of credal sets M inside the simplex for X = {a,b,c}.]
It is completely characterised by its set of extreme points ext(M ).
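Since a credal set is completely characterised by its extreme points, previsions under it can be bounded by optimising over ext(M) alone. A minimal sketch, with illustrative extreme points:

```python
# Lower/upper expectations computed over the extreme points of a credal set
# on X = {'a','b','c'} (extreme points and gamble are illustrative).

def expectation(p, f):
    return sum(p[x] * f[x] for x in p)

ext_M = [                          # extreme points of a credal set
    {'a': 0.6, 'b': 0.2, 'c': 0.2},
    {'a': 0.2, 'b': 0.6, 'c': 0.2},
    {'a': 0.2, 'b': 0.2, 'c': 0.6},
]
f = {'a': 0.0, 'b': 1.0, 'c': 2.0}

lower = min(expectation(p, f) for p in ext_M)   # attained at an extreme point
upper = max(expectation(p, f) for p in ext_M)
print(lower, upper)
```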
Conditioning and credal sets
Suppose we have two variables X1 in X1 and X2 in X2.
A credal set for (X1,X2) jointly is a convex closed set of joint mass functions p(x1,x2):
M ⊆ ΣX1×X2.
This gives rise to a conditional model by applying Bayes's Rule to each mass function:
M|x2 := {p(·|x2) : p ∈ M}.
Working with extreme points does the job too.
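A sketch of this element-wise conditioning, applying Bayes's Rule to each extreme point of a toy joint credal set (all numbers illustrative):

```python
# Conditioning a joint credal set on X2 = x2 by applying Bayes's Rule to
# each extreme point. Keys of each mass function are pairs (x1, x2).

def condition_on_x2(p, x2):
    """Return p(.|x2) on X1, assuming p(X2 = x2) > 0."""
    norm = sum(q for (x1, y), q in p.items() if y == x2)
    return {x1: q / norm for (x1, y), q in p.items() if y == x2}

ext_M = [   # illustrative extreme points of a joint credal set
    {('h', 0): 0.4, ('h', 1): 0.1, ('t', 0): 0.2, ('t', 1): 0.3},
    {('h', 0): 0.1, ('h', 1): 0.4, ('t', 0): 0.3, ('t', 1): 0.2},
]

M_given_1 = [condition_on_x2(p, 1) for p in ext_M]   # M | x2 = 1
print(M_given_1)
```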
Independence and credal sets
Suppose we have two variables X1 in X1 and X2 in X2.
Marginal models are credal sets for X1 and X2 separately:
M1 ⊆ ΣX1 and M2 ⊆ ΣX2
Their strong product is the joint credal set:
M1 ⊠ M2 := CCH({p1 · p2 : p1 ∈ M1 and p2 ∈ M2}), where CCH denotes the closed convex hull.
This leads to a notion of strong independence.
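The strong product can be sketched by forming the products of extreme points; the joint credal set is then the closed convex hull of these vertices (here we only enumerate the product mass functions, with illustrative marginals):

```python
# Sketch of the strong product: pairwise products p1 * p2 of extreme points
# of the two marginal credal sets (all numbers illustrative).

M1 = [{'h': 0.6, 't': 0.4}, {'h': 0.4, 't': 0.6}]   # marginal credal set for X1
M2 = [{0: 0.9, 1: 0.1}, {0: 0.7, 1: 0.3}]           # marginal credal set for X2

def product(p1, p2):
    """Joint mass function p1(x1) * p2(x2) on X1 x X2."""
    return {(x1, x2): p1[x1] * p2[x2] for x1 in p1 for x2 in p2}

joint_vertices = [product(p1, p2) for p1 in M1 for p2 in M2]
print(len(joint_vertices))   # 4 candidate extreme points of M1 ⊠ M2
```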
Lower previsions
Lower and upper previsions
[Figure: a credal set M inside the simplex ΣX for X = {a,b,c}, with lower prevision P(I{c}) = 1/4 and upper prevision P̄(I{c}) = 4/7.]
Equivalent model
Consider the set L(X) = R^X of all real-valued maps on X. We define two real functionals on L(X): for all f : X → R,
P_M(f) = min{Pp(f) : p ∈ M}   (lower prevision/expectation)
P̄_M(f) = max{Pp(f) : p ∈ M}   (upper prevision/expectation).
Observe that P̄_M(f) = −P_M(−f).
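A small numerical check of this conjugacy relation between lower and upper previsions, on an illustrative two-element credal set:

```python
# Verifying upper(f) == -lower(-f) on a toy credal set given by its
# extreme points (all numbers illustrative).

ext_M = [{'a': 0.5, 'b': 0.5}, {'a': 0.8, 'b': 0.2}]

def E(p, f):
    return sum(p[x] * f[x] for x in p)

def lower(f):
    return min(E(p, f) for p in ext_M)

def upper(f):
    return max(E(p, f) for p in ext_M)

f = {'a': 1.0, 'b': -1.0}
neg_f = {x: -v for x, v in f.items()}
print(upper(f), -lower(neg_f))   # equal, by conjugacy
```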
Basic properties of lower previsions
Definition
We call a real functional P on L(X) a lower prevision if it satisfies the following properties: for all f and g in L(X) and all real λ ≥ 0,
1. P(f) ≥ min f [boundedness];
2. P(f + g) ≥ P(f) + P(g) [super-additivity];
3. P(λf) = λP(f) [non-negative homogeneity].
Theorem: A real functional P is a lower prevision if and only if it is the lower envelope of some credal set M.
Conditioning and lower previsions
Suppose we have two variables X1 in X1 and X2 in X2.
Consider for instance:
– a joint lower prevision P1,2 for (X1,X2) defined on L(X1×X2);
– a conditional lower prevision P2(·|x1) for X2 conditional on X1 = x1, defined on L(X2), for all values x1 ∈ X1.
Coherence
These lower previsions P1,2 and P2(·|X1) must satisfy certain (joint) coherence criteria: compare with Bayes's Rule and de Finetti's coherence criteria for precise previsions.
See the web site of SIPTA (www.sipta.org) for pointers to more details.
Independence and lower previsions
Suppose we have two variables X1 in X1 and X2 in X2.
Definition (Epistemic irrelevance)
X1 is epistemically irrelevant to X2 when learning the value of X1 does not change our beliefs about X2:
P1,2(f(X2)) = P2(f(X2)|x1) for all f ∈ L(X2) and all x1 ∈ X1.
Important: epistemic irrelevance is not a symmetrical notion! It is weaker than strong independence.
Epistemic independence (also weaker) is the symmetrised version.
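A toy numerical check of epistemic irrelevance: for a joint credal set built as a strong product of two marginal credal sets (a case in which irrelevance does hold), the conditional lower previsions of a gamble on X2 coincide with the marginal one. All numbers are illustrative:

```python
# Checking P(f(X2)) == P(f(X2)|x1) on the vertices of a product-style joint
# credal set (illustrative marginals; in general the check must hold for
# every gamble f, not just one).

M1 = [{'h': 0.5, 't': 0.5}, {'h': 0.2, 't': 0.8}]
M2 = [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}]
joint = [{(x1, x2): p1[x1] * p2[x2] for x1 in p1 for x2 in p2}
         for p1 in M1 for p2 in M2]

f = {0: 0.0, 1: 10.0}   # a gamble on X2

def marg_lower():
    """Marginal lower prevision of f(X2)."""
    return min(sum(p[(x1, x2)] * f[x2] for (x1, x2) in p) for p in joint)

def cond_lower(x1):
    """Conditional lower prevision of f(X2) given X1 = x1 (Bayes per vertex)."""
    vals = []
    for p in joint:
        norm = sum(q for (y, _), q in p.items() if y == x1)
        vals.append(sum(q * f[x2] for (y, x2), q in p.items() if y == x1) / norm)
    return min(vals)

print(marg_lower(), cond_lower('h'), cond_lower('t'))
```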
Sets of desirable gambles
First steps: Peter Walley (2000)
@ARTICLE{walley2000,
  author  = {Walley, Peter},
  title   = {Towards a unified theory of imprecise probability},
  journal = {International Journal of Approximate Reasoning},
  year    = 2000,
  volume  = 24,
  pages   = {125--148}
}
First steps: Peter Williams (1977)
@ARTICLE{williams2007,
  author  = {Williams, Peter M.},
  title   = {Notes on conditional previsions},
  journal = {International Journal of Approximate Reasoning},
  year    = 2007,
  volume  = 44,
  pages   = {366--383}
}
Set of desirable gambles as a belief model
Gambles: a gamble f : X → R is an uncertain reward whose value is f(X).
Set of desirable gambles: D ⊆ L(X) is a set of gambles that a subject strictly prefers to zero.
Why work with sets of desirable gambles?
Working with sets of desirable gambles D:
– is simpler, more intuitive and more elegant
– is more general and expressive than (conditional) lower previsions
– gives a geometrical flavour to probabilistic inference
– includes classical propositional logic as another special case
– shows that probabilistic inference and Bayes's Rule are 'logical' inference
– includes precise probability as one special case
– avoids problems with conditioning on sets of probability zero
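The geometrical flavour can be sketched in a toy case: with a single assessed gamble g, a gamble f belongs to the natural extension when it is pointwise nonnegative and nonzero, or dominates λg for some λ > 0. This simplified two-outcome check is an illustration of the cone geometry, not the general algorithm:

```python
# Toy desirability check: is f in the natural extension of {g}?
# f qualifies if f >= 0 and f != 0, or if f dominates lam * g pointwise
# for some lam > 0 (single assessed gamble, so a 1-d feasibility check).

def desirable(f, g):
    if all(v >= 0 for v in f.values()) and any(v > 0 for v in f.values()):
        return True                      # f >= 0 and f != 0: desirable by itself
    lo, hi = 0.0, float('inf')
    for x in f:                          # constraints f[x] >= lam * g[x]
        if g[x] > 0:
            hi = min(hi, f[x] / g[x])
        elif g[x] < 0:
            lo = max(lo, f[x] / g[x])
        elif f[x] < 0:
            return False                 # g vanishes where f is negative
    return lo <= hi and hi > 0           # some lam > 0 works

g = {'a': 1.0, 'b': -1.0}                # assessed: the subject accepts g
print(desirable({'a': 2.0, 'b': -1.0}, g))   # True: dominates 1 * g
print(desirable({'a': -1.0, 'b': 2.0}, g))   # False
```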
Most comprehensive approach so far: note on arXiv
Introduction to Imprecise Probabilities
@BOOK{troffaes2012,
  title     = {Introduction to Imprecise Probabilities},
  publisher = {Wiley},
  editor    = {Augustin, Thomas and Coolen, Frank P. A. and De Cooman, Gert and Troffaes, Matthias C. M.},
  note      = {Due end 2012}
}
IMPRECISE-PROBABILISTIC GRAPHICAL MODELS
Credal sets
Credal networks: the special case of a tree
Basic concept: consider a directed tree T, with a variable Xt attached to each node t ∈ T.
[Figure: a directed tree on nodes X1, ..., X11.]
Each variable Xt assumes values in a set Xt.
Credal trees: local uncertainty models
Local uncertainty model associated with each node t
For each possible value xm(t) ∈ Xm(t) of the mother variable Xm(t), we have a local conditional credal set Mt|Xm(t), which is a collection of credal sets
Mt|xm(t) ⊆ ΣXt, one for each xm(t) ∈ Xm(t).
[Figure: the mother node Xm(t) with its children Xs, ..., Xt, ..., Xs′.]
Interpretation of the graphical structure
The graphical structure is interpreted as follows: conditional on the mother variable, the non-parent non-descendants of each node variable are strongly independent of it and its descendants.
[Figure: a directed tree on nodes X1, ..., X11.]
Lower previsions
Credal trees: local uncertainty models
Local uncertainty model associated with each node t
For each possible value xm(t) ∈ Xm(t) of the mother variable Xm(t), we have a conditional lower prevision/expectation
Qt(·|xm(t)) : L(Xt) → R,
where Qt(f|xm(t)) is the lower prevision of f(Xt), given that Xm(t) = xm(t).
The local model Qt(·|Xm(t)) is a conditional lower prevision operator.
[Figure: the mother node Xm(t) with its children Xs, ..., Xt, ..., Xs′.]
Interpretation of the graphical structure
The graphical structure is interpreted as follows: conditional on the mother variable, the non-parent non-descendants of each node variable are epistemically irrelevant to it and its descendants.
[Figure: a directed tree on nodes X1, ..., X11.]
@ARTICLE{cooman2010,
  author  = {{d}e Cooman, Gert and Hermans, Filip and Antonucci, Alessandro and Zaffalon, Marco},
  title   = {Epistemic irrelevance in credal nets: the case of imprecise {M}arkov trees},
  journal = {International Journal of Approximate Reasoning},
  year    = 2010,
  volume  = 51,
  pages   = {1029--1052},
  doi     = {10.1016/j.ijar.2010.08.011}
}
MePICTIr for updating a credal tree
For a credal tree we can find the joint model from the local models recursively, from leaves to root.
Exact message passing algorithm
– credal tree treated as an expert system
– linear complexity in the number of nodes
Python code
– written by Filip Hermans
– testing and connection with strong independence in cooperation with Marco Zaffalon and Alessandro Antonucci
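The leaves-to-root recursion can be sketched on the simplest tree, a chain (an imprecise Markov chain): the lower prevision of a gamble on the final state is obtained by propagating lower expectations backward through the local models. All local models below are illustrative, and this sketch only handles a chain, not a general tree:

```python
# Backward recursion on an imprecise Markov chain of length n: propagate
# g_k(x) = Q_{k+1}(g_{k+1} | x) from the last node back to the root.
# Local credal sets are given by their extreme points (illustrative numbers).

states = ['r', 's']

def lower_exp(credal, g):
    """Lower expectation of g under a credal set given by extreme points."""
    return min(sum(p[x] * g[x] for x in p) for p in credal)

# transition model: credal set for X_{k+1} given X_k = x
Q = {
    'r': [{'r': 0.7, 's': 0.3}, {'r': 0.5, 's': 0.5}],
    's': [{'r': 0.2, 's': 0.8}, {'r': 0.4, 's': 0.6}],
}
Q1 = [{'r': 0.5, 's': 0.5}]   # (precise) marginal model for X_1

f = {'r': 1.0, 's': 0.0}      # gamble on the final state X_n
n = 3

g = f
for _ in range(n - 1):        # backward steps, X_n down to X_1
    g = {x: lower_exp(Q[x], g) for x in states}
print(lower_exp(Q1, g))       # lower prevision of f(X_n)
```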
Current (toy) applications in HMMs: character recognition, air traffic trajectory tracking and identification, earthquake rate prediction
@INPROCEEDINGS{cooman2011,
  author    = {De Bock, Jasper and {d}e Cooman, Gert},
  title     = {Imprecise probability trees: Bridging two theories of imprecise probability},
  booktitle = {ISIPTA '09 -- Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications},
  year      = 2009,
  editor    = {Coolen, Frank P. A. and {d}e Cooman, Gert and Fetz, Thomas and Oberguggenberger, Michael},
  address   = {Innsbruck, Austria},
  publisher = {SIPTA}
}
An HMM is a special credal tree
[Figure: an HMM as a credal tree. State sequence: X1, X2, ..., Xk, ..., Xn, with local models Q1(·), Q2(·|X1), ..., Qk(·|Xk−1), ..., Qn(·|Xn−1). Output sequence: O1, O2, ..., Ok, ..., On, with local models S1(·|X1), S2(·|X2), ..., Sk(·|Xk), ..., Sn(·|Xn).]
Maximal state sequences
Classically (Viterbi): find the state sequence x̂1:n that maximises the posterior probability p(x1:n|o1:n) corresponding to a given observation sequence o1:n.
Maximality (under robust ordering): define a partial order > on state sequences:
x̂1:n > x1:n iff p(x̂1:n|o1:n) > p(x1:n|o1:n) for all compatible p(·|o1:n).
Find the state sequences x̂1:n that are maximal: undominated by any other state sequence.
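A toy sketch of this maximality criterion, with the set of compatible posteriors represented by finitely many mass functions over three candidate sequences (all numbers illustrative; the actual algorithm avoids enumerating all sequences):

```python
# Maximality on a toy posterior credal set: sequence x dominates y iff
# p(x|o) > p(y|o) for every compatible posterior p; maximal sequences are
# those dominated by no other (illustrative numbers, brute-force check).

posteriors = [   # compatible posterior mass functions over 3 sequences
    {'aa': 0.5, 'ab': 0.3, 'bb': 0.2},
    {'aa': 0.3, 'ab': 0.45, 'bb': 0.25},
]

def dominates(x, y):
    return all(p[x] > p[y] for p in posteriors)

seqs = list(posteriors[0])
maximal = [x for x in seqs if not any(dominates(y, x) for y in seqs)]
print(maximal)   # 'bb' is dominated; 'aa' and 'ab' are incomparable
```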
ESTIHMM for finding all maximal state sequences
Exact backward-forward algorithm
– developed by Jasper De Bock
– finds all maximal state sequences that correspond to a given observation sequence
– quadratic complexity in the number of nodes [linear]
– cubic complexity in the number of states [quadratic]
– linear complexity in the number of maximal sequences [linear]
Python code
– written by Jasper De Bock
Current (toy) applications in HMMs: character recognition, finding gene islands
Sets of desirable gambles
@ARTICLE{moral2005,
  author  = {Moral, Serafín},
  title   = {Epistemic irrelevance on sets of desirable gambles},
  journal = {Annals of Mathematics and Artificial Intelligence},
  year    = 2005,
  volume  = 45,
  pages   = {197--214},
  doi     = {10.1007/s10472-005-9011-0}
}
Most comprehensive approach so far: note on arXiv