Hidden Markov Chain and Bayes Belief Networks (Doctoral Consortium)


Graphical Models of Probability

Graphical models use directed or undirected graphs over a set of random variables to explicitly specify variable dependencies and allow for less restrictive independence assumptions while limiting the number of parameters that must be estimated.

Bayesian Networks: Directed acyclic graphs that indicate causal structure.

Markov Networks: Undirected graphs that capture general dependencies.


Hidden Markov Model

Zhejiang Univ

CCNT

Yueshen Xu


Overview

Markov Chain
HMM
Three Core Problems and Algorithms
Application


Markov Chain

Instance

We can regard the weather as three states: state1 : Rain

state2 : Cloudy

state3 : Sun

                 Tomorrow
Today      Rain    Cloudy   Sun
Rain       0.4     0.3      0.3
Cloudy     0.2     0.6      0.2
Sun        0.1     0.1      0.8

We can obtain this transition matrix through long-term observation.
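As a quick illustration of how the matrix is used, here is a minimal sketch that pushes today's weather distribution one and two days forward (NumPy and the Rain/Cloudy/Sun ordering are my choices, not part of the slides):

```python
import numpy as np

# Transition matrix P[i][j] = P(tomorrow = j | today = i),
# rows and columns ordered as (Rain, Cloudy, Sun).
P = np.array([
    [0.4, 0.3, 0.3],   # today: Rain
    [0.2, 0.6, 0.2],   # today: Cloudy
    [0.1, 0.1, 0.8],   # today: Sun
])

today = np.array([0.0, 0.0, 1.0])              # it is sunny today
print(today @ P)                               # tomorrow: [0.1 0.1 0.8]
print(today @ np.linalg.matrix_power(P, 2))    # in two days: [0.14 0.17 0.69]
```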


Definition

One-step transition probability: $p_{ij} = P(X_{n+1} = s_j \mid X_n = s_i)$.

That is to say, the evolution of the stochastic process depends only on the current state and not on the states before it. We call this the Markov property, and such a process is called a Markov process.

State space: $S = \{s_1, s_2, \ldots, s_N\}$

Observation sequence: $O = o_1, o_2, \ldots, o_T$


Keystone

State transition matrix: $A = [a_{ij}]_{N \times N}$, where $a_{ij} = P(X_{n+1} = s_j \mid X_n = s_i)$, with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$.

Initial state probability vector: $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i = P(X_1 = s_i)$.

HMM

An HMM is a doubly stochastic process consisting of two parallel parts:

Markov chain: describes the transitions between states, which are unobservable, by means of the transition probability matrix.

Common stochastic process: describes the stochastic process of the observable events.

Markov chain $(\pi, A)$ → state sequence $q_1, q_2, \ldots, q_T$
Stochastic process $(B)$ → observation sequence $o_1, o_2, \ldots, o_T$

Core feature: the state sequence is unobservable, while the observation sequence is observable.

[State diagram: three states S1, S2, S3, with arcs S1→S1, S1→S2, S1→S3, S2→S2, S2→S3. Each arc carries a transition probability and output probabilities for the symbols a and b: a11 = 0.3 (a: 0.8, b: 0.2), a12 = 0.5 (a: 1, b: 0), a13 = 0.2 (a: 0, b: 1), a22 = 0.4 (a: 0.3, b: 0.7), a23 = 0.6 (a: 0.5, b: 0.5).]

Example: what is the probability that this stochastic process produces the sequence "aab"?


Instance 1: S1→S1→S2→S3: 0.3×0.8 × 0.5×1.0 × 0.6×0.5 = 0.036

Instance 2: S1→S2→S2→S3: 0.5×1.0 × 0.4×0.3 × 0.6×0.5 = 0.018

Instance 3: S1→S1→S1→S3: 0.3×0.8 × 0.3×0.8 × 0.2×1.0 = 0.01152

Therefore, the total probability is 0.036 + 0.018 + 0.01152 = 0.06552.

We only know "aab"; we don't know "S?S?S?". That's the point.

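A minimal brute-force sketch of this enumeration (the dictionaries transcribe the diagram above; the function and variable names are mine):

```python
from itertools import product

# Arc probabilities from the diagram: (from_state, to_state) -> transition prob.
A = {(1, 1): 0.3, (1, 2): 0.5, (1, 3): 0.2, (2, 2): 0.4, (2, 3): 0.6}
# Output probabilities attached to each arc: (from, to) -> {symbol: prob}.
B = {(1, 1): {'a': 0.8, 'b': 0.2}, (1, 2): {'a': 1.0, 'b': 0.0},
     (1, 3): {'a': 0.0, 'b': 1.0}, (2, 2): {'a': 0.3, 'b': 0.7},
     (2, 3): {'a': 0.5, 'b': 0.5}}

def prob_of(obs, path):
    """P(obs, path): product of transition x output probability per arc."""
    p = 1.0
    for (i, j), o in zip(zip(path, path[1:]), obs):
        p *= A.get((i, j), 0.0) * B.get((i, j), {}).get(o, 0.0)
    return p

obs = "aab"
total = 0.0
for middle in product([1, 2, 3], repeat=len(obs) - 1):
    path = (1, *middle, 3)              # start in S1, end in S3
    p = prob_of(obs, path)
    if p > 0:
        print(path, p)                  # the three nonzero paths above
    total += p
print("P(aab) =", total)                # 0.036 + 0.018 + 0.01152 = 0.06552
```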

Description

An HMM can be identified by the parameters below:

N: the number of states

M: the number of observable events for each state

A: the state transition matrix

B: the observable event probability matrix

π: the initial state probability distribution

We generally record the model as $\lambda = (A, B, \pi)$.

Three Core Problems

Evaluation: given the observation sequence $O = o_1 o_2 \cdots o_T$ and the model $\lambda = (A, B, \pi)$, how can we calculate $P(O \mid \lambda)$?

Optimization: based on question 1, how do we choose a state sequence $S = q_1 q_2 \cdots q_T$ so that the observation sequence O is explained most reasonably?

Training: based on question 1, how do we adjust the parameters of the model $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

We know O, but we don't know Q.

Solution

There is no need to expound those algorithms here, since our focus is the application context.

Evaluation: dynamic programming; the Forward and Backward algorithms
Optimization: greedy search; the Viterbi algorithm
Training: iterative re-estimation; Baum-Welch and Maximum Likelihood Estimation

You can think these methods over and derive them after the workshop.

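For a taste of the evaluation solution, here is a minimal sketch of the Forward algorithm for a state-emission HMM (a standard textbook formulation, not tied to the arc-output example above; the numbers in the usage example are invented). Viterbi has the same shape with the sum replaced by a max plus back-pointers, and Baum-Welch iterates forward/backward passes to re-estimate A and B.

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(O | lambda) via the Forward algorithm.
    pi: (N,) initial probs; A: (N, N) transition probs;
    B: (N, M) emission probs; obs: list of symbol indices."""
    alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction over t = 2..T
    return alpha.sum()                   # termination: sum_i alpha_T(i)

# Usage: three hidden states, two observable symbols (made-up numbers).
pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
B  = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 1]))
```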

Application Context

Just think it over: given the features of an HMM, which kinds of problems can it describe and model?

Two stochastic sequences, where one depends on the other (or the two are related); one can be "seen", but the other cannot. Just think about the three core problems…

I think we can draw a conclusion: use one sequence to deduce and predict the other, or "find out who is behind".

The "Iceberg" Problem

Application Context (1): Voice Recognition

Statistical description:

I. The characteristic pattern of the voice, obtained by sampling: $T = t_1, t_2, \ldots, t_n$

II. The word sequence $W(n)$: $W_1, W_2, \ldots, W_n$

III. Therefore, what we care about is $P(W(n) \mid T)$.

Formal description: what we have to solve is

$$k = \arg\max_n \{\, P(W(n) \mid T) \,\}$$
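One minimal way to realize this arg max, reusing the forward function sketched earlier, is to give each candidate word sequence its own HMM and score it. Everything named here (the candidates dict, the prior weighting) is my illustration, not the slides' prescription:

```python
# Reuses forward() from the evaluation sketch above.
def recognize(feature_seq, candidates):
    """Return the candidate word sequence whose model best explains T.
    candidates: dict mapping word sequence -> (pi, A, B, prior)."""
    def score(pi, A, B, prior):
        # P(W | T) is proportional to P(T | W) * P(W) by Bayes' rule.
        return forward(pi, A, B, feature_seq) * prior
    return max(candidates, key=lambda w: score(*candidates[w]))
```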

Application Context (1): Voice Recognition

[Recognition framework: a speech database feeds feature extraction (waveform → features); the features drive Baum-Welch re-estimation of the HMMs; if not converged, re-estimate again; once converged, end.]

Application Context (2): Text Information Extraction

Figure out the HMM model $\lambda = (A, B, \pi)$:
Q1: What are the states and what are the observation events?
Q2: How do we figure out the parameters, such as $a_{ij}$?

States: what you want to extract. Observation events: text blocks or individual words, etc.

The parameters come from training samples, as sketched below.
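A minimal sketch of estimating the transition parameters $a_{ij}$ from labeled samples by counting (plain maximum likelihood; the toy annotations reuse state names from the next slide and are invented for illustration):

```python
from collections import Counter

def estimate_transitions(labeled_sequences):
    """MLE of a_ij: count state bigrams, then normalize each row."""
    pair_counts, state_counts = Counter(), Counter()
    for states in labeled_sequences:
        for s, t in zip(states, states[1:]):
            pair_counts[(s, t)] += 1
            state_counts[s] += 1
    return {(s, t): c / state_counts[s] for (s, t), c in pair_counts.items()}

# Toy training set: hand-labeled state sequences for two documents.
samples = [["title", "author", "author", "email"],
           ["title", "author", "abstract"]]
print(estimate_transitions(samples))
# {('title', 'author'): 1.0, ('author', 'author'): 0.333...,
#  ('author', 'email'): 0.333..., ('author', 'abstract'): 0.333...}
```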

Application Context (2): Text Information Extraction

[Extraction framework: training samples → document partitioning → HMM → state list → extracted sequence. Example state sets: country, state, city, street; or title, author, email, abstract.]

Application Context (3): Other Fields

Face Recognition, POS tagging, Web Data Extraction, Bioinformatics, Network intrusion detection, Handwriting recognition, Document Categorization, Multiple Sequence Alignment, …

Which field are you interested in?

Bayes Belief Network

Yueshen Xu, too


Overview

Bayes Theorem
Naïve Bayes Theorem
Bayes Belief Network
Application


Bayes Theorem

Basic Bayes formula: the most basic of basics, but vital.

$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}, \quad i = 1, 2, \ldots, n$$

Here $P(B_i)$ is the prior probability, $P(B_i \mid A)$ is the posterior probability, and the denominator is the complete (total) probability formula. The essence is condition inversion.
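A minimal numeric sketch of the formula (the three hypotheses and all numbers are invented for illustration):

```python
def posterior(priors, likelihoods):
    """P(B_i | A) for each i: Bayes' formula with the total probability
    formula in the denominator."""
    total = sum(p * l for p, l in zip(priors, likelihoods))    # P(A)
    return [p * l / total for p, l in zip(priors, likelihoods)]

# Hypotheses B_1..B_3 with priors P(B_i) and likelihoods P(A | B_i).
print(posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
# [0.1515..., 0.3636..., 0.4848...]
```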

Naïve Bayes Theorem

The naïve Bayes theorem is a simple probabilistic theorem based on applying Bayes' theorem with strong independence assumptions.

Chain rule:

$$P(C, F_1, F_2, \ldots, F_n) = P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1) \cdots P(F_n \mid C, F_1, \ldots, F_{n-1})$$

Conditional independence: $P(F_i \mid C, F_j) = P(F_i \mid C)$, so

$$P(C, F_1, F_2, \ldots, F_n) = P(C) \prod_{i=1}^{n} P(F_i \mid C)$$

[Diagram: class node C with an arrow to each feature node F1, F2, …, Fn.]

Naïve Bayes is a simple Bayes net.

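A minimal sketch of using that factorization to classify (binary features; every probability below is invented for illustration):

```python
def nb_score(class_prior, feature_probs, features):
    """P(C) * prod_i P(F_i | C) for one class, binary features."""
    p = class_prior
    for prob_one, f in zip(feature_probs, features):
        p *= prob_one if f else (1.0 - prob_one)
    return p

# Two classes: P(C) and per-feature P(F_i = 1 | C) for n = 3 features.
classes = {"c0": (0.6, [0.2, 0.7, 0.5]), "c1": (0.4, [0.8, 0.3, 0.9])}
x = [1, 0, 1]                                  # observed feature vector
scores = {c: nb_score(p, fp, x) for c, (p, fp) in classes.items()}
print(max(scores, key=scores.get), scores)     # picks "c1" here
```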

Bayes Belief Network: Graph Structure

Directed acyclic graph (DAG)
Nodes are random variables
Edges indicate causal influences

[Example network: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

Bayes Belief Network: Conditional Probability Table

Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents.

Roots (sources) of the DAG that have no parents are given prior probabilities.

[The same Burglary/Earthquake/Alarm/JohnCalls/MaryCalls network, annotated with its CPTs:]

P(B) = .001        P(E) = .002

B  E  | P(A)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A | P(J)           A | P(M)
T | .90            T | .70
F | .05            F | .01

Bayes Belief Network: Joint Distributions

A Bayesian network implicitly defines a joint distribution:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i))$$

Example (using the conditional independence encoded by the graph):

$$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B \wedge \neg E)\,P(\neg B)\,P(\neg E)$$
$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00062$$

Therefore an inefficient approach to inference is:
1) Compute the joint distribution using this equation.
2) Compute any desired conditional probability using the joint distribution.
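A minimal sketch that computes this example directly from the CPTs above (the Python representation is mine):

```python
# CPTs of the burglary network, transcribed from the tables above.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}     # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                        # P(J | A)
P_M = {True: 0.70, False: 0.01}                        # P(M | A)

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) as a product of P(node | parents)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pj * pm * pa * pb * pe

print(joint(True, True, True, False, False))   # ~0.00062
```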

Conditional Independence & D-separation

D-separation: let X, Y, and Z be three sets of nodes. If X and Y are d-separated by Z, then X and Y are conditionally independent given Z.

A is d-separated from B given C if every undirected path between them is blocked.

Path blocking: three cases that expand on the three basic independence structures.

Application: Simple Document Classification (1)

Step 1: Assume for the moment that there are only two mutually exclusive classes, S and ¬S (e.g., spam and not spam), such that every element (email) is in either one or the other. Treating a document D as a set of words $w_i$ and applying the naïve independence assumption:

$$P(D \mid S) = \prod_i P(w_i \mid S) \quad \text{and} \quad P(D \mid \neg S) = \prod_i P(w_i \mid \neg S)$$

Step 2: What we care about is:

$$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D)} \quad \text{and} \quad P(\neg S \mid D) = \frac{P(D \mid \neg S)\,P(\neg S)}{P(D)}$$

Application: Simple Document Classification (2)

Step 3: Dividing one by the other gives a ratio, which can then be re-factored:

$$\frac{P(S \mid D)}{P(\neg S \mid D)} = \frac{P(S)\,P(D \mid S)}{P(\neg S)\,P(D \mid \neg S)} = \frac{P(S)\prod_i P(w_i \mid S)}{P(\neg S)\prod_i P(w_i \mid \neg S)}$$

Step 4: Taking the logarithm of all these ratios to reduce the amount of computation:

$$\ln\frac{P(S \mid D)}{P(\neg S \mid D)} = \ln\frac{P(S)}{P(\neg S)} + \sum_i \ln\frac{P(w_i \mid S)}{P(w_i \mid \neg S)}$$

The document is classified as spam when this log ratio is > 0, and as non-spam when it is < 0. The word-level probabilities are estimated from known samples through training.
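A minimal sketch of this decision rule (the vocabulary probabilities, priors, and example document are invented for illustration):

```python
from math import log

def spam_log_odds(doc_words, prior_s, prior_not_s, p_w_s, p_w_not_s):
    """ln P(S|D)/P(not-S|D) = ln prior ratio + sum of word log ratios."""
    score = log(prior_s / prior_not_s)
    for w in doc_words:
        if w in p_w_s and w in p_w_not_s:      # skip out-of-vocabulary words
            score += log(p_w_s[w] / p_w_not_s[w])
    return score                               # > 0: spam; < 0: not spam

p_w_s     = {"free": 0.30, "money": 0.20, "meeting": 0.01}   # P(w | S)
p_w_not_s = {"free": 0.02, "money": 0.05, "meeting": 0.10}   # P(w | not-S)
print(spam_log_odds(["free", "money"], 0.4, 0.6, p_w_s, p_w_not_s))  # > 0
```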

Application: Overall

Medical diagnosis: the Pathfinder system outperforms leading experts in the diagnosis of lymph-node disease.

Microsoft applications: problem diagnosis (printer problems); recognizing user intents for HCI.

Text categorization and spam filtering. Student modeling for intelligent tutoring systems. Biochemical data analysis: predicting mutagenicity.

So many…

Which field are you interested in?