Factor Graphs and Message Passing Algorithms
— Part 1: Introduction

Hans-Andrea Loeliger

December 2007
The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.

Note that
$$\operatorname*{argmax}_{x_1,\dots,x_n} f(x_1,\dots,x_n) = \bigl(\operatorname*{argmax}_{x_1} f_1(x_1),\ \dots,\ \operatorname*{argmax}_{x_n} f_n(x_n)\bigr).$$

For large $n$, both problems are in general intractable (even for $x_1,\dots,x_n \in \{0,1\}$).
Factorization Helps

For example, if $f(x_1,\dots,x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$, then
$$f_k(x_k) = \Bigl(\sum_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\sum_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\sum_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\sum_{x_n} f_n(x_n)\Bigr)$$
and
$$f_k(x_k) = \Bigl(\max_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\max_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\max_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\max_{x_n} f_n(x_n)\Bigr).$$

Factorization also helps beyond this trivial example.
→ Factor graphs and message passing algorithms.
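To make the cost difference concrete, here is a minimal Python sketch (not from the talk; all names and values are illustrative) that computes the same marginal both ways for binary variables:

```python
import itertools
import math

n = 12
factors = [[0.4 + 0.01 * k, 0.6 - 0.01 * k] for k in range(n)]   # f_k over {0, 1}

def brute_force_marginal(k, xk):
    """Sum f over all 2^n configurations with x_k fixed: exponential in n."""
    return sum(
        math.prod(f[x] for f, x in zip(factors, config))
        for config in itertools.product([0, 1], repeat=n)
        if config[k] == xk
    )

def factored_marginal(k, xk):
    """The same quantity via the distributive law: linear in n."""
    result = factors[k][xk]
    for j, f in enumerate(factors):
        if j != k:
            result *= sum(f)   # sum_{x_j} f_j(x_j)
    return result

assert abs(brute_force_marginal(3, 1) - factored_marginal(3, 1)) < 1e-9
```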
Roots

Statistical physics:
- Markov random fields (Ising 1925?)

Signal processing:
- linear state-space models and Kalman filtering: Kalman 1960...
- recursive least-squares adaptive filters
- hidden Markov models and forward-backward algorithm: Baum et al. 1966...

Error correcting codes:
- low-density parity check codes: Gallager 1962; Tanner 1981; MacKay 1996; Luby et al. 1998...
- convolutional codes and Viterbi decoding: Forney 1973...
- turbo codes: Berrou et al. 1993...

Machine learning, statistics:
- Bayesian networks: Pearl 1988; Shachter 1988; Lauritzen and Spiegelhalter 1988; Shafer and Shenoy 1990...
Outline of this talk

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Factor Graphs

A factor graph represents the factorization of a function of several variables. We use Forney-style factor graphs (Forney, 2001).

Example:
$$f(x_1, x_2, x_3, x_4, x_5) = f_A(x_1, x_2, x_3) \cdot f_B(x_3, x_4, x_5) \cdot f_C(x_4).$$

[Figure: factor graph with nodes $f_A$, $f_B$, $f_C$; edge $x_3$ joins $f_A$ and $f_B$, edge $x_4$ joins $f_B$ and $f_C$; half-edges $x_1$, $x_2$ (at $f_A$) and $x_5$ (at $f_B$).]
Rules:
• A node for every factor.
• An edge or half-edge for every variable.
• Node g is connected to edge x iff variable x appears in factor g.
(What if some variable appears in more than 2 factors?)
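As an aside, the bookkeeping behind these rules is small. A minimal sketch of the example graph above (not from the talk; the data layout is an illustrative choice):

```python
# Each factor node names the variables (edges) it is attached to.
factors = {
    "fA": ("x1", "x2", "x3"),
    "fB": ("x3", "x4", "x5"),
    "fC": ("x4",),
}

# For each variable, collect the factor nodes it is attached to.
edges = {}
for name, variables in factors.items():
    for v in variables:
        edges.setdefault(v, []).append(name)

for v, nodes in sorted(edges.items()):
    kind = "half-edge" if len(nodes) == 1 else "edge"
    print(f"{v}: {kind}, attached to {nodes}")
# x3 and x4 are ordinary edges; x1, x2, x5 are half-edges. A variable attached
# to more than two factors would need an equality constraint node (see below).
```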
A main application of factor graphs is stochastic models. Example:

Markov Chain

$$p_{XYZ}(x, y, z) = p_X(x)\, p_{Y|X}(y|x)\, p_{Z|Y}(z|y).$$

[Figure: chain factor graph with nodes $p_X$, $p_{Y|X}$, $p_{Z|Y}$ and edges $X$, $Y$, $Z$.]
Other Notation Systems for Graphical Models

Example: $p(u, w, x, y, z) = p(u)\,p(w)\,p(x|u,w)\,p(y|x)\,p(z|x)$.

[Four figures of the same model: a Forney-style factor graph; the original factor graph [FKLW 1997]; a Bayesian network; a Markov random field.]
Terminology

[Figure: the factor graph with nodes $f_A$, $f_B$, $f_C$ and variables $x_1, \dots, x_5$ from above.]

Local function = factor (such as $f_A$, $f_B$, $f_C$).

Global function $f$ = product of all local functions; usually (but not always!) real and nonnegative.

A configuration is an assignment of values to all variables.

The configuration space is the set of all configurations, which is the domain of the global function.

A configuration $\omega = (x_1, \dots, x_5)$ is valid iff $f(\omega) \neq 0$.
Invalid Configurations Do Not Affect Marginals

A configuration $\omega = (x_1, \dots, x_n)$ is valid iff $f(\omega) \neq 0$.

Recall:

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.
Auxiliary Variables

Example: Let $Y_1$ and $Y_2$ be two independent observations of $X$:
$$p(x, y_1, y_2) = p(x)\,p(y_1|x)\,p(y_2|x).$$

[Figure: factor graph with node $p_X$, an equality constraint node "=" splitting $X$ into $X'$ and $X''$, and nodes $p_{Y_1|X}$, $p_{Y_2|X}$ with half-edges $Y_1$, $Y_2$.]

Literally, the factor graph represents an extended model
$$p(x, x', x'', y_1, y_2) = p(x)\,p(y_1|x')\,p(y_2|x'')\,f_{=}(x, x', x'')$$
where
$$f_{=}(x, x', x'') \triangleq \delta(x - x')\,\delta(x - x'')$$
enforces $X = X' = X''$ for every valid configuration.
Branching Points

Equality constraint nodes may be viewed as branching points:

[Figure: an "=" node with edges $X$, $X'$, $X''$ drawn as a branching point of a single variable $X$.]

The factor
$$f_{=}(x, x', x'') \triangleq \delta(x - x')\,\delta(x - x'')$$
enforces $X = X' = X''$ for every valid configuration.
Modularity, Special Symbols, Arrows

As a refinement of the previous example, let
$$Y_1 = X + Z_1 \tag{1}$$
$$Y_2 = X + Z_2 \tag{2}$$
with $Z_1$ and $Z_2$ independent of each other and of $X$:

[Figure: factor graph with node $p_X$, an "=" node on $X$, and "+" nodes combining $X$ with $Z_1$ (from node $p_{Z_1}$) and $Z_2$ (from node $p_{Z_2}$) to form $Y_1$ and $Y_2$; the dashed boxes around $p_{Z_1}$/"+" and $p_{Z_2}$/"+" correspond to the earlier nodes $p_{Y_1|X}$ and $p_{Y_2|X}$.]

The "+" nodes represent the factors $\delta(x + z_1 - y_1)$ and $\delta(x + z_2 - y_2)$, which enforce (1) and (2) for every valid configuration.
Known Variables vs. Unknown Variables

Known variables (observations, known parameters, ...) may be plugged into the corresponding factors.

Example: $Y_2 = y_2$ observed.

[Figure: the factor graph above with the node $p_{Y_2|X}(\cdot|\cdot)$ replaced by the single-argument factor $p_{Y_2|X}(y_2|\cdot)$; the half-edge $Y_1$ remains.]

Known variables will be denoted by small letters; unknown variables will usually be denoted by capital letters.
From A Priori to A Posteriori Probability

Example (cont'd): Let $Y_1 = y_1$ and $Y_2 = y_2$ be two independent observations of $X$. For fixed $y_1$ and $y_2$, we have
$$p(x|y_1, y_2) = \frac{p(x, y_1, y_2)}{p(y_1, y_2)} \propto p(x, y_1, y_2).$$

[Figure: factor graph with node $p_X$, an "=" node, and the plugged-in factors $p_{Y_1|X}(y_1|\cdot)$ and $p_{Y_2|X}(y_2|\cdot)$.]

The factorization is unchanged (except for a scale factor).
Example:

Hidden Markov Model

$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1})\,p(y_k|x_{k-1})$$

[Figure: chain factor graph with nodes $p(x_0)$, $p(x_1|x_0)$, $p(x_2|x_1)$, ..., equality nodes on the states $X_0, X_1, X_2, \dots$, and output branches $p(y_1|x_0)$, $p(y_2|x_1)$, ... with half-edges $Y_1, Y_2, \dots$]
Example:

Hidden Markov Model with Parameter(s)

$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n \,|\, \theta) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1}, \theta)\,p(y_k|x_{k-1})$$

[Figure: the same chain factor graph, with the transition nodes $p(x_k|x_{k-1}, \theta)$ additionally connected to a parameter variable $\Theta$ shared via a chain of equality nodes.]
A non-stochastic example:

Least-Squares Problems

Minimizing $\sum_{k=1}^{n} x_k^2$ subject to (linear or nonlinear) constraints is equivalent to maximizing
$$e^{-\sum_{k=1}^{n} x_k^2} = \prod_{k=1}^{n} e^{-x_k^2}$$
subject to the given constraints.

[Figure: a "constraints" box with edges $X_1, \dots, X_n$, each attached to a node $e^{-x_k^2}$.]

Here, the factor graph represents a nonnegative real-valued function that we wish to maximize.
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Recall:

The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.
Message Passing Algorithms

operate by passing messages along the edges of a factor graph:

[Figure: a factor graph with a pair of oppositely directed message arrows on every edge.]
Towards the sum-product algorithm:

Computing Marginals—A Generic Example

Assume we wish to compute
$$f_3(x_3) = \sum_{\substack{x_1,\dots,x_7 \\ \text{except } x_3}} f(x_1,\dots,x_7)$$
and assume that $f$ can be factored as follows:

[Figure: tree-structured factor graph with nodes $f_1, \dots, f_7$ and edges $X_1, \dots, X_7$, corresponding to $f = f_1(x_1)\,f_2(x_2)\,f_3(x_1,x_2,x_3)\,f_4(x_4)\,f_5(x_3,x_4,x_5)\,f_6(x_5,x_6,x_7)\,f_7(x_7)$.]
Example cont'd:

Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes drawn around the subtrees on either side of the edge $X_3$.]

$$f_3(x_3) = \Bigl(\sum_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr) \cdot \Bigl(\sum_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \Bigl(\sum_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)\Bigr)$$
Example cont'd: Message Passing View

[Figure: the same factor graph with message arrows meeting at the edge $X_3$.]

$$f_3(x_3) = \underbrace{\Bigl(\sum_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl(\sum_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \underbrace{\Bigl(\sum_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)}\Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$
Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message arrow on every edge.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \sum_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\,\overrightarrow{\mu}_{X_2}(x_2)\,f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \sum_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\,f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \sum_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\,\overleftarrow{\mu}_{X_5}(x_5)\,f_5(x_3, x_4, x_5)$$
The Sum-Product Algorithm (Belief Propagation)

[Figure: a generic node $g$ with incoming edges $Y_1, \dots, Y_n$ and outgoing edge $X$.]

Sum-product message computation rule:
$$\overrightarrow{\mu}_X(x) = \sum_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\,\overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Sum-product theorem:

If the factor graph for some global function $f$ has no cycles, then
$$f_X(x) = \overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x).$$
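For finite alphabets the rule translates directly into code. A minimal sketch (not from the talk; `sum_product_message` and the data layout are illustrative):

```python
import itertools

def sum_product_message(g, in_messages, alphabets, out_alphabet):
    """mu_X(x) = sum over y1..yn of g(x, y1,...,yn) * mu_Y1(y1)*...*mu_Yn(yn).

    g: function of (x, y1, ..., yn); in_messages[i]: dict mapping yi to
    mu_Yi(yi); alphabets[i]: alphabet of Yi."""
    mu_X = {}
    for x in out_alphabet:
        total = 0.0
        for ys in itertools.product(*alphabets):
            term = g(x, *ys)
            for mu, yi in zip(in_messages, ys):
                term *= mu[yi]
            total += term
        mu_X[x] = total
    return mu_X

# Example: message through the equality node f_=(x, x', x''):
eq = lambda x, x1, x2: 1.0 if x == x1 == x2 else 0.0
print(sum_product_message(eq,
                          [{0: 0.3, 1: 0.7}, {0: 0.6, 1: 0.4}],
                          [(0, 1), (0, 1)], (0, 1)))
# {0: 0.18, 1: 0.28} -- the componentwise product, up to scale
```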
Arrows and Notation for Messages

[Figure: an edge $X$ drawn with an arrow, with messages in both directions.]

For edges drawn with arrows:
$\overrightarrow{\mu}_X$ denotes the message in the direction of the arrow.
$\overleftarrow{\mu}_X$ denotes the message in the opposite direction.

Edges may be drawn with arrows just for the sake of this notation.
Sum-Product Algorithm: a Simple Example

[Figure: the two-observation model from above with $Y_1 = y_1$ and $Y_2 = y_2$ plugged in: node $p_X$, an "=" node splitting $X$ into $X'$ and $X''$, and "+" nodes combining $X'$ with $Z_1$ (from $p_{Z_1}$) and $X''$ with $Z_2$ (from $p_{Z_2}$).]

$$\overrightarrow{\mu}_X(x) = p_X(x), \qquad \overrightarrow{\mu}_{Z_1}(z_1) = p_{Z_1}(z_1), \qquad \overrightarrow{\mu}_{Z_2}(z_2) = p_{Z_2}(z_2)$$
Sum-Product Example cont'd

[Figure: the same factor graph.]

$$\overleftarrow{\mu}_{X'}(x') = \int_{z_1} \overrightarrow{\mu}_{Z_1}(z_1)\,\delta(x' + z_1 - y_1)\,dz_1 = p_{Z_1}(y_1 - x')$$

$$\overleftarrow{\mu}_X(x) = \int_{x'} \int_{x''} \overleftarrow{\mu}_{X'}(x')\,\overleftarrow{\mu}_{X''}(x'')\,\delta(x - x')\,\delta(x - x'')\,dx'\,dx'' = p_{Z_1}(y_1 - x)\,p_{Z_2}(y_2 - x)$$
Sum-Product Example cont'd

[Figure: the same factor graph.]

Marginal of the global function at $X$:
$$\overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x) = p_X(x)\,\underbrace{p_{Z_1}(y_1 - x)\,p_{Z_2}(y_2 - x)}_{p(y_1, y_2 | x)} \propto p(x | y_1, y_2).$$
Messages for Finite-Alphabet Variables

may be represented by a list of function values.

Assume, for example, that $X$ takes values in $\{+1, -1\}$:

[Figure: the same factor graph.]

$$\overrightarrow{\mu}_X = \bigl(\overrightarrow{\mu}_X(+1),\ \overrightarrow{\mu}_X(-1)\bigr) = \bigl(p_X(+1),\ p_X(-1)\bigr)$$
$$\overleftarrow{\mu}_{X'} = \bigl(\overleftarrow{\mu}_{X'}(+1),\ \overleftarrow{\mu}_{X'}(-1)\bigr) = \bigl(p_{Z_1}(y_1 - 1),\ p_{Z_1}(y_1 + 1)\bigr), \quad \text{etc.}$$
Applying the sum-product algorithm to

Hidden Markov Models

yields recursive algorithms for many things.

Recall the definition of a hidden Markov model (HMM):
$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1})\,p(y_k|x_{k-1})$$

[Figure: the HMM chain factor graph from above, now with known values $y_1, y_2, y_3, \dots$ plugged into the output factors.]

Assume that $Y_1 = y_1, \dots, Y_n = y_n$ are observed (known).
Sum-product algorithm applied to HMM:

Estimation of Current State

$$p(x_n|y_1, \dots, y_n) = \frac{p(x_n, y_1, \dots, y_n)}{p(y_1, \dots, y_n)} \propto p(x_n, y_1, \dots, y_n) = \sum_{x_0} \cdots \sum_{x_{n-1}} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_n}(x_n).$$

For $n = 2$:

[Figure: forward messages flowing left to right through the chain up to $X_2$; the part of the graph beyond $X_2$ (with unknown $Y_3$) contributes only constant messages ("$= 1$").]
Backward Message in Chain Rule Model

[Figure: node $p_X$ with edge $X$ into node $p_{Y|X}$ with edge $Y$.]

If $Y = y$ is known (observed):
$$\overleftarrow{\mu}_X(x) = p_{Y|X}(y|x),$$
the likelihood function.

If $Y$ is unknown:
$$\overleftarrow{\mu}_X(x) = \sum_{y} p_{Y|X}(y|x) = 1.$$
Sum-product algorithm applied to HMM:

Prediction of Next Output Symbol

$$p(y_{n+1}|y_1, \dots, y_n) = \frac{p(y_1, \dots, y_{n+1})}{p(y_1, \dots, y_n)} \propto p(y_1, \dots, y_{n+1}) = \sum_{x_0, x_1, \dots, x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n, y_{n+1}) = \overrightarrow{\mu}_{Y_{n+1}}(y_{n+1}).$$

For $n = 2$:

[Figure: forward messages flowing through the chain and down the output branch of $Y_3$.]
Sum-product algorithm applied to HMM:

Estimation of Time-k State

$$p(x_k \,|\, y_1, y_2, \dots, y_n) = \frac{p(x_k, y_1, y_2, \dots, y_n)}{p(y_1, y_2, \dots, y_n)} \propto p(x_k, y_1, y_2, \dots, y_n) = \sum_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$

For $k = 1$:

[Figure: forward messages up to $X_1$ and backward messages from the right end of the chain back to $X_1$.]
Sum-product algorithm applied to HMM:

All States Simultaneously

$p(x_k|y_1, \dots, y_n)$ for all $k$:

[Figure: messages in both directions on every edge of the chain.]

In this application, the sum-product algorithm coincides with the Baum-Welch / BCJR forward-backward algorithm.
Scaling of Messages

In all the examples so far:

• The final result (such as $\overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$) equals the desired quantity (such as $p(x_k|y_1, \dots, y_n)$) only up to a scale factor.

• The missing scale factor $\gamma$ may be recovered at the end from the condition
$$\sum_{x_k} \gamma\,\overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k) = 1.$$

• It follows that messages may be scaled freely along the way.

• Such message scaling is often mandatory to avoid numerical problems.
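Combining the forward-backward sweeps above with this per-step scaling, here is a minimal Python sketch for the HMM $p(x_0)\prod_k p(x_k|x_{k-1})\,p(y_k|x_{k-1})$ defined earlier; the toy model, alphabet sizes, and all names are illustrative assumptions, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 3                                    # state alphabet size (toy)
P0 = np.full(S, 1.0 / S)                 # p(x0)
A = rng.dirichlet(np.ones(S), size=S)    # A[i, j] = p(x_k = j | x_{k-1} = i)
B = rng.dirichlet(np.ones(4), size=S)    # B[i, y] = p(y_k = y | x_{k-1} = i)
y = [0, 2, 1, 3, 1]                      # observed y_1, ..., y_n
n = len(y)

# Forward sweep: alpha[k] is the scaled forward message at X_k.
alpha = np.zeros((n + 1, S))
alpha[0] = P0
for k in range(1, n + 1):
    alpha[k] = (alpha[k - 1] * B[:, y[k - 1]]) @ A   # sum over x_{k-1}
    alpha[k] /= alpha[k].sum()                       # free scaling

# Backward sweep: beta[k] is the scaled backward message at X_k; beta[n] = 1.
beta = np.ones((n + 1, S))
for k in range(n, 0, -1):
    beta[k - 1] = B[:, y[k - 1]] * (A @ beta[k])     # sum over x_k
    beta[k - 1] /= beta[k - 1].sum()

# p(x_k | y_1..y_n): product of the two messages, scale recovered at the end.
for k in range(n + 1):
    post = alpha[k] * beta[k]
    print(k, post / post.sum())
```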
Sum-product algorithm applied to HMM:

Probability of the Observation

$$p(y_1, \dots, y_n) = \sum_{x_0} \cdots \sum_{x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \sum_{x_n} \overrightarrow{\mu}_{X_n}(x_n).$$

This is a number. Scale factors cannot be neglected in this case.

For $n = 2$:

[Figure: forward messages through the chain up to $X_2$; the remaining part of the graph contributes constant messages ("$= 1$").]
Towards the max-product algorithm:

Computing Max-Marginals—A Generic Example

Assume we wish to compute
$$f_3(x_3) = \max_{\substack{x_1,\dots,x_7 \\ \text{except } x_3}} f(x_1,\dots,x_7)$$
and assume that $f$ can be factored as follows:

[Figure: the same tree-structured factor graph with nodes $f_1, \dots, f_7$ as in the marginalization example.]
Example:

Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes around the subtrees on either side of the edge $X_3$.]

$$f_3(x_3) = \Bigl(\max_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr) \cdot \Bigl(\max_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \Bigl(\max_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)\Bigr)$$
Example cont'd: Message Passing View

[Figure: the same factor graph with message arrows meeting at the edge $X_3$.]

$$f_3(x_3) = \underbrace{\Bigl(\max_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl(\max_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \underbrace{\Bigl(\max_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)}\Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$
Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message arrow on every edge.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \max_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\,\overrightarrow{\mu}_{X_2}(x_2)\,f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \max_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\,f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \max_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\,\overleftarrow{\mu}_{X_5}(x_5)\,f_5(x_3, x_4, x_5)$$
The Max-Product Algorithm

[Figure: a generic node $g$ with incoming edges $Y_1, \dots, Y_n$ and outgoing edge $X$.]

Max-product message computation rule:
$$\overrightarrow{\mu}_X(x) = \max_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\,\overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Max-product theorem:

If the factor graph for some global function $f$ has no cycles, then
$$f_X(x) = \overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x).$$
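A minimal sketch of the rule for finite alphabets, mirroring the sum-product version above (names illustrative, not from the talk):

```python
import itertools

def max_product_message(g, in_messages, alphabets, out_alphabet):
    """mu_X(x) = max over y1..yn of g(x, y1,...,yn) * mu_Y1(y1)*...*mu_Yn(yn).

    Assumes nonnegative factors and messages, as in the talk."""
    mu_X = {}
    for x in out_alphabet:
        best = 0.0
        for ys in itertools.product(*alphabets):
            term = g(x, *ys)
            for mu, yi in zip(in_messages, ys):
                term *= mu[yi]
            best = max(best, term)
        mu_X[x] = best
    return mu_X
```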
Max-product algorithm applied to HMM:

MAP Estimate of the State Trajectory

The estimate
$$(x_0, \dots, x_n)_{\mathrm{MAP}} = \operatorname*{argmax}_{x_0,\dots,x_n} p(x_0, \dots, x_n | y_1, \dots, y_n) = \operatorname*{argmax}_{x_0,\dots,x_n} p(x_0, \dots, x_n, y_1, \dots, y_n)$$
may be obtained by computing
$$p_k(x_k) \triangleq \max_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$
for all $k$ by forward-backward max-product sweeps.

In this example, the max-product algorithm is a time-symmetric version of the Viterbi algorithm with soft output.
Max-product algorithm applied to HMM:

MAP Estimate of the State Trajectory cont'd

Computing
$$p_k(x_k) \triangleq \max_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$
simultaneously for all $k$:

[Figure: max-product messages in both directions on every edge of the chain.]
Marginals and Output Edges

Marginals such as $\overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x)$ may be viewed as messages out of an "output half-edge" (without incoming message):

[Figure: an edge $X$ redrawn as an "=" node with edges $X'$, $X''$ and an output half-edge $X$.]

$$\overrightarrow{\mu}_X(x) = \int_{x'} \int_{x''} \overrightarrow{\mu}_{X'}(x')\,\overleftarrow{\mu}_{X''}(x'')\,\delta(x - x')\,\delta(x - x'')\,dx'\,dx'' = \overrightarrow{\mu}_{X'}(x)\,\overleftarrow{\mu}_{X''}(x)$$

⇒ Marginals are computed like messages out of "="-nodes.
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
What About Factor Graphs with Cycles?

[Figure: a factor graph with cycles, with messages on every edge.]
• Generally iterative algorithms.

• For example, alternating maximization,
$$x_{\mathrm{new}} = \operatorname*{argmax}_x f(x, y) \quad \text{and} \quad y_{\mathrm{new}} = \operatorname*{argmax}_y f(x, y),$$
using the max-product algorithm in each iteration (see the sketch after this list).

• Iterative sum-product message passing gives excellent results for maximization(!) in some applications (e.g., the decoding of error correcting codes).

• Many other useful algorithms can be formulated in message passing form (e.g., gradient ascent, Gibbs sampling, expectation maximization, variational methods, ...).

• Rich and vast research area...
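A minimal sketch of the alternating-maximization idea on a toy table; here each argmax is brute force, whereas in the talk's setting each argmax would itself be a max-product sweep (all values illustrative):

```python
# Toy nonnegative function of two finite-alphabet variables.
TABLE = [[1.0, 2.0, 0.5],
         [0.2, 3.0, 2.5]]
f = lambda x, y: TABLE[x][y]

X, Y = range(2), range(3)
x, y = 0, 0
for _ in range(10):
    x = max(X, key=lambda xv: f(xv, y))   # x_new = argmax_x f(x, y)
    y = max(Y, key=lambda yv: f(x, yv))   # y_new = argmax_y f(x, y)
print(x, y, f(x, y))   # reaches a (possibly only local) maximum; here (1, 1)
```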
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Factor Graph of an Error Correcting Code

≈ Tanner graph of the code (Tanner 1981)

A factor graph of a code $C \subset F^n$ represents (a factorization of) the membership indicator function of the code:
$$I_C : F^n \to \{0, 1\} : x \mapsto \begin{cases} 1, & \text{if } x \in C \\ 0, & \text{else.} \end{cases}$$
Factor Graph from Parity Check Matrix

Example: (7, 4, 3) binary Hamming code. ($F \triangleq \mathrm{GF}(2)$.)
$$C = \{x \in F^n : Hx^{\mathsf{T}} = 0\}$$
with
$$H = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{pmatrix}$$

The membership indicator function
$$I_C : F^n \to \{0, 1\} : x \mapsto \begin{cases} 1, & \text{if } x \in C \\ 0, & \text{else} \end{cases}$$
of this code may be written as
$$I_C(x_1, \dots, x_n) = \delta(x_1 \oplus x_2 \oplus x_3 \oplus x_5) \cdot \delta(x_2 \oplus x_3 \oplus x_4 \oplus x_6) \cdot \delta(x_3 \oplus x_4 \oplus x_5 \oplus x_7),$$
where $\oplus$ denotes addition modulo 2. Each factor corresponds to one row of the parity check matrix.
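A minimal sketch of this indicator function (the matrix H is from the slide; the function name and test vectors are illustrative):

```python
import numpy as np

H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 0, 1]])

def I_C(x):
    """1 if H x^T = 0 (mod 2), else 0: the product of the three parity
    check factors above."""
    return int(not (H @ x % 2).any())

print(I_C(np.zeros(7, dtype=int)))            # 1: the all-zero word is a codeword
print(I_C(np.array([1, 0, 0, 0, 0, 0, 0])))   # 0: a weight-1 word is not
```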
Factor Graph from Parity Check Matrix (cont'd)

Example: (7, 4, 3) binary Hamming code, $C = \{x \in F^n : Hx^{\mathsf{T}} = 0\}$ with $H$ as above.

[Figure: Forney-style factor graph of the code: equality nodes on $X_1, \dots, X_4$ (variables appearing in several checks), three ⊕ (parity check) nodes, one per row of $H$, and half-edges $X_5$, $X_6$, $X_7$.]
Factor Graph for Joint Code / Channel Model

[Figure: a "code" box with edges $X_1, X_2, \dots, X_n$ connected to a "channel model" box with output edges $Y_1, Y_2, \dots, Y_n$.]

Example: memoryless channel:

[Figure: each $X_k$ connected to its own channel factor with output $Y_k$.]
Factor Graph from Generator Matrix

Example: the (7, 4, 3) binary Hamming code is the image of
$$F^k \to F^n : u \mapsto uG$$
with
$$G = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$

[Figure: factor graph with equality nodes on the information bits $U_1, \dots, U_4$ and ⊕ nodes producing the code symbols $X_1, \dots, X_7$, one per column of $G$.]
Factor Graph of Dual Code

is obtained by interchanging parity check nodes and equality constraint nodes (Kschischang, Forney).

Works only for Forney-style factor graphs where all code symbols are external (half-edge) variables.

Example: dual of the (7, 4, 3) binary Hamming code

[Figure: the factor graph of the Hamming code from above with "=" nodes and ⊕ nodes interchanged.]
Factor Graph Corresponding to Trellis

Example: (7, 4, 3) binary Hamming code

[Figure: a trellis of the code with binary branch labels, and the corresponding cycle-free factor graph with code symbols $X_1, X_4, X_3, X_2, X_5, X_6, X_7$ along the trellis sections and state variables $S_1, S_2, \dots$ between them.]
Factor Graph of Low-Density Parity-Check Codes

[Figure: equality nodes on $X_1, X_2, \dots, X_n$ connected through "random" connections to a layer of ⊕ (parity check) nodes.]

Standard decoder: iterative sum-product message passing. Convergence is not guaranteed!

Much recent / ongoing research on improved decoding: Yedidia, Freeman, Weiss; Feldman; Wainwright; Chertkov...
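A minimal sketch of such a decoder in the L-domain (log-likelihood ratios), using the "=" and tanh rules tabulated on the following slides; the small H and the channel values are toys, not a real low-density code, and all names are illustrative:

```python
import math

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0, 1]]
L_ch = [-0.9, -2.1, 1.3, -0.3, -1.1, 1.7]   # channel LLRs, L = log p(y|0)/p(y|1)
m, n = len(H), len(L_ch)
rows = [[i for i in range(n) if H[j][i]] for j in range(m)]   # vars per check
cols = [[j for j in range(m) if H[j][i]] for i in range(n)]   # checks per var

V = {(j, i): L_ch[i] for j in range(m) for i in rows[j]}      # var-to-check msgs
for _ in range(20):
    # Check-to-variable messages: tanh(L_Z/2) = product of the other tanh(L/2).
    C = {}
    for j in range(m):
        for i in rows[j]:
            t = math.prod(math.tanh(V[j, k] / 2) for k in rows[j] if k != i)
            C[j, i] = 2.0 * math.atanh(max(min(t, 0.999999), -0.999999))
    # Variable-to-check messages: "=" node rule, LLRs simply add.
    for i in range(n):
        for j in cols[i]:
            V[j, i] = L_ch[i] + sum(C[k, i] for k in cols[i] if k != j)

# Decisions from the total LLR; bit = 1 iff L < 0 (since L = log mu(0)/mu(1)).
L_tot = [L_ch[i] + sum(C[j, i] for j in cols[i]) for i in range(n)]
x_hat = [int(L < 0) for L in L_tot]
print(x_hat, all(sum(x_hat[i] for i in rows[j]) % 2 == 0 for j in range(m)))
```

On a cycle-free graph these sweeps would be exact; on an LDPC Tanner graph this is the standard iterative decoder whose convergence, as noted above, is not guaranteed.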
Single-Number Parameterizations of Soft-Bit Messages

Difference: $\Delta \triangleq \dfrac{\mu(0) - \mu(1)}{\mu(0) + \mu(1)}$ = mean of $\{+1, -1\}$-representation

Ratio: $\Lambda \triangleq \mu(0)/\mu(1)$

Logarithm of ratio: $L \triangleq \log\bigl(\mu(0)/\mu(1)\bigr)$
Conversions among Parameterizations

Each row gives one quantity expressed in terms of the parameterization named at the head of each column:

$$\begin{array}{c|cccc}
 & (\mu(0), \mu(1)) & \Delta = m & \Lambda & L \\
\hline
(\mu(0), \mu(1)) & & \left(\dfrac{1+\Delta}{2},\ \dfrac{1-\Delta}{2}\right) & \left(\dfrac{\Lambda}{\Lambda+1},\ \dfrac{1}{\Lambda+1}\right) & \left(\dfrac{e^L}{e^L+1},\ \dfrac{1}{e^L+1}\right) \\
\Delta = m & \dfrac{\mu(0)-\mu(1)}{\mu(0)+\mu(1)} & & \dfrac{\Lambda-1}{\Lambda+1} & \tanh(L/2) \\
\Lambda & \dfrac{\mu(0)}{\mu(1)} & \dfrac{1+\Delta}{1-\Delta} & & e^L \\
L & \ln\dfrac{\mu(0)}{\mu(1)} & 2\tanh^{-1}(\Delta) & \ln\Lambda & \\
\sigma^2 & \dfrac{4\mu(0)\mu(1)}{(\mu(0)+\mu(1))^2} & 1-m^2 & \dfrac{4\Lambda}{(\Lambda+1)^2} & \dfrac{4}{e^L+e^{-L}+2}
\end{array}$$
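A minimal sketch checking a few of these conversions numerically (the function name is illustrative):

```python
import math

def from_L(L):
    """Return (mu(0), mu(1)), Delta, and Lambda for a given log-ratio L."""
    mu0, mu1 = math.exp(L) / (math.exp(L) + 1), 1 / (math.exp(L) + 1)
    return (mu0, mu1), math.tanh(L / 2), math.exp(L)

(mu0, mu1), Delta, Lam = from_L(1.5)
assert abs(Delta - (mu0 - mu1) / (mu0 + mu1)) < 1e-12
assert abs(Lam - mu0 / mu1) < 1e-12
assert abs((1 - Delta**2) - 4 * Lam / (Lam + 1) ** 2) < 1e-12   # sigma^2 row
```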
Sum-Product Rules for Binary Parity Check Codes

Equality constraint node, factor $\delta[x - y]\,\delta[x - z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) \\ \mu_X(1)\,\mu_Y(1) \end{pmatrix}$$
$$\Delta_Z = \frac{\Delta_X + \Delta_Y}{1 + \Delta_X \Delta_Y}, \qquad \Lambda_Z = \Lambda_X \cdot \Lambda_Y, \qquad L_Z = L_X + L_Y$$

Parity check node, factor $\delta[x \oplus y \oplus z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) + \mu_X(1)\,\mu_Y(1) \\ \mu_X(0)\,\mu_Y(1) + \mu_X(1)\,\mu_Y(0) \end{pmatrix}$$
$$\Delta_Z = \Delta_X \cdot \Delta_Y, \qquad \Lambda_Z = \frac{1 + \Lambda_X \Lambda_Y}{\Lambda_X + \Lambda_Y}, \qquad \tanh(L_Z/2) = \tanh(L_X/2) \cdot \tanh(L_Y/2)$$
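A minimal sketch of the two rules in the L-domain, cross-checked against the probability-domain form above (names illustrative):

```python
import math

def equality_node_L(LX, LY):     # "=" node: L_Z = L_X + L_Y
    return LX + LY

def parity_node_L(LX, LY):       # XOR node: tanh rule
    return 2.0 * math.atanh(math.tanh(LX / 2) * math.tanh(LY / 2))

# Cross-check the XOR rule against the probability-domain table above:
to_mu = lambda L: (math.exp(L) / (1 + math.exp(L)), 1 / (1 + math.exp(L)))
LX, LY = 0.8, -1.3
(mX0, mX1), (mY0, mY1) = to_mu(LX), to_mu(LY)
muZ0 = mX0 * mY0 + mX1 * mY1
muZ1 = mX0 * mY1 + mX1 * mY0
assert abs(parity_node_L(LX, LY) - math.log(muZ0 / muZ1)) < 1e-12
```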
Max-Product Rules for Binary Parity Check Codes

Equality constraint node, factor $\delta[x - y]\,\delta[x - z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) \\ \mu_X(1)\,\mu_Y(1) \end{pmatrix}, \qquad L_Z = L_X + L_Y$$

Parity check node, factor $\delta[x \oplus y \oplus z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \max\bigl\{\mu_X(0)\,\mu_Y(0),\ \mu_X(1)\,\mu_Y(1)\bigr\} \\ \max\bigl\{\mu_X(0)\,\mu_Y(1),\ \mu_X(1)\,\mu_Y(0)\bigr\} \end{pmatrix}$$
$$|L_Z| = \min\bigl\{|L_X|, |L_Y|\bigr\}, \qquad \operatorname{sgn}(L_Z) = \operatorname{sgn}(L_X) \cdot \operatorname{sgn}(L_Y)$$
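The corresponding L-domain sketch, the basis of "min-sum" decoding (names illustrative):

```python
import math

def parity_node_minsum(LX, LY):
    """Max-product XOR rule: |L_Z| = min(|L_X|, |L_Y|), signs multiply.
    Assumes nonzero LX, LY; the "=" node rule is L_Z = L_X + L_Y,
    exactly as in the sum-product case."""
    return math.copysign(min(abs(LX), abs(LY)), LX * LY)

assert parity_node_minsum(0.8, -1.3) == -0.8
```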
Decomposition of Multi-Bit Checks

[Figure: a parity check on many bits decomposed into a chain of 3-bit ⊕ nodes; likewise, a multi-way equality constraint decomposed into a chain of 3-way "=" nodes.]
Encoder of a Turbo Code

[Figure: systematic bits $X_{1,k}$; a recursive convolutional encoder producing parity bits $X_{2,k}$; an interleaver followed by a second recursive convolutional encoder producing parity bits $X_{3,k}$.]
Factor Graph of a Turbo Code
X1,k−1
=
X1,k
=
X1,k+1
=
· · ·
X2,k−1 X2,k X2,k+1
· · ·
“random” connections
· · ·
X3,k−1 X3,k X3,k+1
· · ·
Turbo Decoder: Conventional View

[Figure: Decoder A and Decoder B exchange extrinsic information $L^{\mathrm{out}}_{A,k}$ and $L^{\mathrm{out}}_{B,k}$ through an interleaver and an inverse interleaver; the inputs are the channel values $L^{\mathrm{in}}_{1,k}$, $L^{\mathrm{in}}_{2,k}$, $L^{\mathrm{in}}_{3,k}$.]
Messages in a Turbo Decoder
=
?Lin1,k 6LoutA,k
+LoutB,k=
6LoutA,k ?
?6LoutB,k
=
· · ·6Lin2,k
· · ·
“random” connections
· · ·6Lin3,k
· · ·