Factor Graphs and Message Passing Algorithms
— Part 1: Introduction

Hans-Andrea Loeliger

December 2007
The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.

Note that
$$\operatorname*{argmax}_{x_1,\dots,x_n} f(x_1,\dots,x_n) = \bigl(\operatorname*{argmax}_{x_1} f_1(x_1),\ \dots,\ \operatorname*{argmax}_{x_n} f_n(x_n)\bigr).$$

For large $n$, both problems are in general intractable (even for $x_1,\dots,x_n \in \{0,1\}$).
Factorization Helps

For example, if $f(x_1,\dots,x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$, then
$$f_k(x_k) = \Bigl(\sum_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\sum_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\sum_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\sum_{x_n} f_n(x_n)\Bigr)$$
and
$$f_k(x_k) = \Bigl(\max_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\max_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\max_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\max_{x_n} f_n(x_n)\Bigr).$$

Factorization also helps beyond this trivial example.
→ Factor graphs and message passing algorithms.
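To make the cost difference concrete, here is a minimal Python sketch (not from the talk; all names and values are illustrative) that computes the same marginal both ways for binary variables:

```python
import itertools
import math

n = 12
factors = [[0.4 + 0.01 * k, 0.6 - 0.01 * k] for k in range(n)]   # f_k over {0, 1}

def brute_force_marginal(k, xk):
    """Sum f over all 2^n configurations with x_k fixed: exponential in n."""
    return sum(
        math.prod(f[x] for f, x in zip(factors, config))
        for config in itertools.product([0, 1], repeat=n)
        if config[k] == xk
    )

def factored_marginal(k, xk):
    """The same quantity via the distributive law: linear in n."""
    result = factors[k][xk]
    for j, f in enumerate(factors):
        if j != k:
            result *= sum(f)   # sum_{x_j} f_j(x_j)
    return result

assert abs(brute_force_marginal(3, 1) - factored_marginal(3, 1)) < 1e-9
```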
Roots

Statistical physics:
- Markov random fields (Ising 1925?)

Signal processing:
- linear state-space models and Kalman filtering: Kalman 1960...
- recursive least-squares adaptive filters
- hidden Markov models and forward-backward algorithm: Baum et al. 1966...

Error correcting codes:
- low-density parity check codes: Gallager 1962; Tanner 1981; MacKay 1996; Luby et al. 1998...
- convolutional codes and Viterbi decoding: Forney 1973...
- turbo codes: Berrou et al. 1993...

Machine learning, statistics:
- Bayesian networks: Pearl 1988; Shachter 1988; Lauritzen and Spiegelhalter 1988; Shafer and Shenoy 1990...
Outline of this talk

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Factor Graphs

A factor graph represents the factorization of a function of several variables. We use Forney-style factor graphs (Forney, 2001).

Example:
$$f(x_1, x_2, x_3, x_4, x_5) = f_A(x_1, x_2, x_3) \cdot f_B(x_3, x_4, x_5) \cdot f_C(x_4).$$

[Figure: factor graph with nodes $f_A$, $f_B$, $f_C$; edge $x_3$ joins $f_A$ and $f_B$, edge $x_4$ joins $f_B$ and $f_C$; half-edges $x_1$, $x_2$ (at $f_A$) and $x_5$ (at $f_B$).]
Rules:
• A node for every factor.
• An edge or half-edge for every variable.
• Node g is connected to edge x iff variable x appears in factor g.
(What if some variable appears in more than 2 factors?)
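As an aside, the bookkeeping behind these rules is small. A minimal sketch of the example graph above (not from the talk; the data layout is an illustrative choice):

```python
# Each factor node names the variables (edges) it is attached to.
factors = {
    "fA": ("x1", "x2", "x3"),
    "fB": ("x3", "x4", "x5"),
    "fC": ("x4",),
}

# For each variable, collect the factor nodes it is attached to.
edges = {}
for name, variables in factors.items():
    for v in variables:
        edges.setdefault(v, []).append(name)

for v, nodes in sorted(edges.items()):
    kind = "half-edge" if len(nodes) == 1 else "edge"
    print(f"{v}: {kind}, attached to {nodes}")
# x3 and x4 are ordinary edges; x1, x2, x5 are half-edges. A variable attached
# to more than two factors would need an equality constraint node (see below).
```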
A main application of factor graphs is stochastic models. Example:

Markov Chain

$$p_{XYZ}(x, y, z) = p_X(x)\, p_{Y|X}(y|x)\, p_{Z|Y}(z|y).$$

[Figure: chain factor graph with nodes $p_X$, $p_{Y|X}$, $p_{Z|Y}$ and edges $X$, $Y$, $Z$.]
Other Notation Systems for Graphical Models

Example: $p(u, w, x, y, z) = p(u)\,p(w)\,p(x|u,w)\,p(y|x)\,p(z|x)$.

[Four figures of the same model: a Forney-style factor graph; the original factor graph [FKLW 1997]; a Bayesian network; a Markov random field.]
Terminology

[Figure: the factor graph with nodes $f_A$, $f_B$, $f_C$ and variables $x_1, \dots, x_5$ from above.]

Local function = factor (such as $f_A$, $f_B$, $f_C$).

Global function $f$ = product of all local functions; usually (but not always!) real and nonnegative.

A configuration is an assignment of values to all variables.

The configuration space is the set of all configurations, which is the domain of the global function.

A configuration $\omega = (x_1, \dots, x_5)$ is valid iff $f(\omega) \neq 0$.
Invalid Configurations Do Not Affect Marginals

A configuration $\omega = (x_1, \dots, x_n)$ is valid iff $f(\omega) \neq 0$.

Recall:

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.
Auxiliary Variables

Example: Let $Y_1$ and $Y_2$ be two independent observations of $X$:
$$p(x, y_1, y_2) = p(x)\,p(y_1|x)\,p(y_2|x).$$

[Figure: factor graph with node $p_X$, an equality constraint node "=" splitting $X$ into $X'$ and $X''$, and nodes $p_{Y_1|X}$, $p_{Y_2|X}$ with half-edges $Y_1$, $Y_2$.]

Literally, the factor graph represents an extended model
$$p(x, x', x'', y_1, y_2) = p(x)\,p(y_1|x')\,p(y_2|x'')\,f_{=}(x, x', x'')$$
where
$$f_{=}(x, x', x'') \triangleq \delta(x - x')\,\delta(x - x'')$$
enforces $X = X' = X''$ for every valid configuration.
Branching Points

Equality constraint nodes may be viewed as branching points:

[Figure: an "=" node with edges $X$, $X'$, $X''$ drawn as a branching point of a single variable $X$.]

The factor
$$f_{=}(x, x', x'') \triangleq \delta(x - x')\,\delta(x - x'')$$
enforces $X = X' = X''$ for every valid configuration.
Modularity, Special Symbols, Arrows

As a refinement of the previous example, let
$$Y_1 = X + Z_1 \tag{1}$$
$$Y_2 = X + Z_2 \tag{2}$$
with $Z_1$ and $Z_2$ independent of each other and of $X$:

[Figure: factor graph with node $p_X$, an "=" node on $X$, and "+" nodes combining $X$ with $Z_1$ (from node $p_{Z_1}$) and $Z_2$ (from node $p_{Z_2}$) to form $Y_1$ and $Y_2$; the dashed boxes around $p_{Z_1}$/"+" and $p_{Z_2}$/"+" correspond to the earlier nodes $p_{Y_1|X}$ and $p_{Y_2|X}$.]

The "+" nodes represent the factors $\delta(x + z_1 - y_1)$ and $\delta(x + z_2 - y_2)$, which enforce (1) and (2) for every valid configuration.
Known Variables vs. Unknown Variables

Known variables (observations, known parameters, ...) may be plugged into the corresponding factors.

Example: $Y_2 = y_2$ observed.

[Figure: the factor graph above with the node $p_{Y_2|X}(\cdot|\cdot)$ replaced by the single-argument factor $p_{Y_2|X}(y_2|\cdot)$; the half-edge $Y_1$ remains.]

Known variables will be denoted by small letters; unknown variables will usually be denoted by capital letters.
From A Priori to A Posteriori Probability

Example (cont'd): Let $Y_1 = y_1$ and $Y_2 = y_2$ be two independent observations of $X$. For fixed $y_1$ and $y_2$, we have
$$p(x|y_1, y_2) = \frac{p(x, y_1, y_2)}{p(y_1, y_2)} \propto p(x, y_1, y_2).$$

[Figure: factor graph with node $p_X$, an "=" node, and the plugged-in factors $p_{Y_1|X}(y_1|\cdot)$ and $p_{Y_2|X}(y_2|\cdot)$.]

The factorization is unchanged (except for a scale factor).
Example:

Hidden Markov Model

$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1})\,p(y_k|x_{k-1})$$

[Figure: chain factor graph with nodes $p(x_0)$, $p(x_1|x_0)$, $p(x_2|x_1)$, ..., equality nodes on the states $X_0, X_1, X_2, \dots$, and output branches $p(y_1|x_0)$, $p(y_2|x_1)$, ... with half-edges $Y_1, Y_2, \dots$]
Example:

Hidden Markov Model with Parameter(s)

$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n \,|\, \theta) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1}, \theta)\,p(y_k|x_{k-1})$$

[Figure: the same chain factor graph, with the transition nodes $p(x_k|x_{k-1}, \theta)$ additionally connected to a parameter variable $\Theta$ shared via a chain of equality nodes.]
A non-stochastic example:

Least-Squares Problems

Minimizing $\sum_{k=1}^{n} x_k^2$ subject to (linear or nonlinear) constraints is equivalent to maximizing
$$e^{-\sum_{k=1}^{n} x_k^2} = \prod_{k=1}^{n} e^{-x_k^2}$$
subject to the given constraints.

[Figure: a "constraints" box with edges $X_1, \dots, X_n$, each attached to a node $e^{-x_k^2}$.]

Here, the factor graph represents a nonnegative real-valued function that we wish to maximize.
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Recall:

The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) \triangleq \sum_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n)$$

2. Maximization: Compute the "max-marginal"
$$f_k(x_k) \triangleq \max_{\substack{x_1,\dots,x_n \\ \text{except } x_k}} f(x_1,\dots,x_n),$$
assuming that $f$ is real-valued and nonnegative and has a maximum.
Message Passing Algorithms

operate by passing messages along the edges of a factor graph:

[Figure: a factor graph with a pair of oppositely directed message arrows on every edge.]
Towards the sum-product algorithm:

Computing Marginals—A Generic Example

Assume we wish to compute
$$f_3(x_3) = \sum_{\substack{x_1,\dots,x_7 \\ \text{except } x_3}} f(x_1,\dots,x_7)$$
and assume that $f$ can be factored as follows:

[Figure: tree-structured factor graph with nodes $f_1, \dots, f_7$ and edges $X_1, \dots, X_7$, corresponding to $f = f_1(x_1)\,f_2(x_2)\,f_3(x_1,x_2,x_3)\,f_4(x_4)\,f_5(x_3,x_4,x_5)\,f_6(x_5,x_6,x_7)\,f_7(x_7)$.]
Example cont'd:

Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes drawn around the subtrees on either side of the edge $X_3$.]

$$f_3(x_3) = \Bigl(\sum_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr) \cdot \Bigl(\sum_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \Bigl(\sum_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)\Bigr)$$
Example cont'd: Message Passing View

[Figure: the same factor graph with message arrows meeting at the edge $X_3$.]

$$f_3(x_3) = \underbrace{\Bigl(\sum_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl(\sum_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \underbrace{\Bigl(\sum_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)}\Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$
Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message arrow on every edge.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \sum_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\,\overrightarrow{\mu}_{X_2}(x_2)\,f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \sum_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\,f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \sum_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\,\overleftarrow{\mu}_{X_5}(x_5)\,f_5(x_3, x_4, x_5)$$
The Sum-Product Algorithm (Belief Propagation)

[Figure: a generic node $g$ with incoming edges $Y_1, \dots, Y_n$ and outgoing edge $X$.]

Sum-product message computation rule:
$$\overrightarrow{\mu}_X(x) = \sum_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\,\overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Sum-product theorem:

If the factor graph for some global function $f$ has no cycles, then
$$f_X(x) = \overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x).$$
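For finite alphabets the rule translates directly into code. A minimal sketch (not from the talk; `sum_product_message` and the data layout are illustrative):

```python
import itertools

def sum_product_message(g, in_messages, alphabets, out_alphabet):
    """mu_X(x) = sum over y1..yn of g(x, y1,...,yn) * mu_Y1(y1)*...*mu_Yn(yn).

    g: function of (x, y1, ..., yn); in_messages[i]: dict mapping yi to
    mu_Yi(yi); alphabets[i]: alphabet of Yi."""
    mu_X = {}
    for x in out_alphabet:
        total = 0.0
        for ys in itertools.product(*alphabets):
            term = g(x, *ys)
            for mu, yi in zip(in_messages, ys):
                term *= mu[yi]
            total += term
        mu_X[x] = total
    return mu_X

# Example: message through the equality node f_=(x, x', x''):
eq = lambda x, x1, x2: 1.0 if x == x1 == x2 else 0.0
print(sum_product_message(eq,
                          [{0: 0.3, 1: 0.7}, {0: 0.6, 1: 0.4}],
                          [(0, 1), (0, 1)], (0, 1)))
# {0: 0.18, 1: 0.28} -- the componentwise product, up to scale
```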
Arrows and Notation for Messages

[Figure: an edge $X$ drawn with an arrow, with messages in both directions.]

For edges drawn with arrows:
$\overrightarrow{\mu}_X$ denotes the message in the direction of the arrow.
$\overleftarrow{\mu}_X$ denotes the message in the opposite direction.

Edges may be drawn with arrows just for the sake of this notation.
Sum-Product Algorithm: a Simple Example

[Figure: the two-observation model from above with $Y_1 = y_1$ and $Y_2 = y_2$ plugged in: node $p_X$, an "=" node splitting $X$ into $X'$ and $X''$, and "+" nodes combining $X'$ with $Z_1$ (from $p_{Z_1}$) and $X''$ with $Z_2$ (from $p_{Z_2}$).]

$$\overrightarrow{\mu}_X(x) = p_X(x), \qquad \overrightarrow{\mu}_{Z_1}(z_1) = p_{Z_1}(z_1), \qquad \overrightarrow{\mu}_{Z_2}(z_2) = p_{Z_2}(z_2)$$
Sum-Product Example cont'd

[Figure: the same factor graph.]

$$\overleftarrow{\mu}_{X'}(x') = \int_{z_1} \overrightarrow{\mu}_{Z_1}(z_1)\,\delta(x' + z_1 - y_1)\,dz_1 = p_{Z_1}(y_1 - x')$$

$$\overleftarrow{\mu}_X(x) = \int_{x'} \int_{x''} \overleftarrow{\mu}_{X'}(x')\,\overleftarrow{\mu}_{X''}(x'')\,\delta(x - x')\,\delta(x - x'')\,dx'\,dx'' = p_{Z_1}(y_1 - x)\,p_{Z_2}(y_2 - x)$$
Sum-Product Example cont'd

[Figure: the same factor graph.]

Marginal of the global function at $X$:
$$\overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x) = p_X(x)\,\underbrace{p_{Z_1}(y_1 - x)\,p_{Z_2}(y_2 - x)}_{p(y_1, y_2 | x)} \propto p(x | y_1, y_2).$$
Messages for Finite-Alphabet Variables

may be represented by a list of function values.

Assume, for example, that $X$ takes values in $\{+1, -1\}$:

[Figure: the same factor graph.]

$$\overrightarrow{\mu}_X = \bigl(\overrightarrow{\mu}_X(+1),\ \overrightarrow{\mu}_X(-1)\bigr) = \bigl(p_X(+1),\ p_X(-1)\bigr)$$
$$\overleftarrow{\mu}_{X'} = \bigl(\overleftarrow{\mu}_{X'}(+1),\ \overleftarrow{\mu}_{X'}(-1)\bigr) = \bigl(p_{Z_1}(y_1 - 1),\ p_{Z_1}(y_1 + 1)\bigr), \quad \text{etc.}$$
Applying the sum-product algorithm to

Hidden Markov Models

yields recursive algorithms for many things.

Recall the definition of a hidden Markov model (HMM):
$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k|x_{k-1})\,p(y_k|x_{k-1})$$

[Figure: the HMM chain factor graph from above, now with known values $y_1, y_2, y_3, \dots$ plugged into the output factors.]

Assume that $Y_1 = y_1, \dots, Y_n = y_n$ are observed (known).
Sum-product algorithm applied to HMM:

Estimation of Current State

$$p(x_n|y_1, \dots, y_n) = \frac{p(x_n, y_1, \dots, y_n)}{p(y_1, \dots, y_n)} \propto p(x_n, y_1, \dots, y_n) = \sum_{x_0} \cdots \sum_{x_{n-1}} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_n}(x_n).$$

For $n = 2$:

[Figure: forward messages flowing left to right through the chain up to $X_2$; the part of the graph beyond $X_2$ (with unknown $Y_3$) contributes only constant messages ("$= 1$").]
Backward Message in Chain Rule Model

[Figure: node $p_X$ with edge $X$ into node $p_{Y|X}$ with edge $Y$.]

If $Y = y$ is known (observed):
$$\overleftarrow{\mu}_X(x) = p_{Y|X}(y|x),$$
the likelihood function.

If $Y$ is unknown:
$$\overleftarrow{\mu}_X(x) = \sum_{y} p_{Y|X}(y|x) = 1.$$
Sum-product algorithm applied to HMM:

Prediction of Next Output Symbol

$$p(y_{n+1}|y_1, \dots, y_n) = \frac{p(y_1, \dots, y_{n+1})}{p(y_1, \dots, y_n)} \propto p(y_1, \dots, y_{n+1}) = \sum_{x_0, x_1, \dots, x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n, y_{n+1}) = \overrightarrow{\mu}_{Y_{n+1}}(y_{n+1}).$$

For $n = 2$:

[Figure: forward messages flowing through the chain and down the output branch of $Y_3$.]
Sum-product algorithm applied to HMM:

Estimation of Time-k State

$$p(x_k \,|\, y_1, y_2, \dots, y_n) = \frac{p(x_k, y_1, y_2, \dots, y_n)}{p(y_1, y_2, \dots, y_n)} \propto p(x_k, y_1, y_2, \dots, y_n) = \sum_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$

For $k = 1$:

[Figure: forward messages up to $X_1$ and backward messages from the right end of the chain back to $X_1$.]
Sum-product algorithm applied to HMM:

All States Simultaneously

$p(x_k|y_1, \dots, y_n)$ for all $k$:

[Figure: messages in both directions on every edge of the chain.]

In this application, the sum-product algorithm coincides with the Baum-Welch / BCJR forward-backward algorithm.
Scaling of Messages

In all the examples so far:

• The final result (such as $\overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$) equals the desired quantity (such as $p(x_k|y_1, \dots, y_n)$) only up to a scale factor.

• The missing scale factor $\gamma$ may be recovered at the end from the condition
$$\sum_{x_k} \gamma\,\overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k) = 1.$$

• It follows that messages may be scaled freely along the way.

• Such message scaling is often mandatory to avoid numerical problems.
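Combining the forward-backward sweeps above with this per-step scaling, here is a minimal Python sketch for the HMM $p(x_0)\prod_k p(x_k|x_{k-1})\,p(y_k|x_{k-1})$ defined earlier; the toy model, alphabet sizes, and all names are illustrative assumptions, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 3                                    # state alphabet size (toy)
P0 = np.full(S, 1.0 / S)                 # p(x0)
A = rng.dirichlet(np.ones(S), size=S)    # A[i, j] = p(x_k = j | x_{k-1} = i)
B = rng.dirichlet(np.ones(4), size=S)    # B[i, y] = p(y_k = y | x_{k-1} = i)
y = [0, 2, 1, 3, 1]                      # observed y_1, ..., y_n
n = len(y)

# Forward sweep: alpha[k] is the scaled forward message at X_k.
alpha = np.zeros((n + 1, S))
alpha[0] = P0
for k in range(1, n + 1):
    alpha[k] = (alpha[k - 1] * B[:, y[k - 1]]) @ A   # sum over x_{k-1}
    alpha[k] /= alpha[k].sum()                       # free scaling

# Backward sweep: beta[k] is the scaled backward message at X_k; beta[n] = 1.
beta = np.ones((n + 1, S))
for k in range(n, 0, -1):
    beta[k - 1] = B[:, y[k - 1]] * (A @ beta[k])     # sum over x_k
    beta[k - 1] /= beta[k - 1].sum()

# p(x_k | y_1..y_n): product of the two messages, scale recovered at the end.
for k in range(n + 1):
    post = alpha[k] * beta[k]
    print(k, post / post.sum())
```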
Sum-product algorithm applied to HMM:

Probability of the Observation

$$p(y_1, \dots, y_n) = \sum_{x_0} \cdots \sum_{x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \sum_{x_n} \overrightarrow{\mu}_{X_n}(x_n).$$

This is a number. Scale factors cannot be neglected in this case.

For $n = 2$:

[Figure: forward messages through the chain up to $X_2$; the remaining part of the graph contributes constant messages ("$= 1$").]
Towards the max-product algorithm:

Computing Max-Marginals—A Generic Example

Assume we wish to compute
$$f_3(x_3) = \max_{\substack{x_1,\dots,x_7 \\ \text{except } x_3}} f(x_1,\dots,x_7)$$
and assume that $f$ can be factored as follows:

[Figure: the same tree-structured factor graph with nodes $f_1, \dots, f_7$ as in the marginalization example.]
Example:

Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes around the subtrees on either side of the edge $X_3$.]

$$f_3(x_3) = \Bigl(\max_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr) \cdot \Bigl(\max_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \Bigl(\max_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)\Bigr)$$
Example cont'd: Message Passing View

[Figure: the same factor graph with message arrows meeting at the edge $X_3$.]

$$f_3(x_3) = \underbrace{\Bigl(\max_{x_1, x_2} f_1(x_1)\,f_2(x_2)\,f_3(x_1, x_2, x_3)\Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl(\max_{x_4, x_5} f_4(x_4)\,f_5(x_3, x_4, x_5) \underbrace{\Bigl(\max_{x_6, x_7} f_6(x_5, x_6, x_7)\,f_7(x_7)\Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)}\Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$
Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message arrow on every edge.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \max_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\,\overrightarrow{\mu}_{X_2}(x_2)\,f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \max_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\,f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \max_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\,\overleftarrow{\mu}_{X_5}(x_5)\,f_5(x_3, x_4, x_5)$$
The Max-Product Algorithm

[Figure: a generic node $g$ with incoming edges $Y_1, \dots, Y_n$ and outgoing edge $X$.]

Max-product message computation rule:
$$\overrightarrow{\mu}_X(x) = \max_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\,\overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Max-product theorem:

If the factor graph for some global function $f$ has no cycles, then
$$f_X(x) = \overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x).$$
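A minimal sketch of the rule for finite alphabets, mirroring the sum-product version above (names illustrative, not from the talk):

```python
import itertools

def max_product_message(g, in_messages, alphabets, out_alphabet):
    """mu_X(x) = max over y1..yn of g(x, y1,...,yn) * mu_Y1(y1)*...*mu_Yn(yn).

    Assumes nonnegative factors and messages, as in the talk."""
    mu_X = {}
    for x in out_alphabet:
        best = 0.0
        for ys in itertools.product(*alphabets):
            term = g(x, *ys)
            for mu, yi in zip(in_messages, ys):
                term *= mu[yi]
            best = max(best, term)
        mu_X[x] = best
    return mu_X
```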
Max-product algorithm applied to HMM:

MAP Estimate of the State Trajectory

The estimate
$$(x_0, \dots, x_n)_{\mathrm{MAP}} = \operatorname*{argmax}_{x_0,\dots,x_n} p(x_0, \dots, x_n | y_1, \dots, y_n) = \operatorname*{argmax}_{x_0,\dots,x_n} p(x_0, \dots, x_n, y_1, \dots, y_n)$$
may be obtained by computing
$$p_k(x_k) \triangleq \max_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$
for all $k$ by forward-backward max-product sweeps.

In this example, the max-product algorithm is a time-symmetric version of the Viterbi algorithm with soft output.
Max-product algorithm applied to HMM:

MAP Estimate of the State Trajectory cont'd

Computing
$$p_k(x_k) \triangleq \max_{\substack{x_0,\dots,x_n \\ \text{except } x_k}} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\,\overleftarrow{\mu}_{X_k}(x_k)$$
simultaneously for all $k$:

[Figure: max-product messages in both directions on every edge of the chain.]
Marginals and Output Edges

Marginals such as $\overrightarrow{\mu}_X(x)\,\overleftarrow{\mu}_X(x)$ may be viewed as messages out of an "output half-edge" (without incoming message):

[Figure: an edge $X$ redrawn as an "=" node with edges $X'$, $X''$ and an output half-edge $X$.]

$$\overrightarrow{\mu}_X(x) = \int_{x'} \int_{x''} \overrightarrow{\mu}_{X'}(x')\,\overleftarrow{\mu}_{X''}(x'')\,\delta(x - x')\,\delta(x - x'')\,dx'\,dx'' = \overrightarrow{\mu}_{X'}(x)\,\overleftarrow{\mu}_{X''}(x)$$

⇒ Marginals are computed like messages out of "="-nodes.
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
What About Factor Graphs with Cycles?

[Figure: a factor graph with cycles, with messages on every edge.]
• Generally iterative algorithms.

• For example, alternating maximization,
$$x_{\mathrm{new}} = \operatorname*{argmax}_x f(x, y) \quad \text{and} \quad y_{\mathrm{new}} = \operatorname*{argmax}_y f(x, y),$$
using the max-product algorithm in each iteration (see the sketch after this list).

• Iterative sum-product message passing gives excellent results for maximization(!) in some applications (e.g., the decoding of error correcting codes).

• Many other useful algorithms can be formulated in message passing form (e.g., gradient ascent, Gibbs sampling, expectation maximization, variational methods, ...).

• Rich and vast research area...
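A minimal sketch of the alternating-maximization idea on a toy table; here each argmax is brute force, whereas in the talk's setting each argmax would itself be a max-product sweep (all values illustrative):

```python
# Toy nonnegative function of two finite-alphabet variables.
TABLE = [[1.0, 2.0, 0.5],
         [0.2, 3.0, 2.5]]
f = lambda x, y: TABLE[x][y]

X, Y = range(2), range(3)
x, y = 0, 0
for _ in range(10):
    x = max(X, key=lambda xv: f(xv, y))   # x_new = argmax_x f(x, y)
    y = max(Y, key=lambda yv: f(x, yv))   # y_new = argmax_y f(x, y)
print(x, y, f(x, y))   # reaches a (possibly only local) maximum; here (1, 1)
```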
Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes
Factor Graph of an Error Correcting Code

≈ Tanner graph of the code (Tanner 1981)

A factor graph of a code $C \subset F^n$ represents (a factorization of) the membership indicator function of the code:
$$I_C : F^n \to \{0, 1\} : x \mapsto \begin{cases} 1, & \text{if } x \in C \\ 0, & \text{else.} \end{cases}$$
Factor Graph from Parity Check Matrix

Example: (7, 4, 3) binary Hamming code. ($F \triangleq \mathrm{GF}(2)$.)
$$C = \{x \in F^n : Hx^{\mathsf{T}} = 0\}$$
with
$$H = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{pmatrix}$$

The membership indicator function
$$I_C : F^n \to \{0, 1\} : x \mapsto \begin{cases} 1, & \text{if } x \in C \\ 0, & \text{else} \end{cases}$$
of this code may be written as
$$I_C(x_1, \dots, x_n) = \delta(x_1 \oplus x_2 \oplus x_3 \oplus x_5) \cdot \delta(x_2 \oplus x_3 \oplus x_4 \oplus x_6) \cdot \delta(x_3 \oplus x_4 \oplus x_5 \oplus x_7),$$
where $\oplus$ denotes addition modulo 2. Each factor corresponds to one row of the parity check matrix.
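A minimal sketch of this indicator function (the matrix H is from the slide; the function name and test vectors are illustrative):

```python
import numpy as np

H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 0, 1]])

def I_C(x):
    """1 if H x^T = 0 (mod 2), else 0: the product of the three parity
    check factors above."""
    return int(not (H @ x % 2).any())

print(I_C(np.zeros(7, dtype=int)))            # 1: the all-zero word is a codeword
print(I_C(np.array([1, 0, 0, 0, 0, 0, 0])))   # 0: a weight-1 word is not
```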
Factor Graph from Parity Check Matrix (cont'd)

Example: (7, 4, 3) binary Hamming code, $C = \{x \in F^n : Hx^{\mathsf{T}} = 0\}$ with $H$ as above.

[Figure: Forney-style factor graph of the code: equality nodes on $X_1, \dots, X_4$ (variables appearing in several checks), three ⊕ (parity check) nodes, one per row of $H$, and half-edges $X_5$, $X_6$, $X_7$.]
Factor Graph for Joint Code / Channel Model

[Figure: a "code" box with edges $X_1, X_2, \dots, X_n$ connected to a "channel model" box with output edges $Y_1, Y_2, \dots, Y_n$.]

Example: memoryless channel:

[Figure: each $X_k$ connected to its own channel factor with output $Y_k$.]
Factor Graph from Generator Matrix

Example: the (7, 4, 3) binary Hamming code is the image of
$$F^k \to F^n : u \mapsto uG$$
with
$$G = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$

[Figure: factor graph with equality nodes on the information bits $U_1, \dots, U_4$ and ⊕ nodes producing the code symbols $X_1, \dots, X_7$, one per column of $G$.]
Factor Graph of Dual Code

is obtained by interchanging parity check nodes and equality constraint nodes (Kschischang, Forney).

Works only for Forney-style factor graphs where all code symbols are external (half-edge) variables.

Example: dual of the (7, 4, 3) binary Hamming code

[Figure: the factor graph of the Hamming code from above with "=" nodes and ⊕ nodes interchanged.]
Factor Graph Corresponding to Trellis

Example: (7, 4, 3) binary Hamming code

[Figure: a trellis of the code with binary branch labels, and the corresponding cycle-free factor graph with code symbols $X_1, X_4, X_3, X_2, X_5, X_6, X_7$ along the trellis sections and state variables $S_1, S_2, \dots$ between them.]
Factor Graph of Low-Density Parity-Check Codes

[Figure: equality nodes on $X_1, X_2, \dots, X_n$ connected through "random" connections to a layer of ⊕ (parity check) nodes.]

Standard decoder: iterative sum-product message passing. Convergence is not guaranteed!

Much recent / ongoing research on improved decoding: Yedidia, Freeman, Weiss; Feldman; Wainwright; Chertkov...
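A minimal sketch of such a decoder in the L-domain (log-likelihood ratios), using the "=" and tanh rules tabulated on the following slides; the small H and the channel values are toys, not a real low-density code, and all names are illustrative:

```python
import math

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0, 1]]
L_ch = [-0.9, -2.1, 1.3, -0.3, -1.1, 1.7]   # channel LLRs, L = log p(y|0)/p(y|1)
m, n = len(H), len(L_ch)
rows = [[i for i in range(n) if H[j][i]] for j in range(m)]   # vars per check
cols = [[j for j in range(m) if H[j][i]] for i in range(n)]   # checks per var

V = {(j, i): L_ch[i] for j in range(m) for i in rows[j]}      # var-to-check msgs
for _ in range(20):
    # Check-to-variable messages: tanh(L_Z/2) = product of the other tanh(L/2).
    C = {}
    for j in range(m):
        for i in rows[j]:
            t = math.prod(math.tanh(V[j, k] / 2) for k in rows[j] if k != i)
            C[j, i] = 2.0 * math.atanh(max(min(t, 0.999999), -0.999999))
    # Variable-to-check messages: "=" node rule, LLRs simply add.
    for i in range(n):
        for j in cols[i]:
            V[j, i] = L_ch[i] + sum(C[k, i] for k in cols[i] if k != j)

# Decisions from the total LLR; bit = 1 iff L < 0 (since L = log mu(0)/mu(1)).
L_tot = [L_ch[i] + sum(C[j, i] for j in cols[i]) for i in range(n)]
x_hat = [int(L < 0) for L in L_tot]
print(x_hat, all(sum(x_hat[i] for i in rows[j]) % 2 == 0 for j in range(m)))
```

On a cycle-free graph these sweeps would be exact; on an LDPC Tanner graph this is the standard iterative decoder whose convergence, as noted above, is not guaranteed.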
Single-Number Parameterizations of Soft-Bit Messages

Difference: $\Delta \triangleq \dfrac{\mu(0) - \mu(1)}{\mu(0) + \mu(1)}$ = mean of $\{+1, -1\}$-representation

Ratio: $\Lambda \triangleq \mu(0)/\mu(1)$

Logarithm of ratio: $L \triangleq \log\bigl(\mu(0)/\mu(1)\bigr)$
Conversions among Parameterizations

Each row gives one quantity expressed in terms of the parameterization named at the head of each column:

$$\begin{array}{c|cccc}
 & (\mu(0), \mu(1)) & \Delta = m & \Lambda & L \\
\hline
(\mu(0), \mu(1)) & & \left(\dfrac{1+\Delta}{2},\ \dfrac{1-\Delta}{2}\right) & \left(\dfrac{\Lambda}{\Lambda+1},\ \dfrac{1}{\Lambda+1}\right) & \left(\dfrac{e^L}{e^L+1},\ \dfrac{1}{e^L+1}\right) \\
\Delta = m & \dfrac{\mu(0)-\mu(1)}{\mu(0)+\mu(1)} & & \dfrac{\Lambda-1}{\Lambda+1} & \tanh(L/2) \\
\Lambda & \dfrac{\mu(0)}{\mu(1)} & \dfrac{1+\Delta}{1-\Delta} & & e^L \\
L & \ln\dfrac{\mu(0)}{\mu(1)} & 2\tanh^{-1}(\Delta) & \ln\Lambda & \\
\sigma^2 & \dfrac{4\mu(0)\mu(1)}{(\mu(0)+\mu(1))^2} & 1-m^2 & \dfrac{4\Lambda}{(\Lambda+1)^2} & \dfrac{4}{e^L+e^{-L}+2}
\end{array}$$
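A minimal sketch checking a few of these conversions numerically (the function name is illustrative):

```python
import math

def from_L(L):
    """Return (mu(0), mu(1)), Delta, and Lambda for a given log-ratio L."""
    mu0, mu1 = math.exp(L) / (math.exp(L) + 1), 1 / (math.exp(L) + 1)
    return (mu0, mu1), math.tanh(L / 2), math.exp(L)

(mu0, mu1), Delta, Lam = from_L(1.5)
assert abs(Delta - (mu0 - mu1) / (mu0 + mu1)) < 1e-12
assert abs(Lam - mu0 / mu1) < 1e-12
assert abs((1 - Delta**2) - 4 * Lam / (Lam + 1) ** 2) < 1e-12   # sigma^2 row
```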
Sum-Product Rules for Binary Parity Check Codes

Equality constraint node, factor $\delta[x - y]\,\delta[x - z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) \\ \mu_X(1)\,\mu_Y(1) \end{pmatrix}$$
$$\Delta_Z = \frac{\Delta_X + \Delta_Y}{1 + \Delta_X \Delta_Y}, \qquad \Lambda_Z = \Lambda_X \cdot \Lambda_Y, \qquad L_Z = L_X + L_Y$$

Parity check node, factor $\delta[x \oplus y \oplus z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) + \mu_X(1)\,\mu_Y(1) \\ \mu_X(0)\,\mu_Y(1) + \mu_X(1)\,\mu_Y(0) \end{pmatrix}$$
$$\Delta_Z = \Delta_X \cdot \Delta_Y, \qquad \Lambda_Z = \frac{1 + \Lambda_X \Lambda_Y}{\Lambda_X + \Lambda_Y}, \qquad \tanh(L_Z/2) = \tanh(L_X/2) \cdot \tanh(L_Y/2)$$
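A minimal sketch of the two rules in the L-domain, cross-checked against the probability-domain form above (names illustrative):

```python
import math

def equality_node_L(LX, LY):     # "=" node: L_Z = L_X + L_Y
    return LX + LY

def parity_node_L(LX, LY):       # XOR node: tanh rule
    return 2.0 * math.atanh(math.tanh(LX / 2) * math.tanh(LY / 2))

# Cross-check the XOR rule against the probability-domain table above:
to_mu = lambda L: (math.exp(L) / (1 + math.exp(L)), 1 / (1 + math.exp(L)))
LX, LY = 0.8, -1.3
(mX0, mX1), (mY0, mY1) = to_mu(LX), to_mu(LY)
muZ0 = mX0 * mY0 + mX1 * mY1
muZ1 = mX0 * mY1 + mX1 * mY0
assert abs(parity_node_L(LX, LY) - math.log(muZ0 / muZ1)) < 1e-12
```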
Max-Product Rules for Binary Parity Check Codes

Equality constraint node, factor $\delta[x - y]\,\delta[x - z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \mu_X(0)\,\mu_Y(0) \\ \mu_X(1)\,\mu_Y(1) \end{pmatrix}, \qquad L_Z = L_X + L_Y$$

Parity check node, factor $\delta[x \oplus y \oplus z]$:
$$\begin{pmatrix} \mu_Z(0) \\ \mu_Z(1) \end{pmatrix} = \begin{pmatrix} \max\bigl\{\mu_X(0)\,\mu_Y(0),\ \mu_X(1)\,\mu_Y(1)\bigr\} \\ \max\bigl\{\mu_X(0)\,\mu_Y(1),\ \mu_X(1)\,\mu_Y(0)\bigr\} \end{pmatrix}$$
$$|L_Z| = \min\bigl\{|L_X|, |L_Y|\bigr\}, \qquad \operatorname{sgn}(L_Z) = \operatorname{sgn}(L_X) \cdot \operatorname{sgn}(L_Y)$$
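The corresponding L-domain sketch, the basis of "min-sum" decoding (names illustrative):

```python
import math

def parity_node_minsum(LX, LY):
    """Max-product XOR rule: |L_Z| = min(|L_X|, |L_Y|), signs multiply.
    Assumes nonzero LX, LY; the "=" node rule is L_Z = L_X + L_Y,
    exactly as in the sum-product case."""
    return math.copysign(min(abs(LX), abs(LY)), LX * LY)

assert parity_node_minsum(0.8, -1.3) == -0.8
```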
Decomposition of Multi-Bit Checks

[Figure: a parity check on many bits decomposed into a chain of 3-bit ⊕ nodes; likewise, a multi-way equality constraint decomposed into a chain of 3-way "=" nodes.]
Encoder of a Turbo Code

[Figure: systematic bits $X_{1,k}$; a recursive convolutional encoder producing parity bits $X_{2,k}$; an interleaver followed by a second recursive convolutional encoder producing parity bits $X_{3,k}$.]
Factor Graph of a Turbo Code
X1,k−1
=
X1,k
=
X1,k+1
=
· · ·
X2,k−1 X2,k X2,k+1
· · ·
“random” connections
· · ·
X3,k−1 X3,k X3,k+1
· · ·
Turbo Decoder: Conventional View

[Figure: Decoder A and Decoder B exchange extrinsic information $L^{\mathrm{out}}_{A,k}$ and $L^{\mathrm{out}}_{B,k}$ through an interleaver and an inverse interleaver; the inputs are the channel values $L^{\mathrm{in}}_{1,k}$, $L^{\mathrm{in}}_{2,k}$, $L^{\mathrm{in}}_{3,k}$.]
Messages in a Turbo Decoder
=
?Lin1,k 6LoutA,k
+LoutB,k=
6LoutA,k ?
?6LoutB,k
=
· · ·6Lin2,k
· · ·
“random” connections
· · ·6Lin3,k
· · ·