Download - CS252 Graduate Computer Architecture Lecture 17 ECC (continued), CRC

CS252Graduate Computer Architecture

Lecture 17

ECC (continued), CRC

John Kubiatowicz

Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252

4/1/2009 cs252-S09, Lecture 17 2

Code Space

v0

C0=f(v0)

Code Distance(Hamming Distance)

Review: Code Vector Space

• Not every vector in the code space is valid• Hamming Distance (d):

– Minimum number of bit flips to turn one code word into another

• Number of errors that we can detect: (d-1)• Number of errors that we can fix: ½(d-1)

4/1/2009 cs252-S09, Lecture 17 3

Review: How to Generate code words?• Consider a linear code. Need a Generator Matrix.

– Let vi be the data value (k bits), Ci be resulting code (n bits):

• Are there 2k unique code values?– Only if the k columns of G are linearly independent!

• Of course, need some way of decoding as well.

– Is this linear??? Why or why not?

• A code is systematic if the data is directly encoded within the code words.

– Means Generator has form:– Can always turn non-systematic

code into a systematic one (row ops)

'idi Cfv

ii vC G

P

IG

G must be an nk matrix

4/1/2009 cs252-S09, Lecture 17 4

Implicitly Defining Codes by Check Matrix• But – what is the distance of the code? Not obvious• Instead, consider a parity-check matrix H (n[n-k])

– Compute the following syndrome Si given code element Ci:

– Define valid code words Ci as those that give Si=0 (null space of H)– Size of null space? (n-rank H)=k if (n-k) linearly independent columns in H

• Suppose you transmit code word C, and there is an error. Model this as vector E which flips selected bits of C to get R (received):

• Consider what happens when we multiply by H:

• What is distance of code?– Code has distance d if no sum of d-1 or less columns yields 0– I.e. No error vectors, E, of weight < d have zero syndromes– Code design: Design H matrix with these properties

ii CS H

ECR

EECRS HHH )(

Error vector

4/1/2009 cs252-S09, Lecture 17 5

How to relate G and H (Binary Codes)• Defining H makes it easy to understand distance of

code, but hard to generate code (H defines code implicitly!)

• However, let H be of following form:

• Then, G can be of following form (maximal code size):

• Notice: G generates values in null-space of H

IPH | P is (n-k)k, I is (n-k)(n-k)Result: H is (n-k)k

P

IG P is (n-k)k, I is kk

Result: G is nk

0|

iii vvS

P

IIPGH

4/1/2009 cs252-S09, Lecture 17 6

Simple example (Parity, D=2)• Parity code (8-bits):

• Note: Complexity of logic depends on number of 1s in row!

111111111H

11111111

10000000

01000000

00100000

00010000

00001000

00000100

00000010

00000001

G

v7

v6

v5

v4

v3

v2

v1

v0

+ c8

+ s0

C8

C7

C6

C5

C4

C3

C2

C1

C0

4/1/2009 cs252-S09, Lecture 17 7

Simple example: Repetition (voting, D=3)• Repetition code (1-bit):

• Positives: simple

• Negatives: – Expensive: only 33% of code word is data

– Not packed in Hamming-bound sense (only D=3). Could get much more efficient coding by encoding multiple bits at a time

101

011H

1

1

1

G

C0

C1

C2

Error

v0

C0

C1

C2

4/1/2009 cs252-S09, Lecture 17 8

p1

p2

p3

Note: number bits from left to right.

• Example: (7,4) code: – Protect 4 data bits with 3 parity bits

1 2 3 4 5 6 7

p1 p2 d1 p3 d2 d3 d4

• Bit position number

001 = 110

011 = 310

101 = 510

111 = 710

010 = 210

011 = 310

110 = 610

111 = 710

100 = 410

101 = 510

110 = 610

111 = 710

Simple Example: Hamming Code (d=3)

1111000

1100110

1010101

H

1000

0100

0010

1110

0001

1101

1011

G

4/1/2009 cs252-S09, Lecture 17 9

How to correct errors?• But – what is the distance of the code? Not obvious

• Instead, consider a parity-check matrix H (n[n-k])

– Compute the following syndrome Si given code element Ci:

• Suppose that two correctable error vectors E1 and E2 produce same syndrome:

• But, since both E1 and E2 have (d-1)/2 bits, E1 + E2 d-1 bits set this cannot be true!

• So, syndrome is unique indicator of correctable error vectors

ECS ii HH

set bits moreor d has

0

21

2121

EE

EEEE

HHH

4/1/2009 cs252-S09, Lecture 17 10

Example, d=4 code (SEC-DED)• Design H with:

– All columns non-zero, odd-weight, distinct» Note that odd-weight refers to Hamming Weight, i.e. number of zeros

• Why does this generate d=4?– Any single bit error will generate a distinct, non-zero value– Any double error will generate a distinct, non-zero value

» Why? Add together two distinct columns, get distinct result– Any triple error will generate a non-zero value

» Why? Add together three odd-weight values, get an odd-weight value– So: need four errors before indistinguishable from code word

• Because d=4:– Can correct 1 error (Single Error Correction, i.e. SEC)– Can detect 2 errors (Double Error Detection, i.e. DED)

• Example:– Note: log size of nullspace will

be (columns – rank) = 4, so:» Rank = 4, since rows

independent, 4 cols indpt» Clearly, 8 bits in code word» Thus: (8,4) code

7

6

5

4

3

2

1

0

3

2

1

0

10001110

01001101

00101011

00010111

C

C

C

C

C

C

C

C

S

S

S

S

4/1/2009 cs252-S09, Lecture 17 11

Tweeks:• No reason cannot make code shorter than required

• Suppose n-k=8 bits of parity. What is max code size (n) for d=4?

– Maximum number of unique, odd-weight columns: 27 = 128

– So, n = 128. But, then k = n – (n – k) = 120. Weird!

– Just throw out columns of high weight and make 72, 64 code!

• But – shortened codes like this might have d > 4 in some special directions

– Example: Kaneda paper, catches failures of groups of 4 bits

– Good for catching chip failures when DRAM has groups of 4 bits

• What about EVENODD code?– Can be used to handle two erasures

– What about two dead DRAMs? Yes, if you can really know they are dead

4/1/2009 cs252-S09, Lecture 17 12

4/1/2009 cs252-S09, Lecture 17 13

Galois Field• Definition: Field: a complete group of elements with:

– Addition, subtraction, multiplication, division– Completely closed under these operations– Every element has an additive inverse– Every element except zero has a multiplicative inverse

• Examples:– Real numbers– Binary, called GF(2) Galois Field with base 2

» Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1– Prime field, GF(p) Galois Field with base p

» Values 0 … p-1» Addition/subtraction/multiplication: modulo p» Multiplicative Inverse: every value except 0 has inverse» Example: GF(5): 11 1 mod 5, 23 1mod 5, 44 1 mod 5

– General Galois Field: GF(pm) base p (prime!), dimension m» Values are vectors of elements of GF(p) of dimension m» Add/subtract: vector addition/subtraction» Multiply/divide: more complex» Just like read numbers but finite!» Common for computer algorithms: GF(2m)

4/1/2009 cs252-S09, Lecture 17 14

Specific Example: Galois Fields GF(2n)• Consider polynomials whose coefficients come from GF(2).

• Each term of the form xn is either present or absent.

• Examples: 0, 1, x, x2, and x7 + x6 + 1

= 1·x7 + 1· x6 + 0 · x5 + 0 · x4 + 0 · x3 + 0 · x2 + 0 · x1 + 1· x0

• With addition and multiplication these form a field:

• “Add”: XOR each element individually with no carry:x4 + x3 + + x + 1

+ x4 + + x2 + x

x3 + x2 + 1

• “Multiply”: multiplying by x is like shifting to the left.

x2 + x + 1 x + 1

x2 + x + 1 x3 + x2 + x x3 + 1

4/1/2009 cs252-S09, Lecture 17 15

So what about division (mod)x4 + x2

x= x3 + x with remainder 0

x4 + x2 + 1 X + 1

= x3 + x2 with remainder 1

x4 + 0x3 + x2 + 0x + 1 X + 1

x3

x4 + x3

x3 + x2

+ x2

x3 + x2

0x2 + 0x

+ 0x

0x + 1

+ 0

Remainder 1

4/1/2009 cs252-S09, Lecture 17 16

Producing Galois Fields• These polynomials form a

Galois (finite) field if we take the results of this multiplication modulo a prime polynomial p(x).

– A prime polynomial is one that cannot be written as the product of two non-trivial polynomials q(x)r(x)

– Perform modulo operation by subtracting a (polynomial) multiple of p(x) from the result. If the multiple is 1, this corresponds to XOR-ing the result with p(x).

• For any degree, there exists at least one prime polynomial.

• With it we can form GF(2n)

• Additionally, …

• Every Galois field has a primitive element, , such that all non-zero elements of the field can be expressed as a power of . By raising to powers (modulo p(x)), all non-zero field elements can be formed.

• Certain choices of p(x) make the simple polynomial x the primitive element. These polynomials are called primitive, and one exists for every degree.

• For example, x4 + x + 1 is primitive. So = x is a primitive element and successive powers of will generate all non-zero elements of GF(16). Example on next slide.

4/1/2009 cs252-S09, Lecture 17 17

Galois Fields – Primitives0 = 1

1 = x

2 = x2

3 = x3

4 = x + 1

5 = x2 + x

6 = x3 + x2

7 = x3 + x + 1

8 = x2 + 1

9 = x3 + x

10 = x2 + x + 1

11 = x3 + x2 + x

12 = x3 + x2 + x + 1

13 = x3 + x2 + 1

14 = x3 + 1

15 = 1

• Note this pattern of coefficients matches the bits from our 4-bit LFSR example.

• In general finding primitive polynomials is difficult. Most people just look them up in a table, such as:

4 = x4 mod x4 + x + 1 = x4 xor x4 + x + 1 = x + 1

4/1/2009 cs252-S09, Lecture 17 18

Primitive Polynomialsx2 + x +1x3 + x +1x4 + x +1x5 + x2 +1x6 + x +1x7 + x3 +1x8 + x4 + x3 + x2 +1x9 + x4 +1x10 + x3 +1x11 + x2 +1

x12 + x6 + x4 + x +1x13 + x4 + x3 + x +1x14 + x10 + x6 + x +1x15 + x +1x16 + x12 + x3 + x +1x17 + x3 + 1x18 + x7 + 1x19 + x5 + x2 + x+ 1x20 + x3 + 1x21 + x2 + 1

x22 + x +1x23 + x5 +1x24 + x7 + x2 + x +1x25 + x3 +1x26 + x6 + x2 + x +1x27 + x5 + x2 + x +1x28 + x3 + 1x29 + x +1x30 + x6 + x4 + x +1x31 + x3 + 1x32 + x7 + x6 + x2 +1

Galois Field Hardware

Multiplication by x shift leftTaking the result mod p(x) XOR-ing with the coefficients of p(x)

when the most significant coefficient is 1.Obtaining all 2n-1 non-zero Shifting and XOR-ing 2n-1 times.elements by evaluating xk

for k = 1, …, 2n-1

4/1/2009 cs252-S09, Lecture 17 19

Building an LFSR from a Primitive Poly(Cycle through all non-zero values)

• For k-bit LFSR number the flip-flops with FF1 on the right.

• The feedback path comes from the Q output of the leftmost FF.

• Find the primitive polynomial of the form xk + … + 1.

• The x0 = 1 term corresponds to connecting the feedback directly to the D input of FF 1.

• Each term of the form xn corresponds to connecting an xor between FF n and n+1.

• 4-bit example, uses x4 + x + 1– x4 FF4’s Q output

– x xor between FF1 and FF2

– 1 FF1’s D input

• To build an 8-bit LFSR, use the primitive polynomial x8 + x4 + x3 + x2 + 1 and connect xors between FF2 and FF3, FF3 and FF4, and FF4 and FF5.

Q DQ1Q DQ2Q DQ3Q DQ4

CLK


CLK

Q DQ3 Q DQ2 Q DQ1Q8Q D

4/1/2009 cs252-S09, Lecture 17 20

Reed-Solomon Codes• Galois field codes: code words consist of symbols

– Rather than bits

• Reed-Solomon codes:– Based on polynomials in GF(2k) (I.e. k-bit symbols)– Data as coefficients, code space as values of polynomial:– P(x)=a0+a1x1+… ak-1xk-1

– Coded: P(0),P(1),P(2)….,P(n-1)– Can recover polynomial as long as get any k of n

• Properties: can choose number of check symbols– Reed-Solomon codes are “maximum distance separable” (MDS)– Can add d symbols for distance d+1 code– Often used in “erasure code” mode: as long as no more than n-k

coded symbols erased, can recover data

• Side note: Multiplication by constant in GF(2k) can be represented by kk matrix: ax

– Decompose unknown vector into k bits: x=x0+2x1+…+2k-1xk-1

– Each column is result of multiplying a by 2i

4/1/2009 cs252-S09, Lecture 17 21

Reed-Solomon Codes (con’t)

4

3

2

1

0

43210

43210

43210

43210

43210

43210

43210

77777

66666

55555

44444

33333

22222

11111

a

a

a

a

a

G

• Reed-solomon codes (Non-systematic):

– Data as coefficients, code space as values of polynomial:

– P(x)=a0+a1x1+… a6x6

– Coded: P(0),P(1),P(2)….,P(6)

• Called Vandermonde Matrix: maximum rank

• Different representation(This H’ and G not related)

– Clear that all combinations oftwo or less columns independent d=3

– Very easy to pick whatever d you happen to want: add more rows

• Fast, Systematic version of Reed-Solomon:

– Cauchy Reed-Solomon, others

1111111

0000000'

7654321

7654321H

4/1/2009 cs252-S09, Lecture 17 22

Another Example: Redundant Check• Send a message M and a “check” word C

• Simple function on <M,C> to determine if both received correctly (with high probability)

• Example: XOR all the bytes in M and append the “checksum” byte, C, at the end

– Receiver XORs <M,C>

– What should result be?

– What errors are caught?

***

bit i is XOR of ith bit of each byte

4/1/2009 cs252-S09, Lecture 17 23

Example: TCP Checksum

Application

(HTTP,FTP, DNS)

Transport

(TCP, UDP)

Network

(IP)

Data Link

(Ethernet, 802.11b)

Physical1

2

3

4

7

TCP Packet Format

• TCP Checksum a 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by a sender, and included in a segment transmission. (note end-around carry)

• Summing all the words, including the checksum word, should yield zero

http://en.wikipedia.org/wiki/Checksum

http://en.wikipedia.org/wiki/One%27s_complement

http://en.wikipedia.org/wiki/Signed_number_representations#Ones.27_complement

4/1/2009 cs252-S09, Lecture 17 24

Example: Ethernet CRC-32

Application

(HTTP,FTP, DNS)

Transport

(TCP, UDP)

Network

(IP)

Data Link

(Ethernet, 802.11b)

Physical1

2

3

4

7

4/1/2009 cs252-S09, Lecture 17 25

CRC concept • I have a msg polynomial M(x) of degree m• We both have a generator poly G(x) of degree n• Let r(x) = remainder of M(x) xn / G(x)

– M(x) xn = G(x)p(x) + r(x)– r(x) is of degree n

• What is (M(x) xn – r(x)) / G(x) ?

• So I send you M(x) xn – r(x) – m+n degree polynomial– You divide by G(x) to check– M(x) is just the m most signficant coefficients, r(x) the lower n

• x-bit Message is viewed as coefficients of x-degree polynomial over binary numbers

n bits of zero at the end

tack on n bits of remainder

Instead of the zeros

4/1/2009 cs252-S09, Lecture 17 26

Polynomial division

• When MSB is zero, just shift left, bringing in next bit

• When MSB is 1, XOR with divisor and shift eft

1 0 1 1 0 0 1 0 0 0 01 0 0 1 1


CLK

serial_in

0 0 0 0

1 0 0 1 1

0 0 1 0 1

1

0 1 0 1 0

0

1 0 1 0 1

1 0 0 1 1

1

0 0 1 0 0

4/1/2009 cs252-S09, Lecture 17 27

CRC encoding


CLK

serial_in1 0 1 1 0 0 1 0 0 0 0

0 0 0 0

0 0 0 1 0 1 1 0 0 1 0 0 0 00 0 1 0 1 1 0 0 1 0 0 0 00 1 0 1 1 0 0 1 0 0 0 01 0 1 1 0 0 1 0 0 0 00 1 0 1 0 1 0 0 0 0

1 0 1 0 1 0 0 0 0

0 1 1 0 0 0 0 0

1 0 1 1 0 0 1 1 0 1 0

Message sent:

1 1 0 0 0 0 0

1 0 1 1 0 0

0 1 0 1 0

1 0 1 0

4/1/2009 cs252-S09, Lecture 17 28

CRC decoding


CLK

serial_in1 0 1 1 0 0 1 1 0 1 0

0 0 0 0

0 0 0 1 0 1 1 0 0 1 1 0 1 00 0 1 0 1 1 0 0 1 1 0 1 00 1 0 1 1 0 0 1 1 0 1 01 0 1 1 0 0 1 1 0 1 00 1 0 1 0 1 1 0 1 0

1 0 1 0 1 1 0 1 0

0 1 1 0 1 0 1 0

1 1 0 1 0 1 0 1 0 0 1 1 0

0 0 0 0 0

0 0 0 0

4/1/2009 cs252-S09, Lecture 17 29

Generating Polynomials• CRC-16: G(x) = x16 + x15 + x2 + 1

– detects single and double bit errors

– All errors with an odd number of bits

– Burst errors of length 16 or less

– Most errors for longer bursts

• CRC-32: G(x) = x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1

– Used in ethernet

– Also 32 bits of 1 added on front of the message » Initialize the LFSR to all 1s

4/1/2009 cs252-S09, Lecture 17 30

Conclusion• ECC: add redundancy to correct for errors

– (n,k,d) n code bits, k data bits, distance d– Linear codes: code vectors computed by linear transformation

• Erasure code: after identifying “erasures”, can correct• Reed-Solomon codes

– Based on GF(pn), often GF(2n)– Easy to get distance d+1 code with d extra symbols– Often used in erasure mode

• Redundancy useful to gain reliability– Redundant disks+controllers+etc (RAID)– Geographical scale systems (OceanStore)

• Disk technology:– Two innovations: GMR, Vertical recording– Disk Latency =

Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time