CS252 Graduate Computer Architecture
Lecture 17
ECC (continued), CRC
John Kubiatowicz
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs252
4/1/2009 cs252-S09, Lecture 17 2
Review: Code Vector Space
[Figure: code space; data value v0 maps to code word C0 = f(v0); code distance (Hamming distance) shown between code words]
• Not every vector in the code space is valid
• Hamming Distance (d):
– Minimum number of bit flips to turn one code word into another
• Number of errors that we can detect: (d-1)
• Number of errors that we can fix: ½(d-1), rounded down
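As a small illustration of these definitions, here is a minimal sketch that computes the distance of a code and the resulting detection and correction limits. The function names and the toy two-word code are assumptions made for this example only.

```python
from itertools import combinations

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def code_distance(codewords):
    """Minimum pairwise Hamming distance d of a set of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

# Toy code (assumed example): 3-bit repetition code, valid words 000 and 111.
code = [0b000, 0b111]
d = code_distance(code)
detectable = d - 1           # number of errors we can detect
correctable = (d - 1) // 2   # number of errors we can fix
print(d, detectable, correctable)
```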

Review: How to Generate code words?
• Consider a linear code. Need a Generator Matrix.
– Let $v_i$ be the data value (k bits), $C_i$ be resulting code (n bits): $C_i = G \cdot v_i$ (G must be an n×k matrix)
• Are there $2^k$ unique code values?
– Only if the k columns of G are linearly independent!
• Of course, need some way of decoding as well: $v_i = f_d(C_i')$
– Is this linear??? Why or why not?
• A code is systematic if the data is directly encoded within the code words.
– Means Generator has form: $G = \begin{bmatrix} I \\ P \end{bmatrix}$
– Can always turn non-systematic code into a systematic one (row ops)

Implicitly Defining Codes by Check Matrix
• But – what is the distance of the code? Not obvious
• Instead, consider a parity-check matrix H ([n-k]×n)
– Compute the following syndrome $S_i$ given code element $C_i$: $S_i = H \cdot C_i$
– Define valid code words $C_i$ as those that give $S_i = 0$ (null space of H)
– Size of null space? (n − rank H) = k if H has (n-k) linearly independent columns
• Suppose you transmit code word C, and there is an error. Model this as an error vector E which flips selected bits of C to get R (received): $R = C + E$
• Consider what happens when we multiply by H: $S = H \cdot R = H \cdot (C + E) = H \cdot E$
• What is distance of code?
– Code has distance d if no sum of d-1 or fewer columns yields 0
– I.e. no non-zero error vectors, E, of weight < d have zero syndromes
– Code design: Design H matrix with these properties
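The two observations above (the syndrome depends only on the error vector, and the distance is the smallest number of columns of H that sum to zero) can be checked by brute force. This is a sketch only: the helper names are assumptions, and H and the code word C are borrowed from the (7,4) Hamming example later in the lecture.

```python
from itertools import combinations

# H for the (7,4) Hamming code, rows as printed in the lecture example.
H = [[1, 1, 1, 1, 0, 0, 0],
     [1, 1, 0, 0, 1, 1, 0],
     [1, 0, 1, 0, 1, 0, 1]]

def syndrome(H, word):
    """S = H * word over GF(2)."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def min_distance(H):
    """Smallest number of columns of H that sum to zero (brute force)."""
    columns = list(zip(*H))
    for weight in range(1, len(columns) + 1):
        for subset in combinations(columns, weight):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return weight
    return None

C = [1, 0, 0, 1, 0, 1, 1]          # a valid code word (assumed example)
E = [0, 0, 1, 0, 0, 0, 0]          # error vector: flip the third bit
R = [c ^ e for c, e in zip(C, E)]  # received word R = C + E

print(syndrome(H, C), syndrome(H, R), syndrome(H, E), min_distance(H))
```

The brute-force search is exponential in n, so it is only meant for small teaching examples.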

How to relate G and H (Binary Codes)
• Defining H makes it easy to understand distance of code, but hard to generate code (H defines code implicitly!)
• However, let H be of following form: $H = [\,P \mid I\,]$
– P is (n-k)×k, I is (n-k)×(n-k). Result: H is (n-k)×n
• Then, G can be of following form (maximal code size): $G = \begin{bmatrix} I \\ P \end{bmatrix}$
– P is (n-k)×k, I is k×k. Result: G is n×k
• Notice: G generates values in null-space of H: $S_i = H \cdot G \cdot v_i = [\,P \mid I\,] \begin{bmatrix} I \\ P \end{bmatrix} v_i = (P + P) \cdot v_i = 0$
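A minimal sketch of this construction, assuming an arbitrary 3×4 matrix P chosen for illustration (the helper names are also assumptions): build G = [I; P] and H = [P | I], then confirm that H·G = 0 over GF(2) and that encoding is systematic.

```python
def build_G_H(P):
    """P is (n-k) x k over GF(2). Returns G = [I; P] (n x k) and H = [P | I] ((n-k) x n)."""
    r, k = len(P), len(P[0])
    I_k = [[int(i == j) for j in range(k)] for i in range(k)]
    I_r = [[int(i == j) for j in range(r)] for i in range(r)]
    G = I_k + [row[:] for row in P]          # identity stacked on top of P
    H = [P[i] + I_r[i] for i in range(r)]    # P next to identity
    return G, H

def matmul_gf2(A, B):
    """Matrix product over GF(2)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

P = [[1, 1, 0, 1],   # assumed example parity sub-matrix
     [1, 0, 1, 1],
     [0, 1, 1, 1]]
G, H = build_G_H(P)
HG = matmul_gf2(H, G)                        # should be the all-zero matrix

v = [1, 0, 1, 1]                             # data value
C = [row[0] for row in matmul_gf2(G, [[b] for b in v])]   # C = G * v
print(HG, C)
```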
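
Simple example (Parity, D=2)
• Parity code (8-bits):
$H = [\,1\;1\;1\;1\;1\;1\;1\;1\;1\,]$
$G = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \end{bmatrix}$
[Figure: data bits v7..v0 XORed together (+) to produce parity bit c8; code bits C8..C0 XORed together (+) to produce syndrome s0]
• Note: Complexity of logic depends on number of 1s in row!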
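A minimal sketch of the parity code, assuming the code word is ordered with the parity bit first (matching the all-ones first row of G); the function names are illustrative only.

```python
from functools import reduce
from operator import xor

def parity_encode(data_bits):
    """Return [c8] + data, where c8 is the XOR of the 8 data bits."""
    return [reduce(xor, data_bits)] + list(data_bits)

def parity_syndrome(code_bits):
    """s0 = XOR of all 9 code bits; 0 means no error detected."""
    return reduce(xor, code_bits)

data = [1, 0, 1, 1, 0, 0, 1, 0]
code = parity_encode(data)

one_error = code[:]
one_error[4] ^= 1            # single bit flip: detected
two_errors = one_error[:]
two_errors[7] ^= 1           # second flip: syndrome returns to 0 (D=2)

print(parity_syndrome(code), parity_syndrome(one_error), parity_syndrome(two_errors))
```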

Simple example: Repetition (voting, D=3)
• Repetition code (1-bit):
$H = \begin{bmatrix} 1&0&1 \\ 0&1&1 \end{bmatrix}, \quad G = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
[Figure: v0 drives C0, C1, C2; C0, C1, C2 feed an Error output]
• Positives: simple
• Negatives:
– Expensive: only 33% of code word is data
– Not packed in Hamming-bound sense (only D=3). Could get much more efficient coding by encoding multiple bits at a time
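A minimal sketch of the repetition code with majority-vote decoding. The function names and the assumption that the code word is ordered (C0, C1, C2), matching the columns of H, are illustrative only.

```python
def rep3_encode(bit):
    """G = [1 1 1]^T: repeat the data bit three times."""
    return [bit, bit, bit]

def rep3_decode(word):
    """Majority vote over the three received bits."""
    return int(sum(word) >= 2)

def rep3_syndrome(word):
    """Rows of H are [1 0 1] and [0 1 1]."""
    c0, c1, c2 = word
    return [c0 ^ c2, c1 ^ c2]

sent = rep3_encode(1)
received = [1, 0, 1]         # C1 flipped in transit
print(rep3_decode(received), rep3_syndrome(sent), rep3_syndrome(received))
```

The non-zero syndrome [0, 1] equals the second column of H, which points at the flipped bit C1.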

Simple Example: Hamming Code (d=3)
• Example: (7,4) code:
– Protect 4 data bits with 3 parity bits
• Note: number bits from left to right.
Bit position number:  1   2   3   4   5   6   7
Contents:             p1  p2  d1  p3  d2  d3  d4
• p1 covers positions 001₂ = 1₁₀, 011₂ = 3₁₀, 101₂ = 5₁₀, 111₂ = 7₁₀
• p2 covers positions 010₂ = 2₁₀, 011₂ = 3₁₀, 110₂ = 6₁₀, 111₂ = 7₁₀
• p3 covers positions 100₂ = 4₁₀, 101₂ = 5₁₀, 110₂ = 6₁₀, 111₂ = 7₁₀
$H = \begin{bmatrix} 1&1&1&1&0&0&0 \\ 1&1&0&0&1&1&0 \\ 1&0&1&0&1&0&1 \end{bmatrix}, \quad G = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 1&1&1&0 \\ 0&0&0&1 \\ 1&1&0&1 \\ 1&0&1&1 \end{bmatrix}$
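A minimal sketch of (7,4) Hamming encoding and single-error correction, using the left-to-right position numbering p1 p2 d1 p3 d2 d3 d4 from the table above. The function names are assumptions. The syndrome is computed as the XOR of the position numbers that hold a 1, which is zero for a valid code word and otherwise equals the position of a single flipped bit.

```python
def hamming74_encode(d):
    """d = [d1, d2, d3, d4]; returns bits at positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(word):
    """Return (corrected word, error position), position 0 meaning no error seen."""
    pos = 0
    for i, bit in enumerate(word, start=1):
        if bit:
            pos ^= i          # XOR of positions holding a 1 is the syndrome
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1   # flip the bit the syndrome points at
    return fixed, pos

def hamming74_data(word):
    """Extract d1..d4 from positions 3, 5, 6, 7."""
    return [word[2], word[4], word[5], word[6]]

code = hamming74_encode([1, 0, 1, 1])
damaged = code[:]
damaged[4] ^= 1               # flip position 5
fixed, pos = hamming74_correct(damaged)
print(code, pos, hamming74_data(fixed))
```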

How to correct errors?
• But – what is the distance of the code? Not obvious
• Instead, consider a parity-check matrix H ([n-k]×n)
– Compute the following syndrome $S_i$ given code element $C_i$: $S_i = H \cdot C_i = H \cdot E$
• Suppose that two distinct correctable error vectors $E_1$ and $E_2$ produce same syndrome:
$H \cdot E_1 = H \cdot E_2 \;\Rightarrow\; H \cdot (E_1 + E_2) = 0 \;\Rightarrow\; E_1 + E_2$ has d or more bits set
• But, since both $E_1$ and $E_2$ have ≤ (d-1)/2 bits set, $E_1 + E_2$ has ≤ d-1 bits set, so this cannot be true!
• So, syndrome is unique indicator of correctable error vectors

Example, d=4 code (SEC-DED)
• Design H with:
– All columns non-zero, odd-weight, distinct
» Note that odd-weight refers to Hamming Weight, i.e. number of ones
• Why does this generate d=4?
– Any single bit error will generate a distinct, non-zero value
– Any double error will generate a non-zero value
» Why? Add together two distinct odd-weight columns, get a non-zero, even-weight result
– Any triple error will generate a non-zero value
» Why? Add together three odd-weight values, get an odd-weight value
– So: need four errors before indistinguishable from code word
• Because d=4:
– Can correct 1 error (Single Error Correction, i.e. SEC)
– Can detect 2 errors (Double Error Detection, i.e. DED)
• Example:
$\begin{bmatrix} S_3 \\ S_2 \\ S_1 \\ S_0 \end{bmatrix} = \begin{bmatrix} 1&0&0&0&1&1&1&0 \\ 0&1&0&0&1&1&0&1 \\ 0&0&1&0&1&0&1&1 \\ 0&0&0&1&0&1&1&1 \end{bmatrix} \begin{bmatrix} C_7 \\ C_6 \\ C_5 \\ C_4 \\ C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}$
– Note: log size of nullspace will be (columns – rank) = 4, so:
» Rank = 4, since rows independent, 4 cols indpt
» Clearly, 8 bits in code word
» Thus: (8,4) code
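A minimal sketch of SEC-DED decoding with the example H above. The function names and the sample code word are assumptions. An odd-weight syndrome matches exactly one column of H and is treated as a single error; a non-zero even-weight syndrome is reported as a detected but uncorrectable (double) error.

```python
H = [[1, 0, 0, 0, 1, 1, 1, 0],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
COLUMNS = [tuple(row[c] for row in H) for c in range(8)]

def syndrome(word):
    """S = H * word over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, word)) % 2 for row in H)

def secded_decode(word):
    """Return (status, word) with status 'ok', 'corrected' or 'uncorrectable'."""
    s = syndrome(word)
    if not any(s):
        return "ok", list(word)
    if sum(s) % 2 == 1 and s in COLUMNS:      # odd weight: single error
        fixed = list(word)
        fixed[COLUMNS.index(s)] ^= 1
        return "corrected", fixed
    return "uncorrectable", list(word)        # even weight: double error

C = [1, 1, 1, 0, 1, 0, 0, 0]                  # valid code word (assumed example)
single = C[:]
single[5] ^= 1
double = single[:]
double[2] ^= 1
print(secded_decode(C)[0], secded_decode(single)[0], secded_decode(double)[0])
```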

Tweaks:
• No reason cannot make code shorter than required
• Suppose n-k=8 bits of parity. What is max code size (n) for d=4?
– Maximum number of unique, odd-weight columns: $2^7 = 128$
– So, n = 128. But, then k = n – (n – k) = 120. Weird!
– Just throw out columns of high weight and make (72, 64) code!
• But – shortened codes like this might have d > 4 in some special directions
– Example: Kaneda paper, catches failures of groups of 4 bits
– Good for catching chip failures when DRAM has groups of 4 bits
• What about EVENODD code?
– Can be used to handle two erasures
– What about two dead DRAMs? Yes, if you can really know they are dead
Galois Field
• Definition: Field: a complete group of elements with:
– Addition, subtraction, multiplication, division
– Completely closed under these operations
– Every element has an additive inverse
– Every element except zero has a multiplicative inverse
• Examples:
– Real numbers
– Binary, called GF(2): Galois Field with base 2
» Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1
– Prime field, GF(p): Galois Field with base p
» Values 0 … p-1
» Addition/subtraction/multiplication: modulo p
» Multiplicative Inverse: every value except 0 has inverse
» Example: GF(5): 1·1 ≡ 1 mod 5, 2·3 ≡ 1 mod 5, 4·4 ≡ 1 mod 5
– General Galois Field: $GF(p^m)$: base p (prime!), dimension m
» Values are vectors of elements of GF(p) of dimension m
» Add/subtract: vector addition/subtraction
» Multiply/divide: more complex
» Just like real numbers but finite!
» Common for computer algorithms: $GF(2^m)$
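The GF(5) example can be reproduced with a short sketch. Computing the inverse as a^(p-2) mod p relies on Fermat's little theorem, which is a choice made for this example and is not taken from the slide; the function name is likewise an assumption.

```python
def gf_p_inverse(a: int, p: int) -> int:
    """Multiplicative inverse of a in GF(p), p prime, via a^(p-2) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)

inverses = {a: gf_p_inverse(a, 5) for a in range(1, 5)}
print(inverses)
```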

Specific Example: Galois Fields $GF(2^n)$
• Consider polynomials whose coefficients come from GF(2).
• Each term of the form $x^n$ is either present or absent.
• Examples: 0, 1, $x$, $x^2$, and $x^7 + x^6 + 1 = 1 \cdot x^7 + 1 \cdot x^6 + 0 \cdot x^5 + 0 \cdot x^4 + 0 \cdot x^3 + 0 \cdot x^2 + 0 \cdot x^1 + 1 \cdot x^0$
• With addition and multiplication these form a field:
• “Add”: XOR each element individually with no carry:
$(x^4 + x^3 + x + 1) + (x^4 + x^2 + x) = x^3 + x^2 + 1$
• “Multiply”: multiplying by x is like shifting to the left.
$(x^2 + x + 1) \times (x + 1) = (x^2 + x + 1) + (x^3 + x^2 + x) = x^3 + 1$
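A minimal sketch of these two operations, assuming each polynomial is stored as an integer bitmask whose bit i is the coefficient of x^i (a representation and function names chosen for this example).

```python
def poly_add(a: int, b: int) -> int:
    """Addition over GF(2): XOR coefficient by coefficient, no carry."""
    return a ^ b

def poly_mul(a: int, b: int) -> int:
    """Carry-less multiplication: shift left for each x, XOR partial products."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1               # multiply a by x
        b >>= 1
    return result

# (x^4 + x^3 + x + 1) + (x^4 + x^2 + x) = x^3 + x^2 + 1
print(bin(poly_add(0b11011, 0b10110)))
# (x^2 + x + 1) * (x + 1) = x^3 + 1
print(bin(poly_mul(0b111, 0b11)))
```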
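
So what about division (mod)
• $\dfrac{x^4 + x^2}{x} = x^3 + x$ with remainder 0
• $\dfrac{x^4 + x^2 + 1}{x + 1} = x^3 + x^2$ with remainder 1
• Long division of $x^4 + 0x^3 + x^2 + 0x + 1$ by $x + 1$:
– Quotient term $x^3$: subtract $x^4 + x^3$, leaving $x^3 + x^2$
– Quotient term $x^2$: subtract $x^3 + x^2$, leaving $0x^2 + 0x$
– Quotient term $0x$: subtract $0x$, leaving $0x + 1$
– Quotient term 0: subtract 0
– Remainder 1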
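The long division above can be sketched in a few lines, again assuming polynomials over GF(2) are stored as integer bitmasks (bit i is the coefficient of x^i); the function name is an assumption.

```python
def poly_divmod(num: int, den: int):
    """Divide polynomials over GF(2); returns (quotient, remainder)."""
    if den == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    den_len = den.bit_length()
    while num.bit_length() >= den_len:
        shift = num.bit_length() - den_len
        quotient ^= 1 << shift     # next quotient term x^shift
        num ^= den << shift        # subtract (XOR) the shifted divisor
    return quotient, num

print(poly_divmod(0b10100, 0b10))  # (x^4 + x^2) / x
print(poly_divmod(0b10101, 0b11))  # (x^4 + x^2 + 1) / (x + 1)
```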

Producing Galois Fields
• These polynomials form a Galois (finite) field if we take the results of this multiplication modulo a prime polynomial p(x).
– A prime polynomial is one that cannot be written as the product of two non-trivial polynomials q(x)r(x)
– Perform modulo operation by subtracting a (polynomial) multiple of p(x) from the result. If the multiple is 1, this corresponds to XOR-ing the result with p(x).
• For any degree, there exists at least one prime polynomial.
• With it we can form $GF(2^n)$
• Additionally, …
• Every Galois field has a primitive element, α, such that all non-zero elements of the field can be expressed as a power of α. By raising α to powers (modulo p(x)), all non-zero field elements can be formed.
• Certain choices of p(x) make the simple polynomial x the primitive element. These polynomials are called primitive, and one exists for every degree.
• For example, $x^4 + x + 1$ is primitive. So α = x is a primitive element and successive powers of α will generate all non-zero elements of GF(16). Example on next slide.

Galois Fields – Primitives
$\alpha^0 = 1$
$\alpha^1 = x$
$\alpha^2 = x^2$
$\alpha^3 = x^3$
$\alpha^4 = x + 1$
$\alpha^5 = x^2 + x$
$\alpha^6 = x^3 + x^2$
$\alpha^7 = x^3 + x + 1$
$\alpha^8 = x^2 + 1$
$\alpha^9 = x^3 + x$
$\alpha^{10} = x^2 + x + 1$
$\alpha^{11} = x^3 + x^2 + x$
$\alpha^{12} = x^3 + x^2 + x + 1$
$\alpha^{13} = x^3 + x^2 + 1$
$\alpha^{14} = x^3 + 1$
$\alpha^{15} = 1$
$\alpha^4 = x^4 \bmod (x^4 + x + 1) = x^4 \text{ xor } (x^4 + x + 1) = x + 1$
• Note this pattern of coefficients matches the bits from our 4-bit LFSR example.
• In general finding primitive polynomials is difficult. Most people just look them up in a table, such as:

Primitive Polynomials
$x^2 + x + 1$
$x^3 + x + 1$
$x^4 + x + 1$
$x^5 + x^2 + 1$
$x^6 + x + 1$
$x^7 + x^3 + 1$
$x^8 + x^4 + x^3 + x^2 + 1$
$x^9 + x^4 + 1$
$x^{10} + x^3 + 1$
$x^{11} + x^2 + 1$
$x^{12} + x^6 + x^4 + x + 1$
$x^{13} + x^4 + x^3 + x + 1$
$x^{14} + x^{10} + x^6 + x + 1$
$x^{15} + x + 1$
$x^{16} + x^{12} + x^3 + x + 1$
$x^{17} + x^3 + 1$
$x^{18} + x^7 + 1$
$x^{19} + x^5 + x^2 + x + 1$
$x^{20} + x^3 + 1$
$x^{21} + x^2 + 1$
$x^{22} + x + 1$
$x^{23} + x^5 + 1$
$x^{24} + x^7 + x^2 + x + 1$
$x^{25} + x^3 + 1$
$x^{26} + x^6 + x^2 + x + 1$
$x^{27} + x^5 + x^2 + x + 1$
$x^{28} + x^3 + 1$
$x^{29} + x + 1$
$x^{30} + x^6 + x^4 + x + 1$
$x^{31} + x^3 + 1$
$x^{32} + x^7 + x^6 + x^2 + 1$
Galois Field Hardware
• Multiplication by x ⇔ shift left
• Taking the result mod p(x) ⇔ XOR-ing with the coefficients of p(x) when the most significant coefficient is 1.
• Obtaining all $2^n-1$ non-zero elements by evaluating $x^k$ for k = 1, …, $2^n-1$ ⇔ Shifting and XOR-ing $2^n-1$ times.
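The shift-and-XOR correspondence can be sketched in software. The bitmask representation (bit i holds the coefficient of x^i) and the function name are assumptions for this example; p(x) = x^4 + x + 1 is the lecture's GF(16) polynomial.

```python
def powers_of_x(p: int, n: int):
    """Return [x^0, x^1, ..., x^(2^n - 2)] in GF(2^n) defined by p(x).

    p is a bitmask with bit n set (the x^n term of the polynomial).
    """
    elements = []
    value = 1
    for _ in range(2**n - 1):
        elements.append(value)
        value <<= 1                # multiplication by x: shift left
        if value & (1 << n):       # most significant coefficient is 1
            value ^= p             # take the result mod p(x): XOR with p(x)
    return elements

table = powers_of_x(0b10011, 4)    # p(x) = x^4 + x + 1
for k, element in enumerate(table):
    print(f"alpha^{k} = {element:04b}")
```

Each printed bit pattern lists the coefficients of x^3, x^2, x, 1 and can be compared with the table of powers of α on the earlier slide.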

Building an LFSR from a Primitive Poly (Cycle through all non-zero values)
• For k-bit LFSR number the flip-flops with FF1 on the right.
• The feedback path comes from the Q output of the leftmost FF.
• Find the primitive polynomial of the form $x^k + … + 1$.
• The $x^0 = 1$ term corresponds to connecting the feedback directly to the D input of FF 1.
• Each term of the form $x^n$ corresponds to connecting an xor between FF n and n+1.
• 4-bit example, uses $x^4 + x + 1$
– $x^4$ ⇔ FF4’s Q output
– $x$ ⇔ xor between FF1 and FF2
– 1 ⇔ FF1’s D input
• To build an 8-bit LFSR, use the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ and connect xors between FF2 and FF3, FF3 and FF4, and FF4 and FF5.
[Figure: 4-bit LFSR (flip-flops Q4..Q1, common CLK) and 8-bit LFSR (flip-flops Q8..Q1, common CLK)]
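The wiring rules above can be modelled at the flip-flop level. This is a sketch under assumptions: the function names are invented here, the state list holds the Q outputs with `ffs[i]` standing for FF(i+1), and `taps` is the set of middle exponents n present in the polynomial.

```python
def lfsr_step(ffs, taps):
    """Advance a k-bit LFSR by one clock and return the new Q outputs."""
    k = len(ffs)
    feedback = ffs[k - 1]          # Q output of the leftmost FF (FFk)
    nxt = [feedback]               # x^0 term: feedback drives D of FF1
    for n in range(1, k):          # compute D input of FF(n+1)
        bit = ffs[n - 1]
        if n in taps:
            bit ^= feedback        # xor placed between FF n and FF n+1
        nxt.append(bit)
    return nxt

def lfsr_period(k, taps):
    """Number of clocks until the LFSR returns to its starting state."""
    start = [1] + [0] * (k - 1)
    state, count = lfsr_step(start, taps), 1
    while state != start:
        state = lfsr_step(state, taps)
        count += 1
    return count

print(lfsr_period(4, {1}))         # x^4 + x + 1
print(lfsr_period(8, {2, 3, 4}))   # x^8 + x^4 + x^3 + x^2 + 1
```

A period of 2^k - 1 means the register visits every non-zero state before repeating.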

Reed-Solomon Codes
• Galois field codes: code words consist of symbols
– Rather than bits
• Reed-Solomon codes:
– Based on polynomials in $GF(2^k)$ (i.e. k-bit symbols)
– Data as coefficients, code space as values of polynomial:
– $P(x) = a_0 + a_1 x^1 + … + a_{k-1} x^{k-1}$
– Coded: P(0), P(1), P(2), …, P(n-1)
– Can recover polynomial as long as get any k of n
• Properties: can choose number of check symbols
– Reed-Solomon codes are “maximum distance separable” (MDS)
– Can add d symbols for distance d+1 code
– Often used in “erasure code” mode: as long as no more than n-k coded symbols erased, can recover data
• Side note: Multiplication by constant in $GF(2^k)$ can be represented by k×k matrix: $a \cdot x$
– Decompose unknown vector into k bits: $x = x_0 + 2x_1 + … + 2^{k-1} x_{k-1}$
– Each column is result of multiplying a by $2^i$
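A minimal sketch of the "erasure code" use described above, assuming GF(16) built from the lecture's polynomial x^4 + x + 1, three data symbols, and seven coded symbols. The function names are assumptions, and recovery uses plain Lagrange interpolation, which is one straightforward way to rebuild the polynomial rather than the method a production decoder would use.

```python
P_POLY = 0b10011   # x^4 + x + 1, defines GF(16)

def gf_mul(a, b):
    """Multiply two GF(16) elements (shift and XOR, reduced mod P_POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= P_POLY
    return r

def gf_inv(a):
    """Inverse of a non-zero element: a^14, since a^15 = 1 in GF(16)."""
    r = 1
    for _ in range(14):
        r = gf_mul(r, a)
    return r

def rs_encode(data, n):
    """Evaluate P(x) = a0 + a1 x + ... at x = 0..n-1 (Horner's rule)."""
    out = []
    for x in range(n):
        y = 0
        for a in reversed(data):
            y = gf_mul(y, x) ^ a
        out.append(y)
    return out

def rs_recover(points, k):
    """Rebuild a0..a_{k-1} from exactly k surviving (x, P(x)) pairs."""
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1      # basis = product of (x + xj) for j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):
                new[d] ^= gf_mul(c, xj)   # subtraction is XOR in GF(2^k)
                new[d + 1] ^= c
            basis = new
            denom = gf_mul(denom, xi ^ xj)
        scale = gf_mul(yi, gf_inv(denom))
        for d, c in enumerate(basis):
            coeffs[d] ^= gf_mul(scale, c)
    return coeffs

data = [3, 7, 1]
code = rs_encode(data, 7)
survivors = [(x, code[x]) for x in (1, 4, 6)]   # four symbols erased
print(rs_recover(survivors, 3))
```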

Reed-Solomon Codes (con’t)
• Reed-Solomon codes (Non-systematic):
– Data as coefficients, code space as values of polynomial:
– $P(x) = a_0 + a_1 x^1 + … + a_6 x^6$
– Coded: P(0), P(1), P(2), …, P(6)
$G = \begin{bmatrix} 1^0 & 1^1 & 1^2 & 1^3 & 1^4 \\ 2^0 & 2^1 & 2^2 & 2^3 & 2^4 \\ 3^0 & 3^1 & 3^2 & 3^3 & 3^4 \\ 4^0 & 4^1 & 4^2 & 4^3 & 4^4 \\ 5^0 & 5^1 & 5^2 & 5^3 & 5^4 \\ 6^0 & 6^1 & 6^2 & 6^3 & 6^4 \\ 7^0 & 7^1 & 7^2 & 7^3 & 7^4 \end{bmatrix}$, applied to the data vector $(a_0, a_1, a_2, a_3, a_4)^T$
• Called Vandermonde Matrix: maximum rank
• Different representation (This H’ and G not related)
$H' = \begin{bmatrix} 1^0 & 2^0 & 3^0 & 4^0 & 5^0 & 6^0 & 7^0 \\ 1^1 & 2^1 & 3^1 & 4^1 & 5^1 & 6^1 & 7^1 \end{bmatrix}$
– Clear that all combinations of two or less columns independent ⇒ d=3
– Very easy to pick whatever d you happen to want: add more rows
• Fast, Systematic version of Reed-Solomon:
– Cauchy Reed-Solomon, others

Another Example: Redundant Check
• Send a message M and a “check” word C
• Simple function on <M,C> to determine if both received correctly (with high probability)
• Example: XOR all the bytes in M and append the “checksum” byte, C, at the end
– Bit i of C is the XOR of the ith bit of each byte
– Receiver XORs <M,C>
– What should result be?
– What errors are caught?
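A minimal sketch of the XOR checksum, with function names chosen for this example. It suggests answers to the two questions: the receiver's XOR over <M,C> comes out as zero when nothing changed, and an error slips through when the same bit position is flipped an even number of times.

```python
from functools import reduce

def xor_checksum(message: bytes) -> int:
    """XOR of all bytes; bit i is the XOR of the ith bit of each byte."""
    return reduce(lambda acc, byte: acc ^ byte, message, 0)

def append_checksum(message: bytes) -> bytes:
    """Sender: append the checksum byte C to the message M."""
    return message + bytes([xor_checksum(message)])

def verify(frame: bytes) -> bool:
    """Receiver: XOR over <M,C> should be zero."""
    return xor_checksum(frame) == 0

frame = append_checksum(b"hello")

one_flip = bytearray(frame)
one_flip[1] ^= 0x04                  # single bit error: caught

same_bit_twice = bytearray(frame)
same_bit_twice[0] ^= 0x01            # bit 0 flipped in two different bytes:
same_bit_twice[1] ^= 0x01            # the column parity is unchanged, so missed

print(verify(frame), verify(bytes(one_flip)), verify(bytes(same_bit_twice)))
```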

Example: TCP Checksum
[Figure: protocol stack: Application (HTTP, FTP, DNS), layer 7; Transport (TCP, UDP), layer 4; Network (IP), layer 3; Data Link (Ethernet, 802.11b), layer 2; Physical, layer 1. TCP Packet Format]
• TCP Checksum: a 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by the sender and included in the transmitted segment. (Note the end-around carry.)
• Summing all the words, including the checksum word, should yield zero (in one's complement arithmetic: all ones, i.e. negative zero)
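A minimal sketch of the one's complement sum with end-around carry. It operates on an arbitrary byte string only; the real TCP computation also covers a pseudo-header, which is left out here. The function names are assumptions, and the sample bytes are the worked example from RFC 1071.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of big-endian words, with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # end-around carry
    return total

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
received = payload + checksum.to_bytes(2, "big")
print(hex(checksum), hex(ones_complement_sum(received)))
```

Summing the payload together with its checksum gives 0xFFFF, the all-ones form of zero in one's complement arithmetic.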

Example: Ethernet CRC-32
[Figure: same protocol stack: Application (HTTP, FTP, DNS), Transport (TCP, UDP), Network (IP), Data Link (Ethernet, 802.11b), Physical]

CRC concept
• I have a msg polynomial M(x) of degree m
• We both have a generator poly G(x) of degree n
• Let r(x) = remainder of $M(x) \cdot x^n / G(x)$ (multiplying by $x^n$ puts n bits of zero at the end)
– $M(x) \cdot x^n = G(x) \cdot p(x) + r(x)$
– r(x) is of degree less than n
• What is $(M(x) \cdot x^n - r(x)) / G(x)$?
• So I send you $M(x) \cdot x^n - r(x)$ (tack on n bits of remainder instead of the zeros)
– m+n degree polynomial
– You divide by G(x) to check
– M(x) is just the m+1 most significant coefficients, r(x) the lower n
• An x-bit message is viewed as the coefficients of a polynomial of degree x-1 over the binary numbers

Polynomial division
[Figure: long division of 1 0 1 1 0 0 1 0 0 0 0 by 1 0 0 1 1, alongside a 4-bit shift register (Q4..Q1, clocked by CLK, fed by serial_in) initialized to 0 0 0 0]
• When MSB is zero, just shift left, bringing in next bit
• When MSB is 1, XOR with divisor and shift left

CRC encoding
[Figure: 4-bit shift register (Q4..Q1, clocked by CLK) fed by serial_in]
Register contents | Remaining serial_in
0 0 0 0 | 1 0 1 1 0 0 1 0 0 0 0
0 0 0 1 | 0 1 1 0 0 1 0 0 0 0
0 0 1 0 | 1 1 0 0 1 0 0 0 0
0 1 0 1 | 1 0 0 1 0 0 0 0
1 0 1 1 | 0 0 1 0 0 0 0
0 1 0 1 | 0 1 0 0 0 0
1 0 1 0 | 1 0 0 0 0
0 1 1 0 | 0 0 0 0
1 1 0 0 | 0 0 0
1 0 1 1 | 0 0
0 1 0 1 | 0
1 0 1 0 |
Message sent: 1 0 1 1 0 0 1 1 0 1 0

CRC decoding
[Figure: 4-bit shift register (Q4..Q1, clocked by CLK) fed by serial_in]
Register contents | Remaining serial_in
0 0 0 0 | 1 0 1 1 0 0 1 1 0 1 0
0 0 0 1 | 0 1 1 0 0 1 1 0 1 0
0 0 1 0 | 1 1 0 0 1 1 0 1 0
0 1 0 1 | 1 0 0 1 1 0 1 0
1 0 1 1 | 0 0 1 1 0 1 0
0 1 0 1 | 0 1 1 0 1 0
1 0 1 0 | 1 1 0 1 0
0 1 1 0 | 1 0 1 0
1 1 0 1 | 0 1 0
1 0 0 1 | 1 0
0 0 0 0 | 0
0 0 0 0 |
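The encoding and decoding traces can be reproduced with a bit-serial model of the shift register. This is a sketch: the function name and the list-of-bits representation are assumptions, while the generator 1 0 0 1 1 and the message 1 0 1 1 0 0 1 come from the slides.

```python
def crc_remainder(bits, gen):
    """Shift bits (MSB first) through an LFSR for generator gen.

    gen is a list of 0/1 coefficients of degree n (gen[0] == 1).
    Returns the n-bit register contents after the last bit.
    """
    n = len(gen) - 1
    reg = [0] * n
    for b in bits:
        msb = reg[0]
        reg = reg[1:] + [b]                  # shift left, bring in next bit
        if msb:                              # MSB was 1: XOR with the divisor
            reg = [r ^ g for r, g in zip(reg, gen[1:])]
    return reg

GEN = [1, 0, 0, 1, 1]
message = [1, 0, 1, 1, 0, 0, 1]

remainder = crc_remainder(message + [0] * 4, GEN)   # n zero bits at the end
sent = message + remainder                           # remainder replaces the zeros
check = crc_remainder(sent, GEN)                     # receiver divides by G(x)

corrupted = sent[:]
corrupted[3] ^= 1                                    # single bit error in transit
print(remainder, sent, check, crc_remainder(corrupted, GEN))
```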

Generating Polynomials
• CRC-16: $G(x) = x^{16} + x^{15} + x^2 + 1$
– Detects single and double bit errors
– All errors with an odd number of bits
– Burst errors of length 16 or less
– Most errors for longer bursts
• CRC-32: $G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$
– Used in Ethernet
– Also 32 bits of 1 added on front of the message
» Initialize the LFSR to all 1s
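A bitwise sketch of CRC-32 as commonly implemented in software, checked against Python's `zlib.crc32`. The register starts at all 1s as noted above. Two further conventions used here are not stated on the slide and belong to the common software formulation: bits are processed least significant first (so the constant 0xEDB88320 is the bit-reversed form of G(x)), and the final register value is inverted.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 computed one bit at a time (reflected form)."""
    crc = 0xFFFFFFFF                         # LFSR initialized to all 1s
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320    # bit-reversed G(x)
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                  # final inversion

sample = b"123456789"
print(hex(crc32_bitwise(sample)), hex(zlib.crc32(sample)))
```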

Conclusion
• ECC: add redundancy to correct for errors
– (n,k,d): n code bits, k data bits, distance d
– Linear codes: code vectors computed by linear transformation
• Erasure code: after identifying “erasures”, can correct
• Reed-Solomon codes
– Based on $GF(p^n)$, often $GF(2^n)$
– Easy to get distance d+1 code with d extra symbols
– Often used in erasure mode
• Redundancy useful to gain reliability
– Redundant disks+controllers+etc (RAID)
– Geographical scale systems (OceanStore)
• Disk technology:
– Two innovations: GMR, Vertical recording
– Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time