FPGA Architectures and Operation for Tolerating SEUs

Chuck Stroud
Electrical and Computer Engineering
Auburn University

1/31/07 VLSI Design & Test Seminar 2

Outline of Presentation
- Field Programmable Gate Arrays (FPGAs)
  - How Programmable Logic Works
  - Configuration Memory
- Single Event Upset (SEU) Problem in FPGAs
  - Configuration Memory
  - System Function Memory Elements
- Architectural Solutions
  - Hamming Code for Memory
  - SEU Controller for Configuration Memory
  - Triple Modular Redundancy and Guard Bands
- Operational Solutions
  - Plan for AubieSat-2
- Summary & Conclusions

Basic FPGA Operation
- Writing configuration memory ⇒ defines system function
  - Input/Output (I/O) cells
  - Logic in logic blocks
  - Connections between logic blocks & I/O cells
- Changing configuration memory data ⇒ changes system function
  - Can change at any time
  - Partial reconfiguration
- SEUs can change configuration memory data to another function

FPGA Resources

Category           FPGA Resource                  Small FPGA   Large FPGA
Logic              PLBs per FPGA                  256          25,920
                   LUTs and flip-flops per PLB    1            8
Routing            Wire segments per PLB          45           406
                   PIPs per PLB                   139          3,462
Specialized Cores  Bits per memory core           128          36,864
                   Memory cores per FPGA          16           576
                   DSP cores                      0            512
Other              Input/output cells             62           1,200
                   Configuration memory bits      42,104       79,704,832

- Almost everything in an FPGA either
  - Uses memory elements, or
  - Is controlled by configuration memory

PLB Architecture
- Look-up Table (LUT) implements truth table for combinational logic functions
- Carry & control logic implements fast adders/subtractors
- Memory elements susceptible to SEUs:
  - Flip-flop/latch
  - LUTs are memory elements storing truth table
    - In some FPGAs LUTs can function as small RAMs

[Figure: PLB block diagram showing the LUT/RAM, carry & control logic, and flip-flop/latch, with signals Input[1:4], carry in, carry out, Control, clock/enable/set-reset, Output, and Q output.]

Combinational Logic Functions
- Any digital logic function can be represented by a truth table
- Multiplexer example
  - If S = 0, Z = A
  - If S = 1, Z = B
  - Heavily used in FPGAs
    - S input controlled by configuration memory bit to allow selection of signal flow

[Figure: multiplexer logic symbol with data inputs A (selected by 0) and B (selected by 1), select input S, and output Z.]

Truth table

S A B | Z
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1

Look-up Tables
- Configuration memory holds outputs for truth table
- Internal signals connect to control signals of multiplexers to select value of truth table for any given input value

[Figure: the multiplexer and its truth table (S A B → Z: 000→0, 001→0, 010→1, 011→1, 100→0, 101→1, 110→0, 111→1) implemented as a LUT: a tree of 2-to-1 multiplexers with select inputs B, A, S chooses one of eight configuration memory bits to drive Z.]
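The LUT idea on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than vendor code: the names `mux2`, `lut3`, and `MUX_CONFIG` are made up here, and the ordering of the stored bits (S as the most significant index bit) is an assumption. The sketch models the eight configuration bits as a list read through a tree of 2-to-1 multiplexers, then flips one stored bit to mimic an SEU.

```python
# Sketch: a 3-input LUT modeled as 8 configuration bits read through a mux tree.
# All names here are illustrative; they do not come from the slides.

def mux2(s, a, b):
    """2-to-1 multiplexer: Z = A when S = 0, Z = B when S = 1."""
    return b if s else a

# Truth-table outputs of the multiplexer, indexed by (S, A, B) read as a
# 3-bit number with S as the most significant bit (an assumed ordering).
MUX_CONFIG = [0, 0, 1, 1, 0, 1, 0, 1]

def lut3(config, s, a, b):
    """Select one of 8 stored bits with a tree of 2-to-1 multiplexers."""
    level1 = [mux2(b, config[i], config[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux2(a, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux2(s, level2[0], level2[1])

def truth_table(config):
    """Evaluate the LUT for all eight input combinations, in S A B order."""
    return [lut3(config, s, a, b) for s in (0, 1) for a in (0, 1) for b in (0, 1)]

# The LUT reproduces the multiplexer function for every input combination.
reference = [mux2(s, a, b) for s in (0, 1) for a in (0, 1) for b in (0, 1)]
print(truth_table(MUX_CONFIG) == reference)

# An SEU modeled as one flipped configuration bit changes the function.
upset = list(MUX_CONFIG)
upset[5] ^= 1                      # entry for S=1, A=0, B=1
print(lut3(MUX_CONFIG, 1, 0, 1), lut3(upset, 1, 0, 1))
```

Flipping a single stored bit leaves seven of the eight truth-table rows intact, so an upset of this kind only shows up when the affected input combination occurs.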

Look-up Table Based RAMs
- Normal LUT mode performs read operations
- Address decoder with write enable generates load signals to latches for write operations
- Small RAMs but can be combined for larger RAMs

[Figure: LUT-based RAM. An address decoder with Write Enable takes In0, In1, In2 and generates load signals ld0 to ld7 for eight latches fed by Data In; a multiplexer tree selected by In0, In1, In2 reads the addressed bit out on Z.]
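The read and write paths of a LUT-based RAM can be sketched as follows. This is a hypothetical behavioral model, not a description of any vendor's circuit: the class name `LutRam`, its method names, and the choice of In0 as the least significant address bit are all assumptions made for the example.

```python
class LutRam:
    """8 x 1-bit RAM built from the eight storage latches of a 3-input LUT."""

    def __init__(self):
        self.latches = [0] * 8

    @staticmethod
    def address(in0, in1, in2):
        # Assumption: In0 is the least significant address bit.
        return in0 | (in1 << 1) | (in2 << 2)

    def load_signals(self, in0, in1, in2, write_enable):
        """Address decoder gated by write enable: one-hot ld0..ld7, or all zero."""
        addr = self.address(in0, in1, in2)
        return [1 if (write_enable and i == addr) else 0 for i in range(8)]

    def write(self, in0, in1, in2, data_in, write_enable=1):
        """Load Data In into whichever latch has its load signal asserted."""
        for i, ld in enumerate(self.load_signals(in0, in1, in2, write_enable)):
            if ld:
                self.latches[i] = data_in

    def read(self, in0, in1, in2):
        """Normal LUT mode: the inputs select one stored bit."""
        return self.latches[self.address(in0, in1, in2)]


ram = LutRam()
ram.write(1, 0, 1, data_in=1)                    # address 5
ram.write(0, 1, 0, data_in=1, write_enable=0)    # ignored: write enable is low
print(ram.load_signals(1, 0, 1, 1))
print(ram.read(1, 0, 1), ram.read(0, 1, 0))
```

Because the same latches serve as truth-table storage and as RAM cells, an SEU in one of them corrupts either a logic function or a stored data bit, depending on how the LUT is used.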

Xilinx Virtex-4 FPGAs
- Configuration memory: 4.7M to 50.8M bits of RAM
- Logic Blocks: 1,536 to 22,272
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 48 to 552 18K-bit dual-port RAMs
  - Also operate as FIFOs
- DSP cores: 32 to 512, each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2

It's Getting Worse All The Time
- Smaller design rules & lower supply voltages
- M. Ohlsson, P. Dyreklev, K. Johansson, & P. Alfke, "Neutron Single Event Upsets in SRAM-Based FPGAs," Proc. 1998 IEEE Nuclear & Space Radiation Effects Conf.
  - Used radiation chamber to calculate SEU frequency at altitude of 10 km at 60°N (Sweden)

FPGA        Process    Vcc     1 SEU every
XC4010E     0.60 µm    5 V     1.3x10^6 hrs
XC4010XL    0.35 µm    3.3 V   2.8x10^5 hrs

- Increase by a factor of 21.5
- Projecting this for 3 design rule shrinks & 2 voltage reductions we get ≈ 1 SEU every 28.2 hrs
- 400 slices in 4010 vs. 89,088 in Virtex-4
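The slide does not spell out the arithmetic behind the 28.2-hour projection. One reading that reproduces the number, offered here purely as an assumption, is that the quoted factor of 21.5 is applied once per design rule shrink, starting from the XC4010XL measurement. The short check below uses only figures quoted on the slide.

```python
# Figures quoted on the slide.
measured_hours = 2.8e5       # XC4010XL: hours between SEUs
factor = 21.5                # quoted increase factor
shrinks = 3                  # quoted number of design rule shrinks

# Assumption (not stated on the slide): the factor applies once per shrink.
projected_hours = measured_hours / factor ** shrinks
print(f"about 1 SEU every {projected_hours:.1f} hours")

# Device size also grows: slices in Virtex-4 versus the 4010.
slice_ratio = 89_088 / 400
print(f"{slice_ratio:.1f}x as many slices")
```

The slice ratio is shown only to put the two device sizes side by side; the slide does not say whether it enters the projection.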

Hardware Solutions
- FPGA manufacturers are including some mechanisms for
  - Detecting/correcting SEUs
    - Hamming code
      - Configuration memory
        - SEU controller soft core
      - RAM cores
  - Tolerating SEUs
    - Tools for Triple Modular Redundancy (TMR)
      - TMR would be used for FPGA memory elements not covered by Hamming code
    - Allows limited number of SEUs to be tolerated
- Need more & better techniques

Calculating Hamming Code
- H = # Hamming bits
  - D + H + 1 ≤ 2^H
  - D = # data bits
  - Hamming, BSTJ '50
- D = 8 example
  - H1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
  - H2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7
  - H3 = D2 ⊕ D3 ⊕ D4 ⊕ D8
  - H4 = D5 ⊕ D6 ⊕ D7 ⊕ D8
- Hamming distance, d = 3 = E + C + 1
  - Single bit error detection & correction (SEC)
    - E = 1, C = 1
- Additional parity bit, d = 4 = E + C + 1
  - Parity over data & Hamming bits
  - Double error detection (DED) & single error correction (SEC)
    - E = 2, C = 1
- E = # bit errors to detect
- C = # bit errors to correct

Position    12  11  10   9   8   7   6   5   4   3   2   1
Bit         D8  D7  D6  D5  H4  D4  D3  D2  H3  D1  H2  H1
Parity H1    0   1   0   1   0   1   0   1   0   1   0   1
Parity H2    0   1   1   0   0   1   1   0   0   1   1   0
Parity H3    1   0   0   0   0   1   1   1   1   0   0   0
Parity H4    1   1   1   1   1   0   0   0   0   0   0   0

Condition                             Error Type
Hamming match, no parity error        No bit error
Hamming mismatch, parity error        1-bit correctable error
Hamming mismatch, no parity error     2-bit error detection
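The sizing rule and the D = 8 parity equations can be checked with a short Python sketch. The function names `hamming_bits` and `encode8` and the sample data word are illustrative choices, not part of the slides; the equations themselves are copied from the slide.

```python
def hamming_bits(d):
    """Smallest H that satisfies D + H + 1 <= 2**H."""
    h = 1
    while d + h + 1 > 2 ** h:
        h += 1
    return h

def encode8(bits):
    """bits = [D1, ..., D8]; returns [H1, H2, H3, H4] using the slide equations."""
    d = [None] + list(bits)            # 1-based indexing so that d[3] is D3
    return [
        d[1] ^ d[2] ^ d[4] ^ d[5] ^ d[7],   # H1
        d[1] ^ d[3] ^ d[4] ^ d[6] ^ d[7],   # H2
        d[2] ^ d[3] ^ d[4] ^ d[8],          # H3
        d[5] ^ d[6] ^ d[7] ^ d[8],          # H4
    ]

# Data widths used in this talk: the 8-bit example, the 64-bit ECC RAM word,
# and the 1,300 configuration data bits of a Virtex-4 frame.
for data_bits in (8, 64, 1300):
    print(data_bits, "data bits need", hamming_bits(data_bits), "Hamming bits")

print(encode8([1, 0, 1, 1, 0, 0, 1, 0]))   # arbitrary sample word D1..D8
```

The results for 64 and 1,300 data bits agree with the 7 and 11 Hamming bits quoted later in the talk for the ECC RAM and the configuration frame.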

Hamming Code Operation
- Example: RAM or configuration memory
- Input (Generate Circuit):
  - Generate Hamming code for data
  - Store data and Hamming bits
- Output (Detect/Correct Circuit):
  - Regenerate Hamming code for data
  - Bit-wise XOR with stored Hamming bits
    - Non-zero syndrome indicates
      - Error detection and bit position of error bit
        - Flip that bit to correct
  - Extra parity bit determines non-correctable double bit error
    - Indication can disable correction circuit to avoid further corruption

[Figure: H_stored XOR H_regenerated gives the H-bit Syndrome; a Syndrome Decoder selects the erroneous data bit D_i, which is XORed with 1 to produce the corrected D_i.]

Error Detection and Correction
- Single bit error examples
  - D3 is erroneous
    - Changes H3 and H2
    - Syndrome = 0110 = bit 6
  - D6 is erroneous
    - Changes H4 and H2
    - Syndrome = 1010 = bit 10
  - Odd number of bits change
    - Overall parity bit error ⇒ SEC
- Double bit error example
  - D3 and D6 are erroneous
    - Changes H3 and H4 (but not H2)
    - Syndrome = 1100 = bit 12
      - Indicates error in D8
  - Even number of bits change
    - No overall parity error ⇒ DED

Position    12  11  10   9   8   7   6   5   4   3   2   1
Bit         D8  D7  D6  D5  H4  D4  D3  D2  H3  D1  H2  H1
Parity H1    0   1   0   1   0   1   0   1   0   1   0   1
Parity H2    0   1   1   0   0   1   1   0   0   1   1   0
Parity H3    1   0   0   0   0   1   1   1   1   0   0   0
Parity H4    1   1   1   1   1   0   0   0   0   0   0   0

Erroneous bit(s)   H4  H3  H2  H1   Syndrome
D3                  0   1   1   0   0110
D6                  1   0   1   0   1010
D3 and D6           1   1   0   0   1100
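The three cases on this slide can be reproduced with a small Python model of the 12-bit code word plus an overall parity bit. This is a sketch under stated assumptions: the function names, the use of list index 0 for the overall parity bit, and the sample data word are all choices made for the example. The syndrome is computed as the XOR of the positions of all set bits, which gives the same value as XORing the stored and regenerated Hamming bits.

```python
# Code-word positions 1..12 as in the slide table; index 0 holds overall parity.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]     # positions of D1..D8
CHECK_POS = [1, 2, 4, 8]                   # positions of H1..H4

def encode(data):
    """data = [D1, ..., D8] -> list of 13 bits (index 0 = overall parity)."""
    word = [0] * 13
    for pos, bit in zip(DATA_POS, data):
        word[pos] = bit
    for h in CHECK_POS:
        # Each Hamming bit covers the data positions whose index includes bit h.
        for pos in DATA_POS:
            if pos & h:
                word[h] ^= word[pos]
    word[0] = sum(word[1:]) % 2            # parity over data and Hamming bits
    return word

def check(word):
    """Return (syndrome, parity_error) for a stored word."""
    syndrome = 0
    for pos in range(1, 13):
        if word[pos]:
            syndrome ^= pos
    parity_error = sum(word) % 2 == 1
    return syndrome, parity_error

def classify(word):
    """Apply the condition table: SEC, DED, or no error."""
    syndrome, parity_error = check(word)
    if parity_error:
        return "SEC", syndrome             # syndrome 0: the parity bit itself flipped
    if syndrome:
        return "DED", syndrome
    return "no error", 0

def flip(word, *positions):
    """Return a copy of word with the given positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out

good = encode([1, 0, 1, 1, 0, 0, 1, 0])    # arbitrary sample data D1..D8
cases = [("D3", flip(good, 6)), ("D6", flip(good, 10)), ("D3+D6", flip(good, 6, 10))]
for label, bad in cases:
    kind, syndrome = classify(bad)
    print(label, kind, format(syndrome, "04b"))
```

In the double-error case the syndrome points at position 12 (D8), a bit that was never upset, which is why the DED indication should block the correction step.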

Virtex-4 Hamming Codes
- Hamming bits stored in each frame of configuration memory
  - Frame ECC circuit checks Hamming code as each frame is read & indicates
    - Single correctable errors
      - Need additional circuit to fix erroneous bit
    - Multiple non-correctable errors
      - Need to reload configuration memory
- Block RAMs
  - Contents not covered by configuration memory Hamming bits
  - RAMs have ECC mode with Hamming bits
  - Detection and correction circuitry
    - Correction only on output data
    - Need to write corrected data back in RAM

Xilinx Virtex-4 Frame ECC Circuit
- Hamming code stored in configuration memory
  - 1,312-bit frame includes
    - Up to 1,300 bits of configuration data
    - 11 Hamming bits + 1 overall parity bit
  - Hamming code generated by configuration bit generation program and downloaded with configuration data
- Hamming code check performed on each read operation
  - No bit error correction
    - Must be performed by user logic and written back to configuration memory
  - Status indications:
    - No error
    - DED
    - SEC w/ syndrome
    - Syndrome valid

[Figure: Config Memory (1,312-bit words) addressed by the Frame Address Register; its Output Data feeds the FRAME ECC block (Hamming Code Generator, Hamming Check, Parity Bit Generator, Parity Check), which drives the Error Indicators DED, SEC, and the H-bit Syndrome.]

Xilinx's SEU Controller
- Soft core synthesized with user's design
  - Sequences through frames one at a time
  - Uses Frame ECC circuit and Internal Configuration Access Port (ICAP) to detect
    - Single bit detectable errors
      - PicoBlaze microcontroller corrects bit and writes frame back into configuration memory
    - Double bit non-correctable errors
- Requires ≈ 140 PLBs & 2 Block RAMs
  - ≈ 30 PLBs for PicoBlaze and 1 RAM for program memory
  - ≈ 110 PLBs for SEC circuit and ICAP interface
    - Plus 1 RAM for storing and correcting frame data
- SEU controller operation (full chip @ 100 MHz)
  - Error detection time ≈ 1.2 to 14.6 msec
    - Smallest to largest Virtex-4
  - Error correction time ≈ 24 to 278 msec
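The frame-by-frame scan can be sketched as a simple loop. This is a toy model and not Xilinx's implementation: frames are 12 bits plus an overall parity bit instead of 1,312 bits, the ICAP read and write-back are stood in for by list indexing, and the names `make_frame`, `frame_ecc`, and `scrub` are hypothetical. It only illustrates the control flow of reading each frame, checking its ECC status, correcting single-bit errors, and counting frames that need a reload.

```python
N = 12   # toy frame length in bits; index 0 holds the overall parity bit

def make_frame(data):
    """Build a valid toy frame from 8 data bits (Hamming bits at powers of two)."""
    frame = [0] * (N + 1)
    data_positions = [p for p in range(1, N + 1) if p & (p - 1)]
    for p, bit in zip(data_positions, data):
        frame[p] = bit
    syndrome = 0
    for p in data_positions:
        if frame[p]:
            syndrome ^= p
    for k in (1, 2, 4, 8):
        frame[k] = 1 if syndrome & k else 0
    frame[0] = sum(frame) % 2
    return frame

def frame_ecc(frame):
    """Stand-in for the status a frame ECC check reports on each read."""
    syndrome = 0
    for p in range(1, N + 1):
        if frame[p]:
            syndrome ^= p
    parity_error = sum(frame) % 2 == 1
    return syndrome, parity_error

def scrub(memory):
    """One pass over all frames; returns (corrected, uncorrectable) counts."""
    corrected = uncorrectable = 0
    for addr, frame in enumerate(memory):
        syndrome, parity_error = frame_ecc(frame)   # read the frame and check it
        if syndrome == 0 and not parity_error:
            continue                                # no error in this frame
        if parity_error and syndrome <= N:          # single-bit error
            fixed = list(frame)
            fixed[syndrome] ^= 1                    # flip the indicated bit
            memory[addr] = fixed                    # write the frame back
            corrected += 1
        else:                                       # double-bit error
            uncorrectable += 1                      # needs a configuration reload
    return corrected, uncorrectable

golden = [make_frame([(addr >> i) & 1 for i in range(8)]) for addr in range(16)]
memory = [list(f) for f in golden]
memory[3][6] ^= 1                  # single-bit upset in frame 3
memory[9][5] ^= 1                  # double-bit upset in frame 9
memory[9][11] ^= 1
print(scrub(memory))
print(memory[3] == golden[3], memory[9] == golden[9])
```

A full pass visits every frame, which is consistent with the slide quoting detection time as a range that grows from the smallest to the largest device.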

Complicating the Problem
- Block RAM contents not covered by configuration memory Hamming bits
  - Current program memory for PicoBlaze not SEU tolerant
- Changing data in memory elements
  - FFs & LUT-RAMs
  - Do not change Hamming bits
  - Restore operation
    - Loads config memory data into FFs, LUT-RAMs, and BRAMs
  - Capture operation
    - Loads FF, LUT-RAM, and BRAM contents to config memory for read
    - Destroys Hamming information
      - Cannot use Capture with SEU controller
  - Operational restrictions on FPGA for SEU tolerance
- SEU controller not SEU-tolerant
  - Need TMR SEU controller design
  - Need TMR PicoBlaze design w/ ECC RAM for program memory
    - Need to write corrected single bit errors back into program memory

Virtex-4 Block RAMs
- Contain 48 to 552 18K-bit dual-port RAMs
  - Program from 16Kx1-bit RAM to 512x36-bit RAM
  - No SEU protection in these modes of operation
- Can operate as 24 to 276 36K-bit RAMs with ECC
  - 512x72-bit RAMs
    - 64-bit data
    - 7-bit Hamming
      - Single error correction
    - 1-bit overall parity
      - Double error detection
- Can also operate as FIFOs
  - With or without ECC mode

[Figure: Virtex-4 floorplan with two PPC blocks; legend: DSPs, PLBs, Block RAMs/FIFOs, I/O Buffers.]

Xilinx Virtex-4 ECC RAM

Syndrome   000     001   010   011   100   101   110   111
0000       no err  H1    H2    D1    H3    D2    D3    D4
0001       H4      D5    D6    D7    D8    D9    D10   D11
0010       H5      D12   D13   D14   D15   D16   D17   D18
0011       D19     D20   D21   D22   D23   D24   D25   D26
0100       H6      D27   D28   D29   D30   D31   D32   D33
0101       D34     D35   D36   D37   D38   D39   D40   D41
0110       D42     D43   D44   D45   D46   D47   D48   D49
0111       D50     D51   D52   D53   D54   D55   D56   D57
1000       H7      D58   D59   D60   D61   D62   D63   D64

- Separate Hamming code generators
  - Separate write & read ports
- Only RAM output data corrected by ECC
  - Contents of RAM still erroneous
  - Extra circuitry to write corrected data back into RAM
    - Virtex-5 has internal correct mode

[Figure: ECC RAM block diagram. Generate side: Input Data (D=64) feeds a Hamming Code Generator (H=7) and a Parity Bit Generator. RAM Core: 512 words of 64+7+1 bits, with separate write addr and read addr. Detect/Correct side: Hamming Code Generator, Hamming Check, Parity Bit Generator, Parity Check, and Bit Error Correction Circuit produce Output Data and the Error Indicators DED and SEC.]
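The syndrome table follows the usual Hamming layout in which Hamming bits sit at power-of-two positions and data bits fill the remaining positions in order. A short sketch that regenerates the table from that rule is given below; the function name `position_map` and the printing format are illustrative, and the layout rule is stated here as an observation about the table rather than as a claim about the device's internals.

```python
def position_map(data_bits):
    """Map syndrome value -> bit name for a Hamming code with data_bits data bits."""
    names = {0: "no err"}
    d = h = 0
    pos = 1
    while d < data_bits:
        if pos & (pos - 1) == 0:       # power of two: a Hamming bit
            h += 1
            names[pos] = f"H{h}"
        else:                          # any other position: the next data bit
            d += 1
            names[pos] = f"D{d}"
        pos += 1
    return names

names = position_map(64)

# Print in the same shape as the slide: rows are the upper four syndrome bits,
# columns are the lower three.
for high in range(9):
    row = [names[high * 8 + low] for low in range(8)]
    print(format(high, "04b"), " ".join(f"{cell:>6}" for cell in row))
```

With this map, a reported single-error syndrome can be translated directly into the data or Hamming bit to invert before the corrected word is written back.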

Triple Modular Redundancy (TMR)
- Replicate modules and add majority voter(s)
  - Protects against single faults in replicated modules
- TMR SEU susceptibility problem in FPGAs
  - Single faults can cause multiple modules to fail
    - Primarily bi-directional PIPs
- TMR fault isolation with guard band regions
  - Guard bands isolate module components and routing
  - An SEU can cause errors in only one module

[Figure: Modules 1, 2, and 3 feeding a Majority Voter, shown once with isolated wire segments separated by a deactivated switch and once with Guard Bands separating the modules.]
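The voting step itself is a bitwise majority function. The sketch below is a generic illustration with made-up values; it shows both the case TMR is designed for (one faulty module) and the case the slide warns about (one fault that corrupts two modules).

```python
def majority(a, b, c):
    """Bitwise majority of three equal-width words."""
    return (a & b) | (b & c) | (a & c)

good = 0b10110010                  # arbitrary example module output
upset = good ^ 0b00001000          # the same output with one bit flipped

# One faulty module is outvoted by the other two.
print(majority(good, upset, good) == good)

# If a single fault corrupts two modules the same way, the wrong value wins.
print(majority(upset, upset, good) == good)
```

The second case is why isolation between modules matters: the voter can only mask a fault that stays confined to one module.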

Programmable Interconnect Points
- Break-point PIP
  - Connect or isolate 2 wire segments
- Cross-point PIP
  - 2 nets straight through
  - 1 net turns corner and/or fans out
- Compound cross-point PIP
  - Collection of 6 break-point PIPs
    - Can route to two isolated signal nets
- These bi-directional PIPs were a significant portion of routing resources in early FPGAs
  - Now less than 0.4% of routing resources
- Multiplexer PIP
  - Directional and buffered
  - Main routing resource in recent FPGAs
  - Select 1-of-N inputs for output
    - Buffer prevents some SEU effects
      - But not all (currently studying effects)

Guard Bands
- Guard bands reduce interaction of signals between modules
- 6 CLB wide GBs
  - Good isolation but big area overhead
- 1 CLB wide GBs
  - Some isolation
    - Turn off stub trimming to see used wire segment interaction
  - Still have problems
    - Long lines
      - Long lines use bi-directional PIPs
    - PACE controls logic but not routing
- CLB isolation for fault monitoring circuits

Fault Monitoring Circuit
- Located in guard band regions
- Compares outputs of adjacent working regions
  - Can be used to compare internal nodes
    - Earlier SEU detection than output alone
- Any mismatch implies SEU occurred
  - Count errors and/or take action
    - Scrub configuration memory
    - Activate SEU controller to locate/correct single bit errors
      - Failure indications point to frames to scan for errors

[Figure: guard band with fault monitor circuit between Module 1, Module 2, and Module 3. The monitor compares the output from region #1 with the output from region #2, uses PLBs for fault isolation, and sets an SR latch that raises an interrupt to the SEU controller.]
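The compare-and-latch behavior can be modeled in a few lines. This is a behavioral sketch only: the class name `FaultMonitor`, the mismatch counter, and the `clear` method are assumptions for the example, and the sampled values are invented.

```python
class FaultMonitor:
    """Compares two region outputs and latches any mismatch (SR-latch behavior)."""

    def __init__(self):
        self.interrupt = 0     # latch output, used as the interrupt request
        self.mismatches = 0    # count of mismatching samples seen so far

    def sample(self, out_region1, out_region2):
        """Compare one pair of outputs; a mismatch sets the latch."""
        if out_region1 != out_region2:
            self.interrupt = 1
            self.mismatches += 1

    def clear(self):
        """Reset the latch, for example after the SEU controller has run."""
        self.interrupt = 0


monitor = FaultMonitor()
for a, b in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    monitor.sample(a, b)

# The latch stays set even though the last sample matched again.
print(monitor.interrupt, monitor.mismatches)
monitor.clear()
print(monitor.interrupt)
```

Latching matters because a mismatch may last only a few clock cycles, while the controller that services the interrupt runs on a much slower schedule.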

Majority Voter for SEU Controller
- Adding XORs to majority voting circuit gives circular comparison of module outputs
  - Better diagnostic resolution for faulty modules to scan for SEU controller
  - Lower latency for locating/correcting SEUs

[Figure: majority voters on Out1, Out2, Out3 with added XOR comparators; example output patterns 1 1 0, 0 1 1, and 1 0 1, each marked with an X.]
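A sketch of the voter with circular XOR comparison is shown below for single-bit outputs. The decoding rule used here, that the suspect module is the one common to both mismatching comparisons, is an assumption about how the XOR outputs would be interpreted; the function name and return format are also illustrative.

```python
def vote_and_diagnose(out1, out2, out3):
    """Majority vote on three 1-bit outputs plus circular XOR comparison.

    Returns (voted_value, suspect_module), where suspect_module is 1, 2, 3,
    or None when all three outputs agree.
    """
    voted = (out1 & out2) | (out2 & out3) | (out1 & out3)
    x12 = out1 ^ out2          # compare module 1 with module 2
    x23 = out2 ^ out3          # compare module 2 with module 3
    x31 = out3 ^ out1          # compare module 3 with module 1
    if x12 and x31:
        suspect = 1            # module 1 disagrees with both neighbors
    elif x12 and x23:
        suspect = 2
    elif x23 and x31:
        suspect = 3
    else:
        suspect = None
    return voted, suspect

# The three output patterns shown in the slide figure.
for outputs in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]:
    print(outputs, vote_and_diagnose(*outputs))
```

Knowing which module disagreed narrows the region that has to be scanned, which is the latency benefit the slide refers to.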

Our Plan for Virtex-4 FPGAs
[Slide overlay labels: TMR, SEU, ECC]
- Configuration memory: 4.7M to 50.8M bits of RAM
- PLBs: 1,536 to 22,272
  - 4 LUTs (4-input)
  - 4 LUTs/RAMs (4-input)
  - 8 FF/latches
- Block RAMs: 24 to 276 32K-bit ECC RAMs (ECC only)
  - Also operate as FIFOs
- DSP cores: 32 to 512, each includes:
  - 18x18-bit multiplier
  - 48-bit adder & accumulator
- PowerPC processors: 0 to 2
  - Can't TMR PowerPCs!!
  - Use TMR Micro- or Pico-Blaze

AubieSat Plan
- Count, correct, and classify SEUs in an actual FPGA in space
  - Compare with sensor measurements
  - Determine if SEUs impact system function or not
    - Single bit correctable
      - With and without impact on system function
    - Double bit non-correctable
      - With and without impact on system function
  - Record and transmit SEU counts and types
- Tolerate/correct SEUs using various mechanisms
  - Use ECC functionality to count & correct SEUs
    - Configuration Memory (w/ SEU controller circuit)
    - Block RAMs in ECC mode
  - Monitor and count failure indications

AubieSat Plan
- Use TMR with guard bands for all other logic
  - Design SEU
  - Include fault monitoring circuits to detect/count SEUs
    - SEUs can occur in configuration memory & be counted twice
      - But only configuration memory ECC can correct SEUs
    - SEUs in system TMR flip-flops may be "flushed out" in time
    - Fault monitor failures indicate area for SEU controller scan
      - Reduces latency for detection & correction of SEU
- Include ability to download original configuration
  - To scrub memories in case of multiple non-correctable errors in configuration memory
    - Use "rad-hard" ROM to store configuration
  - May also periodically re-download to scrub memory

Summary
- Single Event Upsets (SEUs) in FPGAs
  - Serious problem
  - Everything controlled by configuration memory bits
- New architectural features provide indication of SEUs with ability to correct
  - SEU controller scan to detect and correct single bit errors
  - ECC Block RAM mode
- TMR with guard band regions in FPGAs
  - Isolate multiple working regions that contain functionally equivalent system functions
  - Fault monitoring circuits within guard bands
    - Compare working regions
    - Detects SEUs that could impact system operation
      - Take action when mismatch occurs