Would Error Correction Provide a Benefit in Classical Computers? · 2017. 9. 5. · Eli...

Would Error Correction Provide a Benefit in Classical Computers? 27 Jan 2012 Photons, Electrons, Bands Thomas Szkopek Canada Research Chair in Nanoscale Electronics Department of Electrical and Computer Engineering

Acknowledgements

Vwani Roychowdhury (collaborator), UCLA

Eli Yablonovitch (provocateur), UC Berkeley

Funding: [sponsor logos not captured in this transcript]

system reliability

ENIAC, 1946: 17,468 vacuum tubes, mean time between faults ~2 days

IBM BlueGene/L, 2006 (Lawrence Livermore National Laboratory): 131,072 processors, mean time between faults ~6 days
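The two system figures above can be turned into an implied per-component fault interval. The sketch below assumes independent components with identical, constant fault rates, so that the system fault rate is simply the component rate times the component count. That model is an assumption made here for illustration, not something stated on the slide.

```python
# Implied mean time between faults (MTBF) of a single component.
# Assumption (not from the slides): components fail independently at the
# same constant rate, so system_rate = n_components * component_rate.

def component_mtbf_years(n_components: int, system_mtbf_days: float) -> float:
    """Return the implied MTBF of one component, in years."""
    return n_components * system_mtbf_days / 365.25

eniac_tube = component_mtbf_years(17_468, 2.0)        # ENIAC vacuum tube
bluegene_cpu = component_mtbf_years(131_072, 6.0)     # BlueGene/L processor

print(f"ENIAC vacuum tube:    ~{eniac_tube:.0f} years between faults")
print(f"BlueGene/L processor: ~{bluegene_cpu:.0f} years between faults")
```

Under this simple model each component is individually very reliable; the short system-level interval comes from the sheer number of components.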

system reliability

“[with] current state‐of‐the‐art fault‐tolerance strategy, checkpoint/restart, for a 1 PFlop/s system… a computational job that could complete in 100 hours in a failure‐free environment will actually take 251 hours”

“While several [high-end computing] vendors are looking to address reliability at the hardware level, the costs are proving to be staggeringly high in both money and power.”

DeBardeleben et al., High‐End Computing Resilience: Analysis of Issues Facing the HEC Community and Path‐Forward for Research and Development, Los Alamos National Laboratory 2010, http://institute.lanl.gov/resilience/docs/

let’s look at the hardware level!

error correction: memory and communications

[Diagram: transmitter (write) → reliable encoding → channel (memory), acting as the identity and subject to errors → reliable decoding & error correction → receiver (read)]

• reliable encoding, decoding and error correcting hardware
• efficient, complex codes are used

error correction: computation

[Diagram: encoder (reliable encoding) → logic unit performing encoded logic, subject to errors → decoder (reliable decoding & error correction)]

• reliable encoding, decoding and error correcting hardware
• logic performed in code space (e.g. Reed-Muller codes)

D. Pradhan & S. Reddy, IEEE Trans. Comp. 21, 1331 (1972).

• however, it is likely that all hardware is equally (un)reliable

error correction: computation

[Diagram: chain of error correction logic blocks, with errors entering at every stage]

• errors occur in all hardware
• never decode bits or they will be corrupted; in other words:

all operations must be performed in protected code space!

protecting 1 bit : repetition

repetition code

“0” = 0 0 0 0 0

“1” = 1 1 1 1 1

error correction by majority vote

0 0 0 1 0 → 0 0 0 0 0

1 1 0 1 1 → 1 1 1 1 1

0 1 0 1 1 → 1 1 1 1 1

0 1 0 0 1 → 0 0 0 0 0

J. von Neumann, Lectures on Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, 1952.
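The majority-vote correction of the 5-bit repetition code can be sketched in a few lines. The function name `majority_decode` and the string representation of codewords are choices made here for illustration only; the four test words are the ones shown on the slide.

```python
def majority_decode(bits: str) -> str:
    """Correct a repetition codeword by majority vote.

    `bits` is a string of '0'/'1' characters; its length must be odd so
    that the vote cannot tie.
    """
    if len(bits) % 2 == 0:
        raise ValueError("use an odd number of bits so the vote cannot tie")
    ones = bits.count("1")
    winner = "1" if ones > len(bits) // 2 else "0"
    # Re-encode: every physical bit is reset to the winning value.
    return winner * len(bits)

for received in ["00010", "11011", "01011", "01001"]:
    print(received, "->", majority_decode(received))
```

The vote restores the codeword whenever fewer than half of the bits have flipped, and silently produces the wrong codeword otherwise.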

single bit flip: $p$

logical bit flip: $P = 60p^3 + \ldots$

[Plot: error rate versus $p$, comparing the unprotected rate $p$ with the logical rate $P = 60p^3$]

protecting 1 bit

[Diagram: network of majority gates; MAJ = majority vote]

If majority gates are error-free, then the majority voting process is error-free if <50% of input bits are in error.

protecting 1 bit

[Image: President Harry S. Truman]

[Diagram: network of majority gates, each subject to error with probability $p$; MAJ = majority vote]

If majority gates are error-prone, then the majority voting process is error-prone.

fault tolerant architecture

[Diagram: majority gate M, error correction, concatenation; copy the bits ×3, majority vote ×3]

Triplicate repetition code and fault-tolerant majority

PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.

fault tolerant architecture

PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.

error per majority gate: $p$

error with $L$ concatenations: $P \sim p_T\,(p/p_T)^{2^L}$, with threshold $p_T \sim 1/108$

bits with $L$ concatenations: $N \sim 9^L$ (with ancillae)

error rate versus bits: $P \sim p_T\,(p/p_T)^{N^{\log 2/\log 9}}$
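The exponent $N^{\log 2/\log 9}$ follows from eliminating $L$: with $N = 9^L$ bits, the error exponent $2^L$ equals $N^{\log 2/\log 9}$. A short numerical check of that identity is sketched below; the function names are illustrative, and no prefactor or threshold value is assumed.

```python
import math

def bits_needed(levels: int) -> int:
    """Physical bits after `levels` concatenations: N = 9**L."""
    return 9 ** levels

def error_exponent(levels: int) -> int:
    """Power of p in the logical error after `levels` concatenations: 2**L."""
    return 2 ** levels

SCALING = math.log(2) / math.log(9)   # about 0.315

for L in range(1, 6):
    N = bits_needed(L)
    # The exponent written as a function of N instead of L.
    exponent_from_bits = N ** SCALING
    print(f"L={L}  N={N:6d}  2^L={error_exponent(L):3d}  "
          f"N^(log2/log9)={exponent_from_bits:.6f}")
```

Because the exponent grows only as roughly $N^{0.315}$, the suppression of errors with redundancy is slower than exponential in $N$.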

protecting more than 1 bit?

Can universal logic operations be performed in code space? (difficulty lies in the parity bits)
Unknown. Best result is with an evolving Reed-Muller (RM) code space.

Can fault tolerant error correction be performed?
Unknown. Promising quantum computing results.

Is the overhead prohibitive?
Unknown.

[Diagram: error correction logic → error correction logic]

what about device physics?

complementary transistor inverter:

$N_{in}$ = input charge
$N_{out}$ = output charge
$N = CV/e$ = maximum charge
$G_n$ = n-channel conductance
$G_p$ = p-channel conductance

Assume sub-threshold conductance / thermionic emission through channels:

$G_p = G_0 \exp(-eV_{GS}/k_BT)$

$G_n = G_0 \exp(+eV_{GS}/k_BT)$

[Figure: transistor schematic with source, drain and gate voltage $V_{GS}$]

complementary logic

CNT inverter: Ph. Avouris, et al., Physica B 323 (2002) 6–14

Si nanowire inverter: D. Wang, et al., Small 2 (2006) 1153-8

ZnO nanowire inverter: S. Roy, et al., Nanotech 21 (2010) 245306

[Figure: transistor schematic with source, drain and gate voltage $V_{GS}$]

complementary logic

$N_{out} = \frac{N}{2}\,\frac{G_p - G_n}{G_p + G_n} = -\frac{N}{2}\tanh\!\left(\frac{N_{in}}{k_BTC/e^2}\right)$

information theoretic perspective:

single charge ~ physical bit

total charge ~ logical bit

signal restoration ~ majority vote
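The inverter transfer characteristic can be checked numerically. The sketch below assumes that charges are measured from the midpoint of the logic swing, that the gate voltage is $V_{GS} = eN_{in}/C$, and that the p-channel conductance falls while the n-channel conductance rises exponentially with input charge. These sign conventions are assumptions made here; the parameter `n_thermal` stands for $k_BTC/e^2$ and the numerical values are arbitrary.

```python
import math

def inverter_output(n_in: float, n_max: float, n_thermal: float) -> float:
    """Output charge of a complementary inverter, in electrons.

    n_in      : input charge, measured from the midpoint
    n_max     : maximum charge N = CV/e
    n_thermal : k_B*T*C/e**2, the thermal charge scale
    Sign convention (assumed): G_p ~ exp(-x), G_n ~ exp(+x).
    """
    x = n_in / n_thermal
    g_p = math.exp(-x)   # p-channel conductance in units of G_0
    g_n = math.exp(+x)   # n-channel conductance in units of G_0
    # Conductance divider: the output is pulled toward the rail
    # connected through the more conductive channel.
    return 0.5 * n_max * (g_p - g_n) / (g_p + g_n)

for n_in in (-10.0, -1.0, 0.0, 1.0, 10.0):
    exact = inverter_output(n_in, n_max=240.0, n_thermal=3.0)
    closed_form = -0.5 * 240.0 * math.tanh(n_in / 3.0)
    print(f"N_in={n_in:6.1f}  N_out={exact:9.3f}  -N/2*tanh={closed_form:9.3f}")
```

Small input deviations are amplified and large ones saturate at ±N/2, which is the signal restoration that the slide compares to a majority vote.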

metal-insulator transition in transistor channels:

complementary logic

universal NAND gate:

$\delta q^2 = k_BTC$

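complementary logic

[Plot: distribution $p(N_{in})$ versus $N_{in}$, with noise margins $N_M$ marked]

Probability of logical error:

$P \sim \frac{1}{2}\left[\frac{2}{\pi N \ln(1/\epsilon)}\right]^{1/2}\epsilon^{N}, \quad \epsilon = \exp(-eV/8k_BT)$

Error scales as an ideal majority vote of N electrons with an error p per electron:

$p = \epsilon^2/4$

logical error:

$P \approx \binom{N}{N/2}\,p^{N/2} \sim \left[\frac{2}{\pi N}\right]^{1/2}(4p)^{N/2}$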
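The ideal-majority-vote estimate rests on Stirling's approximation of the central binomial coefficient, $\binom{N}{N/2} \approx 2^N\sqrt{2/(\pi N)}$. The sketch below compares the dominant term (exactly half of the $N$ bits flipped) with the closed form $\sqrt{2/(\pi N)}\,(4p)^{N/2}$. The value $p = 0.01$ is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math

def leading_term(n: int, p: float) -> float:
    """Dominant failure term: exactly n/2 of n bits flipped (n even)."""
    return math.comb(n, n // 2) * p ** (n // 2)

def stirling_estimate(n: int, p: float) -> float:
    """Closed form sqrt(2/(pi*n)) * (4p)**(n/2)."""
    return math.sqrt(2.0 / (math.pi * n)) * (4.0 * p) ** (n / 2)

p = 0.01  # illustrative per-bit error rate (assumption)
for n in (34, 100, 240):
    exact = leading_term(n, p)
    approx = stirling_estimate(n, p)
    print(f"N={n:4d}  exact={exact:.3e}  estimate={approx:.3e}  "
          f"ratio={exact / approx:.4f}")
```

The ratio approaches 1 as $N$ grows, so the closed form captures the exponential suppression in $N$ for any $p < 1/4$.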

reliability and redundancy

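For each scheme: error rate per particle $p$, and logical error rate $P$ for $N$ particles.

1-bit architecture (sub-exponential suppression in N):
error rate per particle: $p$
logical error rate: $P \sim p_T\,(p/p_T)^{N^{\log 2/\log 9}}$, with threshold $p_T$

ideal majority vote (exponential suppression in N):
error rate per particle: $p$
logical error rate: $P \sim \left[\frac{2}{\pi N}\right]^{1/2}(4p)^{N/2}$

transistor logic circuit (exponential suppression in N):
error rate per particle: $p \sim \exp(-eV/4k_BT)$
logical error rate: $P \sim \left[\frac{1}{\pi N \ln(1/4p)}\right]^{1/2}(4p)^{N/2}$

ballistic gates (exponential suppression in N):
error rate per particle: $p \sim (\delta r/r)^2$
logical error rate: $P \sim \left[\frac{2}{\pi N}\right]^{1/2}(4p)^{N/2}$

[Illustration: corrupted bit strings 00010000 and 10111111]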

T. Szkopek et al PRL 106, 176801 (2011).

CMOS

International Technology Roadmap for Semiconductors, 2009 edition.

Parameter                               45nm node (2010)   21nm node (2015)   11.9nm node (2020)
L, gate length [nm]                     27                 17                 10.7
Cg, gate capacitance [aF]               19.7               10.0               4.0
V, operating voltage [V]                0.97               0.81               0.68
N, electrons per inverter gate          240                100                34
N, electrons per NAND gate              480                200                68
M, transistors/chip                     2.2×10^9           8.8×10^9           35×10^9
f, clock freq. [GHz]                    5.9                8.5                12.4
P, error probability at 1000 FITs       2×10^-29           4×10^-30           4×10^-31
P, error probability at 1 fault/year    2×10^-27           4×10^-28           7×10^-29

[Image: Intel 45nm strained-Si transistor, with source, drain and gate labelled]
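The electron counts and target error probabilities in the roadmap table can be roughly reproduced from the other rows. The sketch below rests on two assumptions that are inferred here, not stated on the slide: the inverter count is taken as $2C_gV/e$ (the factor 2 is chosen because it matches the tabulated values), and the error budget is spread over every transistor switching once per clock cycle. One FIT is one failure per $10^9$ hours.

```python
E_CHARGE = 1.602176634e-19            # elementary charge [C]
FIT_1000 = 1000 / (1e9 * 3600.0)      # 1000 FITs as faults per second
ONE_PER_YEAR = 1 / (365.25 * 24 * 3600.0)

# Rows copied from the roadmap table above.
nodes = {
    "45nm (2010)":   {"cg_aF": 19.7, "v": 0.97, "m": 2.2e9, "f_ghz": 5.9},
    "21nm (2015)":   {"cg_aF": 10.0, "v": 0.81, "m": 8.8e9, "f_ghz": 8.5},
    "11.9nm (2020)": {"cg_aF": 4.0,  "v": 0.68, "m": 35e9,  "f_ghz": 12.4},
}

def electrons_per_inverter(cg_aF: float, v: float) -> float:
    # Assumption: factor 2 inferred from the table, not given on the slide.
    return 2 * cg_aF * 1e-18 * v / E_CHARGE

def error_budget(faults_per_second: float, m: float, f_ghz: float) -> float:
    # Assumption: every transistor switches once per clock cycle.
    return faults_per_second / (m * f_ghz * 1e9)

results = {}
for name, d in nodes.items():
    n_inv = electrons_per_inverter(d["cg_aF"], d["v"])
    p_fit = error_budget(FIT_1000, d["m"], d["f_ghz"])
    p_year = error_budget(ONE_PER_YEAR, d["m"], d["f_ghz"])
    results[name] = (n_inv, p_fit, p_year)
    print(f"{name:14s} N_inv~{n_inv:6.1f}  P(1000 FIT)~{p_fit:.1e}  "
          f"P(1/year)~{p_year:.1e}")
```

The computed values agree with the tabulated ones to within rounding for most entries; the match is a consistency check under the stated assumptions, not a derivation of how the table was produced.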

error rate comparison

T. Szkopek et al PRL 106, 176801 (2011).

structural disorder in transistor structures will increase error rates

1 electron, $eV/k_BT$ = 0.97 eV / 26 meV

N = 30 at eV = 1.00 eV is equivalent to N = 3000 at eV = 10 meV
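The equivalence quoted above follows if the logical error depends on $N$ and $eV$ only through their product. The sketch below assumes the leading exponential dependence $\exp(-N\,eV/8k_BT)$ read from the logical-error slide and ignores the slowly varying prefactor; $k_BT$ = 26 meV is the room-temperature value used on this slide.

```python
K_B_T_EV = 0.026  # thermal energy k_B*T in eV (26 meV)

def suppression_exponent(n_electrons: float, energy_ev: float) -> float:
    """Exponent x in P ~ exp(-x), assuming x = N*eV / (8*k_B*T)."""
    return n_electrons * energy_ev / (8 * K_B_T_EV)

few_electrons_high_v = suppression_exponent(30, 1.00)
many_electrons_low_v = suppression_exponent(3000, 0.010)

print(f"N=30,   eV=1.00 eV : exponent = {few_electrons_high_v:.2f}")
print(f"N=3000, eV=10 meV  : exponent = {many_electrons_low_v:.2f}")
```

Both operating points give the same exponent, so lowering the voltage by a factor of 100 has to be paid for with 100 times as many electrons to hold the error rate fixed.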

conclusions

• physics of transistors provides protection against logical errors
• for 1-bit protection, it is better to prevent errors than to correct errors
• error correction with multiple-bit code protection is an open problem

thank you for your attention

classical computing with spin

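[Diagram: two spins J1 and J2 separated by a distance r; spin placement accurate to within δr]

magnetic moments interaction: $V \sim \mu^2/r^3$

interaction error: $\delta V \sim V\,\delta r/r$

rotation for distinguishable states: $\theta \sim Vt/\hbar$

rotation error: $\delta\theta \sim \delta r/r$

probability of erroneous spin flip!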

classical computing with spin

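spin 1/2

rotation error: $\delta\theta \sim \delta r/r$

Probability of error: $p \sim \frac{1}{4}\,\delta\theta^2$

spin j = N × 1/2

Probability of error: $p \sim \left[\frac{2}{\pi N}\right]^{1/2}\delta\theta^{N}$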

classical computing with spin

N × spin 1/2

Probability of single error: $p \sim \frac{1}{4}\,\delta\theta^2$

Majority vote on N spins: $P \approx \binom{N}{N/2}\,p^{N/2} \sim \left[\frac{2}{\pi N}\right]^{1/2}\delta\theta^{N}$

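complementary logic

[Plot: distribution $p(N_{in})$ versus $N_{in}$, with noise margins $N_M$ and fluctuation width $\delta q/e$ marked]

$\delta q^2 = k_BTC$

$\delta q^2 = k_BTC + T(\delta q^2)$

Local noise dominates when: $\delta q/e \ll N_M$

Growth of charge fluctuations / error is suppressed by transistor error correction.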