Mixed-Signal Techniques for Embedded Machine Learning Systems
Boris Murmann
June 16, 2019

Transcript of β€œMixed-Signal Techniques for Embedded Machine Learning Systems” (59 slides).

Page 1: Mixed-Signal Techniques for Embedded Machine Learning Systems

Boris Murmann
June 16, 2019

Page 2: Applications

Cloud vs. edge trade-offs: speed of response, bandwidth utilized, privacy, power consumed.

Conversational interfaces: …natural? …seamless? …real-time?

Page 3: Task Complexity, Memory and Classification Energy

[Chart: classification energy from pJ to J versus task complexity]

β€Ί Face detection (SVM)
β€Ί Digit recognition
β€Ί Image classification, 10 classes (binary CNN)
β€Ί Image classification, 1000 classes (MobileNet v1)
β€Ί Self-driving (ResNet-50 HD)

Page 4: Task Complexity, Memory and Classification Energy

[Same pJ-to-J chart, annotated β€œMy main interest”]

Page 5: Edge Inference System

[Block diagram: Sensors β†’ A/D β†’ Wake-up Detector β†’ Pre-Processing β†’ Deep Neural Network. Memory/compute hierarchy: DRAM β†’ SRAM β†’ Buffer β†’ PE Array β†’ Register File β†’ Compute (MAC)]

Page 6: Opportunities for Analog/Mixed-Signal Design

[Same system diagram with mixed-signal opportunities highlighted: analog feature extraction at the sensor interface, analog MAC in the compute datapath, and in-memory computing in the memory hierarchy]

Page 7: Outline

Data-Compressive Imager for Object Detection

β€Ί Omid-Zohoor & Young, TCSVT 2018 & ISSCC 2019

Mixed-Signal ConvNet

β€Ί Bankman, ISSCC 2018 & JSSC 2019

RRAM-based ConvNet with In-Memory Compute

β€Ί Ongoing work

Page 8: Wake-Up Detector with Hand-Crafted Features

[Diagram: Sensor signal β†’ A/D β†’ Feature Extractor β†’ Simple Classifier β†’ Label. A training algorithm with labeled training data configures the classifier. The A/D produces high-dimensional data (a data deluge); the feature extractor reduces it to a low-dimensional representation]

Page 9: Analog Feature Extractor

[Diagram: Sensor signal β†’ analog Feature Extractor β†’ A/D β†’ Simple Classification β†’ Label; only the low-dimensional representation is digitized]

Low-rate and/or low-resolution ADC

Low data-rate digital I/O

Reduced memory requirements

Page 10: Histogram of Oriented Gradients

Page 11: Analog Gradient Computation

[Pixel neighborhood: center pixel P with horizontal neighbors H1, H2 and vertical neighbors V1, V2]

Linear gradients:

G_H = H1 βˆ’ H2 (horizontal)
G_V = V1 βˆ’ V2 (vertical)

Bright patch: G_H = 400 mV βˆ’ 100 mV = 300 mV
Dark patch (same edge at 1/4 the illumination): G_H = (1/4)Β·400 mV βˆ’ (1/4)Β·100 mV = 75 mV
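To make the illumination dependence concrete, here is a minimal Python sketch of the linear-gradient computation above; the 400 mV / 100 mV values come from the slide, everything else is illustrative.

```python
# Minimal sketch of the linear gradient G_H = H1 - H2 (values in mV).
def linear_gradient(h1, h2):
    return h1 - h2

bright = linear_gradient(400.0, 100.0)        # 300 mV
dark = linear_gradient(400.0 / 4, 100.0 / 4)  # 75 mV
# The same scene edge produces a 4x smaller gradient at 1/4 the
# illumination, which is what forces a wide ADC dynamic range.
print(bright, dark)
```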

Page 12: High Dynamic Range Images

Gradient magnitude varies significantly across the image

Would require high-resolution ADCs (β‰₯ 9b) to digitize the computed gradients

β€Ί But we want to produce as little data as possible

Page 13: Ratio-Based (β€œLog”) Gradients

[Same pixel neighborhood: P with H1, H2, V1, V2]

G_H = H1 / H2 (horizontal)
G_V = V1 / V2 (vertical)

Under a uniform illumination scaling Ξ±: G_H = (Ξ± Γ— H1) / (Ξ± Γ— H2) = H1 / H2 β†’ illumination invariant

Page 14: Ratio Quantization

[Same pixel neighborhood: P with H1, H2, V1, V2]

G_H = Q(H1 / H2)
G_V = Q(V1 / V2)

The pixel ratio V_pix1 / V_pix2 is quantized directly into edge classes: ratio well below 1 β†’ negative edge, ratio near 1 β†’ no edge, ratio well above 1 β†’ positive edge (*empirically determined thresholds).

[Quantizer characteristics shown for a 1.5b (three-level) and a 2.75b version; threshold values shown include 1/4.2, 1/1.3, 1.3, and 4.2]
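A minimal sketch of the 1.5b ratio quantizer, assuming the threshold pair {1/1.3, 1.3} from the slide's empirically determined values (the exact assignment of thresholds to the 1.5b vs. 2.75b quantizers is not fully recoverable here):

```python
def quantize_ratio_1p5b(v1, v2, t=1.3):
    """Map a pixel ratio to {-1, 0, +1}: negative edge, no edge, positive edge."""
    r = v1 / v2
    if r > t:
        return +1          # positive edge
    if r < 1.0 / t:
        return -1          # negative edge
    return 0               # no edge

# Illumination invariance: scaling both pixels by alpha leaves the code unchanged.
for alpha in (1.0, 0.25):
    print(quantize_ratio_1p5b(alpha * 400.0, alpha * 100.0))  # +1 both times
```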

Page 15: HOG Feature Compression with 1.5b Gradients

Fewer angle bins

Quantized histogram magnitudes

25Γ— less data in the HOG features compared to the 8-bit image
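For illustration, a sketch of how a coarse HOG cell could be formed from 1.5b gradients; the cell size and bin count here are assumptions for the sketch, not the design's actual parameters.

```python
import numpy as np

def hog_cell(gh, gv, n_bins=4):
    """Coarse orientation histogram from 1.5b gradients gh, gv in {-1, 0, +1}."""
    mask = (gh != 0) | (gv != 0)              # skip "no edge" pixels
    ang = np.arctan2(gv[mask], gh[mask])      # only 8 distinct directions possible
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, 1)
    return hist                               # magnitudes can then be quantized coarsely

rng = np.random.default_rng(0)
gh = rng.choice([-1, 0, 1], size=(8, 8))      # assumed 8x8 cell
gv = rng.choice([-1, 0, 1], size=(8, 8))
print(hog_cell(gh, gv))
```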

Page 16: Log vs. Linear Gradients

[Side-by-side comparison of log and linear gradient images]

Page 17: Log vs. Linear Gradients

[Same comparison at reduced illumination]

Page 18: Log vs. Linear Gradients

[Same comparison at further reduced illumination]

Page 19: Prototype Chip

Page 20: Row Buffers with Pixel Binning (Image Pyramid)

Page 21: Ratio-to-Digital Converter (RDC)

Page 22: Data-Driven Spec Derivation

Circuit errors enter the computed ratio as (H1 + Ξ΅1) / (H2 + Ξ΅2).
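One way to read this slide: inject errors into the ratio and check their effect on the quantized decision. A hypothetical Monte Carlo sketch (the actual spec-derivation flow is not detailed on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def decision(h1, h2, t=1.3):
    r = h1 / h2
    return 1 if r > t else (-1 if r < 1.0 / t else 0)

h1, h2 = 0.135, 0.100        # volts; example pixel pair near the 1.3 threshold
sigma = 5e-3                 # assumed RMS circuit error epsilon
trials = 10_000
flips = sum(
    decision(h1 + rng.normal(0, sigma), h2 + rng.normal(0, sigma)) != decision(h1, h2)
    for _ in range(trials)
)
# Sweeping sigma against an application-level accuracy target would
# yield the tolerable epsilon, i.e., the circuit spec.
print(f"flip rate at sigma = {sigma * 1e3:.0f} mV: {flips / trials:.4f}")
```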

Page 23: Chip Summary

β€’ 0.13 Β΅m CIS 1P4M
β€’ 5 Β΅m 4T pixels
β€’ QVGA 320(V) Γ— 240(H)
β€’ 229 mW @ 30 FPS

Supply Voltages
β€’ Pixel: 2.5V
β€’ Analog: 1.5V, 2.5V
β€’ Digital: 0.9V

Page 24: Sample Images

Page 25:

Results using Deformable Parts Model detection & custom database (PascalRAW)

Page 26: Comparison to State of the Art

Page 27: Information Preservation

Page 28: Use Log Gradients as ConvNet Input?

Ongoing work; comparable performance using ResNet-10 (PascalRAW dataset)

Page 29: Can We Play Mixed-Signal Tricks in a ConvNet?

[Digital ConvNet fabric diagram]

Page 30: BinaryNet

Courbariaux et al., NIPS 2016

Weights and activations constrained to +1 and βˆ’1; multiplication becomes XNOR

Minimizes D/A and A/D overhead

Nice option for small/medium-size problems and mixed-signal exploration
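As a concrete illustration of the XNOR arithmetic (a standard BinaryNet identity, not code from the talk): with Β±1 values stored as bits, a dot product reduces to XNOR plus popcount.

```python
import numpy as np

def binary_dot(w_bits, x_bits):
    """Dot product of +/-1 vectors stored as {0,1} bits: XNOR + popcount."""
    n = w_bits.size
    xnor = ~(w_bits ^ x_bits) & 1      # 1 exactly where the +/-1 product is +1
    return 2 * int(xnor.sum()) - n     # (#agreements) - (#disagreements)

rng = np.random.default_rng(0)
w, x = rng.integers(0, 2, 1024), rng.integers(0, 2, 1024)
assert binary_dot(w, x) == int(((2 * w - 1) * (2 * x - 1)).sum())
```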

Page 31: Mixed-Signal Binary CNN Processor

Bankman et al., ISSCC 2018

1. Binary CNN with β€œCMOS-inspired” topology, engineered for minimal circuit-level path loading

2. Hardware architecture amortizes memory access across many computations, with all memory on chip (328 KB)

3. Energy-efficient switched-capacitor neuron for wide vector summation, replacing the digital adder tree

[Evaluated on CIFAR-10]

Page 32: Original BinaryNet Topology

Zhao et al., FPGA 2017

88.54% accuracy on CIFAR-10

1.67 MB weight memory (68% in FC layers)

27.9 mJ/classification on an FPGA

Page 33: Mixed-Signal BinaryNet Topology

Sacrificed accuracy for regularity and energy efficiency

86.05% accuracy on CIFAR-10

328 KB weight memory

3.8 Β΅J per classification

Page 34: Neuron

Page 35: NaΓ―ve Sequential Computation

Page 36: Weight-Stationary

[256-neuron array; reuse Γ—2 (north/south) and Γ—4 (neuron mux)]

Page 37: Weight-Stationary and Data-Parallel

[Inputs broadcast in parallel across the array]

Page 38: Complete Architecture

Page 39: Neuron Function

z = sgn( Ξ£_{i=0}^{1023} w_i x_i + b )

β€Ί w: filter weights (2Γ—2Γ—256); x: image patch (2Γ—2Γ—256); z: output
β€Ί b: filter bias, 9b sign-magnitude: b = (βˆ’1)^s Ξ£_{i=0}^{7} 2^i m_i
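A direct numerical rendering of the neuron function above; the weight, input, and bias values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=1024)    # binary filter weights (2x2x256)
x = rng.choice([-1, 1], size=1024)    # binary image patch (2x2x256)
s = rng.integers(0, 2)                # bias sign bit
m = rng.integers(0, 2, size=8)        # bias magnitude bits m_0..m_7
b = (-1) ** s * sum((2 ** i) * int(m[i]) for i in range(8))

z = np.sign(w @ x + b)                # z = sgn(sum_i w_i x_i + b)
print(int(z))                         # note: sgn(0) is left unresolved here
```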

Page 40: Switched-Capacitor Implementation

v_diff / V_DD = (C_e / C_tot) ( Ξ£_{i=0}^{1023} w_i x_i + b ), with b = (βˆ’1)^s Ξ£_{i=0}^{7} 2^i m_i

[Capacitor array: one bank computes weights Γ— inputs; another provides bias & offset calibration]

Batch normalization is folded into the weight signs and the bias.
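A behavioral sketch of the switched-capacitor sum. C_e = 1 fF and the 0.6 V neuron supply are from the slides; modeling C_tot as 1024 weight caps plus 255 binary-weighted bias caps with no parasitics is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
VNEU, C_e = 0.6, 1e-15                 # neuron supply and unit cap (from slides)
w = rng.choice([-1, 1], size=1024)
x = rng.choice([-1, 1], size=1024)
b = -37                                # some 9b sign-magnitude bias value
C_tot = (1024 + 255) * C_e             # assumed: unit caps only, no parasitics
v_diff = VNEU * (C_e / C_tot) * (w @ x + b)
print(f"v_diff = {v_diff * 1e3:.2f} mV")   # the comparator's sign of this gives z
```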

Page 41: Behavioral Simulations

Significant margin in noise, offset, and mismatch (V_FS = 460 mV):

β€Ί C_e = 1 fF
β€Ί v_os = 4.6 mV RMS
β€Ί v_n = 460 Β΅V RMS

Page 42: β€œMemory-Cell-Like” Processing Element

β€Ί 1 fF metal-oxide-metal fringe capacitor
β€Ί Standard-cell-based, 42 transistors
β€Ί 24,107 FΒ²

Page 43: Die Photo

β€Ί TSMC 28nm HPL 1P8M
β€Ί 6 mmΒ² area
β€Ί 328 KB SRAM
β€Ί 10 MHz clock

Supply Voltages
β€Ί VDD (digital logic): 0.6V – 1.0V
β€Ί VMEM (SRAM): 0.53V – 1.0V
β€Ί VNEU (neuron array): 0.6V
β€Ί VCOMP (comparators): 0.8V

Page 44: Measured Classification Accuracy

β€Ί 10 chips, 180 runs each through the 10,000 CIFAR-10 test images
β€Ί VDD = 0.8V, VMEM = 0.8V
β€Ί 3.8 Β΅J/classification
β€Ί 237 FPS, 899 Β΅W
β€Ί 0.43 Β΅J in 1.8V I/O
β€Ί Mean accuracy ΞΌ = 86.05% (Οƒ = 0.40%), same as the perfect digital model

Page 45: Comparison to Synthesized Digital

[Side-by-side: synthesized digital BinarEye (Moons et al., CICC 2018) vs. this mixed-signal design]

Page 46: Digital vs. Mixed-Signal Binary CNN Processor

[Table: energy per classification at 86.05% CIFAR-10 accuracy for three designs: synthesized digital (Moons et al., CICC 2018), hand-designed digital (projected), and mixed-signal (Bankman et al., ISSCC 2018)]

Page 47: CIFAR-10 Energy vs. Accuracy*

Neuromorphic β€Ί [1] TrueNorth, Esser, PNAS 2016
GPU β€Ί [2] Zhao, FPGA 2017
FPGA β€Ί [2] Zhao, FPGA 2017; [3] Umuroglu, FPGA 2017
MCU β€Ί [4] CMSIS-NN, Lai, arXiv 2018
Memory-like, mixed-signal β€Ί [5] Bankman, ISSCC 2018
BinarEye, digital β€Ί [6] Moons, CICC 2018
In-memory, mixed-signal β€Ί [7] Jia, arXiv 2018

*Energy excludes off-chip DRAM

Page 48: Limitations of Mixed-Signal BinaryNet

Poor programmability

Relatively limited accuracy (even on CIFAR-10) due to 1b arithmetic

Energy advantage over customized digital is not revolutionary
β€Ί Same SRAM, essentially the same dataflow

Need a more β€œanalog” memory system to unleash larger gains
β€Ί In-memory computing

Page 49: BinaryNet Synapse versus Resistive RAM

BinaryNet synapse:
β€’ 0.93 fJ per 1b-MAC in 28 nm
β€’ 24,107 FΒ²
β€’ Single-bit

Resistive RAM:
β€’ Energy: TBD
β€’ 25 FΒ²
β€’ Multi-bit (?)

Page 50: Matrix-Vector Multiplication with Resistive Cells

[Crossbar figure after Tsai, 2018]

Typically two cells per weight are used to achieve positive/negative weights (other schemes possible).
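A numerical sketch of the differential-pair scheme: each signed weight maps to a positive and a negative conductance, and the column current difference realizes the signed product. The conductance scale is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(4, 8))       # target signed weights
G_max = 100e-6                            # assumed max cell conductance (S)
G_pos = np.where(W > 0,  W, 0) * G_max    # positive-weight cells
G_neg = np.where(W < 0, -W, 0) * G_max    # negative-weight cells

v = rng.uniform(0, 0.2, size=8)           # input voltages on the rows
i_out = G_pos @ v - G_neg @ v             # differential column currents
print(np.allclose(i_out / G_max, W @ v))  # True: currents encode W @ v
```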

Page 51: RRAM Density – BinaryNet Example

[Array: 1024 rows Γ— 256 columns, Γ—2 for the positive/negative cell pairs]

β€Ί 25 FΒ² cell @ F = 90 nm β†’ side length s = 0.45 Β΅m
β€Ί 2 Γ— 1024 Γ— 0.45 Β΅m Γ— 256 Γ— 0.45 Β΅m = 0.106 mmΒ² per layer
β€Ί 0.84 mmΒ² for the complete 8-layer network
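A quick check of the area arithmetic (cell size and array dimensions from the slide; treating all eight layers as the same size slightly overestimates the slide's 0.84 mmΒ² total):

```python
F = 90e-9                              # feature size
s = 25 ** 0.5 * F                      # 25 F^2 cell -> side = 5F = 0.45 um
area_layer = 2 * 1024 * s * 256 * s    # two cells per weight (pos/neg pair)
print(s * 1e6)                         # 0.45 (um)
print(area_layer * 1e6)                # ~0.106 (mm^2 per layer)
print(8 * area_layer * 1e6)            # ~0.85 (mm^2; slide quotes 0.84)
```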

Page 52: Ongoing Research

What is the best architecture?

How many levels can be stored in each cell?

What is the most efficient readout?

Can we cope with nonidealities using training techniques?

Page 53:

(Content deleted for online distribution)

Page 54:

(Content deleted for online distribution)

Page 55:

(Content deleted for online distribution)

Page 56:

(Content deleted for online distribution)

Page 57: VGG-7 Experiment (4.8 Million Parameters)

[Network: six 3Γ—3 conv layers with kernel shapes 3Γ—3Γ—3, 3Γ—3Γ—128, 3Γ—3Γ—128, 3Γ—3Γ—256, 3Γ—3Γ—256, 3Γ—3Γ—512, followed by an 8192 β†’ 10 fully connected layer]

2-bit weights, 2-bit activations

Accuracy on CIFAR-10:
β€Ί 2b quantization only: 93%
β€Ί 2b quantization + RRAM/ADC model: 92%

Work in progress!
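For reference, a minimal 4-level (2-bit) uniform quantizer of the kind such experiments use; the experiment's actual levels, clipping, and training method are not given on the slide.

```python
import numpy as np

def quantize_2b(x, x_max=1.0):
    """Snap x to 4 uniform levels in [-x_max, x_max]."""
    levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0]) * x_max
    idx = np.abs(np.asarray(x)[..., None] - levels).argmin(axis=-1)
    return levels[idx]

w = np.random.default_rng(0).normal(0, 0.5, size=(3, 3))
print(quantize_2b(np.clip(w, -1, 1)))
```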

Page 58: Energy Model for Column in Conv6 Layer

Optimal energy/MAC = 0.38 fJ

Page 59: Summary

Analog feature extraction is attractive for wake-up detectors

Adding analog compute in ConvNets can be beneficial when it simultaneously lets us reduce data movement
β€Ί In-memory analog compute looks most promising
β€Ί Can consider SRAM or emerging memories (e.g., RRAM)

Expect significant progress as more application drivers for β€œmachine learning at the edge” emerge