Neural-inspired computing algorithms and hardware for image...


Transcript of Neural-inspired computing algorithms and hardware for image...


    Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4198 C

    1

    Conrad D. James, Ph.D. Sandia National Laboratories

    The Salishan Conference on High-Speed Computing, April 2017

    Neural-inspired computing algorithms and hardware for image analysis and cybersecurity applications

  • Acknowledgments

Partnerships: David Follett, Duncan Townsend (Lewis Rhodes Labs); Pamela Follett; Isaac Richter, Engin Ipek (U. Rochester); Felix Wang (UIUC); David Lidsky, Marek Osinski (UNM)

    2

Algorithms: Brad Aimone, Ojas Parekh, Nadine Miner, Sandra Faust, Steve Verzi, Frances Chance, Tu-Thach Quach, Chris Lamb, Meghan Galiardi, Samuel Mulder, William Severa, Kristofor Carlson, Michael Smith, Cynthia Phillips, Jacob Hobbs, Robert Abbott, Jeffrey Piersol

Architecture: John Naegle, Alex Hsia, Sapan Agarwal, Craig Vineyard, Fred Rothganger, Jonathon Donaldson, Gabriel Popoola, Aaron Hill

Learning Hardware: Matt Marinella, Thomas Beechem, Ron Goeke, Alec Talin, Paul Kotula, Farid El Gabaly, Elliot Fuller, Jim Stevens, David Hughart, Andy Armstrong, David Henry, Gaadi Haase, Steve Wolfley, Seth Decker, Christopher Saltonstall, Jamison Wagner, John Niroula, Derek Wilke, Michael Van Heukelom, Patrick Finnegan, Carl Smith, Robin Jacobs-Gedrim

Modeling & Applications: Steve Plimpton, Justin Doak, Richard Schiek, Brian Tierney, Robert Bondi, Harry Hjalmarson, Tim Draelos, Jonathan Cox, Joe Ingram, Jason Wheeler

    MESA

  • 3

Data-driven computing (machine learning) is necessary for real-world problems

    C. Lampert, VRML 2013

(Figure: data-driven computing vs. conventional numerical computing.)

    yann.lecun.com

  • 4

    Data-driven (neural-inspired) computing has a complicated history…and mixed results

    James et al., BICA 2017, 19, 49

  • 5

    Neural-inspired algorithms are achieving success but several challenges remain

Karpathy et al., NIPS 2014, 1889; Kemelmacher et al., CVPR 2016

  • Neural computing at Sandia Labs leverages a large research foundation

    It spans four areas: Advanced Scientific & Data-Driven Computing; Deployable National Security Applications; Computing & Information Science; and Nanodevices & µSystems.

    Contributing capabilities: computational neuroscience, algorithms theory, enabling hardware, neural machine-learning algorithms, formal neural computing theory, configurable CMOS neural architectures, adaptive post-CMOS neural architectures, UQ of neural algorithms, modeling & simulation, UQ/SA, Beyond Moore computing devices, micro-sensors, non-von Neumann architectures, MESA, neural-inspired communication, adaptive memory management, robust machine learning, cybersecurity, embedded pattern recognition systems, smart sensor technology, machine learning, and CMOS & BEOL technology.

    6

  • Hardware Acceleration of Adaptive Neural Algorithms (HAANA)

Neurogenesis deep learning: Draelos et al., IJCNN 2017; spiking network algorithms: Severa et al., ICRC 2016; digital neuromorphic architecture: Smith et al., IJCNN 2017

Resistive switching model: Mickel et al., Adv Mater 2014; electrochemical transistor: Fuller et al., Adv Mater 2016; resistive crossbar accelerator: Agarwal et al., IJCNN 2016

    7

  • Translating neuroscience into the next generation of computing – Neural Machine learning (NML)

    8

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 9

    Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349

Example cyber problem: file identification using deep neural networks

File types: *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html

  • Limitations of supervised machine learning

    10

Three features: power spectral density (PSD), byte value, entropy.

Confusion matrix, actual class (rows) vs. predicted class (columns), over HTML, PNG, JPEG, GIF, PDF, DOC, ELF, GZIP, AES. Correct-classification rates on the diagonal: HTML 100, PNG 91, JPEG 99, GIF 100, PDF 95, DOC 99, ELF 100, GZIP 100, AES 95; the remaining percentages fall in a few off-diagonal cells.

    Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349

• Supervised learning requires subject-matter experts to hand-craft features

• Data-driven algorithms are limited… by the data (the hand-crafted features are sketched below)
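For illustration only (not code from the paper), a minimal Python sketch of how the three hand-crafted features above — byte-value distribution, entropy, and power spectral density — might be computed for a file fragment; the function name and bin count are hypothetical.

```python
import numpy as np

def fragment_features(fragment: bytes, psd_bins: int = 32):
    """Hand-crafted features for a file fragment: byte-value histogram,
    Shannon entropy, and a coarse power spectral density (PSD)."""
    b = np.frombuffer(fragment, dtype=np.uint8)
    x = b.astype(np.float64)

    # Byte-value histogram, normalized to a probability distribution
    hist = np.bincount(b, minlength=256) / max(len(b), 1)

    # Shannon entropy of the byte distribution (bits per byte)
    p = hist[hist > 0]
    entropy = float(-(p * np.log2(p)).sum())

    # Coarse PSD of the byte sequence (binned magnitude-squared spectrum)
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    edges = np.linspace(0, len(spectrum), psd_bins, endpoint=False, dtype=int)
    psd = np.add.reduceat(spectrum, edges)

    return np.concatenate([hist, [entropy], psd])
```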

  • Translating neuroscience into the next generation of computing

    11

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 12

Leveraging computational neuroscience models to develop new algorithms

(Hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1; image: wikipedia. Similar EC inputs; increased sparsity leads to lossless decorrelation of DG outputs.)

Vineyard et al., IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727884

  • 13

    Modeling the pattern separation function of the hippocampus

    Severa et al., Neural Computation 2017, 29, 94

(Similar EC inputs; increased sparsity in the dentate gyrus leads to decorrelated DG outputs.)

  • 14

    Quantifying the sparsity transformation in the hippocampus

    Severa et al., Neural Computation 2017, 29, 94

Increased sampling from input to output → increased sparsity

    This approach is feasible for well-defined inputs – need to formalize an algorithm to account for unknown inputs

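As a toy illustration of this sparsifying transformation (not the published model), the sketch below projects a dense input into a larger population and keeps only the top-k responses; the population size, k, and the winner-take-all rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(x, n_out=2000, k=50, W=None):
    """Project a dense input into a larger population and keep only the
    k most active units (winner-take-all), mimicking EC -> DG expansion."""
    if W is None:
        W = rng.standard_normal((n_out, x.size))
    a = W @ x
    out = np.zeros(n_out)
    idx = np.argpartition(a, -k)[-k:]      # indices of the k largest responses
    out[idx] = a[idx]
    return out, W

# Two similar (correlated) inputs; the sparse outputs are typically less correlated
x1 = rng.standard_normal(200)
x2 = x1 + 0.3 * rng.standard_normal(200)
y1, W = sparsify(x1)
y2, _ = sparsify(x2, W=W)
print(np.corrcoef(x1, x2)[0, 1], np.corrcoef(y1, y2)[0, 1])
```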

  • 15

    Algorithm inspiration: neurogenesis in biological networks

    Dieni et al, Nature Comm 2016, 7:11313

    Adult neurogenesis improves information capacity…?

(Hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1; image: wikipedia.)

  • 16

    Severa et al., Neural Computation 2017, 29, 94

    Neurogenesis may provide flexible encoding strategies for particular brain regions

    Aimone, Deng and Gage, Neuron 2011, 70, 589

(Figure: mature vs. immature neurons; information space at large p and at intermediate p.)

  • Translating neuroscience into the next generation of computing

    17

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

• Data-driven computing methods are limited… by data

(Figure: stacked autoencoder with an encoding/classification path and a decoding/reconstruction path; layer sizes 784, 300, 250, 200; handwritten digits 0-9.)

    Draelos et al, ICLR 2016, IJCNN 2017

  • “Neurogenic deep learning” enables adaptation to changing data

    (Figure: the same encoding/classification and decoding/reconstruction network, layer sizes 784, 300, 250, 200, with digit classes introduced incrementally: 1 and 7, then 0, 2, 3, 4, 5, 6, 8, 9.)

    Draelos et al, ICLR 2016, IJCNN 2017
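The neurogenesis deep learning algorithm (Draelos et al.) also prescribes how new units are trained and stabilized; the sketch below shows only the growth step — widening one autoencoder layer while preserving existing weights — with hypothetical function names and PyTorch as an illustrative framework.

```python
import torch
import torch.nn as nn

def add_neurons(encoder: nn.Linear, decoder: nn.Linear, n_new: int):
    """'Neurogenesis' sketch: widen one hidden layer by n_new units.
    Existing weights are preserved; new rows/columns start small and random."""
    old_out, old_in = encoder.weight.shape
    new_enc = nn.Linear(old_in, old_out + n_new)
    new_dec = nn.Linear(old_out + n_new, decoder.out_features)
    with torch.no_grad():
        new_enc.weight[:old_out] = encoder.weight
        new_enc.bias[:old_out] = encoder.bias
        new_enc.weight[old_out:] *= 0.01          # small random init for new units
        new_dec.weight[:, :old_out] = decoder.weight
        new_dec.bias[:] = decoder.bias
        new_dec.weight[:, old_out:] *= 0.01
    return new_enc, new_dec

# Example with the slide's first layer sizes: encoder 784 -> 300, decoder 300 -> 784
enc, dec = nn.Linear(784, 300), nn.Linear(300, 784)
enc, dec = add_neurons(enc, dec, n_new=20)        # grow when new classes arrive
```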

  • Translating neuroscience into the next generation of computing

    20

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 21

    Categorizing cyber data under imperfect conditions with sparse coding

• Training data is not always available and is often fragmented

    • Limited expertise; hand-engineered features

Fragment ≈ Learned Dictionary ∗ Sparse Representation (fragments drawn from *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html files)

Sparse coding: $y \approx Ax$, with $\min_x \|y - Ax\|^2 + S(x)$
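A minimal sketch (a scikit-learn stand-in, not the authors' implementation) of learning a dictionary A and sparse codes x for fragment patches; the patch size, dictionary size, and sparsity level are hypothetical.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical data: each row is a byte-patch vector extracted from a file fragment
patches = np.random.rand(1000, 64)

# Learn an overcomplete dictionary A and sparse codes x such that y ~ A x
dico = DictionaryLearning(n_components=256,
                          transform_algorithm='omp',
                          transform_n_nonzero_coefs=8)
codes = dico.fit(patches).transform(patches)   # sparse representations (local features)
A = dico.components_                           # learned dictionary atoms

# Reconstruction check: patches are approximated by codes @ A
err = np.linalg.norm(patches - codes @ A) / np.linalg.norm(patches)
```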

  • 22

    Wang et al., in preparation

    Generating local and global features from file fragments

    • Byte dictionary patches & sparse representations of fragments – local features

Graves et al., ASRU 2013; Hochreiter & Schmidhuber, Neural Comp 1997

• Long short-term memory (LSTM) networks are used to capture long-range correlations – global features (see the sketch below)
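A sketch of this second stage with hypothetical dimensions (PyTorch as an illustrative framework): an LSTM consumes the sequence of per-patch sparse codes from a fragment and emits a fragment-level prediction over the nine file types.

```python
import torch
import torch.nn as nn

class FragmentLSTM(nn.Module):
    """Sketch: an LSTM reads the sequence of sparse codes (local features)
    and produces a fragment-level (global) class prediction."""
    def __init__(self, code_dim=256, hidden=128, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(code_dim, hidden, batch_first=True)
        self.classify = nn.Linear(hidden, n_classes)

    def forward(self, codes):              # codes: (batch, n_patches, code_dim)
        _, (h, _) = self.lstm(codes)
        return self.classify(h[-1])        # logits over the nine file types

model = FragmentLSTM()
logits = model(torch.randn(4, 32, 256))    # 4 fragments, 32 patches each
```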

  • 23

    Sparse dictionary learning and LSTM networks for file fragment ID – compared to SVM

• Averaged F1 score = 53.12%

• NLP (SVM) approach achieved 49.1 ± 3.15% (Fitzgerald et al., DI 2012)

    Wang et al., in preparation

  • Translating neuroscience into the next generation of computing

    24

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • Optimization of algorithm performance

Neural algorithm operations are computationally expensive (energy and time) due to training; many matrix-vector operations

    25

Sparse coding: $\min_x \|y - Ax\|^2 + S(x)$

Backpropagation: $\Delta_j = \sum_k \delta_k\, w_{jk}$

Make better/smarter algorithms: Lee et al., Proc World Cong Eng Comp Sci 2013

Hardware-accelerate algorithms: compute convolutions via the Fourier transform, $f * g = F^{-1}\{F\{f\} \cdot F\{g\}\}$, using the DFT kernel $\frac{1}{N}\sum_{k=0}^{N-1} \omega^{jk}$
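A small numerical check of the convolution theorem cited above: circular convolution computed directly in O(N²) agrees with the O(N log N) FFT route.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
f, g = rng.standard_normal(N), rng.standard_normal(N)

# Direct circular convolution: O(N^2) multiply-adds
direct = np.zeros(N)
for k in range(N):
    for n in range(N):
        direct[k] += f[n] * g[(k - n) % N]

# Convolution theorem: f * g = F^-1{ F{f} . F{g} }, O(N log N)
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

print(np.allclose(direct, via_fft))   # True
```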

  • 26

    Hardware acceleration of spiking algorithms for time-dependent data processing

Example: liquid state machine (LSM); a tool for data transformation; randomly connected spiking neurons encode complex temporal dynamics

    Maass et al, Neural Comp 2002, 14, 2531

    Spiking algorithms are often inefficient on conventional hardware…

  • Hardware acceleration of algorithm operations

    27

Ovtcharov et al. (Microsoft), FPGA acceleration of CNNs, 2015

Gokmen & Vlasov (IBM), resistive crossbar acceleration of DNNs, 2016

Coates et al. (Nvidia, Stanford), deep learning with GPUs, 2013

Gokhale et al. (Purdue), nn-X for accelerating DNNs with ARMs, 2013

  • 28

    Spiking Temporal Processing Unit (STPU)

Impart complex temporal dynamics into neural networks (figure: synaptic inputs arriving at times t = t1 and t = t2)

    Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017
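A functional sketch of the temporal-synapse idea (not the STPU microarchitecture): each synapse carries a weight and a programmable delay, so a presynaptic spike is delivered to its target some number of time steps later. Class and parameter names are illustrative.

```python
import numpy as np

class DelayedSynapse:
    """A synapse with a weight and a programmable delay (delay < horizon):
    a spike registered at time t is delivered to the target at t + delay."""
    def __init__(self, weight, delay, horizon=64):
        self.w, self.delay = weight, delay
        self.queue = np.zeros(horizon)     # circular buffer of future inputs
        self.t = 0

    def spike(self):
        """Register a presynaptic spike now; it lands `delay` steps later."""
        self.queue[(self.t + self.delay) % len(self.queue)] += self.w

    def step(self):
        """Advance one time step and return the input delivered to the target."""
        out = self.queue[self.t % len(self.queue)]
        self.queue[self.t % len(self.queue)] = 0.0
        self.t += 1
        return out

syn = DelayedSynapse(weight=0.5, delay=3)
syn.spike()                                   # presynaptic spike at t = 0
outputs = [syn.step() for _ in range(5)]      # 0.5 is delivered at t = 3
```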

  • 29

    Emulation of a LSM mapped onto an STPU architecture

• Test data: spoken digits (0-9)

• Implement the liquid on the STPU

• Use a classifier to categorize the spoken digits

    Assemble an array of LIF neurons and combine with a synaptic map W

    Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017

Linear readout performance for four liquid configurations:

Linear model        | 3x3x15 | 5x5x5 | 4x5x10 | 2x2x20
Linear SVM          | 0.906  | 0.900 | 0.900  | 0.914
LDA                 | 0.921  | 0.922 | 0.922  | 0.946
Ridge regression    | 0.745  | 0.717 | 0.717  | 0.897
Logistic regression | 0.431  | 0.254 | 0.254  | 0.815
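A functional sketch of the liquid (not the STPU implementation): a randomly and sparsely connected pool of leaky integrate-and-fire (LIF) neurons driven by an input stream; the state it produces would then be fed to one of the linear readouts in the table above. Sizes, time constants, and the 13-channel input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500                                  # LIF neurons, time steps
W = 0.1 * rng.standard_normal((N, N)) * (rng.random((N, N)) < 0.1)  # sparse synaptic map
W_in = rng.standard_normal((N, 13))              # input weights (e.g., 13 audio channels)

def liquid(inputs, tau=20.0, v_th=1.0):
    """Drive the randomly connected LIF reservoir; return its spike-count state."""
    v = np.zeros(N)
    spikes = np.zeros(N)
    counts = np.zeros(N)
    for t in range(inputs.shape[0]):
        v += (-v + W @ spikes + W_in @ inputs[t]) / tau   # leaky integration
        spikes = (v >= v_th).astype(float)                # threshold crossing
        v[spikes > 0] = 0.0                               # reset after a spike
        counts += spikes
    return counts                                # liquid state for a linear classifier

state = liquid(np.abs(rng.standard_normal((T, 13))))
```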

  • 30

    Implementing synaptic connections in hardware for non-spiking neural algorithms

    Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298

(Figure: 3×3 resistive crossbar with conductances $G_{1,1} \dots G_{3,3}$, row voltages $V_1, V_2, V_3$, and column currents $I_j = \sum_i V_i G_{i,j}$; network weights map onto conductances as $w_{ij} = G_{ij} = (R_{ij})^{-1}$.)

    Agarwal et al., Front. Neuroscience 2016, 9, 484

Use variable resistors to implement neural network weights in hardware – reduces energy from O(n³) to O(n²) (a toy numerical view follows)
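A toy numerical view of the analog multiply described above (illustrative values, not device data): weights are stored as conductances, and the vector-matrix product falls out of Ohm's and Kirchhoff's laws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights stored as conductances: w_ij = G_ij = 1/R_ij (illustrative values, siemens)
G = rng.uniform(1e-6, 1e-4, size=(3, 3))

# Input vector applied as row voltages (volts)
V = np.array([0.2, 0.1, 0.3])

# Each column sums its currents (Kirchhoff's current law): I_j = sum_i V_i * G_ij
I = G.T @ V
print(I)   # the vector-matrix product, computed by the array physics
```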

  • Designing, modeling, and fabricating devices with improved neural computing characteristics

    31

Fuller et al., Adv Mater 2016, 10.1002/adma.201604310; van de Burgt et al., Nat Materials 2017, 10.1038/nmat4856

(I-V curve: SET and RESET transitions, with VREAD, VSET, and VRESET marked.)

Mickel et al., Adv Mater 2014, 26, 4486; Landon et al., APL 2015, 107, 023108

Filament surface temperature (Ts):

$$T_s = T_{RT} + \frac{\sigma V^2 d_E}{2 k_E d_o \left(1 - \frac{k_E}{k_F}\,\frac{r_F^2}{4 d_E d_o}\right)}$$

(Schematic: Ta/TaOx/Pt device filament; conductance G vs. time t.)

• Model hardware acceleration to assess the impact on algorithm performance

    Python wrapper

    32

(Figure: file categorization accuracy, scale 0-98%, as a function of read noise σRN and write noise σWN for linear, asymmetric (ν = 1), asymmetric (ν = 5), and symmetric (ν = 5) device models; classes: aes, elf, html, pdf, gzip, jpeg, gif, png, doc.)

    Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298
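One simple way (an assumption on my part, not necessarily the simulator behind the figure) to model the device non-idealities being swept above — read noise, write noise, and asymmetric update nonlinearity ν — when applying training updates to a conductance.

```python
import numpy as np

rng = np.random.default_rng(0)

def device_update(g, dg, nu=5.0, sigma_wn=0.02, g_min=0.0, g_max=1.0):
    """Apply a weight update to a conductance with asymmetric nonlinearity (nu)
    and write noise (sigma_wn); the conductance is clipped to its range."""
    span = g_max - g_min
    if dg >= 0:                                   # potentiation saturates near g_max
        step = dg * np.exp(-nu * (g - g_min) / span)
    else:                                         # depression saturates near g_min
        step = dg * np.exp(-nu * (g_max - g) / span)
    step += sigma_wn * span * rng.standard_normal()
    return np.clip(g + step, g_min, g_max)

def device_read(g, sigma_rn=0.02):
    """Read the conductance with additive read noise."""
    return g + sigma_rn * rng.standard_normal()
```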

  • Translating neuroscience into the next generation of computing

    33

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • Thanks for your time! Questions?

    34

  • Backup Slides

    35

  • Applications in imaging and cybersecurity

    36

(Figure: streams of raw binary data.)

• The real world is filled with massive amounts of data

• Data needs to be filtered to capture relevant information

• Signatures or features need to be extracted from data

• Features can then be used to interpret activities

data → information → features → interpretation (example: ‘68899’ → ‘zipcode’)

  • Motivation: determine the local velocity in a flow field

    SNN algorithms are highly parallel and can leverage the time/neuron tradeoff

    Neural algorithms can match or best traditional ‘big O’

    Severa et al, ICRC 2016; DOI: 10.1109/ICRC.2016.7738681

    Spiking network algorithm for computing cross-correlations

    37
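For reference, the quantity the spiking network computes is the cross-correlation between two signals, whose peak lag gives the local shift (velocity) estimate. The numpy sketch below is only the conventional computation, not the spiking formulation in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(200)
b = np.roll(a, 7) + 0.1 * rng.standard_normal(200)   # b is (roughly) a delayed by 7 samples

# Cross-correlation; the peak recovers the offset between the two signals
xcorr = np.correlate(a - a.mean(), b - b.mean(), mode='full')
shift = abs(np.argmax(xcorr) - (len(b) - 1))
print(shift)   # expected: 7
```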

  • Trading neurons for time and vice versa

O(n²) neurons, constant time; O(n) neurons, O(n) time

    38

  • 39

    Matrix operations are at the core of many neural computing operations

Naïve algorithm for matrix multiplication is O(N³).

Backpropagation: $\Delta_j = \sum_k \delta_k\, w_{jk}$. Graph analysis.

  • 40

    Strassen matrix multiplication

Standard: 8 multiplications, 4 additions → O(N³); Strassen: 7 multiplications, 18 additions/subtractions → O(N^{2+ε})

    Strassen, Num Math, 1969
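A compact sketch of Strassen's recursion for power-of-two sizes (falling back to the standard product for small blocks); this is the textbook formulation, not a tuned implementation.

```python
import numpy as np

def strassen(A, B):
    """Strassen multiply for square matrices whose size is a power of two:
    7 recursive block multiplications instead of 8 -> O(N^log2(7)) ~ O(N^2.81)."""
    n = A.shape[0]
    if n <= 64:                        # standard product for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C                           # strassen(A, B) agrees with A @ B
```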

  • 41

Strassen formulation of matrix multiply enables fewer than O(N³) neurons – resulting in lower power consumption

    “Neural” network for matrix multiplication

  • Resistive switching devices

(Schematic: Ta/TaOx/Pt resistive switching device in the ON and OFF states; V = I×R, I = G×V; parallel filament currents I = I1 + I2.)

Technology:                DRAM | NAND Flash | PC-RAM | STT-MRAM | FeRAM | ReRAM | CBRAM
Maturity:                  Production (20 nm) | Production (16 nm) | Production (45 nm) | Production (65 nm) | Production (180 nm) | Production (180 nm) | Production (180 nm)
Min device feature F (nm): 20 | 16 | …
Retention:                 … | > 10 y | > 10 y
Stackable:                 No | Yes | Yes | No | No | Yes | Yes
Process complexity:        High/FE | High/FE | Low/BE | High/BE | High/BE | Low/BE | Low/BE


  • ReRAM is O(N) better than SRAM in energy consumption for vector-matrix multiply computations

Crossbar array: N rows × M columns.

SRAMs must fetch each vector per dot product: ~O(N²×M).

Analog computation: a multiplier and adder at each intersection; E ~ CV², ~O(N×M).

(Figure: 4×4 crossbar with weights w11 … w44, input voltages V1 = x1 … V4 = x4, and output current I1 = x1·w11 + … + x4·w41.)

    Agarwal et al., Front. Neuroscience 2016, 9, 484

    43
