Neural-inspired computing algorithms and hardware for image...


Transcript of Neural-inspired computing algorithms and hardware for image...


    Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4198 C

    1

    Conrad D. James, Ph.D. Sandia National Laboratories

    The Salishan Conference on High-Speed Computing, April 2017

    Neural-inspired computing algorithms and hardware for image analysis and cybersecurity applications

  • Acknowledgments

Partnerships: David Follett, Duncan Townsend (Lewis Rhodes Labs); Pamela Follett; Isaac Richter, Engin Ipek (U. Rochester); Felix Wang (UIUC); David Lidsky, Marek Osinski (UNM)

    2

Algorithms: Brad Aimone, Ojas Parekh, Nadine Miner, Sandra Faust, Steve Verzi, Frances Chance, Tu-Thach Quach, Chris Lamb, Meghan Galiardi, Samuel Mulder, William Severa, Kristofor Carlson, Michael Smith, Cynthia Phillips, Jacob Hobbs, Robert Abbott, Jeffrey Piersol

Architecture: John Naegle, Alex Hsia, Sapan Agarwal, Craig Vineyard, Fred Rothganger, Jonathon Donaldson, Gabriel Popoola, Aaron Hill

Learning Hardware: Matt Marinella, Thomas Beechem, Ron Goeke, Alec Talin, Paul Kotula, Farid El Gabaly, Elliot Fuller, Jim Stevens, David Hughart, Andy Armstrong, David Henry, Gaadi Haase, Steve Wolfley, Seth Decker, Christopher Saltonstall, Jamison Wagner, John Niroula, Derek Wilke, Michael Van Heukelom, Patrick Finnegan, Carl Smith, Robin Jacobs-Gedrim

Modeling & Applications: Steve Plimpton, Justin Doak, Richard Schiek, Brian Tierney, Robert Bondi, Harry Hjalmarson, Tim Draelos, Jonathan Cox, Joe Ingram, Jason Wheeler

    MESA

  • 3

Data-driven computing (machine learning) is necessary for real-world problems

    C. Lampert, VRML 2013

(Figure: data-driven computing vs. conventional numerical computing.)

    yann.lecun.com

  • 4

    Data-driven (neural-inspired) computing has a complicated history…and mixed results

    James et al., BICA 2017, 19, 49

  • 5

    Neural-inspired algorithms are achieving success but several challenges remain

Karpathy et al., NIPS 2014, 1889; Kemelmacher et al., CVPR 2016

  • Neural computing at Sandia Labs leverages a large research foundation

    It spans four areas: Advanced Scientific & Data-Driven Computing; Deployable National Security Applications; Computing & Information Science; and Nanodevices & µSystems.

    Contributing capabilities: computational neuroscience, algorithms theory, enabling hardware, neural machine-learning algorithms, formal neural computing theory, configurable CMOS neural architectures, adaptive post-CMOS neural architectures, UQ of neural algorithms, modeling & simulation, UQ/SA, Beyond Moore computing devices, micro-sensors, non-von Neumann architectures, MESA, neural-inspired communication, adaptive memory management, robust machine learning, cybersecurity, embedded pattern recognition systems, smart sensor technology, machine learning, and CMOS & BEOL technology.

    6

  • Hardware Acceleration of Adaptive Neural Algorithms (HAANA)

Neurogenesis deep learning: Draelos et al., IJCNN 2017; spiking network algorithms: Severa et al., ICRC 2016; digital neuromorphic architecture: Smith et al., IJCNN 2017

Resistive switching model: Mickel et al., Adv Mater 2014; electrochemical transistor: Fuller et al., Adv Mater 2016; resistive crossbar accelerator: Agarwal et al., IJCNN 2016

    7

  • Translating neuroscience into the next generation of computing – Neural Machine learning (NML)

    8

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 9

    Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349

Example cyber problem: file identification using deep neural networks

File types: *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html

  • Limitations of supervised machine learning

    10

Three features: power spectral density (PSD), byte value, entropy.

Confusion matrix, actual class (rows) vs. predicted class (columns), over HTML, PNG, JPEG, GIF, PDF, DOC, ELF, GZIP, AES. Correct-classification rates on the diagonal: HTML 100, PNG 91, JPEG 99, GIF 100, PDF 95, DOC 99, ELF 100, GZIP 100, AES 95; the remaining percentages fall in a few off-diagonal cells.

    Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349

• Supervised learning requires subject-matter experts to hand-craft features

• Data-driven algorithms are limited… by the data (the hand-crafted features are sketched below)
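For illustration only (not code from the paper), a minimal Python sketch of how the three hand-crafted features above — byte-value distribution, entropy, and power spectral density — might be computed for a file fragment; the function name and bin count are hypothetical.

```python
import numpy as np

def fragment_features(fragment: bytes, psd_bins: int = 32):
    """Hand-crafted features for a file fragment: byte-value histogram,
    Shannon entropy, and a coarse power spectral density (PSD)."""
    b = np.frombuffer(fragment, dtype=np.uint8)
    x = b.astype(np.float64)

    # Byte-value histogram, normalized to a probability distribution
    hist = np.bincount(b, minlength=256) / max(len(b), 1)

    # Shannon entropy of the byte distribution (bits per byte)
    p = hist[hist > 0]
    entropy = float(-(p * np.log2(p)).sum())

    # Coarse PSD of the byte sequence (binned magnitude-squared spectrum)
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    edges = np.linspace(0, len(spectrum), psd_bins, endpoint=False, dtype=int)
    psd = np.add.reduceat(spectrum, edges)

    return np.concatenate([hist, [entropy], psd])
```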

  • Translating neuroscience into the next generation of computing

    11

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 12

Leveraging computational neuroscience models to develop new algorithms

(Hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1; image: wikipedia. Similar EC inputs; increased sparsity leads to lossless decorrelation of DG outputs.)

Vineyard et al., IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727884

  • 13

    Modeling the pattern separation function of the hippocampus

    Severa et al., Neural Computation 2017, 29, 94

(Similar EC inputs; increased sparsity in the dentate gyrus leads to decorrelated DG outputs.)

  • 14

    Quantifying the sparsity transformation in the hippocampus

    Severa et al., Neural Computation 2017, 29, 94

Increased sampling from input to output → increased sparsity

    This approach is feasible for well-defined inputs – need to formalize an algorithm to account for unknown inputs

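As a toy illustration of this sparsifying transformation (not the published model), the sketch below projects a dense input into a larger population and keeps only the top-k responses; the population size, k, and the winner-take-all rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(x, n_out=2000, k=50, W=None):
    """Project a dense input into a larger population and keep only the
    k most active units (winner-take-all), mimicking EC -> DG expansion."""
    if W is None:
        W = rng.standard_normal((n_out, x.size))
    a = W @ x
    out = np.zeros(n_out)
    idx = np.argpartition(a, -k)[-k:]      # indices of the k largest responses
    out[idx] = a[idx]
    return out, W

# Two similar (correlated) inputs; the sparse outputs are typically less correlated
x1 = rng.standard_normal(200)
x2 = x1 + 0.3 * rng.standard_normal(200)
y1, W = sparsify(x1)
y2, _ = sparsify(x2, W=W)
print(np.corrcoef(x1, x2)[0, 1], np.corrcoef(y1, y2)[0, 1])
```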

  • 15

    Algorithm inspiration: neurogenesis in biological networks

    Dieni et al, Nature Comm 2016, 7:11313

    Adult neurogenesis improves information capacity…?

(Hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1; image: wikipedia.)

  • 16

    Severa et al., Neural Computation 2017, 29, 94

    Neurogenesis may provide flexible encoding strategies for particular brain regions

    Aimone, Deng and Gage, Neuron 2011, 70, 589

(Figure: mature vs. immature neurons; information space at large p and at intermediate p.)

  • Translating neuroscience into the next generation of computing

    17

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

• Data-driven computing methods are limited… by data

(Figure: stacked autoencoder with an encoding/classification path and a decoding/reconstruction path; layer sizes 784, 300, 250, 200; handwritten digits 0-9.)

    Draelos et al, ICLR 2016, IJCNN 2017

  • “Neurogenic deep learning” enables adaptation to changing data

    (Figure: the same encoding/classification and decoding/reconstruction network, layer sizes 784, 300, 250, 200, with digit classes introduced incrementally: 1 and 7, then 0, 2, 3, 4, 5, 6, 8, 9.)

    Draelos et al, ICLR 2016, IJCNN 2017
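The neurogenesis deep learning algorithm (Draelos et al.) also prescribes how new units are trained and stabilized; the sketch below shows only the growth step — widening one autoencoder layer while preserving existing weights — with hypothetical function names and PyTorch as an illustrative framework.

```python
import torch
import torch.nn as nn

def add_neurons(encoder: nn.Linear, decoder: nn.Linear, n_new: int):
    """'Neurogenesis' sketch: widen one hidden layer by n_new units.
    Existing weights are preserved; new rows/columns start small and random."""
    old_out, old_in = encoder.weight.shape
    new_enc = nn.Linear(old_in, old_out + n_new)
    new_dec = nn.Linear(old_out + n_new, decoder.out_features)
    with torch.no_grad():
        new_enc.weight[:old_out] = encoder.weight
        new_enc.bias[:old_out] = encoder.bias
        new_enc.weight[old_out:] *= 0.01          # small random init for new units
        new_dec.weight[:, :old_out] = decoder.weight
        new_dec.bias[:] = decoder.bias
        new_dec.weight[:, old_out:] *= 0.01
    return new_enc, new_dec

# Example with the slide's first layer sizes: encoder 784 -> 300, decoder 300 -> 784
enc, dec = nn.Linear(784, 300), nn.Linear(300, 784)
enc, dec = add_neurons(enc, dec, n_new=20)        # grow when new classes arrive
```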

  • Translating neuroscience into the next generation of computing

    20

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • 21

    Categorizing cyber data under imperfect conditions with sparse coding

• Training data is not always available and is often fragmented

    • Limited expertise; hand-engineered features

Fragment ≈ Learned Dictionary ∗ Sparse Representation (fragments drawn from *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html files)

Sparse coding: $y \approx Ax$, with $\min_x \|y - Ax\|^2 + S(x)$
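A minimal sketch (a scikit-learn stand-in, not the authors' implementation) of learning a dictionary A and sparse codes x for fragment patches; the patch size, dictionary size, and sparsity level are hypothetical.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical data: each row is a byte-patch vector extracted from a file fragment
patches = np.random.rand(1000, 64)

# Learn an overcomplete dictionary A and sparse codes x such that y ~ A x
dico = DictionaryLearning(n_components=256,
                          transform_algorithm='omp',
                          transform_n_nonzero_coefs=8)
codes = dico.fit(patches).transform(patches)   # sparse representations (local features)
A = dico.components_                           # learned dictionary atoms

# Reconstruction check: patches are approximated by codes @ A
err = np.linalg.norm(patches - codes @ A) / np.linalg.norm(patches)
```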

  • 22

    Wang et al., in preparation

    Generating local and global features from file fragments

    • Byte dictionary patches & sparse representations of fragments – local features

Graves et al., ASRU 2013; Hochreiter & Schmidhuber, Neural Comp 1997

• Long short-term memory (LSTM) networks are used to capture long-range correlations – global features (see the sketch below)
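A sketch of this second stage with hypothetical dimensions (PyTorch as an illustrative framework): an LSTM consumes the sequence of per-patch sparse codes from a fragment and emits a fragment-level prediction over the nine file types.

```python
import torch
import torch.nn as nn

class FragmentLSTM(nn.Module):
    """Sketch: an LSTM reads the sequence of sparse codes (local features)
    and produces a fragment-level (global) class prediction."""
    def __init__(self, code_dim=256, hidden=128, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(code_dim, hidden, batch_first=True)
        self.classify = nn.Linear(hidden, n_classes)

    def forward(self, codes):              # codes: (batch, n_patches, code_dim)
        _, (h, _) = self.lstm(codes)
        return self.classify(h[-1])        # logits over the nine file types

model = FragmentLSTM()
logits = model(torch.randn(4, 32, 256))    # 4 fragments, 32 patches each
```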

  • 23

    Sparse dictionary learning and LSTM networks for file fragment ID – compared to SVM

• Averaged F1 score = 53.12%

• NLP (SVM) approach achieved 49.1 ± 3.15% (Fitzgerald et al., DI 2012)

    Wang et al., in preparation

  • Translating neuroscience into the next generation of computing

    24

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • Optimization of algorithm performance

Neural algorithm operations are computationally expensive (energy and time) due to training; many matrix-vector operations

    25

Sparse coding: $\min_x \|y - Ax\|^2 + S(x)$

Backpropagation: $\Delta_j = \sum_k \delta_k\, w_{jk}$

Make better/smarter algorithms: Lee et al., Proc World Cong Eng Comp Sci 2013

Hardware-accelerate algorithms: compute convolutions via the Fourier transform, $f * g = F^{-1}\{F\{f\} \cdot F\{g\}\}$, using the DFT kernel $\frac{1}{N}\sum_{k=0}^{N-1} \omega^{jk}$
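A small numerical check of the convolution theorem cited above: circular convolution computed directly in O(N²) agrees with the O(N log N) FFT route.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
f, g = rng.standard_normal(N), rng.standard_normal(N)

# Direct circular convolution: O(N^2) multiply-adds
direct = np.zeros(N)
for k in range(N):
    for n in range(N):
        direct[k] += f[n] * g[(k - n) % N]

# Convolution theorem: f * g = F^-1{ F{f} . F{g} }, O(N log N)
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

print(np.allclose(direct, via_fft))   # True
```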

  • 26

    Hardware acceleration of spiking algorithms for time-dependent data processing

Example: liquid state machine (LSM); a tool for data transformation; randomly connected spiking neurons encode complex temporal dynamics

    Maass et al, Neural Comp 2002, 14, 2531

    Spiking algorithms are often inefficient on conventional hardware…

  • Hardware acceleration of algorithm operations

    27

Ovtcharov et al. (Microsoft), FPGA acceleration of CNNs, 2015

Gokmen & Vlasov (IBM), resistive crossbar acceleration of DNNs, 2016

Coates et al. (Nvidia, Stanford), deep learning with GPUs, 2013

Gokhale et al. (Purdue), nn-X for accelerating DNNs with ARMs, 2013

  • 28

    Spiking Temporal Processing Unit (STPU)

Impart complex temporal dynamics into neural networks (figure: synaptic inputs arriving at times t = t1 and t = t2)

    Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017
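A functional sketch of the temporal-synapse idea (not the STPU microarchitecture): each synapse carries a weight and a programmable delay, so a presynaptic spike is delivered to its target some number of time steps later. Class and parameter names are illustrative.

```python
import numpy as np

class DelayedSynapse:
    """A synapse with a weight and a programmable delay (delay < horizon):
    a spike registered at time t is delivered to the target at t + delay."""
    def __init__(self, weight, delay, horizon=64):
        self.w, self.delay = weight, delay
        self.queue = np.zeros(horizon)     # circular buffer of future inputs
        self.t = 0

    def spike(self):
        """Register a presynaptic spike now; it lands `delay` steps later."""
        self.queue[(self.t + self.delay) % len(self.queue)] += self.w

    def step(self):
        """Advance one time step and return the input delivered to the target."""
        out = self.queue[self.t % len(self.queue)]
        self.queue[self.t % len(self.queue)] = 0.0
        self.t += 1
        return out

syn = DelayedSynapse(weight=0.5, delay=3)
syn.spike()                                   # presynaptic spike at t = 0
outputs = [syn.step() for _ in range(5)]      # 0.5 is delivered at t = 3
```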

  • 29

    Emulation of a LSM mapped onto an STPU architecture

• Test data: spoken digits (0-9)

• Implement the liquid on the STPU

• Use a classifier to categorize the spoken digits

    Assemble an array of LIF neurons and combine with a synaptic map W

    Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017

Linear readout performance for four liquid configurations:

Linear model        | 3x3x15 | 5x5x5 | 4x5x10 | 2x2x20
Linear SVM          | 0.906  | 0.900 | 0.900  | 0.914
LDA                 | 0.921  | 0.922 | 0.922  | 0.946
Ridge regression    | 0.745  | 0.717 | 0.717  | 0.897
Logistic regression | 0.431  | 0.254 | 0.254  | 0.815
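A functional sketch of the liquid (not the STPU implementation): a randomly and sparsely connected pool of leaky integrate-and-fire (LIF) neurons driven by an input stream; the state it produces would then be fed to one of the linear readouts in the table above. Sizes, time constants, and the 13-channel input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500                                  # LIF neurons, time steps
W = 0.1 * rng.standard_normal((N, N)) * (rng.random((N, N)) < 0.1)  # sparse synaptic map
W_in = rng.standard_normal((N, 13))              # input weights (e.g., 13 audio channels)

def liquid(inputs, tau=20.0, v_th=1.0):
    """Drive the randomly connected LIF reservoir; return its spike-count state."""
    v = np.zeros(N)
    spikes = np.zeros(N)
    counts = np.zeros(N)
    for t in range(inputs.shape[0]):
        v += (-v + W @ spikes + W_in @ inputs[t]) / tau   # leaky integration
        spikes = (v >= v_th).astype(float)                # threshold crossing
        v[spikes > 0] = 0.0                               # reset after a spike
        counts += spikes
    return counts                                # liquid state for a linear classifier

state = liquid(np.abs(rng.standard_normal((T, 13))))
```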

  • 30

    Implementing synaptic connections in hardware for non-spiking neural algorithms

    Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298

(Figure: 3×3 resistive crossbar with conductances $G_{1,1} \dots G_{3,3}$, row voltages $V_1, V_2, V_3$, and column currents $I_j = \sum_i V_i G_{i,j}$; network weights map onto conductances as $w_{ij} = G_{ij} = (R_{ij})^{-1}$.)

    Agarwal et al., Front. Neuroscience 2016, 9, 484

Use variable resistors to implement neural network weights in hardware – reduces energy from O(n³) to O(n²) (a toy numerical view follows)
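A toy numerical view of the analog multiply described above (illustrative values, not device data): weights are stored as conductances, and the vector-matrix product falls out of Ohm's and Kirchhoff's laws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights stored as conductances: w_ij = G_ij = 1/R_ij (illustrative values, siemens)
G = rng.uniform(1e-6, 1e-4, size=(3, 3))

# Input vector applied as row voltages (volts)
V = np.array([0.2, 0.1, 0.3])

# Each column sums its currents (Kirchhoff's current law): I_j = sum_i V_i * G_ij
I = G.T @ V
print(I)   # the vector-matrix product, computed by the array physics
```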

  • Designing, modeling, and fabricating devices with improved neural computing characteristics

    31

Fuller et al., Adv Mater 2016, 10.1002/adma.201604310; van de Burgt et al., Nat Materials 2017, 10.1038/nmat4856

(I-V curve: SET and RESET transitions, with VREAD, VSET, and VRESET marked.)

Mickel et al., Adv Mater 2014, 26, 4486; Landon et al., APL 2015, 107, 023108

Filament surface temperature (Ts):

$$T_s = T_{RT} + \frac{\sigma V^2 d_E}{2 k_E d_o \left(1 - \frac{k_E}{k_F}\,\frac{r_F^2}{4 d_E d_o}\right)}$$

(Schematic: Ta/TaOx/Pt device filament; conductance G vs. time t.)

• Model hardware acceleration to assess the impact on algorithm performance

    Python wrapper

    32

(Figure: file categorization accuracy, scale 0-98%, as a function of read noise σRN and write noise σWN for linear, asymmetric (ν = 1), asymmetric (ν = 5), and symmetric (ν = 5) device models; classes: aes, elf, html, pdf, gzip, jpeg, gif, png, doc.)

    Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298
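One simple way (an assumption on my part, not necessarily the simulator behind the figure) to model the device non-idealities being swept above — read noise, write noise, and asymmetric update nonlinearity ν — when applying training updates to a conductance.

```python
import numpy as np

rng = np.random.default_rng(0)

def device_update(g, dg, nu=5.0, sigma_wn=0.02, g_min=0.0, g_max=1.0):
    """Apply a weight update to a conductance with asymmetric nonlinearity (nu)
    and write noise (sigma_wn); the conductance is clipped to its range."""
    span = g_max - g_min
    if dg >= 0:                                   # potentiation saturates near g_max
        step = dg * np.exp(-nu * (g - g_min) / span)
    else:                                         # depression saturates near g_min
        step = dg * np.exp(-nu * (g_max - g) / span)
    step += sigma_wn * span * rng.standard_normal()
    return np.clip(g + step, g_min, g_max)

def device_read(g, sigma_rn=0.02):
    """Read the conductance with additive read noise."""
    return g + sigma_rn * rng.standard_normal()
```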

  • Translating neuroscience into the next generation of computing

    33

Identify neurobiological circuits of interest → simulate at a high level of neural fidelity → identify critical aspects of computation → translate into an NML algorithm → formalize & optimize neural algorithms

  • Thanks for your time! Questions?

    34

  • Backup Slides

    35

  • Applications in imaging and cybersecurity

    36

(Figure: streams of raw binary data.)

• The real world is filled with massive amounts of data

• Data needs to be filtered to capture relevant information

• Signatures or features need to be extracted from data

• Features can then be used to interpret activities

data → information → features → interpretation (example: ‘68899’ → ‘zipcode’)

  • Motivation: determine the local velocity in a flow field

    SNN algorithms are highly parallel and can leverage the time/neuron tradeoff

    Neural algorithms can match or best traditional ‘big O’

    Severa et al, ICRC 2016; DOI: 10.1109/ICRC.2016.7738681

    Spiking network algorithm for computing cross-correlations

    37
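For reference, the quantity the spiking network computes is the cross-correlation between two signals, whose peak lag gives the local shift (velocity) estimate. The numpy sketch below is only the conventional computation, not the spiking formulation in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(200)
b = np.roll(a, 7) + 0.1 * rng.standard_normal(200)   # b is (roughly) a delayed by 7 samples

# Cross-correlation; the peak recovers the offset between the two signals
xcorr = np.correlate(a - a.mean(), b - b.mean(), mode='full')
shift = abs(np.argmax(xcorr) - (len(b) - 1))
print(shift)   # expected: 7
```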

  • Trading neurons for time and vice versa

O(n²) neurons, constant time; O(n) neurons, O(n) time

    38

  • 39

    Matrix operations are at the core of many neural computing operations

Naïve algorithm for matrix multiplication is O(N³).

Backpropagation: $\Delta_j = \sum_k \delta_k\, w_{jk}$. Graph analysis.

  • 40

    Strassen matrix multiplication

Standard: 8 multiplications, 4 additions → O(N³); Strassen: 7 multiplications, 18 additions/subtractions → O(N^{2+ε})

    Strassen, Num Math, 1969
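A compact sketch of Strassen's recursion for power-of-two sizes (falling back to the standard product for small blocks); this is the textbook formulation, not a tuned implementation.

```python
import numpy as np

def strassen(A, B):
    """Strassen multiply for square matrices whose size is a power of two:
    7 recursive block multiplications instead of 8 -> O(N^log2(7)) ~ O(N^2.81)."""
    n = A.shape[0]
    if n <= 64:                        # standard product for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C                           # strassen(A, B) agrees with A @ B
```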

  • 41

Strassen formulation of matrix multiply enables fewer than O(N³) neurons – resulting in lower power consumption

    “Neural” network for matrix multiplication

  • Resistive switching devices

(Schematic: Ta/TaOx/Pt resistive switching device in the ON and OFF states; V = I×R, I = G×V; parallel filament currents I = I1 + I2.)

Technology:                DRAM | NAND Flash | PC-RAM | STT-MRAM | FeRAM | ReRAM | CBRAM
Maturity:                  Production (20 nm) | Production (16 nm) | Production (45 nm) | Production (65 nm) | Production (180 nm) | Production (180 nm) | Production (180 nm)
Min device feature F (nm): 20 | 16 | …
Retention:                 … | > 10 y | > 10 y
Stackable:                 No | Yes | Yes | No | No | Yes | Yes
Process complexity:        High/FE | High/FE | Low/BE | High/BE | High/BE | Low/BE | Low/BE


  • ReRAM is O(N) better than SRAM in energy consumption for vector-matrix multiply computations

Crossbar array: N rows × M columns.

SRAMs must fetch each vector per dot product: ~O(N²×M).

Analog computation: a multiplier and adder at each intersection; E ~ CV², ~O(N×M).

(Figure: 4×4 crossbar with weights w11 … w44, input voltages V1 = x1 … V4 = x4, and output current I1 = x1·w11 + … + x4·w41.)

    Agarwal et al., Front. Neuroscience 2016, 9, 484

    43
