Neural-inspired computing algorithms and hardware for image analysis and cybersecurity applications
-
Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4198 C
1
Conrad D. James, Ph.D. Sandia National Laboratories
The Salishan Conference on High-Speed Computing, April 2017
Neural-inspired computing algorithms and hardware for image analysis and cybersecurity applications
-
Acknowledgments
Partnerships: David Follett, Duncan Townsend (Lewis Rhodes Labs); Pamela Follett; Isaac Richter, Engin Ipek (U. Rochester); Felix Wang (UIUC); David Lidsky, Marek Osinski (UNM)
2
Algorithms: Brad Aimone, Ojas Parekh, Nadine Miner, Sandra Faust, Steve Verzi, Frances Chance, Tu-Thach Quach, Chris Lamb, Meghan Galiardi, Samuel Mulder, William Severa, Kristofor Carlson, Michael Smith, Cynthia Phillips, Jacob Hobbs, Robert Abbott, Jeffrey Piersol
Architecture: John Naegle, Alex Hsia, Sapan Agarwal, Craig Vineyard, Fred Rothganger, Jonathon Donaldson, Gabriel Popoola, Aaron Hill
Learning Hardware: Matt Marinella, Thomas Beechem, Ron Goeke, Alec Talin, Paul Kotula, Farid El Gabaly, Elliot Fuller, Jim Stevens, David Hughart, Andy Armstrong, David Henry, Gaadi Haase, Steve Wolfley, Seth Decker, Christopher Saltonstall, Jamison Wagner, John Niroula, Derek Wilke, Michael Van Heukelom, Patrick Finnegan, Carl Smith, Robin Jacobs-Gedrim
Modeling & Applications: Steve Plimpton, Justin Doak, Richard Schiek, Brian Tierney, Robert Bondi, Harry Hjalmarson, Tim Draelos, Jonathan Cox, Joe Ingram, Jason Wheeler
-
3
Data-driven computing (machine learning) is necessary for real-world problems
C. Lampert, VRML 2013
Data-driven computing vs. conventional numerical computing
yann.lecun.com
-
4
Data-driven (neural-inspired) computing has a complicated history…and mixed results
James et al., BICA 2017, 19, 49
-
5
Neural-inspired algorithms are achieving success but several challenges remain
Karpathy et al., NIPS 2014, 1889; Kemelmacher et al., CVPR 2016
-
Neural computing at Sandia Labs leverages a large research foundation
Mission areas: Advanced Scientific & Data-Driven Computing; Deployable National Security Applications; Computing & Information Science; Nanodevices & µSystems
Research thrusts: Computational Neuroscience; Algorithms Theory; Enabling Hardware
Neural computing capabilities: Neural Machine-Learning Algorithms; Formal Neural Computing Theory; Configurable CMOS Neural Architectures; Adaptive post-CMOS Neural Architectures; UQ of Neural Algorithms
Foundational capabilities: Modeling & Simulation; UQ/SA; Beyond Moore Computing Devices; Micro-Sensors; Non-Von Neumann Architectures; MESA; Machine Learning; CMOS & BEOL Technology
Applications: Neural-inspired Communication; Adaptive memory management; Robust machine learning; Cybersecurity; Embedded Pattern Recognition Systems; Smart Sensor Technology
6
-
Hardware Acceleration of Adaptive Neural Algorithms (HAANA)
Neurogenesis deep learning: Draelos et al., IJCNN 2017
Spiking network algorithms: Severa et al., ICRC 2016
Digital neuromorphic architecture: Smith et al., IJCNN 2017
Resistive switching model: Mickel et al., Adv Mater 2014
Electrochemical transistor: Fuller et al., Adv Mater 2016
Resistive crossbar accelerator: Agarwal et al., IJCNN 2016
7
-
Translating neuroscience into the next generation of computing – Neural Machine Learning (NML)
8
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
9
Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349
Example cyber problem: file identification using deep neural networks
File types: *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html
-
Limitations of supervised machine learning
10
3 features: PSD, Byte Value, Entropy
Confusion matrix (rows: actual class; columns: predicted class; values in %): diagonal accuracies HTML 100, PNG 91, JPEG 99, GIF 100, PDF 95, DOC 99, ELF 100, GZIP 100, AES 95, with small off-diagonal confusions (e.g., PNG and AES fragments are occasionally misclassified)
Cox, Aimone, James; Complex Adaptive Systems, Nov. 2015; Procedia Comp Sci 61, 349
• Supervised learning requires subject matter experts to hand-craft features
• Data-driven algorithms are limited…by the data
-
Translating neuroscience into the next generation of computing
11
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
12
[Figure: hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1 (image: wikipedia)]
Leveraging computational neuroscience models to develop new algorithms
Similar EC inputs into the dentate gyrus: increased sparsity leads to lossless decorrelation of DG outputs
Vineyard et al., IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727884
-
13
Modeling the pattern separation function of the hippocampus
Severa et al., Neural Computation 2017, 29, 94
Similar EC inputs into the dentate gyrus: increased sparsity leads to decorrelated DG outputs
-
14
Quantifying the sparsity transformation in the hippocampus
Severa et al., Neural Computation 2017, 29, 94
Increased sampling from input to output → increased sparsity
This approach is feasible for well-defined inputs; an algorithm must be formalized to account for unknown inputs
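To make the sparsity transformation concrete, here is a minimal sketch (not the published model; the sizes, the random projection W, and the k-winners-take-all rule are illustrative assumptions): expanding two similar dense inputs into a larger population and keeping only the top-k responses lowers their correlation, i.e., pattern separation.

```python
# Minimal sketch: random expansion plus k-winners-take-all (a crude
# stand-in for dentate gyrus sparsification) decorrelates similar inputs.
import numpy as np

rng = np.random.default_rng(0)

def sparsify(x, W, k):
    """Random expansion followed by k-winners-take-all."""
    h = W @ x
    out = np.zeros_like(h)
    winners = np.argsort(h)[-k:]      # indices of the k largest responses
    out[winners] = 1.0                # binary sparse code
    return out

n_in, n_out, k = 100, 1000, 20        # hypothetical sizes: ~2% output activity
W = rng.normal(size=(n_out, n_in))

a = rng.normal(size=n_in)
b = a + 0.1 * rng.normal(size=n_in)   # a slightly perturbed copy of a

corr_in = np.corrcoef(a, b)[0, 1]
corr_out = np.corrcoef(sparsify(a, W, k), sparsify(b, W, k))[0, 1]
print(f"input correlation:  {corr_in:.3f}")
print(f"output correlation: {corr_out:.3f}  (lower -> pattern separation)")
```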
-
15
Algorithm inspiration: neurogenesis in biological networks
Dieni et al, Nature Comm 2016, 7:11313
Adult neurogenesis improves information capacity…?
[Figure: hippocampal circuit: entorhinal cortex → dentate gyrus → CA3 → CA1 (image: wikipedia)]
-
16
Severa et al., Neural Computation 2017, 29, 94
Neurogenesis may provide flexible encoding strategies for particular brain regions
Aimone, Deng and Gage, Neuron 2011, 70, 589
[Figure: coverage of the information space by mature vs. immature neurons, shown at large and intermediate connection probabilities p]
-
Translating neuroscience into the next generation of computing
17
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
Data-driven computing methods are limited…by data
[Figure: autoencoder with an encoding/classification path and a decoding/reconstruction path; layer sizes 784 → 300 → 250 → 200; MNIST digit classes 0-9]
18
[Figure: original/reconstructed digit pairs, 0-9]
Draelos et al, ICLR 2016, IJCNN 2017
-
[Figure: autoencoder with an encoding/classification path and a decoding/reconstruction path; layer sizes 784 → 300 → 250 → 200; MNIST digit classes 0-9]
“Neurogenic deep learning” enables adaptation to changing data
19
[Figure: original/reconstructed digit pairs, 0-9]
Draelos et al, ICLR 2016, IJCNN 2017
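A hedged sketch of the neurogenesis mechanism (illustrative only; grow_layer, the tanh autoencoder, and the error threshold are assumptions, not the Draelos et al. implementation): when reconstruction error on new data is high, append freshly initialized hidden units while preserving the trained weights.

```python
# Sketch: "neurogenesis" as weight-matrix surgery on an autoencoder layer.
import numpy as np

rng = np.random.default_rng(1)

def grow_layer(W_enc, W_dec, n_new, scale=0.01):
    """Add n_new hidden units: new rows in the encoder, new columns in the decoder."""
    n_hidden, n_in = W_enc.shape
    W_enc_new = np.vstack([W_enc, scale * rng.normal(size=(n_new, n_in))])
    W_dec_new = np.hstack([W_dec, scale * rng.normal(size=(n_in, n_new))])
    return W_enc_new, W_dec_new

def reconstruction_error(x, W_enc, W_dec):
    h = np.tanh(W_enc @ x)
    return np.linalg.norm(x - W_dec @ h)

n_in, n_hidden = 784, 200                 # sizes from the slide's encoder
W_enc = 0.01 * rng.normal(size=(n_hidden, n_in))
W_dec = 0.01 * rng.normal(size=(n_in, n_hidden))

x_new = rng.normal(size=n_in)             # stand-in for a novel digit class
if reconstruction_error(x_new, W_enc, W_dec) > 1.0:   # hypothetical threshold
    W_enc, W_dec = grow_layer(W_enc, W_dec, n_new=25)
print(W_enc.shape, W_dec.shape)           # (225, 784) (784, 225)
```

In practice the new units would then be trained on the novel data while the old weights are protected, mirroring how immature neurons encode new information.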
-
Translating neuroscience into the next generation of computing
20
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
21
Categorizing cyber data under imperfect conditions with sparse coding
• Training data is not always available and files are often fragmented
• Expert knowledge is limited; features are hand-engineered
Fragment ≈ Learned Dictionary ∗ Sparse Representation
File types: *.aes, *.elf, *.pdf, *.jpeg, *.gif, *.png, *.doc, *.gzip, *.html
Sparse coding: y ≈ Ax, obtained from min_x ‖y − Ax‖² + S(x), where S(x) is a sparsity-promoting penalty
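As an illustration of this objective (a sketch, not the project's solver; the random dictionary A, the L1 choice of S, and the ISTA iteration are assumptions), the penalized problem can be solved by iterative soft-thresholding:

```python
# Sketch: solve min_x (1/2)||y - Ax||^2 + lam*||x||_1 with ISTA.
import numpy as np

rng = np.random.default_rng(2)

def ista(y, A, lam=0.1, n_iter=200):
    """Proximal gradient descent with the soft-thresholding operator."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)           # gradient of (1/2)||y - Ax||^2
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

A = rng.normal(size=(64, 256))             # 64-dim fragments, 256 atoms
A /= np.linalg.norm(A, axis=0)             # unit-norm dictionary atoms
x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = 1.0
y = A @ x_true                             # synthetic "fragment"

x_hat = ista(y, A)
print("nonzeros recovered:", np.sum(np.abs(x_hat) > 1e-3))
```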
-
22
Wang et al., in preparation
Generating local and global features from file fragments
• Byte dictionary patches & sparse representations of fragments – local features
Graves et al., ASRU 2013; Hochreiter & Schmidhuber, Neur Comp 1997
• Long short-term memory (LSTM) networks are used to capture long-range correlations – global features
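A minimal sketch of the LSTM piece (PyTorch; the class FragmentLSTM, the layer sizes, and the 9-class head are illustrative assumptions, not the in-preparation model): an LSTM reads a fragment's byte sequence and produces a global feature vector for classification.

```python
# Sketch: byte-level LSTM producing a global feature for file-type ID.
import torch
import torch.nn as nn

class FragmentLSTM(nn.Module):
    def __init__(self, n_classes=9, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(256, embed_dim)   # one embedding per byte value
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, fragment_bytes):              # (batch, seq_len) int64
        h, _ = self.lstm(self.embed(fragment_bytes))
        return self.head(h[:, -1, :])               # use the final hidden state

model = FragmentLSTM()
fragments = torch.randint(0, 256, (4, 512))         # four 512-byte fragments
print(model(fragments).shape)                       # torch.Size([4, 9])
```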
-
23
Sparse dictionary learning and LSTM networks for file fragment ID – compared to SVM
• Averaged F1 score = 53.12%
• An NLP (SVM) approach achieved 49.1 ± 3.15% (Fitzgerald et al., DI 2012)
Wang et al., in preparation
-
Translating neuroscience into the next generation of computing
24
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
Optimization of algorithm performance
Neural algorithm operations are computationally expensive (energy and time) due to training and many matrix-vector operations
25
Sparse coding: min_x ‖y − Ax‖² + S(x)
Backpropagation: Δ_j = Σ_k δ_k w_jk
Make better/smarter algorithms: Lee et al., Proc World Cong Eng Comp Sci 2013
Hardware-accelerate algorithms: f ∗ g = F⁻¹{F{f} · F{g}}, computed via the DFT (1/N) Σ_{k=0}^{N−1} f_k ω^{jk}
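The convolution identity above is easy to demonstrate (a sketch; the signal sizes are arbitrary): multiplying in the frequency domain reproduces the O(N²) direct convolution with O(N log N) work.

```python
# Sketch: f*g = F^-1{F{f} . F{g}}, the classic FFT acceleration.
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=1024)
g = rng.normal(size=1024)

direct = np.convolve(f, g)                  # O(N^2) time-domain convolution
n = len(f) + len(g) - 1                     # zero-pad to the full output length
fft_based = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.allclose(direct, fft_based))       # True (up to float error)
```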
-
26
Hardware acceleration of spiking algorithms for time-dependent data processing
Example: the liquid state machine (LSM), a tool for data transformation; randomly connected spiking neurons encode complex temporal dynamics
Maass et al, Neural Comp 2002, 14, 2531
Spiking algorithms are often inefficient on conventional hardware…
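A minimal LSM sketch (all parameters here are illustrative assumptions, unrelated to the STPU implementation): a sparse random pool of leaky integrate-and-fire (LIF) neurons turns an input spike train into a high-dimensional temporal state that a simple linear readout could then classify.

```python
# Sketch: a random LIF reservoir (the "liquid") driven by an input spike train.
import numpy as np

rng = np.random.default_rng(4)
N, T = 100, 200                    # neurons, timesteps
tau, v_thresh = 20.0, 1.0          # membrane time constant, spike threshold
W = rng.normal(0, 0.1, (N, N)) * (rng.random((N, N)) < 0.1)  # sparse recurrence
w_in = rng.normal(0, 0.5, N)       # input weights

v = np.zeros(N)                    # membrane potentials
spikes = np.zeros((T, N))
u = (rng.random(T) < 0.2).astype(float)       # random input spike train

for t in range(T):
    prev = spikes[t - 1] if t > 0 else np.zeros(N)
    v = v * (1.0 - 1.0 / tau) + w_in * u[t] + W @ prev  # leaky integration
    fired = v >= v_thresh
    spikes[t] = fired
    v[fired] = 0.0                 # reset after a spike

print("liquid state (spike counts, first 10 neurons):", spikes.sum(0)[:10])
```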
-
Hardware acceleration of algorithm operations
27
Ovtcharov et al. (Microsoft), FPGA acceleration of CNNs, 2015
Gokmen & Vlasov (IBM), Resistive crossbar acceleration of DNNs, 2016
Coates et al. (Nvidia, Stanford), Deep learning with GPUs, 2013
Gokhale et al. (Purdue), nnX for accelerating DNNs with ARMs, 2013
-
28
Spiking Temporal Processing Unit (STPU)
Impart complex temporal dynamics into neural networks
Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017
-
29
Emulation of an LSM mapped onto an STPU architecture
• Test data: spoken digits (0-9)
• Implement the liquid on the STPU
• Use a classifier to categorize the spoken digits
Assemble an array of LIF neurons and combine with a synaptic map W
Smith et al, “A Novel Digital Neuromorphic Architecture…”, IJCNN 2017
Linear Model       3x3x15   5x5x5    4x5x10   2x2x20
Linear SVM         0.906    0.900    0.900    0.914
LDA                0.921    0.922    0.922    0.946
Ridge Regress      0.745    0.717    0.717    0.897
Logistic Regress   0.431    0.254    0.254    0.815
-
30
Implementing synaptic connections in hardware for non-spiking neural algorithms
Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298
[Figure: 3×3 resistive crossbar; row voltages V_1..V_3 applied across conductances G_ij produce column currents I_j = Σ_i V_i·G_ij]
Weight mapping: w_ij = G_ij = (R_ij)⁻¹
Agarwal et al., Front. Neuroscience 2016, 9, 484
Use variable resistors to implement neural network weights in hardware, reducing energy from O(n³) to O(n²)
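A toy illustration of the crossbar math (a sketch; the 3×3 sizes and values are arbitrary): storing w_ij as conductances makes a vector-matrix multiply a single read of the column currents.

```python
# Sketch: analog vector-matrix multiply on a resistive crossbar.
import numpy as np

rng = np.random.default_rng(5)

W = rng.uniform(0.5, 2.0, size=(3, 3))   # target weights w_ij
R = 1.0 / W                              # program each cell's resistance R_ij
G = 1.0 / R                              # conductances: w_ij = G_ij = (R_ij)^-1
V = np.array([0.1, 0.2, 0.3])            # row input voltages V_1..V_3

I = V @ G                                # column currents: I_j = sum_i V_i * G_ij
print(I)                                 # one analog step computes the full VMM
```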
-
Designing, modeling, and fabricating devices with improved neural computing characteristics
31
Fuller et al., Adv Mater 2016, 10.1002/adma.201604310; van de Burgt et al., Nat Materials 2017, 10.1038/nmat4856
[Figure: resistive-switching I-V curve showing V_READ, V_SET, and V_RESET and the SET/RESET transitions]
Mickel et al., Adv Mater, 26, 4486, 2014Landon et al., APL 2015, 107, 023108
Filament surface temperature (T_s): T_s = T_RT + (σV²·d_E)/(2·k_E·d_o) · [1 − (k_E/k_F)·r_F²/(4·d_E·d_o)]
[Figure: Ta/TaOx/Pt device cross-section with mobile charged species; conductance G vs. time t]
-
Model hardware acceleration to assess its impact on algorithm performance
Python wrapper
32
[Figure: simulated file-categorization accuracy (color scale 0 to 98%) vs. read noise (σ_RN) and write noise (σ_WN) for four device models: Linear; Asymmetric, ν = 1; Asymmetric, ν = 5; Symmetric, ν = 5. File classes: aes, elf, html, pdf, gzip, jpeg, gif, png, doc]
Agarwal et al, IJCNN 2016, DOI: 10.1109/IJCNN.2016.7727298
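A hedged sketch of this kind of study (an assumed noise model and toy classifier, not the project's crossbar simulator): inject write noise into the weights and read noise into the outputs, then measure how accuracy degrades.

```python
# Sketch: effect of read/write noise on a toy linear classifier.
import numpy as np

rng = np.random.default_rng(6)

n_feat, n_class, n_samp = 20, 4, 500
W_true = rng.normal(size=(n_class, n_feat))
X = rng.normal(size=(n_samp, n_feat))
y = np.argmax(X @ W_true.T, axis=1)        # labels from the clean weights

def accuracy(sigma_write, sigma_read):
    W = W_true + sigma_write * rng.normal(size=W_true.shape)             # write noise
    scores = X @ W.T + sigma_read * rng.normal(size=(n_samp, n_class))   # read noise
    return np.mean(np.argmax(scores, axis=1) == y)

for s in (0.0, 0.1, 0.5, 1.0):
    print(f"sigma_WN = sigma_RN = {s}: accuracy = {accuracy(s, s):.2f}")
```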
-
Translating neuroscience into the next generation of computing
33
Identify neurobiological circuits of interest → Simulate at a high level of neural fidelity → Identify critical aspects of computation → Translate into NML algorithm → Formalize & optimize neural algorithms
-
Thanks for your time! Questions?
34
-
Backup Slides
35
-
Applications in imaging and cybersecurity
36
• The real world is filled with massive amounts of data
• Data needs to be filtered to capture relevant information
• Signatures or features need to be extracted from data
• Features can then be used to interpret activities
Pipeline: data → information → features → interpretation (e.g., '68899' → 'zipcode')
-
Motivation: determine the local velocity in a flow field
SNN algorithms are highly parallel and can leverage the time/neuron tradeoff
Neural algorithms can match or best traditional ‘big O’
Severa et al, ICRC 2016; DOI: 10.1109/ICRC.2016.7738681
Spiking network algorithm for computing cross-correlations
37
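For reference, the conventional (non-spiking) computation being accelerated (a sketch with synthetic 1-D data; the spiking formulation is in Severa et al.): the lag that maximizes the cross-correlation between two frames gives the local displacement, hence the velocity.

```python
# Sketch: estimate a local shift between two frames via cross-correlation.
import numpy as np

rng = np.random.default_rng(7)
frame1 = rng.normal(size=128)
true_shift = 7
frame2 = np.roll(frame1, true_shift)              # second frame, displaced

corr = np.correlate(frame2, frame1, mode="full")  # cross-correlation over all lags
lags = np.arange(-len(frame1) + 1, len(frame1))
print("estimated shift:", lags[np.argmax(corr)])  # -> 7
```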
-
Trading neurons for time and vice versa
O(n²) neurons, constant time ↔ O(n) neurons, O(n) time
38
-
39
Matrix operations are at the core of many neural computing operations
The naïve algorithm for matrix multiplication is O(N³).
Backpropagation (Δ_j = Σ_k δ_k w_jk) and graph analysis both reduce to matrix-vector operations
-
40
Strassen matrix multiplication
Standard: 8 multiplies, 4 adds → O(N³). Strassen: 7 multiplies, 18 adds/subtracts → O(N^(log₂ 7)) ≈ O(N^2.81)
Strassen, Num Math, 1969
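A minimal one-level Strassen sketch (illustrative; the recursion and odd-size padding are omitted) showing the seven block products M1..M7 and their recombination:

```python
# Sketch: one level of Strassen's algorithm on 2x2 block matrices.
import numpy as np

def strassen_2x2(A, B):
    """Multiply A and B using 7 block products instead of 8."""
    n = A.shape[0] // 2
    a, b, c, d = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    e, f, g, h = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    m1 = (a + d) @ (e + h)
    m2 = (c + d) @ e
    m3 = a @ (f - h)
    m4 = d @ (g - e)
    m5 = (a + b) @ h
    m6 = (c - a) @ (e + f)
    m7 = (b - d) @ (g + h)
    top = np.hstack([m1 + m4 - m5 + m7, m3 + m5])
    bottom = np.hstack([m2 + m4, m1 - m2 + m3 + m6])
    return np.vstack([top, bottom])

A = np.random.default_rng(8).normal(size=(4, 4))
B = np.random.default_rng(9).normal(size=(4, 4))
print(np.allclose(strassen_2x2(A, B), A @ B))     # True
```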
-
41
The Strassen formulation of matrix multiply enables fewer than O(N³) neurons, resulting in lower power consumption
“Neural” network for matrix multiplication
-
Resistive switching devices
[Figure: Ta/TaOx/Pt device schematics in the ON and OFF states, with mobile charged species]
V = I×R, I = G×V; I = I1 + I2
Memory technology comparison (DRAM, NAND Flash, PC-RAM, STT-MRAM, FeRAM, ReRAM, CBRAM):
Maturity: all in production, at 20, 16, 45, 65, 180, 180, and 180 nm respectively
Min device feature F (nm): 20, 16, …
Stackable: No, Yes, Yes, No, No, Yes, Yes
Process complexity: High/FE, High/FE, Low/BE, High/BE, High/BE, Low/BE, Low/BE
-
ReRAM is O(N) better than SRAM in energy consumption for vector-matrix multiply computations
Crossbar array: N rows × M columns
SRAMs must fetch each vector per dot product: E ~ O(N²×M)
Analog computation: multiplier and adder at each intersection; E ~ CV² ~ O(N×M)
[Figure: 4×4 crossbar with weights w_11..w_44 stored as conductances; row drivers apply V_i = x_i and each column integrates I_1 = x_1·w_11 + … + x_4·w_41]
Agarwal et al., Front. Neuroscience 2016, 9, 484
43