Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL
Dmitri Yudanov
(Advanced Micro Devices, USA)
Leon Reznik
(Rochester Institute of Technology, USA)
WCCI 2012, IJCNN, June 12
Agenda
Motivation
OpenCL. SNN Simulation Platform
GPU Device Architecture
SNN Simulation Architecture
Results: Verification and Performance
Next Simulator Architecture
Conclusion
Q&A
Motivation
SNN simulation scalability domains:
◦ Network size
◦ Connection count
◦ SNN component models (neuron, synapse, gap junction, etc.)
◦ Simulation methods (event-driven, time-driven, numerical methods)
◦ Precision
Simulation flexibility and programmability for heterogeneous environments: OpenCL.
Configuration:
◦ GPU: Radeon™ HD 7970 (code-named Tahiti), OpenCL
◦ Izhikevich neuron model
◦ Parker-Sochacki simulation method
OpenCL. Simulation Platform
Open Computing Language: an open standard maintained by the Khronos Group.
Four models:
Platform model
Memory model
Programming model
Execution model
Based on B. Gaster et al., Heterogeneous Computing with OpenCL. Morgan Kaufmann Pub., 2011.
Tahiti GPU Architecture: High Level View
Based on AFDS11 presentation: M Houston and M Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next.
Tahiti GPU Architecture: Compute Unit
Based on AFDS11 presentation: M Houston and M Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next.
Simulation: Computation Flow
Simulation: Update
The PS (Parker-Sochacki) solver is based on the sequential implementation by R. Stewart and W. Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, August 2009.
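As a rough illustration of the Parker-Sochacki idea applied to the Izhikevich model, the sketch below advances one neuron by one time step using truncated power-series coefficients, with a Cauchy product for the quadratic term. It assumes the standard Izhikevich equations v' = 0.04v^2 + 5v + 140 - u + I and u' = a(bv - u) with a constant input current and time in milliseconds. The function name ps_step, the fixed series order, and the default parameters are illustrative assumptions. It is not the simulator's kernel: synaptic conductances, spike detection, and reset are left out.

```python
def ps_step(v, u, current, h, a=0.02, b=0.2, order=8):
    """Advance Izhikevich state (v, u) by h ms with a truncated power series.

    Hypothetical helper for illustration only; constant input current assumed.
    """
    vs, us = [v], [u]  # series coefficients v_k, u_k around the current time
    for k in range(order):
        # Cauchy product: k-th series coefficient of v(t)^2
        v_sq = sum(vs[j] * vs[k - j] for j in range(k + 1))
        # Constant terms contribute only to the first derivative (k = 0)
        const = 140.0 + current if k == 0 else 0.0
        vs.append((0.04 * v_sq + 5.0 * vs[k] + const - us[k]) / (k + 1))
        us.append(a * (b * vs[k] - us[k]) / (k + 1))
    # Horner evaluation of sum_k c_k * h^k
    v_new = u_new = 0.0
    for k in range(order, -1, -1):
        v_new = v_new * h + vs[k]
        u_new = u_new * h + us[k]
    return v_new, u_new


if __name__ == "__main__":
    v, u = -65.0, -13.0
    for _ in range(4):  # four 250 us steps = 1 ms of simulated time
        v, u = ps_step(v, u, current=10.0, h=0.25)
    print(v, u)
```

Raising the order parameter trades extra arithmetic for a smaller truncation error, which is one plausible way to read "precision" as a scalability dimension; how the simulator actually assigns precision is not shown on the slide.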
Simulation: Expand
Simulation: Sort
Modified from T. Harada and L. Howes. (2011, Dec.) Heterogeneous Compute. [Online].
http://www.heterogeneouscompute.org/wordpress/wpcontent/uploads/2011/06/RadixSort.pdf
Radix sort example: 1-bit radix, LSD (least-significant-digit) sort.
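The 1-bit LSD radix sort named above can be sketched serially as follows. Each pass performs a stable split on one bit, starting from the least significant: an exclusive prefix sum over the "bit is 0" flags gives every key its destination, which is the same scan-and-scatter pattern a GPU kernel would parallelize. The function name radix_sort_lsd_1bit and the key_bits parameter are assumptions for illustration. This is plain Python, not the simulator's OpenCL kernel.

```python
def radix_sort_lsd_1bit(keys, key_bits=32):
    """Sort non-negative integers with a 1-bit radix, least significant bit first."""
    keys = list(keys)
    for bit in range(key_bits):
        flags = [(k >> bit) & 1 for k in keys]
        # Exclusive prefix sum over "bit is 0": slot of each zero-key in the output
        zero_offsets, zeros = [], 0
        for f in flags:
            zero_offsets.append(zeros)
            zeros += 1 - f
        out = [0] * len(keys)
        for i, (k, f) in enumerate(zip(keys, flags)):
            # One-keys follow all zero-keys and keep their relative order (stable)
            dest = zero_offsets[i] if f == 0 else zeros + (i - zero_offsets[i])
            out[dest] = k
        keys = out
    return keys


if __name__ == "__main__":
    print(radix_sort_lsd_1bit([5, 3, 7, 0, 2, 6, 1, 4], key_bits=3))
```

Stability of each pass is what makes the least-significant-first order work: ties on the current bit preserve the ordering established by earlier, lower bits.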
Simulation: Address
Results: Verification and Testbench
A unit test for each kernel
A unified integration test with complete host-device verification
A variety of compilation modes
C++ preprocessor-driven optimizations
XML-driven search script for the best performing variant
User Interface:
Perl script + XML
Microsoft VS
Results: Performance
Network Size   Average Synapses   Average Events   Average Spikes   Total Synapse      GPU Time        CPU Time        Time
(neurons)      per Neuron         per Step         per Step         Count (millions)   per Step (ms)   per Step (ms)   Ratio
2,100,000      90                 230,000          2,522            190                13.5            659             48
131,000        1,458              370,000          257              191                5.7             279             48
16,000         11,677             300,000          25               191                3.2             283             88
Size-connection scalability in multi-precision networks with per-wavefront (WF) precision allocation.
1000 iterations, 250 µs step
Randomly-connected SNN with only AMPA synapses.
GPU: Radeon™ HD 7970, CPU: AMD Phenom™ II, 3.2 GHz (single thread)
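The Time Ratio column appears to be the CPU time per step divided by the GPU time per step, truncated to an integer. The short check below reproduces it from the table values and also relates the GPU time to the 250 µs simulated step. The names rows, time_ratio, and slowdown_vs_real_time are illustrative, and the real-time comparison is derived here from the table rather than stated on the slide.

```python
STEP_MS = 0.25  # 250 us simulation step, expressed in milliseconds

# (network size, GPU ms per step, CPU ms per step), copied from the table
rows = [
    (2_100_000, 13.5, 659.0),
    (131_000, 5.7, 279.0),
    (16_000, 3.2, 283.0),
]


def time_ratio(gpu_ms, cpu_ms):
    """CPU-to-GPU time ratio for one simulation step."""
    return cpu_ms / gpu_ms


def slowdown_vs_real_time(gpu_ms, step_ms=STEP_MS):
    """Wall-clock time per step divided by simulated time per step."""
    return gpu_ms / step_ms


if __name__ == "__main__":
    for size, gpu_ms, cpu_ms in rows:
        print(size, int(time_ratio(gpu_ms, cpu_ms)), slowdown_vs_real_time(gpu_ms))
```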
Simulator: Next Architecture
Out-of-order flow with event-based synchronization
Target-oriented synaptic matrix partitioning
Mixed hybrid and time-driven simulation flows
Variety of neuron models
STDP
Just-in-time spike-to-event expansion
Conclusion
Multi-precision scalable (neurons, connections, precision) SNN parallel simulator.
OpenCL, Tahiti architecture.
Fully verified against the original CPU implementation.
Up to 90x faster compared to a single thread on CPU.
Future Work (in order of importance)
Object-oriented design
Out-of-order execution flows
STDP feature
Linux support
Application examples
User interface (possibly a library with extensions to PyNN)
APU support
Other: root-causing Newton-Raphson divergence, just-in-time spike-to-event expansion, sort radix scalability.
Selected Bibliography
R. Stewart and W. Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, Aug. 2009.
E. M. Izhikevich, "Simple model of spiking neurons," IEEE Transactions on Neural Networks, vol. 14, pp. 1569-1572, 2003.
B. Gaster, D. R. Kaeli, L. Howes, and P. Mistry, Heterogeneous Computing with OpenCL. Morgan Kaufmann Pub., 2011.
T. Harada and L. Howes. (2011, Dec.) "Introduction to GPU Radix Sort." Heterogeneous Compute. [Online]. http://www.heterogeneouscompute.org/wordpress/wpcontent/uploads/2011/06/RadixSort.pdf
M. Houston and M. Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next. [Online]. http://developer.amd.com/afds
D. Yudanov, M. Shaaban, R. Melton, and L. Reznik, "GPU-based simulation of spiking neural networks with real-time performance & high accuracy," in The 2010 International Joint Conference on Neural Networks (IJCNN), 2010, pp. 1-8.
Q&A
Thanks to Lee Howes and Dr. Wu-Chun Feng
Code: http://code.google.com/p/neurosim