Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL

Dmitri Yudanov

(Advanced Micro Devices, USA)

Leon Reznik

(Rochester Institute of Technology, USA)

WCCI 2012, IJCNN, June 12

Agenda

Motivation
OpenCL. SNN Simulation Platform
GPU Device Architecture
SNN Simulation Architecture
Results: Verification and Performance
Next Simulator Architecture
Conclusion
Q&A

Motivation

SNN simulation scalability domains:
◦ Network size
◦ Connection count
◦ SNN component models (neuron, synapse, gap junction, etc.)
◦ Simulation methods (event-driven, time-driven, numerical methods)
◦ Precision

Simulation flexibility and programmability for heterogeneous environments: OpenCL.

Configuration:
◦ GPU Radeon™ HD 7970 (code-named Tahiti), OpenCL
◦ Izhikevich neuron model
◦ Parker-Sochacki simulation method
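For reference, the Izhikevich model named in the configuration is usually written as below. This is the standard form from the 2003 paper listed in the bibliography; the exact variant used in the simulator (for example, how synaptic input enters the current term I) is not shown on the slides, so treat it as an assumption.

```latex
\begin{aligned}
\frac{dv}{dt} &= 0.04\,v^{2} + 5\,v + 140 - u + I \\
\frac{du}{dt} &= a\,(b\,v - u) \\
\text{if } v \ge 30\ \text{mV:}\quad & v \leftarrow c, \quad u \leftarrow u + d
\end{aligned}
```

Here v is the membrane potential, u is the recovery variable, and a, b, c, d are the model parameters.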

OpenCL. Simulation Platform

Open Computing Language. Open standard maintained by Khronos Group.

Four models:
Platform model
Memory model
Programming model
Execution model

Based on B Gaster et al., Heterogeneous Computing with OpenCL. Morgan Kaufmann Pub, 2011.

Tahiti GPU Architecture: High Level View

Based on AFDS11 presentation: M Houston and M Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next.

Tahiti GPU Architecture: Compute Unit

Based on AFDS11 presentation: M Houston and M Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next.

Simulation: Computation Flow

Simulation: Update

PS solver is based on the sequential implementation of R Stewart and W Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, August 2009.
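The idea behind the update step can be sketched as a single Parker-Sochacki step for the Izhikevich model: the solution is expanded in a power series around the current time, and terms are added until the last one falls below a tolerance. This is a minimal Python sketch, not the simulator's OpenCL kernel or the Stewart and Bair code. The function name `ps_step`, the constant input current over the step, the parameter defaults (a = 0.02, b = 0.2) and the stopping rule are all assumptions made for illustration; spike detection and reset are left out.

```python
def ps_step(v, u, current, h, a=0.02, b=0.2, tol=1e-9, max_order=50):
    """Advance the Izhikevich state (v, u) by one step h (in ms) using a
    power-series (Parker-Sochacki) expansion.

    Returns (v_new, u_new, order), where order is the number of series
    terms that were added before the tolerance was met.
    Hypothetical helper: the input current is assumed constant over the step.
    """
    vc = [v]  # series coefficients of v(t) around the current time
    uc = [u]  # series coefficients of u(t) around the current time
    v_new, u_new = v, u
    h_pow = 1.0
    for k in range(max_order):
        # Cauchy product: k-th series coefficient of v(t)**2.
        v_sq = sum(vc[j] * vc[k - j] for j in range(k + 1))
        # Constant terms only contribute to the zeroth-order coefficient.
        const = 140.0 + current if k == 0 else 0.0
        v_next = (0.04 * v_sq + 5.0 * vc[k] + const - uc[k]) / (k + 1)
        u_next = a * (b * vc[k] - uc[k]) / (k + 1)
        vc.append(v_next)
        uc.append(u_next)
        h_pow *= h
        dv, du = v_next * h_pow, u_next * h_pow
        v_new += dv
        u_new += du
        # Stop once the newest term no longer changes the result noticeably.
        if max(abs(dv), abs(du)) < tol:
            return v_new, u_new, k + 1
    return v_new, u_new, max_order


# One 250 us step (h = 0.25 ms) at two different tolerances.
coarse = ps_step(-65.0, -13.0, 10.0, 0.25, tol=1e-3)
fine = ps_step(-65.0, -13.0, 10.0, 0.25, tol=1e-12)
```

A looser tolerance stops the series earlier, so the same routine trades accuracy for work per neuron. That is one plausible way to read "multi-precision" here, although the slides do not spell out the mechanism.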

Simulation: Expand

Simulation: Sort

Modified from T Harada and L Howes. (2011, Dec.) Heterogeneous Compute. [Online]. http://www.heterogeneouscompute.org/wordpress/wpcontent/uploads/2011/06/RadixSort.pdf

Radix sort example: 1 bit radix. LSD sort.
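The 1-bit LSD radix sort named above can be sketched in Python as follows. Each pass looks at one bit, starting from the least significant, and moves keys with a 0 bit ahead of keys with a 1 bit while preserving their relative order. The destination of each key comes from an exclusive prefix sum, which is the part that parallelizes well on a GPU. The function name `radix_sort_lsd` and the sequential scatter loop are illustrative assumptions; the simulator's kernel and its radix width are not shown on the slides.

```python
from itertools import accumulate


def radix_sort_lsd(keys, key_bits):
    """Stable LSD radix sort of non-negative integers, one bit per pass.

    key_bits is the number of low-order bits to sort on; every key is
    assumed to fit in that many bits.
    """
    keys = list(keys)
    n = len(keys)
    for bit in range(key_bits):
        # Flag is 1 where the current bit of the key is 0.
        is_zero = [1 - ((k >> bit) & 1) for k in keys]
        # Exclusive prefix sum: how many zero-bit keys precede position i.
        zeros_before = [0] + list(accumulate(is_zero))[:-1]
        total_zeros = sum(is_zero)
        out = [0] * n
        for i, k in enumerate(keys):
            if is_zero[i]:
                dest = zeros_before[i]
            else:
                # One-bit keys go after all zero-bit keys, in original order.
                dest = total_zeros + (i - zeros_before[i])
            out[dest] = k
        keys = out
    return keys


sorted_keys = radix_sort_lsd([5, 2, 7, 0, 3, 6, 1, 4], key_bits=3)
```

With 3-bit keys the example needs three passes. A wider radix would reduce the number of passes at the cost of more buckets per pass.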



Simulation: Address

Results: Verification and Testbench

A unit test for each kernel
A unified integration test with complete host-device verification
A variety of compilation modes
C++ preprocessor-driven optimizations
XML-driven search script for the best performing variant

User Interface:
Perl script + XML
Microsoft VS

Results: Performance

Network size  Avg. synapses  Avg. events  Avg. spikes  Total synapses  GPU time per  CPU time per  Time
(neurons)     per neuron     per step     per step     (millions)      step (ms)     step (ms)     ratio
2,100,000     90             230,000      2,522        190             13.5          659           48
131,000       1,458          370,000      257          191             5.7           279           48
16,000        11,677         300,000      25           191             3.2           283           88

Size-connection scalability in multi-precision networks with per-WF (wavefront) precision allocation.

1000 iterations, 250 µs step.

Randomly connected SNN with only AMPA synapses.

GPU: Radeon™ HD 7970; CPU: AMD Phenom™ II, 3.2 GHz (single thread).
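The time ratio column can be checked from the two timing columns as CPU time per step divided by GPU time per step. The short Python sketch below redoes that arithmetic using only the numbers in the table; the variable and function names are illustrative.

```python
# (network size in neurons, GPU ms per step, CPU ms per step), from the table.
rows = [
    (2_100_000, 13.5, 659.0),
    (131_000, 5.7, 279.0),
    (16_000, 3.2, 283.0),
]

STEPS = 1000  # iterations reported on the slide


def time_ratio(gpu_ms, cpu_ms):
    """CPU-to-GPU time ratio for one simulation step."""
    return cpu_ms / gpu_ms


ratios = [time_ratio(gpu, cpu) for _, gpu, cpu in rows]

# Total GPU wall-clock time in seconds for the whole 1000-step run.
gpu_wall_seconds = [gpu * STEPS / 1000.0 for _, gpu, _ in rows]

for (size, _, _), ratio, wall in zip(rows, ratios, gpu_wall_seconds):
    print(f"{size:>9} neurons: ratio {ratio:.1f}, GPU run time {wall:.1f} s")
```

The ratios come out at about 48.8, 48.9 and 88.4, so the table's 48, 48 and 88 appear to be truncated rather than rounded values.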

Simulator: Next Architecture

Out-of-order flow with event-based synchronization
Target-oriented synaptic matrix partitioning
Mixed hybrid and time-driven simulation flows
Variety of neuron models
STDP
Just-in-time spike-to-event expansion

Conclusion

Multi-precision scalable (neurons, connections, precision) SNN parallel simulator.
OpenCL, Tahiti architecture.
Fully verified against the original CPU implementation.
Up to 90x faster compared to a single thread on CPU.

Future Work (in the order of importance)

Object-oriented design
Out-of-order execution flows
STDP feature
Linux support
Application examples
User interface (possibly a library with extensions to PyNN)
APU support
Other: root-causing the Newton-Raphson divergence, just-in-time spike-to-event expansion, sort radix scalability.

Selected Bibliography

R. Stewart and W. Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, Aug. 2009.

E. M. Izhikevich, "Simple model of spiking neurons," IEEE Transactions on Neural Networks, vol. 14, pp. 1569-1572, 2003.

B. Gaster, D. R. Kaeli, L. Howes, and P. Mistry, Heterogeneous Computing with OpenCL. Morgan Kaufmann Pub, 2011.

T. Harada and L. Howes. (2011, Dec.) "Introduction to GPU Radix Sort." Heterogeneous Compute. [Online]. http://www.heterogeneouscompute.org/wordpress/wpcontent/uploads/2011/06/RadixSort.pdf

M. Houston and M. Mantor. (2011, June) Fusion Developer Summit: AMD Graphics Core Next. [Online]. http://developer.amd.com/afds

D. Yudanov, M. Shaaban, R. Melton, and L. Reznik, "GPU-based simulation of spiking neural networks with real-time performance & high accuracy," in The 2010 International Joint Conference on Neural Networks (IJCNN), 2010, pp. 1-8.

Q&A

Thanks to Lee Howes, Dr. Wu-Chun Feng

Code: http://code.google.com/p/neurosim





