Cloud, Distributed, Embedded: Erlang in the Heterogeneous Computing World
Omer Kilic || @OmerK
Slide 2 of 46
Outline
• Challenges in modern computing systems
• Heterogeneous computing
• Co-processors and accelerators
• Programming models and tools
• Alternate architectures
• Parallella Vision System
• Erlang Embedded Project
• Q&A
10/12/2013 Build Stuff 2013
Slide 3 of 46
Challenges: Software
• Frequency wall
• Memory bottlenecks
• Software complexity
Slide 4 of 46
Amdahl’s Law
• “…the maximum speed-up through parallel processing is set by the amount of code which has to run serial”
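As a worked illustration of the statement above: if a fraction P of a program can run in parallel on N processing elements, the speed-up is 1 / ((1 - P) + P / N). The sketch below is not from the talk; the module name `amdahl` and its function names are assumptions made for illustration.

```erlang
-module(amdahl).
-export([speedup/2, limit/1]).

%% Speed-up for a program whose fraction P (0..1) can run in parallel
%% on N processing elements; the remaining (1 - P) runs serially.
speedup(P, N) when is_number(P), P >= 0, P =< 1, is_integer(N), N >= 1 ->
    1 / ((1 - P) + P / N).

%% Upper bound as N grows without limit: only the serial part remains.
limit(P) when is_number(P), P >= 0, P < 1 ->
    1 / (1 - P).
```

For example, `amdahl:speedup(0.9, 16)` is about 6.4 and `amdahl:limit(0.9)` is about 10: with 10% serial code, no number of cores gets past a tenfold speed-up.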
Slide 5 of 46
Challenges: Hardware
• Yield issues
• Wiring and interconnect
• Thermal density
• Power consumption
End of Moore’s law imminent…
Slide 6 of 46
Challenges
“With nearly 10 billion devices connected to the internet and predictions for exponential growth, we’ve reached a point where the space, power, and cost demands of traditional technology are no longer sustainable.”
- Meg Whitman, President and CEO, HP
Slide 7 of 46
Internet of Things
Slide 8 of 46
Device Architectures (I)
Slide 9 of 46
Device Architectures (II)
Slide 10 of 46
Heterogeneous Computing (I)
• Special purpose, highly specialised architectures will outperform general purpose processing devices
– Possibly by orders of magnitude
– In terms of energy efficiency as well as raw speed
– Parallel execution is key
• Non-programmable/pseudo-programmable accelerators: ASIC, DSP, GPU, …
• Fully programmable accelerators: FPGAs
Slide 11 of 46
Open Compute Project
Slide 12 of 46
Heterogeneous Computing (II)
Slide 13 of 46
GPUs
Slide 14 of 46
Anatomy of a GPU
Slide 15 of 46
Co-processors: NetFPGA 10G
Slide 16 of 46
Co-processors: Generic COTS devices
Slide 17 of 46
Landscape of accelerator programming
Interface   | CUDA                   | OpenCL                         | DirectCompute      | RenderScript
Originator  | NVIDIA                 | Khronos (Apple)                | Microsoft          | Google
Year        | 2007                   | 2008                           | 2009               | 2011
Area        | HPC, desktop           | Desktop, mobile, embedded, HPC | Desktop            | Mobile
OS          | Windows, Linux, Mac OS | Windows, Linux, Mac OS (10.6+) | Windows (Vista+)   | Android (3.0+)
Devices     | GPUs (NVIDIA)          | CPUs, GPUs, custom             | GPUs (NVIDIA, AMD) | CPUs, GPUs, DSPs
Work unit   | Kernel                 | Kernel                         | Compute shader     | Compute script
Language    | CUDA C/C++             | OpenCL C                       | HLSL               | Script C
Distributed | Source, PTX            | Source                         | Source, bytecode   | LLVM bitcode
From: “The landscape of accelerator programming: a view from ARM”, Lokhmotov, A., 3rd UK GPU Computing Conference, London
Slide 18 of 46
Accelerator types
• Programmable accelerators
– CPU Vector extensions: x86/SSE/AVX, PowerPC/VMX, ARM/NEON
– GPUs supporting general-purpose computing (GPGPUs)
– Sony/Toshiba/IBM Cell (Sony PlayStation 3, HPC)
– ClearSpeed CSX (HPC, embedded)
– Adapteva Epiphany (HPC, mobile)
– Intel MIC (HPC)
Slide 19 of 46
Programming accelerators
• Proprietary low-level APIs, typically C-based:
– Vector intrinsics
– NVIDIA CUDA
– ATI Brook+
– ClearSpeed Cn
• No software portability, obsolescence risk.
Slide 20 of 46
OpenCL (I)
“OpenCL (Open Computing Language) is an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.”
Slide 21 of 46
OpenCL (II)
• Allows you to write C-like code that executes on GPUs and many other devices
– CPUs, FPGAs, various other architectures
• Key point is data parallelism: applying the same function to a large amount of data
• Allows us to leverage devices like GPUs from Erlang easily with a minimal wrapper
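The data-parallel idea, the same function applied independently to every element, can be sketched in plain Erlang. This is not OpenCL and not the wrapper mentioned above; it only mirrors the "one work-item per element" shape using one Erlang process per element. The module name `data_parallel` and the function `pmap/2` are assumptions made for illustration.

```erlang
-module(data_parallel).
-export([pmap/2]).

%% Apply F to every element of List, one process per element, and
%% return the results in the original order.
pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each unique reference preserves ordering.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Because no element depends on another, the work can be spread over as many execution units as are available; on an accelerator the function body would be the kernel.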
Slide 22 of 46
The Parallella Board
Slide 23 of 46
Shiny prototype!
Slide 24 of 46
The Parallella Board
Slide 25 of 46
Epiphany Architecture
Slide 26 of 46
Epiphany-IV 64-core 28nm (E64G401)
• 64 High Performance RISC CPU Cores
• 800 MHz Operating Frequency
• 100 GFLOPS Peak Performance
• 1.6 TB/s Local Memory Bandwidth
• 102 GB/s Network-On-Chip Bisection Bandwidth
• 6.4 GB/s Off-Chip Bandwidth
• 2 MB On-Chip Distributed Shared Memory
• 2 Watt Maximum Chip Power Consumption
• IEEE Floating Point Instruction Set
• Fully-featured ANSI-C/C++ programmable
• GNU/Eclipse based tool chain
• Source synchronous LVDS off chip links for host or direct chip-to-chip interfacing
• Chip to chip links for integrating up to 64 chips on a single board
Slide 27 of 46
Parallella Vision Demo - Overview
Slide 28 of 46
Parallella Vision Demo - Cameras
Slide 29 of 46
Parallella Vision Demo - Architecture
Slide 30 of 46
OpenCL and Erlang
• Erlang is not that great for crunching image data.
– This is where OpenCL fits in.
• Erlang provides an environment around OpenCL. Our server implementation collects frames, offloads processing to the Epiphany and sends results back.
– Low latency distributed communications and message passing between processes and nodes
– Monitoring and supervision facilities
– “Glue” between heterogeneous nodes
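The collect, offload and reply flow described above might look roughly like the following gen_server. This is a minimal sketch and not the parcv implementation: the module name `frame_server`, the function `process_frame/2` and the `Offload` fun (a stand-in for whatever call hands a frame to the accelerator) are all assumptions.

```erlang
-module(frame_server).
-behaviour(gen_server).

-export([start_link/1, process_frame/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Offload is a fun of one argument standing in for the accelerator call
%% (hypothetical; a real system would invoke an OpenCL wrapper here).
start_link(Offload) when is_function(Offload, 1) ->
    gen_server:start_link(?MODULE, Offload, []).

%% Send one frame to the server and wait for the processed result.
process_frame(Server, Frame) when is_binary(Frame) ->
    gen_server:call(Server, {frame, Frame}).

init(Offload) ->
    {ok, Offload}.

%% Offload the frame and send the result back to the caller.
handle_call({frame, Frame}, _From, Offload) ->
    {reply, {ok, Offload(Frame)}, Offload}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

If the offload call crashes, the server process dies with it; placed under a supervisor it would simply be restarted, which is the monitoring and supervision benefit listed above.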
Slide 31 of 46
OpenCL on the Parallella
• Parallella is a little different from standard GPUs
– Work sizes are different (fewer cores than a GPU)
– Requires some forethought into structuring your kernels
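One way to think about the smaller work sizes: with only a few dozen cores, each core handles a contiguous chunk of the data rather than a single element. The sketch below computes such chunks; the module name `work_split` and the 640x480 frame size used in the example are assumptions, not details from the demo.

```erlang
-module(work_split).
-export([partition/2]).

%% Split Total work items across Cores cores as evenly as possible.
%% Returns [{Offset, Count}], one entry per core that receives work.
partition(Total, Cores) when is_integer(Total), Total >= 0,
                             is_integer(Cores), Cores >= 1 ->
    split(0, Cores, Total div Cores, Total rem Cores).

split(_Offset, 0, _Base, _Extra) ->
    [];
split(Offset, CoresLeft, Base, Extra) ->
    %% The first (Total rem Cores) cores take one extra item each.
    Count = if Extra > 0 -> Base + 1; true -> Base end,
    case Count of
        0 -> [];
        _ -> [{Offset, Count}
              | split(Offset + Count, CoresLeft - 1, Base, max(Extra - 1, 0))]
    end.
```

For a hypothetical 640x480 frame on a 64-core device, `work_split:partition(640 * 480, 64)` yields 64 chunks of 4800 pixels each, starting with `{0, 4800}`.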
Slide 32 of 46
Parallella and Erlang
• Ubuntu armhf packages up and running
– Will be included in the standard distro image
• Vision Demo code available now
– https://github.com/esl/parcv
Slide 34 of 46
Embedded Landscape
Slide 35 of 46
#include <stats.h>
Source: http://embedded.com/electronics-blogs/programming-pointers/4372180/Unexpected-trends
Slide 36 of 46
External Interfaces in Erlang
Slide 37 of 46
Accessing hardware
• Peripherals are memory mapped
• Access via /dev/mem…
– Faster, needs root, potentially dangerous!
• …or by kernel modules/sysfs
– Slower, doesn’t need root, easier, relatively safer
Generally very messy…
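To make the sysfs route concrete, here is a minimal sketch that drives a GPIO pin through the legacy Linux sysfs GPIO files (`export`, `gpioN/direction`, `gpioN/value`) using only the standard `file` module. It is not how Erlang/ALE works internally; the module name `sysfs_gpio` and its functions are assumptions, and the base directory is passed in as a parameter (normally "/sys/class/gpio").

```erlang
-module(sysfs_gpio).
-export([set_output/2, write/3]).

%% Base is the sysfs GPIO directory, normally "/sys/class/gpio".
%% Writing the pin number to "export" asks the kernel to create gpio<Pin>/;
%% writing "out" to its "direction" file makes the pin an output.
set_output(Base, Pin) when is_integer(Pin), Pin >= 0 ->
    case file:write_file(filename:join(Base, "export"), integer_to_list(Pin)) of
        ok -> file:write_file(pin_file(Base, Pin, "direction"), "out");
        Error -> Error
    end.

%% Drive the pin low (0) or high (1).
write(Base, Pin, Level) when Level =:= 0; Level =:= 1 ->
    file:write_file(pin_file(Base, Pin, "value"), integer_to_list(Level)).

pin_file(Base, Pin, Name) ->
    filename:join([Base, "gpio" ++ integer_to_list(Pin), Name]).
```

Every write goes through the file system, which is part of why this route is slower than mapping /dev/mem. On a real board, exporting a pin that is already exported returns an error tuple rather than `ok`.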
Slide 38 of 46
Introducing…
Erlang/ALE: Actor Library for Embedded
http://github.com/esl/erlang-ale
Slide 39 of 46
Erlang/ALE
• Brings embedded peripheral interfaces into the Erlang domain
• Provides easy to use, familiar abstractions for Erlang programmers
• Uses the Raspberry Pi as the reference platform; easy to port to other embedded platforms
• Open source (Apache License, Version 2.0)
Slide 40 of 46
Beta release
• Based on pihwm
– http://omerk.github.io/pihwm
• GPIO and GPIO interrupts, SPI, I2C and PWM peripherals supported
• Documentation, supporting material and educational package under development
Slide 41 of 46
ALE Example: Blink!
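%% Setup (run once): start a process for the LED pin, configured as an output.
{ok, _} = gpio:start_link(?LED_PIN, output),

%% One blink cycle: LED on for 1 s, then off for 1 s.
blink() ->
    gpio:write(?LED_PIN, 1),
    timer:sleep(1000),
    gpio:write(?LED_PIN, 0),
    timer:sleep(1000).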
Slide 42 of 46
ALE Example: Interrupts
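%% Setup: configure the pin as an input and request rising-edge interrupts.
{ok, _} = gpio:start_link(?IN_PIN, input),
ok = gpio:set_int(?IN_PIN, rising),

%% The interrupt arrives as an ordinary Erlang message (gen_server callback).
handle_info({gpio_interrupt, _Pin, _Condition}, State) ->
    blink(),
    {noreply, State}.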
Slide 43 of 46
Hardware Projects – Demo Board
Slide 44 of 46
Packages for Embedded Architectures
https://www.erlang-solutions.com/downloads/download-erlang-otp
Slide 45 of 46
Erlang
Slide 46 of 46
Thank you
• http://erlang-embedded.com
• @ErlangEmbedded
“The world is concurrent. Things in the world don't share data. Things communicate with messages. Things fail.”
- Joe Armstrong, Father of Erlang