HSA Overview
2 | Heterogeneous System Architecture | June 2012
INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)
HSA is a purpose-designed architecture that enables the software ecosystem to combine and exploit the complementary capabilities of sequential processing elements (CPUs) and parallel processing elements (such as GPUs), delivering new capabilities to users beyond the traditional usage scenarios.
AMD is making HSA an open standard to jumpstart the ecosystem.
EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA
[Diagram: the Accelerated Processing Unit (APU) runs data-parallel and graphics workloads alongside serial and task-parallel workloads, enabling APP-accelerated software applications.]
AMD HSA FEATURE ROADMAP
Physical Integration: integrate CPU & GPU in silicon; Unified Memory Controller; common manufacturing technology.
Optimized Platforms: bi-directional power management between CPU and GPU; GPU Compute C++ support; HSA Memory Management Unit.
Architectural Integration: unified address space for CPU and GPU; fully coherent memory between CPU & GPU; GPU uses pageable system memory via CPU pointers.
System Integration: GPU compute context switch; quality of service; GPU graphics pre-emption.
HSA COMPLIANT FEATURES
Optimized Platforms:
GPU Compute C++ support: supports the OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of the CPU and GPU working together to process parallel workloads such as computer vision, video encoding/transcoding, etc.
HSA Memory Management Unit: the CPU and GPU can share system memory, so all system memory is accessible by either the CPU or the GPU, depending on need. Today, only a subset of system memory can be used by the GPU.
Bi-directional power management between CPU and GPU: enables "power sloshing", where the CPU and GPU dynamically lower or raise their power and performance depending on the activity and on which one is better suited to the task at hand.
HSA COMPLIANT FEATURES
Architectural Integration:
Unified address space for CPU and GPU: provides ease of programming for developers creating applications. On HSA platforms a pointer is really a pointer; separate memory pointers for the CPU and GPU are not required.
GPU uses pageable system memory via CPU pointers: the GPU can take advantage of the CPU's virtual address space. With pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces, or page-locked, prior to use.
Fully coherent memory between CPU & GPU: allows data to be cached by both the CPU and the GPU and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.
FULL HSA FEATURES
System Integration:
GPU compute context switch: GPU tasks can be context-switched, making the GPU a multi-tasker. Context switching means faster interoperation between application, graphics, and compute work; users get a snappier, more interactive experience.
GPU graphics pre-emption: as more applications enjoy the performance and features of the GPU, the interactivity of the system must remain good. This means low-latency access to the GPU from any process.
Quality of service: with context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Hardware access for multiple users or applications is either prioritized or equalized.
UNLEASHING DEVELOPER INNOVATION
PROBLEM: developers historically program CPUs. Roughly 10M+ CPU coders produce ~4M+ apps and good user experiences, while only ~100K GPU coders produce ~200+ apps of significant niche value: GPU/HW blocks are hard to program, and not all workloads accelerate. The trade-off is developer return (differentiation in performance, power, features, time-to-market) against developer investment (effort, time, new skills).
SOLUTION: HSA + SDKs = productivity and performance with low power, targeting a few million HSA coders and a few thousand apps delivering a wide range of differentiated experiences.
HSA SOLUTION STACK
[Stack diagram: an Application calls domain-specific libraries (Bolt, OpenCV™, … many others) and runtimes (HSA Runtime, OpenCL™ Runtime, DirectX Runtime, other runtimes); HSAIL code is finalized by the HSA Finalizer to the GPU ISA; the HSA kernel driver and legacy drivers sit above differentiated hardware: CPU(s), GPU(s), and other accelerators.]
How we deliver the HSA value proposition. Overall vision:
– Make the GPU easily accessible: support mainstream languages; expandable to domain-specific languages.
– Make compute offload efficient: direct path to the GPU (avoiding graphics overhead); eliminate memory copies; low-latency dispatch.
– Make it ubiquitous: drive HSA as a standard through the HSA Foundation; open-source key components.
HSA INTERMEDIATE LAYER - HSAIL
HSAIL is a virtual ISA for parallel programs:
– Finalized to the native ISA by a JIT compiler, or "Finalizer"
– Allows rapid innovation in native GPU architectures
– HSAIL stays constant across implementations
Explicitly parallel: designed for data-parallel programming.
Support for exceptions, virtual functions, and other high-level language features.
Syscall methods: GPU code can call directly into system services, I/O, printf, etc.
Debugging support.
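To give a flavor of the layer being described, here is an illustrative pseudo-HSAIL kernel. The syntax is approximated from memory of later HSAIL drafts (the spec was still in development in June 2012), so register names, modifiers, and the kernarg handling are sketches, not authoritative:

```
// Pseudo-HSAIL: each work-item scales one float of an array that is
// addressed through an ordinary CPU pointer passed as a kernel argument.
kernel &scale(kernarg_u64 %ptr, kernarg_f32 %k)
{
    workitemabsid_u32  $s0, 0;         // this work-item's global id
    cvt_u64_u32        $d0, $s0;
    ld_kernarg_u64     $d1, [%ptr];    // base address (a plain CPU pointer)
    mad_u64            $d0, $d0, 4, $d1;  // byte address of this element
    ld_kernarg_f32     $s1, [%k];
    ld_global_f32      $s2, [$d0];
    mul_f32            $s2, $s2, $s1;
    st_global_f32      $s2, [$d0];
};
```

The point is the shape, not the mnemonics: explicitly parallel per-work-item code, loads and stores through shared addresses, and a form the Finalizer can translate to any vendor's native GPU ISA.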
C++ AMP
C++ AMP is a data-parallel programming model for accelerators, initiated by Microsoft:
– First announced at AFDS 2011
– A C++-based, higher-level programming model with advanced C++11 features
– A single-source model that integrates host and device programming well
– An implicit programming model that is "future-proofed" to enable HSA features, e.g. avoiding host-to-device copies
– An implementation is available in the Microsoft Visual Studio 11 beta release
C++ AMP AND HSA
– A compute-focused, efficient HSA implementation replaces the graphics-centric implementation of C++ AMP (e.g. low-latency dispatch, HSAIL enabled).
– The shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs, without any source changes.
– Additional advanced C++ features become available on the GPU, e.g. more data types, function calls, virtual functions, arbitrary control flow, exception handling, and device and platform atomics.
OPENCL™ AND HSA
HSA is an optimized platform architecture for OpenCL™, not an alternative to it. OpenCL™ on HSA will benefit from:
– Avoidance of wasteful copies
– Low-latency dispatch
– An improved memory model
– Pointers shared between the CPU and GPU
HSA also exposes a lower-level programming interface for those who want the ultimate in control and performance; optimized libraries may choose this lower-level interface.
HSA TAKING PLATFORM TO PROGRAMMERS
Balance between the CPU and GPU for performance and power efficiency.
Make GPUs accessible to a wider audience of programmers:
– Programming models close to today's CPU programming models
– More advanced language features enabled on the GPU
– Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and hence more applications on the GPU
– A kernel can enqueue work to any other device in the system (e.g. GPU→GPU, GPU→CPU), enabling task-graph-style algorithms, ray tracing, etc.
– A clearly defined HSA memory model enables effective reasoning for parallel programming
HSA provides a compatible architecture across a wide range of programming models and HW implementations.
THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM
An open standardization body to bring about broad industry support for heterogeneous computing across the full value chain, from silicon IP to ISVs:
– Make the GPU a first-class co-processor to the CPU through architecture definition
– Architectural support for special-purpose hardware accelerators (rasterizers, security processors, DSPs, etc.)
– Own and evolve the specifications and conformance suite
– Bring to market strong development solutions that drive innovative, advanced content and applications
– Cultivate programming talent via HSA developer training and academic programs
Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise
this information and to make changes from time to time to the content hereof without obligation to notify any person of such
revisions or changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE
EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT,
INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names
used in this presentation are for informational purposes only and may be trademarks of their respective owners.
OpenCL is a trademark of Apple Inc. used by permission by Khronos.
© 2012 Advanced Micro Devices, Inc.