Introduction Parallel Heterogeneous Computing Final


  • 8/2/2019 Introduction Parallel Heterogeneous Computing Final


    Introduction to Parallel and Heterogeneous Computing

    Benedict R. Gaster | October 2010


    Agenda

    Motivation

    A little terminology

    Hardware in a heterogeneous world

    Software in a heterogeneous world


    The Free Lunch is Over

    Herb Sutter (2005)

    Hardware can no longer depend on getting:

    Increased clock speed

    Execution optimization (i.e. instruction-level parallelism)

    Larger caches

    How has this been addressed, and how is it being addressed now?


    Solution

    Parallelism

    (lots of it!)


    Quick stop to cover a bit of terminology


    Definitions

    Parallelism

    A property of a computation where portions of the calculations are independent of each other, allowing them to be executed at the same time.

    For example, consider the following pseudo code:

        float a = E + A;
        float b = E + B;
        float c = E + C;
        float d = E + D;
        float r = a + b + c + d;

    Assignments a, b, c, and d are independent, so they can be run in parallel
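The independence of the four assignments can be made explicit in real code. The following is a minimal C++ sketch, not taken from the slides: the function name `parallel_sum` is hypothetical, and `std::async` is just one of several ways to express the idea. Each of `a`, `b`, `c`, and `d` is computed as its own asynchronous task, and only the final sum `r` has to wait for all four.

```cpp
#include <future>

// Hypothetical helper: computes r = (E + A) + (E + B) + (E + C) + (E + D).
// The four partial sums do not depend on each other, so each one is
// launched as a separate asynchronous task.
float parallel_sum(float E, float A, float B, float C, float D) {
    auto add = [E](float x) { return E + x; };

    std::future<float> a = std::async(std::launch::async, add, A);
    std::future<float> b = std::async(std::launch::async, add, B);
    std::future<float> c = std::async(std::launch::async, add, C);
    std::future<float> d = std::async(std::launch::async, add, D);

    // r depends on all four results, so this is the only synchronization point.
    return a.get() + b.get() + c.get() + d.get();
}
```

For work this small the cost of launching tasks far outweighs the additions themselves, so the sketch only illustrates the dependency structure, not a worthwhile optimization.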


    Definitions

    Concurrency

    A logical programming abstraction used to arbitrate communication between multiple processing entities (like processes or threads).

    For example, concurrency can be used to build user interfaces and other asynchronous tasks.

    Concurrency is NOT the same as parallelism

    It does not preclude running tasks in parallel, but parallelism is not a necessary component.
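To illustrate concurrency without parallelism, here is a small C++ sketch under stated assumptions: the names `Step`, `run_round_robin`, and `demo_interleaving` are hypothetical, and the "ui" and "io" tasks are stand-ins. Two logical tasks make interleaved progress on a single thread, so they are concurrent, yet nothing ever executes at the same time.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A task performs one step per call and returns true while it has more to do.
using Step = std::function<bool()>;

// Runs all tasks round-robin on the calling thread (no extra threads).
void run_round_robin(std::deque<Step> tasks) {
    while (!tasks.empty()) {
        Step s = std::move(tasks.front());
        tasks.pop_front();
        if (s()) tasks.push_back(std::move(s));  // not finished: requeue
    }
}

// Two hypothetical tasks, each taking two steps; the log records the order.
std::vector<std::string> demo_interleaving() {
    std::vector<std::string> log;
    int ui = 0, io = 0;
    std::deque<Step> tasks;
    tasks.push_back([&] { log.push_back("ui" + std::to_string(ui)); return ++ui < 2; });
    tasks.push_back([&] { log.push_back("io" + std::to_string(io)); return ++io < 2; });
    run_round_robin(std::move(tasks));
    return log;
}
```

The log shows the two tasks alternating step by step. Running the same tasks on separate cores would add parallelism, but the concurrent structure of the program would stay the same.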


    Definitions

    Heterogeneous Computing

    A system comprised of two or more compute engines with significant structural differences

    In our case, a low-latency x86 CPU and a high-throughput Radeon GPU

    Fusion

    Bringing together two or more components and joining them into a single unified whole

    In our case, combining CPUs and GPUs on a single silicon die for higher performance and lower power


    Hardware in a heterogeneous world


    AMD Balanced Platform Advantage

    Delivers optimal performance for a wide range of platform configurations

    Other Highly Parallel Workloads

    Graphics Workloads

    Serial/Task-Parallel Workloads

    CPU is ideal for scalar processing

    Out-of-order x86 cores with low-latency memory access

    Optimized for sequential and branching algorithms

    Runs existing applications very well

    GPU is ideal for parallel processing

    GPU shaders optimized for throughput computing

    Ready for emerging workloads

    Media processing, simulation, natural UI, etc.


    Three Eras of Processor Performance

    Single-Core Era

    [Chart: single-thread performance vs. time, with a "we are here" marker]

    Enabled by: Moore's Law, voltage scaling, microarchitecture

    Constrained by: power, complexity

    Multi-Core Era

    [Chart: throughput performance vs. time (# of processors), with a "we are here" marker]

    Enabled by: Moore's Law, desire for throughput, 20 years of SMP architecture

    Constrained by: power, parallel SW availability, scalability

    Heterogeneous Systems Era

    [Chart: targeted application performance vs. time (data-parallel exploitation), with a "we are here" marker]

    Enabled by: Moore's Law, abundant data parallelism, power-efficient GPUs

    Temporarily constrained by: programming models, communication overheads


    GPU SP ALU Performance

    [Chart: series labelled HD4870, HD5870, and CPU]


    GPU DP ALU Performance

    [Chart: series labelled HD4870, HD5870, and CPU]


    GPU BW Performance expectations over time

    [Chart: axis ticks from 0 to 300 in steps of 50; data points labelled HD4870 and HD5870]


    GPU Computing Efficiency Trend

    [Chart: two series, GFLOPS/W and GFLOPS/mm2. Highlighted values: 14.47 GFLOPS/W and 7.90 GFLOPS/mm2. Other data labels: 7.50, 4.56, 4.50, 2.24, 2.21, 0.92, 2.01, 1.06, 1.07, 0.42]


    Fusion APUs: Putting it all together

    [Diagram labels]

    Microprocessor Advancement: Single-Thread Era, Multi-Core Era, Heterogeneous Systems Era

    GPU Advancement: Graphics Driver-based programs, OCL/DC Driver-based programs, System-level Programmable

    Programmer Accessibility: Unacceptable, Experts Only, Mainstream

    Throughput Performance

    High Performance Task Parallel Execution

    Power-efficient Data Parallel Execution

    Heterogeneous Computing

    Fusion APU


    Why AMD Fusion APUs? A balanced approach is optimal

    The GPU is the Game Changer

    Enormous parallel computing capacity

    Outstanding performance per watt per dollar

    Very efficient hardware threading

    SIMD architecture well matched to media workloads: video, audio, graphics

    Positioned to enable the emergence of immersive media-based experiences

    x86 CPU owns the SW Universe

    Windows, MacOS and Linux franchises

    Many thousands of applications

    Well matched to branchy scalar code

    Established programming and memory model

    Mature tool chain

    Backward compatible for 15 years of applications and OSs

    Highly Programmable

    Power Efficient

    Massive Throughput

    Best of both worlds


    PC with Discrete GPU


    Fusion APU Based PC


    The Benefits of Fusion

    Unparalleled processing capabilities in mobile form factors

    Shared memory for the CPU and GPU

    Eliminates copies, increasing performance

    Reduces dispatch overhead

    Lower latency from the GPU to memory

    Power efficient design

    Enables architectural innovations between CPU, GPU and the Memory System

    Scalable architecture that can target a broad range of platforms from mobile to data center


    These machines are being built but

    Heterogeneous systems are being built and there is no

    question that we will build more of them

    There are new emerging workloads that contain enough parallelism to use them, but

    This is not enough!

    The question then becomes:

    How do we program 10, 100, or even >1000 cores?

    The future of performance is entirely about software!


    AMD Fusion Developer Summit

    Find out more about

    Fusion APUs;

    Programming models for Fusion; and

    Much more

    June 13-16, 2011

    Seattle, Washington, USA

    http://sites.amd.com/us/fusion/apu/Pages/fusion-developer-summit.aspx


    OpenCL Programming Webinar Series

    Designed to help advance your experience in parallel programming, with a focus on OpenCL

    Much of what will be taught is useful for parallel programming in general

    Beginner Tracks                 Advanced Tracks
    Introduction to OpenCL          Device Fission Extension for OpenCL
    OpenCL Programming in Detail    Optimization Techniques I
    Using OpenCL C Language         Optimization Techniques II

    http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx


    Software in a heterogeneous world


    What lies ahead?

    Guy Steele (2009)

    The Future Is Parallel: What's a Programmer to Do?

    Million dollar question with many (many) answers!

    Task parallelism

    Data parallelism

    OpenMP, MPI, OpenCL, Java Threads, Task Parallel Library, CUDA, Concurrent ML, Threading Building Blocks, Cilk, POSIX, Win32 Threads, Kite, Brook+, Accelerator, X10, Fortress


    Different types of parallelism

    Braided parallelism

    Task-decomposition

    Data-decomposition


    Task-decomposition

    Divides the problem by type of task to be done

    For example, in modern games:

    Computations are organized as tasks/jobs

    Some may be fine-grained (short-running)

    Others long-running and data-parallel

    Tasking runtime must account for:

    Task dependencies

    Synchronization

    Load balancing

    Etc
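Task dependencies and synchronization can be sketched with standard C++ futures. This is an illustrative example only: the job names (`update_input`, `update_ai`, `update_physics`, `render`) and their arithmetic are hypothetical placeholders for game-frame work, and a real tasking runtime would schedule the jobs on a thread pool rather than launching them with `std::async`.

```cpp
#include <future>

// Hypothetical frame jobs; the arithmetic stands in for real work.
int update_input() { return 3; }
int update_ai(int input) { return input * 2; }
int update_physics(int input) { return input + 10; }
int render(int ai, int physics) { return ai + physics; }

// Job graph: input -> {ai, physics} -> render
int run_frame_tasks() {
    // "input" has no dependencies, so it starts immediately.
    std::shared_future<int> input =
        std::async(std::launch::async, update_input).share();

    // "ai" and "physics" both depend on "input" but not on each other,
    // so they may run in parallel once "input" has finished.
    std::future<int> ai = std::async(
        std::launch::async, [input] { return update_ai(input.get()); });
    std::future<int> physics = std::async(
        std::launch::async, [input] { return update_physics(input.get()); });

    // "render" depends on both; get() is the synchronization point.
    return render(ai.get(), physics.get());
}
```

The dependencies are encoded entirely in which future each task waits on, which is the information a tasking runtime needs in order to schedule and load-balance the jobs.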


    Load balancing - work stealing

    Internally, most tasking runtimes use a work-stealing implementation

    Work stealing has provably

    Good locality

    Work distribution properties

    Seminal reference: Cilk: an efficient multithreaded runtime system. Blumofe et al. SIGPLAN Notices, 1995.


    Popular task runtimes (CPU only)

    Unmanaged C/C++

    Intel's Threading Building Blocks

    Apple's Grand Central Dispatch

    OpenMP: Parallelism should not be tacked on!

    Managed languages

    Microsoft's Task Parallel Library for .NET 4

    Different OS, different options!


    Data-decomposition

    Divides the problem into elements to be processed, assigning a subset of elements to a parallel worker

    For example, in modern games:

    Particle systems

    1,000, maybe 100,000, even millions of particles

    Forces and actions computed independently (locality can be used to describe interaction)

    Data-parallel execution must account for:

    Local communication

    Synchronization

    Etc.
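The particle-system example can be sketched as a data decomposition in C++. Everything here is an assumption made for illustration: the one-dimensional `Particle` type, the constant-acceleration update, and the function names `step_range` and `step_parallel`. The particle array is divided into contiguous chunks, each chunk is assigned to one worker thread, and joining the threads is the only synchronization because particles do not interact in this simplified model.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical one-dimensional particle.
struct Particle {
    float pos;
    float vel;
};

// Advances particles in [begin, end) by one time step.
void step_range(std::vector<Particle>& ps, std::size_t begin, std::size_t end,
                float accel, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        ps[i].vel += accel * dt;
        ps[i].pos += ps[i].vel * dt;
    }
}

// Splits the particles into contiguous chunks, one per worker thread.
void step_parallel(std::vector<Particle>& ps, float accel, float dt,
                   unsigned workers) {
    workers = std::max(1u, workers);
    const std::size_t chunk = (ps.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(ps.size(), w * chunk);
        const std::size_t end = std::min(ps.size(), begin + chunk);
        if (begin == end) break;  // fewer particles than workers
        pool.emplace_back([&ps, begin, end, accel, dt] {
            step_range(ps, begin, end, accel, dt);
        });
    }
    for (std::thread& t : pool) t.join();  // wait for every chunk
}
```

Once particles interact with their neighbours, the chunks are no longer fully independent, and the local communication and synchronization listed on the slide have to be handled explicitly.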


    Popular data-parallel languages

    Unmanaged C/C++

    Khronos' Open Computing Language (OpenCL) (CPU + GPU)

    NVIDIA's CUDA (GPU only)

    OpenMP: Parallelism should not be tacked on! (CPU only)

    Managed languages

    Microsoft's Accelerator II for .NET 4 (CPU + GPU via DX9)

    AMD's Aparapi (A PARallel API) for Java (CPU + GPU via OpenCL)

    Different OS, different options!
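For reference, OpenMP expresses a data-parallel loop by annotating ordinary serial code with a pragma. The sketch below is a generic example rather than anything from the slides: the function name `saxpy` and its signature are assumptions. When compiled without OpenMP support the pragma is ignored and the loop simply runs serially, producing the same result.

```cpp
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every index present in both vectors.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const long n =
        static_cast<long>(x.size() < y.size() ? x.size() : y.size());

    // Each iteration touches a different element, so iterations are
    // independent and can be distributed across threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        y[k] = a * x[k] + y[k];
    }
}
```

The loop body is unchanged serial code; the parallelism lives entirely in the annotation.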


    Task and data-parallelism together

    Braided Parallelism

    Job graph from DICE's Battlefield: Bad Company 2

    Reference: Aaron Lefohn. Programming Larrabee: Beyond Data Parallelism. Beyond Programmable Shading Course. SIGGRAPH 2008.
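Braided parallelism, meaning task parallelism in which some tasks are themselves data-parallel, can be sketched in C++ as follows. The jobs are hypothetical and much simpler than a real game's job graph: a "physics" task whose body is a chunked, data-parallel sum of squares, running alongside an unrelated "audio" task that returns a placeholder value. The names `sum_squares`, `FrameResult`, and `run_braided_frame` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Data-parallel part: sum of squares, computed chunk by chunk in parallel.
double sum_squares(const std::vector<double>& v, std::size_t parts) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = (v.size() + parts - 1) / parts;
    std::vector<std::future<double>> partial;
    for (std::size_t b = 0; b < v.size(); b += chunk) {
        const std::size_t e = std::min(v.size(), b + chunk);
        partial.push_back(std::async(std::launch::async, [&v, b, e] {
            double s = 0.0;
            for (std::size_t i = b; i < e; ++i) s += v[i] * v[i];
            return s;
        }));
    }
    double total = 0.0;
    for (std::future<double>& f : partial) total += f.get();
    return total;
}

struct FrameResult {
    double energy;      // result of the data-parallel physics task
    int audio_samples;  // result of the independent audio task
};

// Task-parallel part: two independent tasks, one of them data-parallel inside.
FrameResult run_braided_frame(const std::vector<double>& velocities) {
    std::future<double> physics = std::async(std::launch::async, [&velocities] {
        return sum_squares(velocities, 4);
    });
    std::future<int> audio = std::async(std::launch::async, [] {
        return 512;  // placeholder for real audio work
    });
    const double energy = physics.get();
    const int samples = audio.get();
    return FrameResult{energy, samples};
}
```

On a heterogeneous system the data-parallel strand would be a natural candidate for the GPU while the task-level coordination stays on the CPU; this sketch keeps everything on CPU threads.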


    Fusion APUs: Putting it all together

    [Diagram labels]

    Microprocessor Advancement: Single-Thread Era, Multi-Core Era, Heterogeneous Systems Era

    GPU Advancement: Graphics Driver-based programs, OCL/DC Driver-based programs, System-level Programmable

    Programmer Accessibility: Unacceptable, Experts Only, Mainstream

    Throughput Performance

    High Performance Task Parallel Execution

    Power-efficient Data Parallel Execution

    Heterogeneous Computing

    Fusion APU

    Braided Parallelism: a natural programming model for heterogeneous computing

tp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspxhttp://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx

    Conclusion and Questions

    http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx

    Trademark Attribution

    AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.

    © 2009 Advanced Micro Devices, Inc. All rights reserved.
