02 Gpu Architecture Overview s07

download 02 Gpu Architecture Overview s07

of 22

Transcript of 02 Gpu Architecture Overview s07

  • 8/3/2019 02 Gpu Architecture Overview s07

    1/22

    GPU Architecture OverviewGPU Architecture Overview

    John OwensJohn Owens

    UC DavisUC Davis

  • 8/3/2019 02 Gpu Architecture Overview s07

    2/22

    The Right-Hand TurnThe Right-Hand Turn

    [H&P Figure 1.1]

  • 8/3/2019 02 Gpu Architecture Overview s07

    3/22

    3

    Why? [Architecture Reasons]Why? [Architecture Reasons]

    ILP increasingly difficult to extract frominstruction stream

    Control hardware dominates processors

    Complex, difficult to build and verify Takes substantial fraction of die

    Scales poorly

    Pay for max throughput, sustain average throughput

    Quadratic dependency checking Control hardware doesnt do any math!

    Intel Core Duo: 48 GFLOPS, ~10 GB/s

    NVIDIA G80: 330 GFLOPS, 80+ GB/s

  • 8/3/2019 02 Gpu Architecture Overview s07

    4/22

    4

    AMDAMD DeerhoundDeerhound (K8L)(K8L)

    chip-architect.com

  • 8/3/2019 02 Gpu Architecture Overview s07

    5/22

    5

    Why? [Technology Reasons]Why? [Technology Reasons]

    Industry moving from instructions persecond to instructions per watt

    Power wall now all-important

    Traditional proc techniques are not power-efficient

    We can continue to put more transistors ona chip

    but we cant scale their voltage like we used to

    and we cant clock them as fast

  • 8/3/2019 02 Gpu Architecture Overview s07

    6/22

    Go ParallelGo Parallel

    Time of architecturalinnovation GPUs let us explore using

    hundreds of processors now, not

    10 years from now Major CPU vendors

    supporting multicore

    Interest in general-purpose

    programmability on GPUs Universities must teach

    thinking in parallel

  • 8/3/2019 02 Gpu Architecture Overview s07

    7/22

    7

    WhatWhats Different about the GPU?s Different about the GPU?

    The future of the desktop is parallel We just dont know what kind of parallel

    GPUs and multicore are different Multicore: Coarse, heavyweight threads, better

    performance per thread GPUs: Fine, lightweight threads, single-thread

    performance is poor

    A case for the GPU

    Interaction with the world is visual GPUs have a well-established programming model

    Market for GPUs is 500M+ total/year

  • 8/3/2019 02 Gpu Architecture Overview s07

    8/22

    GPU

    The Rendering PipelineThe Rendering Pipeline

    Application

    Rasterization

    Geometry

    Composite

    Compute 3D geometryMake calls to graphics API

    Transform geometry from 3D to

    2D (in parallel)

    Generate fragments from 2D

    geometry (in parallel)

    Combine fragments into image

  • 8/3/2019 02 Gpu Architecture Overview s07

    9/22

    GPU

    TheThe ProgrammableProgrammable PipelinePipeline

    Application

    Rasterization

    Geometry

    Composite

    Compute 3D geometryMake calls to graphics API

    Transform geometry from 3D to

    2D [vertex programs]

    Generate fragments from 2D

    geometry [fragment programs]

    Combine fragments into image

  • 8/3/2019 02 Gpu Architecture Overview s07

    10/22

    DirectX 10 PipelineDirectX 10 Pipeline

    VertexVertex

    BufferBuffer

    InputInput

    AssemblerAssemblerVertexVertex

    ShaderShaderSetupSetup

    RasterizerRasterizerOutputOutput

    MergerMerger

    PixelPixel

    ShaderShaderGeometryGeometry

    ShaderShader

    IndexIndex

    BufferBufferTextureTexture TextureTexture

    RenderRender

    TargetTargetDepthDepth

    StencilStencilTextureTexture

    StreamStream

    BufferBuffer

    Stream outStream out

    MemoryMemory

    memorymemory

    programmableprogrammable

    fixedfixed

    SamplerSampler SamplerSampler SamplerSampler

    ConstantConstant ConstantConstant ConstantConstant

    Courtesy David Blythe, MicrosoftCourtesy David Blythe, Microsoft

  • 8/3/2019 02 Gpu Architecture Overview s07

    11/22

    11

    Characteristics of GraphicsCharacteristics of Graphics

    Large computational requirements Massive parallelism

    Graphics pipeline designed for independent

    operations

    Long latencies tolerable

    Deep, feed-forward pipelines

    Hacks are OKcan tolerate lack of accuracy

    GPUs are good at parallel, arithmetically

    intense, streaming-memory problems

  • 8/3/2019 02 Gpu Architecture Overview s07

    12/22

    Application

    Vertex

    Geometry

    Rasterization

    Fragment

    Display

    Command

    Application/

    Command (CPU)

    Command

    Vertex

    Geometry

    Raster-

    ization

    Fragment

    Display

    GPU

    Mem

    Mem

    Graphics HardwareGraphics HardwareTask ParallelTask Parallel

  • 8/3/2019 02 Gpu Architecture Overview s07

    13/22

    13

    Rage 128Rage 128

  • 8/3/2019 02 Gpu Architecture Overview s07

    14/22

    Triangle Setup

    L2 Tex

    Shader Instruction Dispatch

    Fragment Crossbar

    Memory

    Partition

    Memory

    Partition

    Memory

    Partition

    Memory

    Partition

    Z-Cull

    NVIDIA GeForce 6800 3D PipelineNVIDIA GeForce 6800 3D Pipeline

    Courtesy Nick Triantos, NVIDIA

    Vertex

    Fragment

    Composite

  • 8/3/2019 02 Gpu Architecture Overview s07

    15/22

    Programmable PipelineProgrammable Pipeline

    Application

    Command

    Per-Surface

    Tessellation

    Per-Vertex

    Primitive Assembly

    Per-Primitive

    Rasterization

    Per-Fragment

    Image Composition?

    Per-Pixel

    Display

    Per-Texel Texture

    Memory

    Pixel Ops

    Object Space

    Image Space

    Texture Spaces

    FB

    [From Akeley and Hanrahan, Real-Time Graphics Architectures]

  • 8/3/2019 02 Gpu Architecture Overview s07

    16/22

    16

    Transform A

    to B

    Process A toA

    Generalizing the PipelineGeneralizing the Pipeline

    Transform A to B Ex: Rasterization (triangles

    to fragments)

    Historically fixed function

    Process A to A

    Ex: Fragment program

    Recently programmable,

    and becoming more so

  • 8/3/2019 02 Gpu Architecture Overview s07

    17/22

    17

    GeForce 8800 GPUGeForce 8800 GPU

    Global Memory

    Thread Execution Manager

    Input Assembler

    Host

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Parallel DataCache

    Load/store

    Thread Processors Thread ProcessorsThread ProcessorsThread ProcessorsThread ProcessorsThread ProcessorsThread ProcessorsThread Processors

    [courtesy of Ian Buck, NVIDIA]

    Built aroundprogrammable units

    Unified shader

  • 8/3/2019 02 Gpu Architecture Overview s07

    18/22

    Application

    Vertex

    Geometry

    Rasterization

    Fragment

    Display

    Command

    Application/

    Command (CPU)

    Command

    Rasterization

    Display

    GPU

    Mem

    Mem

    Programmable

    UnifiedUnified ShadersShaders

  • 8/3/2019 02 Gpu Architecture Overview s07

    19/22

    19

    http://www.neoptica.com/NeopticaWhitepaper.pdf

    http://www.graphicshardware.org/previous/www_2006/presentations/pharr-keynote-gh06.pdf

    Towards Programmable GraphicsTowards Programmable Graphics

    Fixed function Configurable, but not programmable

    Programmable shading

    Shader-centric

    Programmable shaders, but fixed pipeline

    Programmable graphics

    Customize the pipeline

    Neoptica asserts the major obstacle is programmingmodels and tools

  • 8/3/2019 02 Gpu Architecture Overview s07

    20/22

    YesterdayYesterdays Vendor Supports Vendor Support

    High-Level Graphics Language

    OpenGL

    D3D

    Low-Level Device Driver

  • 8/3/2019 02 Gpu Architecture Overview s07

    21/22

    TodayTodays New Vendor Supports New Vendor Support

    High-Level Graphics Language

    OpenGL

    D3D

    Compute

    Low-Level Device Driver

    High-Level

    Compute Lang.

    Low-LevelAPI

    CUDA

    CTM HAL

    CTM CAL

  • 8/3/2019 02 Gpu Architecture Overview s07

    22/22

    22

    Architecture SummaryArchitecture Summary

    GPU is a massively parallel architecture Many problems map well to GPU-style computing

    GPUs have large amount of arithmetic capability

    Increasing amount of programmability in the pipeline

    New features map well to GPGPU Unified shaders

    Direct access to compute units in new APIs

    Challenge: How do we make the best use of GPU hardware?

    Techniques, programming models, languages,evaluation tools