Qualcomm Hexagon Architecture

download Qualcomm Hexagon Architecture

of 10

Transcript of Qualcomm Hexagon Architecture

  • 8/20/2019 Qualcomm Hexagon Architecture

    1/23

    1Qualcomm Technologies, Inc. All Rights Reserved

    Qualcomm Technologies, Inc. All Rights Reserved

    Qualcomm Hexagon

    DSP: An architectureoptimized for mobilemultimedia andcommunications

    Lucian Codrescu

    Sr. Director, TechnologyQualcomm Technologies, Inc.

  • 8/20/2019 Qualcomm Hexagon Architecture

    2/23

    2Qualcomm Technologies, Inc. All Rights Reserved

    Camera

    Display

    JPEG

    Video

    Other 

    • aDSP: Real-timemedia & sensor

    processing

    Hexagon™

    DSP processors in Snapdragon products

    Multimedia Fabric System Fabric

    KraitCPU

     AdrenoGPU

    Krait

    CPU

    KraitCPU

    Krait

    CPU

    2MB L2

    Misc.Connectivity

    Modem

    Snapdragon800

    Fabric & Memory Controller 

    LPDDR3 LPDDR3

    HexagonaDSP

    HexagonmDSP

    • mDSP: Dedicatedmodem processing

     Audio

    Sensors

  • 8/20/2019 Qualcomm Hexagon Architecture

    3/23

    3Qualcomm Technologies, Inc. All Rights Reserved

    Expansion of Hexagon DSP use cases beyond audio

    Image EnhancementCamera, Still, VideoHexagonV4 based products

    VideoHexagonV5 based products

    SensorsHexagonV5 based products

    Computer Vision & Augmented RealityHexagonV4 based products

    HexagonV2/V3

    Voice Audio

    Hexagon DSP is evolving for use beyond voice and audio to

    computer vision, video and imaging features

  • 8/20/2019 Qualcomm Hexagon Architecture

    4/23

    4Qualcomm Technologies, Inc. All Rights Reserved

    The Hexagon DSP evolutionGenerational improvements in performance and power efficiency driven byboth architecture and implementation

    V4L28nm

     Apr 2011

    V3M45nm

    June 2009

    V265nm

    Dec 2007

    V3C45nm Aug

    2009

    V3L45nm Nov

    2009

    V4M

    28nmDec 2010

    V4C28nm

    Dec 2010

    V5A28nm

    Dec 2012

    V165nm

    Oct 2006

    Time

    V5H28nm

    Dec 2012

  • 8/20/2019 Qualcomm Hexagon Architecture

    5/23

    5 Qualcomm Technologies, Inc. All Rights Reserved

    Requirements

    • Require fixed real-timeperformance level(fps, Mbit/sec, etc.)

    • Extremely aggressivepower & area targets

    Key characteristics ofmodem & multimedia applications

    Characteristics

    • Mix of signal processing& control code

    − For modem, Qualcomm does not

    use a split CPU/DSP architecture. All processing is done on HexagonDSP

    − Multimedia apps have significantcontrol in the RTOS & frameworks

    • Heavy L2$ misses

    − Multimedia is data intensive

    − Modem is code intensive

  • 8/20/2019 Qualcomm Hexagon Architecture

    6/23

    6Qualcomm Technologies, Inc. All Rights Reserved

    Hexagon DSP blends features targeted to modem &multimedia

    Hexagon

    DSP

    VLIW• Need multi-issue to

    meet performance• Low complexity for

     Area & Power 

    Innovate in ISA to

    maximize IPC• More work/VLIW packet

    reduces energy/instruction

    • Keep the pipelines full forMIPS/mm2

    • Target both SignalProcessing & Control code

    Multi-Threading• To reduce L2$ miss

    penalty without the needfor a large L2

    • Increasesinstructions/VLIW packetbecause compiler doesn’tneed to schedule latency

  • 8/20/2019 Qualcomm Hexagon Architecture

    7/23

    7Qualcomm Technologies, Inc. All Rights Reserved

    Instruction Unit

    VLIW: Area & power efficient multi-issue

    Data Unit(Load/Store/ ALU)

    Data Unit(Load/Store/ ALU)

    ExecutionUnit

    (64-bitVector)

    ExecutionUnit

    (64-bitVector)

    Data Cache

    L2Cache

    / TCM

    InstructionCache

    • Dual 64-bitload/storeunits

    •  Also 32-bit ALU

    Variable sizedinstruction packets(1 to 4 instructions

    per Packet)

    • Dual 64-bit execution units• Standard 8/16/32/64bit data

    types• SIMD vectorized MPY / ALU

    / SHIFT, Permute, BitOps• Up to 8 16b MAC/cycle• 2 SP FMA/cycle

    Register FileRegister File

    Register File/Thread

    • Unified 32x32bit

    General RegisterFile is best forcompiler.

    • No separate Addressor Accum Regs

    • Per-Thread

    DeviceDDR

    Memory

  • 8/20/2019 Qualcomm Hexagon Architecture

    8/23

    8Qualcomm Technologies, Inc. All Rights Reserved

    Maximizing the signal processing code work/packetExample from inner loop of FFT: Executing 29 “simple RISC ops” in 1 cycle

    { R17:16 = MEMD(R0++M1)MEMD(R6++M1) = R25:24R20 = CMPY(R20, R8):

  • 8/20/2019 Qualcomm Hexagon Architecture

    9/23

    9Qualcomm Technologies, Inc. All Rights Reserved

    Maximizing the control code work/packetHexagon DSP ISA improves control code efficiencyover traditional VLIW

    Example C code

    void example(int *ptr, int val) {

    if (ptr!=0) {

    *ptr = *ptr + val + 2;

    }}

    p0 = cmp.eq(r0,#0)

    {

    if (!p0) r2=memw(r0)

    if (p0) jumpr:nt r31

    }

    r2 = add(r2,#2)

    r1 = add(r1,r2)

    {

    memw(r0) = r1

     jumpr r31

    }

    Instr /Packet =7 instr/5 packets = 1.4

    Tradional VLIW

     Assembly Code

    ①②

    ③④

    {

    p0 = cmp.eq (r0,#0)

    if (!p0.new) r2=memw(r0)

    if (p0.new) jumpr:nt r31

    }

    r2 = add(r2,#2)

    r1 = add(r1,r2)

    {

    memw(r0) = r1

     jumpr r31

    }

    Hexagon DSP:

    Dot-New Predication

    ②③

    {

    p0 = cmp.eq(r0,#0)

    if (!p0.new) r2=memw(r0)

    if (p0.new) jumpr:nt r31

    }

    r1 = add(r1,add(r2,#2))

    {

    memw(r0) = r1

     jumpr r31

    }

    Hexagon DSP:

    Compound ALU

    {

    p0 = cmp.eq(r0,#0)

    if (!p0.new) r2=memw(r0)

    if (p0.new) jumpr:nt r31

    }

    {

    r1 = add(r1,add(r2,#2))

    memw(r0) = r1.new

     jumpr r31

    }

    Hexagon DSP:

    New-Value Store

    Instr/Packet =

    7 instr/2packets = 3.5

  • 8/20/2019 Qualcomm Hexagon Architecture

    10/23

    10Qualcomm Technologies, Inc. All Rights Reserved

    High avg. instructions/packet for targeted use casesCompound instructions count as 2

     AudioComputer

    Vision Video Imaging Control

     

    5

    1

    1 5

    2

    2 5

    3

    3 5

    4

    4 5

    5

    A

     

    Source: Qualcomm internal measurements

  • 8/20/2019 Qualcomm Hexagon Architecture

    11/23

    11Qualcomm Technologies, Inc. All Rights Reserved

    • Hexagon V5 includes three hardware threads

    •  Architected to look like a multi-core with communicationthrough shared memory

    Programmer’s view of Hexagon DSP HWmulti-threading

    Thread 0

    DU

    XU

    Shared Data Cache

    L2

    Cache /

    TCM

    Register File

    DU

    XU

    Thread 1

    DU

    XU

    Register File

    DU

    XU

    Thread 2

    DU

    XU

    Register File

    DU

    XU

    Shared Instruction Cache

  • 8/20/2019 Qualcomm Hexagon Architecture

    12/23

    12Qualcomm Technologies, Inc. All Rights Reserved

    • Number of threads match execution pipe depth(three threads three execute stages)

    •  All instructions complete before next packet dispatch• Compiler schedules for zero-latency which helps to increase

    instructions/VLIW packet

    Hexagon DSP V1-V4: Interleaved multi-threadingSimple round-robin thread scheduling

    Thread 0 Dispatch

    T0: { Ld Ld Add Cmp }

    Thread 1 Dispatch

    T1: { St Ld Mpy Add }

    T0: { Ld Ld Add Cmp }

    Thread 2 Dispatch

    T2: { Ld Add Jump }

    T1: { St Ld Mpy Add }

    T0: { Ld Ld Add Cmp }

  • 8/20/2019 Qualcomm Hexagon Architecture

    13/23

    13Qualcomm Technologies, Inc. All Rights Reserved

    • Remove a thread from IMT rotation

    − On L2 cache misses

    −When in wait-for-interrupt or offmode

    •  Additional forwarding to support2-cycle packets

    • VLIW packets with dependenciesbetween long latency instructionswill stall

    − But many VLIW packets with

    simple instructions cancomplete in 2 processor clocks

    Hexagon DSP V5: Dynamic HW multi-threadingRecover some performance when threads idle or stalled

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    DhrystoneDMIPS/MHz

    IMT DMT

    0

    1

    2

    3

    4

    5

    6

    7

    8

    Coremarks/MHz

    IMT DMT

    Source: Qualcomm internal measurements

  • 8/20/2019 Qualcomm Hexagon Architecture

    14/23

    14Qualcomm Technologies, Inc. All Rights Reserved

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

       A  v  e  r  a  g  e   I  n  s

       t  r  u  c   t   i  o  n  s   /   C  y  c   l  e

    IPC_DMT

    IPC_IMT

    Hexagon DSP instructions per cycle

    Single-Threaded Apps

    Multi-Threaded Apps

    Source: Qualcomm internal measurements

  • 8/20/2019 Qualcomm Hexagon Architecture

    15/23

    15 Qualcomm Technologies, Inc. All Rights Reserved

    Qualcomm Hexagon DSP architectureHighly efficient mobile application processor—designed for moreperformance per MHz

    Clock Rate (MHz) 430-520 100-233 300-700

    DSP Performance (BDTImark2000) 4730-5720 1810-4220 5440-12660*

    * - Projected best case score for 3-threads

    Source: BDTI - For more detailed information see www.BDTI.com. All scores ©2013 BDTI

    0

    2

    4

    6

    8

    10

    12

    14

    1618

    20

    Mobile Competitor Qualcomm HXGN V4 (1 thread) Qualcomm HXGN V4 (3 threads)

       D   S   P   P  e  r   f  o  r  m

      a  n  c  e  p  e  r   M   H  z

       B   D   T   I  m  a  r   k

       2   0   0   0   ™   /   M   H  z

  • 8/20/2019 Qualcomm Hexagon Architecture

    16/23

    16Qualcomm Technologies, Inc. All Rights Reserved

    Hexagon DSP Power Benefits

  • 8/20/2019 Qualcomm Hexagon Architecture

    17/23

    17Qualcomm Technologies, Inc. All Rights Reserved

    MP3 playback power for competitive smartphones

    Compet itor A Qualcomm /Hexagon-based

    Competi tor B Competi tor C Competit or D Competi tor E Competi to r F Competi to r G

    • Power measured at the battery for various phones

    • Includes everything: DSP, CPU, memory, analog components, etc

    Power 

       L  o  w  e  r   i  s

       b  e   t   t  e  r

    Source: Qualcomm internal measurements

  • 8/20/2019 Qualcomm Hexagon Architecture

    18/23

    18Qualcomm Technologies, Inc. All Rights Reserved

    32% Less Power*7% Less Time52% Less CPU

     Augmented Reality Java App finding objects inimage using FastCV Feature Detect

    Comparison of Feature Detect run on:•  App CPU (ARM/Neon)•  App DSP (Hexagon)

    Computer vision offload – ARM/neon to Hexagon DSP

    Detection Time (%) Total Device Power (%)CPU Utilization (%)

    Source: Qualcomm internal measurements. * Power measured at the device battery

  • 8/20/2019 Qualcomm Hexagon Architecture

    19/23

    19Qualcomm Technologies, Inc. All Rights Reserved

    • Excellent near-linear power scalability(as threads go idle, power used by the thread is nearly eliminated)

    •  Achieved through optimized clock tree design & clock gating

    Hexagon DSP power for different thread utilizations

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    Dhrystone Power,IMT Mode

     Actual

    Ideal

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    FIR Power,IMT Mode

     Actual

    Ideal

    Source: Qualcomm internal measurements

  • 8/20/2019 Qualcomm Hexagon Architecture

    20/23

    20Qualcomm Technologies, Inc. All Rights Reserved

    Hexagon DSP Software Development

  • 8/20/2019 Qualcomm Hexagon Architecture

    21/23

    21Qualcomm Technologies, Inc. All Rights Reserved

    Independent Algorithm Developers on Hexagon DSP

  • 8/20/2019 Qualcomm Hexagon Architecture

    22/23

    22Qualcomm Technologies, Inc. All Rights Reserved

     Announcing the Hexagon DSP SDKSee the Hexagon DSP SDK in action at Uplinq2013 (www.uplinq.com)

    Visit http://developer.qualcomm.com for more information.

  • 8/20/2019 Qualcomm Hexagon Architecture

    23/23

    23Qualcomm Technologies, Inc. All Rights Reserved

    For more information on Qualcomm, visit us at:www.qualcomm.com & www.qualcomm.com/blog

    ©2013 Qualcomm Technologies, Inc.Qualcomm and Hexagon are trademarks of QUALCOMM Incorporated, registered in the United Statesand other countries. All QUALCOMM Incorporated trademarks are used with permission. Otherproduct and brand names may be trademarks or registered trademarks of their respective owners.Hexagon is a product of Qualcomm Technologies, Inc.

    Thank youFollow us on: