Qualcomm Hexagon Architecture
-
Upload
rahul-sharma -
Category
Documents
-
view
220 -
download
1
Transcript of Qualcomm Hexagon Architecture
-
8/20/2019 Qualcomm Hexagon Architecture
1/23
1Qualcomm Technologies, Inc. All Rights Reserved
Qualcomm Technologies, Inc. All Rights Reserved
Qualcomm Hexagon
DSP: An architectureoptimized for mobilemultimedia andcommunications
Lucian Codrescu
Sr. Director, TechnologyQualcomm Technologies, Inc.
-
8/20/2019 Qualcomm Hexagon Architecture
2/23
2Qualcomm Technologies, Inc. All Rights Reserved
Camera
Display
JPEG
Video
Other
• aDSP: Real-timemedia & sensor
processing
Hexagon™
DSP processors in Snapdragon products
Multimedia Fabric System Fabric
KraitCPU
AdrenoGPU
Krait
CPU
KraitCPU
Krait
CPU
2MB L2
Misc.Connectivity
Modem
Snapdragon800
Fabric & Memory Controller
LPDDR3 LPDDR3
HexagonaDSP
HexagonmDSP
• mDSP: Dedicatedmodem processing
Audio
Sensors
-
8/20/2019 Qualcomm Hexagon Architecture
3/23
3Qualcomm Technologies, Inc. All Rights Reserved
Expansion of Hexagon DSP use cases beyond audio
Image EnhancementCamera, Still, VideoHexagonV4 based products
VideoHexagonV5 based products
SensorsHexagonV5 based products
Computer Vision & Augmented RealityHexagonV4 based products
HexagonV2/V3
Voice Audio
Hexagon DSP is evolving for use beyond voice and audio to
computer vision, video and imaging features
-
8/20/2019 Qualcomm Hexagon Architecture
4/23
4Qualcomm Technologies, Inc. All Rights Reserved
The Hexagon DSP evolutionGenerational improvements in performance and power efficiency driven byboth architecture and implementation
V4L28nm
Apr 2011
V3M45nm
June 2009
V265nm
Dec 2007
V3C45nm Aug
2009
V3L45nm Nov
2009
V4M
28nmDec 2010
V4C28nm
Dec 2010
V5A28nm
Dec 2012
V165nm
Oct 2006
Time
V5H28nm
Dec 2012
-
8/20/2019 Qualcomm Hexagon Architecture
5/23
5 Qualcomm Technologies, Inc. All Rights Reserved
Requirements
• Require fixed real-timeperformance level(fps, Mbit/sec, etc.)
• Extremely aggressivepower & area targets
Key characteristics ofmodem & multimedia applications
Characteristics
• Mix of signal processing& control code
− For modem, Qualcomm does not
use a split CPU/DSP architecture. All processing is done on HexagonDSP
− Multimedia apps have significantcontrol in the RTOS & frameworks
• Heavy L2$ misses
− Multimedia is data intensive
− Modem is code intensive
-
8/20/2019 Qualcomm Hexagon Architecture
6/23
6Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP blends features targeted to modem &multimedia
Hexagon
DSP
VLIW• Need multi-issue to
meet performance• Low complexity for
Area & Power
Innovate in ISA to
maximize IPC• More work/VLIW packet
reduces energy/instruction
• Keep the pipelines full forMIPS/mm2
• Target both SignalProcessing & Control code
Multi-Threading• To reduce L2$ miss
penalty without the needfor a large L2
• Increasesinstructions/VLIW packetbecause compiler doesn’tneed to schedule latency
-
8/20/2019 Qualcomm Hexagon Architecture
7/23
7Qualcomm Technologies, Inc. All Rights Reserved
Instruction Unit
VLIW: Area & power efficient multi-issue
Data Unit(Load/Store/ ALU)
Data Unit(Load/Store/ ALU)
ExecutionUnit
(64-bitVector)
ExecutionUnit
(64-bitVector)
Data Cache
L2Cache
/ TCM
InstructionCache
• Dual 64-bitload/storeunits
• Also 32-bit ALU
Variable sizedinstruction packets(1 to 4 instructions
per Packet)
• Dual 64-bit execution units• Standard 8/16/32/64bit data
types• SIMD vectorized MPY / ALU
/ SHIFT, Permute, BitOps• Up to 8 16b MAC/cycle• 2 SP FMA/cycle
Register FileRegister File
Register File/Thread
• Unified 32x32bit
General RegisterFile is best forcompiler.
• No separate Addressor Accum Regs
• Per-Thread
DeviceDDR
Memory
-
8/20/2019 Qualcomm Hexagon Architecture
8/23
8Qualcomm Technologies, Inc. All Rights Reserved
Maximizing the signal processing code work/packetExample from inner loop of FFT: Executing 29 “simple RISC ops” in 1 cycle
{ R17:16 = MEMD(R0++M1)MEMD(R6++M1) = R25:24R20 = CMPY(R20, R8):
-
8/20/2019 Qualcomm Hexagon Architecture
9/23
9Qualcomm Technologies, Inc. All Rights Reserved
Maximizing the control code work/packetHexagon DSP ISA improves control code efficiencyover traditional VLIW
Example C code
void example(int *ptr, int val) {
if (ptr!=0) {
*ptr = *ptr + val + 2;
}}
p0 = cmp.eq(r0,#0)
{
if (!p0) r2=memw(r0)
if (p0) jumpr:nt r31
}
r2 = add(r2,#2)
r1 = add(r1,r2)
{
memw(r0) = r1
jumpr r31
}
Instr /Packet =7 instr/5 packets = 1.4
Tradional VLIW
Assembly Code
①②
③④
⑤
{
p0 = cmp.eq (r0,#0)
if (!p0.new) r2=memw(r0)
if (p0.new) jumpr:nt r31
}
r2 = add(r2,#2)
r1 = add(r1,r2)
{
memw(r0) = r1
jumpr r31
}
Hexagon DSP:
Dot-New Predication
①
②③
④
{
p0 = cmp.eq(r0,#0)
if (!p0.new) r2=memw(r0)
if (p0.new) jumpr:nt r31
}
r1 = add(r1,add(r2,#2))
{
memw(r0) = r1
jumpr r31
}
Hexagon DSP:
Compound ALU
①
②
③
{
p0 = cmp.eq(r0,#0)
if (!p0.new) r2=memw(r0)
if (p0.new) jumpr:nt r31
}
{
r1 = add(r1,add(r2,#2))
memw(r0) = r1.new
jumpr r31
}
Hexagon DSP:
New-Value Store
①
②
Instr/Packet =
7 instr/2packets = 3.5
-
8/20/2019 Qualcomm Hexagon Architecture
10/23
10Qualcomm Technologies, Inc. All Rights Reserved
High avg. instructions/packet for targeted use casesCompound instructions count as 2
AudioComputer
Vision Video Imaging Control
5
1
1 5
2
2 5
3
3 5
4
4 5
5
A
Source: Qualcomm internal measurements
-
8/20/2019 Qualcomm Hexagon Architecture
11/23
11Qualcomm Technologies, Inc. All Rights Reserved
• Hexagon V5 includes three hardware threads
• Architected to look like a multi-core with communicationthrough shared memory
Programmer’s view of Hexagon DSP HWmulti-threading
Thread 0
DU
XU
Shared Data Cache
L2
Cache /
TCM
Register File
DU
XU
Thread 1
DU
XU
Register File
DU
XU
Thread 2
DU
XU
Register File
DU
XU
Shared Instruction Cache
-
8/20/2019 Qualcomm Hexagon Architecture
12/23
12Qualcomm Technologies, Inc. All Rights Reserved
• Number of threads match execution pipe depth(three threads three execute stages)
• All instructions complete before next packet dispatch• Compiler schedules for zero-latency which helps to increase
instructions/VLIW packet
Hexagon DSP V1-V4: Interleaved multi-threadingSimple round-robin thread scheduling
Thread 0 Dispatch
T0: { Ld Ld Add Cmp }
Thread 1 Dispatch
T1: { St Ld Mpy Add }
T0: { Ld Ld Add Cmp }
Thread 2 Dispatch
T2: { Ld Add Jump }
T1: { St Ld Mpy Add }
T0: { Ld Ld Add Cmp }
-
8/20/2019 Qualcomm Hexagon Architecture
13/23
13Qualcomm Technologies, Inc. All Rights Reserved
• Remove a thread from IMT rotation
− On L2 cache misses
−When in wait-for-interrupt or offmode
• Additional forwarding to support2-cycle packets
• VLIW packets with dependenciesbetween long latency instructionswill stall
− But many VLIW packets with
simple instructions cancomplete in 2 processor clocks
Hexagon DSP V5: Dynamic HW multi-threadingRecover some performance when threads idle or stalled
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
DhrystoneDMIPS/MHz
IMT DMT
0
1
2
3
4
5
6
7
8
Coremarks/MHz
IMT DMT
Source: Qualcomm internal measurements
-
8/20/2019 Qualcomm Hexagon Architecture
14/23
14Qualcomm Technologies, Inc. All Rights Reserved
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
A v e r a g e I n s
t r u c t i o n s / C y c l e
IPC_DMT
IPC_IMT
Hexagon DSP instructions per cycle
Single-Threaded Apps
Multi-Threaded Apps
Source: Qualcomm internal measurements
-
8/20/2019 Qualcomm Hexagon Architecture
15/23
15 Qualcomm Technologies, Inc. All Rights Reserved
Qualcomm Hexagon DSP architectureHighly efficient mobile application processor—designed for moreperformance per MHz
Clock Rate (MHz) 430-520 100-233 300-700
DSP Performance (BDTImark2000) 4730-5720 1810-4220 5440-12660*
* - Projected best case score for 3-threads
Source: BDTI - For more detailed information see www.BDTI.com. All scores ©2013 BDTI
0
2
4
6
8
10
12
14
1618
20
Mobile Competitor Qualcomm HXGN V4 (1 thread) Qualcomm HXGN V4 (3 threads)
D S P P e r f o r m
a n c e p e r M H z
B D T I m a r k
2 0 0 0 ™ / M H z
-
8/20/2019 Qualcomm Hexagon Architecture
16/23
16Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP Power Benefits
-
8/20/2019 Qualcomm Hexagon Architecture
17/23
17Qualcomm Technologies, Inc. All Rights Reserved
MP3 playback power for competitive smartphones
Compet itor A Qualcomm /Hexagon-based
Competi tor B Competi tor C Competit or D Competi tor E Competi to r F Competi to r G
• Power measured at the battery for various phones
• Includes everything: DSP, CPU, memory, analog components, etc
Power
L o w e r i s
b e t t e r
Source: Qualcomm internal measurements
-
8/20/2019 Qualcomm Hexagon Architecture
18/23
18Qualcomm Technologies, Inc. All Rights Reserved
32% Less Power*7% Less Time52% Less CPU
Augmented Reality Java App finding objects inimage using FastCV Feature Detect
Comparison of Feature Detect run on:• App CPU (ARM/Neon)• App DSP (Hexagon)
Computer vision offload – ARM/neon to Hexagon DSP
Detection Time (%) Total Device Power (%)CPU Utilization (%)
Source: Qualcomm internal measurements. * Power measured at the device battery
-
8/20/2019 Qualcomm Hexagon Architecture
19/23
19Qualcomm Technologies, Inc. All Rights Reserved
• Excellent near-linear power scalability(as threads go idle, power used by the thread is nearly eliminated)
• Achieved through optimized clock tree design & clock gating
Hexagon DSP power for different thread utilizations
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Dhrystone Power,IMT Mode
Actual
Ideal
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
FIR Power,IMT Mode
Actual
Ideal
Source: Qualcomm internal measurements
-
8/20/2019 Qualcomm Hexagon Architecture
20/23
20Qualcomm Technologies, Inc. All Rights Reserved
Hexagon DSP Software Development
-
8/20/2019 Qualcomm Hexagon Architecture
21/23
21Qualcomm Technologies, Inc. All Rights Reserved
Independent Algorithm Developers on Hexagon DSP
-
8/20/2019 Qualcomm Hexagon Architecture
22/23
22Qualcomm Technologies, Inc. All Rights Reserved
Announcing the Hexagon DSP SDKSee the Hexagon DSP SDK in action at Uplinq2013 (www.uplinq.com)
Visit http://developer.qualcomm.com for more information.
-
8/20/2019 Qualcomm Hexagon Architecture
23/23
23Qualcomm Technologies, Inc. All Rights Reserved
For more information on Qualcomm, visit us at:www.qualcomm.com & www.qualcomm.com/blog
©2013 Qualcomm Technologies, Inc.Qualcomm and Hexagon are trademarks of QUALCOMM Incorporated, registered in the United Statesand other countries. All QUALCOMM Incorporated trademarks are used with permission. Otherproduct and brand names may be trademarks or registered trademarks of their respective owners.Hexagon is a product of Qualcomm Technologies, Inc.
Thank youFollow us on: