1
ARM® big.LITTLE™ Technology Unleashed An Improved User Experience Delivered
Govind Wathan Product Specialist
Cortex®-A Mobile & Consumer CPU Products
2
Introduction to big.LITTLE Technology
Benefits of big.LITTLE Technology
Future big.LITTLE systems
Summary
Questions
Agenda
3
Mobile users spend a large share of their time on a
range of mobile applications*:
38% on web browsing and Facebook
32% on gaming
16% on audio, video and utility
Common “building blocks” in workloads:
Short bursts of high intensity
Long periods of sustained high intensity
Low intensity
Mobile Application Workloads
Measured on a Quad Cortex-A7 Symmetric Multiprocessing platform
* Source: Flurry Analytics
[Charts: power over time for Web Browsing, Gaming, and Audio Playback]
4
Mobile Application Workloads
Category 1: Burst of High Intensity Workloads (Example: Web Browsing)
Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)
Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)
[Chart: power for each category relative to the Sustained Performance Envelope]
Applications require a mix of performance levels
Mobile users want a better user experience but not at the cost of reduced battery life
5
Mobile Application Workload Profiles
Category 1: Burst High Intensity Workloads (Example: Web Browsing)
Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)
Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)
[Chart: percentage of time spent in DVFS states (High, Mid, Low, Idle, WFI / PowerDown) for each category]
Measured on a Quad Cortex-A7 Symmetric Multiprocessing platform
Applications require a mix of performance levels
Mobile users want a better user experience but not at the cost of reduced battery life
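The workload profiles above are expressed as the percentage of time spent in each DVFS state. As a rough sketch of how such a residency profile could be computed, assuming a hypothetical trace of (state, duration) samples (the function name and the sample values are invented for illustration and are not measured data):

```python
from collections import defaultdict

def dvfs_residency(samples):
    """Return the percentage of total time spent in each DVFS state.

    `samples` is a hypothetical trace: an iterable of (state, seconds) pairs.
    """
    totals = defaultdict(float)
    for state, seconds in samples:
        totals[state] += seconds
    total_time = sum(totals.values())
    if total_time == 0:
        return {}
    return {state: 100.0 * t / total_time for state, t in totals.items()}

# Illustrative trace only; not measured data.
trace = [("High", 1.0), ("Idle", 6.0), ("Low", 2.0), ("High", 1.0)]
profile = dvfs_residency(trace)
```

A bursty workload would show a small share in "High" and a large share in "Idle", while a sustained workload would spend most of its time in the mid and high states.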
6
Heterogeneous Computing
2x higher performance vs. LITTLE only
Up to 75% CPU power savings vs. big only
Architecturally Identical Processors
High performance tuned big cores
Low power tuned LITTLE cores
Hardware Coherency
Cache Coherent Interconnect (CCI)
L1 and L2 snooping between clusters
Seamless & Automatic Task Allocation
big.LITTLE Technology
“Right Task on the Right Core”
L2 Cache L2 Cache
Cache Coherent Interconnect
Interrupt Control
Up to 40% SOC power savings*
* Measured across a set of casual games and common use-cases on an ARM
Partner 4x Cortex-A15 + 4x Cortex-A7 big.LITTLE device
big Cluster
LITTLE Cluster
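"Right Task on the Right Core" can be pictured as a placement decision driven by how busy each task has recently been. A minimal sketch, assuming a tracked load per task on a 0 to 1 scale and two hypothetical thresholds (`UP_THRESHOLD`, `DOWN_THRESHOLD`); the names and values are illustrative and are not taken from the slides or from any ARM software:

```python
UP_THRESHOLD = 0.7    # hypothetical: load above this moves a task to a big core
DOWN_THRESHOLD = 0.3  # hypothetical: load below this moves a task to a LITTLE core

def place_task(current_cluster, load):
    """Pick a cluster ("big" or "LITTLE") for a task given its tracked load.

    The gap between the two thresholds gives hysteresis, so a task whose load
    sits between them stays where it is instead of bouncing between clusters.
    """
    if current_cluster == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current_cluster == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current_cluster
```

Because the clusters are cache coherent, a move like this does not require software to copy task state between them.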
7
Introduction to big.LITTLE Technology
Benefits of big.LITTLE Technology
Future big.LITTLE systems
Summary
Questions
Agenda
8
big.LITTLE MP Software Evolution
Improving Performance and Efficiency
2012: big.LITTLE Cluster Migration
H1 2013: CPU Migration
H2 2013: Global Task Scheduling (big.LITTLE MP)
[Diagram: cores in use under each software model]
Measured Power and Performance on big.LITTLE Devices
(big.LITTLE MP relative to Cluster Migration)
Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%
Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%
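The three software models differ mainly in which cores can run tasks at the same moment. A simplified sketch, assuming a 4+4 system and modelling only core availability (the function, model, and core names are hypothetical, and real switching decisions involve more than a per-CPU flag):

```python
BIG = ["big0", "big1", "big2", "big3"]
LITTLE = ["little0", "little1", "little2", "little3"]

def usable_cores(model, high_load_flags):
    """Return the cores that may run tasks under each software model.

    `high_load_flags[i]` says whether logical CPU i currently needs high
    performance. All names here are illustrative, not an ARM API.
    """
    if model == "cluster_migration":
        # One cluster at a time: any high load switches everything to big.
        return list(BIG) if any(high_load_flags) else list(LITTLE)
    if model == "cpu_migration":
        # Each big core is paired with a LITTLE core; each pair picks one.
        return [b if hot else l
                for b, l, hot in zip(BIG, LITTLE, high_load_flags)]
    if model == "global_task_scheduling":
        # All cores are visible to the scheduler simultaneously.
        return BIG + LITTLE
    raise ValueError(f"unknown model: {model}")
```

With one busy logical CPU, cluster migration powers the whole big cluster, CPU migration uses one big core and three LITTLE cores, and Global Task Scheduling can use any of the eight.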
9
big.LITTLE MP
Delivers higher power efficiency
Extends battery life
Improves user experience
Measured Power and Performance on big.LITTLE Devices
(big.LITTLE MP relative to Cluster Migration)
Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%
Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%
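One way to read the power and performance deltas together is as a performance-per-watt ratio. A small worked sketch, assuming the web browsing figures pair as -29% power with +20% performance, the gaming figures as -38% power with +60% performance, and that each pair comes from the same run (these are assumptions about the chart, not statements from the slide):

```python
def relative_efficiency(perf_change_pct, power_change_pct):
    """Performance-per-watt relative to a baseline of 1.0.

    Inputs are percentage changes versus the baseline, e.g. +20 and -29.
    """
    perf = 1.0 + perf_change_pct / 100.0
    power = 1.0 + power_change_pct / 100.0
    return perf / power

web = relative_efficiency(20, -29)     # 1.20 / 0.71
gaming = relative_efficiency(60, -38)  # 1.60 / 0.62
```

Under those assumptions the ratios come out at roughly 1.7x for web browsing and 2.6x for intensive gaming; if power and performance were measured in separate runs, the two deltas should not be combined this way.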
10
big.LITTLE MP Improves User Experience (UX)
[Chart: Normalized Jank* (Less is Better), LITTLE only vs. big.LITTLE, for Asphalt 7, DungeonDefenders, and Video Playback; UX improvement labels: 58%, 65%, 47%]
* Measure of variance in frame rate
Measurements conducted on the same big.LITTLE platform
[Chart: DVFS states: Web Browsing with Audio, for CPU0 to CPU5, showing idle, low, mid, and high frequency residency for LITTLE and big cores]
Short bursts of performance on big cores enable
sustained levels of smooth user-experience
LITTLE cores handle background tasks and audio
LITTLE Cluster big Cluster
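The slide defines jank as a measure of variance in frame rate. A minimal sketch of one way to compute a normalized value, assuming per-second frame-rate samples and normalization against a baseline run; the exact metric used for the measurements is not given, and the sample values are invented:

```python
from statistics import pvariance

def normalized_jank(fps_samples, baseline_fps_samples):
    """Frame-rate variance of a run divided by that of a baseline run."""
    baseline = pvariance(baseline_fps_samples)
    if baseline == 0:
        raise ValueError("baseline run has no frame-rate variance")
    return pvariance(fps_samples) / baseline

# Illustrative samples only; not measured data.
little_only = [30, 20, 30, 20]
big_little = [30, 28, 30, 28]
jank = normalized_jank(big_little, little_only)
```

A value below 1.0 means the frame rate varied less than in the baseline run, which is the "less is better" direction on the chart.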
11
big.LITTLE MP Delivers Higher Power Efficiency
4x4 big.LITTLE MP vs. 4x4 Cluster Migration
35% average improvement in power efficiency across Single-Thread and Multi-Thread workloads
[Chart: power efficiency, big.LITTLE MP vs. Cluster Migration, scale 0.00 to 2.00]
[Chart: frequency residency profile while running Antutu CPU, for Cortex-A15 MP4 and Cortex-A7 MP4 under big.LITTLE MP and Cluster Migration; frequency labels: 1.2GHz, 1.3 GHz, 1.7GHz, 1.1GHz, 1.4GHz, 1.2GHz]
SoC thermal budget constrains Cortex-A15 cores to lower frequency, resulting in lower benchmark performance
Cortex-A15 and Cortex-A7 clusters at peak performance within the thermal budget
A7 cores not running due to cluster migration
12
big.LITTLE MP Extends Battery Life
[Chart: relative battery life on big.LITTLE MP, Cluster Migration vs. big.LITTLE MP]
[Chart: DVFS states: Temple Run, for A7 CPU0 to CPU3 (LITTLE cluster) and A15 CPU4 and CPU5 (big cluster), showing idle, low, medium, and high frequency residency]
Cores in the big cluster are powered down
Single-thread performance on highly efficient LITTLE cores enables increased power savings
13
big.LITTLE MP Software
http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git
Linaro Landing Teams for Club and Core Members
Provides Software Support under NDA
Exclusive Landing Teams for each Member company
Services and Support Offered through ARM
Active Assist Design Review – big.LITTLE system
Technical Support & Application Notes
big.LITTLE MP Integration and Tuning Guides
On-site Software Training
big.LITTLE MP Support and Services Available
14
Agenda
Introduction to big.LITTLE Technology
Benefits of big.LITTLE Technology
Future big.LITTLE systems
Summary
Questions
15
Improved performance on big.LITTLE ARMv8
Cortex-A57: Highest performance big CPU in thermal envelope
Cortex-A53: Most energy efficient LITTLE CPU
ARMv8-A Enables 64-bit big.LITTLE
[Chart: Power (mW) vs. Performance (Spec2000)]
Higher performance at same power
Extended range of efficiency
Cortex-A15 (ARMv7-A big)
Cortex-A7 (ARMv7-A LITTLE)
Cortex-A57 (ARMv8-A big)
Cortex-A53 (ARMv8-A LITTLE)
SpecInt2000 Power vs. Performance*
*SpecInt2000 on iso-process & 32-bit
16
Extending big.LITTLE MP for Thermal Management
ARM Intelligent Power Allocation (IPA)
Elements:
Proactive temperature control
Real time CPU & GPU performance requests
Power estimation
Dynamic power allocation
Dynamic Allocation by:
• Performance required
• Thermal headroom
[Diagram: power transforms to heat in the SoC and device (Tdie, Tskin); IPA takes performance requests from big, LITTLE, and GPU and returns allocated performance to each]
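The elements listed above (temperature control, performance requests, power estimation, and dynamic allocation) can be sketched as a single control step. This is a deliberately simplified proportional controller under assumed names and constants (`sustainable_power_mw`, `gain_mw_per_c`, the actor names); it is not ARM's IPA implementation:

```python
def allocate_power(temp_c, target_temp_c, requests_mw,
                   sustainable_power_mw=2500.0, gain_mw_per_c=100.0):
    """Split a thermally derived power budget among actors.

    requests_mw maps each actor (e.g. "big", "LITTLE", "GPU") to the power
    it would need to meet its current performance request.
    All constants are hypothetical.
    """
    # Thermal headroom: positive when below the target temperature.
    headroom_c = target_temp_c - temp_c
    budget_mw = max(0.0, sustainable_power_mw + gain_mw_per_c * headroom_c)

    total_request = sum(requests_mw.values())
    if total_request <= budget_mw:
        # Enough headroom: every actor gets what it asked for.
        return dict(requests_mw)
    # Otherwise scale each actor's grant in proportion to its request.
    scale = budget_mw / total_request
    return {actor: req * scale for actor, req in requests_mw.items()}
```

In this sketch an actor with a larger request receives a larger share of a constrained budget, so a GPU-heavy load shifts power toward the GPU and a CPU-heavy load shifts it toward the CPU clusters.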
17
Intelligent Power Allocation in Action
Device temperature is below threshold
There are no constraints on power / performance
Every actor runs at max required frequency
[Chart, median filtered for clarity: running frequency over time across three consecutive runs of GLB TRex, showing max and running frequency for “big”, “LITTLE”, and GPU]
18
Intelligent Power Allocation in Action
High load on GPU & low load on CPU
GPU gets allocated most of the power
[Chart, median filtered for clarity: running frequency over time across three consecutive runs of GLB TRex, showing max and running frequency for “big”, “LITTLE”, and GPU]
19
[Chart, median filtered for clarity: running frequency over time across three consecutive runs of GLB TRex, showing max and running frequency for “big”, “LITTLE”, and GPU]
Intelligent Power Allocation in Action
High load on CPU & low load on GPU
CPU gets allocated most of the power
20
[Chart, median filtered for clarity: running frequency over time across three consecutive runs of GLB TRex, showing max and running frequency for “big”, “LITTLE”, and GPU]
Intelligent Power Allocation in Action
Device temperature gets hotter
IPA reduces available power to actors
This maintains temperature control
21
Intelligent Power Allocation in Action
IPA vs. Traditional (Relative Performance)
1st Run: 13% Improvement
2nd Run: 34% Improvement
3rd Run: 36% Improvement
Average: 28% Improvement
[Chart, median filtered for clarity: running frequency over time across three consecutive runs of GLB TRex, showing max and running frequency for “big”, “LITTLE”, and GPU]
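The reported average is consistent with the three per-run improvements. A quick check, assuming the average is a plain arithmetic mean rounded to the nearest percent:

```python
# Per-run improvement of IPA over traditional thermal management, in percent.
run_improvements = [13, 34, 36]

average = sum(run_improvements) / len(run_improvements)
rounded = round(average)  # compare with the 28% average shown on the slide
```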
22
big.LITTLE Mobile 2015
GIC-400
I/O Coherent
Masters
Cortex-A57 Cortex-A53
DMC-400 Peripherals
MMU-400
MMU-400
DRAM (2 * x32 DDR3-1600)
Display Display
NIC-400
NIC-400
Mali T720
GPU
CoreLink CCI-400
TZC-400
MMU-400
23
ARM big.LITTLE Mobile Roadmap
ARM IP
Present: CCI-400; Future: Next-Gen Cache Coherent Interconnects
Present: Cortex-A57, Cortex-A17, Cortex-A15; Future: Next-Gen High Performance “big” CPUs
Present: Cortex-A53, Cortex-A7; Future: Next-Gen Power Efficient “LITTLE” CPUs
ARM Software
Present: Global Task Scheduling; Future: + Intelligent Power Allocation, + 64-bit Android L Support
24
Agenda
Introduction to big.LITTLE Technology
Benefits of big.LITTLE Technology
Future big.LITTLE systems
Summary
Questions
25
big.LITTLE is fast becoming the de facto power optimization technology in mobile
big.LITTLE processing technology delivers best-in-class performance and energy efficiency in devices today
Improved user experience and prolonged battery life measured on real smartphone devices
Devices transitioning to advanced big.LITTLE Technology with additional features and IP support
Summary