Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends...

17
Feb 23, 2011 Gans Srinivasa Intel Corp Thanks to: Ravi Iyer, Scott Hahn, Bhushan Chitlur Doug Carmean, Pranav Mehta, Alon Naveh, Henry Gabb Power-Performance Uplift Using Hetero General compute & Domain Specific Areas 1

Transcript of Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends...

Page 1: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Feb 23, 2011

Gans Srinivasa

Intel Corp

Thanks to: Ravi Iyer, Scott Hahn, Bhushan Chitlur

Doug Carmean, Pranav Mehta, Alon Naveh, Henry Gabb

Power-Performance Uplift Using Hetero General compute & Domain Specific Areas

1

Page 2: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE,

EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

Intel, Intel Inside, Xeon, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2011 Intel Corporation.

2

Page 3: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Agenda

Power-Performance

Servers to Clients

Heterogeneous Architecture

The Future Challenges

3

Page 4: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Energy Efficient Computing: Overall processor PowerTransition to Multicore

Source: Fred Pollack, Keynote – MICRO’32

Parallelization

UC PAR Lab Presentation – Krste Asanovic – May24,2010

Mutlicore PartsIntel 6C

What is Next?Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

Power Constrained Era

4

Page 5: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Trends in “The Cloud”Rapid Cloud Growth: Reduces the hassle for users and providers.

User needs only a browser: No worry about software installation, patches etc

HW & Service Providers: Focus on Total Cost of Ownership

SW Providers: Apps in controlled env and no shrink warp worries

Trend: Cloud needs more Energy efficiency, more communication bandwidth, lower TCO.

Today we have homogenous Multicore. Scaling will be limited by power and area.

Power efficient small cores have emerged and show potential for power efficient performance.

What could we do spanning Servers to Clients/Mobile Devices?

SW Service

Search,Apps

Providers

WALL POWERED

30–35% DataCenteroperating

cost is Power USERS:

Less hassle on

SW and HW front

Future Workloads Demand more performance and are

widely different*

COMMUntrusted

(Internet)Trusted

(Enterprise)

10Gbe

onwards

10Gbe

onwards

Mobile Client*Dominant application platform

Thinner form-factors and longer battery life:“Whole day” Streaming/media/audio workloads

* http://parlab.eecs.berkeley.edu/publication/137; http://parlab.eecs.berkeley.edu/5

Page 6: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Current Data Center

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 20110

20

40

60

80

100

120

140

Bill

ion

s (

kW

h / y

ea

r)

Historical

Trends

Current

Efficiency

Trends

Projected Data Center Energy Use

forecast

2.9% of projected

total U.S. electricity use

1.5% of total US.

electricity usage

0.8% of total US

electricity usage

EPA study for Congress on Data Center Consumption

DiskNIC NICDisk

EP Xeon Server Node Today

QPI

PCIe DDR

Ethernet

R R

R R R…

S S

S S S S

LB LB

A A A…

A A A…

Data Center

Layer 3

Layer 2

A single Layer 2 domain

A = Rack of 40 Servers

Source – NY Times, 06.14.2006

Acer AR160 F1(Intel Xeon X5670, 2.93 GHz)

1U

Intel Xeon X5670

Six-Core, 2.93 GHz, 12 MB L3 cache

2933

12 cores, 2 chips, 6 cores/chip

24 (2 / core)

3-Nov-10

Line-powered

Many “modern day data centers” operate around 20% load.

Reducing idle to 50W @10c/kwh for 10000 nodes saves ¼ Millon $ /yr

on energy cost. Potential is very high.

How can we go beyond this and achieve higher power-performance

efficiency?

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands

may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

0

50

100

150

200

250

300

0% 20% 40% 60% 80% 100%

Average Active Power (W)

http://www.spec.org/power_ssj2008/results/res2010q4/power_ssj2008-20101012-00300.html

Page 7: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Heterogeneous Space

Workloads are not homogeneous

Energy efficiency increasingly important

Frequency to Multicore play is running out of steam

What is Next?Heterogeneous architectures (mix of

different cores and accelerators)

7Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

Page 8: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

~3400MB/Sec~3400MB/SecDRAM Bandwidth

512KB 8-way512KB 2-wayL2 Cache*

24KB 6-way32KB 8-wayL1 D-Cache*

1.6GHz2GHzFrequency

Atom** (Diamondville)

Core2 Quad** (Woodcrest)

Parameters

~3400MB/Sec~3400MB/SecDRAM Bandwidth

512KB 8-way512KB 2-wayL2 Cache*

24KB 6-way32KB 8-wayL1 D-Cache*

1.6GHz2GHzFrequency

Atom** (Diamondville)

Core2 Quad** (Woodcrest)

Parameters

Big & Small cores Research

Recent measurements (SPECcpu and Bio workloads)Comparison of Atom to Core 2 Duo

Observations: Varying perf difference (1.03x – 2.63x) across applications

Shows the use case for heterogeneity

Knowledge of performance/power difference can help in scheduling

Validation:

Small Core Perf =

50% Big Core

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

0

1

2

3

4

5

6

7

8

9

10

eo

n

pe

rlb

mk

hm

me

r

pe

rlb

en

ch

de

nb

en

ch

con

sum

er

cra

fty

tele

com

au

tom

otiv

e

ga

p

sje

ng

ph

ylip

2

vort

ex

gzi

p

go

bm

k

om

ne

tpp

mcf

bzi

p2

pa

rse

r

two

lf

fast

a1

bzi

p2

vpr

ast

ar

ne

two

rkin

g

gcc

mcf

tigr

CP

I

0

0.5

1

1.5

2

2.5

3

Pe

rfo

rma

nce

Ra

tio

CPI Core2

CPI Atom

Performance ratio

DRAM

MCH

Big

L1

L2

Big

L1

L2

Small

L1

L2

Small

L1

L2

8

** to mimic a configuration where both cores integrated into the same

Heterogeneous uncore architecture, we configured core freq, cache and memory

Subsystem to be as close as possible as shown in the table.

Page 9: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Hetero Core Solution Space Multiple compute elements that are tightly coupled

– Big core, Atom, Smaller Cores and Hardware accelerators

– Sharing cache coherency and common memory

– Can be managed by a single OS or User level or Other means

Power is key

– Objective is to minimize energy per task

– Ideally, each task is performed in most efficient compute element

– Using a smaller core reduces power by an order of magnitude. Performance/Power improves.

Big

Accel1

SmallTiny

Accel2

With all big cores, scalability is a problem because of power wall

With all small cores, it single thread performance that suffers

With heterogeneous (big + small), we can strike a power/performance balance and

continue to scale

Pe

rfo

rma

nce

Apps range

ISA

ISA extension A ISA extension B

Courtesy: Prof Uri Wiser, Technion.

TinyTinySmall

SmallSmall

BigBig

Small

TinyTiny

9

Accel3

Page 10: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Heterogeneous Architecture Design Space

Performance asymmetry

– Cores have different performance and power

– E.g., asymmetric cache sizes, clock speeds, uarch

– Apps can run anywhere, but get different performance

Functional asymmetry– Cores have different ISAs

– Difference can be in many dimensions

– Instructions, registers, data types, addressing modes, memory architecture, exception handling, I/O

– Can have various degrees of difference

Disjoint ISAsSame ISA Overlapping ISAs

Degree of functional asymmetry

10

Page 11: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Heterogeneous Architecture Space

Disjoint ISAsSame ISA Overlapping ISAs

Degree of asymmetry

Cores with same ISA, same uArch executing at different speeds

Cores with Subset ISA, diff uArch executing at different speeds

Cores with overlapping ISA, diff uArch (embedded HW accelerators, (Xeon+MIC )

Heterogeneous Cores (cell, IA/Gen)

diff ISA,diff uArch

Traditional SMP OSDriver Model

(e.g. OpenCL)

Hetero OS

Homogeneous cores same ISA, same uArch shared resources (SMP)

same ISA, same uArch platform resource differences

(NUCA/NUMA)

11

Page 12: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

HeteroOS Research work @ Intel

Cost effective solution for ST and MT

Various scheduling options examined and continue the work

Fault and migrate to support Instruction based asymmetry

Reference: Operating System Support fro Overlapping ISA Heterogeneous Multi-core Architectures – HPCA 2010

- Scott Hahn etal Intel corp.

0.8

0.9

1

1.1

1.2

1.3

1.4

1.5

1.6

1.7

4 big 4 small 2.33 GHz

4 big 4 small 2 GHz

4 big 4 small 1.66 GHz

4 big 4 small 1.33 GHz

4 big 4 small 1 GHz

Speedup o

ver

sto

ck L

inux

SPEC OMP2001

SPECjbb2005

Kernbench

x264

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

0.6

0.8

1

1.2

1.4

1.6

2 big 2.66 GHz 1 big 2.66 GHz 4 small 1 GHz

1 big 2.66 GHz 4 small 1.33 GHz

1 big 2.66 GHz 4 small 1.66 GHz

1 big 2.66 GHz 4 small 2 GHz

No

rma

lize

d p

erf

orm

an

ce

SPEC OMP2001

SPECjbb2005

Kernbench (parallel make)

x264 (parallel media conversion)

Performance comparisons for two big cores only vs. one

big core, and four small cores at different frequencies

Page 13: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Many Core and Multi-Core

In Intel® MIC, each core is smaller and lower power, has lower single thread

performance, but higher aggregate performance

Many core relies on a high degree of parallelism to compensate for the lower speed of

each individual core

Relatively few specialized applications today are highly parallel, but those applications

will benefit from Intel® MIC

Many Integrated Cores at 1-1.2 GHz Multi-core Intel Xeon at 2.26-3.5 GHz

Die Size not to scale

Intel Confidential

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

Page 14: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Generalized Forms

Big

Core

Small

Core

Small

Core

Fixed Function

1

Fixed Function

2

Fixed Function

3

Big

Core

Small

Core

Small

Core

Fixed Function

1

Fixed Function

2

Fixed Function

3

Big

Core

Small

Core

Small

Core

Fixed Function

1

Fixed Function

2

Fixed Function

3

coherence domains

coherence domains

coherence

domains

coherence

domains

coherence

domains

coherence

domains

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.14

Page 15: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

SW challenges

Should be tolerant of coherence domains

Scheduling policies

Hierarchical?

OS centered?

User level orchestrated?

Heterogeneous friendly

Big core / Little core

Fixed function

Migratory (similar)

Absence / Presence (dissimilar)

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.15

Let us turn this into our opportunity & Innovate

Page 16: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

HW Challenges

“On Chip Diversity”: Various IP cores, MEMs, Sensors– Interconnect fabrics for heterogeneous region to communicate

efficiently

– Along with existing cores – need continuity

– Cores communicating in a fault tolerant fashion

Network On Chip communication

Multiple Clock domains & Voltage islands

SOC and 3D chip/platform will be the norm

High Level of integrationA

cc

ele

rato

r2A

cc

ele

rato

r1

Heterogeneous

CMP +

Loosely/Tightly

coupled

Accelerators

Core3

Core1

Core4

Core2

L1 L1

L1 L1

L2 L2

L2 L2

NoC

Page 17: Power-Performance Uplift Using Heterogeneity in General ... · Trends Current Efficiency Trends Projected Data Center Energy Use forecast 2.9% of projected total U.S. electricity

Ac

ce

lera

tor2

Ac

ce

lera

tor1

Heterogeneous

CMP +

Loosely/Tightly

coupled

Accelerators

Core3

Core1

Core4

Core2

L1 L1

L1 L1

L2 L2

L2 L2

NoC

How Not To Do Hetero CMP

Big CPU

GPU, AccelSmall CPU

Future Workload

Let us get this mix rightMore Discussion and state of current work on Friday

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and

brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice.

17