TAP: Token-Based Adaptive Power Gating



Andrew B. Kahng, Seokhyeong Kang, Tajana S. Rosing, and Richard Strong

UC San Diego

• Motivation
  • More leakage at advanced technology nodes
  • More cores → longer memory latencies
  • Long memory accesses (> 45ns) waste core power!
• Goals
  • Power gate cores during memory accesses
  • Zero performance hit on the application
  • Adapt to application behavior and system utilization
  • Maintain core-voltage noise fluctuations below 5%
  • Keep core current below peak-current limit

• Token-Based Adaptive Power Gating
• Programmable Power Gating Switch (PPGS)
  • Two-stage wake-up sequence
  • First-stage header switches control peak current
  • Peak current controls the wake-up latency
  • More peak current → more voltage noise
• State Retention
  • Architectural registers saved in retention flip-flops
  • SRAM-cell leakage reduced via source biasing
  • Complex logic and non-essential flip-flops power gated
• Wake-up Sequence
• Token packet, sent by the cache controller, contains:
  • Cache level of the miss
  • ETA of response from next level
• PPGS:
  • Receives tokens
  • Assigns ETAs to each memory request
  • Determines core stall window
• WUC: Wake-up Controller
  • PPGS registers core state (idle/active)
  • WUC determines safe wake-up modes
  • Aggressive wake-up modes follow lower utilization
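The token flow above can be sketched in a few lines. This is a minimal illustration, not the poster's algorithm: the `Token` layout, the function names, the rule that the stall window ends at the earliest outstanding ETA, and the policy of gating only when the gated interval reaches the break-even point are all assumptions. The two constants are the break-even and pipeline-refill figures quoted in the poster's parameter table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Values quoted in the poster's parameter table.
BREAK_EVEN_NS = 17.17       # EV6 PG break-even point
PIPELINE_REFILL_NS = 2.12   # EV6 pipeline refill latency


@dataclass(frozen=True)
class Token:
    """Hypothetical token layout; the poster names only these two fields."""
    cache_level: int  # cache level of the miss (2 = L2, 3 = L3, ...)
    eta_ns: float     # ETA of the response from the next level, in ns from now


def stall_window_ns(tokens: Iterable[Token]) -> float:
    """Assumed rule: the core can resume when the earliest response arrives."""
    etas = [t.eta_ns for t in tokens]
    return min(etas) if etas else 0.0


def wake_start_ns(tokens: Iterable[Token], wake_latency_ns: float) -> Optional[float]:
    """Return how long the core can stay gated (ns), or None to leave it on.

    Wake-up starts early enough that wake-up plus pipeline refill finish
    by the time the data returns (assumed way to get a zero performance hit).
    """
    gated = stall_window_ns(tokens) - wake_latency_ns - PIPELINE_REFILL_NS
    return gated if gated >= BREAK_EVEN_NS else None


if __name__ == "__main__":
    # One outstanding request with a 50 ns ETA and a 4.5 ns wake-up mode.
    print(wake_start_ns([Token(cache_level=3, eta_ns=50.0)], 4.5))
```

With a short ETA (for example a 13 ns L3 hit), the gated interval falls below the break-even point and the sketch leaves the core powered.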

Support from NSF, MARCO FCRP (MuSyC and GSRC centers), SRC, Oracle, and Qualcomm is gratefully acknowledged.

[Figure: overview diagram. Labels: Microarchitectural monitoring; Circuit-level power gating.]

[Figure: PPGS timing diagram. Phases: ACTIVE MODE, POWER DOWN, WAKE UP, RESTORE, ACTIVE MODE. Signals: clock, power down, data retention, clamp output, enable few, enable rest, async-reset. The power-down sequence (steps 1-3, starting at the power-down trigger) and the wake-up sequence (steps 4-9) are annotated with intervals of 1T and 2T (1T: 1 clock cycle), Tcharge, and Trestore.]

[Figure: block diagram. Labels: Power-gating controller, enable_few, enable_rest, m[0], m[1], m[9], m[0-9], PPGS, Token controller, WUC, Wake-up Mode Request, Wake-up Mode Response, Token.]

• Model for Core Wake-up Latency
  • $T = T_0 (w + \beta x + \gamma y + \delta z)^{\alpha}$
  • w: # of adjacent waking-up cores
  • x: # of diagonal waking-up cores
  • y: # of non-adjacent waking-up cores
  • z: # of adjacent cores at edge of chip
• Core Wake-up Stagger
  • Two or more cores waking up at the same time increase wake-up latency
  • WUC may add a stagger between the times at which two cores start waking up
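The latency model and the stagger idea can be tried out with a short sketch. It reads the trailing α in the poster's formula as an exponent, which is an assumption about the garbled original. The coefficient values and the helper `staggered_starts` are placeholders for illustration and do not come from the poster.

```python
def wakeup_latency(t0, w, x, y, z, beta, gamma, delta, alpha):
    """Evaluate T = T0 * (w + beta*x + gamma*y + delta*z) ** alpha.

    w, x, y: numbers of adjacent, diagonal, and non-adjacent waking-up cores.
    z: number of adjacent cores at the edge of the chip.
    """
    return t0 * (w + beta * x + gamma * y + delta * z) ** alpha


def staggered_starts(n_cores, stagger_cycles, cycle_ns):
    """Start offsets (ns) when each core begins waking a fixed stagger after the previous one."""
    return [i * stagger_cycles * cycle_ns for i in range(n_cores)]


if __name__ == "__main__":
    # Placeholder coefficients, NOT values from the poster.
    coeffs = dict(beta=0.5, gamma=0.25, delta=0.1, alpha=0.5)
    print(wakeup_latency(4.5, w=2, x=1, y=0, z=0, **coeffs))
    # Four cores with a 1T stagger, using the poster's 1T = 0.3 ns.
    print(staggered_starts(n_cores=4, stagger_cycles=1, cycle_ns=0.3))
```

Fitting β, γ, δ, and α would require the circuit-level wake-up data behind the poster, which is not reproduced here.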

Stagger's Effect on Core Wake-up Latency

• At 0T stagger, wake-up latency increases with the number of woken-up cores
• Stagger reduces wake-up latency's dependence on the number of woken-up cores
• A 1T (0.3ns) stagger reduces wake-up latency by up to 66%
• For 2, 3, and 4 cores waking up simultaneously, a 3T stagger reduces wake-up latency by 18.8%, 31.9%, and 40.3%, respectively
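As a quick check on the units, assuming T is one cycle of the 3.3GHz core clock listed in the poster's parameter table, the stagger values convert to time as follows:

```latex
1T = \frac{1}{3.3\,\text{GHz}} \approx 0.303\,\text{ns}, \qquad
3T = \frac{3}{3.3\,\text{GHz}} \approx 0.909\,\text{ns}
```

This agrees with the 0.3ns quoted for a 1T stagger.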

Energy Savings Comparison

[Poster panel titles: Overview; Core Power Gating; System Design; Modeling Core Wake-up Latency & Stagger]

Results

• TAP
  • experiences 0% performance hit
  • yields 22.39% energy savings for EV6
  • delivers 5.17X the energy savings of practical DVFS
  • adapts to memory utilization (bzip2 vs mcf)

Parameter                  Value
Core Model                 Dec-Alpha EV6 @ 3.3GHz
Functional Units           6ALU, 2IMULT, 2FPALU
L1/L2 Priv. Caches         32KB 1cyc / 256KB 4.5ns
L3 Cache                   8MB-16way 13ns
Memory                     DDR3 2GB 50ns
Core-to-L3 token Lat.      17.5ns
Core-to-WUC Latency        5ns
PPGS Wake-up Modes         4.5ns-9.1ns
EV6 Pipeline Refill Lat.   2.12ns
EV6 Core Wake-up Eng.      15,358pJ
EV6 Leakage Power          0.916 Watts
EV6 Leakage Reduction      97.65%
EV6 PG Break Even Point    17.17ns
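The last four rows of the table appear to be related by simple arithmetic. The sketch below assumes that the break-even point is the wake-up energy divided by the leakage power saved while gated. That definition is an assumption, but it reproduces the 17.17ns figure from the other three values. The function names are illustrative.

```python
# Values quoted in the poster's parameter table.
LEAKAGE_POWER_W = 0.916      # EV6 leakage power
LEAKAGE_REDUCTION = 0.9765   # EV6 leakage reduction (97.65%)
WAKEUP_ENERGY_PJ = 15_358    # EV6 core wake-up energy


def saved_power_w() -> float:
    """Leakage power eliminated while the core is gated."""
    return LEAKAGE_POWER_W * LEAKAGE_REDUCTION


def break_even_ns() -> float:
    """Gated time at which saved leakage energy equals the wake-up energy.

    pJ / W gives picoseconds, so divide by 1000 to get nanoseconds.
    """
    return WAKEUP_ENERGY_PJ / saved_power_w() / 1000.0


def net_energy_saved_pj(gated_ns: float) -> float:
    """Net energy saved for one gating event (1 W * 1 ns = 1000 pJ)."""
    return saved_power_w() * gated_ns * 1000.0 - WAKEUP_ENERGY_PJ


if __name__ == "__main__":
    print(round(break_even_ns(), 2))  # 17.17
    print(net_energy_saved_pj(43.38) > 0)
```

Any gated interval shorter than the break-even point gives a negative net saving under this model, which is why short stalls are not worth gating.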

Assumptions & Sensitivity

UCSD CSE