Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors...

Runahead Execution:

An Alternative to Very Large Instruction Windows for Out-of-order

Processors

Onur Mutlu, The University of Texas at Austin

Jared Start, Microprocessor Research, Intel Labs

Chris Wilkerson, Desktop Platforms Group, Intel Corp

Yale N. Patt, The University of Texas at Austin

Presented by: Mark Teper

Outline The Problem Related Work The Idea: Runahead Execution Details Results Issues

Brief Overview Instruction

Window: Set of in-order

instructions that have not yet been commited

Scheduling Window Set of unexecuted

instructions needed to selected for execution

What can go wrong?

…

Program FlowInstruction Window

Scheduling Windows

ExecutionUnits

ExecutionUnits

…

The Problem

Program Flow

Unexecuted Instruction Executing Instruction

Long Running InstructionCommited Instruction

Instruction Window

Filling the Instruction Window

Better

IPC

Related Work Caches:

Alter size and structure of caches

Attempt to reduce unnecessary memory reads

Prefetching: Attempt to fetch data into

nearby cache before needed

Hardware & software techniques

Other techniques: Waiting instruction buffer

(WIB) Long-latency block

retirements

CPU

L1 Cache 1 Cycle

L2 Cache 10 Cycles

Memory 1000 cycles

RunAhead Execution Continue executing instructions during long stalls

Disregard results once data is available

…

Program Flow

Unexecuted Instruction Executing Instruction

Long Running InstructionCommited Instruction

Instruction Window

Benefits Acts as a high accuracy prefetcher

Software prefetchers have less information Hardware prefetchers can’t analyze code as

well

Biase predictors

Makes use of cycles that are otherwise wasted

Entering RunAhead Processors can enter run-ahead mode at any point

L2 Cache Misses used in paper

Architecture needs to be able to checkpoint and restore register state

Including branch-history register and return address stack

Handling Avoided Read Run Ahead trigger returns immediately

Value is marked as INV Processor continues fetching and executing

instructions

ld r1, [r2]

Add r3, r2, r2

Add r3, r1, r2

move r1, 0

R1

R2

R3

Executing Instruction in RunAhead Instructions are fetched and executed as

normal Instructions are committed retired out of

the instruction window in program order If the instructions registers are INV it can be

retired without executing No data is ever observable outside the CPU

Branches during RunAhead

Divergence Points: Incorrect INV value branch prediction

Predict Branch

Yes – Assume predictor is correct,Continue execution

Does BranchDepend on INV?

No - Evaluate branch

Was branch predictor correct?

Yes – Continue Execution No – Flush instruction queue

Exiting RunAhead Occurs when stalling memory access finally

returns Checkpointed architecture is restored All instructions in the machine are flushed

Processor starts fetching again at instruction which caused RunAhead execution Paper presented optimization where fetching

started slightly before stalled instruction returned

Biasing Branch Predictors RunAhead can cause branch predictors to

be biased twice on the same branch

Several Alternatives:(1)Always train branch predictors (2)Never train branch predictors (3)Create list of predicted branches(4)Create separate Branch Predictor

RunAhead Cache

RunAhead execution disregards stores Can’t produce externally observable results

However, this data is needed for communication Solution: Run-Ahead cache

Loop:

…

store r1, [r2]

add r1, r3, r1

store r1, [r4]

load r1, [r2]

bne r1, r5, Loop

Stores and Loads in Run Ahead

Loads1. If address is INV data

is automatically INV2. Next look in:

1. Store buffer2. RunAhead Cache

3. Finally go to memory1. In in cache treat as

valid2. If not treat as INV,

don’t stall

Stores1. Use store-buffer as

usual2. On Commit:

1. If address is INV ignore2. Otherwise write data to

RunAhead Cache

Run-Ahead Cache Results

Found that not passing data from stores to loads resulted in poor performance Significant number of INV results

Better

Details: Architecture

ResultsBetter

Results (2)Better

Issues

Some wrong assumptions about future machines Future baseline corresponds poorly to modern

architectures

Not a lot of details of architectural requirement for this technique Increase architecture size Increase power-requirements

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors...

Documents

Transcript of Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors...