PIPE DREAM - Massachusetts Institute of...

16
May 9, 2005 PIPE DREAM Out-of-Order Speculative CPU Karthik Balakrishnan and Michal Karczmarek

Transcript of PIPE DREAM - Massachusetts Institute of...

Page 1: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

May 9, 2005

PIPE DREAM

Out-of-Order Speculative CPU

Karthik Balakrishnan and Michal Karczmarek

Page 2: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

The Plan

Out-of-orderSpeculativePhysical register fileSingle-issueNon-blocking, in-order memory unitBranch predictionPrecise exceptions

Page 3: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

The Basic Design: Data Flow

Page 4: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

The Basic Design: Branch Taken

Page 5: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Register Rename Unit

Page 6: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Re-order Buffer

Page 7: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Design Exploration: Data Flow

Page 8: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Design Exploration:- Branch Predictor

Use a standard 2-bit predictor

No global history (even though it was originally planned)

Carry a “branchTaken” bit to allow Execute to check if prediction was correct

2-bit Branch Predictor

Page 9: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Design Exploration:- Branch Predictor

PCBranch History Table

WNT

WT

WT

SNTtake/don’t take

Hash on bottom bits of instruction PC

Initialize to WT

Predict all branch instructions and force take J and JAL

Page 10: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Design Exploration:- Out of Order Scheduling

Find which instructions are ready to goDispatch memory operations in order (speculatively)Send stores to memory when retiredUse barrier instructions for COP0

Page 11: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Design Exploration:- Out of Order Scheduling

Schedule in-order and out-of-order instructions separately

Be careful about the wrap-around

ROB Entries

Page 12: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Place and RouteROB IPCF REGS

RENAME MMUEXEC

Page 13: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Results

3616345170491744919610013110212796123vvadd

612756647368861725498767510113997909qsort

speculative memory unit

fast predict

fast mispredictbp/oooooobpbasic

Final clock period was 9.803nsFinal area 458569.6 um2

qsort retires 24249 instructions (ipc=0.4)vvadd retires 18026 instructions (ipc=0.5)vvadd loop: 10 instructions, 16 cyclesusing our own harness, still adapting to Chris’sNumber of cycles to complete benchmark:

Page 14: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Interesting ProblemsBluespec infinite compile times: 12-entry ROB compilesin 2 mins; 14-entry ROB hangs forever

RWires are necessary but EVIL

Branch predict: Bug with predicting JR and JALR, noting it’s been “taken” but going to the wrong target!

Lots of bug in decode and execute: all fixed once we passed self_test, test_spim and test_all (renaming/out-of-order didn’t introduce execution bugs)

Page 15: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Special Thanks To:

6884-bluespecChris

The End

Page 16: PIPE DREAM - Massachusetts Institute of Technologycsg.csail.mit.edu/6.884/projects/group4-presentation.pdf · Microsoft PowerPoint - group4-presentation.ppt [Read-Only] Author: ronny

Slowest Data PathretireInstrQ retireIPCQ