Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Page 1: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Carnegie Mellon

Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan† and Todd C. Mowry

School of Computer Science, Carnegie Mellon University

†Dept. Elec. & Comp. Engineering, University of Toronto

Page 2: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Motivation

Chip-level multiprocessing is becoming commonplace:
• UltraSPARC IV (2 UltraSPARC III cores)
• IBM Power 4
• SUN MAJC
• Sibyte SB-1250

Can multithreaded processors improve the performance of a single application?

We need parallel programs

Page 3: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Why Is Automatic Parallelization Difficult?

Automatic parallelization today:
• Must statically prove threads are independent
• Constructing proofs is difficult due to ambiguous data dependences:
  • Complex control flow
  • Pointers and indirect references
  • Runtime inputs

Optimistic compiler?
• Limited only by true dependences

One solution: Thread-Level Speculation

Page 4: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

while (...) {
  …
  x = hash[index1];
  …
  hash[index2] = y;
  ...
}

Threads run on Processors 1 to 4 (time flows downward); each thread ends with check_dep():

• Thread 1: … = hash[3] … hash[10] = …
• Thread 2: … = hash[19] … hash[21] = …
• Thread 3: … = hash[33] … hash[30] = …
• Thread 4: … = hash[10] … hash[25] = …
• Thread 5: … = hash[31] … hash[12] = …
• Thread 6: … = hash[9] … hash[44] = …
• Thread 7: … = hash[27] … hash[32] = …
• Thread 4 Retry: … = hash[10] … hash[25] = …
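The retry of Thread 4 in this example can be illustrated with a small sequential model. This is only a sketch, not the hardware mechanism: it assumes each speculative thread performs one load and then one store to hash, that all threads in a batch run concurrently with every load completing before any store, and it reads the slide as Thread 1 storing to hash[10] after Thread 4 has already loaded hash[10]. The names epoch_t and first_violation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One speculative thread (epoch): the hash index it loads and the one it stores. */
typedef struct {
    int load_index;
    int store_index;
} epoch_t;

/*
 * Return the position of the earliest epoch that must be retried, or -1 if
 * speculation succeeds. Epochs are listed in original (sequential) order.
 * Model assumption: all loads in the batch complete before any store, so a
 * later epoch has read a stale value whenever an earlier epoch stores to the
 * index it loaded (a read-after-write dependence across threads).
 */
int first_violation(const epoch_t *epochs, size_t count)
{
    assert(epochs != NULL || count == 0);
    for (size_t later = 1; later < count; later++) {
        for (size_t earlier = 0; earlier < later; earlier++) {
            if (epochs[earlier].store_index == epochs[later].load_index) {
                return (int)later;
            }
        }
    }
    return -1;
}
```

Called on the first four threads of the slide, {3, 10}, {19, 21}, {33, 30}, {10, 25}, the function reports the fourth epoch, which corresponds to the "Thread 4 Retry" entry.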

Page 5: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Frequently Dependent Scalars

Can identify scalars that always cause dependences

[Diagram: Producer and Consumer threads over time, each executing … = a followed by a = …]

Page 6: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Frequently Dependent Scalars

Dependent scalars should be synchronized [ASPLOS’02]

[Diagram: Producer and Consumer threads over time, each executing … = a followed by a = …, with Signal(a) in the Producer and Wait(a) in the Consumer]

Page 7: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Frequently Dependent Scalars

Dataflow analysis allows us to deal with complex control flow [ASPLOS’02]

[Diagram: Producer and Consumer threads over time, each containing … = a and a = … on different control-flow paths]

Page 8: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Communicating Memory-Resident Values

Synchronize?

Speculate?

Will speculation succeed?

[Diagram: Producer and Consumer threads over time, each executing Load *p and Store *q]

Page 9: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Speculation vs. Synchronization

Speculation succeeds: efficient

[Diagram: Sequential Execution vs. Speculative Parallel Execution of four iterations over time, each containing Load *p and Store *q]

Page 10: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Speculation vs. Synchronization

Speculation fails: inefficient

[Diagram: Sequential Execution vs. Speculative Parallel Execution of four iterations over time, each containing Load *p and Store *q; the speculative execution is marked with a violation and repeated Load *p / Store *q pairs]

Page 11: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Speculation vs. Synchronization

Frequent dependences: Synchronize
Infrequent dependences: Speculate

[Diagram: Sequential Execution vs. Speculative Parallel Execution of four iterations over time, each containing Load *p and Store *q]

Page 12: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Performance Potential

Reducing failed speculation improves performance

Detailed simulation:
• TLS support
• 4-processor CMP
• 4-way issue, out-of-order superscalar
• 10-cycle communication latency

[Chart: Norm. Regional Exec. Time (0 to 100), Original vs. Perfect memory value prediction, for go, m88ksim, ijpeg, gzip_comp, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp]

Page 13: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Hardware vs. Compiler Inserted Synchronization

[Diagram: three Producer/Consumer timelines in which Store *q and Load *p communicate through Memory:
• Speculation
• Hardware-inserted Synchronization [HPCA’02], with a (stall)
• Compiler-inserted Synchronization [CGO’04], with Signal() and Wait()]

Page 14: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Issues in Synchronizing Memory-Resident Values

Static analysis:
• Which instructions to synchronize?
• Inter-procedural dependences

Runtime:
• Detecting and recovering from improper synchronization

[Diagram: Producer Store *q and Consumer Load *p over time]

Page 15: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Outline

Static analysis
Runtime checks
Results
Conclusions

[Diagram: Producer Store *q and Consumer Load *p over time]

Page 16: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler Passes

[Diagram: compiler passes from foo.c to foo.exe: Front End, Create Threads, Profile Data Dependences, Decide what to Synchronize, Insert Synchronization, Schedule Instructions, Back End]

Page 17: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

work()

push (head, entry)

do { push (&set, element); work(); } while (test);

Page 18: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

work() { if (condition(&set)) push (&set, element);}

push (head, entry)

do { push (&set, element); work(); } while (test);

Page 19: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

do { push (&set, element); work(); } while (test);

work() { if (condition(&set)) push (&set, element);}

push(head,entry) { entry->next = *head; *head = entry; }

Accesses to *head, labeled by call path:
• Load *head(push)
• Store *head(push)
• Load *head(work, push)
• Store *head(work, push)

Page 20: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler Passes

[Diagram: compiler passes from foo.c to foo.exe: Front End, Thread Creating, Profile Data Dependences, Decide what to Synchronize, Insert Synchronization, Instruction Scheduling, Back End]

Page 21: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

do { push (&set, element); work(); } while (test);

work() { if (condition(&set)) push (&set, element);}

push(head,entry) { entry->next = *head; *head = entry; }

Accesses to *head, labeled by call path: Load *head(push), Store *head(push), Load *head(work, push), Store *head(work, push)

Profile Information
========================================================
Source                     Destination                Frequency
Store *head(push)          Load *head(push)           990
Store *head(push)          Load *head(work, push)     10
Store *head(work, push)    Load *head(push)           10

Page 22: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler Passes

[Diagram: compiler passes from foo.c to foo.exe: Front End, Thread Creating, Profile Data Dependences, Decide what to Synchronize, Insert Synchronization, Instruction Scheduling, Back End]

Page 23: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Dependence Graph

[Graph: nodes Load *head(push), Store *head(push), Load *head(work, push) and Store *head(work, push); edges weighted 990, 10 and 10]

Pairs that need to be synchronized can be extracted from the dependence graph

Infrequent dependences: occur in less than 5% of iterations
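
The selection rule can be sketched as a filter over the profiled edges. The sketch below is illustrative only: the type and function names are hypothetical, the 5% threshold is the one quoted on the slide, and the loop is assumed to have run 1000 iterations during profiling (the slides give only the frequencies 990, 10 and 10, not the iteration count).

```c
#include <assert.h>
#include <stddef.h>

/* A profiled dependence edge between a store and a load (names hypothetical). */
typedef struct {
    const char *source;       /* producing store, labeled by call path */
    const char *destination;  /* consuming load, labeled by call path */
    unsigned long frequency;  /* iterations in which the dependence occurred */
} dep_edge_t;

/*
 * Return 1 if the edge is frequent enough to synchronize, 0 if it should be
 * left to speculation. An edge is "infrequent" when it occurs in less than
 * threshold_percent of the profiled iterations.
 */
int should_synchronize(const dep_edge_t *edge, unsigned long iterations,
                       unsigned long threshold_percent)
{
    assert(edge != NULL && iterations > 0);
    /* Compare frequency/iterations >= threshold/100 without floating point. */
    return edge->frequency * 100 >= threshold_percent * iterations;
}

/* Count the edges selected for synchronization. */
size_t count_synchronized(const dep_edge_t *edges, size_t count,
                          unsigned long iterations,
                          unsigned long threshold_percent)
{
    size_t selected = 0;
    for (size_t i = 0; i < count; i++) {
        selected += (size_t)should_synchronize(&edges[i], iterations,
                                               threshold_percent);
    }
    return selected;
}
```

Under the assumed 1000 iterations, only the edge from Store *head(push) to Load *head(push) passes the 5% threshold; the two edges with frequency 10 are left to speculation.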

Page 24: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler Passes

[Diagram: compiler passes from foo.c to foo.exe: Front End, Thread Creating, Profile Data Dependences, Decide what to Synchronize, Insert Synchronization, Instruction Scheduling, Back End]

Page 25: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Example

do { push (&set, element); work(); } while (test);

work() { if (condition(&set)) push (&set, element);}

push(head,entry) { entry->next = *head; *head = entry; }

Store *head(push) to Load *head(push), frequency 990: Synchronize these

push_clone(head,entry) { wait(); entry->next = *head; *head = entry; signal(head, *head);}

The call in the loop is replaced by:

push_clone(&set, element);
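
A typed C rendering of the cloned routine may make the transformation easier to follow. This is a sketch under assumptions: node_t, tls_wait and tls_signal are hypothetical names, and the two synchronization primitives are stubbed so that they only record the forwarded address and value. On TLS hardware they would instead stall the consumer and forward the pair to the next thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node_t;

/* Stub state standing in for the forwarding channel between threads. */
static node_t **forwarded_address = NULL;
static node_t *forwarded_value = NULL;
static int wait_calls = 0;

/* Hypothetical primitive: block until the previous thread has signaled. */
static void tls_wait(void)
{
    wait_calls++;  /* stub: nothing to wait for in a sequential run */
}

/* Hypothetical primitive: forward both the address and the stored value. */
static void tls_signal(node_t **address, node_t *value)
{
    forwarded_address = address;
    forwarded_value = value;
}

/* Original routine: still used on the infrequent path through work(). */
void push(node_t **head, node_t *entry)
{
    entry->next = *head;
    *head = entry;
}

/* Clone called directly from the loop, where the dependence is frequent. */
void push_clone(node_t **head, node_t *entry)
{
    assert(head != NULL && entry != NULL);
    tls_wait();               /* consumer side: wait before Load *head */
    entry->next = *head;
    *head = entry;
    tls_signal(head, *head);  /* producer side: forward after Store *head */
}
```

Cloning keeps the unsynchronized push available for the call inside work(), whose dependences are infrequent and are left to speculation.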

Page 26: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Outline

• Static analysis
Runtime checks
Results
Conclusions

[Diagram: Producer Store *q and Consumer Load *p over time]

Page 27: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Runtime Checks

Store *q and Load *p access the same memory address
No store modifies the forwarded address between Store *q and Load *p

Signal(q, *q);

Producer forwards the address to ensure a match between the load and the store

[Diagram: Producer Store *q and Consumer Load *p over time]

Page 28: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Ensuring Correctness

• Store *q and Load *p access the same memory address
No store modifies the forwarded address between Store *q and Load *p

Hardware support: similar to memory conflict buffer [Gallagher et al., ASPLOS’94]

[Diagram: Producer and Consumer timelines with Store *q, Store *x and Load *p]

Page 29: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Ensuring Correctness

Hardware support: TLS hardware already knows which locations are stored to

• Store *q and Load *p access the same memory address
No store modifies the forwarded address between Store *q and Load *p

[Diagram: Producer and Consumer timelines with Store *q, Store *y and Load *p]

Page 30: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Outline

• Static analysis
• Runtime checks
Results
Conclusions

[Diagram: Producer Store *q and Consumer Load *p over time]

Page 31: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Experimental Framework

Benchmarks:
• SPECint95 and SPECint2000, -O3 optimization

Underlying architecture:
• 4-processor, single-chip multiprocessor
• speculation supported through coherence

Simulator:
• superscalar, similar to MIPS R14K
• 10-cycle communication latency
• models all bandwidth and contention
• detailed simulation

[Diagram: processors (P) and caches (C) connected by a Crossbar]

Page 32: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Parallel Region Coverage

Coverage is significant
Average coverage: 54%

[Chart: Parallel Region Coverage (0 to 100) for go, m88ksim, ijpeg, gzip_comp, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp]

Page 33: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler-Inserted Synchronization

Seven benchmarks speed up by 5% to 46%

[Chart: Norm. Regional Exec. Time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars U and C for go, m88ksim, ijpeg, gzip_comp, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp. Speedup labels on the chart: 10%, 46%, 13%, 5%, 8%, 5%, 21%]

U = No synchronization inserted
C = Compiler-Inserted Synchronization

Page 34: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Compiler- vs. Hardware-Inserted Synchronization

Compiler and hardware [HPCA’02] each benefits different benchmarks

[Chart: Norm. Regional Exec. Time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars C and H for go, m88ksim, ijpeg, gzip_comp, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp. Groups of benchmarks are annotated "Hardware does better" and "Compiler does better"]

C = Compiler-Inserted Synchronization
H = Hardware-Inserted Synchronization

Page 35: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Combining Hardware and Compiler Synchronization

The combination is more robust than each technique individually

[Chart: Norm. Regional Exec. Time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars C, H and B for go, m88ksim, gzip_comp, gzip_decomp, perlbmk, gap]

C = Compiler-inserted synchronization
H = Hardware-inserted synchronization
B = Combining Both

Page 36: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Related Work

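[Diagram: prior work arranged by approach (Compiler-inserted, Hardware-inserted) and by table organization (Centralized Table, Distributed Table)]

Works shown:
• Zhai et al., CGO’04
• Cytron, ICPP’86
• Tsai & Yew, PACT’96
• Moshovos et al., ISCA’97
• Cintra & Torrellas, HPCA’02
• Steffan et al., HPCA’02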

Page 37: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Conclusions

Compiler-inserted synchronization for memory-resident value communication:
• Effective in reducing speculation failure
• Half of the benchmarks speed up by 5% to 46% (regional)

Combining hardware and compiler techniques is more robust:
• Neither consistently outperforms the other
• Can be combined to track the best performer

Memory-resident value communication should be addressed with the combined efforts of the compiler and the hardware

Page 38: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Questions?

Page 39: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

The Potential of Instruction Scheduling

Scheduling instructions has additional benefit for some benchmarks

[Chart: execution time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars E, C and L for go, m88ksim, ijpeg, gzip_comp_R, gzip_decomp, vpr_place, mcf, crafty, parser, perlbmk, gap, gzip_comp, gcc, bzip2_comp]

E = Early
C = Compiler-Inserted Synchronization
L = Late

Page 40: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Program Performance

[Chart: execution time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars U, C, H and B for go, m88ksim, ijpeg, gzip_comp_R, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp, bzip2_decomp, twolf, gzip_comp]

U = Un-optimized
C = Compiler-Inserted Synchronization
H = Hardware-Inserted Synchronization
B = Both compiler and hardware

Page 41: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Which Technique Synchronizes This Load?

[Chart: loads (0 to 100), broken down into Synchronized by neither technique, Synchronized by compiler, Synchronized by hardware and Synchronized by both, with bars U, C, H and B for go, m88ksim, ijpeg, gzip_comp_R, gzip_decomp, vpr_place, gcc, mcf, crafty, parser, perlbmk, gap, bzip2_comp, twolf, gzip_comp]

U = Un-optimized
C = Compiler-Inserted Synchronization
H = Hardware-Inserted Synchronization
B = Both compiler and hardware

Page 42: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Ensuring Correctness

• Store *q and Load *p access the same memory address
No store modifies the forwarded address between Store *q and Load *p

Hardware support: similar to memory conflict buffer [Gallagher et al., ASPLOS’94]

[Diagram: Producer and Consumer timelines with Store *q, Store *x and Load *p]

Page 43: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Ensuring Correctness

• Store *q and Load *p access the same memory address
No store modifies the forwarded address between Store *q and Load *p

Hardware support: Use the forwarded value only if the synchronized pair is dependent

[Flowchart at the Consumer: q == p? If NO, Use Memory Value. If YES, Local Store to *p? If YES, Use Memory Value; if NO, Use Forwarded Value]

[Diagram: Producer and Consumer timelines with Store *q, Signal(q), Signal(*q), Store *x and Load *p]
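
The decision made at the consumer can be written down as a small function. This is a sketch of the flowchart as read from the slide, not the hardware design: the function, type and parameter names are hypothetical, and whether the consumer has performed a local store to *p is assumed to be supplied by the TLS hardware as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Address and value forwarded by the producer via Signal(q) and Signal(*q). */
typedef struct {
    const int *address;
    int value;
} forwarded_t;

/*
 * Pick the value seen by Load *p in the consumer.
 * The forwarded value is used only if the synchronized pair is really
 * dependent (q == p) and the consumer has not itself stored to *p;
 * otherwise the load reads memory as usual.
 */
int consumer_load(const int *p, const forwarded_t *fwd, bool local_store_to_p)
{
    assert(p != NULL && fwd != NULL);
    if (fwd->address != p) {
        return *p;          /* q != p: use memory value */
    }
    if (local_store_to_p) {
        return *p;          /* local store takes precedence: use memory value */
    }
    return fwd->value;      /* q == p, no local store: use forwarded value */
}
```

Falling back to the memory value in both mismatch cases keeps the load under ordinary speculation, so a wrong guess by the compiler costs performance rather than correctness.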

Page 44: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Issues in Synchronizing Memory-Resident Values

• Inserting synchronization using compilers
• Ensuring correctness
Reducing synchronization cost

[Diagram: Producer Store *q and Consumer Load *p]

Page 45: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Reducing Cost of Synchronization

[Diagram: Producer and Consumer timelines, Before Instruction Scheduling and After Instruction Scheduling]

Instruction scheduling algorithms are described in [ASPLOS’02]

Page 46: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

The Potential of Instruction Scheduling

Scheduling instructions could offer additional benefit

[Chart: Norm. Regional Exec. Time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars E, C and L for m88ksim, ijpeg, gzip_comp, gzip_decomp, vpr_place, gap]

E = Perfectly predicting synchronized memory-resident values
C = Compiler-inserted synchronization
L = Consumer stalls until previous thread commits

Page 47: Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Using More Accurate Profiling Information

Gzip_comp is the only benchmark sensitive to profiling input

[Chart: Norm. Regional Exec. Time (0 to 100), broken down into Failed Speculation, Synchronization Stall, Other and Busy, with bars U, C and R for gzip_comp]

U = No Instruction Scheduling
C = Compiler-Inserted Synchronization
R = Compiler-Inserted Synchronization (Profiled with the ref input set)