Session 2: Tracing and Characterization (iacoma.cs.uiuc.edu/iacoma-papers/file2.pdf)


Optimizing UNIX for OLTP on CC-NUMA
Darrell Suggs, Data General Corporation

Tracing and Characterization of NT-based System Workloads
Jason Casmira, David Kaeli (Northeastern University); David Hunter (DEC Software Partners Engineering Group)

Analysis of Commercial and Technical Workloads on AlphaServer Platforms
Zarka Cvetanovic, Digital Equipment Corporation

Characterizing TPC-D on a MIPS R10K Architecture
Qiang Cao, Pedro Trancoso, and Josep Torrellas, University of Illinois at Urbana-Champaign

Data General

02/01/98  Prepared By: Darrell Suggs

Optimizing UNIX for OLTP on CC-NUMA

Darrell Suggs, PhD, Performance Architect

Data General Corp.

Overview

• In late ‘96 our challenge was ...
• Tune software today for “future” architecture
  - no reasonable prototypes available
  - software development lead time is significant
  - design issues very complex, exceed our intuitive abilities
• Specific target
  - Architecture: 16-32 Intel Pentium Pro, CC-NUMA
  - Operating System: DG/UX, commercial/enterprise UNIX
  - Application: Oracle RDBMS
  - Workload: TPC-C

Product Status

• AV20000 Product Shipped for Revenue in ‘97
• Demonstrated industry leading performance
• First in a line of CC-NUMA products

Basic Approach to SW Scaling

• Construct advanced analysis environment
  - Obtain architecture independent traces
  - Construct detailed cache simulation of target platform
  - Simulate with model/traces looking for SW scaling issues
• Use analysis environment to:
  - Prototype changes in OS and App
  - Re-trace prototype software, verify increased scaling
  - Work with OS/APP developers to implement changes
• Repeat until no more high leverage scaling issues found

NUMA Building Block Architecture

[Block diagram. Labels: SHV; P6 CPU with L2 Cache (x4); OPB (x2); OMC; Memory; PCI; SCI Board (Fabric Interface) with SnoopRAM, PIU-A, PIU-D, Far Memory Cache, SCC, SCI Directory, Link Chip (x2); Dual SCI Rings]

• High throughput coherent bridge between P6 bus and SCI bus
• Far memory cache for reduced average latencies
• HA diagnostic features
• Provides for globally viewable and uniformly addressed cc memory (GCM)

CC-NUMA Architecture

• Platform Characteristics
  - 16 Intel PPro with 1MB L2 Cache
  - Full service local memory controller (OMC)
  - Far Memory Cache controller, 128MB Direct mapped (OMC like)
  - Distributed coherent memory -- single image
  - SCI directory based cache coherency at interconnect
  - Local access latency: ~300ns
  - Remote access latency: ~3 to 5 microsecs
• Key scaling issue
  - Number of interconnect operations per unit of work
  - Interconnect operation demand per second
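
The latency figures on this slide suggest why interconnect operations are the key scaling issue: even a small remote fraction moves the average access time a long way. A minimal sketch of that arithmetic, assuming a simple weighted average of local and remote latency (the function name and the remote fractions are illustrative and do not come from the slides):

```python
LOCAL_NS = 300.0                    # local access latency from the slide (~300 ns)
REMOTE_NS_RANGE = (3000.0, 5000.0)  # remote latency from the slide (~3 to 5 microsecs)


def average_latency_ns(remote_fraction, remote_ns, local_ns=LOCAL_NS):
    """Weighted average latency when remote_fraction of accesses go remote."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    for fraction in (0.0, 0.01, 0.05, 0.10):  # hypothetical remote fractions
        low = average_latency_ns(fraction, REMOTE_NS_RANGE[0])
        high = average_latency_ns(fraction, REMOTE_NS_RANGE[1])
        print(f"remote {fraction:5.1%}: {low:7.1f} to {high:7.1f} ns")
```

Under this model, sending 10% of accesses remote roughly doubles the average latency at the low end of the remote range.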

CC-NUMA Architecture

[Diagram: four building blocks joined by an SCI ring. Each block shows four P6/L2 processors on a local bus, near memory, a far cache, and an SCI interconnect.]

CC-NUMA Architecture Simulation

• Construct detailed discrete event simulation
  - Models all system cache contents and protocols
  - Models all busses/interconnects and associated protocols
    - E.g. full simulation of SCI protocol
  - Specifically,
    - 16 L2’s, 4 system busses/far memory caches, 4 SCI directories
• Model driven by physical, pre-L2, address traces
  - flexibility to change all cache geometries (except L1)
  - can examine impact of various protocol optimizations
• Simulation Tool - SES Workbench
  - Scientific and Engineering Software
  - Mature and flexible tool for commercial grade simulation
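
The idea of driving a cache model with physical address traces can be illustrated with a much smaller sketch than the SES Workbench model described here. The class below is a hypothetical, tag-only direct-mapped cache (the slides mention a direct-mapped far memory cache); the geometry in the usage lines is a toy value chosen for readability, and no coherence protocol or timing is modeled.

```python
class DirectMappedCache:
    """Direct-mapped cache model that tracks only tags (no data, no timing)."""

    def __init__(self, size_bytes, line_bytes):
        if size_bytes % line_bytes != 0:
            raise ValueError("size must be a multiple of the line size")
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = {}  # set index -> tag currently resident
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one physical address; return True on a hit."""
        line = address // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        if self.tags.get(index) == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # evict whatever was resident in this slot
        self.misses += 1
        return False


# Toy geometry (4 lines of 32 bytes) so that a conflict is easy to see.
cache = DirectMappedCache(size_bytes=128, line_bytes=32)
results = [cache.access(a) for a in (0x00, 0x04, 0x80, 0x00)]
print(results, cache.hits, cache.misses)
```

Addresses 0x00 and 0x80 map to the same slot in this toy geometry, so the final access to 0x00 misses again; changing `size_bytes` is the kind of cache-geometry experiment the slide refers to.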

Architecture Independent Traces

• Objective: Capture traces on existing HW to be
  - extensible to different architectures (diff L2’s, bus structs, cpu counts, etc.)
  - physical addresses for both user and kernel
  - long, contiguous traces for large cache simulation & continuity
  - representative, but manageable, sample (30 to 60 secs)
  - pre-L2, post-L1

Architecture Independent Traces

• Technique Overview
  - Use largest available SMP (quad P6)
  - Start with well balanced OLTP configuration (TPC-C)
  - Trace all processes executing on SMP
    - annotate traces to identify individual PID’s
  - Process traces to identify independent process address streams
  - Simulate HW by assigning processes to simulated CPU’s

Trace Environment

[Diagram: four P6/L1 processors, each with an L2, on a system bus with a memory controller and an I/O controller (database); a logic analyzer pod on the bus; the logic analyzer connects over the network to an NFS server for trace storage; a "Buffer Full" feedback line runs from the logic analyzer]

Trace Operation

- Run workload to steady state
- Capture all system bus accesses, filling analyzer buffer
- Logic analyzer triggers CPU feedback to “halt” cpu
- Logic analyzer dumps trace buffer to disk, no cpu activity occurs
- Analyzer frees cpu’s to resume work, captures addresses til full
- Repeat the start/capture/stop cycle
- Results in long, contiguous traces. 30 to 60 system secs. Hundreds of millions of accesses captured.

Process Simulation

Traced Data

CPU  PID  Address
0    128  0x1230
0    128  0x1240
1    321  0x8820
3    161  0x4210
3    161  0x4220
2    421  0x0500
1    321  0x8830
1    006  0x0070
2    421  0x0510
3    161  0x4230
1    006  0x0080

Post Processed Data

PID 006  PID 128  PID 161  PID 321  PID 421
-------  -------  -------  -------  -------
0x0070   0x1230   0x4210   0x8820   0x0500
0x0080   0x1240   0x4220   0x8830   0x0510
                  0x4230

[Diagram: the post-processed per-PID streams feed hardware simulations (L2 0, L2 1, ... L2 N) of the architecture of choice]
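
The post-processing step on this slide can be sketched in a few lines. The records below are copied from the traced-data example; the function names and the round-robin placement policy are assumptions for illustration, since the slides do not say how processes were assigned to simulated CPUs.

```python
from collections import defaultdict

# (cpu, pid, physical address) records copied from the traced-data example.
TRACED = [
    (0, 128, 0x1230), (0, 128, 0x1240), (1, 321, 0x8820), (3, 161, 0x4210),
    (3, 161, 0x4220), (2, 421, 0x0500), (1, 321, 0x8830), (1, 6, 0x0070),
    (2, 421, 0x0510), (3, 161, 0x4230), (1, 6, 0x0080),
]


def split_by_pid(records):
    """Group addresses into per-process streams, preserving trace order."""
    streams = defaultdict(list)
    for _cpu, pid, address in records:
        streams[pid].append(address)
    return dict(streams)


def assign_round_robin(streams, num_cpus):
    """Map each PID to a simulated CPU (hypothetical round-robin policy)."""
    return {pid: i % num_cpus for i, pid in enumerate(sorted(streams))}


streams = split_by_pid(TRACED)
placement = assign_round_robin(streams, num_cpus=16)
for pid in sorted(streams):
    addrs = " ".join(f"0x{a:04x}" for a in streams[pid])
    print(f"PID {pid:03d} -> simulated CPU {placement[pid]}: {addrs}")
```

Because the original CPU number is discarded, the same five streams can be replayed on any simulated processor count, which is what makes the traces architecture independent.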

Architecture Independent Traces

• Issues with trace technique
  - Post-L1 data is filtered
    - pre-L1 data is too dense to handle 30 second sample (100’s of GB)
    - compensate for L1 filter by flushing L1’s on context switch
    - capture all addresses accessed, not every access to each address
  - Increased process count for number of CPU’s
    - overloaded scheduler has high context switch rate
    - compensate by configuring “run to block” scheduling
  - I/O Service times skewed due to start/stop
  - Start/stop perturbs environment
    - minimal impact on sequence of physical addresses per process

CC-NUMA Software Scaling Issues

• Motivating Issues
  - Major HW issue: high interconnect latency
  - Major SW issue: long access time for shared data
  - Key scaling leverage: ** Interconnect operations **
  - Basic NUMA optimizations were already applied
• Classes of shared data
  - True sharing: locks, write shared data
  - False sharing: write shared data on cache line with read-only data
  - “Partner data”: data should be on same cache line
    - e.g. a lock structure and the data that it guards
• Approach
  - Find & fix all high frequency false sharing/partner data
  - Develop algorithmic changes to minimize true sharing
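
One way to picture the "find false sharing" step is a pass over a trace that separates lines where different CPUs touch the same written address (true sharing) from lines where they touch different addresses in the same line (false sharing). The sketch below is an assumption-laden illustration, not the tooling used in the talk: it presumes each record carries a CPU id, an address, and a read/write flag, and it uses a 32-byte line size.

```python
from collections import defaultdict

LINE_BYTES = 32  # assumed cache line size for the sketch


def classify_lines(accesses, line_bytes=LINE_BYTES):
    """Classify cache lines that see write sharing between CPUs.

    accesses: iterable of (cpu, address, is_write) tuples (hypothetical format).
    Returns {line_number: "true" | "false"}; private and read-only lines are omitted.
    """
    cpus_by_addr = defaultdict(set)   # address -> CPUs that touched it
    written_addrs = set()             # addresses written at least once
    addrs_by_line = defaultdict(set)  # line number -> addresses seen in it
    for cpu, address, is_write in accesses:
        cpus_by_addr[address].add(cpu)
        addrs_by_line[address // line_bytes].add(address)
        if is_write:
            written_addrs.add(address)

    result = {}
    for line, addrs in addrs_by_line.items():
        line_cpus = set().union(*(cpus_by_addr[a] for a in addrs))
        if len(line_cpus) < 2 or not (addrs & written_addrs):
            continue  # private or read-only line: no invalidation traffic
        truly_shared = any(
            a in written_addrs and len(cpus_by_addr[a]) > 1 for a in addrs
        )
        result[line] = "true" if truly_shared else "false"
    return result


sample = [
    (0, 0x100, True), (1, 0x108, False),   # same line, different addresses
    (0, 0x200, True), (1, 0x200, False),   # same address, written and read
    (2, 0x300, False), (3, 0x300, False),  # read-only sharing
]
print(classify_lines(sample))
```

Lines reported as "false" are candidates for padding or relocation, while lines reported as "true" need the algorithmic changes the slide mentions.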

Interconnect Operation Trends

• Initial SW interconnect ops
  - 15,000/TPC-C (new order)
• Reduced (via simulation/analysis/prototype) to
  - 6,700/TPC-C (as measured via simulation)
• Actual system measurement
  - 6,600/TPC-C (with prototype changes productized)
• System performance improvement
  - 35% increase in TPM
• Areas where performance problems persisted:
  - I/O device drivers, controllers, etc
  - The main area ignored in simulation
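
The three counts on this slide can be turned into two percentages: the reduction achieved by tuning and the gap between simulation and measurement. A short worked calculation (the variable and function names are illustrative):

```python
INITIAL_OPS = 15_000    # interconnect ops per TPC-C new-order transaction
SIMULATED_OPS = 6_700   # after tuning, as measured via simulation
MEASURED_OPS = 6_600    # on the real system with changes productized


def percent_change(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100.0


reduction = -percent_change(INITIAL_OPS, SIMULATED_OPS)
sim_error = percent_change(MEASURED_OPS, SIMULATED_OPS)
print(f"reduction in interconnect ops: {reduction:.1f}%")  # 55.3%
print(f"simulation vs. measurement: {sim_error:+.1f}%")    # +1.5%
```

The simulated count lands within about 1.5% of the measured one, which supports using the trace-driven model to rank software changes.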

Additional Benefits of Techniques

• Simulation/analysis feedback to HW design
  - cache geometries
  - protocol optimizations
  - HW buffers and other low-level resource tuning
• Framework for studying advanced architecture design
  - Supporting coarse grain block-diagram tradeoffs
  - Early positioning of product performance
  - Understanding other OS/SW issues with CC-NUMA and high processor count SMP

Tracing and Characterization of NT-based System Workloads

J. Casmira, D. Kaeli, Northeastern University

D. Hunter, Digital Equipment Corp.

Outline

• Overview

• Workloads

• Results

• Conclusions

Overview

• Issues with trace-driven simulation
  – results only as good as input trace (GIGO)
  – typically only capture application behavior
• Existing trace tools
  – Shade
  – ATOM
  – SimOS

Current Technology

• Trace driven studies using OS-rich traces
  – ISCA96 27% ; ISCA97 16%
  – HPCA96 0% ; HPCA97 8%

Workload Instruction Counts

[Bar chart: instruction count (0 to 12,000,000) per application (idea, cdplay, OQ1, OQ2, OQ3, OQ4, OQ5); series: App Only, App & DLL, App & OS]

What is PatchWrx?

• Dynamic Execution Tracing Tool Suite
  – system instrumentation
  – trace capture
  – stream reconstruct
• DEC Alpha 21064 Windows NT platforms
• Low overhead with minimum slowdown
  – 2X when instrumented; 4X while tracing

How Does PWX Work?

• Instrument NT binary images
• Using DEC Alpha PALcalls
  – reserve trace buffer at boot time
  – log branch instruction trace entries
• Using instrumented images & trace log, reconstruct original stream
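
The reconstruction step can be pictured as expanding a compact branch log back into a full instruction address stream using static information from the binary image. The sketch below is a simplified, hypothetical model and not the PatchWrx log format, which the slides do not describe: it assumes the log is a sequence of basic-block entry addresses and that the image yields each block's instruction count.

```python
INSTRUCTION_BYTES = 4  # Alpha instructions are fixed-width 32-bit words


def reconstruct_stream(block_lengths, branch_log):
    """Expand logged block entry addresses into an instruction address stream.

    block_lengths: {block start address: instruction count}, standing in for
                   information recovered from the instrumented image.
    branch_log:    block start addresses in the order they were entered.
    """
    stream = []
    for start in branch_log:
        count = block_lengths[start]  # a KeyError signals an unknown block
        stream.extend(start + i * INSTRUCTION_BYTES for i in range(count))
    return stream


blocks = {0x1000: 3, 0x2000: 2}   # hypothetical image contents
log = [0x1000, 0x2000, 0x1000]    # hypothetical trace log
stream = reconstruct_stream(blocks, log)
print([hex(a) for a in stream])
print("average basic block size:", len(stream) / len(log))
```

Logging one entry per block rather than one per instruction is what keeps the trace buffer small; the average basic block size reported later in the talk falls out of the same two inputs.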

Workloads

• BYTEmark benchmarks
  – typical “industry standard” benchmark
• MS Internet Explorer
  – web-browser application
• MS CD Player
  – NT packaged utility/application
• Oracle 7.3
  – 3rd party NT database

Characteristics

• Instruction Counts and Basic Block Sizes

• Instruction cache performance

• Instruction mix

• Application only

• Application and DLLs

• Application, DLLs, and OS

Average Basic Block Sizes

[Bar chart: average basic block size in instructions (0 to 30) per application (idea, string, neural, float, assign, cdplay, OQ1, OQ2, OQ3, OQ4, OQ5); series: App Only, App & DLL, App & OS]

Cache Miss Rates

[Bar chart: miss rate (0 to 14) for 8k and 128k caches, per application (neural, cdplay, OQ2, OQ5); series: App Only, App & DLL, App & OS]

Workload Instruction Composition

[Stacked bar chart: percent composition (0% to 100%) for neural, cdplay, OQ2, and OQ5, each shown as App Only and App & OS; categories: BSR/JSR, BR, BRXX, LD/ST, OTHER]

Summary

• OS can dominate execution in commercial applications

• OS reduces the average basic block length

• OS can dramatically change the cache behavior

• OS can significantly alter the instruction mix

• OS must be included in trace-driven simulations to provide an accurate picture of application execution

Future Work

• Full D-Stream Reconstruction

• FX!32

• Multiprocessor Traces

• Microsoft Windows NT 5.0

• DEC Alpha 21164

Analysis of Commercial and Technical Workloads on AlphaServer Platforms

Zarka Cvetanovic

Digital Equipment Corporation

February 1, 1998

Goals

◆ highlight differences between commercial and technical workloads on AlphaServers

◆ identify architectural components that are important for commercial performance

Introduction

◆ systems: AlphaServer 4100, 8400
◆ tools: CPU/platform performance counters
◆ workloads:
  ◆ commercial: TPC-C, SPECweb96, Laddis
  ◆ technical: SPEC95 (rates, parallel), NAS Parallel, Streams

Cycles Per Instruction (CPI)

◆ CPI higher in commercial than the majority of technical
◆ several technical (tomcatv, hydro2d) have as high CPI as commercial

[Bar chart: RH466 CPI (0 to 4) for SPECweb96, TPC-C, Laddis; SPECfp95_para (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5); SPECrate_int95 (compress, gcc, go, ijpeg, li, m88ksim, perl, vortex); SPECrate_fp95 (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5)]
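
The charts in this talk report CPI and event rates per 1K instructions (1KI), both of which are simple ratios of raw performance-counter values. A minimal sketch of the two normalizations, with made-up counter values in the usage lines (none of the numbers below come from the slides):

```python
def cpi(cycles, instructions):
    """Cycles per instruction from two raw counter values."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def per_1k_instructions(event_count, instructions):
    """Normalize an event count to events per 1000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return event_count * 1000.0 / instructions


# Hypothetical counter readings for one workload interval.
cycles, instructions, cache_misses = 3_000_000, 1_000_000, 45_000
print("CPI:", cpi(cycles, instructions))
print("misses per 1KI:", per_1k_instructions(cache_misses, instructions))
```

Normalizing by instruction count is what lets workloads with very different run lengths share one axis.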

Issuing and Stall Time

◆ issuing time
  ◆ comparable single and dual issuing time
  ◆ no triple/quad issuing in commercial (no fp)
◆ stall time
  ◆ higher in commercial than SPECint95
  ◆ SPECfp95: comparable
  ◆ frozen stalls (Dstream) higher than dry (Istream)

[Stacked bar chart: RH466 percentage stall/issuing time (0 to 110) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; categories: frozen stall, dry stall, quad issue, triple issue, dual issue, single issue]

Memory Barrier Time

◆ MB time high in commercial
◆ MBs have little effect on SPEC95
  ◆ except parallel: still lower than commercial

[Bar chart: RH466 percentage memory barrier cycles (0 to 14) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks]

Cache Misses

◆ high SC misses in commercial (Bcache bandwidth important)
◆ other caches:
  ◆ IC misses higher in commercial (and int95)
  ◆ DC misses higher in SPECfp95 than commercial
  ◆ BC misses higher in SPECfp95 than commercial

[Bar chart: RH466 cache misses per 1K instructions (0 to 180) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; series: BC miss, SC miss, DC miss, IC miss]

Replay Traps and Mispredicts

◆ Replays:
  ◆ LDU replays high in commercial (and SPECint95)
  ◆ WB_MAF_FULL replays higher in SPECfp95 than commercial
◆ branch/PC mispredicts
  ◆ higher in SPECint95 than commercial

[Bar chart: RH466 traps/mispredicts per 1K instructions (0 to 180) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; series: litmus.trap, PC.mispr, branch.mispr, WB_MAF.replay, LDU.replay]

Branch Mispredicts

◆ branch mispredicts not crucial for commercial performance:
  ◆ number of branches and mispredicts in commercial is comparable to SPECint95

[Bar chart: branches and branch mispredicts per 1K instructions (0 to 300) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; series: branch, branch.mispr]

TB Misses

◆ TB misses not crucial for commercial performance:
  ◆ several technical workloads have higher DTB misses than commercial
  ◆ ITB misses low

[Bar chart: RH466 TB misses per 1K instructions (0 to 5) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; series: ITB.miss, DTB.miss]

Instruction profiles

◆ commercial profiles comparable to SPECint95:
  ◆ no fp instructions
  ◆ ~25% loads
  ◆ ~10% stores
  ◆ ~50% integer
  ◆ ~15% branches

[Stacked bar chart: RH466 instruction types (0 to 110) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; legend as extracted: ldlk.i, jsr.re, branch, float., int.op, loads, stores]

System Requests

◆ commercial: high sharing (ReadDirty and Invalidate)
◆ parallel: high sharing in several workloads
◆ rates:
  ◆ no sharing
  ◆ high bus bandwidth requirements

[Bar chart: RH466 system requests per 1K instructions (0 to 13) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; series: BC miss, SYS.read_dirty+set_shared, SYS.invalidate]

Memory Bus Bandwidth

◆ AlphaServer 8400:
  ◆ 12 CPUs
◆ commercial:
  ◆ lower bus traffic than technical multistream
  ◆ not affected significantly by bank conflicts (technical affected profoundly)

[Bar chart: TLSB bandwidth in MB/s (0 to 1400) for TPC-C, Linpack, BT, LU, SP, EP, tomcatv, swim, hydro2d, tomcatv_p, swim_p, hydro2d_p, Streams]

Bus Requests

◆ commercial:
  ◆ high shared traffic on the bus (Shared Writes)
  ◆ Read/Victim traffic lower than technical
◆ technical
  ◆ parallel: high shared traffic
  ◆ multistream: high bandwidth (no sharing)

[Stacked bar chart: bus requests in M/s (0 to 22) for TPC-C, Linpack, BT, LU, SP, EP, tomcatv, swim, hydro2d, tomcatv_par, swim_par, hydro2d_par, Streams; categories: Write, Write CSR, Victim, Read, Read CSR]

Time-Allocation Model

◆ model derived from measured events
◆ high stall components in commercial:
  ◆ S-to-Bcache
  ◆ B-to-memory
  ◆ MB

[Stacked bar chart: percentage stall components (0 to 100) for SPECweb96, TPC-C, Laddis, and the SPECfp95_para, SPECrate_int95, and SPECrate_fp95 benchmarks; categories: Others (reg conflict + unit busy), MB, B-cache miss to memory, S-cache miss to B-cache, D-cache miss to S-cache, I-cache miss to S-cache, Litmus, WB/MAF replay traps, LDU replay traps, Branch + PC mispredicts]
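
The slide says only that the model is derived from measured events. One common way to build such a breakdown, shown here purely as an assumption, is to charge each event type a fixed stall penalty and express the product as a share of total cycles. Every count and penalty in the usage lines is hypothetical.

```python
def stall_breakdown(event_counts, penalties, total_cycles):
    """Estimate the share of total cycles spent in each stall component.

    event_counts: {component: number of events observed}
    penalties:    {component: assumed stall cycles per event}
    Returns {component: percentage of total_cycles}.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return {
        name: 100.0 * count * penalties[name] / total_cycles
        for name, count in event_counts.items()
    }


# Hypothetical inputs; component names follow the chart legend.
counts = {"S-cache miss to B-cache": 20_000, "B-cache miss to memory": 5_000, "MB": 1_000}
cycles_per_event = {"S-cache miss to B-cache": 10, "B-cache miss to memory": 80, "MB": 50}
for name, share in stall_breakdown(counts, cycles_per_event, 1_000_000).items():
    print(f"{name}: {share:.1f}% of cycles")
```

A linear model like this ignores overlap between outstanding misses, so the shares are better read as upper bounds than as exact attributions.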

Summary/Conclusions

◆ Key factors for commercial performance:
  ◆ high Bcache latency/bandwidth
    ◆ 96KB cache not sufficient
  ◆ low latency data sharing
    ◆ (ReadDirty/Invalidate) on the bus
  ◆ efficient Memory Barriers
    ◆ efficient locks implementation
  ◆ low CPU time per I/O

Acknowledgments

◆ Thanks to John Shakshober, Huy Phan, Dave Wilson, Paula Smith, Judy Piantedosi for help with profiling data collection

Characterizing TPC-D on a MIPS R10K Architecture

Qiang Cao, Pedro Trancoso, Josep Lluis Larriba-Pey, Josep Torrellas

Department of Computer Science
University of Illinois at Urbana-Champaign

Departament d’Arquitectura de Computadors
Universitat Politecnica de Catalunya

2/1/98

Topics Covered

❚ TPC-D Benchmark, R10K processor

❚ Query cache misses

❚ Scaling

❚ Operation Cost

❚ Indexing

TPC-D Benchmark

❚ Decision support benchmark

❚ [?] queries, including two update queries

❚ Complex queries:
  ❙ multi-table joins
  ❙ extensive sorting, grouping and aggregation
  ❙ sequential scans

❚ Running on Postgres[?]

R10K processor

❚ Four-issue superscalar processor

❚ Two performance counters measure up to [?] events: cycles, L1/L2 Instruction/Data cache misses, etc.

❚ Events are measured per process

❚ Save time over simulation

SGI Origin [?]

❚ Scalable shared memory

❚ [?] processors

❚ [?] MB main memory

❚ [?] KB L1 instruction cache and [?] KB L1 data cache

❚ [?] MB unified L2 instruction/data cache

Query Cache Misses

❚ Some queries (Q[illegible], Q[illegible]) have more misses

[Figure: Total Number of Misses per query (Q1 to Q11, Q13, Q16); series: L1 Inst, L2 Inst, L1 Data, L2 Data; y-axis from 0.0E+00 to 1.6E+09]

Query Cache Misses

❚ L1 instruction misses dominate by far

[Figure: Normalized Total Number of Misses per query (Q1 to Q11, Q13, Q16); series: L1 Inst, L2 Inst, L1 Data, L2 Data; y-axis from 0% to 100%]

Query Cache Misses

❚ Cache penalty has a significant effect on total execution time

[Figure: Normalized Execution Time per query (Q1 to Q11, Q13, Q16), split into Cache and Non-Cache time; y-axis from 0% to 100%]
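
The Cache versus Non-Cache split in the Normalized Execution Time chart can be approximated from counter readings by charging each miss a fixed penalty. The sketch below is illustrative only: the function name, the latencies, and the counts are hypothetical and are not measurements from this study. A fixed per-miss penalty also ignores the overlap that an out-of-order processor can achieve, so the result is closer to an upper bound than to an exact figure.

```python
def cache_time_fraction(total_cycles, l1_misses, l2_misses,
                        l2_hit_latency=10, memory_latency=60):
    """Estimate the fraction of cycles spent waiting on cache misses.

    Every L1 miss is charged l2_hit_latency cycles; every L2 miss is
    charged an additional memory_latency cycles. Both latencies are
    assumed values, not R10K specifications.
    """
    stall_cycles = l1_misses * l2_hit_latency + l2_misses * memory_latency
    # Clamp to 1.0 because the fixed-penalty model can overestimate.
    return min(stall_cycles / total_cycles, 1.0)


# Hypothetical counter readings for one query.
frac = cache_time_fraction(total_cycles=2_000_000_000,
                           l1_misses=40_000_000,
                           l2_misses=5_000_000)
print(f"cache: {frac:.0%}, non-cache: {1 - frac:.0%}")
```

With these made-up inputs the model attributes 700 million of 2 billion cycles to the cache, so about a third of the run would fall in the Cache portion of the bar.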

Scaling

❚ TPC-D specifies a scale factor of 1 to 1000 (1GB to 1TB database)

❚ Demanding space and time requirements for each run

❚ Most research studies use scale factor [illegible]

Scaling

❚ Cache misses of some queries increase proportionally with the scale factor. Example: Q1

[Figure: Q1 L1 Misses, 100MB run relative to the 10MB run (10MB = 1.0). Instr: 6.4, Data: 8.7]

[Figure: Q1 L2 Misses, 100MB run relative to the 10MB run (10MB = 1.0). Instr: 7.3, Data: 7.2]

Scaling

❚ Other queries show far more misses than the scale factor would predict. Example: Q11

❚ Queries behave differently as the data size changes. Hard to scale down accurately

[Figure: Q11 L1 Misses, 100MB run relative to the 10MB run (10MB = 1.0). Instr: 66.5, Data: 62.4]

[Figure: Q11 L2 Misses, 100MB run relative to the 10MB run (10MB = 1.0). Instr: 75.0, Data: 22.7]
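
To make the comparison concrete, the growth in misses can be divided by the growth in data size. The sketch below uses the normalized values from the Q1 and Q11 charts (100MB run relative to the 10MB run). Assigning each value to Instr or Data by the order in which the chart labels appear is an assumption, and the helper name is hypothetical. A ratio near 1 means misses grow roughly in step with the data; a ratio well above 1 means they grow faster.

```python
DATA_GROWTH = 100 / 10  # database grows from 10MB to 100MB

# Misses in the 100MB run, normalized to the 10MB run (= 1.0).
# Instr/Data assignment follows the chart label order (assumed).
miss_growth = {
    ("Q1", "L1"): {"Instr": 6.4, "Data": 8.7},
    ("Q1", "L2"): {"Instr": 7.3, "Data": 7.2},
    ("Q11", "L1"): {"Instr": 66.5, "Data": 62.4},
    ("Q11", "L2"): {"Instr": 75.0, "Data": 22.7},
}


def relative_growth(growth, data_growth=DATA_GROWTH):
    """Return miss growth as a multiple of data growth."""
    return growth / data_growth


for (query, level), kinds in miss_growth.items():
    for kind, growth in kinds.items():
        ratio = relative_growth(growth)
        print(f"{query} {level} {kind}: {ratio:.2f}x the data growth")
```

Under these assumptions Q1 stays between 0.64 and 0.87 times the data growth, while Q11 ranges from 2.27 to 7.50 times, which is why a single scale-down rule cannot fit both queries.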

Operation Cost

❚ In some queries, the cost of scan is small

[Figure: Operation Misses for Q1 and its sub-queries, as a percentage of the full query, for L1 Inst, L2 Inst, L1 Data, L2 Data. Q1: 100%, 100%, 100%, 100%; Q1_1: 66%, 94%, 62%, 56%; Q1_2: 19%, 26%, 25%, 22%. Query tree operators: Sort, Aggre, Group, Sort, SeqScan]

Operation Cost

❚ In some queries, the cost of scan dominates

❚ Conclusion: Need to simulate whole query tree

[Figure: Operation Misses for Q6 and its sub-query, as a percentage of the full query, for L1 Inst, L2 Inst, L1 Data, L2 Data. Q6: 100%, 100%, 100%, 100%; Q6_1: 85%, 66%, 71%, 64%. Query tree operators: Aggre, SeqScan]
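
One way to read the Operation Misses charts is to subtract each sub-query's percentage from the next larger one, which gives the share of misses added at each level of the query tree. The sketch below does this for L1 instruction misses. It rests on several assumptions that the slides do not state: that the sub-queries are nested (Q1_2 inside Q1_1 inside Q1, and Q6_1 inside Q6), that the first value in each row of chart labels is the L1 Inst bar, and that misses add up across operators, which cache interference may violate. The function name is hypothetical.

```python
def incremental_shares(cumulative):
    """Convert cumulative miss percentages of nested sub-queries
    (innermost first) into the share added at each nesting level."""
    shares = []
    previous = 0
    for name, pct in cumulative:
        shares.append((name, pct - previous))
        previous = pct
    return shares


# L1 instruction misses as a percentage of the full query (assumed mapping).
q1_l1_inst = [("Q1_2", 19), ("Q1_1", 66), ("Q1", 100)]
q6_l1_inst = [("Q6_1", 85), ("Q6", 100)]

print(incremental_shares(q1_l1_inst))  # [('Q1_2', 19), ('Q1_1', 47), ('Q1', 34)]
print(incremental_shares(q6_l1_inst))  # [('Q6_1', 85), ('Q6', 15)]
```

Read this way, the innermost sub-query accounts for 19% of Q1's L1 instruction misses but 85% of Q6's, which matches the two slides' point that the scan is cheap in some queries and dominant in others.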

Indexing

❚ How does the index structure affect the cache misses?

❚ A complicated indexing structure causes the index scan to suffer more cache misses than the sequential scan

[Figure: L1 Cache Misses (L1 Inst, L1 Data) for SeqScan and IndxScan; y-axis from 0.0E+00 to 1.0E+06]

[Figure: L2 Cache Misses (L2 Inst, L2 Data) for SeqScan and IndxScan; y-axis from 0.0E+00 to 2.5E+04]

Indexing

❚ A modified Q[illegible] with higher selectivity shows fewer data cache misses for index scan

❚ The optimizer needs to use the selectivity factor to choose the optimal access method with respect to cache misses

[Figure: L1 Cache Misses (L1 Inst, L1 Data) for SeqScan and IndxScan; y-axis from 0.0E+00 to 2.0E+05]

[Figure: L2 Cache Misses (L2 Inst, L2 Data) for SeqScan and IndxScan; y-axis from 0.0E+00 to 5.0E+03]
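
A minimal sketch of how an optimizer might weigh the two access methods by expected cache misses is given below. It is a toy model, not the Postgres95 optimizer: the function names and both per-row miss costs are hypothetical. It assumes a sequential scan touches every row at a low cost per row, while an index scan touches only matching rows at a higher cost per row because of the index traversal. The parameter `matching_fraction` is the fraction of rows that satisfy the predicate, so a more selective query has a smaller value.

```python
def estimated_misses(rows, matching_fraction,
                     seq_miss_per_row=0.5, index_miss_per_match=8.0):
    """Return (sequential scan misses, index scan misses).

    Both per-row costs are made-up constants for illustration.
    """
    seq = rows * seq_miss_per_row
    index = rows * matching_fraction * index_miss_per_match
    return seq, index


def choose_scan(rows, matching_fraction, **costs):
    """Pick the access method with fewer estimated cache misses."""
    seq, index = estimated_misses(rows, matching_fraction, **costs)
    return "IndxScan" if index < seq else "SeqScan"


print(choose_scan(100_000, 0.01))  # few rows match
print(choose_scan(100_000, 0.50))  # half the rows match
```

With these constants the break-even point is a matching fraction of 0.5 / 8.0 = 0.0625: below it the index scan is estimated to miss less, above it the sequential scan is.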

Conclusions

❚ Instruction misses (L1 especially) dominate

❙ Instruction misses should not be neglected in simulation

❚ Different queries have different scaling behavior

❙ Hard to scale down the data accurately

❚ Operations other than scan can cause many misses

❙ Simulation of the whole tree is necessary

❚ Index Scan can increase cache misses

❙ Selectivity factor should be used to choose optimal scan method

Future Work

❚ More experiments on larger data size

❚ The effect of initial data allocation

❚ Interaction of multiple queries
