Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability

OSPERT 2010, Brussels, July 6, 2010

Andrea Bastoni (University of Rome “Tor Vergata”), Björn B. Brandenburg and James H. Anderson (The University of North Carolina at Chapel Hill)

Work supported by AT&T, IBM, and Sun Corps.; NSF grants CNS 0834270, CNS 0834132, and CNS 0615197; ARO grant W911NF-09-1-0535; and AFOSR grant FA 9550-09-1-0549.


The Problem

[Figure: schedule of T1 and T2 on processor P1 and of T2 and T3 on processor P2 over t = 0 to 20, with releases, deadlines, and overheads marked. T2 migrates to P1.]

Cache-Related Preemption and Migration Delays Bastoni, Brandenburg, and Anderson 2

Multiprocessor Global EDF schedule with overheads.

The Problem

Kernel overheads (e.g., release overhead, scheduling overhead, etc.) are “easy” to measure.

Can directly measure delay (code instrumentation).

[Figure: timeline of T2 on processors P1 and P2, t = 0 to 20.]

The Problem

Overheads due to preemptions and migrations are not!

Delay does not occur consecutively, but in many “little pieces”.

[Figure: timeline of T2 on processors P1 and P2, t = 0 to 20.]

Outline

Cache-related preemption and migration delays (CPMD).

Two methods to measure CPMD.

Experimental results and discussion.

Impact on schedulability (sketch).

Cache-Related Preemption and Migration Delays (CPMD)

Cache-related preemption and migration delays:
• Delays due to additional cache misses when resuming execution after a preemption or a migration.

[Figure: T2 resuming execution. Data in local cache: (re)fetch. Data in remote cache: invalidate.]

Heavily dependent on working set size (WSS).

No effective WCET analysis techniques available for current multiprocessors with cache hierarchies.

Need to rely on empirical measurements.

Detecting CPMD

In this study: empirical approximation of CPMD.

[Figure: timeline of a job J with three complete accesses to J's working set (WS): cache-cold, cache-warm, and after preemption. The figure labels the CPMD.]
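The three working-set accesses suggest a simple approximation, sketched here in Python under stated assumptions. The delay is taken as the post-preemption access time minus the cache-warm access time; the slide does not spell out this formula, and the function name `approximate_cpmd` and the sample values are hypothetical.

```python
def approximate_cpmd(warm_us: float, post_us: float) -> float:
    """Approximate CPMD as the extra time of the post-preemption access.

    Assumption (not stated on the slide): the cache-warm access is the
    baseline, so the delay is post_us - warm_us, clamped at zero to
    absorb measurement noise.
    """
    return max(0.0, post_us - warm_us)


# Hypothetical access times in microseconds for one job J.
warm_us, post_us = 40.0, 95.0
delay_us = approximate_cpmd(warm_us, post_us)
print(delay_us)  # 55.0
```

The cache-cold access is not used in the subtraction; it can serve as a sanity check, since a post-preemption access should rarely take longer than a fully cold one.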

Measuring CPMD

Directly measure delay with low-overhead clock device:
• Measure access times.
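A minimal sketch of the direct approach: read a clock before and after walking the working set. Everything here is an assumption for illustration: `time.perf_counter_ns` stands in for a low-overhead clock device, the 64-byte stride assumes a 64-byte cache line, and the helper name `timed_ws_access` is hypothetical. In Python the interpreter overhead dominates cache effects, so this only shows the shape of the measurement.

```python
import time


def timed_ws_access(ws: bytearray, stride: int = 64) -> int:
    """Read one byte per `stride` bytes of the working set and return
    the elapsed time in nanoseconds."""
    start = time.perf_counter_ns()
    checksum = 0
    for i in range(0, len(ws), stride):
        checksum += ws[i]          # one read per assumed cache line
    end = time.perf_counter_ns()
    return end - start


ws = bytearray(64 * 1024)          # hypothetical 64 KB working set
first_ns = timed_ws_access(ws)     # first pass over the working set
second_ns = timed_ws_access(ws)    # repeated pass over the same data
```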

Measuring CPMD

Indirectly measure delay with hardware performance counters:
• Count cache misses.


Schedule-Sensitive Method

On-line recording of delays:
• Execute instrumented synthetic tasks under desired scheduling policy.
• Wide range of WSS, TSS, and read/write ratios.

Can reveal dependencies on:
• Scheduling policy.
• Task set size (TSS).

Cannot explicitly control preemptions/migrations:
• P/M depends on the scheduling policy.
• Not every job yields a valid measure.
• A job may be preempted too early, too late, or not at all.
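Because not every job yields a valid measure, recorded samples need filtering. The sketch below shows one way this could look. The exact validity criteria are assumptions, not taken from the slides: "too early" is read as a preemption before the cache-warm reference access finished, and "too late" as a job that ended before completing an access after the preemption. The function name and record layout are hypothetical.

```python
from typing import Optional


def classify_job(warm_end: Optional[float],
                 preempt_at: Optional[float],
                 post_end: Optional[float]) -> str:
    """Label one job's sample.

    warm_end:   time the cache-warm reference access finished (None if never)
    preempt_at: time the job was preempted or migrated (None if never)
    post_end:   time the post-preemption access finished (None if never)
    """
    if preempt_at is None:
        return "not preempted"
    if warm_end is None or preempt_at < warm_end:
        return "too early"      # no warm baseline before the preemption
    if post_end is None:
        return "too late"       # job ended before a post-preemption access
    return "valid"


# Hypothetical records: (warm_end, preempt_at, post_end) in ms.
jobs = [(1.0, None, None), (2.0, 1.5, 4.0), (1.0, 3.0, None), (1.0, 2.0, 3.5)]
valid = [j for j in jobs if classify_job(*j) == "valid"]
```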

Synthetic Method

Fine-grained control on measurement process:
• Artificially trigger preemptions and migrations.
• Explicit control on preemptions and different types of migrations (through L2 cache, L3 cache, and memory).

Fixed-prio scheduling policy (e.g., SCHED_FIFO):
• A single high-prio task accesses a wide range of WSS.
• Wide range of read/write ratios.

Every job yields a valid measure.

Cannot detect dependencies on:
• Scheduling policy, TSS.

Implementation

Operating System: LITMUS^RT
• UNC’s real-time Linux extension.
• Developed as kernel patch (currently based on Linux 2.6.32).
• Code is available at http://www.cs.unc.edu/~anderson/litmus-rt/.

Implementation Issues

Low-overhead clock device:
• Time-stamp counter (TSC). Per-core clock device.

Clock skew among cores:
• WS access times only based on samples from the same processor.

Interrupt interference:
• Interrupts disabled during WS access.

How to detect when a preemption/migration occurred:
• Low-overhead kernel-user communication mechanism.
• Per-task memory page shared with the kernel.
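The clock-skew rule can be illustrated with a small sketch: an access time is computed only when both time stamps were read on the same core. The `Stamp` type and `access_cycles` helper are hypothetical names introduced for illustration; the slides do not show the actual data layout.

```python
from typing import NamedTuple, Optional


class Stamp(NamedTuple):
    cpu: int      # core on which the time stamp was read
    cycles: int   # time-stamp counter value on that core


def access_cycles(start: Stamp, end: Stamp) -> Optional[int]:
    """Return the WS access length in cycles, or None if the two stamps
    come from different cores (their counters may be skewed)."""
    if start.cpu != end.cpu:
        return None
    return end.cycles - start.cycles


same_core = access_cycles(Stamp(0, 1000), Stamp(0, 1800))   # 800
cross_core = access_cycles(Stamp(0, 1000), Stamp(1, 1800))  # None
```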


Test Platform

Intel Xeon L7455 “Dunnington”:
• 4 physical sockets.
• 6 cores per socket.
• 12 MB L3 cache.
• 3 MB L2 cache.
• 32 KB + 32 KB L1 cache.

Study Setup

Schedule-Sensitive method:
• G-EDF algorithm (but can be applied to other algorithms).
• TSS between 25 and 250, tasks randomly generated. Uniform distribution. Periodic tasks with utilizations in [0.001, 0.1] and periods in [10, 100] ms.
• WSS in the range from 4 KB to 2048 KB.
• Per-WSS write ratio 1/2 and 1/4.

Synthetic method:
• Single SCHED_FIFO task at the highest priority.
• WSS in the range from 4 KB to 12 MB.
• Per-WSS write ratio in the range from 0 to 1.
• Preemption length uniformly distributed in [0, 50] ms.

Tested two configurations for each method:
• Idle system.
• Under load.
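The task-set parameters of the schedule-sensitive setup can be sketched as a small generator. This is an illustration only: the slides give the ranges and the uniform distribution, while the continuous (non-integral) periods, the seeding, and the name `generate_task_set` are assumptions.

```python
import random
from typing import List, Tuple


def generate_task_set(tss: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Return `tss` periodic tasks as (period_ms, wcet_ms) pairs.

    Utilizations are uniform in [0.001, 0.1] and periods uniform in
    [10, 100] ms; the execution cost follows as utilization * period.
    """
    rng = random.Random(seed)       # fixed seed for repeatable sets
    tasks = []
    for _ in range(tss):
        utilization = rng.uniform(0.001, 0.1)
        period_ms = rng.uniform(10.0, 100.0)
        tasks.append((period_ms, utilization * period_ms))
    return tasks


tasks = generate_task_set(25)       # smallest TSS used in the study
total_utilization = sum(wcet / period for period, wcet in tasks)
```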

Results

Logarithmic scale (log-log plots).

Graphs display delays.

Results from synthetic method only.

Results from schedule-sensitive method revealed that CPMD does not depend on TSS (on our platform).

[Figure: axes are increasing WSS (KB) and increasing delay (us); “higher is worse”.]

Results (System under load)

[Plot: standard deviation.]

Size of L2 cache impacts predictability.

Linear trend for WSS < L2 cache.

No substantial difference between preemption and migration costs.

Worst-case scenario: must reload cache under intense bus contention. Applies unless no background activity and only small WSS for all tasks.

Results (Idle system)

Best-case scenario: virtually no contention for the memory sub-system.

[Figure: curves labeled L3 and memory migrations, L2 migrations, and preemptions.]


Impact on Schedulability

Standard schedulability plot:

[Figure: schedulable task sets versus increasing utilization, soft real-time; curves for G-EDF, P-EDF, C-EDF (L2), and C-EDF (L3).]

Fixed CPMD value (fixed WSS value).

Impact on Schedulability

Weighted schedulability plot:

[Figure: aggregate schedulability versus increasing CPMD (us).]

Compacts more than 300 plots into about 10 plots.
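One way to collapse a whole schedulability curve into a single point per CPMD value is a utilization-weighted average. The slide gives only the axis name "aggregate schedulability", so the formula below is an assumption, and the function name and sample data are hypothetical.

```python
from typing import Dict


def weighted_schedulability(sched_by_util: Dict[float, float]) -> float:
    """Collapse one schedulability curve into a single number.

    sched_by_util maps a utilization cap U to the fraction S(U) of task
    sets that were schedulable at that cap, for one fixed CPMD value.
    Assumed aggregate: sum(U * S(U)) / sum(U).
    """
    total_weight = sum(sched_by_util)   # sums the utilization caps (keys)
    if total_weight == 0:
        raise ValueError("need at least one positive utilization cap")
    return sum(u * s for u, s in sched_by_util.items()) / total_weight


# Hypothetical curve for one CPMD value: all sets schedulable at U = 2,
# half of them at U = 6.
point = weighted_schedulability({2.0: 1.0, 6.0: 0.5})
print(point)  # 0.625
```

Repeating this for each CPMD value gives one curve per scheduler on a single plot, with higher utilizations counting more toward the aggregate.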

Evaluation of Scheduling Algorithms

C-EDF (L3) > P-EDF for CPMD < 600 us.

[Figure: aggregate schedulability versus increasing CPMD (us), soft real-time.]

Conclusions

Empirical approximation of CPMD:
• Schedule-sensitive method.
• Synthetic method.

CPMD strongly impacts evaluation of schedulers:
• Preemptions are not necessarily (much) cheaper than migrations (for worst-case overheads).
• If there is memory bus contention, then this is also true for average-case overheads.

Future Work

Validate TSC-based results with performance counters.

Apply the methodologies on NUMA and embedded platforms.

Investigate impact of bus locking on CPMD.
• For example: DMA transfers, atomic instructions, etc.

http://www.cs.unc.edu/~anderson/litmus-rt/

Thank You!