Computer Science 12 Design Automation for Embedded Systems Reconciling Compilers & Timing Analysis...

Computer Science 12Design Automation for Embedded Systems

Reconciling Compilers & Timing AnalysisReconciling Compilers & Timing Analysis

for Safety-Critical Real-Time Systemsfor Safety-Critical Real-Time Systems

– –

WCET-aware program optimizationsWCET-aware program optimizations

Heiko Falk

Embedded Systems/Real-Time Systems

Ulm University, Germany

Jan C. Kleinsorge

Computer Science 12

TU Dortmund, Germany

/ 49© J.C.Kleinsorge | 2012-03-31 CGO 2012

Computer Science 12 | DAES

OutlineOutline

WCET-aware Optimizations and Code Quality

Graph Coloring Register Allocation

Scratchpad Memory Allocation

Cache-aware Memory Content Selection

Cache Partitioning for Multi-task Real-time Systems

Combination of Scratchpad Allocation, Memory Content Selection and Cache Partitioning



WCET-aware Optimizations and Code QualityWCET-aware Optimizations and Code Quality

WCET as objective function Actual speed-up but also by enhancing analyzability Side-effects of changes on timing hard to anticipate Issuing just a single instruction can lead to uncertainty regarding:

Location, alignment, access pattern (cache), schedule (pipeline), branch prediction, etc.

Code quality for WCET-aware optimizations Avoid dynamic dispatch, excessive inflation and layout changes

without being clear about its effects In short: maintain predictability first



WCET-aware Optimizations and Systems (1)WCET-aware Optimizations and Systems (1)

Semantics:

Computation

Layout

Pipeline

System

Ab

stra

ction

Open parameters:

+ Ideally: just program input

+ Location: accesses to busses, memories

+ Order, registers

+ DependenciesExpressionsExpressions

Insn (virt.)Insn (virt.)

Insn (phys.)Insn (phys.)

““BLOB”BLOB”



WCET-aware Optimizations and Systems (2)WCET-aware Optimizations and Systems (2)Practical heuristics to pick the right level of abstraction: Still on WCEP? Will decision change WCEP? Are side-effects possible and (in how far) are they bounded? What's the overall impact on the system? How often do we need to re-evaluate intermediate solutions?

Uncertainty that cannot be tackled at any level: Speculative execution, cache hierarchies (and replacement

policies), timing anomalies in general, general I/O, etc. However: Trend towards (many) simpler cores in fact improves

situation as far as per-task predictability is concerned



OutlineOutline









Workflow of Graph Coloring RAWorkflow of Graph Coloring RA

1. Initialization: Build Interference Graph G = (V, E) withV = {virtual registers} {K physical processor registers},e = (v, w) E v and w may never share the same PHREG, (i. e. v and w interfere)

2. Simplification: Remove all nodes v V with degree < K

3. Spilling: After step 2, each node of G has degree K. Select one v V; mark v as potential spill; remove v from G

4. Repeat steps 2 and 3 until G =

5. Coloring: Successively re-insert nodes v into G in reverse order; if there is a free color kv, color v; else, mark v as actual spill

[A. W. Appel, Modern compiler implementation in C, 1998]



Problem of Standard Graph ColoringProblem of Standard Graph Coloring

3. Spilling: After step 2, each node of G has degree K. Select one v ∊ V; mark v as potential spill; remove v from G

Which node v should be selected as potential spill?

Common graph coloring implementations select … … the first node v according to the order in which VREGs

were generated during code selection, ... the node with highest degree in the interference graph, ... a node with high degree, with many DEFs/USEs, in

some inner loop – maybe depending on profiling data.

Uncontrolled spill code generation – potentially alongWorst-Case Execution Path (WCEP) defining the WCET!



WCET-aware Register Allocation WCET-aware Register Allocation

Derived from classic Chaitin graph coloring Registers allocation as a problem from the „tip“ of the memory

hierarchy Besides runtime overhead, spill-code affects:

Instruction count, schedule, memory layout, cache access and pattern, etc.

WCET-aware optimization must take into account: Where to store data (actual allocation decision)? But also: Where to store (spill) instruction (relative to WCEP)?

The catch: …relies on WCET data provided by WCET analysis using aiT ...can’t obtain WCET data since code contains virtual registers



WCET-aware RA: of Chickens and Eggs WCET-aware RA: of Chickens and Eggs

Pessimistic register allocation: Start by marking all VREGs as actual

spill (each VREG is spilled. Now code is fully analyzable)

Perform WCET analysis, get WCEP Allocate VREGs of basic block b with

most worst-case spill code executions to PHREGs using standard GC on original program

Re-evaluate novel WCEP Stop and allocate rest if no more

VREGS on WCEP



Results – Worst-Case Execution TimesResults – Worst-Case Execution Times

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

110%

Re

lati

ve

WC

ET

ES

T [

%]

(Op

tim

iza

tio

n L

ev

el -

O3

)

100% = WCETEST using Standard Graph Coloring (highest degree)

93%

24%

69%



Results – Average-Case Execution TimesResults – Average-Case Execution Times

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

110%

120%

Re

lati

ve

AC

ET

[%

](O

pti

miz

ati

on

Le

ve

l -O

3)

100% = ACET using Standard Graph Coloring (highest degree)

-6% – -12%



Summary & CaveatsSummary & CaveatsSummary Standard graph coloring unaware of worst-case properties May thus lead to uncontrolled spill code generation along WCEP WCET-aware register allocation: combination of standard graph

coloring with WCET-aware spill heuristic Average WCET reductions over 46 benchmarks: 31.2%

Caveats “Bad” spills not revocable, might unbalance pipeline load Experiments with highly accurate ILP-based WCET-aware register

allocation

[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]



OutlineOutline









Caches vs. Scratchpad Memories Caches vs. Scratchpad Memories (SPM)(SPM)

Caches: Processor

L1-Cache

Main Memory

Scratchpads: Processor

SPM

Main Memory

Hardware-controlled Cache contents difficult to

predict statically Latencies of memory

accesses highly variable WCETEST often imprecise

Caches often deactivated in hard real-time systems

No autonomous hardware SPM seamlessly integrated in

processor’s address space Latencies of memory

accesses constant WCETEST extremely precise

SPM contents to be defined by compiler



Scratchpad Allocation: Variants and CaveatsScratchpad Allocation: Variants and Caveats

Characteristics of code and data allocation: Data relocation and mutation „naturally“ supported by architecture Code relocation usually requires modification of instructions

Locality annihilated (Potentially already optimized) Runtime properties destroyed

Static and dynamic scratchpad optimization: Static: Precompute global and static relocation, maintain order

(therefore locations implicit) Dynamic: Precompute dynamic exchange of SPM contents

Perspective of static analysis: self-modifying code Static dispatch (overlaying targets) Memory allocation is hard



ILP for WCET-aware SPM Allocation of CodeILP for WCET-aware SPM Allocation of Code

Goal Determine set of basic blocks to be allocated to the SPM ...such that selected basic blocks lead to overall minimization

of WCETEST

...under consideration of switching WCEPs.

Approach Integer-linear programming (ILP) Optimality of results: no need for backtracking techniques

In the following: uppercase = constants, lowercase = variables



Costs of basic block bi:

models the WCETEST of bi if it is allocated to main memory or SPM, respectively

Decision Variables & CostsDecision Variables & Costs

Binary decision variables per basic block (BB):



Intraprocedural Control FlowIntraprocedural Control Flow

Modeling of a function’s control flow:

A

CB

D

E

Acyclic sub-graphs: (Reducible) Loops:

B

A

C

D

E

Treat body of inner-most loop L like acyclic sub-graph

Fold loop L Costs of L:

Continue with next innermost loop

[V. Suhendra et al., WCET Centric Data Allocation to Scratchpad Memory, RTSS 2005]

= WCET of any path starting at A

Loop LB, C, D



Jump scenarios:

Cross-Memory JumpsCross-Memory Jumps

Allocation of consecutive BBs: Allocation of consecutive BBs in the CFG to different memories

requires adaption/insertion of dedicated jumping code Cross-memory jumps are costly Jumping code: variable overhead in terms of WCETEST and

code size, depending on decision variables

bi

bk

bj

bi

bk

bj

bi

bj

a) Implicit b) Unconditional c) Conditional



Penalties for jump scenarios ( Boolean XOR): Penalty for Implicit jumps:

Add high penalty if BBs i and j are placed in different memories

Penalty for Unconditional jumps: If bi and bj in different memories:

If bi and bj adjacent in same memory: 0

If bi and bj not adjacent in same memory:

Conditional jumps: Obvious combination of and

Penalties for Cross-Memory JumpsPenalties for Cross-Memory Jumps

bi

bk

bj

bj

bk bj



Jump Penalties & Interprocedural Control FlowJump Penalties & Interprocedural Control Flow

Jump penalties for basic block bi:

Modeling of the global control flow: Variable models cost of WCEP starting at bF

entry

If F’ calls F, must be added to WCETEST of F’



Call PenaltiesCall Penalties

Call penalties for basic block bi:

If bi calls F, add WCETEST of F to call penalty. Furthermore, add if bi contains cross-memory call, otherwise.

Final control flow constraints per basic block bi:

Add jump and call penalties to variable modeling WCETEST of any path starting at bi



Objective FunctionObjective Function

WCETEST of entire program:

Variable models WCETEST of entire program

Size of BB bi depends on actual jumping code for bi:

Size of jumping code for bi:# bytes for jumping code, depending on jump/call scenario

Total size of basic block bi:Size of bi without any jumping code plusSize of bi’s jumping code

Scratchpad CapacityScratchpad Capacity



Average WCETAverage WCETESTEST for 73 Benchmarks for 73 Benchmarks

0%10%20%30%40%50%60%70%80%90%

100%110%

10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Relative SPM Size [%]

Av

g. R

ela

tiv

e W

CE

TE

ST [

%]

Steady WCETEST decreases for increasing SPM sizes

WCETEST reductions from 7% – 40%

7%



Summary & CaveatsSummary & Caveats

Summary Current state of the art:

Neglects varying jumping code in basic blocks Select one element of power set of basic blocks

Our approach: Models changing WCEPs Uses jump scenarios to cope with varying jumping code

Caveats Implicit control-flow model requires well-structured code No component-wise compilation

[H. Falk, J. Kleinsorge Optimal Static WCET-aware Scratchpad Allocation of Program Code,DAC 2009]



OutlineOutline









Cache-aware Memory Content SelectionCache-aware Memory Content Selection

Compilers good at dealing with registers (register/stack) WCC good at SPM-allocation (spm/main memory) Aspects of cache-aware optimizations:

Generally unresolved problem due to system-wide influence of local decisions and generally unknown cache parameters

Only generalized attempts on data - like loop transformations - to improve average access pattern on data

For predictability and idle optimization potential in code: Divide program in cached or uncached parts Software-controlled memory content selection to adapt to

actual access pattern



012345

C a c h eS et M a in M e m o ry

fo o 2

fo o 3

fo o 1S e t

Example for unprofitable memory layout: Mutual eviction of functions Could lead to a highly increased WCETEST due to thrashing

fo o 1void foo1() { for(i=0; i<100; i++) { foo2(); foo3(); ... }}


fo o2fo o2

fo o3

fo o 1



Basic idea: Step-wise allocation of functions to cached memory areas Select functions whose WCETEST benefits most from cached

execution

Unprofitable functions w. r. t. a program’s WCETEST must not evict profitable ones from cache

Hill-climbing approach with a “profit”-metric:




Memory Content Selection Algorithm (1)Memory Content Selection Algorithm (1)LLIR mcs( LLIR P, Cache cache ):

// Precompute profit profit = computeFunctionProfit( P )

// Fill cache exactly once for_each( sort( F in P, profit ) ): allocate( F, cache ) if cache.full:

break // Perform WCET-aware cache-allocation ...



Memory Content Selection Algorithm (2)Memory Content Selection Algorithm (2) // “Overcharge” cache memory unless WCETEST degrades wcet = computeWCET( P ) profit = computeFunctionProfit( P )

// As before: most profitable function first for_each( sort( F in P, profit ) ): allocate( F, cache ) tmp = computeWCET( P )

// Only keep improvements if ( wcet < tmp ): deallocate( F, cache ) else wcet = tmp profit = computeFunctionProfit( P )



adpcm

g721_encode g723_encodegsm

h264dec

md5

rijndael_decoder rijndael_encoder statem

ate v32.m

odem_benc.

Average

50%

60%

70%

80%

90%

100%

5% Cache 10% Cache 20% Cache

Re

lati

ve

WC

ET

Results compared to unoptimized cache usageE

ST 20%



Conclusion Iterative approach ensures optimizing along a possibly switching

WCEP Profitable functions not evicted from cache by unprofitable ones

w.r.t. their WCETEST

Achieves WCETEST reductions of up to 20%

Caveats Greedy approach (upside: direct, simple) Functions as allocation units might be too coarse


[S. Plazar, P. Lokuciejewski and P. Marwedel, WCET-driven Cache-aware Memory Content Selection, ISORC 2010]



OutlineOutline









Software-based Cache PartitioningSoftware-based Cache Partitioning

General thoughts on presented optimization strategies: Until now, greedy relocation successful strategy to get around

intra-task cache conflicts due to tight coupling with static WCET analysis

Fails in multi-task environments: Analysis unaware of potential preemptions

Safety can only be achieved by guaranteeing no collisions Granularity: instructions (possibly splitting basic blocks)

Intuition: Divide the cache into partitions of optimal size Assign one task per partition to prevent mutual eviction



Software-based Cache PartitioningSoftware-based Cache Partitioning Exploit the cache addressing logic (index bits) Distribute memory blocks of tasks over address space Ensure mapping to particular cache lines

Effectively inverts the logical mapping direction

0x0

0x80

0x100

0 x 0

0 x 1 8 0

0 x 8 0

0 x 2 0 0

0 x 1 0 0

0 x 2 8 0Ta s k 1 (p a r t3 )

Ta s k 2 (p a r t1 )



M a in M e m o r y

Ta s k 1

Ta s k 2

C a c h e




WCET-aware Cache PartitioningWCET-aware Cache Partitioning

Greedy approach Partition size depends on task’s

code size Example: 4 tasks with the same

code size

Better ILP-model to select individual

partition size per task Take number of activations into

account

Cache Line

0

63

Task 1

Task 2

Task 4

Task 3

Cache Line

0

63

Task 1

Task 2

Task 4

Task 3

[F. Müller, Compiler Support for Software-Based Cache Partitioning, 1995]



ILP ILP FormulationFormulation



ILP FormulationILP Formulation

Each task must have a partition assigned:

Keep track of the cache size:



ILP FormulationILP Formulation

Partition-specific WCET per task:

Objective function to minimize:



Results: MRTC benchmarksResults: MRTC benchmarks

256 512 1024 2048 4096 8192 1638440%

50%

60%

70%

80%

90%

100%

5 Tasks 10 Tasks 15 Tasks

C ache size [B ytes]

Re

lativ

e W

CE

T

Average of 100 sets of randomly selected tasks:

5 tasks: ~6kB 10 tasks: ~12kB 15 tasks: ~19kB

WC

ET

re

lati

ve t

o g

reed

y ap

pro

ach



Conclusion Optimal partition sizes w.r.t. the overall system WCET Partitioning introduces predictability for preemptive schedules Average WCET reduction of 12% (5 tasks) up to 19% (15 tasks)

compared to greedy approach

Caveats “Zero-collision” policy can be too conservative depending on the actual

cache logic and scheduling policy Pre-computation of partitions time consuming Locality in address space (basic block splits, instruction corrections)


[S. Plazar, P. Lokuciejewski and P. Marwedel, WCET-aware Software Based Cache Partitioning for Multi-Task Real-Time Systems, WCET 2009]



OutlineOutline









An Experiment: Combined ApproachAn Experiment: Combined Approach

Can SPM allocation, memory content selection and cache partitioning be combined?

Intention is to fully exploit memory hierarchy All three severely alter the memory layout due to relocation and

partitioning Order of application critical for good results

Example:MCS prior to SPM

Cached Uncached SPM

Taski

Taski,j

Taski,k

Taski,l

Taski,j,0

Taski,j,1

Taski,j,2

Taski,k,0

Taski,k,1



Combined Approach (1)Combined Approach (1)

Reasoning about the order of application SPM allocation (SPMA) should be performed prior to Memory

Content Selection (MCS) and Cache Partitioning (CP) CP prior to MCS:

Similar to previous example: cache potentially under-utilized MCS prior to CP:

CP only considers objects designated to be cached by MCS Likely that the greedy MCS decision was inappropriate given the

potential exploited by a fine-grained partitioning Computing MCS solution per partition in precomputation of CP Apply CP ILP to determine optimal combination



Application in order: Application in order: Effects of SPM, CP invoking MCS in preprocessing

Remainsuncached (MCS)

Not affected byCP/MCS (SPMA)



EvaluationEvaluation

0%

20%

40%

60%

80%

100%

0% 1% 3% 5% 10% 15% 20% 25%

25%

15%

5%

1%

Gai

n c

om

par

ed t

o u

no

pti

miz

ed

co

de

SPM size (%)

Cache size (%)

92%

Gains in WCETEST:

crc, fft1, gsm_decode, trellis

73%



Remarks WCET-aware compilation:

Compilers usually are unaware of timing Optimistic optimization strategies: no clearly defined objective “Maybe faster but could be worse” doesn’t quite cut it for hard

real-time applications (profile-guided optimization no match) Fine-grained optimization decisions span from well-directed

exploitation over conflict reduction to full conflict freedom

Challenges Multi-tasking: component-wise compilation, interaction, OS Multi-core: inter-core communication Tailor (fully) predictable but still highly configurable systems

Conclusion Conclusion



Backup: RABackup: RA



WCET-Aware Graph Coloring (1)WCET-Aware Graph Coloring (1)LLIR WCET_GC_RA( LLIR P ):

// Iterate until current WCEP is fully allocated. while ( true ): // Clone P, spill all VREGs of P’ onto stack. LLIR P’ = P.copy() P’.spillAllVREGs()

// Compute Worst-Case Execution Path for fully spilled LLIR. set<basic_blocks> WCEP = computeWCEP( P’ )

// If there are no more VREGs, the allocation loop is over. if ( getVREGs( WCEP ) == ) break



WCET-Aware Graph Coloring (2)WCET-Aware Graph Coloring (2) // Determine that block on the WCEP with highest product of // Worst-Case Execution Count * spilling instructions. basic_block b’ = getMaxSpillCodeBlock( WCEP ) basic_block b = getBlockOfOriginalP( b’ )

// Collect all VREGs of this most critical block. list<virtualRegister> vregs = getVREGs( b )

// Sort VREGs by #occurrences, apply standard graph coloring. vregs.sort( occurrences of VREG in b ) traditionalGraphColoring( P, vregs ) end while

// Allocate all remaining VREGs not lying on the WCEP. traditionalGraphColoring( P, getVREGs( P ) ) return P;}



WCET-aware RA: spilling WCET-aware RA: spilling



Backup: SPM allocationBackup: SPM allocation



Timing Predictability of Caches & SPMs Timing Predictability of Caches & SPMs (G.721)(G.721)

SPMs are – in contrast to caches – highly predictable: WCETEST scale with ACETs

[L. Wehmeyer, P. Marwedel, Influence of Memory Hierarchies on Predictability for TimeConstrained Embedded Software, DATE 2005]



Support of the ILP by WCC InfrastructureSupport of the ILP by WCC Infrastructure

WCETEST of BB bi for SPM and main memory: ,

Max. Iteration counts of loop L: Size of BB bi:

SPM Size = 47 kBSPM Access = 1 CycleFlash Access = 6 Cycles

Other parameters hard-coded: , , …



WCETWCETESTEST for for g721 encoderg721 encoder

Steady WCETEST decreases for increasing SPM sizes

WCETEST reductions from 29% – 48%

0%10%20%30%40%50%60%70%80%90%

100%110%

10% 20% 30% 40% 50% 60% 70% 80% 90% 100%


Avg

. Rel

ativ

e W

CE

TE

ST [%

] g721_encode

X-Axis: SPM size = x% of benchmark’s code size Y-Axis: 100% = WCETEST when not using SPM at all



0%10%20%30%40%50%60%70%80%90%

100%110%

10% 20% 30% 40% 50% 60% 70% 80% 90% 100%


Av

g. R

ela

tiv

e W

CE

TE

ST [

%] cover

WCETWCETESTEST for for covercover


Stepwise WCETEST decreases: Useful content allocated to SPM only at 40%, 70% and 100% relative SPM size

WCETEST reductions of 10%, 35% and 44%



0%10%20%30%40%50%60%70%80%90%

100%110%

10% 20% 30% 40% 50% 60% 70% 80% 90% 100%


Av

g. R

ela

tiv

e W

CE

TE

ST [

%] md5

WCETWCETESTEST for for md5md5


Almost invariable WCETEST reductions for all SPM sizes: 40% – 44%

ILP clearly finds tiny but time-critical hot-spot of md5 and allocates it to SPM



Backup: Content selectionBackup: Content selection



Example for unprofitable memory layout: Mutual eviction of functions Could lead to a highly increased WCETEST due to increased

number of possible cache missesM a in M e m o ryC a c h e

fo o 2

S e t

fo o 3

fo o 1012345

S e t


WCETEST reduction:(350-195+690-470 = )

375 cycles



Evaluation

Infineon TriCore TC1796,16 kB 2-way set associative cache (LRU), 2 MB program flash

Employed the 10 largest benchmarks of our benchmark suites DSP Stone, MediaBench, MiBench, MRTC, Netbench and UTDSP

Code size ranges from 5 kB (v32.modem_bencode) up to 15 kB (the two rijndael benchmarks)

Using optimization level –O3 (incl. procedure positioning)

Artificially limit cache sizes to 5, 10 and 20% of overall code size



Optimization Time

Most of the optimization time consumed by repetitive WCET analyses employing aiT

Maximal number of analyses amounts to: Test machine: Intel Xeon X3220 (2.4 GHz)

rinjndael_decoder: 6 WCET analyses consumed almost 2 hours of CPU time

g721/g723_encode: 17 WCET analyses amount to 8 respectively 10 minutes analysis time



Backup: Cache partitioningBackup: Cache partitioning



Distribution of CodeDistribution of Code

Achieved by exploiting the linker Each portion is assigned to its own section Example linker script for two tasks:

0 x 0

0 x 1 8 0

0 x 8 0

0 x 1 0 0

0 x 2 8 0Ta s k 2 (p a r t3 )




M a in M e m o r y


—



Memory usageMemory usage

0 x 0

0 x 1 8 0

0 x 8 0

0 x 2 0 0

0 x 1 0 0

0 x 2 8 0Ta s k 1 (p a r t3 )




M a in M e m o r y

Ta s k 1

Ta s k 2

C a c h e

—



Optimization TimeOptimization Time

Host machine: Dual Xeon L5420 @ 2.50GHz Using a single core Complete workflow consists of:

Compilation Analysis Optimization

: up to 3 minutes

: up to 1 hour / task

: up to 1 minute



Results: UTDSP benchmarksResults: UTDSP benchmarks

256 512 1024 2048 4096 8192 1638440%

50%

60%

70%

80%

90%

100%

5 Tasks 10 Tasks 15 Tasks


Re

lativ

e W

CE

T

Average of 100 sets of randomly selected tasks: 5 tasks: ~8kB 10 tasks: ~18kB 15 tasks: ~26kB

WC

ET

re

lati

ve t

o G

reed

y a

pp

roac

h



Partitioning OverheadPartitioning Overhead

256 512 1024 2048 4096 8192 163840,8

0,9

1

1,1

1,2

1,3

1,4MRTC UTD S P


Average WCET increase Caused by additional jumps



Backup: CombinedBackup: Combined



Combined Approach (2)Combined Approach (2)

Adaption of algorithms for multi-tasking model SPMA

Requires heuristic to assign memory space per task Three heuristics directly apparent:

WCET: ratio of single task WCET to accumulated task-set WCET CS: ratio of code-size to accumulated code-size WCET&CS = (WCET/CS)/2: based on assumption that larger

portions of assigned space also yields performance improvements

CP and MCS restrict to functions not allocated to SPM already



EvaluationEvaluationSample results for multi-task set of: crc, g721 marcuslee decoder, h264dec_ldecode_block

CP

CP&MCS

Gai

n c

om

par

ed t

o u

no

pti

miz

ed

co

de

Allowed relative cache size of full task-set

Algorithm:



Backup: DemoCarBackup: DemoCar

Computer Science 12 Design Automation for Embedded Systems Reconciling Compilers & Timing Analysis...

Documents

Transcript of Computer Science 12 Design Automation for Embedded Systems Reconciling Compilers & Timing Analysis...