Optimizations for Faster Simulation of Esterel Programs
Dumitru POTOP-BUTUCARU
Advisers:
Gérard Berry – Esterel Technologies
Robert de Simone – INRIA, project TICK
PhD Thesis Defense, 26 November 2002, Agelonde, France
Part 1: Why?
– Background
– Motivation
Part 2: How?
– Presentation of the work
– Results and conclusion
Two compilation trends
1. Semantic completeness
• Formal semantics (Esterel v5)
• Formal models (automata, circuits)
• Formal analysis and optimization methods
• Efficiency issues (do not scale up well)
2. Efficient simulation
• Custom intermediate formats
• Scale up well
• Semantic issues
• Structural imperative style
Why? Because of Esterel properties
– Esterel source = control-flow specification
  • well-structured code
  • control-flow optimizations

    loop [ await A; emit B || await B ]; emit O; halt every R

Corresponding C code:

    if (BOOT) { A_active = 1; B_active = 1; }
    else {
      if (R) { A_active = 1; B_active = 1; }
      else if (A_active | B_active) {
        if (A_active) if (A) { A_active = 0; B = 1; }
        if (B_active) if (B) B_active = 0;
        if (!(A_active | B_active)) O = 1;
      }
    }

– But…
Why? Because of Esterel properties
• Constructive causality
  – Correct causality cycles
  – Instantaneous reaction to signal absence (analysis of not yet executed code)
  – Solution: translate into a formal mathematical model
  – But: loss of efficiency

    signal S,T in
      emit S;
      present T then
        present S else emit T end
      end;
    end

[Figure annotations on the program: "causality cycle", "break the cycle"]
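The "formal mathematical model" route can be illustrated with a small three-valued fixed-point evaluation. This is only a sketch under an assumption that is not on the slides: the example program is read as the two equations S = 1 and T = T and not S (emit T is reached only when T is present and S is absent). The names status_t, and3, not3 and solve are hypothetical.

```c
#include <assert.h>

/* Three-valued signal status: unknown, absent, or present. */
typedef enum { UNKNOWN = -1, ABSENT = 0, PRESENT = 1 } status_t;

/* Ternary AND: a known-absent operand decides the result on its own. */
static status_t and3(status_t a, status_t b) {
    if (a == ABSENT || b == ABSENT) return ABSENT;
    if (a == PRESENT && b == PRESENT) return PRESENT;
    return UNKNOWN;
}

/* Ternary NOT: unknown stays unknown. */
static status_t not3(status_t a) {
    if (a == UNKNOWN) return UNKNOWN;
    return a == PRESENT ? ABSENT : PRESENT;
}

/* Iterate the assumed equations S = 1, T = T and not S from "all unknown"
   until nothing changes. Returns 1 if every signal ends up known, which is
   the sense in which the cycle on T is harmless here. */
static int solve(status_t *s, status_t *t) {
    assert(s && t);
    *s = UNKNOWN;
    *t = UNKNOWN;
    for (;;) {
        status_t ns = PRESENT;               /* emit S is unconditional */
        status_t nt = and3(*t, not3(*s));    /* T depends on itself and on S */
        if (ns == *s && nt == *t) break;     /* fixed point reached */
        *s = ns;
        *t = nt;
    }
    return *s != UNKNOWN && *t != UNKNOWN;
}
```

Calling solve on this system settles S to present first, which forces T to absent even though T appears in its own equation. A system such as T = not T would stay unknown and solve would return 0.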
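Current methods

Compiling method: semantically complete, intermediate model: Explicit FSM
– Generated code (without optim.): very large, very fast
– Optimization: bisimulation (fc2tools); expensive, slow; general
– Problems: do not scale up well

Compiling method: semantically complete, intermediate model: Circuits
– Generated code (without optim.): small, slow
– Optimization: RTL optimizations (SIS); cheap, fast; general*
– Problems: do not scale up well

Compiling method: efficient code, intermediate model: ?
– Generated code (without optim.): very small, very fast
– Optimization: classical control-flow optimizations; cheap, fast; only “acyclic” programs
– Problems: semantics (acyclic = ?), less powerful optim.

* = sccausal or slow simulation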
What we want
• Generate efficient code for “good” programs
• Generate code for all programs
• Understand cyclicity at a higher level
• Inexpensive optimizations based on static analysis

Means
• Formalize the efficient approach
  – New intermediate format/model (GRC)
    • Hierarchical state representation
    • Control-flow graph
    • No specific encoding
Part II - Outline
• The GRC format
  – Definition (small example)
  – Code generation for “acyclic” GRC specifications
    • State encoding
    • Scheduling
• Static analysis for optimizations
• Cyclic specifications
  – What does “cyclic” mean?
• Implementation and benchmarks
• Conclusion
GRC format – a small example

    loop [ await A; emit B || await B ]; emit O; halt every R

[Figure, repeated over several animation frames: the GRC representation of the program, made of a selection tree and a control flowgraph]

Selection tree
• selection tree = parallel/exclusive abstraction of the syntax tree
• The nodes represent the activation of various statements
• Labels in the figure: # (exclusive), || (parallel), boot, loop-every, await A, await B, halt; nodes numbered 0 to 6
• Successive states shown on the tree:
  – Initial state
  – After the first reaction: waiting for A and B
  – B has been received: waiting for A
  – A has been received: halted
  – Program reset after R has been received

Control flowgraph
• Tests on the selection status of nodes ([1] to [6]) and on the signals R, A and B
• State updates: enter 2 to enter 6, exit 1 to exit 5
• Signal emissions: emit B, emit O
• Synchronizer of the parallel: inputs Inactive[4], K0[4], K1[4], Inactive[5], K0[5], K1[5]; outputs K0, K1
• Walkthrough of one reaction with R absent and A present: the control point moves from the start of the loop body into the parallel, through await A and emit B, to the end of the parallel, then to emit O and halt
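The selection tree of the small example can be sketched as a plain array of nodes. The node numbering used here (0 root, 1 boot, 2 loop-every, 3 parallel, 4 await A, 5 await B, 6 halt) is inferred from the "enter"/"exit" labels of the flowgraph and is an assumption, as are the names sel_node, tree_init, node_enter, node_exit and tree_consistent.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LEAF, EXCLUSIVE, PARALLEL } node_kind;

typedef struct {
    node_kind kind;
    int parent;     /* index of the parent node, -1 for the root */
    bool active;    /* selection status of the statement */
} sel_node;

#define NNODES 7

/* Build the (assumed) tree of the example in its initial state:
   only the root and the boot node are active. */
void tree_init(sel_node t[NNODES]) {
    static const sel_node init[NNODES] = {
        { EXCLUSIVE, -1, true  },  /* 0: root '#'            */
        { LEAF,       0, true  },  /* 1: boot                */
        { EXCLUSIVE,  0, false },  /* 2: loop-every body '#' */
        { PARALLEL,   2, false },  /* 3: '||'                */
        { LEAF,       3, false },  /* 4: await A             */
        { LEAF,       3, false },  /* 5: await B             */
        { LEAF,       2, false },  /* 6: halt                */
    };
    for (int i = 0; i < NNODES; i++) t[i] = init[i];
}

/* "enter n" and "exit n" of the flowgraph, as plain flag updates. */
void node_enter(sel_node t[NNODES], int n) {
    assert(n >= 0 && n < NNODES);
    t[n].active = true;
}

void node_exit(sel_node t[NNODES], int n) {
    assert(n >= 0 && n < NNODES);
    t[n].active = false;
}

/* A state is consistent if every active node has an active parent and
   no exclusive node has more than one active child. */
bool tree_consistent(const sel_node t[NNODES]) {
    int active_children[NNODES] = {0};
    for (int i = 1; i < NNODES; i++) {
        if (!t[i].active) continue;
        if (!t[t[i].parent].active) return false;
        active_children[t[i].parent]++;
    }
    for (int i = 0; i < NNODES; i++) {
        if (t[i].kind == EXCLUSIVE && active_children[i] > 1) return false;
    }
    return true;
}
```

Replaying the boot sequence of the flowgraph (exit 1, then enter 5, 4, 3, 2) leaves the tree in the "waiting for A and B" state, and tree_consistent accepts it. Entering halt (6) without exiting the parallel (3) is rejected because node 2 is exclusive.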
Code generation – acyclic case
• “Good programs” => acyclic GRC flowgraphs
• Code generation for acyclic specifications
  – State encoding
    • Software-specific
    • Bitwise
    • Hierarchic
  – Static scheduling
    • Respects the causality

• State encoding

    Bit index:  0 1 2 3 4   States:
                1 0 X X X   boot instant
                1 1 0 1 0   « await A » active, « await B » completed
                1 1 0 1 1   « await A » active, « await B » active
                1 1 1 X X   « halt » active
                0 X X X X   program terminated

[Figure, repeated over several animation frames: the selection tree annotated with the bit values, and the control flowgraph rewritten step by step]
  – The selection tests become tests on the state bits S[1], S[2], S[3], S[4]
  – enter 2, enter 3, enter 4, enter 5 become S[1]=1, S[2]=0, S[3]=1, S[4]=1, later merged into S[1..4]=1011
  – enter 6 becomes S[2]=1; exit 4 becomes S[3]=0; exit 5 becomes S[4]=0
  – exit 1, exit 2 and exit 3 disappear from the flowgraph

• Static scheduling
The code is built incrementally by following the scheduled flowgraph; the final result is:

    bool aux=0;
    if(S[1]){
      if(R){aux=1;}
      else {
        if(!S[2]){
          if(S[3])if(A){S[3]=0;B=1;}
          if(S[4])if(B)S[4]=0;
          if(S[3]==0 && S[4]==0){
            O=1;S[2]=1;
          }
        }
      }
    } else {aux=1;}
    if(aux){S[1..4]=1011;}   /* shorthand for S[1]=1;S[2]=0;S[3]=1;S[4]=1; */
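The scheduled code above can be turned into a compilable C function along the following lines. This is a minimal sketch, not the output of the prototype compiler: the names grc_state, grc_boot and grc_react are hypothetical, the shorthand S[1..4]=1011 is expanded into four assignments, and B is assumed to be settable by the environment as well as by the program.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits, following the encoding table: S[0] program alive,
   S[1] boot (0) or loop-every (1), S[2] parallel (0) or halt (1),
   S[3] await A active, S[4] await B active. */
typedef struct {
    bool S[5];
} grc_state;

/* Boot state "1 0 X X X"; the don't-care bits are simply cleared. */
void grc_boot(grc_state *st) {
    assert(st);
    st->S[0] = true;
    for (int i = 1; i < 5; i++) st->S[i] = false;
}

/* One reaction. a, b, r are the input statuses of A, B and R;
   *b_out and *o_out receive the statuses of B and O after the reaction. */
void grc_react(grc_state *st, bool a, bool b, bool r,
               bool *b_out, bool *o_out) {
    assert(st && b_out && o_out);
    bool *S = st->S;
    bool aux = false;
    bool B = b, O = false;

    if (S[1]) {
        if (r) {
            aux = true;                        /* reset on R */
        } else if (!S[2]) {                    /* parallel is active */
            if (S[3] && a) { S[3] = false; B = true; }  /* await A; emit B */
            if (S[4] && B) { S[4] = false; }            /* await B */
            if (!S[3] && !S[4]) {              /* both branches done */
                O = true;                      /* emit O */
                S[2] = true;                   /* enter halt */
            }
        }
    } else {
        aux = true;                            /* boot instant */
    }
    if (aux) {                                 /* S[1..4] = 1011 */
        S[1] = true; S[2] = false; S[3] = true; S[4] = true;
    }
    *b_out = B;
    *o_out = O;
}
```

After grc_boot, a first call with all inputs absent starts both awaits. A second call with A present emits B, which completes await B in the same reaction, so O is emitted and the program halts until R is received. The test on A is scheduled before the test on B, which is the causality order that static scheduling must respect.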
Static analysis and optimizations
• GRC specifications => usually very redundant
• Optimizations
  • Compatible with the code generation scheme
    – GRC code optimizations
    – Software encoding optimizations
  • Semantics-preserving
  • Fast, efficient
• How? Static analysis
  • Fast, efficient
  • Prepare the optimizations and the encoding
  • Not software-specific
• Static analysis, example

    trap T in
      sustain A
    ||
      await B; await C; exit T
    end

[Figure: selection tree of the program (boot, #, ||, sustain A, await B, await C), with some nodes tagged "nt:" and a group of nodes annotated "Same status at all instants"]

• Utility:
  – Simplify the state access/update protocol
  – Simplify the state encoding

• Optimized state encoding

    States                                Unoptimized encoding   Optimized encoding
                                          (bits 0 1 2 3 4)       (bits 0 1 2)
    boot instant                          1 0 X X X              1 0 X
    « sustain A », « await B » active     1 1 1 1 0              1 1 0
    « sustain A », « await C » active     1 1 1 1 1              1 1 1
    program terminated                    0 X X X X              0 X X
• Dependency removal (propagation of exclusions)

    pause; present S then emit T end; pause; emit S;

The test of S and emit S belong to exclusive states of the program, so they never execute in the same instant and the dependency between them can be removed.

[Figure, three animation frames: selection tree (boot, two # nodes, nodes numbered 0 to 4) and control flowgraph with tests [1], [2], [3], [4] and S, state updates enter 2, enter 3, enter 4, exit 0 to exit 4, and emissions emit S and emit T]
Cyclic specifications
• Acyclic => correct specification, efficient code generation
• Depends on the representation (GRC, circuit, …)
  – Compatibility, correctness, and efficiency issue (algorithmic)
  – Circuits = privileged representation (finer, cleaner)
• Unify GRC-level and circuit-level cyclicity
• Generate simulation code (future work)
Unify GRC-level, circuit-level cyclicity
• Difference only on synchronizers
• Solution: synchronizer splitting
  – GRC code refinement
  – Inexpensive local analysis

    [ present I then present S then pause else pause end end || nothing ]; emit S

[Figure: flowgraph of the program around the synchronizer of the parallel, with labels GO, I, S, K0[0], K1[0], K0[1], K0, K1]
Comparison with existing compilers
• Synopsys (S. Edwards)
  – Similar intermediate format, state encoding
  – State already encoded in the intermediate form
  – Better context switch encoding
• FTR&D (Bertin, Closse, Weil, Pulou, Poize)
  – Hierarchic state representation flattened into a list of pending threads ordered by a static scheduling

[Figure: two rows of boxes labelled G1, G2, ..., Gn and P1, P2, ..., Pn]
Results
• Prototype compiler (acyclic case)
• Examples:
  1. Turbo channel bus
  2. Berry’s wristwatch
  3. Video generator
  4. Shock absorber
  5. Operating system model
  6. Avionics fuel controller
  7. Avionics cockpit
  8. Man-machine interface
• Test configuration: PIII/1GHz/128M/Linux, gcc-2.96 -O, 1 Mcycle random or given

[Chart: Size (kbytes) of the generated code for examples 1 to 8, comparing GRC2C, FTR&D and Synopsys; vertical axis from 0 to 350]
[Chart: Relative execution time (% of grc2c) for examples 1 to 8, comparing GRC2C, FTR&D and Synopsys; vertical axis from 0 to 500]
Conclusion
• Semantics of Esterel with data
• Intermediate model for Esterel programs
• Static analysis and optimization at this level
• Characterization of circuit-level cyclicity at this level
• General code generation scheme
• Prototype compiler, acyclic case

Future…
• Correctness proofs, complete implementation (work in progress)
• Cyclic programs…

Still cyclic…
• Hybrid code scheduling technique
  – Abstract the SCCs => globally acyclic graph
  – Static scheduling for the acyclic graph => efficiency?
  – Circuit-level simulation techniques on SCCs
• Does not guarantee program correctness
  – Verify otherwise
  – Simulation (not implementation)
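The first step of the hybrid technique, abstracting the strongly connected components (SCCs) to obtain a globally acyclic graph, can be sketched with Tarjan's algorithm. The slides do not say which SCC algorithm is intended, so this is a stand-in. The adjacency-matrix type scc_graph and the functions visit and scc_condense are hypothetical names.

```c
#include <assert.h>

#define MAXN 16

typedef struct {
    int n;                     /* number of flowgraph nodes */
    int adj[MAXN][MAXN];       /* adj[u][v] != 0 iff there is an edge u -> v */
    int index[MAXN], low[MAXN], on_stack[MAXN];
    int comp[MAXN];            /* result: component number of each node */
    int stack[MAXN], sp, next_index, ncomp;
} scc_graph;

/* Depth-first visit of Tarjan's algorithm. */
static void visit(scc_graph *g, int u) {
    g->index[u] = g->low[u] = g->next_index++;
    g->stack[g->sp++] = u;
    g->on_stack[u] = 1;
    for (int v = 0; v < g->n; v++) {
        if (!g->adj[u][v]) continue;
        if (g->index[v] < 0) {                 /* tree edge */
            visit(g, v);
            if (g->low[v] < g->low[u]) g->low[u] = g->low[v];
        } else if (g->on_stack[v] && g->index[v] < g->low[u]) {
            g->low[u] = g->index[v];           /* edge back into the stack */
        }
    }
    if (g->low[u] == g->index[u]) {            /* u is the root of an SCC */
        int w;
        do {
            w = g->stack[--g->sp];
            g->on_stack[w] = 0;
            g->comp[w] = g->ncomp;
        } while (w != u);
        g->ncomp++;
    }
}

/* Number the SCCs and return how many there are. Components come out in
   reverse topological order: an edge between two different components
   always goes from the higher number to the lower one. */
int scc_condense(scc_graph *g) {
    assert(g && g->n >= 0 && g->n <= MAXN);
    g->sp = g->next_index = g->ncomp = 0;
    for (int u = 0; u < g->n; u++) {
        g->index[u] = -1;
        g->on_stack[u] = 0;
    }
    for (int u = 0; u < g->n; u++) {
        if (g->index[u] < 0) visit(g, u);
    }
    return g->ncomp;
}
```

With the component numbers in hand, a static schedule of the condensed graph could run the components in decreasing order of comp, using ordinary straight-line code for single-node components and a circuit-level simulation loop for the larger ones.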