Date posted: 20-Dec-2015
1
CP in Electronic Design Automation (EDA)
(Java Constraint Programming) JaCoP solver
Radoslaw (Radek) Szymanek
6
CP in EDA
Constraint-Driven Design Space Exploration
for Memory-Dominated Embedded Systems
Radoslaw Szymanek
7
Embedded Systems
• Processor based system
• Integral part of larger system
• Specific functionality
• Heterogeneous architecture
• Heterogeneous requirements
8
Application
• Data dominated application
• Application model at high abstraction level
• Annotated task graph
• Heterogeneous constraints
9
Processing Architecture
[Figure: processing architecture with blocks labeled P1, P2, P3, A1, B1, L1, and L2, and ROM/RAM memories]
• Heterogeneous units
• Resource view
• Trade-offs
• Architecture Selection
10
Execution Scenario
[Figure: execution scenario with labels ErrCor, Cancel, UI, Scrambling, Encoding, Decoding, and Descrambling; resources ASIC, DSP, and P; data D1 to D6; and C3, C4]
11
Design Flow
Specification, Design, Execution
Application (C/C++, SystemC)
• Architecture Selection
• Task Assignment
• Task Scheduling
Application Task Graph Model
• Pareto Diagram Composition
• Data Assignment
• Data Access Scheduling
Application Execution Scenario(s)
12
Motivation
• Memory contributes most to cost, power consumption, and application execution time
• Exploration of different resource trade-offs (finds efficient execution scenarios)
• Constraints are abundant during system synthesis (Specify, Explore, Refine)
• The synthesis problem often changes, so we need an optimization suite that is easy to extend, understand, and employ
14
Schedule length versus cost (I)
• Architecture selection determines the schedule makespan
• We choose an architecture and optimize schedule makespan for it
• Heterogeneous application, architecture, and constraints
[Figure: schedules of Task1, Task2, and Task3 over time on a single processing unit (PU 1) and on two processing units (PU 1, PU 2)]
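A minimal sketch of the trade-off above, assuming three independent tasks with made-up durations and a made-up cost per processing unit (the names `best_makespan`, `tradeoff`, `DURATIONS`, and `UNIT_COST` are illustrative and not taken from the slides). It enumerates every task-to-unit assignment for a chosen architecture and reports the best makespan for each architecture cost. Precedence and memory constraints are ignored here.

```python
from itertools import product

# Hypothetical data: durations of three independent tasks, cost per unit.
DURATIONS = [4, 3, 2]
UNIT_COST = 10


def best_makespan(durations, num_units):
    """Smallest makespan over all assignments of tasks to units."""
    best = None
    for assignment in product(range(num_units), repeat=len(durations)):
        load = [0] * num_units
        for task, unit in enumerate(assignment):
            load[unit] += durations[task]
        makespan = max(load)
        if best is None or makespan < best:
            best = makespan
    return best


def tradeoff(durations, max_units, unit_cost):
    """List of (architecture cost, best makespan) pairs."""
    return [
        (n * unit_cost, best_makespan(durations, n))
        for n in range(1, max_units + 1)
    ]


if __name__ == "__main__":
    for cost, makespan in tradeoff(DURATIONS, 2, UNIT_COST):
        print("cost", cost, "makespan", makespan)
```

Exhaustive enumeration is only workable for toy sizes; it is used here to make the cost versus makespan relation visible, not as a proposal for the search method.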
15
Schedule length versus cost (I)
• Uses meta-search heuristics to explore only part of the design space (partial search)
[Figure: explore architecture selection, explore assignment, explore scheduling; 1st best solution]
16
Schedule length versus cost (I)
• Divide and Conquer based on consecutive refinement
[Figure: explore architecture selection, explore assignment, explore scheduling; 2nd best solution]
17
Schedule length versus cost (I)
• Each exploration step uses results from previous steps
[Figure: explore architecture selection, explore assignment, explore scheduling; 3rd best solution]
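One way to picture "each exploration step uses results from previous steps" is a bound that is carried from one exploration to the next. The sketch below is an assumption about how such reuse could look, not the authors' algorithm: architectures are explored one after another, and the assignment search for each one is pruned with the best makespan found so far. The names `best_below`, `explore`, and `ARCHITECTURES`, and the task durations, are hypothetical.

```python
def best_below(durations, num_units, bound):
    """Smallest makespan strictly below `bound`, or None if none exists."""
    load = [0] * num_units

    def assign(task):
        if max(load) >= bound:
            # Prune: this branch cannot beat the solution already known.
            return None
        if task == len(durations):
            return max(load)
        best = None
        for unit in range(num_units):
            load[unit] += durations[task]
            result = assign(task + 1)
            load[unit] -= durations[task]
            if result is not None and (best is None or result < best):
                best = result
        return best

    return assign(0)


def explore(durations, architectures):
    """Return the improving solutions in the order they are found."""
    solutions = []
    bound = float("inf")  # result of the previous steps, reused as a bound
    for name, num_units in architectures:
        makespan = best_below(durations, num_units, bound)
        if makespan is not None:
            bound = makespan
            solutions.append((name, makespan))
    return solutions


# Hypothetical architectures: (name, number of processing units).
ARCHITECTURES = [("1 PU", 1), ("2 PUs", 2), ("3 PUs", 3)]

if __name__ == "__main__":
    for name, makespan in explore([4, 3, 2], ARCHITECTURES):
        print(name, makespan)
```

An architecture that cannot improve on the current bound contributes no solution, so later steps never repeat work that earlier steps have already ruled out.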
19
Memory versus Execution Time (II)
• Faster execution usually requires more data memory
[Figure: memory address versus time for DataT1, DataT2, and DataT3 under parallel execution and under sequential execution]
20
Memory versus Execution Time (II)
• Scheduling is combined with data memory placement, so memory fragmentation is taken into account
[Figure: placement of Data1, Data2, and Data3 in memory address versus time]
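The following sketch illustrates why placement matters, under assumed data: each data block is a rectangle in the time versus address plane, and blocks that are alive at the same time must not overlap in address. A simple first-fit placement (chosen here for illustration, not the method from the slides) shows both the parallel versus sequential difference and a fragmentation case. All block sizes and lifetimes are made up.

```python
def overlaps_in_time(a, b):
    """True if the half-open lifetimes [start, end) intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def place_first_fit(blocks):
    """Give each block the lowest address that is free during its lifetime."""
    placed = []
    for block in blocks:
        busy = sorted(
            (p["addr"], p["addr"] + p["size"])
            for p in placed
            if overlaps_in_time(p, block)
        )
        addr = 0
        for low, high in busy:
            if addr + block["size"] <= low:
                break  # the block fits in the gap before this busy range
            addr = max(addr, high)
        placed.append({**block, "addr": addr})
    return placed


def peak_memory(placed):
    """Highest address in use over the whole schedule."""
    return max(p["addr"] + p["size"] for p in placed)


# Hypothetical blocks: the same data, alive at the same time or one by one.
PARALLEL = [
    {"name": "Data1", "start": 0, "end": 2, "size": 4},
    {"name": "Data2", "start": 0, "end": 2, "size": 3},
    {"name": "Data3", "start": 0, "end": 2, "size": 2},
]
SEQUENTIAL = [
    {"name": "Data1", "start": 0, "end": 2, "size": 4},
    {"name": "Data2", "start": 2, "end": 4, "size": 3},
    {"name": "Data3", "start": 4, "end": 6, "size": 2},
]
# Fragmentation: Data2 frees a hole of size 2 that Data4 (size 3) cannot use.
FRAGMENTED = [
    {"name": "Data1", "start": 0, "end": 4, "size": 2},
    {"name": "Data2", "start": 0, "end": 2, "size": 2},
    {"name": "Data3", "start": 0, "end": 4, "size": 2},
    {"name": "Data4", "start": 2, "end": 4, "size": 3},
]
```

In the fragmented case only 7 units are alive at any time, yet the placement needs 9 addresses, which is the effect a scheduler that ignores placement would miss.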
22
Memory versus Execution Time (II)
• Adaptive, estimate-guided heuristic (criterion: memory consumption or execution time)
• Look-ahead and backtracking capabilities
[Figure: decision flow. Memory bottleneck? No: reduce execution time. Yes: reduce memory usage (backtrack, consume data).]
23
Memory versus Execution Time (II)
• Algorithmic pipelining to improve throughput
[Figure: memory versus time for Data1, Data2, and Data3 under sequential execution and under pipelining]
24
Partial Assignment Technique (III)
• Reduce the problem size to simplify task assignment and task scheduling
• Clustering – simplifies the model
[Figure: task graph with tasks T1 to T8, and the clustered graph with T1, T2&T4, T3&T5, T7, and T6&T8]
25
Partial Assignment Technique (III)
• Clustering can cause deadlock, so not all groupings of tasks are allowed
• Linear clusters of tasks are no longer optimal when resources such as memories are present
[Figure: tasks T1, T2, and T3, and the clustered graph with T1&T3 and T2]
26
Partial Assignment Technique (III)
• Problem simplification through adding constraints; not model simplification
• No deadlock problem; finer-grained simplifications are possible
• Better than clustering with linear clusters
[Figure: tasks T1, T2, and T3, unchanged in the model, with the added constraint PT1 = PT3]
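A small sketch of the idea, assuming PT1 = PT3 means "T1 and T3 run on the same processor": the tasks stay separate in the model, and the added equality constraint only shrinks the set of assignments left to explore. The function name `assignments` and the processor names are hypothetical.

```python
from itertools import product


def assignments(tasks, processors, same_processor=()):
    """Yield task-to-processor mappings that satisfy the equality pairs."""
    for combo in product(processors, repeat=len(tasks)):
        mapping = dict(zip(tasks, combo))
        if all(mapping[a] == mapping[b] for a, b in same_processor):
            yield mapping


TASKS = ["T1", "T2", "T3"]
PROCESSORS = ["P1", "P2"]

if __name__ == "__main__":
    full = list(assignments(TASKS, PROCESSORS))
    partial = list(assignments(TASKS, PROCESSORS, [("T1", "T3")]))
    print(len(full), "assignments without the constraint")
    print(len(partial), "assignments with PT1 = PT3")
```

Because T1 and T3 remain distinct tasks, they can still be scheduled independently on their shared processor, which is what separates this from merging them into one cluster.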
27
Memory Bandwidth
• A major bottleneck in many data-dominated applications
• Processor often waits for data – latency or bandwidth
• Actual bandwidth depends on access patterns and data assignment
• Higher bandwidth?
  – more memories
  – better utilization
28
Memory Architecture (IV-V)
• Most significant resource
• Bandwidth bottleneck
• Energy and timing considerations
• Complex memories such as SDRAM
• Multiple memories
[Figure: SDRAM with Bank 1 and Bank 2, each holding row 1 to row n, with page 1, page 2, and a port]
29
Memory model for SDRAM (IV-V)
• considered as a resource since it has
  – Fixed maximal size
  – Fixed number of page buffers
  – Fixed maximal bandwidth
[Figure: time window over labels B1 to B4, S1 to S4, P1, P2, and tasks T1 and T2 on a time axis]
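As a rough illustration of treating the memory as a resource, the sketch below checks one time window against a fixed bandwidth and a fixed number of page buffers. The model is an assumption made for this example: each access has a lifetime, a bandwidth demand (`rate`), and a page, and the window is feasible if the summed demand and the number of distinct pages stay within the limits. None of these names come from the slides.

```python
def window_feasible(accesses, window, max_bandwidth, max_open_pages):
    """Check bandwidth and page-buffer limits inside one time window."""
    start, end = window
    active = [a for a in accesses if a["start"] < end and start < a["end"]]
    bandwidth = sum(a["rate"] for a in active)
    open_pages = {a["page"] for a in active}
    return bandwidth <= max_bandwidth and len(open_pages) <= max_open_pages


# Hypothetical accesses issued by tasks T1 and T2.
ACCESSES = [
    {"task": "T1", "start": 0, "end": 4, "rate": 2, "page": "P1"},
    {"task": "T2", "start": 2, "end": 6, "rate": 3, "page": "P2"},
]

if __name__ == "__main__":
    for window in [(0, 2), (2, 4), (4, 6)]:
        print(window, window_feasible(ACCESSES, window, 4, 2))
```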
32
Energy vs. Execution Time (IV-V)
[Figure: application Pareto diagram (energy versus time) obtained by exploration, with data D1 to D7 assigned to SDRAM 1 and SDRAM 2]
36
Conflict Graph (IV-V)
• Specifies assignment constraints for different tasks’ execution options
• Memory/Page conflict edge
• Memory/Bank compatibility edge
[Figure: conflict graph over SDRAM X and SDRAM Y with memory compatibility, page conflict, and memory conflict edges]
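The sketch below shows how a conflict graph could be used to validate a data assignment. The edge semantics are assumptions made for this example and are not spelled out on the slide: a compatibility edge lists the memories a datum may be placed in, a memory conflict edge forbids two data from sharing a memory, and a page conflict edge forbids them from sharing the same page of the same memory. All data and names are hypothetical.

```python
def violations(assignment, compatible, memory_conflicts, page_conflicts):
    """Return the list of conflict-graph edges broken by an assignment.

    `assignment` maps each datum to a (memory, page) pair.
    """
    problems = []
    for datum, (memory, _page) in assignment.items():
        if memory not in compatible[datum]:
            problems.append(("incompatible", datum, memory))
    for a, b in memory_conflicts:
        if assignment[a][0] == assignment[b][0]:
            problems.append(("memory conflict", a, b))
    for a, b in page_conflicts:
        if assignment[a] == assignment[b]:
            problems.append(("page conflict", a, b))
    return problems


# Hypothetical conflict graph.
COMPATIBLE = {
    "D1": {"SDRAM X", "SDRAM Y"},
    "D2": {"SDRAM X"},
    "D3": {"SDRAM Y"},
}
MEMORY_CONFLICTS = [("D1", "D2")]
PAGE_CONFLICTS = [("D1", "D3")]

GOOD = {"D1": ("SDRAM Y", 1), "D2": ("SDRAM X", 1), "D3": ("SDRAM Y", 2)}
BAD = {"D1": ("SDRAM X", 1), "D2": ("SDRAM X", 2), "D3": ("SDRAM X", 1)}
```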
37
Composition (IV)
[Figure: application Pareto diagram (energy versus time) obtained by exploration, with data D1 to D7 assigned to SDRAM 1 and SDRAM 2]
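A minimal sketch of Pareto composition, assuming two tasks that run one after the other, so that their execution times and energies simply add. Each task offers a few (time, energy) options; the composition combines every pair of options and keeps only the non-dominated points. The option values and the names `pareto_front`, `compose`, `TASK_A`, and `TASK_B` are made up for illustration.

```python
from itertools import product


def pareto_front(points):
    """Keep points not dominated in both time and energy (lower is better)."""
    front = []
    for point in sorted(set(points)):
        # Points arrive by increasing time, so a point survives only if
        # its energy is lower than that of every point kept so far.
        if not front or point[1] < front[-1][1]:
            front.append(point)
    return front


def compose(options_a, options_b):
    """Pareto diagram of two tasks executed sequentially."""
    combined = [
        (ta + tb, ea + eb)
        for (ta, ea), (tb, eb) in product(options_a, options_b)
    ]
    return pareto_front(combined)


# Hypothetical (time, energy) options per task.
TASK_A = [(2, 8), (4, 5)]
TASK_B = [(1, 6), (3, 2)]

if __name__ == "__main__":
    for time, energy in compose(TASK_A, TASK_B):
        print("time", time, "energy", energy)
```

Filtering after each composition keeps the number of points small, which matters when many task diagrams are combined.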
39
Energy vs. Execution Time (IV)
• Heuristic for trading bandwidth and assignment constraints between tasks to achieve efficient application execution
• Scheduling estimates, data assignment feasibility check
• Memory-oriented application model (e.g. SDRAM)
[Figure: sequence of energy-versus-time diagrams]
40
Iterative Optimization (V)
[Figure: iterative optimization loop with blocks Memory Structure, Application Task Graph, Pareto Composition, Assignment Optimization, Scheduling Optimization, and Application Pareto Diagram, connected by weights adjustment, non-valid CG composition removal, suboptimal points removal, and parallelization constraints]
41
Summary of CP applications in EDA
• We considered data-dominated applications and memory issues
• We showed that a CP framework can be used efficiently to model and solve embedded system synthesis problems
• We proposed and evaluated different techniques, heuristics, and models for system level synthesis (e.g., PAT)
• We addressed resource and optimization trade-offs
43
Outline
• Basics
• Features
• Marketing stuff
• Typical misconceptions
• Applications
• Licensing
44
Basics
• written by only two people (Krzysztof Kuchcinski and Radoslaw Szymanek)
• entirely based on Java
• development of JaCoP began in 2001
• it is under continuous development
• it has slightly more than 20 thousand lines of code
• there is no GUI (just core engine)
45
Basics
• it can easily be plugged into any other Java-based application
• it has reasonable performance
• its development was influenced by scheduling problems from the electronic design automation industry
• it has global constraints (scheduling related)
• it has already been used at several research facilities
46
Features
• global constraints (alldifferent, cumulative, diff2, element, and circuit), often available in different flavors (gcc is planned ;))
• it is rather simple to extend
• Application Programming Interface (API) documentation generated with JavaDoc is available
• a small JaCoP guide is also available
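For readers unfamiliar with the global constraints named above, the sketch below states what three of them require, as plain Python checks on fully fixed values. This is not JaCoP's API, and a solver propagates these constraints on variable domains instead of testing complete assignments; the function names are illustrative only.

```python
def all_different(values):
    """alldifferent: no value occurs twice."""
    return len(set(values)) == len(values)


def cumulative(starts, durations, demands, limit):
    """cumulative: total resource demand never exceeds the limit.

    Usage can only rise at a task start, so checking start times suffices.
    """
    tasks = list(zip(starts, durations, demands))
    for t in set(starts):
        usage = sum(d for s, dur, d in tasks if s <= t < s + dur)
        if usage > limit:
            return False
    return True


def diff2(rectangles):
    """diff2: rectangles given as (x, y, width, height) do not overlap."""
    for i, (x1, y1, w1, h1) in enumerate(rectangles):
        for x2, y2, w2, h2 in rectangles[i + 1:]:
            if x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1:
                return False
    return True
```

In scheduling terms, cumulative models tasks sharing a resource of limited capacity, and diff2 models items placed in two dimensions such as time and memory address.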
47
Marketing stuff ;)
• it has simple and convenient API
• it has already been tested in different situations (quite robust)
• small footprint (around 200k in jar file)
• it keeps getting better ;)
• great vehicle for research
• complex data structures/reuse of already computed information
48
Typical misconceptions
• it is Java based so it must be really slow (garbage collector is a sweet thing)
• it must be hard to extend it (extensions are easier than with other industrial solvers and extensions can be more efficiently implemented)
• it must be hard to learn (experience with any other solver suffices; it is used for teaching in Sweden and Poland)
49
Applications
• JaCoP authors' own research published at EDA conferences and in journals (scheduling problems)
• Los Alamos National Laboratory (synthesis of FPGA based designs)
• Kungliga Tekniska Högskolan (KTH) research in the field of Network on Chip (NoC)
• First industrial application on the way ;)
50
Licensing
• It is free for research and it will always be ;)
• commercial applications require a fee on a per-contract basis (at least 25 euro AND at least 1% of the contract value), paid when the contract is realized
• any further distribution requires notification of JaCoP licensing terms
• any extension which does not require reverse engineering (code is obfuscated) and keeps JaCoP in its original form is allowed
51
Licensing (Special terms for 4C)
• the source code is available, with the right to modify and share it within 4C
• No possibility to distribute your own version of JaCoP outside 4C (to keep from forking – like Java)
• authors are eager to incorporate any improvements you suggest/make so you can distribute your own application with the standard JaCoP library on normal terms
52
Research Ideas (cooperation)
• Global constraints (how to efficiently compute consistency methods, iterative computation)
• Visualization and explanations within global constraints, extending constraint functionality
• SALSA type of search framework for developing search methods (coarse grain search methods)
• Any of these topics is of interest to me; a few more ideas are piled on the stack (playing with your own solver gives you plenty of ideas)