On-Chip Communication Architectures
Synthesis Techniques
ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 6
© 2008 Sudeep Pasricha & Nikil Dutt
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Introduction
Designing on-chip communication architectures is becoming increasingly challenging
◦ the growing number of components in today's systems translates into more inter-component communication
Multi-dimensional design constraints
◦ ↑ performance, reliability
◦ ↓ power, cost, area, time-to-market
System designers need techniques that can
◦ optimize for individual design goals
◦ make design decisions that provide a good balance among the other design goals
Introduction
Exploration and synthesis techniques can broadly be classified into 3 categories:
◦ static, dynamic, hybrid
Commercial toolkits are available for standard bus architectures
◦ AMBA Designer/Design Kit
◦ STBus GenKit
◦ Sonics Studio
These toolkits are not very useful for automating exploration or for synthesizing communication architectures that satisfy diverse design constraints
Introduction
Bus Architecture Synthesis:
◦ the process of designing a bus architecture topology and/or its protocol parameters to satisfy application constraints
[Figure: Bus architecture synthesis. Given constraints (performance, power, cost, area, reliability), synthesis explores a topology space of candidate hierarchical bus architectures (CPU1, M2, M3, S1–S4, MEM1, MEM2a/MEM2b, MEM3 on buses main1–main3 and periph, connected by bridges) and a parameter space (arbitration strategy, data bus widths, bus clock frequencies, buffer sizes).]
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Topology Synthesis
The topology of a bus-based on-chip communication architecture determines
◦ the number of buses in the system
◦ the manner in which they are interconnected to each other
◦ how components are allocated to the buses
Early work focused on allocating inter-component communication to buses for distributed real-time embedded systems
◦ Yen et al. [ICCAD ‘95] proposed techniques to estimate communication delay on a bus using static analysis for a system with periodic tasks
  assigned a PE to an existing bus, or created a new bus, to meet task deadlines
◦ Ortega et al. [ICCAD ‘98] explored mapping of PEs onto a set of off-chip bus architecture configurations (shared buses or point-to-point) and protocols (such as CAN or I2C)
  explored different performance vs. cost tradeoffs
Topology Synthesis
Liveris et al. [DATE ‘04] proposed a bus topology synthesis technique to reduce bus power consumption while meeting latency constraints
◦ AMBA AHB bus architecture
◦ simple FIFO arbitration
◦ dynamic power reduction
  switching activity α is taken as 0.5 for the data bus and a lower value for the address bus; control wire switching is ignored
◦ each master has a latency constraint that determines the number of cycles available to complete a communication operation
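As a rough illustration of the power model above, the sketch below assumes the common first-order dynamic power expression P = α·C·V²·f per wire. The function names, the per-wire capacitance, and the address-bus activity of 0.25 are hypothetical; the slide only says the address bus uses a value lower than 0.5.

```python
def wire_dynamic_power(alpha, c_farads, vdd_volts, freq_hz):
    """First-order dynamic power of one wire: P = alpha * C * Vdd^2 * f (watts)."""
    return alpha * c_farads * vdd_volts ** 2 * freq_hz


def bus_dynamic_power(n_data, n_addr, c_farads, vdd_volts, freq_hz,
                      alpha_data=0.5, alpha_addr=0.25):
    """Sum over data and address wires; control wires are ignored, as in the model above.

    alpha_addr = 0.25 is a hypothetical placeholder for "a lower value than 0.5".
    """
    p_data = n_data * wire_dynamic_power(alpha_data, c_farads, vdd_volts, freq_hz)
    p_addr = n_addr * wire_dynamic_power(alpha_addr, c_farads, vdd_volts, freq_hz)
    return p_data + p_addr


# 32-bit data and address buses, 1 pF per wire, 1.0 V supply, 100 MHz clock
print(bus_dynamic_power(32, 32, 1e-12, 1.0, 100e6))
```

In this model, a transformation that lowers the capacitive load on shared wires (for example, removing a slave from the shared bus) reduces power through the C term.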
Topology Synthesis
To improve the latency response of the communication architecture and also reduce power consumption on the bus wires, Liveris et al. proposed using 3 different topology transformations
Topology Synthesis
Private slave creation
◦ making a slave private to a master is possible if the master is the only one accessing the slave
◦ removes a slave from the shared bus, which reduces the fanout by one for all the signals driven by the AMBA logic
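Eligibility for private slave creation can be checked directly from the (master, slave) accesses in the application. A minimal sketch, assuming the accesses are available as a list of pairs; the function name and data format are hypothetical.

```python
from collections import defaultdict


def private_slave_candidates(accesses):
    """Return {slave: master} for every slave accessed by exactly one master.

    accesses: iterable of (master, slave) pairs observed in the application.
    """
    masters_of = defaultdict(set)
    for master, slave in accesses:
        masters_of[slave].add(master)
    return {slave: next(iter(masters))
            for slave, masters in masters_of.items() if len(masters) == 1}


accesses = [("CPU1", "MEM1"), ("CPU1", "S1"), ("M2", "S1"), ("M3", "MEM2")]
print(private_slave_candidates(accesses))  # {'MEM1': 'CPU1', 'MEM2': 'M3'}; S1 is shared
```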
Topology Synthesis
Slave isolation
◦ moving a slave to another layer
Topology Synthesis
Grouping masters
◦ moving masters to another layer to reduce arbitration conflicts
Topology Synthesis
Synthesis heuristic
◦ initially, all masters and slaves are mapped to a single layer
◦ the private slave creation transformation is applied to all eligible slaves
◦ if a latency violation exists for a master, the slave isolation transformation is applied to the slowest slave
◦ if the violation persists, the grouping masters transformation is performed by transferring masters with less stringent latency requirements to a new layer
◦ once a solution that satisfies the latency constraints is obtained, slave isolation and grouping masters transformations are performed to reduce power
◦ at every iteration, the power of the current solution is calculated using probability-based formulations to estimate the switching activity on the wires
◦ transformations are carried out until no further improvement is obtainable
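The control flow of this heuristic can be summarized as a greedy loop. The sketch below is a hypothetical skeleton, not Liveris et al.'s implementation: the solution representation, the transformation generators, the latency check, and the power estimator are all passed in as callables, and the caller is assumed to order the latency-fixing moves (slave isolation before master grouping).

```python
def topology_synthesis(initial, private_slave_moves, fix_moves, power_moves,
                       has_violation, power, max_steps=10_000):
    """Greedy skeleton: private slaves first, then fix latency, then reduce power.

    Each *_moves callable takes a solution and returns a list of moves;
    a move is a function mapping a solution to a new solution.
    Returns the final solution, or None if latency constraints cannot be met.
    """
    sol = initial
    # 1. Make every eligible slave private to its only master.
    for move in private_slave_moves(sol):
        sol = move(sol)
    # 2. Apply latency-fixing transformations until no master violates its constraint.
    steps = 0
    while has_violation(sol):
        moves = fix_moves(sol)
        if not moves or steps >= max_steps:
            return None
        sol = moves[0](sol)
        steps += 1
    # 3. Take the latency-feasible move with the lowest power; stop when none improves.
    while True:
        best, best_power = None, power(sol)
        for move in power_moves(sol):
            candidate = move(sol)
            if not has_violation(candidate) and power(candidate) < best_power:
                best, best_power = candidate, power(candidate)
        if best is None:
            return sol
        sol = best


# Toy usage: the "solution" is an integer, latency requires s >= 3, power is (s - 5)^2.
result = topology_synthesis(
    0,
    private_slave_moves=lambda s: [],
    fix_moves=lambda s: [lambda x: x + 1],
    power_moves=lambda s: [lambda x: x + 1, lambda x: x - 1],
    has_violation=lambda s: s < 3,
    power=lambda s: (s - 5) ** 2,
)
```

Because step 3 only accepts strictly lower power, it terminates on any finite set of candidate topologies.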
Topology Synthesis
The heuristic was implemented in C and applied to
◦ Sobel Transform SoC
  29.6% less power
Topology Synthesis
Murali et al. [DATE ‘05] proposed a methodology for STBus crossbar (matrix) synthesis
Compared to a full crossbar, a partial crossbar has
◦ fewer communication components (buses, arbiters, decoders, etc.), lower area, and reduced power consumption
Goal:
◦ design a minimal cost partial crossbar bus architecture for a given MPSoC application
◦ average and maximum packet latencies must lie within acceptable bounds of the latencies obtained for a full crossbar
Topology Synthesis
Phase 1: SystemC simulation
◦ window-based traffic analysis; the window size is parameterizable
Phase 2: Preprocessing to identify
◦ overlapping critical traffic streams, to be mapped to separate buses
◦ targets with large traffic overlap in a window, to be mapped to separate buses
◦ the maximum number of targets to be connected to a bus (to bound maximum latency)
Phase 3: MILP-based partial crossbar generation
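The window-based analysis of Phases 1 and 2 can be approximated as follows: bin a simulation trace into fixed-size windows, measure how much traffic two targets share in each window, and flag pairs whose overlap exceeds a threshold as candidates for separate buses. The trace format, the min-based overlap metric, and the threshold are hypothetical choices for illustration, not the exact metric of Murali et al.

```python
from collections import defaultdict
from itertools import combinations


def window_traffic(trace, window_size):
    """Bin (time, target, nbytes) records into {window_index: {target: bytes}}."""
    per_window = defaultdict(lambda: defaultdict(int))
    for time, target, nbytes in trace:
        per_window[time // window_size][target] += nbytes
    return per_window


def traffic_overlap(trace, window_size):
    """For each target pair, sum over windows of the traffic both targets carry."""
    overlap = defaultdict(int)
    for traffic in window_traffic(trace, window_size).values():
        for a, b in combinations(sorted(traffic), 2):
            overlap[(a, b)] += min(traffic[a], traffic[b])
    return dict(overlap)


def must_separate(trace, window_size, threshold):
    """Target pairs whose windowed overlap is large enough to map them to separate buses."""
    return {pair for pair, ov in traffic_overlap(trace, window_size).items()
            if ov >= threshold}


trace = [(0, "T1", 64), (5, "T2", 32), (12, "T1", 16), (15, "T3", 16)]
print(must_separate(trace, window_size=10, threshold=20))  # {('T1', 'T2')}
```

Changing `window_size` changes which streams count as simultaneous, which is why the window size is left as a parameter.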
Topology Synthesis
The methodology was applied to synthetic MPSoC applications
Topology Synthesis
Thepayasuwan et al. [DATE ‘04] proposed a simulated annealing (SA)-based approach to synthesize a hierarchical shared bus architecture topology
◦ the cost function accounts for criteria such as number of buses, communication conflict, and bus utilization
◦ SA-based optimization depends on the weights in the cost function
Yoo et al. [ASPDAC ‘07] presented an SA-based approach for synthesizing a cascaded crossbar
Topology synthesis for segmented buses was presented by Guo et al. [ASPDAC ‘06] to
◦ obtain a solution with minimum wire energy
◦ generate a set of solutions to trade off chip area, energy, and delay
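The dependence of SA-based synthesis on cost-function weights is easiest to see in a generic annealing loop with a weighted-sum cost. This is a sketch under assumed parameters: the metric names, the weights, the geometric cooling schedule, and the toy neighbor function are illustrative and not taken from Thepayasuwan et al. or Yoo et al.

```python
import math
import random


def weighted_cost(metrics, weights):
    """Weighted sum of topology metrics (e.g. number of buses, conflict, utilization)."""
    return sum(weights[name] * metrics[name] for name in weights)


def anneal(initial, neighbor, cost, t0=10.0, cooling=0.95, steps=2000, seed=0):
    """Generic simulated annealing; returns the best (state, cost) seen."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if (candidate_cost <= current_cost
                or rng.random() < math.exp((current_cost - candidate_cost) / temperature)):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature = max(temperature * cooling, 1e-9)
    return best, best_cost


# Toy topology: the state is a bus count k in 1..8; conflict falls as buses are added.
WEIGHTS = {"buses": 1.0, "conflict": 1.0}


def toy_cost(k):
    return weighted_cost({"buses": k, "conflict": 16 / k}, WEIGHTS)


def toy_neighbor(k, rng):
    return min(8, max(1, k + rng.choice((-1, 1))))


best_k, best_c = anneal(1, toy_neighbor, toy_cost)
```

Raising the weight on `"buses"` moves the optimum toward fewer buses, which is the sensitivity to weights noted above.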
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Protocol Parameter Synthesis
Bus-based communication architectures are characterized by several protocol parameters
◦ bus widths, bus clock frequencies, transaction burst sizes, arbitration schemes, buffer sizes
Protocol parameter synthesis determines values for one or more parameters for a fixed topology
◦ while satisfying the constraints of the application
Early work in protocol parameter synthesis focused on determining bus width
◦ Narayan et al. [DATE ‘94]
  for a simple shared bus architecture, traded off bus width against system performance
  no arbitration assumed; traffic conflicts on the shared bus ignored
Protocol Parameter Synthesis
Lahiri et al. [ICCAD ’00] proposed an approach to determine bus protocol parameters as well as component mapping on buses to improve performance
Protocol Parameter Synthesis
Protocol Parameter Synthesis
Step 1: Co-simulate the entire system
◦ assuming completely parallel (conflict-free) communication between cores
◦ generate execution traces
Step 2: Save traces as a communication analysis graph (CAG)
Step 3: Perform performance analysis to generate a communication graph (CG)
◦ represents statistics gathered by performance analysis
◦ a single weight is derived for each edge
Protocol Parameter Synthesis
Step 4: Generate initial component mapping to buses
◦ analyze the CG
◦ calculate the demand of each component on the communication architecture
  demand of a component = sum of weights of its outgoing edges
◦ arrange components in descending order of demand
◦ rank buses in the communication architecture by analyzing the topology template
  higher rank is given to buses that have higher performance and are well connected to the rest of the buses
◦ select the highest ranked component and map it to the bus with the maximum interaction level; repeat until no components are left
Step 5: Generate initial protocol parameters
◦ high arbitration priority for higher ranked components
◦ maximum block transfer size calculated as the weighted average of the size of transactions between components on the bus
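Steps 4 and 5 can be sketched as below. Assumptions: the CG is a dictionary of directed edge weights; "interaction level" is taken to mean the total CG weight between a component and the components already mapped to a bus, with ties broken by bus rank; and the per-bus component `capacity` is a hypothetical addition that keeps the greedy mapping from placing every component on the top-ranked bus.

```python
from collections import defaultdict


def component_demand(cg):
    """Demand of a component = sum of the weights of its outgoing CG edges."""
    demand = defaultdict(float)
    for (src, _dst), weight in cg.items():
        demand[src] += weight
    return dict(demand)


def initial_mapping(cg, buses_by_rank, capacity):
    """Step 4 sketch: highest-demand component first, to the bus it interacts with most."""
    demand = component_demand(cg)
    components = {c for edge in cg for c in edge}
    order = sorted(components, key=lambda c: (-demand.get(c, 0.0), c))
    mapping, load = {}, defaultdict(int)
    for comp in order:
        def interaction(bus):
            return sum(w for (a, b), w in cg.items()
                       if (a == comp and mapping.get(b) == bus)
                       or (b == comp and mapping.get(a) == bus))
        open_buses = [b for b in buses_by_rank if load[b] < capacity]
        if not open_buses:
            raise ValueError("not enough bus capacity for all components")
        # Highest interaction wins; among ties, the earlier (higher-ranked) bus wins.
        bus = max(open_buses, key=lambda b: (interaction(b), -buses_by_rank.index(b)))
        mapping[comp] = bus
        load[bus] += 1
    return mapping


def max_block_transfer_size(transactions):
    """Step 5 sketch: transaction size averaged with weights = occurrence counts.

    transactions: list of (size_in_words, count) pairs for components on one bus.
    """
    total = sum(count for _size, count in transactions)
    return sum(size * count for size, count in transactions) / total


cg = {("CPU", "MEM"): 10.0, ("DMA", "MEM"): 4.0, ("CPU", "IO"): 1.0}
mapping = initial_mapping(cg, ["fast", "slow"], capacity=2)
```

In this toy input, MEM has no outgoing edges and is mapped last, after the capacity limit has filled the fast bus; a real flow would rely on the later move-based refinement to correct such placements.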
Protocol Parameter Synthesis
Step 7: Generate transformations/moves to improve performance
◦ create a communication conflict graph (CCG), where edges between components represent communication overlap
◦ changed congestion levels are used to recalculate the time taken for transactions
◦ the move with the maximum time reduction (potential gain) is selected
◦ repeat until no further improvement is possible
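A simplified version of Step 7 is sketched below. It builds a CCG from per-component transaction intervals (edge weight = total time two components' transactions overlap) and repeatedly applies the single component-to-bus move that removes the most same-bus overlap. Using overlap as a stand-in for time saved is an assumption; the described method recalculates transaction times from the changed congestion levels. All names are hypothetical.

```python
from itertools import combinations


def conflict_graph(intervals):
    """CCG: edge weight = total time two components' transactions overlap.

    intervals: {component: [(start, end), ...]} taken from the execution trace.
    """
    ccg = {}
    for a, b in combinations(sorted(intervals), 2):
        overlap = sum(max(0, min(e1, e2) - max(s1, s2))
                      for s1, e1 in intervals[a] for s2, e2 in intervals[b])
        if overlap > 0:
            ccg[(a, b)] = overlap
    return ccg


def same_bus_conflict(ccg, mapping, comp, bus):
    """Overlap between comp and the other components mapped to bus."""
    return sum(w for (a, b), w in ccg.items()
               if (a == comp and mapping[b] == bus) or (b == comp and mapping[a] == bus))


def improve_mapping(ccg, mapping, buses):
    """Apply the move with the largest gain until no move reduces conflict."""
    mapping = dict(mapping)
    while True:
        best = None
        for comp, current in sorted(mapping.items()):
            for bus in buses:
                if bus == current:
                    continue
                gain = (same_bus_conflict(ccg, mapping, comp, current)
                        - same_bus_conflict(ccg, mapping, comp, bus))
                if gain > 0 and (best is None or gain > best[2]):
                    best = (comp, bus, gain)
        if best is None:
            return mapping
        mapping[best[0]] = best[1]


intervals = {"A": [(0, 10)], "B": [(5, 15)], "C": [(20, 30)]}
ccg = conflict_graph(intervals)
result = improve_mapping(ccg, {"A": "bus1", "B": "bus1", "C": "bus2"}, ["bus1", "bus2"])
```

Each accepted move strictly lowers the total same-bus overlap, so the loop always terminates.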
Protocol Parameter Synthesis
Experimental results
◦ ATM: cell forwarding unit of an output-queued ATM switch, with a fixed topology having three buses connected by two bridges
◦ SYS: simple communication system with two buses connected by a single bridge
Protocol Parameter Synthesis
Shin et al. [DATE ‘04] proposed a methodology to automatically determine the slot schedule for a time division multiple access (TDMA)-based arbitration scheme
Protocol Parameter Synthesis
Objective function
◦ to meet throughput requirements for masters
Protocol Parameter Synthesis
Objective function
◦ to meet throughput and latency requirements for masters
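To make these objectives concrete, the sketch below evaluates a TDMA slot table. A master's throughput follows from the fraction of slots it owns, and a simple latency bound follows from the largest cyclic gap between its slots. A genetic algorithm would use a penalty like this as its fitness. The slot-table representation, the units, and the normalized-shortfall penalty are assumptions for illustration, not Shin et al.'s formulation.

```python
def tdma_metrics(schedule, master, slot_ns, words_per_slot):
    """Throughput (words/ns) and worst-case wait (ns) for one master in a cyclic slot table."""
    n = len(schedule)
    owned = [i for i, owner in enumerate(schedule) if owner == master]
    if not owned:
        return 0.0, float("inf")
    throughput = len(owned) * words_per_slot / (n * slot_ns)
    # Cyclic distance (in slots) from each owned slot to the next; a lone slot wraps to n.
    gaps = [((owned[(k + 1) % len(owned)] - owned[k]) % n) or n for k in range(len(owned))]
    return throughput, max(gaps) * slot_ns


def schedule_penalty(schedule, requirements, slot_ns, words_per_slot):
    """Sum of normalized throughput shortfalls and latency excesses (0 means all met).

    requirements: {master: (min_throughput_words_per_ns, max_wait_ns)}
    """
    penalty = 0.0
    for master, (min_tp, max_wait) in requirements.items():
        tp, wait = tdma_metrics(schedule, master, slot_ns, words_per_slot)
        penalty += max(0.0, min_tp - tp) / min_tp          # throughput term
        penalty += max(0.0, wait - max_wait) / max_wait     # latency term
    return penalty


schedule = ["M1", "M2", "M1", "M3"]
```

Dropping the latency term gives the throughput-only objective.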
Protocol Parameter Synthesis
Experimental results
◦ best results with the following GA parameters: crossover rate of 70%, mutation rate of 25%, population size of 80
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Topology and Protocol Parameter Synthesis
Unlike the previous approaches, a few approaches consider topology and protocol parameter synthesis simultaneously
◦ more comprehensive synthesis
Pandey et al. [FPLA ‘05] proposed a technique to simultaneously synthesize a hierarchical shared bus topology and the width of data buses
◦ while satisfying the performance constraints
◦ using an integer linear programming (ILP) formulation
Pasricha et al. [ASPDAC ‘05] proposed a technique to automate synthesis of a hierarchical bus topology and multiple protocol parameters
◦ data bus widths, bus clock speeds, OO buffer sizes, DMA burst sizes
◦ using several heuristics
Topology and Protocol Parameter Synthesis
Pasricha et al. [ASPDAC ‘06] proposed an automated topology and parameter synthesis methodology for bus matrix architectures
Goal: a minimal cost partial bus matrix tailored to the application
◦ has fewer buses (consequently fewer arbiters, decoders, buffers)
◦ maximizes bus utilization
◦ reduces implementation cost, area and power dissipation
Topology and Protocol Parameter Synthesis
MPSoC designs have performance constraints that can be represented in terms of data throughput constraints
The Communication Throughput Graph, CTG = G(V,A), incorporates SoC components and throughput constraints
A Throughput Constraint Path (TCP) is a CTG sub-graph
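One way to hold a TCP in code is shown below, together with a first-order check of whether the buses carrying its CTG arcs can sustain its throughput constraint. Peak bandwidth (width × frequency) ignores arbitration and contention, so this is only an optimistic screening test. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCP:
    """Throughput Constraint Path: a CTG sub-graph with a minimum data throughput."""
    name: str
    edges: tuple            # ((src, dst), ...) arcs of the CTG on this path
    min_mb_per_s: float


def peak_bandwidth_mb_per_s(width_bits, freq_mhz):
    """Peak bus bandwidth in MB/s: bits per cycle x million cycles per second / 8."""
    return width_bits * freq_mhz / 8


def tcp_feasible(tcp, edge_to_bus, bus_params):
    """True if every bus carrying an arc of the TCP has enough peak bandwidth.

    edge_to_bus: {(src, dst): bus_name}; bus_params: {bus_name: (width_bits, freq_mhz)}
    """
    return all(peak_bandwidth_mb_per_s(*bus_params[edge_to_bus[e]]) >= tcp.min_mb_per_s
               for e in tcp.edges)


path = TCP("video", (("ARM", "MEM1"), ("MEM1", "DMAC")), 400.0)
edge_to_bus = {("ARM", "MEM1"): "main1", ("MEM1", "DMAC"): "main1"}
bus_params = {"main1": (32, 133)}
```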
Topology and Protocol Parameter Synthesis
Communication Parameter Constraint Set (Ψ)
◦ used to ensure that the approach generates a realistic communication architecture
◦ constraints are in the form of a discrete set of valid values for the protocol parameters to be synthesized
◦ e.g., specifying that the clock frequency of a bus can only be a multiple of 33 MHz, up to a maximum of 330 MHz
Allows the designer to bias the synthesis process based on knowledge of the design and the technology being targeted
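The clock-frequency example above translates directly into a discrete value set. In the sketch below, the width values and the rule of snapping a requirement up to the smallest allowed value are hypothetical illustrations of how a synthesis step might consult Ψ.

```python
def multiples_up_to(step, maximum):
    """Discrete set of valid values: step, 2*step, ..., up to maximum."""
    return list(range(step, maximum + 1, step))


# Hypothetical Ψ for two parameters; the width values are an illustrative assumption.
PSI = {
    "bus_freq_mhz": multiples_up_to(33, 330),
    "bus_width_bits": [32, 64, 128],
}


def smallest_valid(param, required, psi=PSI):
    """Smallest allowed value of param that is >= required, or None if none exists."""
    candidates = [v for v in sorted(psi[param]) if v >= required]
    return candidates[0] if candidates else None


print(smallest_valid("bus_freq_mhz", 100))  # 132
```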
Topology and Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
B&B goal: cluster slave modules to minimize matrix cost
Start by clustering two slave clusters at a time
◦ initially, each slave cluster has only one slave
However, the total number of clustering configurations possible for n slaves is
  $\binom{n}{2} + \binom{n}{2}\binom{n-1}{2} + \binom{n}{2}\binom{n-1}{2}\binom{n-2}{2} + \cdots + \frac{n!\,(n-1)!}{2^{n-1}}$
◦ an extremely large number, even for medium-sized SoCs!
To quickly prune out invalid clustering configurations and converge on an optimal solution, a powerful bounding function is used
Bounding function
◦ called after every clustering operation
◦ uses a lookup table to discard duplicate clustering operations
◦ discards all non-beneficial clustering operations (i.e., no savings in the number of buses)
◦ discards incompatible clustering operations
  e.g., mergers of buses with conflicting bus speeds
◦ discards clusterings that cannot theoretically support the bandwidth requirements
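Both the size of the search space and the pruning rules can be made concrete. The first function below evaluates the series term by term (each term multiplies in the number of ways to pick the next pair of clusters). The second is a hypothetical bounding check. It assumes a bus is needed for every master that accesses a slave cluster, so merging two clusters saves one bus per master that accesses both. That cost model, the per-slave speed and bandwidth inputs, and the single `capacity` value are illustrative assumptions.

```python
from math import comb


def clustering_configurations(n):
    """Sum of C(n,2), C(n,2)C(n-1,2), ..., up to n!(n-1)!/2^(n-1)."""
    total, term = 0, 1
    for k in range(n, 1, -1):        # k = n, n-1, ..., 2
        term *= comb(k, 2)
        total += term
    return total


def bound_ok(partition, a, b, masters, speed, demand, capacity, seen):
    """Return True if merging slave clusters a and b survives the bounding function.

    partition: frozenset of clusters, each a frozenset of slave names.
    masters/speed/demand: per-slave accessing masters, required clock (MHz), bandwidth (MB/s).
    seen: set of partitions already explored (the lookup table); updated in place.
    """
    merged = a | b
    result = (partition - {a, b}) | {merged}
    if result in seen:                                  # duplicate clustering operation
        return False
    seen.add(result)
    masters_a = set().union(*(masters[s] for s in a))
    masters_b = set().union(*(masters[s] for s in b))
    if not masters_a & masters_b:                       # no shared master: no bus saved
        return False
    if len({speed[s] for s in merged}) > 1:             # conflicting bus speeds
        return False
    return sum(demand[s] for s in merged) <= capacity   # bandwidth must be supportable


slaves = {name: frozenset({name}) for name in ("S1", "S2", "S3")}
partition = frozenset(slaves.values())
masters = {"S1": {"CPU"}, "S2": {"CPU"}, "S3": {"DMA"}}
speed = {"S1": 133, "S2": 133, "S3": 133}
demand = {"S1": 100, "S2": 200, "S3": 50}
seen = set()
results = [
    bound_ok(partition, slaves["S1"], slaves["S2"], masters, speed, demand, 400, seen),
    bound_ok(partition, slaves["S1"], slaves["S2"], masters, speed, demand, 400, seen),
    bound_ok(partition, slaves["S1"], slaves["S3"], masters, speed, demand, 400, seen),
]
```

In the usage, the first merge is accepted, the repeat is rejected by the lookup table, and S1 with S3 is rejected because no master accesses both.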
Topology and Protocol Parameter Synthesis
Experimental results on four MPSoC applications from the networking domain
Significant matrix component savings
◦ 4.6x to 9x when compared with a full bus matrix
Topology and Protocol Parameter Synthesis
The methodology was extended by Pasricha et al. [CODES+ISSS ‘06] to synthesize bus matrix topology and protocol parameters
◦ with the incorporation of energy estimation models for bus wires and bus logic components
Goal: generate multiple candidate bus matrix solutions on which to perform a power-performance trade-off analysis
The methodology was applied to an MPSoC application
Topology and Protocol Parameter Synthesis
CTG
Results
◦ a trade-off of up to 20% in power and 40% in performance is possible
◦ a trade-off of up to 8% in runtime and 15% in energy is possible
Topology and Protocol Parameter Synthesis
Pasricha et al. [VLSID ‘08] further extended this synthesis methodology by incorporating a PVT (process, voltage, temperature) variation-aware power estimation technique
Incorporating PVT variation-awareness in the system-level bus matrix synthesis technique resulted in a set of curves for power and energy in the trade-off graph outputs
◦ instead of a single curve for power and energy
This allowed a more accurate power characterization in the face of PVT variations early in the design flow
◦ enabling designers to make more informed decisions when selecting a bus matrix configuration
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Physically-aware Synthesis
Most synthesis approaches design the communication architecture without considering physical implementation issues that can influence performance
◦ such as the layout of the components on the chip or the lengths and routing of the bus wires interconnecting components
Physical-level information can be extremely important to guarantee that the synthesis results are reliable
However, such physical-level information is typically available much later in the design flow
◦ it is challenging to abstract this information up to the early stage of the design flow where communication architecture design takes place
A few approaches have looked at this problem of physically aware synthesis
Physically-aware Synthesis
Dick et al. [DATE ‘99] proposed a physically aware topology synthesis technique to ensure that hard real-time communication deadlines between components were satisfied
◦ used a high-level floorplanner to create a block placement and estimate global wiring delays
◦ a genetic algorithm (GA) was used to iterate over different bus topology configurations having low-contention task assignments on components
Drinic et al. [ICCAD ‘00] and Meguerdichian et al. [DAC ‘01] used a high-level floorplanner to determine design feasibility during bus topology synthesis
◦ compared estimates of wire length with an upper bound on wire length
◦ does not account for the varying capacitive loads of components on a bus
Physically-aware Synthesis
Thepayasuwan et al. [ICCD ‘03] proposed a topology synthesis framework that used a high-level floorplanner to obtain wire lengths
◦ lengths are incorporated into an SA cost function that is used to synthesize the bus topology
◦ SA minimizes the cost function and selects a topology solution with low total wire length
Guo et al. [ASPDAC ‘06] used a high-level floorplanner during segmented bus topology synthesis
◦ the floorplanner aims to reduce the length of critical wires with high switching activity to reduce wire energy consumption
Pasricha et al. [CODES+ISSS ‘06] used a high-level floorplanner to obtain wire lengths for estimating wire energy
◦ during bus matrix topology and parameter synthesis
Physically-aware Synthesis
Pasricha et al. [DAC ‘05] proposed a physically aware hierarchical bus topology and protocol parameter synthesis technique (FABSYN)
◦ detects and eliminates clock cycle timing violations
Example: to meet performance constraints, the bus clock speed is set to 333 MHz (≈3 ns cycle time)
After layout, the signal delay is 3.5 ns, which violates the 3 ns clock timing constraint!
◦ adverse effect on cost, complexity, constraint satisfiability
To eliminate such violations, designers use repeaters and pipeline elements
◦ can severely affect performance and power
◦ requires considerable manual RTL rework and re-verification
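The timing check in this example is simple arithmetic: at 333 MHz the period is about 3.0 ns, which a 3.5 ns post-layout delay exceeds. The sketch below reproduces that check and also finds the fastest clock that still covers the delay. The list of allowed frequencies and the helper names are hypothetical.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def has_timing_violation(delay_ns, freq_mhz):
    """A bus violates timing if its post-layout signal delay exceeds one clock period."""
    return delay_ns > cycle_time_ns(freq_mhz)


def highest_safe_frequency(delay_ns, allowed_mhz):
    """Fastest allowed clock whose period still covers the delay, or None."""
    safe = [f for f in allowed_mhz if not has_timing_violation(delay_ns, f)]
    return max(safe) if safe else None


allowed = [66, 100, 133, 166, 200, 266, 333]      # hypothetical frequency options
print(has_timing_violation(3.5, 333))             # True
print(highest_safe_frequency(3.5, allowed))       # 266
```

Lowering the clock avoids repeaters and pipeline stages but gives up bandwidth, which is why FABSYN tries component migration first.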
[Figure: SoC floorplan with ARM, ITCM, DTCM, DMAC, IP1, IP2, ASIC1, ASIC2, MEM1–MEM4]
Physically-aware Synthesis
Physically-aware Synthesis
Simple bus mapping
[Figure: bus mapping]
Physically-aware Synthesis
Mutate topology
[Figure: create new bus and/or migrate IPs]
Physically-aware Synthesis
If a timing violation is detected
◦ TCPs that have components on buses with violations are flagged
◦ a feedback loop is used to go back and attempt to eliminate the violations
◦ first, from the flagged TCPs, the TCP that has components on the violated bus with the largest load capacitance on their pins is selected
  since the cumulative capacitive load of components directly contributes to increasing the signal propagation delay
◦ the components are iteratively migrated to another existing bus, or to a new bus if migration to existing buses causes TCP constraint violations
◦ if there is still a violation, another flagged TCP is selected and its components are migrated away from the violated bus
◦ another way to eliminate clock cycle violations is to reduce the bus clock frequency
  which increases cycle times
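The migration loop can be sketched as follows, with one component moved per iteration. Assumptions: bus delay uses a toy lumped-RC model (driver resistance times wire plus pin capacitance, so kΩ × pF gives ns), the TCP constraint check is supplied by the caller as `tcp_ok`, and new buses are named `new1`, `new2`, and so on. All names and parameter values are hypothetical.

```python
def lumped_rc_delay_ns(bus, mapping, pin_cap_pf, r_kohm=0.1, wire_cap_pf=20.0):
    """Toy delay model: R x (wire + sum of pin capacitances); kOhm x pF = ns."""
    load = wire_cap_pf + sum(pin_cap_pf[c] for c, b in mapping.items() if b == bus)
    return r_kohm * load


def eliminate_violations(mapping, tcps, pin_cap_pf, delay_of, cycle_ns,
                         tcp_ok=lambda m: True):
    """Migrate components off violated buses, one component per iteration.

    mapping: {component: bus}; tcps: {tcp_name: set_of_components}
    Returns the new mapping, or None if a single component alone still violates
    (reducing the bus clock frequency is then the remaining option).
    """
    mapping = dict(mapping)
    while True:
        buses = set(mapping.values())
        violated = sorted(b for b in buses if delay_of(b, mapping) > cycle_ns)
        if not violated:
            return mapping
        bus = violated[0]
        on_bus = {c for c, b in mapping.items() if b == bus}
        if len(on_bus) == 1:
            return None
        # Flagged TCPs: those with components on the violated bus, ranked by pin load there.
        groups = [tcps[t] & on_bus for t in sorted(tcps) if tcps[t] & on_bus]
        groups = groups or [{c} for c in sorted(on_bus)]
        group = max(groups, key=lambda g: sum(pin_cap_pf[c] for c in g))
        comp = max(sorted(group), key=lambda c: pin_cap_pf[c])
        # Prefer an existing bus that stays within the cycle and keeps TCPs satisfied.
        for target in sorted(buses - {bus}):
            trial = {**mapping, comp: target}
            if delay_of(target, trial) <= cycle_ns and tcp_ok(trial):
                mapping = trial
                break
        else:
            n = 1
            while f"new{n}" in buses:
                n += 1
            mapping[comp] = f"new{n}"


pins = {"CPU": 10.0, "DMA": 8.0, "MEM1": 6.0, "MEM2": 4.0}
fixed = eliminate_violations(
    {c: "main1" for c in pins}, {"T1": {"CPU", "MEM1"}, "T2": {"DMA", "MEM2"}}, pins,
    delay_of=lambda b, m: lumped_rc_delay_ns(b, m, pins), cycle_ns=4.0)
```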
Physically-aware Synthesis
Synthesized hierarchical bus architecture
Parameter values
Parameter          main1   main2   main3   periph
bus width (bits)   32      32      32      32
bus speed (MHz)    133     133     133     66
arb priority       CPU1 > M3 > M2 (static)
Physically-aware Synthesis
Experimental study
CTG
Constraint Set
Physically-aware Synthesis
The quality of the FABSYN synthesis solution was compared with other synthesis approaches
◦ Initial: solution with just 2 buses (initial mapping)
◦ ABS: synthesis approach without integrated floorplanners
◦ Manual: designer-driven manual synthesis approach with floorplanner
Outline
Introduction
Topology Synthesis
Protocol Parameter Synthesis
Topology and Protocol Parameter Synthesis
Physically aware Synthesis
Co-synthesis with Memory
Co-synthesis with Memory
Memory can take up a large chunk of on-chip area, as much as 70% in some cases
◦ estimates indicate that this will go up to 90% in coming years
A variety of memory types is available to satisfy storage requirements in MPSoC applications
◦ DRAMs, SRAMs, EPROMs, EEPROMs, etc.
Typically
◦ DRAMs -> larger memory requirements, slower, cheaper
◦ SRAMs -> smaller memory requirements, faster, more expensive
◦ EPROMs and EEPROMs -> read-only data
Several tradeoffs during memory architecture synthesis
◦ SRAM vs. DRAM
  cost vs. performance vs. area
◦ number of ports vs. number of memory blocks
Co-synthesis with Memory
Memory architecture synthesis determines the
◦ number, type, and size of the memories in the system
◦ mapping of application data to memories
The memory architecture contributes significantly to data traffic on the communication architecture
The design of the memory architecture has a substantial influence on communication architecture design
Traditionally, in platform-based design, memory synthesis is performed before communication architecture synthesis
◦ can lead to inferior design decisions
Co-synthesis with Memory
Motivational study (Pasricha et al. [DATE ‘06])
MPSoC memory and comm. architecture synthesis
Co-synthesis
Separate synthesis
Co-synthesis with Memory
Shalan et al. [SASIMI ‘03] proposed a tool to automatically generate a full crossbar and a dynamic memory management unit
Grun et al. [DATE ‘02] considered the system connectivity topology early in the design flow, in conjunction with memory exploration, for simple processor–memory systems
◦ the most active access patterns are extracted from application data structures
◦ memory architecture configurations that can match the needs of the access patterns are obtained, assuming a simple connectivity model
◦ next, different communication architectures are considered for these memory architecture configurations, and the most suitable interconnect and memory architecture is selected from a Pareto-optimal curve
Srinivasan et al. [DATE ‘05] presented an approach to simultaneously consider bus topology splitting and memory bank partitioning during synthesis
◦ with the goal of reducing system energy
Co-synthesis with Memory
Pasricha et al. [DATE ‘06] proposed the COSMECA methodology for memory and communication architecture synthesis
◦ synthesizes bus matrix topology and protocol parameters
Goal: obtain a least cost system, having a minimal number of buses, while satisfying performance and memory area constraints
COSMECA selects memory blocks from a library populated by several types of memories
◦ on-chip SRAMs, DRAMs, EPROMs, EEPROMs, …
Each memory type can have variants in the library, with different
◦ capacities, areas, ports, operating frequencies and access times
Memory synthesis in COSMECA
◦ selects appropriate physical memories from the library
◦ maps application arrays and scalars to the selected physical memories
Co-synthesis with Memory
Application memory requirements are initially represented by abstract data blocks (DBs) in a CTG
DBs are initially grouped together into virtual memories (VMs)
Co-synthesis with Memory
DBs are merged at this initial step only if they have
◦ similar edges (i.e., edges from the same masters) and
◦ non-overlapping access
Subsequently, the enhanced CTG with VMs is used as input to a branch and bound-based bus matrix synthesis framework to generate a minimal cost solution
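The initial DB merging rule can be sketched as a first-fit grouping. Representing a DB by its set of accessing masters and a list of access-lifetime intervals is an assumption made for illustration. The rule above only requires edges from the same masters and non-overlapping access.

```python
def _overlaps(x, y):
    """Half-open intervals [start, end) overlap."""
    return x[0] < y[1] and y[0] < x[1]


def group_into_vms(dbs):
    """Greedy first-fit grouping of DBs into VMs.

    dbs: {db_name: (masters, [(start, end), ...])}, where masters is the set of masters
    with an edge to the DB and the intervals are its access lifetimes.
    A DB joins a VM only if it has the same masters and no overlapping access.
    """
    vms = []
    for name in sorted(dbs):
        masters, intervals = dbs[name]
        for vm in vms:
            if vm["masters"] == set(masters) and not any(
                    _overlaps(i, j) for i in intervals for j in vm["intervals"]):
                vm["intervals"].extend(intervals)
                vm["dbs"].append(name)
                break
        else:
            vms.append({"masters": set(masters), "intervals": list(intervals), "dbs": [name]})
    return [vm["dbs"] for vm in vms]


dbs = {
    "A": ({"CPU"}, [(0, 10)]),
    "B": ({"CPU"}, [(10, 20)]),   # same master, starts when A ends: can share a VM
    "C": ({"CPU"}, [(5, 15)]),    # overlaps both A and B
    "D": ({"DMA"}, [(30, 40)]),   # different master
}
```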
Co-synthesis with Memory
A heuristic is used to map VMs to physical memories from the library
◦ finds N solutions that satisfy the memory area and performance constraints of the design
After simulating the best solution, memory access traces are generated and used to determine the extent of access overlap of VMs at each slave access point (SAP)
If the overlap is below a user-defined overlap threshold T, the VMs are merged
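The threshold-based merge at one SAP can be sketched as below. The overlap metric (shared access cycles divided by the access count of the less active VM), the greedy pairwise merge order, and the "a+b" naming of merged VMs are hypothetical choices; the described method only specifies that VMs whose overlap is below T are merged.

```python
from itertools import combinations


def access_overlap(cycles_a, cycles_b):
    """Fraction of the less active VM's access cycles that coincide with the other VM's."""
    a, b = set(cycles_a), set(cycles_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_vms_at_sap(traces, threshold):
    """Greedily merge VM pairs at one SAP whose access overlap is below threshold T.

    traces: {vm_name: iterable of cycles in which the VM is accessed}.
    Merged VMs are named "a+b" and carry the union of their access cycles.
    """
    groups = {vm: set(cycles) for vm, cycles in traces.items()}
    merged = True
    while merged:
        merged = False
        for a, b in combinations(sorted(groups), 2):
            if access_overlap(groups[a], groups[b]) < threshold:
                groups[f"{a}+{b}"] = groups.pop(a) | groups.pop(b)
                merged = True
                break
    return groups


traces = {"VM1": [1, 2, 3, 4], "VM2": [3, 4, 5, 6], "VM3": [10, 11]}
```

A larger T merges more aggressively: here T = 0.3 merges only VM1 with VM3, while T = 0.6 merges all three.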
Co-synthesis with Memory
VMs are then mapped to physical memories from the library
Initially, the best memory from the library is selected for a VM: one that fits the capacity requirements and has the maximum port bandwidth
If performance constraints are not met even with the best-performing memory, the matrix solution is discarded
◦ the next best matrix solution from the set of (ranked) matrix solutions is selected
If performance and memory area constraints are met, the solution is added to the final solution database
Next, to lower memory area, VMs at SAPs are randomly selected and the mapped physical memory is replaced with one that meets capacity requirements and has lower area
◦ if a violation is detected, the move is reversed; otherwise the solution is kept
◦ the procedure is repeated iteratively until N solutions are obtained
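The two phases above (best-bandwidth initial selection, then random area-reducing swaps with rollback) can be sketched as follows. Assumptions: each VM has a capacity and a required bandwidth, the performance check compares only port bandwidth with that requirement, and the `Memory` fields and library entries are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    capacity_kb: int
    area_mm2: float
    port_bw: float        # MB/s


def total_area(solution):
    return sum(m.area_mm2 for m in solution.values())


def initial_assignment(vms, library):
    """Best memory per VM: fits the capacity and has maximum port bandwidth.

    vms: {vm: (capacity_kb, required_bw)}. Returns None if even the best memory
    misses a performance requirement (the matrix solution would then be discarded).
    """
    solution = {}
    for vm, (cap, bw) in vms.items():
        fits = [m for m in library if m.capacity_kb >= cap]
        if not fits:
            return None
        best = max(fits, key=lambda m: (m.port_bw, -m.area_mm2))
        if best.port_bw < bw:
            return None
        solution[vm] = best
    return solution


def reduce_area(solution, vms, library, max_area, n_solutions, steps=1000, seed=0):
    """Random area-reducing swaps; a swap that breaks a bandwidth requirement is reversed."""
    rng = random.Random(seed)
    solution = dict(solution)
    found = [dict(solution)] if total_area(solution) <= max_area else []
    for _ in range(steps):
        if len(found) >= n_solutions:
            break
        vm = rng.choice(sorted(solution))
        cap, bw = vms[vm]
        smaller = [m for m in library
                   if m.capacity_kb >= cap and m.area_mm2 < solution[vm].area_mm2]
        if not smaller:
            continue
        previous = solution[vm]
        solution[vm] = rng.choice(smaller)
        if solution[vm].port_bw < bw:
            solution[vm] = previous          # violation: reverse the move
        elif total_area(solution) <= max_area:
            found.append(dict(solution))
    return found


lib = [Memory("SRAM_fast", 64, 2.0, 800.0), Memory("SRAM_small", 64, 1.2, 400.0),
       Memory("DRAM", 256, 1.0, 200.0)]
vms = {"VM1": (32, 300.0), "VM2": (128, 100.0)}
solutions = reduce_area(initial_assignment(vms, lib), vms, lib, max_area=3.0, n_solutions=2)
```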
Co-synthesis with Memory
Experiments with MPSoC applications
◦ shown below: PYTHON application synthesis
Co-synthesis with Memory
Trade-off curve between number of buses and memory area
Impact of threshold value
Co-synthesis with Memory
COSMECA saves 25–40% in the number of buses in the matrix and 17–29% in memory area compared to the traditional approach
Co-synthesis with Memory
Meyer et al. [CODES+ISSS ‘07] attempted to extend COSMECA by adding layout-awareness during co-synthesis
◦ co-synthesis is performed using an SA-based algorithm
Results indicate a 20–27% cost reduction for a synthetic DSP software pipeline case study
◦ compared to an approach that separately allocates memory and synthesizes buses
A few limitations
◦ only bus topology synthesis is performed; bus parameter synthesis is neglected
◦ memory synthesis does not consider different memory types; only SRAM memories are supported
Summary
Designers need techniques that can efficiently explore the increasingly intractable communication architecture design space
◦ to satisfy and optimize constraints during communication architecture design
Presented research on techniques for efficient bus-based communication architecture synthesis
◦ scope to extend synthesis techniques for emerging applications
Many open problems remain to be solved, especially in the areas of low-level physical and circuit-level synthesis approaches (refer to the book chapter for more details)
◦ wire metal layer assignment
◦ wire sizing optimization
◦ inductance estimation
◦ timing-driven floorplanning
◦ shield wire insertion algorithms