On-Chip Communication Architectures
Synthesis Techniques

ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 6

© 2008 Sudeep Pasricha & Nikil Dutt

Outline

Introduction

Topology Synthesis

Protocol Parameter Synthesis

Topology and Protocol Parameter Synthesis

Physically aware Synthesis

Co-synthesis with Memory

Introduction

Designing on-chip communication architectures is becoming more and more challenging
◦ increasing number of components in today's systems translates into more inter-component communication

Multi-dimensional design constraints
◦ ↑ performance, reliability
◦ ↓ power, cost, area, time-to-market

System designers need techniques that can
◦ optimize for individual design goals
◦ allow design decisions to provide a good balance between other design goals

Introduction

Exploration and synthesis techniques can broadly be classified into 3 categories:
◦ Static, dynamic, hybrid

Commercial toolkits available for standard bus architectures
◦ AMBA Designer/Design Kit
◦ STBus GenKit
◦ Sonics Studio

Not very useful for automating exploration and synthesizing communication architectures that satisfy diverse design constraints

Introduction

Bus Architecture Synthesis:
◦ process of designing a bus architecture topology and/or its protocol parameters to satisfy application constraints

[Figure: bus architecture synthesis. An application with masters (CPU1, M2, M3), memories (MEM1 to MEM3) and slaves (S1 to S4), together with constraints (performance, power, cost, area, reliability), is the input. Synthesis explores a topology space (candidate arrangements of main and peripheral buses connected by bridges) and a parameter space (arbitration strategy, data bus widths, bus clock frequencies, buffer sizes).]


Topology Synthesis

Topology of a bus-based on-chip communication architecture determines
◦ number of buses in the system
◦ manner in which they are interconnected to each other
◦ how components are allocated to the buses

Early work focused on allocating inter-component communication to buses for distributed real-time embedded systems
◦ Yen et al. [ICCAD '95] proposed techniques to estimate communication delay on a bus using static analysis for a system with periodic tasks
  ▪ assigned a PE to an existing bus, or created a new bus, to meet task deadlines
◦ Ortega et al. [ICCAD '98] explored mapping of PEs into a set of off-chip bus architecture configurations (shared buses or point-to-point) and protocols (such as CAN or I2C)
  ▪ explored different performance vs. cost tradeoffs

Topology Synthesis

Liveris et al. [DATE '04] proposed a bus topology synthesis technique to reduce bus power consumption while meeting latency constraints
◦ AMBA AHB bus architecture
◦ Simple FIFO arbitration
◦ Dynamic power reduction
◦ Switching activity α is taken as 0.5 for data bus, and a lower value for address bus
  ▪ control wire switching is ignored
◦ Each master has a latency constraint that determines number of cycles available to complete a communication operation
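To make the role of the switching activity α concrete, here is a minimal sketch using the standard CMOS dynamic power relation P = α · C · Vdd² · f. The slide only gives α = 0.5 for the data bus; the function name, capacitance, supply voltage, clock frequency and bus width below are assumptions chosen for illustration, not values from Liveris et al.

```python
def dynamic_wire_power(alpha, capacitance_f, vdd_v, freq_hz):
    """Dynamic switching power in watts: P = alpha * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


# Only DATA_BUS_ALPHA comes from the slide; everything else is assumed.
DATA_BUS_ALPHA = 0.5   # switching activity quoted for the data bus
wire_cap = 1e-12       # 1 pF per wire (assumed)
vdd = 1.0              # supply voltage in volts (assumed)
freq = 100e6           # 100 MHz bus clock (assumed)

per_wire = dynamic_wire_power(DATA_BUS_ALPHA, wire_cap, vdd, freq)
bus_power = 32 * per_wire  # 32 data wires (assumed width)

print(f"per wire: {per_wire * 1e6:.1f} uW, data bus: {bus_power * 1e3:.2f} mW")
```

Under this model, a transformation that lowers the capacitive load on the shared wires (for example by reducing fanout) lowers power proportionally.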


Topology Synthesis

To improve latency response of communication architecture and also reduce power consumption on the bus wires, Liveris et al. proposed using 3 different topology transformations

Topology Synthesis

Private slave creation
◦ making a slave private to a master is possible if the master is the only one accessing the slave
◦ removes a slave from the shared bus, which reduces the fanout by one for all the signals driven by the AMBA logic
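The eligibility test for private slave creation can be sketched as follows. The access map, the component names and the helper `find_private_slaves` are hypothetical; the sketch only encodes the rule stated above (a slave is eligible when exactly one master accesses it).

```python
def find_private_slaves(accesses):
    """Return {slave: master} for slaves accessed by exactly one master.

    `accesses` maps each master name to the set of slaves it accesses.
    """
    users = {}
    for master, slaves in accesses.items():
        for slave in slaves:
            users.setdefault(slave, set()).add(master)
    return {s: next(iter(m)) for s, m in users.items() if len(m) == 1}


# Hypothetical access pattern, for illustration only.
accesses = {
    "M1": {"S1", "S2"},
    "M2": {"S2", "S3"},
    "M3": {"S2"},
}
private = find_private_slaves(accesses)
print(private)  # S1 is private to M1, S3 to M2; S2 stays on the shared bus
```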


Topology Synthesis

Slave isolation
◦ Moving a slave to another layer

Topology Synthesis

Grouping masters
◦ Moving masters to another layer to reduce arbitration conflict

Topology Synthesis

Synthesis heuristic
◦ initially, all masters and slaves are mapped to a single layer
◦ private slave creation transformation is applied for all eligible slaves
◦ in case a latency violation exists for a master, slave isolation transformation is applied to the slowest slave
◦ if violation persists, grouping masters transformation is performed by transferring masters with less stringent latency requirements to a new layer
◦ once a solution that satisfies latency constraints is obtained, slave isolation and grouping masters transformations are performed to reduce power
◦ at every iteration power of current solution is calculated
  ▪ by using probability-based formulations to estimate switching activity on the wires
◦ transformations are carried out till no more improvement is obtainable

Topology Synthesis

Heuristic was implemented in C and applied to
◦ Sobel Transform SoC
  ▪ 29.6% less power

Topology Synthesis

Murali et al. [DATE '05] proposed a methodology for STBus crossbar (matrix) synthesis

Compared to a full crossbar, a partial crossbar has
◦ fewer communication components (buses, arbiters, decoders, etc.), lower area, reduced power consumption

Goal:
◦ design a minimal cost partial crossbar bus architecture for a given MPSoC application
◦ average and maximum packet latencies must lie within acceptable bounds from the latencies obtained for a full crossbar

Topology Synthesis

Phase 1: SystemC simulation
◦ window-based traffic analysis -> window size is parametrizable

Phase 2: Preprocessing to identify
◦ overlapping critical traffic streams to be mapped to separate buses
◦ targets with large traffic overlap in a window to map to separate buses
◦ max. no. of targets to be connected to a bus (to bound max. latency)

Phase 3: MILP based partial crossbar generation
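One way to picture the window-based analysis is sketched below. It assumes, purely for illustration, that the simulation trace has been reduced to a set of busy cycle numbers per target, and it measures overlap as the number of cycles in a window where two targets are busy at once. The trace format, the overlap metric and the function name are assumptions, not the formulation of Murali et al.

```python
from collections import defaultdict
from itertools import combinations


def window_overlap(busy_cycles, window_size):
    """Count, per window, the cycles in which two targets are both busy.

    `busy_cycles` maps a target name to the set of cycle numbers in which
    it receives traffic. Returns {(target_a, target_b): {window: count}}.
    """
    overlap = defaultdict(lambda: defaultdict(int))
    for a, b in combinations(sorted(busy_cycles), 2):
        for cycle in busy_cycles[a] & busy_cycles[b]:
            overlap[(a, b)][cycle // window_size] += 1
    return {pair: dict(wins) for pair, wins in overlap.items()}


# Hypothetical trace, for illustration only.
trace = {
    "T1": {0, 1, 2, 3, 10, 11},
    "T2": {2, 3, 4, 11, 12},
    "T3": {20, 21},
}
result = window_overlap(trace, window_size=8)
print(result)
```

Pairs with a large count in some window are candidates for separate buses; pairs that never overlap, such as T3 with the others here, can share one.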


Topology Synthesis

Applied methodology to synthetic MPSoC applications

Topology Synthesis

Thepayasuwan et al. [DATE '04] proposed a simulated annealing (SA)-based approach to synthesize a hierarchical shared bus architecture topology
◦ cost function accounts for criteria such as number of buses, communication conflict, and bus utilization
◦ SA based optimization depends on weights in cost function

Yoo et al. [ASPDAC '07] presented an SA-based approach for synthesizing a cascaded crossbar

Topology synthesis for segmented bus was presented by Guo et al. [ASPDAC '06] to
◦ obtain a solution with minimum wire energy
◦ generate a set of solutions to trade-off chip area, energy, delay


Protocol Parameter Synthesis

Bus-based communication architectures are characterized by several protocol parameters
◦ bus widths, bus clock frequencies, transaction burst sizes, arbitration schemes, buffer sizes

Protocol parameter synthesis determines values for one or more parameters for a fixed topology
◦ while satisfying constraints of the application

Early work in protocol parameter synthesis focused on determining bus width
◦ Narayan et al. [DATE '94]
  ▪ for simple shared bus architecture
  ▪ trade-off bus width with system performance
  ▪ no arbitration assumed; traffic conflict on shared bus ignored

Protocol Parameter Synthesis

Lahiri et al. [ICCAD '00] proposed an approach to determine bus protocol parameters as well as component mapping on buses to improve performance

Protocol Parameter Synthesis

Step 1: Co-simulate entire system
◦ assuming completely parallel (conflict-free) communication between cores
◦ generate execution traces

Step 2: Save traces as a communication analysis graph (CAG)

Step 3: Performance analysis to generate communication graph (CG)
◦ Represents statistics gathered by performance analysis
◦ Single weight derived for each edge

Protocol Parameter Synthesis

Step 4: Generate initial component mapping to buses
◦ analyze CG
◦ calculate demand from component on comm. architecture
  ▪ demand of component = sum of weights of outgoing edges
◦ arrange components in a descending order of demand
◦ rank buses in comm. architecture by analyzing topology template
  ▪ higher rank is given to buses that have higher performance and are well connected to the rest of the buses
◦ Select highest ranked component and map to bus with maximum interaction level; repeat till no more components left

Step 5: Generate initial protocol parameters
◦ High arbitration priority for higher ranked component
◦ Maximum block transfer size calculated as weighted average of the size of transactions between components on the bus
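The demand calculation and ordering in Step 4 can be sketched as below. The edge-list representation of the CG, the component names and the weights are hypothetical; ties are broken alphabetically here only to keep the output deterministic, which is an assumption rather than part of the method.

```python
def rank_by_demand(cg_edges):
    """Order components by demand, highest first.

    `cg_edges` is a list of (source, destination, weight) tuples taken
    from the communication graph. Demand of a component is the sum of
    the weights on its outgoing edges.
    """
    demand = {}
    for src, dst, weight in cg_edges:
        demand[src] = demand.get(src, 0) + weight
        demand.setdefault(dst, 0)  # components with no outgoing edges
    # Descending demand; alphabetical tie-break is an assumption.
    return sorted(demand.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical communication graph, for illustration only.
edges = [
    ("CPU", "MEM", 40),
    ("CPU", "DMA", 10),
    ("DMA", "MEM", 30),
    ("DSP", "MEM", 30),
]
ranking = rank_by_demand(edges)
print(ranking)
```

The resulting order is the one in which components would be considered for mapping, and it also suggests the initial arbitration priorities of Step 5.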


Protocol Parameter Synthesis

Step 7: Generate transformations/moves to improve performance
◦ Create communication conflict graph (CCG) where edges between components represent communication overlap
◦ Changed congestion levels used to recalculate time taken for transactions
◦ Move with maximum time reduction (potential gain) is selected
◦ Repeat till no more improvement possible

Protocol Parameter Synthesis

Experimental results
◦ ATM: cell forwarding unit of an output queued ATM switch, with a fixed topology having three buses connected by two bridges
◦ SYS: simple communication system with two buses connected by a single bridge

Protocol Parameter Synthesis

Shin et al. [DATE '04] proposed a methodology to automatically determine slot schedule for a time division multiple access (TDMA)-based arbitration scheme

Protocol Parameter Synthesis

Objective function
◦ To meet throughput requirements for masters

Protocol Parameter Synthesis

Objective function
◦ To meet throughput and latency requirements for masters

Protocol Parameter Synthesis

Experimental results
◦ Best results with following GA parameters: crossover rate of 70%, mutation rate of 25%, population size of 80%


Topology and Protocol Parameter Synthesis

Unlike previous approaches, a few approaches consider both topology and protocol parameter synthesis simultaneously
◦ more comprehensive synthesis

Pandey et al. [FPLA '05] proposed a technique to simultaneously synthesize hierarchical shared bus topology and width of data buses
◦ while satisfying the performance constraints
◦ using integer linear programming (ILP) formulation

Pasricha et al. [ASPDAC '05] proposed a technique to automate synthesis of hierarchical bus topology and multiple protocol parameters
◦ data bus widths, bus clock speeds, OO buffer sizes, DMA burst sizes
◦ using several heuristics

Topology and Protocol Parameter Synthesis

Pasricha et al. [ASPDAC '06] proposed automated topology and parameter synthesis methodology for bus matrix architectures

Goal: minimal cost partial bus matrix tailored to application
◦ Has fewer buses (consequently fewer arbiters, decoders, buffers)
◦ Maximizes bus utilization
◦ Reduces implementation cost, area and power dissipation

Topology and Protocol Parameter Synthesis

MPSoC designs have performance constraints that can be represented in terms of Data Throughput Constraints

Communication Throughput Graph, CTG = G(V,A) incorporates SoC components and throughput constraints

Throughput Constraint Path (TCP) is a CTG sub-graph

Topology and Protocol Parameter Synthesis

Communication Parameter Constraint Set (Ψ)
◦ Used to ensure that approach generates realistic communication architecture
◦ constraints are in the form of a discrete set of valid values for protocol parameters to be synthesized
◦ e.g., specifying that bus clock frequency for a bus can only be multiples of 33 MHz, up to a maximum of 330 MHz

Allows designer to bias synthesis process based on knowledge of design and technology being targeted
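A constraint set of this kind can be written down as plain discrete lists. In the sketch below only the bus clock rule (multiples of 33 MHz up to 330 MHz) comes from the slide; the dictionary layout, the parameter names, the bus width values and the helper `is_valid` are assumptions for illustration.

```python
MAX_BUS_FREQ_MHZ = 330
FREQ_STEP_MHZ = 33

# Discrete sets of valid values. Only the frequency rule is from the slide.
constraint_set = {
    "bus_clock_mhz": list(
        range(FREQ_STEP_MHZ, MAX_BUS_FREQ_MHZ + 1, FREQ_STEP_MHZ)
    ),
    "data_bus_width_bits": [8, 16, 32, 64],  # assumed for illustration
}


def is_valid(parameter, value):
    """True when `value` is an allowed setting for `parameter`."""
    return value in constraint_set[parameter]


print(constraint_set["bus_clock_mhz"])
print(is_valid("bus_clock_mhz", 132), is_valid("bus_clock_mhz", 100))
```

A synthesis loop would then draw candidate values only from these lists, which keeps every generated configuration realizable in the target technology.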


Topology and Protocol Parameter Synthesis

Branch and bound (B&B) goal: cluster slave modules to minimize matrix cost

Start by clustering two slave clusters at a time
◦ Initially, each slave cluster has only one slave

However, the total number of clustering configurations possible for n slaves is

$$\binom{n}{2} + \binom{n}{2}\binom{n-1}{2} + \binom{n}{2}\binom{n-1}{2}\binom{n-2}{2} + \dots + \frac{n!\,(n-1)!}{2^{n-1}}$$

◦ Extremely large number for even medium sized SoCs!

To quickly prune out invalid clustering configurations and converge on an optimal solution, use a powerful bounding function

Bounding function
◦ Called after every clustering operation
◦ Uses lookup table to discard duplicate clustering ops
◦ Discards all non-beneficial clustering ops (i.e. no savings in no. of buses)
◦ Discards incompatible clustering ops
  ▪ e.g. mergers of buses with conflicting bus speeds
◦ Discards clustering which cannot theoretically support b/w requirements
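To see how quickly the clustering count grows, the sum above can be evaluated directly. This is a small sketch of that arithmetic only; the function names are made up for illustration, and the closed form is used just to cross-check the final term of the series.

```python
from math import comb, factorial


def clustering_configurations(n):
    """Evaluate C(n,2) + C(n,2)*C(n-1,2) + ... for n slaves."""
    total = 0
    term = 1
    for k in range(n, 1, -1):  # k clusters remain before the next merge
        term *= comb(k, 2)
        total += term
    return total


def last_term(n):
    """Closed form of the final term: n! * (n-1)! / 2^(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


for n in (3, 4, 8, 12):
    print(n, clustering_configurations(n))
```

Even a dozen slaves give a count far beyond exhaustive enumeration, which is why the bounding function has to prune aggressively.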


Topology and Protocol Parameter Synthesis

Experimental results on four MPSoC applications from the networking domain

Significant matrix component savings
◦ 4.6x to 9x when compared with a full bus matrix

Topology and Protocol Parameter Synthesis

Methodology extended by Pasricha et al. [CODES+ISSS '06] to synthesize bus matrix topology and protocol parameters
◦ with the incorporation of energy estimation models for bus wires and bus logic components

Goal: generate multiple candidate bus matrix solutions, on which to perform a power-performance trade-off analysis

Methodology applied to an MPSoC application

Topology and Protocol Parameter Synthesis

[Figure: CTG of the MPSoC application and trade-off results]

Results
◦ Up to 20% in power and 40% in performance possible trade-off
◦ Up to 8% in runtime and 15% in energy possible trade-off

Topology and Protocol Parameter Synthesis

Pasricha et al. [VLSID '08] further extended this synthesis methodology by incorporating a PVT (process, voltage, temperature) variation aware power estimation technique

Incorporating PVT variation-awareness in the system level bus matrix synthesis technique resulted in a set of curves for power and energy in the trade-off graph outputs
◦ instead of a single curve for power and energy

Allowed for a more accurate power characterization in the face of PVT variations early in the design flow
◦ enabling designers to make more informed decisions when selecting a bus matrix configuration


Physically-aware Synthesis

Most synthesis approaches design the communication architecture without considering physical implementation issues that can influence performance
◦ such as the layout of the components on the chip or the lengths and routing of the bus wires interconnecting components

Physical level information can be extremely important to guarantee that the synthesis results are reliable

However, such physical level information is typically available much later in the design flow
◦ challenging to abstract up this information to early in the design flow during communication architecture design

A few approaches have looked at this problem of physically aware synthesis

Physically-aware Synthesis

Dick et al. [DATE '99] proposed physically aware topology synthesis technique to ensure hard real-time communication deadlines between components were satisfied
◦ used a high level floorplanner to create a block placement, and estimate global wiring delays
◦ genetic algorithm (GA) was used to iterate over different bus topology configurations having low contention task assignments on components

Drinic et al. [ICCAD '00] and Meguerdichian et al. [DAC '01] used a high level floorplanner to determine design feasibility during bus topology synthesis
◦ compared estimates of wire length with upper bound on wire length
◦ does not account for varying capacitive loads of components on a bus

Physically-aware Synthesis

Thepayasuwan et al. [ICCD '03] proposed a topology synthesis framework that used a high level floorplanner to obtain wire lengths
◦ lengths are incorporated into an SA cost function that is used to synthesize bus topology
◦ SA minimizes the cost function, and selects a topology solution with low total wire length

Guo et al. [ASPDAC '06] used a high level floorplanner during segmented bus topology synthesis
◦ floorplanner aims to reduce length of critical wires with high switching activity to reduce wire energy consumption

Pasricha et al. [CODES+ISSS '06] used a high level floorplanner to obtain wire length for estimating wire energy
◦ during bus matrix topology and parameter synthesis

Physically-aware Synthesis

Pasricha et al. [DAC '05] proposed physically aware hierarchical bus topology and protocol parameter synthesis technique (FABSYN)
◦ detects and eliminates clock cycle timing violations

To meet performance constraints, bus clock speed is set to 333 MHz (3 ns cycle time)

After layout, signal delay is 3.5 ns, which violates the 3 ns clock timing constraint!
◦ adverse effect on cost, complexity, constraint satisfiability

To eliminate such violations, designers use repeaters, pipeline elements
◦ can severely affect performance, power
◦ requires considerable manual RTL re-work, re-verification
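The arithmetic behind the violation in this example is simple enough to check in a few lines. The helper names below are hypothetical, and the model is deliberately minimal: it assumes a signal must cross the bus within a single clock period and ignores setup time and clock skew.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def has_timing_violation(signal_delay_ns, freq_mhz):
    """True when a signal cannot cross the bus within one clock cycle."""
    return signal_delay_ns > cycle_time_ns(freq_mhz)


def max_safe_freq_mhz(signal_delay_ns):
    """Highest clock frequency whose period still covers the delay."""
    return 1000.0 / signal_delay_ns


# Numbers from the slide: 333 MHz bus clock, 3.5 ns post-layout delay.
violation = has_timing_violation(3.5, 333)
print(f"period at 333 MHz: {cycle_time_ns(333):.3f} ns, violation: {violation}")
print(f"highest frequency covering 3.5 ns: {max_safe_freq_mhz(3.5):.1f} MHz")
```

Lowering the clock far enough removes the violation in this simple model, but it may break the performance constraint that motivated 333 MHz in the first place.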

[Figure: SoC floorplan showing components IP1, IP2, DMAC, ASIC1, ASIC2, ARM, ITCM, DTCM and MEM1 to MEM4]

Physically-aware Synthesis

Simple bus mapping

Mutate topology
◦ create new bus and/or migrate IPs

Physically-aware Synthesis

If a timing violation is detected
◦ TCPs that have components on buses with violations are flagged
◦ feedback loop is used to go back and attempt to eliminate violations
◦ first, the TCP that has components on the violated bus with the largest load capacitance on its pins is selected from the flagged TCPs
  ▪ since cumulative capacitive load of components directly contributes to increasing signal propagation delay
◦ the components are iteratively migrated to another existing bus
  ▪ or a new bus if migration to existing buses causes TCP constraint violations
◦ if there is still a violation, another flagged TCP is selected and its components migrated away from the violated bus
◦ another way to eliminate clock cycle violations is to reduce bus clock frequency
  ▪ increases cycle times

Physically-aware Synthesis

Synthesized hierarchical bus architecture

Parameter values

Parameter      main1   main2   main3   periph
bus width      32      32      32      32
bus speed      133     133     133     66
arb priority   CPU1 > M3 > M2 (static)

Physically-aware Synthesis

Experimental study

[Figure: CTG and constraint set used in the experimental study]

Physically-aware Synthesis

Quality of the FABSYN synthesis solution was compared with other synthesis approaches
◦ Initial: solution with just 2 buses (initial mapping)
◦ ABS: synthesis approach without integrated floorplanners
◦ Manual: designer driven manual synthesis approach with floorplanner

Outline

Introduction

Topology Synthesis

Protocol Parameter Synthesis

Topology and Protocol Parameter Synthesis

Physically aware Synthesis

Co-synthesis with Memory


Co-synthesis with Memory

Memory can take up a large chunk of on-chip area, as much as 70% in some cases
◦ Estimates indicate that this will go up to 90% in coming years

Variety of different memory types available to satisfy storage requirements in MPSoC applications
◦ DRAMs, SRAMs, EPROMs, EEPROMs etc.

Typically
◦ DRAMs -> larger memory requirements, slower, cheaper
◦ SRAMs -> smaller memory requirements, faster, expensive
◦ EPROMs and EEPROMs -> read-only data

Several tradeoffs during memory architecture synthesis
◦ SRAM vs. DRAM: cost vs. performance vs. area
◦ ports vs. number of memory blocks


Co-synthesis with Memory

Memory architecture synthesis determines the
◦ number, type, size of the memories in the system
◦ application data mapping to memories

Memory architecture significantly contributes to data traffic on communication architectures

Design of memory architecture has a substantial influence on communication architecture design

Traditionally, in platform-based design, memory synthesis is performed before communication architecture synthesis
◦ can lead to inferior design decisions


Co-synthesis with Memory

Motivational study (Pasricha et al. [DATE ‘06]): MPSoC memory and comm. architecture synthesis

[Figure: separate synthesis vs. co-synthesis]


Co-synthesis with Memory

Shalan et al. [SASIMI ‘03] proposed a tool to automatically generate a full crossbar and a dynamic memory management unit

Grun et al. [DATE ‘02] considered the system connectivity topology early in the design flow, in conjunction with memory exploration, for simple processor–memory systems
◦ most active access patterns extracted from application data structures
◦ different memory architecture configurations that can match needs of access patterns are obtained, assuming a simple connectivity model
◦ next, different comm. architectures are considered for these memory architecture configurations, and the most suitable interconnect and memory architecture is selected from a pareto-optimal curve

Srinivasan et al. [DATE ‘05] presented an approach to simultaneously consider bus topology splitting and memory bank partitioning during synthesis
◦ with the goal of reducing system energy
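The last step attributed to Grun et al., picking a design from a pareto-optimal curve, can be illustrated with a generic Pareto filter. This is a minimal sketch, not the authors' tool: each candidate is assumed to be a hypothetical (cost, latency) pair where lower is better on both axes.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point q dominates p when q is no worse on both axes and differs from p.
    Lower values are assumed to be better on both axes.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (cost, latency) values for five candidate architectures.
candidates = [(10, 50), (12, 40), (15, 45), (20, 30), (11, 60)]
print(pareto_front(candidates))
```

The designer then chooses among the surviving points according to which metric matters more for the design at hand.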


Co-synthesis with Memory

Pasricha et al. [DATE ‘06] proposed the COSMECA methodology for memory and comm. architecture synthesis
◦ Synthesize bus matrix topology and protocol parameters

Goal: obtain a least cost system, having minimal number of buses while satisfying performance and memory area constraints

COSMECA selects memory blocks from a library populated by several types of memories
◦ on-chip SRAMs, DRAMs, EPROMs, EEPROMs, …

Each memory type can have variants in library, having different
◦ capacities, areas, ports, operating frequencies and access times

Memory synthesis in COSMECA
◦ selects appropriate physical memories from library
◦ maps application arrays, scalars to physical memories selected from library


Co-synthesis with Memory

Application memory requirements are initially represented by abstract data blocks (DBs) in a CTG

DBs are initially grouped together into virtual memories (VMs)


Co-synthesis with Memory

DBs are merged at this initial step only if they have
◦ similar edges (i.e., edges from the same masters) and
◦ non-overlapping access

Subsequently, the enhanced CTG with VMs is used as an input to a branch and bound based bus matrix synthesis framework to generate a minimal cost solution
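The two merge conditions for DBs can be sketched as a greedy grouping pass. This is an illustrative sketch under stated assumptions, not the COSMECA implementation: "similar edges" is modeled as identical master sets, access behavior as hypothetical (start, end) time intervals, and the `DataBlock` type and greedy order are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DataBlock:
    name: str
    masters: frozenset  # masters with CTG edges to this DB (assumed model)
    accesses: list      # (start, end) access intervals (assumed model)


def overlaps(a, b):
    """True if any interval in a intersects any interval in b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


def group_into_vms(dbs):
    """Greedily merge DBs that share masters and never overlap in time."""
    vms = []  # each virtual memory is a list of DataBlocks
    for db in dbs:
        for vm in vms:
            compatible = all(
                db.masters == member.masters
                and not overlaps(db.accesses, member.accesses)
                for member in vm
            )
            if compatible:
                vm.append(db)
                break
        else:
            vms.append([db])  # no compatible VM found, start a new one
    return vms


dbs = [
    DataBlock("a", frozenset({"CPU1"}), [(0, 10)]),
    DataBlock("b", frozenset({"CPU1"}), [(10, 20)]),
    DataBlock("c", frozenset({"CPU1"}), [(5, 15)]),
    DataBlock("d", frozenset({"DMA"}), [(30, 40)]),
]
print([[db.name for db in vm] for vm in group_into_vms(dbs)])
```

Here "a" and "b" share a master and never overlap, so they land in one VM; "c" overlaps "a" and "d" has a different master, so each gets its own VM.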


Co-synthesis with Memory

Heuristic used to map VMs to physical memories from library
◦ finds N solutions that satisfy memory area and performance constraints of design

Generate memory access traces that are used to determine the extent of access overlap of VMs at each slave access point (SAP)
◦ after simulating best solution

If the overlap is below a user-defined overlap threshold T, the VMs are merged
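The threshold test can be sketched as follows. The slides do not define how overlap is measured, so this example assumes one plausible metric: the fraction of total busy time during which both VMs are accessed. Traces are hypothetical lists of (start, end) cycle intervals.

```python
def overlap_fraction(trace_a, trace_b):
    """Fraction of combined busy time in which both VMs are being accessed.

    Each trace is a list of (start, end) intervals that are assumed not to
    overlap within the same trace. The metric itself is an assumption.
    """
    both = sum(
        max(0, min(e1, e2) - max(s1, s2))
        for s1, e1 in trace_a
        for s2, e2 in trace_b
    )
    busy_a = sum(e - s for s, e in trace_a)
    busy_b = sum(e - s for s, e in trace_b)
    total = busy_a + busy_b - both
    return both / total if total else 0.0


def should_merge(trace_a, trace_b, threshold):
    """Merge two VMs when their access overlap is below the threshold T."""
    return overlap_fraction(trace_a, trace_b) < threshold


vm1 = [(0, 10), (20, 30)]
vm2 = [(8, 12), (40, 50)]
print(overlap_fraction(vm1, vm2), should_merge(vm1, vm2, threshold=0.1))
```

A larger T permits more merging, which tends to reduce the number of buses at the risk of more contention at the shared SAP.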


Co-synthesis with Memory

VMs are then mapped to physical memories from library

Initially, best memory from the library is selected for a VM that fits capacity requirements and has max. port bandwidth

If performance constraints are not met even for the memory with best performance, the matrix solution is discarded
◦ the next best matrix solution from the set of (ranked) matrix solutions is selected

If performance constraints and memory area constraints are met, the solution is added to the final solution database

Next, to lower memory area, VMs at SAPs are randomly selected and the mapped physical memory replaced with one that meets capacity requirements and has lower area
◦ If violation detected, then move is reversed, otherwise solution is kept
◦ Procedure repeated iteratively till N solutions obtained
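The mapping steps above can be sketched in two phases: a best-bandwidth initial mapping followed by random area-reducing moves that are reversed on violation. This is a simplified illustration, not the COSMECA code: the `Memory` fields, the library contents, and the `meets_constraints` callback (standing in for simulation-based checks) are all hypothetical, and the sketch refines a single mapping rather than collecting N solutions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    capacity_kb: int
    area: float
    bandwidth: float  # max port bandwidth, arbitrary units


def initial_mapping(vm_sizes, library):
    """Map each VM to the fitting memory with the highest port bandwidth."""
    mapping = {}
    for vm, size in vm_sizes.items():
        fitting = [m for m in library if m.capacity_kb >= size]
        if not fitting:
            return None  # no memory in the library can hold this VM
        mapping[vm] = max(fitting, key=lambda m: m.bandwidth)
    return mapping


def reduce_area(mapping, vm_sizes, library, meets_constraints, moves=100, seed=0):
    """Randomly swap in lower-area memories; undo any move that violates."""
    rng = random.Random(seed)
    mapping = dict(mapping)
    for _ in range(moves):
        vm = rng.choice(sorted(mapping))
        current = mapping[vm]
        smaller = [
            m for m in library
            if m.capacity_kb >= vm_sizes[vm] and m.area < current.area
        ]
        if not smaller:
            continue
        mapping[vm] = rng.choice(smaller)
        if not meets_constraints(mapping):
            mapping[vm] = current  # reverse the move
    return mapping


library = [
    Memory("sram_fast", 64, 10.0, 8.0),
    Memory("sram_small", 64, 6.0, 4.0),
    Memory("dram", 256, 4.0, 2.0),
]
vm_sizes = {"VM1": 32, "VM2": 128}
required_bw = {"VM1": 4.0, "VM2": 2.0}  # hypothetical performance constraint


def meets(mapping):
    return all(mapping[vm].bandwidth >= required_bw[vm] for vm in mapping)


start = initial_mapping(vm_sizes, library)
final = reduce_area(start, vm_sizes, library, meets)
print({vm: mem.name for vm, mem in final.items()})
```

Because every violating move is undone, the returned mapping always satisfies the constraint callback and never has more total area than the starting point.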


Co-synthesis with Memory

Experiments with MPSoC applications
◦ [Figure: PYTHON application synthesis]


Co-synthesis with Memory

[Figure: trade-off curve between number of buses and memory area]

[Figure: impact of threshold value]


Co-synthesis with Memory

COSMECA saves 25–40% in the number of buses in the matrix and 17–29% in memory area compared to the traditional approach


Co-synthesis with Memory

Meyer et al. [CODES+ISSS ‘07] attempted to extend COSMECA by adding layout-awareness during co-synthesis
◦ co-synthesis is performed using an SA-based algorithm

Results indicate 20–27% cost reduction for a synthetic DSP software pipeline case study by using the approach
◦ compared to an approach that separately allocates memory and synthesizes buses

A few limitations
◦ Only bus topology synthesis is performed; bus parameter synthesis is neglected
◦ Memory synthesis does not consider different memory types; only SRAM memories are supported


Summary

Designers need techniques that can efficiently explore the increasingly intractable comm. architecture design space
◦ to satisfy and optimize constraints during comm. architecture design

Presented research on techniques for efficient bus-based communication architecture synthesis
◦ Scope to extend synthesis techniques for emerging applications

A lot of open problems still remain to be solved, especially in the areas of low level physical and circuit level synthesis approaches (refer to the book chapter for more details)
◦ wire metal layer assignment
◦ wire sizing optimization
◦ inductance estimation
◦ timing-driven floorplanning
◦ shield wire insertion algorithms
