Handling Global Traffic in Future CMP NoCs

Ran Manevich, Israel Cidon, and Avinoam Kolodny.




Electrical Engineering Department, Technion – Israel Institute of Technology

Haifa, Israel

SLIP 2012


Bandwidth Version of Rent’s Rule

$B = kG^{R}$

B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0 < R < 1.

[Figure: example cluster with G = 16 and external bandwidth B = ∑ …]

Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
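
As a quick numerical illustration of the bandwidth version of Rent’s rule, the following minimal Python sketch evaluates B = k·G^R. The function name and the sample values of k and R are illustrative assumptions and do not come from the slides.

```python
def cluster_external_bandwidth(k: float, G: int, R: float) -> float:
    """Bandwidth version of Rent's rule: B = k * G**R.

    k: average bandwidth per module
    G: number of modules in the cluster
    R: Rent's exponent, required to satisfy 0 < R < 1
    """
    if not 0.0 < R < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    if G < 1:
        raise ValueError("a cluster needs at least one module")
    return k * G ** R


# Illustrative values only (assumed, not taken from the slides):
# with k = 1.0 and R = 0.5, a 16-module cluster gives 1.0 * 16**0.5 = 4.0.
b16 = cluster_external_bandwidth(1.0, 16, 0.5)
```

Because R < 1, the external bandwidth grows more slowly than the cluster size, so the external bandwidth per module, B/G, shrinks as clusters get larger.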


Rent’s Exponent Reflects Traffic Locality


CMP NoC Traffic Follows Rent’s Rule

2D Mesh NoC

~Average of CMP parallel programs*

* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008


2D Mesh – Packets Classification by Distance

For illustration purposes, packets are classified according to the distance between their sources and destinations:

Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2 + K/8
Global – Dist ≥ 2 + K/8

[Figure: example mesh, K = …]
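
The three distance classes can be sketched as a small Python function. This is a minimal sketch: the function name is hypothetical, and it assumes that Dist is an integer hop count of at least 1 and that K is the mesh size appearing in the threshold 2 + K/8.

```python
def classify_packet(dist: int, K: int) -> str:
    """Classify a packet by source-destination distance.

    NN     : Dist == 1
    Local  : 1 < Dist < 2 + K/8
    Global : Dist >= 2 + K/8
    """
    if dist < 1:
        raise ValueError("distance must be at least 1 hop")
    global_threshold = 2 + K / 8
    if dist == 1:
        return "NN"
    if dist < global_threshold:
        return "Local"
    return "Global"
```

For example, with K = 16 the global threshold is 2 + 16/8 = 4, so a packet travelling 3 hops is Local and a packet travelling 4 hops is Global.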



Fraction of global packets decreases in large systems

Rent’s exponent (R) = 0.7



Dominance of Global Packets in BW/Router and Light Load Latency

Nearest Neighbor traffic is dominant in small systems.

* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010


In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.



In large systems, global packets (a minority):

Consume most of the network’s BW.
Significantly increase average light-load latency.


Solution - PyraMesh

Hierarchical 2D mesh.
Global packets are routed through higher hierarchy levels.

Overall hop count is reduced.
Average latency is reduced.
Average BW per router is reduced.

[Figure: example route, 8 hops instead of 14!]




PyraMesh - Architecture

K – The size of the base mesh.
N_L – Number of levels.
N_P – Number of pyramids on top of the base mesh.
α_i – Ratio between the sizes of levels i and i+1.
C_i – Number of routers in level i that are connected to a router in level i+1 along a single dimension.

[Figures: three example configurations]
K = 8, N_L = 2, N_P = 1, α_i = 4, C_i = 2
K = 8, N_L = 3, N_P = 1, α_i = 2, C_i = 1
K = 8, N_L = 2, N_P = 4, α_i = 4, C_i = 1
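
To make the parameters more concrete, here is a minimal Python sketch that derives the mesh size of each level from K and the per-level ratios. It rests on assumptions not confirmed by the slides: that "size" means the side length of a square mesh, that the number of levels counts the base mesh as level 1, and that there is a single pyramid (the multi-pyramid case is not modelled). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PyraMeshConfig:
    K: int                  # side length of the base mesh (assumed K x K routers)
    NL: int                 # number of levels, base mesh counted as level 1
    alpha: Tuple[int, ...]  # alpha[i-1]: size ratio between level i and level i+1

    def level_sizes(self) -> List[int]:
        """Side length of every level, from the base mesh upwards."""
        if len(self.alpha) != self.NL - 1:
            raise ValueError("need exactly NL - 1 ratios")
        sizes = [self.K]
        for ratio in self.alpha:
            if ratio < 1 or sizes[-1] % ratio != 0:
                raise ValueError("each ratio must divide the level below it")
            sizes.append(sizes[-1] // ratio)
        return sizes
```

Under these assumptions, the configuration with K = 8, three levels, and a ratio of 2 yields levels of size 8, 4, and 2, while K = 8 with two levels and a ratio of 4 yields levels of size 8 and 2.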


PyraMesh – Addressing and Routing

Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter:

[Equation: Address_{i,(X,Y)}, the level-i address of base-mesh node (X, Y)]

Routing – XY.
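
For readers unfamiliar with XY routing, the following Python sketch shows generic dimension-ordered routing on a single 2D mesh level: the packet first travels along X until the column matches, then along Y. It does not model PyraMesh level transitions or the addressing formula, and the function name is hypothetical.

```python
def xy_route(src, dst):
    """Return the routers visited after src on the way to dst (X first, then Y)."""
    x, y = src
    dst_x, dst_y = dst
    path = []
    # Phase 1: move along the X dimension until the column matches.
    step = 1 if dst_x > x else -1
    while x != dst_x:
        x += step
        path.append((x, y))
    # Phase 2: move along the Y dimension until the row matches.
    step = 1 if dst_y > y else -1
    while y != dst_y:
        y += step
        path.append((x, y))
    return path


# Corner to corner in an 8x8 mesh takes 7 + 7 = 14 hops.
hops = len(xy_route((0, 0), (7, 7)))
```

The hop count of an XY route equals the Manhattan distance between source and destination, which is why long-distance packets are expensive on a flat mesh.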


PyraMesh – Packets Classification

Packets are distributed among levels i according to their travel distance (D) in the base mesh.

D_Th,i – Distance threshold of level i. If D > D_Th,i, the packet is directed to level i+1.

Example: D_Th,i = 6, 12, 20

Highest Level    Travel Distance
4                D > 20
3                12 < D ≤ 20
2                6 < D ≤ 12
1 (Base Mesh)    D ≤ 6
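
The threshold rule can be sketched in Python as a lookup over an ascending list of thresholds. This is a minimal sketch with a hypothetical function name; it assumes the thresholds are sorted in increasing order, one per level below the top.

```python
from bisect import bisect_left


def highest_level(D, thresholds):
    """Highest level used by a packet with base-mesh travel distance D.

    thresholds[i-1] is the distance threshold of level i. A packet moves up
    from level i to level i+1 whenever D exceeds that threshold, so the result
    is 1 plus the number of thresholds strictly smaller than D.
    """
    return 1 + bisect_left(thresholds, D)


# Example thresholds from the slide: 6, 12, 20.
levels = [highest_level(d, [6, 12, 20]) for d in (6, 7, 12, 20, 21)]
```

Using `bisect_left` keeps a distance equal to a threshold on the lower level, matching the "≤" boundaries in the table.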


Area overhead
Wiring overhead
Maximum bandwidth per router*
Average light-load latency*

[Equations: expressions for the four quantities above]





Optimization Results

Example of 16x16 System, R = 0.7

Throughput optimized PyraMesh:

Light load latency optimized PyraMesh:

[Figure labels: packets distance thresholds, D > 8]




Light Load Latency Performance

BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links are wider than in BMesh by the PyraMesh area overhead factor.
HNoC – Hybrid Network on Chip (Zarkesh-Ha et al., SLIP 2010).


Throughput Results, R = 0.7


Our Contributions

The observation that global packets limit scalability of large systems.

PyraMesh – A novel framework for hierarchical NoCs design.

Characterization of Rentian traffic in large NoCs.


Conclusions

Global packets limit performance in large (future) CMP systems.

PyraMesh – A novel class of hierarchical 2D mesh topologies.

PyraMesh handles global traffic in future CMP NoCs.


Thank You!


Related Work

CMesh – J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006.

GigaNoC – C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD 2007.

Long Range Links – U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst., 2006.

Hierarchical Rings on a Mesh – S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.

Hierarchical 2-Level 2D Mesh – M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.