Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network

Hiroki Matsutani (Keio Univ, JAPAN), Michihiro Koibuchi (NII, JAPAN), Hideharu Amano (Keio Univ, JAPAN)


Page 1:

Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network

Hiroki Matsutani (Keio Univ, JAPAN), Michihiro Koibuchi (NII, JAPAN), Hideharu Amano (Keio Univ, JAPAN)

Page 2:

Introduction

• Network-on-Chips (NoCs)
  – Tile architecture
  – On-chip routers
  – Packet switching
• Various NoC topologies
  – Mesh, Torus
  – H-Tree, Fat Trees
• Fat H-Tree (FHT)
• Evaluations of FHT
  – Performance
  – Area
  – Energy

Figure: A mesh-based on-chip network with 3x3 tiles numbered 0 to 8; each tile is a RISC, DSP, RAM, or I/O block.

We proposed FHT as an alternative to Fat Trees.

Page 3:

NoCs’ topologies: Mesh & Torus

• 2-D Mesh
  – RAW [Taylor, IEEE Micro’02]
• 2-D Torus
  – 2x bandwidth of mesh

Legend: Router, Core

Fat H-Tree is a tree-based topology, but it includes a torus structure.

Page 4:

NoCs’ topologies: Fat Trees

• Fat Tree (p, q, c)
  – p: # of upward links
  – q: # of downward links
  – c: # of core ports

Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1), with rank-1 and rank-2 routers (legend: Router, Core)

Trees are duplicated in both Fat Trees and Fat H-Tree, but the connection patterns of the trees are different!

Page 5:

Outline

• NoCs’ topologies
  – Mesh, Torus
  – H-Trees, Fat Trees
• Fat H-Tree (FHT)
  – Structure
  – 2-D layout
  – Routing algorithm (DTR)
• Evaluations of FHT
  – Network logic area
  – Energy consumption
  – Throughput

Page 6:

Fat H-Tree: Structure

• Fat H-Tree
  – Red Tree (H-Tree)
  – Black Tree (H-Tree)

[Yamada, EUC’04]

Figure: Combining two H-Trees (red & black) (legend: Router, Core)

The black tree is shifted toward the lower right of the red tree.

Because the black tree is shifted, the connection pattern of the trees differs from that of the original Fat Trees.

Page 7:

Fat H-Tree: Structure

• Fat H-Tree
  – Red Tree (H-Tree)
  – Black Tree (H-Tree)

[Yamada, EUC’04]

Figure: Combining two H-Trees (red & black) (legend: Router, Core)

Fat H-Tree is formed on the red & black trees.


Page 10:

Fat H-Tree: Structure

• Fat H-Tree
  – Red Tree (H-Tree)
  – Black Tree (H-Tree)

[Yamada, EUC’04]

Figure: Combining two H-Trees (red & black); rank-2 or upper routers are omitted in this figure (legend: Router, Core)

Each core is connected to both the red & black trees.

A ring is formed with cores & rank-1 routers.

Torus-level performance by combining only two H-Trees.

Page 11:

Fat H-Tree: 2-D layout on VLSI

• Fat H-Tree
  – The torus structure is folded in the same way as the folded layout of a 2-D torus

Figure: Fat H-Tree with long feedback links across the chip, and the topologically equivalent folded 2-D layout of Fat H-Tree (legend: Router, Core)

Page 12:

Fat H-Tree: Routing algorithm

• Paths on a single H-tree
  – Only red tree, or
  – Only black tree

Figure: Example path using only the red tree (6-hop) and using only the black tree (6-hop)

Page 13:

Fat H-Tree: Routing algorithm

• Paths on a single H-tree
  – Only red tree, or
  – Only black tree
• Paths across trees
  – Transit between trees
  – Minimum paths

Figure: The red tree is used first, then the packet transits to the black tree; 4-hop in total (minimum).

Exploiting such paths is key to improving performance.

Page 14:

Fat H-Tree: Dual tree routing (DTR)

• Dual tree routing
  – Transit trees for minimum paths
  – Cycles across trees
• Deadlock avoidance
  – VC# is increased when a packet transits from red to black

Figure: VC#0 is used before the transit; VC#1 is used after it.

Only two VCs are sufficient in a 64-node FHT.
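The deadlock-avoidance rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule (VC# goes up by one on each red-to-black transit); the function name `assign_vcs` and the hop-by-hop list of tree names are hypothetical and not taken from the slides.

```python
from typing import List


def assign_vcs(trees: List[str]) -> List[int]:
    """Return the VC number used on each hop of a path.

    `trees` lists, hop by hop, which tree ("red" or "black") carries the
    packet. The VC number starts at 0 and is incremented each time the
    packet moves from the red tree to the black tree.
    """
    vcs = []
    vc = 0
    prev = None
    for tree in trees:
        if prev == "red" and tree == "black":
            vc += 1  # red -> black transit: switch to the next VC
        vcs.append(vc)
        prev = tree
    return vcs


# A 4-hop path: two hops on the red tree, then two hops on the black tree.
print(assign_vcs(["red", "red", "black", "black"]))
```

A black-to-red transit leaves the VC number unchanged in this sketch, since the slide only mentions the red-to-black direction.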

Page 15:

Outline

• NoCs’ topologies
  – Mesh, Torus
  – H-Trees, Fat Trees
• Fat H-Tree (FHT)
  – Structure
  – 2-D layout
  – Routing algorithm (DTR)
• Evaluations of FHT
  – Network logic area
  – Energy consumption
  – Throughput

Page 16:

Ideal throughput: Channel bisection

Bandwidth of FHT is much improved by the torus structure.

Number of cores: $N = 2^n \times 2^n$

Topology   Channel bisection   N=16   N=64   N=256
HT         $4$                 4      4      4
FT1        $2^{n+1}$           8      16     32
FT2        $2^{n+2}$           16     32     64
FHT        $2^{n+2} + 8$       24     40     72
Mesh       $2^{n+1}$           8      16     32
Torus      $2^{n+2}$           16     32     64

FT1: Fat Tree(2,4,1), FT2: Fat Tree(2,4,2)

For FHT, the $2^{n+2}$ term is due to the torus and the $8$ is due to the two H-Trees.
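As a quick check, the closed forms for channel bisection can be evaluated for $n = 2, 3, 4$ (that is, N = 16, 64, 256) and compared with the table. The sketch below assumes the formulas $4$, $2^{n+1}$, $2^{n+2}$, and $2^{n+2} + 8$ as read from this slide; the function name `channel_bisection` is hypothetical.

```python
def channel_bisection(topology: str, n: int) -> int:
    """Channel bisection for a network of N = 2**n x 2**n cores."""
    formulas = {
        "HT": lambda n: 4,
        "FT1": lambda n: 2 ** (n + 1),
        "FT2": lambda n: 2 ** (n + 2),
        "FHT": lambda n: 2 ** (n + 2) + 8,  # torus part + two H-Trees
        "Mesh": lambda n: 2 ** (n + 1),
        "Torus": lambda n: 2 ** (n + 2),
    }
    return formulas[topology](n)


for name in ["HT", "FT1", "FT2", "FHT", "Mesh", "Torus"]:
    # n = 2, 3, 4 corresponds to N = 16, 64, 256
    print(name, [channel_bisection(name, n) for n in (2, 3, 4)])
```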

Page 17:

Number of routers

Router count of FHT is less than Fat Tree(2,4,2).

Number of cores: $N = 2^n \times 2^n$

Topology   Number of routers   N=16   N=64   N=256
HT         $(4^n - 1)/3$       5      21     85
FT1        $(4^n - 2^n)/2$     6      28     120
FT2        $4^n - 2^n$         12     56     240
FHT        $2(4^n - 1)/3$      10     42     170
Mesh       $N$                 16     64     256
Torus      $N$                 16     64     256

FT1: Fat Tree(2,4,1), FT2: Fat Tree(2,4,2)

Note that the number of NIs is not considered. FHT requires 2-port NIs for the red & black trees.
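The tree entries can be derived by counting routers rank by rank. The following is a sketch under two assumptions that the slides do not spell out: an H-Tree over $4^n$ cores has four children per router, so rank $i$ holds $4^{n-i}$ routers; and in Fat Tree(2,4,1) each rank holds half as many routers as the rank below it, starting from $4^{n-1}$ rank-1 routers. The rank index $i$ and the symbols $R_{\ast}$ are introduced here for illustration only.

```latex
\begin{align*}
R_{HT}  &= \sum_{i=1}^{n} 4^{\,n-i} = \frac{4^n - 1}{3}, \\
R_{FHT} &= 2\,R_{HT} = \frac{2\,(4^n - 1)}{3}, \\
R_{FT1} &= \sum_{i=1}^{n} \frac{4^{\,n-1}}{2^{\,i-1}} = \frac{4^n - 2^n}{2}, \\
R_{FT2} &= 2\,R_{FT1} = 4^n - 2^n .
\end{align*}
```

For $n = 2$ these give 5, 10, 6, and 12 routers, matching the N=16 column.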

Page 18:

Network logic area (routers & NIs)

• Synthesis of NoC
  – 16-core, 64-core
  – Design Compiler
  – 0.18um CMOS
• Router architecture
  – 1-flit = 32-bit
  – 4-stage pipeline
  – Wormhole, 2VCs
• NI architecture
  – In: 2-flit FIFO
  – Out: 2-flit FIFO

Figure: Wormhole router with input ports (buffers, 2VCs each) and a crossbar

FHT’s NI is implemented as a “router” to forward packets between trees.

Page 19:

Network logic area: 16/64-core

Figure panels: Synthesis result (16-core), Synthesis result (64-core)

Network logic area of FHT is smaller than Fat Tree(2,4,2).

FHT’s NI is larger than others.

Page 20:

Total wire length of all links

• Total unit-length of links
  – Core-to-router links
  – Router-to-router links

1-unit = distance between neighboring cores

How many unit-links would FHT require?

Number of cores: $N = 2^n \times 2^n$

Topology   Total unit-length                  N=16   N=64   N=256
HT         $2N(2^n - 1)/2^n$                  24     112    480
FT1        $nN$                               32     192    1,024
FT2        $2nN$                              64     384    2,048
FHT        $8N(2^{n-1} - 1)/2^{n-1} + 8$      72     392    1,800
Mesh       $2(N - 2^n)$                       24     112    480
Torus      $4(N - 2^n)$                       48     224    960

FT1: Fat Tree(2,4,1), FT2: Fat Tree(2,4,2)

Wire length of FHT is almost the same as Fat Tree(2,4,2).
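The wire-length formulas can be checked against the tabulated values for N = 16, 64, 256. The sketch below assumes the closed forms $2N(2^n - 1)/2^n$ (HT), $nN$ (FT1), $2nN$ (FT2), $8N(2^{n-1} - 1)/2^{n-1} + 8$ (FHT), $2(N - 2^n)$ (Mesh), and $4(N - 2^n)$ (Torus), which are a reading of this slide that fits all three columns; the function name `total_wire_length` is hypothetical.

```python
def total_wire_length(topology: str, n: int) -> int:
    """Total link length in units for N = 2**n x 2**n cores.

    One unit is the distance between neighboring cores.
    """
    N = 4 ** n        # number of cores
    side = 2 ** n     # cores per row
    half = 2 ** (n - 1)
    if topology == "HT":
        return 2 * N * (side - 1) // side
    if topology == "FT1":
        return n * N
    if topology == "FT2":
        return 2 * n * N
    if topology == "FHT":
        return 8 * N * (half - 1) // half + 8
    if topology == "Mesh":
        return 2 * (N - side)
    if topology == "Torus":
        return 4 * (N - side)
    raise ValueError(f"unknown topology: {topology}")


for name in ["HT", "FT1", "FT2", "FHT", "Mesh", "Torus"]:
    print(name, [total_wire_length(name, n) for n in (2, 3, 4)])
```

The integer divisions are exact because N is a multiple of both $2^n$ and $2^{n-1}$.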

Page 21:

Energy: NoC’s energy model

• Ave. flit energy $E_{flit}$
  – Send 1-flit to dest.
  – How much energy [J]?
  – $E_{flit} = w \, H_{ave} \, (E_{sw} + E_{link})$ [Wang, DATE’05]
• Parameters
  – 12mm square chip
  – 16/64-core
  – 0.18um CMOS
• Switching energy $E_{sw}$
  – 1-bit switching @ router
  – Gate-level sim
  – 1.88 [pJ / hop] for routers
  – 1.27 [pJ / hop] for NI
  – 1.45 [pJ / hop] for NI (FHT)
• Link energy $E_{link}$
  – 1-bit transfer @ link
  – 0.67 [pJ / mm]
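A worked example of the flit energy model $E_{flit} = w \, H_{ave} \, (E_{sw} + E_{link})$ follows. It is a sketch with several assumptions that are not stated on this slide: $w$ is taken to be the flit width in bits (32, from the router description), every hop is assumed to cross one 1.5 mm link, and the average hop count 4.84 is used only as an illustrative input. The function name `flit_energy_pj` is hypothetical.

```python
def flit_energy_pj(w_bits: int, h_ave: float,
                   e_sw_pj: float, e_link_pj: float) -> float:
    """E_flit = w * H_ave * (E_sw + E_link), in picojoules."""
    return w_bits * h_ave * (e_sw_pj + e_link_pj)


W_BITS = 32           # ASSUMPTION: w is the flit width (1-flit = 32-bit)
E_SW_ROUTER = 1.88    # pJ per bit per hop (router switching energy)
E_LINK_PER_MM = 0.67  # pJ per bit per mm (link energy)
LINK_MM = 1.5         # ASSUMPTION: every hop crosses one 1.5 mm link
H_AVE = 4.84          # ASSUMPTION: illustrative average hop count

e_link = E_LINK_PER_MM * LINK_MM  # link energy per bit per hop
energy = flit_energy_pj(W_BITS, H_AVE, E_SW_ROUTER, e_link)
print(f"{energy:.1f} pJ per flit")
```

The NI switching energies (1.27 and 1.45 pJ) are left out here for brevity; a fuller model would charge them for the hops that pass through an NI.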

Page 22:

Energy consumption: 16/64-core

Figure panels: Simulation result (16-core), Simulation result (64-core)

Energy consumption of FHT is less than Fat Tree(2,4,2).

Page 23:

Throughput: Simulation environment

• Flit-level simulation
  – Throughput / latency
  – 16/64-core
• Topology (routing)
  – Mesh, Torus (DOR)
  – Fat Trees (up/down)
  – Fat H-Tree (DTR)
• Traffic patterns
  – Uniform
  – NAS Parallel Benchmark: BT.W, SP.W, CG.W, MG.W, IS.W

Packet size   16-flit (1-flit header)
Buffer size   1-flit per channel
Switching     Wormhole
# of VCs      2
Latency       3-cycle per 1-hop

Page 24:

FHT vs. FTs: Uniform (16/64-core)

Figure panels: Uniform (16-core), Uniform (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)

FHT outperforms FT2 in 16-core, but it doesn’t in 64-core.

FHT (DTR) causes congestion around the roots of the trees.

Page 25:

FHT vs. FTs: BT (16/64-core)

Figure panels: BT traffic (16-core), BT traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)

BT has neighboring communications, which is an advantage for FHT (DTR).

FHT (DTR) doesn’t cause congestion around the roots.

Page 26:

FHT vs. FTs: MG (16/64-core)

Figure panels: MG traffic (16-core), MG traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)

Performance is … FHT(DTR) > FT2 > FT1

Page 27:

Summary: Evaluations of FHT

• Performance
  – FHT outperforms Fat Tree (FT2), except for uniform traffic
• Network logic area
  – FHT requires 20.5%-28.1% smaller area than FT2
• Energy consumption
  – FHT requires 6.7%-7.0% less energy than FT2
• Wire length
  – Wire length of FHT is almost the same as FT2
• Ongoing works
  – Evaluation in 90nm CMOS
  – 3-D layout of FHT for 3-D NoCs

Figure: Three stacked wafers (stacked ICs)

Page 28:

Thank you for your attention

Page 29:

Feasibility of Fat H-Tree

• Total wire length
  – Slightly longer than Fat Trees
  – But a lot of wire resources are available on-chip
• Wire delay
  – Length of the longest wire is the same as in Fat Trees

Figure: Fat Tree (2,4,1) and Fat H-Tree

If Fat Trees are feasible, Fat H-Tree can be implemented with smaller area but higher performance.

Page 30:

Routings for FHT: Torus routing (TOR)

• Single tree (STR)
  – Select a single tree per packet
  – Can’t transit trees
• Dual tree (DTR)
  – Transit trees for minimal paths
  – VCs are needed
• Torus routing (TOR)
  – Use torus formed with rank-1 routers & cores
  – VCs are needed

Figure: Fat H-Tree’s torus structure

TOR can’t use rank-2 or upper routers. This avoids congestion around the roots, but the paths are non-minimal.

Page 31:

FHT vs. Torus: Uniform (16/64-core)

• FHT (DTR): Minimum routing using links around roots
• FHT (TOR): Using torus structure (can’t use links around roots)
• 2-D Torus
• 2-D Mesh

Figure panels: Uniform (16-core), Uniform (64-core)

FHT achieves torus-level throughput using only the torus structure.

Page 32:

Number of VCs in Dual Tree Routing

• # of VCs required is $\lfloor H_{max}/4 \rfloor + 1$
  – $H_{max}$ is the longest hop count in the network
• E.g.,
  – 16-core FHT requires 2VCs
  – 64-core FHT requires 2VCs
  – …

VC# is increased when a packet transits from red to black.

Two VCs are not so costly…

Page 33:

NIs in Fat H-Tree

• Implemented as a “simplified router”
  – Connecting red & black trees
• Routing @ NI is simple
  – Forward packets to the other tree if the destination is not this core

Figure: Fat H-Tree NI, with a crossbar connecting the processing core, the red tree, and the black tree
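The NI routing rule can be written as a one-branch decision. This is a minimal sketch of the rule as stated on the slide, not the actual NI logic: the function name `ni_route`, the port names, and the integer core IDs are all hypothetical, and packets injected by the local core are not covered.

```python
def ni_route(dst: int, my_id: int, arrived_from: str) -> str:
    """Choose the output port for a packet arriving at an NI.

    `arrived_from` is the tree the packet came in on ("red" or "black").
    The packet is delivered to the local core if it is addressed to it;
    otherwise it is forwarded to the other tree.
    """
    if arrived_from not in ("red", "black"):
        raise ValueError(f"unknown input port: {arrived_from}")
    if dst == my_id:
        return "core"
    return "black" if arrived_from == "red" else "red"


# A packet for core 7 arriving at core 5's NI from the red tree
# is passed on to the black tree.
print(ni_route(dst=7, my_id=5, arrived_from="red"))
```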

Page 36:

Average hop count

$H_{ave} = \frac{1}{N^2 - N} \sum_{x, y \in N} H(x, y)$

• Fat H-Tree
  – Minimum routing (DTR)

Topology   Routing   N=16   N=64   N=256
FT         up/down   3.60   5.43   7.36
FHT        DTR       3.20   4.84   6.78
Mesh       DOR       2.67   5.33   10.67
Torus      DOR       2.13   4.06   8.03

FT: Fat Trees

FHT offers a shorter average hop count than Fat Trees.
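The Mesh and Torus rows can be reproduced by averaging the hop count $H(x, y)$ over all ordered pairs of distinct nodes, i.e. dividing the sum by $N^2 - N$. The sketch below assumes that $H(x, y)$ under DOR is the per-dimension minimal distance between routers (with wrap-around for the torus); the function name `average_hops` is hypothetical, and the tree topologies are not modeled.

```python
from itertools import product


def average_hops(k: int, torus: bool) -> float:
    """Average hop count over ordered pairs of distinct nodes in a k x k network."""

    def dist(a: int, b: int) -> int:
        d = abs(a - b)
        # A torus can also go the other way around the ring.
        return min(d, k - d) if torus else d

    nodes = list(product(range(k), repeat=2))
    n = len(nodes)  # N = k * k
    total = sum(dist(x1, x2) + dist(y1, y2)
                for (x1, y1) in nodes
                for (x2, y2) in nodes)
    # Pairs with x == y contribute 0 hops, so only the divisor excludes them.
    return total / (n * n - n)


for k in (4, 8, 16):  # N = 16, 64, 256
    print(k * k, round(average_hops(k, False), 2), round(average_hops(k, True), 2))
```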

Page 37:

Wire length of links

• Case studies
  – 16-core (1-unit = 3.0mm)
  – 64-core (1-unit = 1.5mm)

Flit-width = 32-bit @ 12mm square chip

Utilization rate of wire resources in 2 metal layers (%)

Topology   N=16   N=64
HT         1.6%   3.7%
FT1        2.1%   6.4%
FT2        4.3%   12.8%
FHT        4.8%   13.1%
Mesh       1.6%   3.7%
Torus      3.2%   7.5%

Wire length of FHT is almost the same as Fat Tree(2,4,2).

Page 38:

Routings for FHT: Single tree (STR)

• Single tree (STR)
  – Select a single tree per packet
  – Can’t transit trees
• Dual tree (DTR)
  – Transit trees for minimal paths
  – VCs are needed
• Torus routing (TOR)
  – Use torus formed with rank-1 routers & cores
  – VCs are needed

Figure: Case 1: red tree, 6-hop. Case 2: black tree, 4-hop.

Page 39:

Routings for FHT: Dual tree (DTR)

• Single tree (STR)
  – Select a single tree per packet
  – Can’t transit trees
• Dual tree (DTR)
  – Transit trees for minimal paths
  – VCs are needed
• Torus routing (TOR)
  – Use torus formed with rank-1 routers & cores
  – VCs are needed

Figure: The red tree is used first, then the black tree is used.

VC# is increased when a packet transits from red to black.

Page 40:

Fat H-Tree: Structure

• Fat H-Tree
  – Red Tree (H-Tree)
  – Black Tree (H-Tree)

[Yamada, EUC’04]

Figure: Combining two H-Trees (red & black) (legend: Router, Core)

Both edges are connected (folded).

Because the black tree is shifted and folded, the connection pattern of the trees differs from that of the original Fat Trees.