Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect...

59
Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect

Transcript of Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect...

Page 1: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon1

ESE534:Computer Organization

Day 25: April 28, 2014

Interconnect 7: Dynamically Switched Interconnect

Page 2: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon2

Previously

• Configured Interconnect– Lock down route between source and sink

• Multicontext Interconnect– Switched cycle-by-cycle from Instr. Mem.

• Interconnect Topology

• Data-dependent control for computation

Page 3: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon3

Today

• Data-dependent control/sharing of interconnect

• Dynamic Sharing (Packet Switching)– Motivation– Formulation– Design– Assessment

Page 4: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon4

Motivation

Page 5: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Consider: Preclass 2

• How might this be inefficient with configured interconnect?

Penn ESE534 Spring2014 -- DeHon5

Page 6: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Consider: Preclass 2

• How might this be more efficient?

• What might be inefficient about it?

Penn ESE534 Spring2014 -- DeHon6

Page 7: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Consider: Preclass 2

• How might this be more efficient?

• What undesirable behavior might this have?

Penn ESE534 Spring2014 -- DeHon7

Page 8: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon8

Opportunity

• Interconnect major area, energy, delay

• Instantaneous communication << potential communication

• Question: can we reduce interconnect requirements by only routing instantaneous communication needs?

Page 9: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Examples

• Data dependent, unpredictable results– Search filter, compression

• Slowly changing values– Surveillance, converging values

Penn ESE534 Spring2014 -- DeHon9

Page 10: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon10

Formulation

Page 11: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon11

Dynamic Sharing• Don’t reserve resources

– Hold a resource for a single source/sink pair– Allocate cycles for a particular route

• Request as needed

• Share amongst potential users

Page 12: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon12

Bus Example• Time Multiplexed version

– Allocate time slot on bus for each communication

• Dynamic version– Arbitrate for bus on each cycle

Page 13: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon13

Bus Example• Time Multiplexed version

– Allocate time slot on bus for each communication

• Dynamic version– Arbitrate for bus on each cycle

Page 14: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon14

Dynamic Bus Example

• 4 PEs– Potentially each send out result on change– Value only changes with probability 0.1 on each

“cycle”– TM: Slot for each

• PE0 PE1 PE2 PE3 PE0 PE1 PE2 PE3– Dynamic: arbitrate based on need

• None PE0 none PE1 PE1 none PE3 ….

• TM either runs slower (4 cycles/compute) or needs 4 busses

• Dynamic single bus seldom bottleneck– Probability two need bus on a cycle?

Page 15: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon15

Network Example

• Time Multiplexed– As assumed so far in class– Memory says how to set switches on each

cycle

• Dynamic– Attach address or route designation– Switches forward data toward destination

Page 16: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon16

Penn ESE534 Spring 2010 -- DeHon16

Butterfly

• Log stages

• Resolve one bit per stage

0000000100100011010001010110011110001001101010111100110111101111

bit3 bit2 bit1 bit0

Page 17: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon17

Penn ESE534 Spring 2010 -- DeHon17

Tree Route

• Downpath resolves one bit per stage

Page 18: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon18

ESE534 -- Spring 2010 -- DeHon18

Mesh Route

• Destination (dx,dy)• Current location

(cx,cy)• Route up/down

left/right based on (dx-cx,dy-cy)

Page 19: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Prospect

• Use less interconnect

<OR> get higher throughput across fixed interconnect

• By– Dynamically sharing limited interconnect

• Bottleneck on instantaneous data needs– Not worst-case data needs

Penn ESE534 Spring2014 -- DeHon19

Page 20: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon20

Design

Page 21: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon21

Ring

• Ring: 1D Mesh

Page 22: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon22

How like a bus?

Page 23: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon23

Advantage over bus?

Page 24: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon24

Preclass 1

• Cycles from simulation?

• Max delay of single link?

• Best case possible?

Page 25: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon25

Issue 1: Local online vs Global Offline

• Dynamic must make local decision– Often lower quality than offline, global

decision

• Quality of decision will reduce potential benefits of reducing instantaneous requiements

Page 26: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon26

Experiment

• Send-on-Change for spreading activation task• Run on Linear-Population Tree network• Same topology both cases• Fixed size graph• Vary physical tree size

– Smaller trees more serial• Many “messages” local to cluster, no routing

– Large trees more parallel

Page 27: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon27

Spreading Activation

• Start with few nodes active• Propagate changes along edges

1

2 2

2

3

3

33

3

Page 28: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon28

Butterfly Fat Trees (BFTs)

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

• Familiar from Day 17, 18, HW8

• Similar phenomena with other topologies

• Directional version

Page 29: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon29

Iso-PEs

[Kapre et al., FCCM 200]

Page 30: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon30

Iso-PEs

• PS vs. TM ratio at same PE counts– Small number of PEs little difference

• Dominated by serialization (self-messages)• Not stressing the network

– Larger PE counts• TM ~60% better• TM uses global congestion knowledge while

scheduling

Page 31: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon31

Preclass 3

• Logic for muxsel<0>?• Logic for Arouted? Logic for Brouted?• Gates?• Gate Delay?

Page 32: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon32

Issue 2: Switch Complexity

• Requires area/delay/energy to make decisions

• Also requires storage area

• Avoids instruction memory

• High cost of switch complexity may diminish benefit.

Page 33: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon33

Mesh Congest• Preclass 2 ring

similar to slice through mesh

• A,B – corner turns• May not be able to

route on a cycle

Page 34: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon34

Mesh Congest• What happens when

inputs from 2 (or 3) sides want to travel out same output?

Page 35: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon35

FIFO Buffering

• Store inputs that must wait until path available– Typically store in FIFO buffer

• How big do we make the FIFO?

Page 36: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon36

PS Hardware Primitives

T-Switch

π -Switch

s

m m

ss

m

m m

s s

m m

m m

s s

s s

Red arrows backpressure when FIFOs are full

Page 37: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon37

FIFO Buffer Full?

• What happens when FIFO fills up?

s

m m

ss

m

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

Page 38: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon38

FIFO Buffer Full?

• What happens when FIFO fills up?

s

m m

ss

m

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

Page 39: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon39

FIFO Buffer Full?

• What happens when FIFO fills up?

s

m m

ss

m

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

Page 40: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon40

FIFO Buffer Full?

• What happens when FIFO fills up?

• Maybe backup network

• Prevent other routes from using– If not careful, can create

deadlock

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

PE

PE PE

PE

Page 41: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon41

TM Hardware Primitives

tm -m erge

m m

m

mm

m m

Page 42: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon42

Area in PS/TM Switches

• Packet (32 wide, 16 deep)– 3split + 3 merge– Split 79

• 30 ctrl, 33 fifo buffer

– Merge 165• 60 ctrl, 66 fifo buffer

– Total: 244

• Time Multiplexed (16b)– 9+(contexts/16)– E.g. 41 at 1024 contexts

• Both use SRL16s for memory (16b/4-LUT)

• Area in FPGA slice countsm ergesplit

s

m m

ss

m

m m

m

Page 43: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon43

Area Effects

• Based on FPGA overlay model

• i.e. build PS or TM on top of FPGA

Page 44: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon44

Preclass 4

• Gates in static design: 8

• Gates in dynamic design: 8+?

• Which energy best?– Pd=1

– Pd=0.1

– Pd=0.5

Page 45: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon45

PS vs TM Switches

• PS switches can be larger/slower/more energy

• Larger:– May compete with PEs for area on limited

capacity chip

Page 46: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon46

Assessment

Following from

Kapre et al. / FCCM 2006

Page 47: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon47

Analysis

• PS v/s TM for same area– Understand area tradeoffs (PEs v/s

Interconnect)

• PS v/s TM for dynamic traffic– PS routes limited traffic, TM has to route

all traffic

Page 48: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon48

Area Analysis

• Evaluate PS and TM for multiple BFTs– Tradeoff Logic Area for Interconnect– Fixed Area of 130K slices

• p=0, BFT => 128 PS PEs => 1476 cycles• p=0.5, BFT => 64 PS PEs => 943 cycles

• Extract best topologies for PS and TM at each area point– BFT of different p best at different area points

• Compare performance achieved at these bests at each area point

Page 49: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon49

PS Iso-Area:Topology Selection

PEs

Serializ ation Bisection Latency

Page 50: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon50

TM Iso-AreaPEs

Serializ ation Bisection Latency

Page 51: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon51

Iso-Area

Page 52: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon52

Iso-Area Ratio

Page 53: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon53

Iso-Area

• Iso-PEs = TM 1~2x better

• With Area– PS 2x better at small areas– TM 4-5x better at large

areas– PS catches up at the end

Page 54: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon54

Activity Factors

• Activity = Fraction of traffic to be routed

• TM needs to route all• PS can route fraction• Variable activity queries in

ConceptNet– Simple queries ~1% edges– Complex queries ~40% edges

1

2 2

2

3

3

33

3

Page 55: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon55

Activity Factors

Page 56: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon56

Crossover could be less

Page 57: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon57

Lessons

• Latency– PS could achieve same clock rate– But took more cycles– Didn’t matter for this workload

• Quality of Route– PS could be 60% worse

• Area– PS larger, despite all the TM instrs– Big factor– Will be “technology” and PE-type dependent

• Depends on relative ratio of PE to switches• Depends on relative ratio of memory and switching

Page 58: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon58

Big Ideas[MSB Ideas]

• Communication often data dependent

• When unpredictable, unlikely to use potential connection– May be more efficient to share dynamically

• Dynamic may cost more per communication– More logic, buffer area, more latency– Less efficient due to local view

• Net win if sufficiently unpredictable

Page 59: Penn ESE534 Spring2014 -- DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.

Penn ESE534 Spring2014 -- DeHon59

Admin

• FM1 marked – see feedback

• Wednesday– Last class– FM2 due– End of discussion period

• Reading on Canvas