CS184A/284A AI in Biology and Medicine Course Introduction ...
CS184a: Computer Architecture (Structure and Organization)
-
Upload
stone-kirk -
Category
Documents
-
view
39 -
download
0
description
Transcript of CS184a: Computer Architecture (Structure and Organization)
![Page 1: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/1.jpg)
Caltech CS184 Winter2005 -- DeHon1
CS184a:Computer Architecture
(Structure and Organization)
Day 16: February 14, 2005
Interconnect 4: Switching
![Page 2: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/2.jpg)
Caltech CS184 Winter2005 -- DeHon2
Previously
• Used Rent’s Rule characterization to understand wire growth
IO = c Np
• Top bisections will be (Np)
• 2D wiring area
(Np)(Np) = (N2p)
![Page 3: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/3.jpg)
Caltech CS184 Winter2005 -- DeHon3
We Know
• How we avoid O(N2) wire growth for “typical” designs
• How to characterize locality
• How we might exploit that locality to reduce wire growth
• Wire growth implied by a characterized design
![Page 4: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/4.jpg)
Caltech CS184 Winter2005 -- DeHon4
Today
• Switching– Implications– Options
![Page 5: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/5.jpg)
Caltech CS184 Winter2005 -- DeHon5
Switching:
How can we use the locality captured by Rent’s Rule to reduce switching requirements? (How much?)
![Page 6: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/6.jpg)
Caltech CS184 Winter2005 -- DeHon6
Observation
• Locality that saved us wiring,
also saves us switching
IO = c Np
![Page 7: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/7.jpg)
Caltech CS184 Winter2005 -- DeHon7
Consider• Crossbar case to exploit wiring:
split into two halves, connect with limited wires N/2 x N/2 crossbar each half N/2 x (N/2)p connect to bisection wires 2(N2/4) + 2(N/2)p+1 < N2
![Page 8: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/8.jpg)
Caltech CS184 Winter2005 -- DeHon8
Recurse• Repeat at each level
– form tree
![Page 9: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/9.jpg)
Caltech CS184 Winter2005 -- DeHon9
Result• If use crossbar at each tree node
– O(N2p) wiring area• for p>0.5, direct from bisection
– O(N2p) switches• top switch box is O(N2p) • switches at one level down is
–2(1/2p)2 previous level
–(2/22p)=2(1-2p)
–coefficient < 1 for p>0.5
![Page 10: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/10.jpg)
Caltech CS184 Winter2005 -- DeHon10
Result• If use crossbar at each tree node
– O(N2p) switches• top switch box is O(N2p) • switches at one level down is
2(1-2p) previous level
• Total switches:N2p (1+2(1-2p) +22(1-2p) +23(1-2p) +…)get geometric series; sums to O(1)N2p (1/1-2(1-2p) )
= 2(2p-1) /(2(2p-1) -1) N2p
![Page 11: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/11.jpg)
Caltech CS184 Winter2005 -- DeHon11
Good News
• Good news – asymptotically optimal– Even without switches, area O(N2p)
• so adding O(N2p) switches not change
![Page 12: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/12.jpg)
Caltech CS184 Winter2005 -- DeHon12
Bad News
• Switches area >> wire crossing area– Consider 6 wire pitch crossing 362
– Typical (passive) switch 2500 2
– Passive only: 70x area difference• worse once rebuffer or latch signals.
• …and switches limited to substrate– whereas can use additional metal layers
for wiring area
![Page 13: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/13.jpg)
Caltech CS184 Winter2005 -- DeHon13
Additional Structure
• This motivates us to look beyond crossbars– can depopulate crossbars on up-down
connection without loss of functionality?
![Page 14: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/14.jpg)
Caltech CS184 Winter2005 -- DeHon14
Can we do better?
• Crossbar too powerful?– Does the specific down channel matter?
• What do we want to do?– Connect to any channel on lower level– Choose a subset of wires from upper level
• order not important
![Page 15: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/15.jpg)
Caltech CS184 Winter2005 -- DeHon15
N choose K
• Exploit freedom to depopulate switchbox
• Can do with:– K(N-K+1) swtiches– Vs. K N– Save ~K2
![Page 16: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/16.jpg)
Caltech CS184 Winter2005 -- DeHon16
N-choose-M
• Up-down connections– only require concentration
• choose M things out of N
– i.e. order of subset irrelevant
• Consequent:– can save a constant factor ~ 2p/(2p-1)
• (N/2)p x Np vs (Np - (N/2)p+1)(N/2)p
• P=2/3 2p/(2p-1) 2.7
• Similary, Left-Right
– order not important reduces switches
![Page 17: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/17.jpg)
Caltech CS184 Winter2005 -- DeHon17
Multistage Switching
![Page 18: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/18.jpg)
Caltech CS184 Winter2005 -- DeHon18
Multistage Switching
• Can route any permutation w/ less switches than a crossbar
• If we allow switching in stages– Trade increase in switches in path– For decrease in total switches
![Page 19: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/19.jpg)
Caltech CS184 Winter2005 -- DeHon19
Butterfly
• Log stages
• Resolve one bit per stage
0000000100100011010001010110011110001001101010111100110111101111
bit3 bit2 bit1 bit0
![Page 20: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/20.jpg)
Caltech CS184 Winter2005 -- DeHon20
What can a Butterfly Route?
• 0000 0001
• 1000 0010
0000
1000
0000000100100011010001010110011110001001101010111100110111101111
![Page 21: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/21.jpg)
Caltech CS184 Winter2005 -- DeHon21
Butterfly Routing
• Cannot route all permutations– Get internal blocking
• What required for non-blocking network?
![Page 22: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/22.jpg)
Caltech CS184 Winter2005 -- DeHon22
Decomposition
![Page 23: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/23.jpg)
Caltech CS184 Winter2005 -- DeHon23
Decomposition
• Switches: N/2 2 4 + (N/2)2 < N2
![Page 24: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/24.jpg)
Caltech CS184 Winter2005 -- DeHon24
Recurse
If it works once, try it again…
![Page 25: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/25.jpg)
Caltech CS184 Winter2005 -- DeHon25
Result: Beneš Network• 2log2(N)-1 stages (switches in path)
• Made of N/2 22 switchpoints – (4 switches)
• 4Nlog2(N) total switches
• Compute route in O(N log(N)) time
![Page 26: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/26.jpg)
Caltech CS184 Winter2005 -- DeHon26
Beneš Network Wiring
• Bisection: N
• Wiring O(N2) area (fixed wire layers)
![Page 27: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/27.jpg)
Caltech CS184 Winter2005 -- DeHon27
Beneš Switching
• Beneš reduced switches– N2 to N(log(N)) – using multistage network
• Replace crossbars in tree with Beneš switching networks
![Page 28: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/28.jpg)
Caltech CS184 Winter2005 -- DeHon28
Beneš Switching
• Implication of Beneš Switching – still require O(W2) wiring per tree node
• or a total of O(N2p) wiring– now O(W log(W)) switches per tree node
• converges to O(N) total switches!– O(log2(N)) switches in path across network
• strictly speaking, dominated by wire delay ~O(Np)• but constants make of little practical interest except
for very large networks
![Page 29: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/29.jpg)
Caltech CS184 Winter2005 -- DeHon29
Better yet…
• Believe do not need Beneš on the up paths• Single switch on up path• Beneš for crossover• Switches in path:
log(N) up+ log(N) down+ 2log(N) crossover= 4 log(N)= O(log (N))
![Page 30: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/30.jpg)
Caltech CS184 Winter2005 -- DeHon30
Linear Switch Population
![Page 31: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/31.jpg)
Caltech CS184 Winter2005 -- DeHon31
Linear Switch Population
• Can further reduce switches– connect each lower channel to O(1)
channels in each tree node– end up with O(W) switches per tree node
![Page 32: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/32.jpg)
Caltech CS184 Winter2005 -- DeHon32
Linear Switch (p=0.5)
![Page 33: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/33.jpg)
Caltech CS184 Winter2005 -- DeHon33
Linear Population and Beneš
• Top-level crossover of p=1 is Beneš switching
![Page 34: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/34.jpg)
Caltech CS184 Winter2005 -- DeHon34
Beneš Compare
• Can shuffle stages so local shuffles on outside and big shuffle in middle
![Page 35: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/35.jpg)
Caltech CS184 Winter2005 -- DeHon35
Linear Consequences:Good News
• Linear Switches– O(log(N)) switches in path– O(N2p) wire area– O(N) switches
– More practical than Beneš crossover case
![Page 36: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/36.jpg)
Caltech CS184 Winter2005 -- DeHon36
Linear Consequences:Bad News
• Lacks guarantee can use all wires– as shown, at least mapping ratio > 1– likely cases where even constant not
suffice • expect no worse than logarithmic
• Finding Routes is harder– no longer linear time, deterministic– open as to exactly how hard
![Page 37: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/37.jpg)
Caltech CS184 Winter2005 -- DeHon37
Mapping Ratio
• Mapping ratio says– if I have W channels
• may only be able to use W/MR wires
–for a particular design’s connection pattern• to accommodate any design
–forall channels
physical wires MR logical
![Page 38: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/38.jpg)
Caltech CS184 Winter2005 -- DeHon38
Mapping Ratio
• Example:– Shows MR=3/2
– For Linear Population, 1:1 switchbox
![Page 39: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/39.jpg)
Caltech CS184 Winter2005 -- DeHon39
Area ComparisonBoth: p=0.67 N=1024
M-choose-Nperfect map
LinearMR=2
![Page 40: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/40.jpg)
Caltech CS184 Winter2005 -- DeHon40
Area Comparison
M-choose-Nperfect map
LinearMR=2
• Since – switch >> wire
• may be able to tolerate MR>1
• reduces switches– net area savings
• Empirical:– Never seen greater
than 1.5
![Page 41: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/41.jpg)
Caltech CS184 Winter2005 -- DeHon41
Expanders
![Page 42: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/42.jpg)
Caltech CS184 Winter2005 -- DeHon42
Expander Theory,expansion
– Any group of size k=N will expand connect to a group of size kN in each logical direction
[Arora, Leighton, Maggs SIAM Journal of Comp. v25n3p600 1996]
![Page 43: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/43.jpg)
Caltech CS184 Winter2005 -- DeHon43
Expander Idea
• IF we can achieve expansion– Can guarantee non-blocking at each stage
• i.e.– Guarantee use less than N– Guarantee connections to more stuff in
next level– Since N>N available in next level
• Guaranteed to be an available switch
![Page 44: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/44.jpg)
Caltech CS184 Winter2005 -- DeHon44
Dilated Switches
• Have multiple outputs per logical direction– Dilation: number of outputs per direction– E.g. radix 2 switch w/ 4 outputs
• 2 per direction• Dilation 2
Up (0)
Down (1)
![Page 45: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/45.jpg)
Caltech CS184 Winter2005 -- DeHon45
Dilated Switches allow Expansion
• On Right– Any pair of nodes
connects to 3 switches
• Strictly speaking must have d>2 for expansion
![Page 46: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/46.jpg)
Caltech CS184 Winter2005 -- DeHon46
Random Wiring
• Random, dilated wiring for butterfly can achieve
– For tree…22p (?)[Upfal/STOC 1989,Leighton/Leiserson/Kravets MIT/LCS/RSS 8 1990]
![Page 47: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/47.jpg)
Caltech CS184 Winter2005 -- DeHon47
Constraints
• Total load should not exceed of net– L=mapping ratio (light loading factor) LW = number into each subtree LW W/2p
L 1/(2p)
• Cannot expand past the size of subtree NN/2p
1/2p
![Page 48: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/48.jpg)
Caltech CS184 Winter2005 -- DeHon48
Extra Switches
• Extra switch factor: d×L
• Try: =2, =1/10– d=8– dL40 (p=1)
• Try: =3/2, =1/10 dL30
![Page 49: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/49.jpg)
Caltech CS184 Winter2005 -- DeHon49
Putting it Together
• Base, linear-population trees have O(N) switches
• Make larger by a factor of L (linear factor)• Dilated version have a factor of d more
switches• Randomly wired expander
– Can have O(N) switches– Guarantee routes– Constants < 100
– Open: how tight can make it?
![Page 50: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/50.jpg)
Caltech CS184 Winter2005 -- DeHon50
Big Ideas[MSB Ideas]
• In addition to wires, must have switches– Have significant area and delay
• Rent’s Rule locality reduces– both wiring and switching requirements
• Naïve switches match wires at O(N2p)– switch area >> wire area– prevent benefit from multiple layers of metal
![Page 51: CS184a: Computer Architecture (Structure and Organization)](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812bc7550346895d9013b5/html5/thumbnails/51.jpg)
Caltech CS184 Winter2005 -- DeHon51
Big Ideas[MSB Ideas]
• Can achieve O(N) switches– plausibly O(N) area with sufficient metal layers
• Switchbox depopulation– save considerably on area (delay)– will waste wires– May still come out ahead (evidence to date)