WiSync: An Architecture for Fast Synchronization through ...
WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication
ASPLOS ’16 – Atlanta, GA – April 2-6, 2016
Sergi Abadal, Albert Cabellos-Aparicio, Eduard Alarcón, Josep Torrellas
UPC and UIUC
Motivation
4/21/16 WiSync – ASPLOS ’16 – April 4th, 2016 2
Manycores are becoming larger: Intel Knights Landing chip has 208 contexts
Fine-grain synchronization is costly to support in large manycores
Global and broadcast communication are prohibitive
Opportunity
On-chip wireless technology is a promising way to address these costs
Low latency, natural broadcast capabilities
Our proposal: a miniature antenna and transceiver per core
[Figure: a tile with Core + L1 + L2 and its Antenna + Transceiver]
Contribution: WiSync Architecture
Wireless on-chip communication for fine-grain synchronization
Applicable to large manycores (128+ cores)
Per-core Broadcast Memory: replicates data in all cores on a write
Emerging Interconnect Technologies
Nanophotonics (Vantrease et al., 2008)
Transmission Line Interconnects (Oh et al., 2013)
Wireless on-chip Communication
[Figure: a Core with an On-chip Antenna and a Transceiver (transmitter and receiver circuits)]
On-chip Wireless Communication
PROS
Inherently broadcast
Low latency
Simplicity / Flexibility / Non-intrusiveness
CONS
Less energy efficient than TL or photonics
Low bandwidth
S. Abadal et al., “Broadcast-Enabled Massive Multicore Architectures: A Wireless RF Approach,” IEEE Micro, vol. 35, no. 5, pp. 52–61, 2015.
How WiSync Uses Wireless (I)
Main Use: Globally shared medium for data transmission
One channel shared by all cores
Everyone receives what someone transmits
Consistent order of delivery
Only one core should transmit at a time
Simultaneous transmissions “collide”
[Figure: transmission timelines (t) of Core 1 and Core 2]
USED FOR MOST SYNCHRONIZATION
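The shared-medium rules above can be sketched as a tiny slot-level model. This is only an illustration, not the WiSync hardware: the function name `broadcast_slot` and the idea of resolving one "slot" at a time are assumptions made for the example.

```python
def broadcast_slot(requests, num_cores):
    """Resolve one slot of a single shared broadcast channel (hypothetical model).

    requests maps core id -> message for every core that transmits in this slot.
    Returns (delivered, collided):
      delivered maps each core id to the message it received (empty if none),
      collided is the sorted list of cores that have to retry.
    """
    if len(requests) == 1:
        # Exactly one transmitter: every core, sender included, hears the message.
        msg = next(iter(requests.values()))
        return {core: msg for core in range(num_cores)}, []
    # No transmitter means silence; two or more transmitters collide.
    return {}, sorted(requests)


print(broadcast_slot({1: "wr x"}, num_cores=4))
print(broadcast_slot({1: "wr x", 2: "wr y"}, num_cores=4))
```

Because every receiver sees the same winning message in the same slot, all cores observe transmissions in one consistent order.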
How WiSync Uses Wireless (II)
Special Use: collective binary OR (tones)
Transmitters: everyone silent. Receiver: ‘0’
Transmitters: one transmits tone. Receiver: ‘1’
Transmitters: more than one transmits tone. Receiver: ‘1’
USED FOR GLOBAL BARRIERS
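The three cases above amount to a logical OR over all transmitters. A minimal sketch, where the function name `tone_sensed` is made up for the example:

```python
def tone_sensed(transmitting):
    """Return 1 if at least one core emits the tone, else 0 (collective OR)."""
    return int(any(transmitting))


print(tone_sensed([False, False, False]))  # everyone silent
print(tone_sensed([False, True, False]))   # one core transmits the tone
print(tone_sensed([True, True, False]))    # more than one core transmits the tone
```

Unlike the data channel, overlapping tones do not need to be resolved: the receiver only cares whether any tone is present.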
Wireless Performance and Cost
X. Yu, J. Baylon, P. Wettin, D. Heo, P. Pande, and S. Mirabbasi, “Architecture and Design of Multi-Channel Millimeter-Wave Wireless Network-on-Chip,” IEEE Design & Test, 2014
(Simulated) CMOS 65nm
Frequency: 30 GHz – 90 GHz
Data rate: 16 Gb/s – 48 Gb/s
Antenna+Transceiver area: 0.25 mm² – 0.79 mm²
Power: 31 mW – 97 mW
Technology Scaling
(Extrapolated) CMOS 22nm
Frequency: 60 GHz – 150 GHz
Data rate: 16 Gb/s – 48 Gb/s
Antenna+Transceiver area: 0.1 mm² – 0.3 mm²
Power: 16 mW – 48 mW
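As a rough reading of these figures, power divided by data rate gives energy per bit. The sketch below assumes that the low end of each power range pairs with the low end of the data-rate range (and high with high); the slide does not state this pairing, so the results are indicative only.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit."""
    return power_mw / rate_gbps


# (power in mW, data rate in Gb/s); the pairing of range ends is an assumption.
points = {
    "65nm, low end": (31, 16),
    "65nm, high end": (97, 48),
    "22nm, low end": (16, 16),
    "22nm, high end": (48, 48),
}
for name, (power_mw, rate_gbps) in points.items():
    print(f"{name}: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/bit")
```

Under that assumption the transceiver moves from roughly 2 pJ/bit at 65nm to about 1 pJ/bit at 22nm.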
WiSync: Main Idea
Wireless network appears as a bus
Variables declared in program as broadcast
Allocated in Broadcast Memory (BM)
Do not use the regular cache hierarchy
BM: Small per-core memory
Replicates variables across cores on every write
Writes globally serialized
Each BM line: 64 bits
[Figure: cores C0, C1 and C2 issue writes wr0 x, wr1 y and wr2 x; each core's BM holds copies of x and y; the Wireless Network acts as a long (logical) bus]
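A minimal software model of the idea, assuming a hypothetical class `BroadcastMemorySystem`: every write is appended to one global log (the serialization order) and applied to every core's BM copy, while reads touch only the local copy.

```python
class BroadcastMemorySystem:
    """Toy model of per-core Broadcast Memories kept identical by broadcast writes."""

    def __init__(self, num_cores):
        self.bms = [dict() for _ in range(num_cores)]  # one BM copy per core
        self.log = []  # global order in which writes were serialized

    def write(self, core, var, value):
        # The wireless channel carries one write at a time, so appending to the
        # log stands in for the global serialization of writes.
        self.log.append((core, var, value))
        for bm in self.bms:
            bm[var] = value  # every BM, including the writer's, is updated

    def read(self, core, var):
        return self.bms[core][var]  # served from the local copy only


system = BroadcastMemorySystem(num_cores=3)
system.write(0, "x", 1)  # wr0 x
system.write(1, "y", 2)  # wr1 y
system.write(2, "x", 3)  # wr2 x
print(system.bms)
```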
The WiSync Architecture
Two wireless networks: data & tone
[Figure: each tile contains Core + L1 + L2, a B-Memory (BM), an Atomicity Failure Bit (AFB), and a Wireless Transceiver with (Data & Tone) Antennas]
WiSync: Wireless Channels
[Figure: Signal Strength (dB) versus Frequency (GHz), with axis marks at 60 and 90; labels: Data Transfer Channel (19 GHz) and Tone Channel (1 GHz)]
Data Transfer Channel
A message takes 5 cycles (5 ns) in the wireless network
Transferring data takes 4 ns
Handling collisions takes an additional 1 ns
[Figure: timelines of Core 1 and Core 2; a complete message occupies the slots 1 C 2 3 4, while a colliding attempt stops after 1 C]
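The timing above can be turned into a back-of-the-envelope latency estimate. The sketch assumes that each colliding attempt wastes two cycles (the first data slot plus the collision slot) and leaves any back-off as a parameter; neither value is stated on the slide.

```python
CYCLE_NS = 1.0              # 5 cycles = 5 ns on the slide, so 1 ns per cycle
MESSAGE_CYCLES = 5          # 4 cycles of data + 1 cycle of collision handling
ABORTED_ATTEMPT_CYCLES = 2  # assumption: slot "1" plus slot "C" are wasted


def message_latency_ns(collisions, backoff_cycles=0):
    """Estimated time until a message is delivered after `collisions` failed tries."""
    wasted = collisions * (ABORTED_ATTEMPT_CYCLES + backoff_cycles)
    return (wasted + MESSAGE_CYCLES) * CYCLE_NS


print(message_latency_ns(0))                    # uncontended message
print(message_latency_ns(2))                    # two collisions, no back-off
print(message_latency_ns(1, backoff_cycles=3))  # one collision, 3-cycle back-off
```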
Load from the BM: Local
[Figure: Core, BM and Transceiver of each tile, attached to the Data Transfer Channel]
1. Ld (addr, PID)
2. Check PID
3. val
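The three load steps can be sketched as follows. The class name `LocalBM` and the choice to raise `PermissionError` on a PID mismatch are assumptions for illustration; the slide only says that the PID is checked.

```python
class LocalBM:
    """Toy per-core Broadcast Memory: addr -> (owner PID, value)."""

    def __init__(self):
        self.entries = {}

    def load(self, addr, pid):
        owner_pid, value = self.entries[addr]  # 1. Ld (addr, PID)
        if owner_pid != pid:                   # 2. Check PID
            # Assumed behaviour: reject loads from a process that does not own the entry.
            raise PermissionError(f"PID {pid} does not own address {addr:#x}")
        return value                           # 3. val, no wireless traffic needed


bm = LocalBM()
bm.entries[0x10] = (7, 42)  # entry owned by (hypothetical) PID 7
print(bm.load(0x10, pid=7))
```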
Store into the BM: Broadcast
[Figure: Core, BM and Transceiver of each tile, attached to the Data Transfer Channel]
1. st (addr, val)
2. addr, val
3. OK
4. OK
The BM keeps the new value in temporary registers
The value is now updated in all BMs (including the local one)
Store into the BM: Broadcast
[Figure: Core, BM and Transceiver of each tile, attached to the Data Transfer Channel]
1. st (addr, val)
2. addr, val
May collide and need to retry
X. addr, val
Y. OK
Z. OK
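A sketch of the store path with retries, under stated assumptions: the function `store` and its `collision_outcomes` argument (one boolean per attempt, `True` meaning the attempt collided) are hypothetical stand-ins for the real channel.

```python
def store(bms, addr, val, collision_outcomes):
    """Broadcast a store, retrying after each collision.

    bms is the list of per-core BM dictionaries. Returns the number of
    attempts that were needed before the broadcast went through.
    """
    pending = (addr, val)  # step 1: value held aside until the broadcast succeeds
    attempts = 0
    for collided in collision_outcomes:
        attempts += 1
        if collided:
            continue  # the attempt collided: nothing is updated, try again
        for bm in bms:  # success: all BMs, including the local one, are updated
            bm[pending[0]] = pending[1]
        return attempts
    raise RuntimeError("store never got through in the simulated attempts")


bms = [{}, {}, {}]
print(store(bms, 0x8, 5, collision_outcomes=[True, True, False]))
print(bms)
```

No BM changes until the broadcast succeeds, which keeps all copies identical even when a store has to retry.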
Read-Modify-Write Operation to the BM
[Figure: Core, BM and Transceiver of each tile, attached to the Data Transfer Channel]
1. F&A (addr)
2. curr val
3. new val
4. addr, new val
5. OK
6. OK
AFB = 0 (Atomicity Failure Bit)
Transceiver checks the addr of remote stores while RMW is being executed
Read-Modify-Write Operation to the BM
[Figure: Core, BM and Transceiver of each tile, attached to the Data Transfer Channel]
1. F&A (addr)
2. curr val
3. new val
4. addr, val2
6. KO (failed)
AFB = 1 (Atomicity Failure Bit)
Transceiver checks the addr of remote stores while RMW is being executed
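The success and failure cases can be sketched in one function. This is a simplified model, not the hardware protocol: `fetch_and_add` and its `remote_store` argument (a store from another core that lands between the read and the broadcast) are hypothetical names used to show when the AFB gets set.

```python
def fetch_and_add(bms, core, addr, delta, remote_store=None):
    """One F&A attempt on the Broadcast Memories; returns (old value, AFB)."""
    old = bms[core][addr]  # 2. read the current value from the local BM
    new = old + delta      # 3. compute the new value
    afb = 0
    if remote_store is not None:
        r_addr, r_val = remote_store
        for bm in bms:
            bm[r_addr] = r_val  # the remote store is applied everywhere
        if r_addr == addr:
            afb = 1  # a remote store hit the same address: atomicity failed
    if afb == 0:
        for bm in bms:
            bm[addr] = new  # 4-6. the new value is broadcast and acknowledged
    return old, afb


bms = [{0: 10}, {0: 10}]
print(fetch_and_add(bms, core=0, addr=0, delta=1))                        # succeeds
print(fetch_and_add(bms, core=0, addr=0, delta=1, remote_store=(0, 50)))  # fails
print(fetch_and_add(bms, core=0, addr=0, delta=1))                        # retry
```

Software is expected to test the AFB after the operation and retry when it is set.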
The Tone Barrier
J. Oh, M. Prvulovic, and A. Zajic, “TLSync: support for multiple fast barriers using on-chip transmission lines,” in Proceedings of ISCA-38, 2011, pp. 105–115.
STEP 1: Start barrier execution
First arrival notifies everyone
[Figure legend: core not issuing the tone (listening) versus core issuing the tone]
STEP 2: Raise tone
Those that have not arrived raise tone
STEP 3: Stop tone
New arrivals stop sending the tone
STEP 4: Finish barrier execution
Everyone arrived. The tone disappears. Everyone senses it. Good to go.
VERY LOW OVERHEAD BARRIER
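The four steps can be traced with a small simulation. The function `tone_barrier` is a hypothetical sketch that records the tone level sensed on the channel after each arrival; timing and the notification message itself are not modelled.

```python
def tone_barrier(arrival_order, num_cores):
    """Return the tone level (0 or 1) sensed after each core arrives."""
    raising = set()  # cores currently issuing the tone
    levels = []
    for i, core in enumerate(arrival_order):
        if i == 0:
            # Steps 1-2: the first arrival notifies everyone, and all cores
            # that have not arrived yet raise the tone.
            raising = set(range(num_cores)) - {core}
        else:
            # Step 3: a newly arrived core stops sending the tone.
            raising.discard(core)
        # Step 4: once nobody raises the tone, the channel is silent (0)
        # and every core senses the release.
        levels.append(int(bool(raising)))
    return levels


print(tone_barrier([2, 0, 3, 1], num_cores=4))
```

The barrier releases exactly when the last level drops to 0, with no counter to update and no message per arrival after the first.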
Evaluation Environment
Architectural Simulator: Multi2Sim
Architecture: manycore with 64 cores at 22nm
Core: OOO, 2-issue wide, 1 GHz
Per-core BM: 16KB, 5-cycle RT, 64-bit entries
Per-core Transceiver+Antennas: 0.12 mm², 18 mW
Applications: SPLASH-2 and PARSEC
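The per-core numbers above can be scaled to the whole chip with simple arithmetic. The sketch assumes 16KB means 16 × 1024 bytes and that all 64 transceivers draw their full 18 mW at the same time, so the power figure is an upper bound rather than a measured total.

```python
NUM_CORES = 64
BM_BYTES = 16 * 1024   # assumption: 16KB = 16 * 1024 bytes
ENTRY_BITS = 64
AREA_MM2 = 0.12        # per-core transceiver + antennas
POWER_MW = 18          # per-core transceiver + antennas

entries_per_bm = BM_BYTES * 8 // ENTRY_BITS   # 64-bit entries in one BM
total_bm_kb = NUM_CORES * BM_BYTES // 1024    # BM storage across the chip
total_area_mm2 = NUM_CORES * AREA_MM2         # wireless area across the chip
total_power_w = NUM_CORES * POWER_MW / 1000   # upper bound, all transceivers on

print(entries_per_bm, "entries per BM")
print(total_bm_kb, "KB of BM in total")
print(round(total_area_mm2, 2), "mm2 of transceivers and antennas")
print(round(total_power_w, 3), "W of transceiver power (upper bound)")
```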
Simulated Architectures
Baseline: no BM, spinlocks with CAS, centralized barrier
Baseline+: no BM, MCS locks, tournament barrier
WiSyncNoT: BM, wireless locks and barriers (no tone)
WiSync: BM, wireless locks, tone barriers
Application Speedup over Baseline
Speedup of WiSync over Baseline: 39% (arithmetic mean)
Speedup of WiSync over Baseline+: 19%
Tone Channel: not much impact because the applications have little tight barrier synchronization
WiSync works well with many barriers / many locks
Conclusions
WiSync: manycore with wireless-based synchronization
Per-core Broadcast Memory: replicates data
Speeds up applications by 19% on average over optimized software synchronization (Baseline+)
Future work:
Rewrite apps to use fine-grain communication
Map graph applications, MPI collectives
Use broadcast for non-synch accesses
The end
Thanks for your attention. Questions?
Also in the Paper…
Allocation of variables in the BM
Bulk transmission support and optimization
Sharing the tone channel
Support for producer-consumer patterns, broadcasts, reductions…
Multiprogramming, context switching and thread migration
Barrier Synchronization and CAS Throughput evaluation
Tone Channel with Kernel
Tone Barrier is useful with tight barrier synch
Read-Modify-Write Code
retry:
  fetch&inc R, BM_addr
  if (AFB) { jmp retry }
  else { /* success */ }
Tone Barrier Code
for () {
  /* work */
  barrier:
    local_sense = !local_sense
    tone_write(addr)
    spin until (tone_read(addr) == local_sense)
}
Allocating a Variable in the BMs
[Figure: core (C), BMs and transceivers (T) attached to the Data Transfer Channel]
1. alloc (addr, PID)
2. addr, PID
3. OK
4. OK
Tone Channel to Implement Barriers
[Figure: three tone-channel cases with receiver readings ‘0’, ‘1’ and ‘1’. White: No Tone (Listening). Blue: Tone]
Use of the Data Transfer Channel
LESS THAN 1%!!!
Sensitivity Study
PHY in Wireless Communication
[Figure: physical layer (PHY) chain with SER and MOD on the transmitter side, DEMOD and DESER on the receiver side]
Simple schemes to keep area and power low. Example: On-Off Keying (OOK)
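On-Off Keying can be illustrated at the envelope level: the carrier is on for a 1 and off for a 0, and the receiver thresholds the average level of each bit period. The function names, the `samples_per_bit` parameter and the 0.5 threshold are choices made for this sketch; no real carrier or noise model is included.

```python
def ook_modulate(bits, samples_per_bit=4):
    """Envelope of an OOK signal: amplitude 1.0 for a 1 bit, 0.0 for a 0 bit."""
    return [float(bit) for bit in bits for _ in range(samples_per_bit)]


def ook_demodulate(samples, samples_per_bit=4, threshold=0.5):
    """Recover bits by comparing the mean level of each bit period to a threshold."""
    bits = []
    for start in range(0, len(samples), samples_per_bit):
        chunk = samples[start:start + samples_per_bit]
        bits.append(int(sum(chunk) / len(chunk) > threshold))
    return bits


message = [1, 0, 1, 1, 0]
envelope = ook_modulate(message)
print(ook_demodulate(envelope) == message)
```

The appeal for on-chip use is that detection needs only an energy comparison, which keeps the transceiver small and low power.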
MAC in Wireless Communication
[Figure: a MAC layer sits on top of the PHY on both sides: SER and MOD at the transmitter, DEMOD and DESER at the receiver]
Access to the shared medium needs to be managed: Medium Access Control (MAC) protocol
Challenges:
Scalability
Low overhead
Fairness
Ways to do it:
Avoiding sharing
Contending for the channel
Request to send → Clear to send
Token passing
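Of the options listed, token passing is the easiest to sketch: a token circulates among the cores and only its holder may transmit, so collisions cannot happen. The function below is a generic illustration of that scheme (names are made up) and is not claimed to be the protocol WiSync uses.

```python
def token_passing_schedule(pending, num_cores, start=0):
    """Return the order in which cores transmit under token passing.

    pending maps core id (0 .. num_cores - 1) to its number of queued messages.
    The token holder sends at most one message, then passes the token on.
    """
    pending = dict(pending)  # work on a copy so the caller's dict is untouched
    order = []
    token = start
    while any(pending.values()):
        if pending.get(token, 0) > 0:
            order.append(token)  # only the token holder transmits
            pending[token] -= 1
        token = (token + 1) % num_cores  # pass the token to the next core
    return order


print(token_passing_schedule({0: 2, 2: 1}, num_cores=4))
```

The trade-off is visible in the loop: idle cores still receive the token, which adds latency when few cores have something to send.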
Tone operation
Barriers are allocated in every core’s Allocated barrier table.
If the core participates in that barrier (PID), its armed bit is set to 1.
The first core to arrive broadcasts its arrival and causes the allocation of the barrier in every core’s Active barrier table.
Those who participate (armed) raise the tone.
When a core arrives at the barrier, it stops the tone.
When everyone has arrived, silence → barrier release.
Upon release, deallocate from the Active barrier table.
At execution’s end, deallocate from the Allocated barrier table.
Sharing the tone channel
What happens when there is more than one active barrier at the same time (multi-application scenario)?
Time is slotted: each entry in the ActiveB table has its own slot.
Round-robin scheduling.
When a new barrier becomes active, reschedule.
When a barrier is released, reschedule.
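The slot assignment can be sketched as a round-robin over the active barriers. The class `ToneScheduler` and its method names are hypothetical, and "reschedule" is modelled simply as recomputing slot ownership from the current list of active barriers.

```python
class ToneScheduler:
    """Round-robin sharing of tone-channel time slots among active barriers."""

    def __init__(self):
        self.active = []  # stand-in for the entries of the ActiveB table

    def activate(self, barrier):
        self.active.append(barrier)  # a new barrier becomes active: reschedule

    def release(self, barrier):
        self.active.remove(barrier)  # a barrier is released: reschedule

    def owner(self, slot):
        """Barrier that owns time slot `slot`, or None if no barrier is active."""
        if not self.active:
            return None
        return self.active[slot % len(self.active)]


sched = ToneScheduler()
sched.activate("b0")
sched.activate("b1")
print([sched.owner(t) for t in range(4)])
sched.release("b0")
print([sched.owner(t) for t in range(4)])
```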