Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs Ankit Jain, Shoaib...

Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPsAnkit Jain, Shoaib Kamil, Marghoob Mohiyuddin

Final Presentation

Prof. Kubiatowicz

May 14, 2008

Motivation

• Manycore: NoCs key to translating raw performance sustained performance

• Electrical NoC performance/energy doesn’t scale well with technology

• Optical NoC promising+ Low energy, high bandwidth– Setup overhead

• Use hybrid network– Small packets electrical NoC– Large packets optical NoC

Objective

• NoC comparison studies ignore real applications Use real application traces

• Compare NoC performance with simple models

• Study tradeoffs involved

Analytic Model• Three Models

– Bandwidth Model– Bandwidth + Latency Model– Bandwidth + Latency + Contention Model

ELECTRICAL HYBRID

Baseline Architecture

• Small, homogenous cores on a CMP

• Cores ~1.5mm x 1.5mm

• 22nm process, 5GHz

• 3D Integrated CMOS– layer for processors, layers for memory

• We examine two interconnect architectures to compare performance & energy efficiency

Electrical NoC

• Dally’s CMesh topology

• Wormhole routed

• Virtual channels

• Single electrical layer with multiple memory layers

Electrical Simulator (1/2)

• Processor– destination processors take flits out of

the network as soon as they arrive

• Router– XY dimension order routing– Express links on periphery– Virtual channel wormhole routing– Credit based flow control– 8 input ports 8x8 switch

Electrical Simulator (2/2)

• Channels– Buffering at both ends– Maximum wire length = side of

processor core

Hybrid NoC

• Mesh Topology• Electrical “Control Network” (ECN) on Processor Plane• Multiple optical networks on Photonic Plane• Small setup messages on ECN and bulk data transfer on optical network

Blocking Optical Switch

• On message turns• No inactive powerconsumption• Small switching cost• Small active power while

switched on

Capable of routing a single path from any source to any destination

Deadlock in Hybrid NoC

• Blocking 4x4 switch– only one path can be routed at a time

through a switch

• Deadlock is a known issue in circuit switching– Exponential backoff– Dimension order routing– Optical network choice

Hybrid Simulator (1/2)

• 1:1 processor to electrical router mapping– Each electrical router buffers up to 8

path setup messages from its corresponding processor

• Path setup packets are minimally sized: take one cycle to traverse between 2 routers

• Energy includes E-O-E conversions

Hybrid Simulator (2/2)

Synthetic Traces

• Random messages

• Nearest-Neighbor

• Bitreverse

• Tornado– explicitly designed to stress mesh

networks

• Results will be in final report

Real Applications

• SPMD style applications• Widely Used in scientific community• Broken into multiple phases of communication

– implicit barrier is assumed at the end of a communication phase

Parameter Exploration:Electrical NoC

Total buffer size = #vcs X buffer size router areaSmall total buffer size good enough!

Parameter Exploration: Hybrid NoC

•Sensitive to path multiplicity• more available paths = less contention

• Timeouts prevent over- and under-waiting

Performance

Energy

Process-Processor Mapping (1/2)

Process-Processor Mapping (2/2)

NoC as Part of a System• Use Merrimac FP unit numbers• Scale to 22nm using ITRS roadmap• Trace methodology records FP Operations• Compare energy used in FP unit vs energy used

in interconnect

Conclusions• Models accurately predict both performance and

energy consumption

• Hybrid NoC: Majority of energy due to Optical-to-Electrical and Electrical-to-Optical conv. (>94%).

• Additional hardware to the hybrid NoC low energy expended (in contrast to electrical NoC)

• Process-to-processor mapping can significantly impact performance as well as energy consumption.– Finding the optimal mapping is not always of utmost

importance— making sure not to use a ‘bad’ mapping is.

• Bulk-synchronous apps use most of their energy in FP ops

Future Work

• Non-blocking optical mesh interconnection network

• Account for data transfer onto chip

• More accurate full system simulators (for both performance and energy)– simulate FP operations & memory traffic

• More real applications including those that are not SPMD (with overlap of computation and communication)

• Explore applications with less synchronous communication models – Not SPMD

– Overlap of computation and communication

QUESTIONS

Acknowledgements

• John Shalf (CRD/NERSC, Lawrence Berkeley National Labs)

• Prof. John Kubiatowicz (UC Berkeley Computer Science Dept)

• Dr. Keren Bergman (Columbia University)

• BeBOP Research Group (UC Berkeley Computer Science Dept)

References• [1] Assaf Shacham, Keren Bergman, and Luca Carloni. On the Design of a Photonic Network-on-

Chip. In Proceedings of the First International Symposium on Networks-on-Chip, 2007.• [2] James Balfour, and William Dally. Design Tradeoffs for Tiled CMP On-Chip Networks. In

Proceedings of the International Conference on Supercomputing, 2006.• [3] Shoaib Kamil, Ali Pinar, Daniel Gunter, Michael Lijewski, Leonid Oliker, and John Shalf.

Reconfigurable Hybrid Interconnection for Static and Dynamic Applications. In Proceedings of the ACM International Conference on Computing Frontiers, 2007.

• [4] Bergman et. al.. Topology Exploration for Photonic NoCs for Chip Multiprocessors. Unpublished to date.

• [5] Cactus Homepage. http://www.cactuscode.org, 2004.• [6] Z. Lin, S. Ethier, T.S. Hahm, and W.M. Tang. Size Scaling of Turbulent Transport in Magnetically

Confined Plasmas. Phys. Rev. Lett., 88, 2002.• [7] Julian Borrill, Jonathan Carter, Leonid Oliker, David Skinner, and R. Biswas. Integrated

performance monitoring of a cosmology application on leading hec platforms. In Proceedings of the International Conference on Parallel Processing (ICPP), 2005.

• [8] A. Canning, L.W. Wang, A. Williamson, and A. Zunger. Parallel Empirical Pseudopotential Electronic Structure Calculations for Million Atom Systems. J. Comput. Phys., 160:29, 2000.

• [9] Xiaoye S. Li and James W. Demmel. SuperLU-dist: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems. ACM Trans. Mathematical Software, 29(2):110140, June 2003.

• [10] J. Qiang, M. Furman, and R. Ryne. A Parallel Particle-in-Cell Model for Beam-Beam Interactions in High Energy Ring Colliders. J. Comp. Phys., 198, 2004.

• [11] IPM Homepage. http://www.nersc.gov/projects/ipm, 2005

Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs Ankit Jain, Shoaib...

Documents

Transcript of Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs Ankit Jain, Shoaib...

The J-Machine Multicomputer: An Architectural …kubitron/courses/cs258...This paper describes a range of experiments performed on the J-Machine to study the effectiveness of the selected

Flattened Butterﬂy : A Cost-Efﬁcient Topology for High-Radix …kubitron/cs258/... · 2008-02-09 · sors—Interconnection architectures; B.4.3 [Hardware]: In-terconnections—Topology

Linker and Libraries Guidesergey/io/cs258/2009/817... · 2008-10-30 · LinkerandLibrariesGuide SunMicrosystems,Inc. 4150NetworkCircle SantaClara,CA95054 U.S.A. PartNo:817–1984–17

Remote Code Execution Through Intel CPU Bugs – 2008 (PDF).sergey/cs258/2010/D2T1 - Kris... · Rustock.C: first rootkit to use PCI/ISA bridge info as crypto-key; Malware works only

CS 258 Parallel Computer Architecture Lecture 4 Network Topology and Routing February 4, 2008 Prof John D. Kubiatowicz kubitron/cs258.

Avoiding Communication in Sparse Matrix Computationsdemmel/Demmel_pubs_07_11_final/… · Avoiding Communication in Sparse Matrix Computations James Demmel, Mark Hoemmen, Marghoob

DTrace User Guide - Dartmouth Computer Sciencecs.dartmouth.edu/~sergey/cs258/DTrace-User-Guide.pdf · Preface TheDTraceUserGuideisalightweightintroductiontothepowerfultracingandanalysistool

Anatomy of a Remote Kernel Exploit - cs.dartmouth.edusergey/io/cs258/2012/Dan-Rosenberg...Copyright © 2011 Virtual Security Research, LLC. All Rights Reserved. 10 Primer: Process

Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz.

Memory Consistency and Event Ordering in Scalable Shared ...kubitron/cs258/handouts/papers/kourosh.pdf · Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors

C10M:& Defending&the&Internetatscale&sergey/cs258/2013/C10M... · how&this&talk&is&organized& • c10k&–Internetscalability&for&the&lastdecade& • C10M–Internetscalability&for&the&nextdecade&

Slide 1 Kubiatowicz, Chaiken and Agarwal, "Closing the Window of Vulnerability in Multiphase Memory Transactions" MIT Computer Science Dept. CS258 Lecture.

Intel x86 Hardware Security Primitives “What tools are in ...sergey/me/cs258/... · Intel system management mode (SMM) is a stealthy execution environment (“ring -2”) for chipset

Dynamic Networks CS 213, LECTURE 15 L.N. Bhuyan. 10/3/2015 CS258 S99 2 What is Dynamic Network Dynamic Network is the network that can connect any input.

Package ‘erer’ - The Comprehensive R Archive Network ‘erer’ February 25, 2016 Title Empirical Research in Economics with R Version 2.5 Date 2016-2-25 Author Changyou Sun

Intel x86 Hardware Security Primitives “What tools are in ...sergey/cs258/2016/... · Intel system management mode (SMM) is a stealthy execution environment (“ring -2”) for

L2 to Off-Chip Memory Interconnects for CMPs Presented by Allen Lee CS258 Spring 2008 May 14, 2008.

Dartmouth)CS258bis:)) )Advanced)Userlands)sergey/me/cs258/guest_lectures/Ro… · Dartmouth)CS258bis:)))Advanced)Userlands) The)an

CS 258 Parallel Computer Architecture Lecture 8 Network Interface Design February 20, 2008 Prof John D. Kubiatowicz kubitron/cs258.

CS 258 Parallel Computer Architecture Lecture 12 Shared Memory Multiprocessors II March 1, 2002 Prof John D. Kubiatowicz kubitron/cs258.