Optical and molecular technologies in modern computer systems
Rick Lytel
Distinguished Engineer & Director
RAS Computer Analysis Laboratory
Agenda
• Who uses computers, anyway?
• Lay of the land
• System scaling
• Role for new technologies
• Is this different than 10 years ago?
Datacenter Tiers
(Diagram: datacenter tiers)
• Users: SunRay, net appliances, thin clients, PCs
• Web tier: many web servers (lots of small systems, horizontal scaling)
• Application tier: app servers and OLAP data marts (10's of systems, small to big)
• Database tier: OLTP and DSS database servers, operational storage (few big systems, vertical scaling)
• Warehouse storage
Sun server platforms
• SPARC servers: F3800, F4800, F6800, F15000 (2001 releases)
• Vertical scaling, 1 to 106 processors
• Web, app, and database servers
• SPARC, Solaris, and Java
Server revenue
Operating systems in place
Processor scaling
Performance scaling
Power scaling at Intel
Source: Fred Pollack, Intel Director of Microprocessor Research Labs, Micro32
Code scaling and complexity
(Chart: millions of lines of code, log scale from 0.01 to 100, vs. year, 1975-2000; annotated with UNIX, BSD, SunView, "the merge", the "UNIX wars", threads, Solaris 2.6, and Java; *Windows 2000 shown for comparison)
Power scaling (i.e., Moore's law) ubiquitous?
High-end system
(Diagram: high-end system cabinet, roughly 75" x 33" x 65")
• 18 boardsets with 18 CPU-memory boards and 18 I/O or dual-CPU boards
• 2 system controllers
• 4 + 4 fan trays
• Six dual-input 4 kW AC to 48 V DC power supplies
System board
(Diagram: system board, 19.35" x 16.5")
• Four CPUs, each with E$ DIMMs
• Four banks of 8 SDRAM DIMMs
• 4 data switch ASICs, data control ASIC, address ASIC
• Boot bus ASICs
• Two sets of 8 CPU data switch ASICs
• Power modules
F15K Components
• CPU-memory board (18)
• I/O or MaxCPU board (18)
• System controller board (2)
• System controller peripheral board (2)
• Control expander frame (2)
• System expander frame (18), expander board (18)
• Fan trays (8)
• Fan centerplanes (8)
• Power centerplane
• Logic centerplane
• Control expander sockets (2)
• System expander sockets (18)
• Centerplane ASICs (20)
(One side shown)
Backplane connectivity
(Diagram: backplane connectivity; CPU/memory boards (processors and memory) and PCI I/O assemblies plug into a passive centerplane and communicate through address bus/crossbar and data crossbar ASICs)
• Passive centerplane
• CPU/memory boards
• PCI assemblies
• Address crossbar
• Data crossbar
Interconnections in computing
Circuit            Distance         Speed     Width   Link BW   Carrier
Gate-gate          1-100 mm         1 GHz     100's   100 GHz   e-
Chip-chip          1 cm             500 MHz   100's   50 GHz    e-
Board-board        10-100 cm        500 MHz   100's   50 GHz    e-
Cabinet-cabinet    1-10 m           2.5 GHz   10's    25 GHz    hν
Floor-floor        10-100 m         100 MHz   1's     0.1 GHz   hν
Campus             100-1000 m       1 GHz     1's     1 GHz     hν
Intracity          1-10 km          2.5 GHz   1's     2.5 GHz   hν
Intercity          10-100 km        2.5 GHz   10's    25 GHz    hν, λk
Continental        100-1000 km      10 GHz    100's   1 THz     hν, λk
Intercontinental   1000-10000 km    10 GHz    100's   1 THz     hν, λk

The short-distance rows make up "the computer"; the longer spans are "the network". A quick check of the Link BW column follows below.
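The Link BW column is roughly the per-line signalling rate times the number of parallel lines. A minimal sketch of that check, treating widths such as "100's" as simply 100 (an approximation; the table only gives orders of magnitude):

```python
# Rough check of the Link BW column: aggregate bandwidth ~ signalling rate x line count.
# Widths like "100's" are taken as 100 and "10's" as 10 (an approximation).
links = {
    "gate-gate":       (1.0e9, 100),   # 1 GHz, ~100 lines -> ~100 GHz aggregate
    "chip-chip":       (0.5e9, 100),   # 500 MHz, ~100 lines -> ~50 GHz
    "cabinet-cabinet": (2.5e9, 10),    # 2.5 GHz, ~10 lines  -> ~25 GHz
    "continental":     (10e9, 100),    # 10 GHz, ~100 lines  -> ~1000 GHz = 1 THz
}

for name, (rate_hz, width) in links.items():
    print(f"{name:16s} ~{rate_hz * width / 1e9:6.0f} GHz aggregate")
```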
Scaling according to…
• EE: Shrink silicon process and lower voltage
• ME: Refrigerate computer, then do what EE does
• Optics: Photons, not electrons, in interconnects
• Chemist: Organic molecular wires & logic
• Biologist: Nucleic acid logic & processor, PCR chips
• SS physicist: Carbon nanotube gates, HT superconductors
• Theorist: Quantum computing: it's been demo'd, QED
• Grad student: Can I get a job?
• Marketing: "The network is the computer"
• Customer: More for less money
• Sys admin: More for less work
• Al Gore: "When I invented the internet…"
Expectation gap for optical solutions
What suppliers think we want
• Modest cost
• Reliability of best lasers
• Unique opportunity
• High density
• High speed
• WDM
• External modulation
What we actually want
• Copper cost
• Reliability of Si
• Standard product
• High density
• High speed
• What’s WDM?
• Huh? (we don’t care)
Real reason optics is interesting to us
• These two cables have the same bandwidth
• Big cable is 160 pair, 83 MHz LVDS, up to 10 m
• Small cable is 12 fiber, 1.25 Gbps per fiber multimode ribbon, up to 100 m
• Small cable scales to 2.5 Gbps per fiber, maybe to 10 Gbps
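A quick back-of-the-envelope check of the "same bandwidth" claim, assuming roughly one bit per hertz on each LVDS pair (an assumption; the slide gives only the pair count and clock rate):

```python
# Aggregate bandwidth of the two cables (order-of-magnitude check).
# Assumes ~1 bit/s per Hz on each LVDS pair, which is an approximation.
copper_gbps = 160 * 83e6 / 1e9          # 160 pairs at 83 MHz LVDS
fiber_gbps  = 12 * 1.25e9 / 1e9         # 12 fibers at 1.25 Gbps each

print(f"copper ribbon: ~{copper_gbps:.1f} Gb/s")        # ~13.3 Gb/s
print(f"fiber ribbon:  ~{fiber_gbps:.1f} Gb/s")         # 15 Gb/s
print(f"fiber at 2.5 Gb/s/fiber: ~{12 * 2.5:.0f} Gb/s") # 30 Gb/s with the scaling path
```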
Modern systems have high RAS
Reliability
• component failure rates, in FITs (1 FIT = 1 failure per 10^9 device-hours)
• subsystem FIT rates and failure modes
Availability
• system up-time as a fraction of total time (e.g., number of 9's)
• Markov models (sketch below)
Serviceability
• state capture and accessibility for service personnel
• ease of repair
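A minimal sketch of how these quantities connect, with hypothetical component counts, FIT rates, and repair time (none of these numbers come from the slide): component FITs sum to a system failure rate, its reciprocal is the MTBF, and the steady-state availability of a simple two-state (up/down) Markov model is MTBF / (MTBF + MTTR).

```python
import math

# Illustrative RAS arithmetic: FIT rates -> MTBF -> steady-state availability.
# All numeric values below are hypothetical, chosen only for the example.
fit_rates = {"CPU": 200, "DIMM": 50, "ASIC": 100, "power supply": 500}  # FITs each
counts    = {"CPU": 4,   "DIMM": 32, "ASIC": 8,   "power supply": 2}

# 1 FIT = 1 failure per 1e9 device-hours, so the system failure rate per hour is:
failure_rate = sum(fit_rates[p] * counts[p] for p in fit_rates) / 1e9
mtbf_hours = 1.0 / failure_rate

# Two-state (up/down) Markov model: availability = MTBF / (MTBF + MTTR).
mttr_hours = 4.0                                  # hypothetical mean time to repair
availability = mtbf_hours / (mtbf_hours + mttr_hours)
nines = -math.log10(1.0 - availability)

print(f"MTBF ~ {mtbf_hours:,.0f} h, availability ~ {availability:.6f} (~{nines:.1f} nines)")
```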
If you ignore RAS, it makes you pay
(Before/after photos: Nov 11, 2001 and Nov 12, 2001)
• Super-Kamiokande Observatory
  • electron neutrino detector
  • 50 M liters of water
• Fundamental discovery about 'missing' solar neutrinos
  • 10 events per year
  • very high criticality
• One photomultiplier tube exploded
  • cascaded to ALL 11,200 tubes
  • "What happened?"
NOT DESIGNED WITH HIGH RAS
• Graduate students in a boat, cleaning detectors
• Glass shards at the bottom of the tank
Trends in microelectronics
• Processor logic density nearing air-cooling limits
  • low-power CMOS, selective clocking, Cu metal, SOI
  • IBM's already announced all four
• Low-voltage 70 nm, but soft error rates and noise margins are not yet measured
  • ECC on the pipes, TLBs, ALUs, most registers
  • s/w mitigation: checkpointing, CPU sparing
• Packaging moving back to multichip modules
  • IBM POWER4 chip multiprocessor (8 cores/module)
  • diamond, Cu heat spreaders; novel air flow
• Moore's (CMOS) law is bending but not yet broken
Trends in high availability systems
• S/w fault management stack
• system complexity prohibits 100% test coverage
• fault boundaries delineated and managed
• Lockstep cores, processor failover w/state capture
• Dynamic system monitoring and fail prediction
• Software rejuvenation to mitigate s/w aging
• 10,000s of embedded h/w sensors and registers
• End-to-end system checksum
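As one illustration of the dynamic monitoring and fail prediction item above, here is a toy sketch (not Sun's actual fault management stack) that projects a sensor trend forward and flags a part expected to cross its limit within a service window. The sensor readings, limit, and horizon are all hypothetical.

```python
# Toy dynamic-monitoring illustration: flag a component whose temperature trend
# predicts it will cross a limit within the service window.  Values are hypothetical.
def predict_failure(readings, limit=85.0, horizon_hours=24):
    """readings: list of (hour, temperature) samples for one sensor."""
    if len(readings) < 2:
        return False
    (t0, v0), (t1, v1) = readings[0], readings[-1]
    slope = (v1 - v0) / (t1 - t0)              # degrees per hour
    projected = v1 + slope * horizon_hours     # value 'horizon_hours' from now
    return projected > limit

samples = [(0, 61.0), (6, 64.5), (12, 68.0), (18, 71.5)]
print("service this FRU soon:", predict_failure(samples))   # True for this trend
```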
Infiniband as the system area network
(Diagram: Infiniband fabric connecting gateway servers, front-end servers, mail storage, and news/web storage, behind a router to the Internet)
• Fiber-optic Tx/Rx
• 2.5 Gbps/channel
• 1, 4, 12 channels
• Scaling path to 10 Gbps
• No WDM
System board interconnection
(Diagram: MCM-to-MCM link; a VCSEL array and driver on one MCM couple through a lightguide/fiber to a photodiode array and receiver on the other MCM)
• Free-space, compliant interconnect (patent pending)
• ~500 lines @ 1 Gbps each
• areal routing between system boards
• mechanical latch provides alignment
Challenges for chip-chip optics
• Optical elements are much larger than VLSI elements
• 2-10 µm VCSEL and 20-50 µm PD apertures
• 5-10 micron optical waveguides
• VLSI device elements are getting smaller
• 0.10 micron features, micron-sized gates
• 70 nm features are only a few years away
• Utilization of third dimension requires known good die
• Hybrids require new layout and validation tools
• Cost increases vs. performance gained?
The optical opportunities
Inside the box (packaging & process)
• Active cooling
• Low voltage CMOS
• Merged logic and memory
• Asynchronous circuits & systems
• SiGe HBT @ 100K gates, fT ~ 75 GHz
• Free-space or fiber board-board links
Outside the box (transport and switching)
• Infiniband and 10 Gb Ethernet
• Router and OC-192 interfaces
A real system scaling limitation
• DRAM cheap, but slow and far (100s nsec) from the processor
• Cache model uses SRAM (L1, L2, L3) with 2-10 clock tick latency
• SRAM limited to < 1 MB in 2000
• Frequent branches generate cache misses & add latency
(Chart: relative performance compared to 1980, log scale from 1 to 1000, vs. year, 1980-2000; the CPU and DRAM curves diverge, showing the performance gap, i.e. the CPU-memory wall)
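The standard way to see why the gap hurts is the average memory access time, AMAT = hit time + miss rate x miss penalty: every miss in the SRAM cache pays the full DRAM latency. A small sketch with assumed, illustrative latencies (not measured values from any particular system):

```python
# Average memory access time (AMAT) illustration; all latencies and rates assumed.
def amat(hit_ns, miss_rate, miss_penalty_ns):
    return hit_ns + miss_rate * miss_penalty_ns

# A ~2-10 clock-tick SRAM cache in front of DRAM that is 100s of ns away:
for miss_rate in (0.01, 0.05, 0.10):
    print(f"miss rate {miss_rate:.0%}: AMAT ~ {amat(5, miss_rate, 200):.0f} ns")
```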
UltraSPARC II cache memory
(Diagram: 300+ MHz UltraSPARC-II core with on-chip I$ and D$, external E-cache tag and data RAMs, a data buffer, and a UPA port to the UPA switch, system controller, and four banks of main memory; system address bus is 144 bits @ 150 MHz, memory data path is 576/144 bits @ 100 MHz with RAS/CAS memory control; hop latencies of 6.6 ns, several 10 ns stages, and 90 ns at the DRAM add up to a load-to-use memory latency of ~200 ns)
Closing the CPU-memory gap
• Merged logic and memory
  • fastest memory next to the CPU (practical)
• Hide latency
  • multithreading, prefetch, out-of-order execution
  • little help in SMP systems ("Nearly Uniform MA")
• Larger SRAM
  • 20 MB/die in 5-7 years
  • molecular electronics (discussed next)
• Faster DRAM
  • custom memory modules (lots of $$$)
• Optical interconnects? No...
  • latency due to cache model, not wires
  • increase, not decrease, power, latency, noise
Potential SRAM cache densities
• Six transistors per SRAM cell
• Silicon transistors
• 10^8 logic transistors/cm^2 in 2008 (SIA)
• 10^9 SRAM transistors/cm^2 in 2008 (SIA)
• 20 MByte SRAM L2 (1 cm^2) cache chip
• Nano-transistors (fast, < 1 nsec)
• 1 nm x 10 nm, so 10^13 SRAM transistors/cm^2
• ~0.2 TByte SRAM L2 (1 cm^2) cache chip, but…
• power/bit must scale down accordingly
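The capacities above follow directly from six transistors per cell and eight bits per byte; a quick check using the transistor densities quoted on this slide:

```python
# SRAM capacity per cm^2 at six transistors per bit, eight bits per byte.
def sram_bytes_per_cm2(transistors_per_cm2):
    return transistors_per_cm2 / 6 / 8

print(f"Si, 1e9 T/cm^2 (2008 SIA): ~{sram_bytes_per_cm2(1e9) / 1e6:.0f} MB/cm^2")   # ~20 MB
print(f"nano, 1e13 T/cm^2:         ~{sram_bytes_per_cm2(1e13) / 1e12:.2f} TB/cm^2") # ~0.2 TB
```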
How much is a mole of memory?
• 1 mole = 6.022 x 10^23 'things'
• A single processor generating addresses at 10 GHz will take 2 x 10^6 years to touch every word
• Current large servers may have up to ~1 second's worth of DRAM at that address rate
• Would need ~10^13 10-GHz processors to balance one mole of memory with present architectures
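The 2 x 10^6 year figure is just Avogadro's number divided by the address rate; a minimal check:

```python
# Time for one processor issuing addresses at 10 GHz to touch a mole of words.
AVOGADRO = 6.022e23
seconds = AVOGADRO / 10e9                       # one word per cycle at 10 GHz
years = seconds / (3600 * 24 * 365)
print(f"~{years:.1e} years")                    # ~1.9e6 years, i.e. ~2 x 10^6

# Number of 10 GHz processors needed to sweep it in about a second instead:
print(f"~{AVOGADRO / 10e9:.0e} processors")     # ~6e13, order 10^13 as on the slide
```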
How large is a mole of memory?
• If we spread out a mole of memory on a 50 Å grid, it would cover 1.5 x 10^7 m^2, a square 3.9 km on a side
• Assuming 10^4 kT per fetch, 10^13 processors fetching at 10 GHz dissipate 4 MW
• comparable to one of the ASCI machines
• at 10^9 kT/op, this is 4 x 10^5 MW
• If we pack it in a cube we get 0.075 m^3, about 42 cm on a side
• ~5 MW/m^3 at 10^4 kT/op
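The same geometry and power numbers, spelled out (taking kT at room temperature, 300 K, and the 50 Å pitch and 10^4 kT per fetch from the slide):

```python
# Geometry and power of a mole of memory cells on a 50 angstrom pitch.
AVOGADRO = 6.022e23
pitch_m  = 50e-10                                  # 50 angstroms = 5 nm

area_m2  = AVOGADRO * pitch_m**2                   # ~1.5e7 m^2
side_km  = area_m2**0.5 / 1e3                      # ~3.9 km on a side
vol_m3   = AVOGADRO * pitch_m**3                   # ~0.075 m^3
cube_cm  = vol_m3**(1 / 3) * 100                   # ~42 cm cube

kT = 1.38e-23 * 300                                # joules at room temperature
power_mw = 1e13 * 10e9 * 1e4 * kT / 1e6            # 1e13 CPUs, 10 GHz, 1e4 kT/fetch

print(f"{area_m2:.1e} m^2 ({side_km:.1f} km square), {vol_m3:.3f} m^3 ({cube_cm:.0f} cm cube)")
print(f"~{power_mw:.0f} MW at 1e4 kT per fetch")
```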
Applications to backing store
• Write once
• slow is acceptable
• behaving like tape is acceptable
• Archive the whole file state of machine forever
• infinite undo
• should be non-volatile at zero power
• Trees more probable than meshes…
• Fat trees might be fault-tolerant
Logic gates
• Fan-in and fan-out are required for existing designs
• Level restoration required to assemble a functional
system from billions of devices
• It is possible to obtain gain from a tunnel diode
• discrete component tunnel diode logic attempted in 1960’s
• it was intractable
• twitchy
• Three terminal devices w/power and ground planes
Power distribution
• All electrical logic families require a reference voltage or current to set threshold and a return path
• Immersion in a conductive liquid for global return...
• ...common ground return impedances are a noise source
• Assuming probabilistic assembly, it is very important that any power distribution follow the actual logic structure
• it might be easier to power the gates from an energetic compound dissolved in the ground return path
• pump liquid to clear decomposition products and move heat
Essential physics to solve
• Nano-transistor with gain
• Speed < 1 nsec desirable, 1 msec usable
• Low impedance power rails
• Long data buses for multiple SRAMs
• Energy per SRAM bit ~ 10^2 - 10^4 kT
• Deterministic nets and devices
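For scale, the kT range quoted above translates to the following per-bit energies at room temperature (kT taken at 300 K):

```python
# Energy per SRAM bit access in joules, for the kT range quoted above.
kT = 1.38e-23 * 300                         # ~4.1e-21 J at room temperature
for mult in (1e2, 1e4):
    print(f"{mult:.0e} kT per bit ~ {mult * kT:.1e} J")   # ~4e-19 J and ~4e-17 J
```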
Fundamental question for nano is...
• Start with molecules, molecular wires...
• Assemble mesoscopic wires, junctions...
• Assemble stable and predictable computing elements (e.g., gates) by forming structures using the wires, junctions...
• Assemble large-scale circuits for memory, logic, using the computing elements...
• Add ECC circuits...
• Add power and ground circuits, signal planes, I/Os…
How do the bit density, power density, and I/O bandwidth compare to silicon CMOS @ 50 nm feature size?
What’s changed since ten years ago?
Then
• "A wall" on packaging
• 50 MHz CMOS CPU
• 2 micron features
• 512 KB RAM chip
• MHz laser modulation
• ARPANET
• Mainframes
• Wires for transport
Now
• CMOS scaling to 5 GHz through 2010
• 0.1 micron features
• 64 MB RAM chip
• 10 GHz laser modulation
• Ubiquitous internet
• SMP servers
• Fiber, EDFA for transport
The more things change, the more they remain the same!