3dfx, nvidia, Moore's Law and more...
NST 121 Computer Systems Fundamentals
INTRODUCTION TO COMPUTERS
Gary Tarolli - 3dfx and Nvidia 3D Graphics Engineer Monday, April 27
3D Graphics from my career perspective
1974-1978: B.S. Math, RPI (minor in CS)
1979-1980: M.S. CS, Caltech
1980-1983: Digital Equipment Corp
1984-1992: Silicon Graphics, Inc
1992-1993: consulting
1993-2000: 3dfx
2000-present: nvidia
or “Moore’s Law viewed from my career”
The “Moore’s Law at 50 (years)” publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake?
Moore’s law in action over 4 decades. Moore’s Law: http://www.mooreslaw.org
The most popular formulation is: the number of transistors on an integrated circuit doubles about every two years (same size chip).
e.g. 500 nm to 350 nm is a sqrt(2) shrink on each side, so squared = 2x as dense (# of transistors)
Note: in addition, the clock speed increases and the chip area increases (better manufacturing)
Cost per transistor (or per unit of performance) drops!
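A toy calculation (mine, not from the slides) that works out both numbers, the per-shrink density gain and the two-year doubling compounded over a career, in a few lines of C:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* One process shrink, e.g. 500 nm -> 350 nm: each side scales by
     * 500/350 ~ sqrt(2), so the same area holds ~2x the transistors. */
    double density = pow(500.0 / 350.0, 2.0);
    printf("one shrink: %.2fx transistor density\n", density);

    /* One doubling every two years: 20 years -> 2^10 ~ 1000x. */
    printf("20 years: %.0fx\n", pow(2.0, 20.0 / 2.0));
    return 0;
}
```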
Result: trends over 4 decades … Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rise of importance of 3D graphics and hence graphics chips
Consolidation in the 3d graphics industry ◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left: nvidia, Imagination Technologies (PowerVR)
◦ Plus the CPU/system companies: Intel, AMD, Apple
Surprise: graphics chips power supercomputers
Surprise: cars ◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural net learning enabled by this computing power is exploding
Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now, see if you do in an hour …
So let’s begin the journey …
1974-1978: B.S. Math & CS, RPI
1974 – my first calculator: an HP-35, purchased for college ($270? – a few weeks’ salary)
1975 – my first computer program, on an IBM 360 mainframe (using my friend’s engineering account)
1979-1980: M.S. CS, Caltech
1979 – played networked Star Trek on a Xerox Alto (black-and-white bit-mapped graphics) until 4am, living off of $.25 ice cream sandwiches
1979-1980: M.S. CS, Caltech … Worked on VLSI CAD tools for custom chips; humans drew every single wire for every single transistor on a chip
(figure: hand-drawn layouts of two inverters)
1979-1980: M.S. CS, Caltech … MIT class projects in 1978
1980-1983: DEC (minicomputer), employee #93246. CPUs were still many boards of logic
I worked on VLSI CAD tools so we could design a single-chip VAX, called MicroVAX
And go from this :
A refrigerator filled with boards …
1980-1983 : DEC (minicomputer) … To this …
1984-1992: SGI (workstation), employee #36
IRIS 1000 workstation (1984): $10,000 to $30,000 – 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz, had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992): $6000 – 33 MHz MIPS R3000 ◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of the SGI GL library implemented in software on MIPS
1984-1992 : SGI, Silicon Graphics, Inc …
1991: IrisVision: $4000 board set for the PC, ISA and Micro Channel ◦ http://en.wikipedia.org/wiki/IrisVision
The Intel 486 and bus architecture were just too slow, so it died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming: faster Pentiums, Moore’s law (smaller, denser chips), the PCI bus … and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not swallow it … we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price … by 2000 SGI was in decline … it died in 2009 … about 20 years later …
$0 to $5 billion back to $0
1984-1992: SGI, Silicon Graphics, Inc …
Onyx Reality Engine (1992): $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture-mapped graphics (a divide per pixel) ◦ 1M triangles/sec, 100 Mpixels/sec
1993-2000: 3Dfx (PC), employee #1. Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew ◦ ~$5B at the time, almost as big as the movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games; it dwarfs the movie industry
Goal: ◦ Produce images similar to the Reality Engine’s for $500, in real time, i.e. 30 fps
◦ “Similar” means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day
1993-2000: 3Dfx (PC) … How:
◦ Take maximum advantage of just-arriving technology
◦ Aim high – don’t sacrifice quality; do the entire Reality Engine pipeline at full speed
◦ Make it easy to program, no difficult choices (e.g. trading off speed for quality) ◦ Included ALL the important features of the Reality Engine: shading, z-buffering, alpha-blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – a technical advisory board: ◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck, ok a lot? ◦ $500 was too costly for the consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines: SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically, resulting in a $300 board and enabling the consumer market
1993-2000: 3Dfx (PC) … Key to quality texture mapping is the per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32-bit results, only about 18-20 bits ◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot) ◦ Remember those sin/cos/tan tables in high school trig? Same basic idea (see the sketch after this list)
◦ 6-bit index (64 entries, 15 bits wide, ends up in a PLA-optimized ROM)
◦ 4-bit interpolation, which adds another 3-4 bits of precision
◦ Input is a float, so shift the result by the exponent, since log(1/x) = -log(x) = -exponent(x) in the float representation
Simplify the full equations using math, e.g. the texture LOD:
LOD = Log2( sqrt( (ds/dx)^2 + (ds/dy)^2 ) )
◦ Since Log2(sqrt(x)) = 0.5 * Log2(x), the square root drops out: LOD = 0.5 * Log2( (ds/dx)^2 + (ds/dy)^2 )
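Below is a minimal C sketch of the table-plus-interpolation trick (my reconstruction for illustration, not the actual 3dfx ROM contents: the real 64-entry, 15-bit-wide table was PLA-optimized). It approximates 1/x for a mantissa in [1, 2); a full float then only needs its exponent negated, since 1/(m * 2^e) = (1/m) * 2^-e.

```c
#include <stdio.h>
#include <math.h>

#define INDEX_BITS 6
#define TABLE_SIZE (1 << INDEX_BITS)        /* 64 entries, as on the slide */

static double recip_table[TABLE_SIZE + 1];  /* 1/x sampled across [1, 2]  */

static void init_table(void) {
    for (int i = 0; i <= TABLE_SIZE; i++)
        recip_table[i] = 1.0 / (1.0 + (double)i / TABLE_SIZE);
}

/* Approximate 1/x for x in [1, 2): 6-bit table index + linear
 * interpolation, the same basic idea as interpolating in a trig table. */
static double approx_recip(double x) {
    double f = (x - 1.0) * TABLE_SIZE;  /* scaled position in the table */
    int    i = (int)f;                  /* table index (the top 6 bits) */
    double t = f - i;                   /* interpolation fraction       */
    return recip_table[i] + t * (recip_table[i + 1] - recip_table[i]);
}

int main(void) {
    init_table();
    double worst = 0.0;
    for (double x = 1.0; x < 2.0; x += 1e-5) {
        double err = fabs(approx_recip(x) - 1.0 / x);
        if (err > worst) worst = err;
    }
    /* This naive version lands around 13-14 good bits; the slide says the
     * tuned fixed-point hardware version reached the needed 18-20 bits. */
    printf("worst error: %g (~%.1f bits)\n", worst, -log2(worst));
    return 0;
}
```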
1993-2000: 3Dfx … C simulator
◦ A very fast, bit-accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Lets you research algorithms quickly
◦ Up and running well before the RTL simulator
◦ You can develop software and hardware tests on the C simulator
RTL simulator ◦ Verilog
Before tapeout, we compare the C vs Verilog results for chip functional tests that we write (sketched below)
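A minimal sketch of that comparison flow (the names gold_pixel and rtl_dump.txt are mine for illustration, not 3dfx's): run the same vectors through the bit-accurate C model and diff against outputs captured from the Verilog simulation.

```c
#include <stdio.h>

/* Bit-accurate C ("golden") model of one chip function; a stand-in here. */
static unsigned int gold_pixel(unsigned int in) {
    return (in * 0x9E3779B9u) ^ (in >> 16);
}

int main(void) {
    /* rtl_dump.txt holds "input output" hex pairs captured from Verilog. */
    FILE *f = fopen("rtl_dump.txt", "r");
    if (!f) { perror("rtl_dump.txt"); return 1; }

    unsigned int in, out;
    long vectors = 0, mismatches = 0;
    while (fscanf(f, "%x %x", &in, &out) == 2) {
        vectors++;
        if (gold_pixel(in) != out) {       /* must match bit-for-bit */
            printf("mismatch at vector %ld: in=%08x rtl=%08x c=%08x\n",
                   vectors, in, out, gold_pixel(in));
            mismatches++;
        }
    }
    fclose(f);
    printf("%ld vectors, %ld mismatches\n", vectors, mismatches);
    return mismatches != 0;
}
```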
Story time : code then test, vs test then code
1993-2000: 3Dfx … debugging
Yogi Berra: “In theory there is no difference between theory and practice. In practice there is.”
From Bandits?: “Always expect the unexpected, except of course the truly unexpected …”
Me: “If you cannot believe there is a bug (in your code), then you will never find it.”
1993-2000: 3Dfx Voodoo 1
Voodoo 1 – 50 MHz chip, 500 nm process, 50 MHz memory (4 MB), 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates
1993-2000 : 3Dfx Voodoo 1 System architecture – perhaps my best work ever (along with Scott Sellers)
1993-2000 : 3Dfx Voodoo 1 results Images tell the story … compared to Reality Engine …
1993-2000 : 3Dfx Voodoo 2
1993-2000: 3Dfx Voodoo 2, 3
Voodoo 3: ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x of it went to reducing the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz, 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
2000-now: nvidia We goofed: missed a product cycle/schedule, made tactical and strategic mistakes, and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L (transform & lighting) on a chip until it was too late ◦ our next product had T&L, but it was still in the lab
◦ I thought the CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ But they peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe it ourselves?)
◦ That enabled high $$$ investment in GPU floating point, where I had thought it would end up on the CPU
◦ Supercomputer-speed floating point is basically free on a GPU ◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers
2000-now : nvidia Titan X Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (a SIMD CPU)
3,072 cores on the die
Each SM is ~a Voodoo 2 or more
2000-now: 1995 + 20 years = 2015. Over 20 years, Moore’s law says we should expect a 2**10 increase, i.e. ~1000x.
              Voodoo 1            Titan X                   x increase
Transistors   2 M (2 chips)       8,000 M                   4,000
Cores         1                   2,000-3,000               2,500
Technology    500 nm              28 nm                     300
Area          100 mm2             600 mm2                   6
Triangles/sec 1 M                 6,000 M                   6,000
Mpixels/sec   100 M               100,000 M                 1,000
Ops/sec       5 B (8-bit)         7,000 B (32-bit IEEE)     1,000
Memory b/w    < 1 GB/sec          340 GB/sec                400
Power         4 watts             250 watts                 (the price you pay)
Frequency     50 MHz              1,000 MHz                 20
Memory        4 MB                12,000 MB                 3,000
Cost          $500                $1,000                    2
Design        5 man-years ($5M)   >500 man-years ($500M)    100
CPUs vs GPUs Graphics is embarrassingly parallel! (millions of pixels on the screen)
◦ Which is why 1,000-3,000 cores can be used efficiently
◦ If your PC has 1,000-3,000 cores, what would they do? (see the sketch after this list)
PIXAR field trip (while at 3dfx) ◦ A server room full of Sun workstations
◦ The limit is how much computing power you can fit in that physical room (and its A/C)
Supercomputers ◦ Supercomputers are often limited by a power budget in megawatts, for CPUs and A/C
◦ Once GPUs were general enough and supported 32-bit and 64-bit IEEE floating point …
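A sketch of what “embarrassingly parallel” means in code (shade() is a stand-in for real per-pixel work, and OpenMP is just my illustration, not anything from the talk): no pixel depends on any other, so the loop scales to as many cores as you have, with zero synchronization.

```c
#include <stdint.h>
#include <stdlib.h>

#define W 1920
#define H 1080

/* Stand-in for real shading work: each pixel depends only on (x, y). */
static uint32_t shade(int x, int y) {
    return (uint32_t)(x * 255 / W) << 16 | (uint32_t)(y * 255 / H) << 8;
}

int main(void) {
    uint32_t *fb = malloc((size_t)W * H * sizeof *fb);
    if (!fb) return 1;

    /* ~2 million independent iterations: a GPU runs thousands of these
     * at once; compiled with -fopenmp this splits across CPU cores.   */
    #pragma omp parallel for
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            fb[(size_t)y * W + x] = shade(x, y);

    free(fb);
    return 0;
}
```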
2000-now : 3dfx + nvidia … looking back Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
Deep neural net analysis, deep learning: is this the key to Artificial Intelligence becoming real?
Intel 16-core Xeon = 43 days to train a DNN problem
Titan-X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google: “…the hardware needed to emulate the human brain may be ready even sooner than he predicted — in around 2020 — using technologies such as graphics processing units (GPUs), which are ideal for brain-software algorithms.” (Washington Post, 4/23/14)
Self-promoting links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning (that’s what I did)