3dfx, nvidia, Moore's Law and more...
NST 121 Computer Systems Fundamentals
INTRODUCTION TO COMPUTERS
Gary Tarolli - 3dfx and Nvidia 3D Graphics Engineer Monday, April 27
3D Graphics from my career perspective
1974-1978: B.S. Math, RPI (minor in CS)
1979-1980: M.S. CS, Caltech
1980-1983: Digital Equipment Corp
1984-1992: Silicon Graphics, Inc
1992-1993: consulting
1993-2000: 3dfx
2000-present: nvidia
or “Moore’s Law viewed from my career”
The “Moore’s Law at 50 (years)” publication came in the mail last week …
Various articles in the news too … should we throw a party or a wake?
Moore’s law in action over 4 decades. Moore’s Law: http://www.mooreslaw.org
The most popular formulation is: the number of transistors on an integrated circuit doubles about every two years (same size chip).
e.g. 500 nm to 350 nm is a sqrt(2) shrink on each side, so squared = 2x as dense (# of transistors)
Note: in addition, the clock speed increases and the chip area increases (better manufacturing)
Cost per transistor (or per unit of performance) drops!
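A toy calculation (mine, not from the slides) that works out both numbers, the per-shrink density gain and the two-year doubling compounded over a career, in a few lines of C:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* One process shrink, e.g. 500 nm -> 350 nm: each side scales by
     * 500/350 ~ sqrt(2), so the same area holds ~2x the transistors. */
    double density = pow(500.0 / 350.0, 2.0);
    printf("one shrink: %.2fx transistor density\n", density);

    /* One doubling every two years: 20 years -> 2^10 ~ 1000x. */
    printf("20 years: %.0fx\n", pow(2.0, 20.0 / 2.0));
    return 0;
}
```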
Result: trends over 4 decades … Mainframe (IBM) => minicomputer (DEC) => workstation (SGI) => PC (3dfx)
The rise of importance of 3D graphics and hence graphics chips
Consolidation in the 3d graphics industry ◦ ~40 3d graphics chip startups in 1994
◦ Only a few independent companies left: nvidia, Imagination Technologies (PowerVR)
◦ Plus the CPU/system companies: Intel, AMD, Apple
Surprise: graphics chips power supercomputers
Surprise: cars ◦ 8 million cars with nvidia chips in them, many more coming
◦ Self driving cars are coming: enabled by supercomputing power in cheap chips
Surprise: deep neural net learning enabled by this computing power is exploding
Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil
http://en.wikipedia.org/wiki/The_Singularity_Is_Near
You probably don’t believe this now, see if you do in an hour …
So let’s begin the journey …
1974-1978: B.S. Math & CS, RPI
1974 – my first calculator: an HP-35, purchased for college ($270? – a few weeks’ salary)
1975 – my first computer program, on an IBM 360 mainframe (using my friend’s engineering account)
1979-1980: M.S. CS, Caltech
1979 – played networked Star Trek on a Xerox Alto (black-and-white bit-mapped graphics) until 4am, living off of $.25 ice cream sandwiches
1979-1980: M.S. CS, Caltech … Worked on VLSI CAD tools for custom chips; humans drew every single wire for every single transistor on a chip
(figure: hand-drawn layouts of two inverters)
1979-1980: M.S. CS, Caltech … MIT class projects in 1978
1980-1983: DEC (minicomputer), employee #93246. CPUs were still many boards of logic
I worked on VLSI CAD tools so we could design a single-chip VAX, called MicroVAX
And go from this :
A refrigerator filled with boards …
1980-1983 : DEC (minicomputer) … To this …
1984-1992: SGI (workstation), employee #36
IRIS 1000 workstation (1984): $10,000 to $30,000 – 8 MHz Motorola 68010
IRIS 1400 workstation: ran at 10 MHz, had 1.5 MB of RAM and a 73 MB disk drive
My other claim to fame: http://en.wikipedia.org/wiki/SGI_Dogfight
1984-1992 : SGI, Silicon Graphics, Inc …
IRIS Indigo (1992): $6000 – 33 MHz MIPS R3000 ◦ 100k lines/sec, 10k triangles/sec
◦ Almost all of the SGI GL library implemented in software on MIPS
1984-1992 : SGI, Silicon Graphics, Inc …
1991: IrisVision: $4000 board set for the PC, ISA and Micro Channel ◦ http://en.wikipedia.org/wiki/IrisVision
The Intel 486 and bus architecture were just too slow, so it died in obscurity …
But a few of us (Sellers, Smith, Tarolli, aka SST) and others realized what was coming: faster Pentiums, Moore’s law (smaller, denser chips), the PCI bus … and that SGI would be out of business some day if it didn’t transform itself
But going from 80% margins to 20% margins is not easy to swallow. They did not swallow it … we voted with our feet and left (along with others who went to Nvidia and elsewhere)
and they paid the price … by 2000 SGI was in decline … it died in 2009 … about 20 years later …
$0 to $5 billion back to $0
1984-1992: SGI, Silicon Graphics, Inc …
Onyx Reality Engine (1992): $50,000 to $80,000 – 100 MHz R4400
Beautiful real-time texture-mapped graphics (a divide per pixel) ◦ 1M triangles/sec, 100 Mpixels/sec
1993-2000: 3Dfx (PC), employee #1. Why:
◦ Entrepreneurs – eventually need to start their own company (and hopefully get rich in the process)
◦ We saw a problem within SGI, and an opportunity in 3d PC graphics
◦ Engineers – we saw a cool problem and wanted to solve it
◦ We realized the gaming market was a lot bigger than anyone knew ◦ ~$5B at the time, almost as big as the movie industry
◦ Today it is MUCH larger, over $100B worldwide for all games; it dwarfs the movie industry
Goal: ◦ Produce images similar to the Reality Engine’s for $500, in real time, i.e. 30 fps
◦ “Similar” means reduced quality (less bit depth) but still excellent
Activation energy: Caroline said “Just do it” one day
1993-2000: 3Dfx (PC) … How:
◦ Take maximum advantage of just-arriving technology
◦ Aim high – don’t sacrifice quality; do the entire Reality Engine pipeline at full speed
◦ Make it easy to program, no difficult choices (e.g. trading off speed for quality) ◦ Included ALL the important features of the Reality Engine: shading, z-buffering, alpha-blending, fog, quality texturing and filtering
◦ Listened to game developers and professionals – a technical advisory board: ◦ John Carmack (id)
◦ Tim Sweeney (Epic)
◦ Tom Porter (Pixar)
A bit of luck, ok a lot? ◦ $500 was too costly for the consumer market, so we targeted the arcades
◦ And 3dfx ended up in various arcade machines: SF Rush, Gretzky Hockey, NFL Blitz, Mace, etc.
◦ Memory prices fell dramatically, resulting in a $300 board and enabling the consumer market
1993-2000: 3Dfx (PC) … Key to quality texture mapping is the per-pixel divide
◦ Very costly
◦ Key is to be just good enough
◦ We didn’t need 32-bit results, only about 18-20 bits ◦ Just enough to not be visually distracting
◦ So we used a table lookup, and then linear interpolation (which helped a lot) ◦ Remember those sin/cos/tan tables in high school trig? Same basic idea (see the sketch after this list)
◦ 6-bit index (64 entries, 15 bits wide, ends up in a PLA-optimized ROM)
◦ 4-bit interpolation, which adds another 3-4 bits of precision
◦ Input is a float, so shift the result by the exponent, since log(1/x) = -log(x) = -exponent(x) in the float representation
Simplify the full equations using math, e.g. the texture LOD:
LOD = Log2( sqrt( (ds/dx)^2 + (ds/dy)^2 ) )
◦ Since Log2(sqrt(x)) = 0.5 * Log2(x), the square root drops out: LOD = 0.5 * Log2( (ds/dx)^2 + (ds/dy)^2 )
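Below is a minimal C sketch of the table-plus-interpolation trick (my reconstruction for illustration, not the actual 3dfx ROM contents: the real 64-entry, 15-bit-wide table was PLA-optimized). It approximates 1/x for a mantissa in [1, 2); a full float then only needs its exponent negated, since 1/(m * 2^e) = (1/m) * 2^-e.

```c
#include <stdio.h>
#include <math.h>

#define INDEX_BITS 6
#define TABLE_SIZE (1 << INDEX_BITS)        /* 64 entries, as on the slide */

static double recip_table[TABLE_SIZE + 1];  /* 1/x sampled across [1, 2]  */

static void init_table(void) {
    for (int i = 0; i <= TABLE_SIZE; i++)
        recip_table[i] = 1.0 / (1.0 + (double)i / TABLE_SIZE);
}

/* Approximate 1/x for x in [1, 2): 6-bit table index + linear
 * interpolation, the same basic idea as interpolating in a trig table. */
static double approx_recip(double x) {
    double f = (x - 1.0) * TABLE_SIZE;  /* scaled position in the table */
    int    i = (int)f;                  /* table index (the top 6 bits) */
    double t = f - i;                   /* interpolation fraction       */
    return recip_table[i] + t * (recip_table[i + 1] - recip_table[i]);
}

int main(void) {
    init_table();
    double worst = 0.0;
    for (double x = 1.0; x < 2.0; x += 1e-5) {
        double err = fabs(approx_recip(x) - 1.0 / x);
        if (err > worst) worst = err;
    }
    /* This naive version lands around 13-14 good bits; the slide says the
     * tuned fixed-point hardware version reached the needed 18-20 bits. */
    printf("worst error: %g (~%.1f bits)\n", worst, -log2(worst));
    return 0;
}
```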
1993-2000: 3Dfx … C simulator
◦ A very fast, bit-accurate simulator for the chip
◦ 10k to 50k lines of C code
◦ Lets you research algorithms quickly
◦ Up and running well before the RTL simulator
◦ You can develop software and hardware tests on the C simulator
RTL simulator ◦ Verilog
Before tapeout, we compare the C vs Verilog results for chip functional tests that we write (sketched below)
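A minimal sketch of that comparison flow (the names gold_pixel and rtl_dump.txt are mine for illustration, not 3dfx's): run the same vectors through the bit-accurate C model and diff against outputs captured from the Verilog simulation.

```c
#include <stdio.h>

/* Bit-accurate C ("golden") model of one chip function; a stand-in here. */
static unsigned int gold_pixel(unsigned int in) {
    return (in * 0x9E3779B9u) ^ (in >> 16);
}

int main(void) {
    /* rtl_dump.txt holds "input output" hex pairs captured from Verilog. */
    FILE *f = fopen("rtl_dump.txt", "r");
    if (!f) { perror("rtl_dump.txt"); return 1; }

    unsigned int in, out;
    long vectors = 0, mismatches = 0;
    while (fscanf(f, "%x %x", &in, &out) == 2) {
        vectors++;
        if (gold_pixel(in) != out) {       /* must match bit-for-bit */
            printf("mismatch at vector %ld: in=%08x rtl=%08x c=%08x\n",
                   vectors, in, out, gold_pixel(in));
            mismatches++;
        }
    }
    fclose(f);
    printf("%ld vectors, %ld mismatches\n", vectors, mismatches);
    return mismatches != 0;
}
```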
Story time : code then test, vs test then code
1993-2000: 3Dfx … debugging
Yogi Berra: “In theory there is no difference between theory and practice. In practice there is.”
From Bandits?: “Always expect the unexpected, except of course the truly unexpected …”
Me: “If you cannot believe there is a bug (in your code), then you will never find it.”
1993-2000: 3Dfx Voodoo 1
Voodoo 1 – 50 MHz chip, 500 nm process, 50 MHz memory (4 MB), 50 Mpixels/sec
◦ Each chip was ~1 million transistors, 250k gates
1993-2000 : 3Dfx Voodoo 1 System architecture – perhaps my best work ever (along with Scott Sellers)
1993-2000 : 3Dfx Voodoo 1 results Images tell the story … compared to Reality Engine …
1993-2000 : 3Dfx Voodoo 2
1993-2000: 3Dfx Voodoo 2, 3
Voodoo 3: ~4 years after Voodoo 1
1 chip vs 2-3 chips
Density: 250 nm vs 500 nm = 4x more logic (2x of it went to reducing the chip count)
Clock rate: 50 MHz to 200 MHz
Memory: 50 MHz to 166 MHz, 4 MB to 16 MB
https://en.wikipedia.org/wiki/Comparison_of_3dfx_graphics_processing_units
2000-now: nvidia We goofed: missed a product cycle/schedule, made tactical and strategic mistakes, and poof!
◦ Another one bites the dust
One strategic mistake – we did not put T&L (transform & lighting) on a chip until it was too late ◦ our next product had T&L, but it was still in the lab
◦ I thought the CPU companies (Intel, IBM, AMD) had more at stake in floating point than we did
◦ But they peaked out at 8-16 cores, and IEEE float performance was not their #1 priority
◦ GPUs became more important than I think anyone ever thought (we didn’t truly believe it ourselves?)
◦ That enabled high $$$ investment in GPU floating point, where I had thought it would end up on the CPU
◦ Supercomputer-speed floating point is basically free on a GPU ◦ 80% of the GPU area is just a massively parallel SIMD floating point supercomputer
◦ Many times more powerful than the early CRAY supercomputers
2000-now : nvidia Titan X Unreal Engine demo: http://content.jwplatform.com/previews/tDgR1DxI-sy1F28d9
4x8 green dots = one SM (a SIMD CPU)
3,072 cores on the die
Each SM is ~a Voodoo 2 or more
2000-now: 1995 + 20 years = 2015. Over 20 years, Moore’s law says we should expect a 2**10 increase, i.e. ~1000x.
              Voodoo 1            Titan X                   x increase
Transistors   2 M (2 chips)       8,000 M                   4,000
Cores         1                   2,000-3,000               2,500
Technology    500 nm              28 nm                     300
Area          100 mm2             600 mm2                   6
Triangles/sec 1 M                 6,000 M                   6,000
Mpixels/sec   100 M               100,000 M                 1,000
Ops/sec       5 B (8-bit)         7,000 B (32-bit IEEE)     1,000
Memory b/w    < 1 GB/sec          340 GB/sec                400
Power         4 watts             250 watts                 (the price you pay)
Frequency     50 MHz              1,000 MHz                 20
Memory        4 MB                12,000 MB                 3,000
Cost          $500                $1,000                    2
Design        5 man-years ($5M)   >500 man-years ($500M)    100
CPUs vs GPUs Graphics is embarrassingly parallel! (millions of pixels on the screen)
◦ Which is why 1,000-3,000 cores can be used efficiently
◦ If your PC has 1,000-3,000 cores, what would they do? (see the sketch after this list)
PIXAR field trip (while at 3dfx) ◦ A server room full of Sun workstations
◦ The limit is how much computing power you can fit in that physical room (and its A/C)
Supercomputers ◦ Supercomputers are often limited by a power budget in megawatts, for CPUs and A/C
◦ Once GPUs were general enough and supported 32-bit and 64-bit IEEE floating point …
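A sketch of what “embarrassingly parallel” means in code (shade() is a stand-in for real per-pixel work, and OpenMP is just my illustration, not anything from the talk): no pixel depends on any other, so the loop scales to as many cores as you have, with zero synchronization.

```c
#include <stdint.h>
#include <stdlib.h>

#define W 1920
#define H 1080

/* Stand-in for real shading work: each pixel depends only on (x, y). */
static uint32_t shade(int x, int y) {
    return (uint32_t)(x * 255 / W) << 16 | (uint32_t)(y * 255 / H) << 8;
}

int main(void) {
    uint32_t *fb = malloc((size_t)W * H * sizeof *fb);
    if (!fb) return 1;

    /* ~2 million independent iterations: a GPU runs thousands of these
     * at once; compiled with -fopenmp this splits across CPU cores.   */
    #pragma omp parallel for
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            fb[(size_t)y * W + x] = shade(x, y);

    free(fb);
    return 0;
}
```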
2000-now : 3dfx + nvidia … looking back Need I say more:
1995: 0% of consumer PCs have 3d graphics accelerators
2015: 100% penetration (embedded accelerator in all Intel and AMD chips)
Deep neural net analysis, deep learning: is this the key to Artificial Intelligence becoming real?
Intel 16-core Xeon = 43 days to train a DNN problem
Titan-X = 1.5 days
Next year < 1 day
5 years … 1 hour (with software advances)
20 years … 1 sec to 1 minute ?
Coming soon … ??? The Age of Intelligent Machines by Ray Kurzweil
Now do you believe?
Is Artificial Intelligence really almost here?
GPU Fanatic (last week this came in my nvidia email)
Ray Kurzweil, a renowned futurist and the director of engineering at Google: “…the hardware needed to emulate the human brain may be ready even sooner than he predicted — in around 2020 — using technologies such as graphics processing units (GPUs), which are ideal for brain-software algorithms.” (Washington Post, 4/23/14)
Self-promoting links:
http://www.thedodgegarage.com/3dfx/
https://en.wikipedia.org/wiki/3dfx_Interactive
simply google everything else, e.g. deep learning (that’s what I did)