Evolution of the Graphical Processing Unit
A professional paper submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science.
Thomas Scott Crow
February 3, 2005
Acknowledgements
I would like to thank Dr. Harris for his considerable patience and help.
I would like to thank my committee members, Dr. Egbert and Dr. Mensing, for their valuable time.
Overview
Introduction
“Computer Graphics” Milestones
The Modern GPU
General Purpose GPU Computing
Future of the GPU
Introduction
Definition: Used primarily for 3D applications, a graphical processing unit (GPU) is a single chip processor that creates lighting effects and transforms objects every time a 3D scene is redrawn. These are mathematically intensive tasks, which otherwise would put quite a strain on the CPU.
History: Graphics computation has evolved from software that performed graphics functions on the main CPU, to specialized hardware that ran certain types of graphics computation while the CPU performed the rest, to a fully implemented 3D graphics pipeline run entirely on a GPU. This history has closely followed the idea of the “Wheel of Reincarnation” first presented by Myer and Sutherland in a 1968 ACM paper.
Introduction
Myer and Sutherland’s “Wheel of Reincarnation”
“Computer Graphics” Milestones
MIT’s Whirlwind Project - 1944
Significance: First computer built specifically for interactive, real-time control which displayed real-time text and graphics on a video terminal.
“Computer Graphics” Milestones
“Magnetic” Core Memory (RAM) – 1951
Significance: Miniaturization, speed, and non-volatility.
“Computer Graphics” Milestones
SAGE (Semi-Automatic Ground Environment) – 1958
Significance: Introduced real-time software, showed feasibility of CRTs in interactive computing, and the light-pen as an input device.
“Computer Graphics” Milestones
SAGE (Semi-Automatic Ground Environment) – 1958
With light-pen
“Computer Graphics” Milestones
MIT’s TX-0 (Transistorized Experimental Computer Zero) – 1956
Significance: First real-time, programmable, general-purpose computer made entirely from transistors and first ever operating system.
“Computer Graphics” Milestones
MIT’s TX-2 – 1959
Significance: Specialized I/O circuitry allowed for “online” computing which allowed for the creation of Sutherland’s “Sketchpad”.
“Computer Graphics” Milestones
Ivan Sutherland’s Sketchpad – 1963
Significance: Precursor of the direct manipulation computer graphic interface of today. Ancestor of Computer Aided Design (CAD) and the modern graphical user interface.
“Computer Graphics” Milestones
Digital Equipment Corporation (DEC) and the Minicomputer – 1957
Significance: Drastic shift away from the mainframe “time-sharing” model of computing. The VAX supermini would become the workhorse for the CAD industry.
“Computer Graphics” Milestones
Computer Aided Design (CAD) Systems
Significance: Furthered the concept of Sketchpad by allowing the creation, rotation, and manipulation of 3D models.
General Motors DAC-1
“Computer Graphics” Milestones
Information Displays IDIIOM
“Computer Graphics” Milestones
The PC Revolution
Significance: Allowed the computing power of the early mainframes and minicomputers to be available to consumers.
Intel 4004, the first Microprocessor
“Computer Graphics” Milestones
The Altair 8800 is considered the first personal computer.
The Modern GPU
Graphical Processing Unit (GPU)
The Modern GPU
Professional Graphics Adapter (PGA)
First processor-based video card, with an Intel 8088 microprocessor onboard. All video-related tasks were performed by the onboard microprocessor.
The Modern GPU
Silicon Graphics Inc. (SGI) – 1980’s
SGI’s two most important contributions to the modern GPU:
OpenGL - Vendor independent Application Programming Interface (API) for the development of 2D and 3D graphics applications. OpenGL has become an industry standard API used and supported by all major vendors.
Graphics Pipeline - A conceptual model of stages that graphics data is sent through. It is simply a process for converting 3D coordinates of a model into 2D screen images.
The Modern GPU
3D Graphics Pipeline from nVidia
Generalized 2-Step Graphics Pipeline
Geometry Stage – Changes 3D object coordinates into 2D window coordinates.
Rendering Stage - Fills the area of pixels between the 2D coordinates with pixels to represent the surface of the object.
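As a rough illustration of what the geometry stage produces, the sketch below projects a single 3D point into 2D window coordinates. The function name project_to_window, the focal_length parameter, and the 640x480 window are assumptions made for this example only; a real pipeline would use full 4x4 projection matrices and clipping.

```python
def project_to_window(point, width, height, focal_length=1.0):
    """Project a 3D camera-space point to 2D window (pixel) coordinates.

    Hypothetical helper: simple pinhole projection followed by a
    viewport mapping from [-1, 1] to the window size.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective divide: more distant points move toward the center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Viewport mapping; window y grows downward, so flip the y axis.
    win_x = (ndc_x + 1.0) * 0.5 * width
    win_y = (1.0 - ndc_y) * 0.5 * height
    return win_x, win_y


center = project_to_window((0.0, 0.0, 5.0), 640, 480)
corner = project_to_window((2.0, 2.0, 2.0), 640, 480)
```

The rendering stage would then fill the pixels between the projected vertices of each triangle.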
The Modern GPU
The Modern GPU
Main Components of the Geometry Stage
Transform and Lighting – Transform is the process of mapping the coordinates of a 3D object onto a 2D space, and lighting is the process of providing lighting effects to the scene.
Triangle Setup – Converts triangle vertices into pixels and computes the rate of change of color values between pixels.
The Modern GPU
GPU Timeline
The Modern GPU
Transform Matrix Multiplication
Transform Matrix – Made up of many interim action matrices multiplied together.
Interim Action Matrix – Includes such actions as scaling, rotation, translation, etc.
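The composition of interim action matrices can be sketched as follows. This is a minimal example, assuming 4x4 homogeneous matrices and a column-vector convention (the rightmost matrix is applied first); the helper names matmul, scaling, rotation_z, translation, and apply are chosen for this illustration.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def apply(matrix, point):
    """Apply a 4x4 transform to a 3D point (homogeneous w = 1)."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(matrix[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])


# One transform matrix built from three interim action matrices:
# scale by 2, then rotate 90 degrees about z, then translate by 5 in x.
transform = matmul(translation(5, 0, 0),
                   matmul(rotation_z(math.pi / 2), scaling(2, 2, 2)))
result = apply(transform, (1.0, 0.0, 0.0))  # approximately (5, 2, 0)
```

Because the interim matrices are multiplied together once, every vertex of the model needs only a single matrix-vector multiplication.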
The Modern GPU
Fixed Function Pipeline
The Modern GPU
Programmable Pipeline
Vertex Programs replace the T&L stages of pipeline
Fragment Programs replace multi-texturing and blending
The Modern GPU
The Classic Von Neumann Architecture
Von Neumann Bottleneck is the separation between the CPU and memory.
The Modern GPU
The Stream Processing Model
Streams are sets of sequential data elements that require similar computation.
Kernels are pieces of code that operate on every element of a stream.
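A minimal sketch of these two definitions, with Python generators standing in for streams. The names run_kernel, scale_kernel, and offset_kernel are hypothetical, and the code runs sequentially on a CPU; it only illustrates the programming model, not GPU execution.

```python
def run_kernel(kernel, stream):
    """Apply one kernel to every element of a stream, yielding an output stream."""
    for element in stream:
        yield kernel(element)


def scale_kernel(x):
    return 2.0 * x


def offset_kernel(x):
    return x + 1.0


input_stream = [1.0, 2.0, 3.0, 4.0]
# The output stream of the first kernel feeds the second kernel directly.
scaled = run_kernel(scale_kernel, input_stream)
result = list(run_kernel(offset_kernel, scaled))
```

Each kernel sees one element at a time and keeps no state between elements, which is what lets a stream processor run many elements at once.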
The Modern GPU
Three Levels of Parallelism Exposed by the Stream Processing Model
Instruction-Level Parallelism – Simultaneous execution of multiple instructions within a kernel.
Data-Level Parallelism – Instruction execution on multiple stream elements simultaneously.
Task-Level Parallelism – Multiple stream processors can divide the work from one kernel or different kernels run on different stream processors.
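Data-level and task-level parallelism can be sketched with a thread pool. This is an illustration of the structure only: the names data_parallel, task_parallel, and apply_kernel are assumptions, and Python threads will not deliver the speedup of real stream hardware. Instruction-level parallelism happens inside the processor and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def square_kernel(x):
    return x * x


def negate_kernel(x):
    return -x


def apply_kernel(kernel, elements):
    return [kernel(x) for x in elements]


def data_parallel(kernel, stream, workers=4):
    """Data-level parallelism: one kernel applied to many elements at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, stream))  # map preserves element order


def task_parallel(kernels, stream):
    """Task-level parallelism: different kernels run on different workers."""
    elements = list(stream)
    with ThreadPoolExecutor(max_workers=len(kernels)) as pool:
        futures = [pool.submit(apply_kernel, k, elements) for k in kernels]
        return [f.result() for f in futures]


squares = data_parallel(square_kernel, range(6))
both = task_parallel([square_kernel, negate_kernel], range(4))
```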
The Modern GPU
Memory Access is Expensive:
CPUs use caches to reduce off-chip memory access.
Caches benefit from:
Spatial Locality – Items located physically near an item referenced in the near past will have a higher probability of being referenced in the near future.
Temporal Locality – Items referenced in the near past have a higher probability of being re-referenced in the near future.
GPUs benefit from:
Producer-Consumer Locality – Production of a stream that is immediately consumed by another kernel.
Memory-to-Arithmetic Operations Ratio:
Traditional Accumulator 1:1
Scalar Processor 1:4
Stream Processor 1:100
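One way to see why producer-consumer locality lowers the memory-to-arithmetic ratio is a toy cost model. The model below is an assumption made for illustration and is not taken from the measurements in this presentation: each kernel performs ops_per_kernel arithmetic operations per element, and a kernel that goes through memory costs one read and one write per element.

```python
def memory_to_arithmetic_ratio(num_elements, num_kernels, ops_per_kernel, chained):
    """Ratio of memory operations to arithmetic operations (toy model).

    chained=True: intermediate streams pass directly between kernels,
    so only the first read and the last write touch memory.
    chained=False: every kernel reads its input from memory and writes
    its output back.
    """
    arithmetic_ops = num_elements * num_kernels * ops_per_kernel
    if chained:
        memory_ops = 2 * num_elements
    else:
        memory_ops = 2 * num_elements * num_kernels
    return memory_ops / arithmetic_ops


separate = memory_to_arithmetic_ratio(1000, 5, 10, chained=False)
chained = memory_to_arithmetic_ratio(1000, 5, 10, chained=True)
```

Under these assumed numbers the chained version performs five times fewer memory operations for the same arithmetic work.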
General Purpose GPU Computing
Why General Purpose Computing on a GPU?
GPUs are not hampered by the classic sequential code structure of the CPU. This means that GPUs can more effectively utilize additional transistors.
Moore’s Law says transistor count at a given die size doubles every 18 months. That of a GPU doubles every 6 months.
The GeForce 6 has 222 million transistors, well above the roughly 125 million of a contemporary Pentium 4 (Prescott core).
Speed - The lure of raw computational power; parallelism.
Cost - The multi-billion dollar gaming industry drives down the cost of the commodity GPU, making it a very cost effective alternative to the CPU.
General Purpose GPU Computing
Moore’s Law Cubed
From ‘Stream Programming Environments’ – Hanrahan, 2004
General Purpose GPU Computing
Current Research Topics
Computer Vision
Computational Geometry
Stream Processing
Cloud Simulation
Ice Crystal Growth Simulation
Database Queries
Monte Carlo Methods
Computational Fluid Dynamics
Collision Detection
Voronoi Computations
Molecular Dynamics
Many More…
General Purpose GPU Computing
Stanford’s “General Purpose” Imagine Stream Processor
General Purpose GPU Computing
Imagine Bandwidth Hierarchy
General Purpose GPU Computing
Matrix-Matrix Multiplication – A Test Case
C = AB, where A and B are large, dense NxN matrices.
System Requirements:
CPU Test:
Pentium III 750MHz
ScienceMark 2.0 – BLAS (Basic Linear Algebra Subprograms) software suite.
GPU Test:
GeForce FX 5200 – 1st fully programmable 3D Graphics Pipeline GPU.
Source code from GPUBench suite of performance testing tools, which is written in Cg “C for Graphics”.
Microsoft Visual Studio .Net 2003 – Programming Environment.
Cygwin – Linux environment for MS Windows.
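For readers unfamiliar with how a GFLOPS figure is derived from this kind of test, here is a minimal sketch. It is not the ScienceMark or GPUBench code: matmul is a naive triple loop, and gflops assumes the conventional count of 2N^3 floating-point operations (N^3 multiplications plus N^3 additions) for an NxN product. The function names are chosen for this example.

```python
import time


def matmul(a, b):
    """Naive dense matrix product C = AB for square nested-list matrices."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def gflops(n, seconds):
    """Billions of floating-point operations per second, assuming 2*N^3 operations."""
    return 2.0 * n ** 3 / seconds / 1e9


def measure(n):
    """Time one NxN multiplication and report the achieved GFLOPS."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    return gflops(n, time.perf_counter() - start)


small_product = matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

Calling measure(64) gives a rough figure for the interpreter itself, which will be far below what an optimized BLAS routine or a GPU kernel achieves.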
General Purpose GPU Computing
Results
Plot of GFLOPS (0 to 1.4) against Dimension of Square Matrices (0 to 1400) for the GeForce FX 5200 and the Pentium III 750MHz.
General Purpose GPU Computing
Efficiency
e = observed peak GFLOPS / theoretical peak GFLOPS
CPU: Theoretical peak GFLOPS for the Pentium III 750MHz is 3 GFLOPS. Observed Peak GFLOPS for this test was 1.2 GFLOPS.
e = 40% efficiency
GPU: Theoretical peak GFLOPS for the GeForce FX 5200 is 4 GFLOPS. Observed Peak GFLOPS for this test was 0.6 GFLOPS.
e = 15% efficiency
NOT EXPECTED: In this test the GPU is theoretically capable of about 33% more GFLOPS than the CPU (4 versus 3), but was found to perform only ½ as well (0.6 versus 1.2).
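The efficiency figures quoted above can be checked with a few lines, assuming that efficiency means observed peak GFLOPS divided by theoretical peak GFLOPS. The function name efficiency is chosen for this sketch; the input values are the ones stated in the slide.

```python
def efficiency(observed_gflops, theoretical_gflops):
    """Fraction of the theoretical peak that was actually observed."""
    return observed_gflops / theoretical_gflops


cpu_efficiency = efficiency(1.2, 3.0)   # Pentium III 750MHz, about 0.40
gpu_efficiency = efficiency(0.6, 4.0)   # GeForce FX 5200, about 0.15
observed_ratio = 0.6 / 1.2              # GPU observed peak relative to CPU
```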
Future of the GPU
Potential Improvements
Design of new algorithms
New languages that are highly parallel and data streaming capable.
Compilers and tools to advance parallel stream programming.
• Stanford University’s BrookGPU
Memory bandwidth hierarchy improvements.
Future of the GPU
GPU Clusters
nVIDIA SLI (Scalable Link Interface)
Can double the performance from a single GPU
Future of the GPU
Examples of Load Balancing: Alternate Frame Rendering
Future of the GPU
Examples of Load Balancing: Split Frame Rendering
Future of the GPU
GPU Clustering at Stony Brook University
Evolution of the Graphical Processing Unit
Questions