AMD: DV Club - Westford MA, May 30, 2008
How Shaders are Created
Application
GPU Driver
Video BIOS
API
Images
Display Processing
Advanced Gamma and Color Correction
Image comparison: no correction vs. Avivo Display Engine 10-bit gamma and color correction
“Call of Juarez” using DirectX 9
“Call of Juarez” using DirectX 10
GPU Verification
Graphics Verification Challenges
Large complex ASICs:
  • Approaching 1B transistors; >50 different clocks; >600 MHz; >100 top-level tiles
  • Parallel SIMDs; multiple pipelines; hundreds of threads in flight; >300 ALUs
  • High-bandwidth memory/cache interface; PCI Express; Display Ports
3rd-party compliance: DirectX and OpenGL graphics APIs and apps
Firmware critical to ASIC function
  • ASIC validation utilizes firmware release as part of tape-out
  • Firmware debug requires significant amounts of time
Full-frame processing requires days/weeks of RTL simulation
Market window small – consumer market is harsh!
  • Schedule is KING
  • Need incremental development; hierarchy and reuse prior
  • Respins are costly; time to market is critical
  • Christmas, Dads/Grads, or bust!
GPU Architecture
Top Level: Radeon 2900
Red – Compute
Yellow – Cache
Unified shader
Shader R/W
Instr./Const. cache
Unified texture cache
Compression
4 SIMDs
16 Pipelines/SIMD
5 stream processors (32-bit FP) per pipeline
320 ALU ops in parallel
Over 700M transistors
Block diagram labels: Command Processor; Setup Unit; Vertex Assembler; Geometry Assembler; Tessellator; Vertex Index Fetch; Rasterizer; Interpolators; Hierarchical Z; Ultra-Threaded Dispatch Processor; Unified Shader Processors; Shader Caches (Instruction & Constant); Texture Units; L1 Texture Cache; L2 Texture Cache; Memory Read/Write Cache; Stream Out; Shader Export; Render Back-Ends; Z/Stencil Cache; Color Cache
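The "320 ALU ops in parallel" figure on this slide follows directly from the other numbers quoted: 4 SIMDs, 16 pipelines per SIMD, and 5 32-bit FP units per pipeline. A minimal C++ sketch of that arithmetic is below; the constant and function names are illustrative only and do not come from the deck.

```cpp
#include <cassert>

// Figures quoted on the Radeon 2900 slide.
constexpr int kSimds = 4;
constexpr int kPipelinesPerSimd = 16;
constexpr int kFpUnitsPerPipeline = 5;  // 32-bit FP units per pipeline

// Number of ALU operations that can be issued in parallel.
constexpr int parallel_alu_ops(int simds, int pipelines_per_simd, int units_per_pipeline) {
    return simds * pipelines_per_simd * units_per_pipeline;
}

// 4 * 16 * 5 = 320, matching the figure on the slide.
static_assert(parallel_alu_ops(kSimds, kPipelinesPerSimd, kFpUnitsPerPipeline) == 320,
              "slide figures should multiply out to 320");
```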
Technical Solutions
Layered CODE Methodology
  • Multiple layers of testbenches
  • Maximize controllability, observability, and debug efficiency
  • Reference model
Tools
  • Coverage and assertions
  • Visualization
HW Emulation
Layered CODE Verification
Testbench Capability - Maximize Controllability, Observability, and Debug Efficiency
Level | Controllability (I/O; pipeline timing; sequencing; internal state; error injection) | Observability (checking results in I/O; internal states) | Expected bugs found for efficiency | Debug / fix efficiency
Sub-block | Max – closest to design; internal corner states | Max | Most | minutes
Block | High: block I/O | High | Many | minutes-hours
Chip/System | Med: chip I/O | Med | Few | hours-days
Silicon in lab | Low | Low | ZERO | days-weeks
Reference Model Methodology
C++ reference model of the DUT
  • One “block” = one C++ object
  • Non-synthesizable => easier to write than RTL
  • Very fast
    – Several orders of magnitude faster than the design
    – Used by driver, performance teams
Transaction-level accuracy
  • Block-block interfaces modeled (see SystemVerilog definition)
  • Matches design exactly (almost)
  • Sub-transaction debug taps for added accuracy
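As a rough illustration of "one block = one C++ object" with transaction-level interfaces, the sketch below chains two block objects and records the transaction seen at each block-to-block interface. Everything here (`Transaction`, `BlockModel`, `ScaleBlock`, `BiasBlock`, `run_pipeline`) is hypothetical and chosen for the example; the deck does not show AMD's actual class structure, and the taps here sit at interface level rather than inside a transaction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical transaction carried on a block-to-block interface.
struct Transaction {
    std::uint32_t id;
    std::uint32_t data;
};

// Hypothetical base class: one hardware block maps to one C++ object.
class BlockModel {
public:
    virtual ~BlockModel() = default;
    // Consume one input transaction and produce the output transaction.
    virtual Transaction process(const Transaction& in) = 0;
};

// Illustrative block: multiplies the payload by a fixed factor.
class ScaleBlock : public BlockModel {
public:
    explicit ScaleBlock(std::uint32_t factor) : factor_(factor) {}
    Transaction process(const Transaction& in) override {
        return Transaction{in.id, in.data * factor_};
    }
private:
    std::uint32_t factor_;
};

// Illustrative block: adds a fixed bias to the payload.
class BiasBlock : public BlockModel {
public:
    explicit BiasBlock(std::uint32_t bias) : bias_(bias) {}
    Transaction process(const Transaction& in) override {
        return Transaction{in.id, in.data + bias_};
    }
private:
    std::uint32_t bias_;
};

// Push one transaction through a chain of blocks and record the output
// of every block, so each interface can later be compared against RTL.
std::vector<Transaction> run_pipeline(const std::vector<BlockModel*>& blocks, Transaction t) {
    std::vector<Transaction> taps;
    for (BlockModel* block : blocks) {
        t = block->process(t);
        taps.push_back(t);
    }
    return taps;
}
```

Because the model is plain, untimed C++, it involves no clocks or signal-level detail, which is consistent with the slide's point that such a model is easier to write and much faster than RTL.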
Testbenches
Sub-block testbenches: designer boot-strap
Block-level testbenches: constrained-random
  • Tests written in C++
  • Test library in C++
    – SCV, other randomization
  • Threaded transport layer
    – Based on SystemC
    – C++ to C++
    – C++ to Verilog
  • Two-pass approach
    – Ref model, then RTL, then compare
SystemVerilog testbenches also used
Diagram: Test; Test library; Transport; Block reference model OR RTL
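The two-pass approach (reference model, then RTL, then compare) can be sketched in a few lines of C++. This is a simplified stand-in under stated assumptions: `Target`, `RefModel`, `RtlStub`, and the arithmetic they perform are invented for illustration; the RTL side is faked in C++ instead of going through a SystemC transport; and `std::mt19937` replaces SCV as the source of constrained-random stimulus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical transport endpoint: the test does not know whether a
// reference model or an RTL simulation sits behind it.
class Target {
public:
    virtual ~Target() = default;
    virtual std::uint32_t send(std::uint32_t stimulus) = 0;
};

// Stand-in reference model with a made-up function: (3x + 1) mod 2^16.
class RefModel : public Target {
public:
    std::uint32_t send(std::uint32_t x) override { return (x * 3u + 1u) & 0xFFFFu; }
};

// Stand-in for the RTL side; can drop the "+1" to mimic a design bug.
class RtlStub : public Target {
public:
    explicit RtlStub(bool inject_bug) : inject_bug_(inject_bug) {}
    std::uint32_t send(std::uint32_t x) override {
        const std::uint32_t bias = inject_bug_ ? 0u : 1u;
        return (x * 3u + bias) & 0xFFFFu;
    }
private:
    bool inject_bug_;
};

// Constrained-random stimulus: values limited to [0, 255], seeded so
// that both passes see exactly the same sequence.
std::vector<std::uint32_t> make_stimulus(unsigned seed, std::size_t count) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::uint32_t> dist(0u, 255u);
    std::vector<std::uint32_t> stim;
    stim.reserve(count);
    for (std::size_t i = 0; i < count; ++i) stim.push_back(dist(rng));
    return stim;
}

// One pass: drive every stimulus value into a target, collect responses.
std::vector<std::uint32_t> run_pass(Target& target, const std::vector<std::uint32_t>& stim) {
    std::vector<std::uint32_t> out;
    out.reserve(stim.size());
    for (std::uint32_t s : stim) out.push_back(target.send(s));
    return out;
}

// Pass 1 on the reference model, pass 2 on the RTL, then compare.
std::size_t two_pass_mismatches(Target& ref, Target& rtl, unsigned seed, std::size_t count) {
    const std::vector<std::uint32_t> stim = make_stimulus(seed, count);
    const std::vector<std::uint32_t> expected = run_pass(ref, stim);
    const std::vector<std::uint32_t> actual = run_pass(rtl, stim);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < stim.size(); ++i) {
        if (expected[i] != actual[i]) ++mismatches;
    }
    return mismatches;
}
```

Running the reference pass first means the expected results exist before the slow RTL pass starts, so the test can be debugged on the fast model before any simulator time is spent.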
Testbenches
Chip/system testbenches
  • Tests written in C++
  • Tests debugged on chip reference model
    – Collection of block ref models; see prev slide
  • Test library in C++
    – Mimics OpenGL, an industry standard
Test portability
  • Write once, run everywhere
    – Reference model
    – Design
    – H/W emulation
    – Lab/diags
    – Production drivers
  • Overall TTM improved
    – Driver schedule is nontrivial
Diagram: Test; OpenGL-like test library OR production driver; Transport; Chip reference model OR RTL OR emulation OR real H/W
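One way to picture "write once, run everywhere" is a test that talks only to an abstract backend, with the concrete target selected at run time. The sketch below is an assumption-laden illustration: `Backend`, `FakeBackend`, the `lib_*` helpers, and the register address are all made up, and real targets would wrap a reference model, a simulator, an emulator, or hardware, not an in-memory map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transport-level interface shared by every target.
class Backend {
public:
    virtual ~Backend() = default;
    virtual std::string name() const = 0;
    virtual void write_reg(unsigned addr, unsigned value) = 0;
    virtual unsigned read_reg(unsigned addr) = 0;
};

// In-memory stand-in used here for every target.
class FakeBackend : public Backend {
public:
    explicit FakeBackend(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
    void write_reg(unsigned addr, unsigned value) override { regs_[addr] = value; }
    unsigned read_reg(unsigned addr) override {
        const auto it = regs_.find(addr);
        return it == regs_.end() ? 0u : it->second;
    }
private:
    std::string name_;
    std::map<unsigned, unsigned> regs_;
};

// Hypothetical test-library calls, loosely in the spirit of an
// OpenGL-style API: the test never sees the (made-up) register address.
constexpr unsigned kClearColorReg = 0x100;
void lib_set_clear_color(Backend& b, unsigned rgba) { b.write_reg(kClearColorReg, rgba); }
unsigned lib_get_clear_color(Backend& b) { return b.read_reg(kClearColorReg); }

// The test itself: written once, with no knowledge of the target.
bool clear_color_test(Backend& b) {
    lib_set_clear_color(b, 0xFF0000FFu);
    return lib_get_clear_color(b) == 0xFF0000FFu;
}

// Run the same test on every target and report which ones passed.
std::vector<std::string> run_everywhere(const std::vector<Backend*>& targets) {
    std::vector<std::string> passed;
    for (Backend* t : targets) {
        if (clear_color_test(*t)) passed.push_back(t->name());
    }
    return passed;
}
```

Keeping target-specific detail behind the library and transport layers is what would let a test debugged on the reference model be reused unchanged on RTL, emulation, or silicon.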
HW Emulation
Usage:
In-circuit emulation of full chip design, running chip DV and SW stack
  • Simulates up to 1000X faster than SW (RTL) simulation
  • Capable of rendering full image frames in minutes/hours vs. days/weeks
  • Capture/playback scenes of benchmarks and games
Pre-silicon
  • Verifying chip/system-level functionality and performance, block interactions, stress
  • Allows for longer runs of random tests to look for hangs
  • Prototype and test SW drivers and diags
  • Develop boot-up settings
Post-silicon: bring-up to production
  • Debug platform for silicon
  • Validate ECOs
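To see how a speedup of up to 1000X turns "days/weeks" into "minutes/hours", the conversion can be worked through directly. The helper below is only a back-of-the-envelope sketch; the function name is invented, and 1000X is treated as the best case quoted on the slide, not a guaranteed figure.

```cpp
#include <cassert>

constexpr double kMinutesPerDay = 24.0 * 60.0;

// Emulator wall-clock minutes for a job that takes `sim_days` days in
// RTL simulation, assuming a uniform speedup factor.
constexpr double emulation_minutes(double sim_days, double speedup) {
    return sim_days * kMinutesPerDay / speedup;
}
```

Under that assumption, a frame that needs one week of RTL simulation (10,080 minutes) takes roughly 10 minutes on the emulator, and a one-day job takes about a minute and a half.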
Coverage and Assertions
Assertions are a Good Thing
  • White-box testing
  • Designer impact on DV
  • Etc.
Functional coverage is a Good Thing
  • Deep corner cases
  • API spec does not show all implementation details
  • Etc.
  • Bug rates/DV closure improved greatly when functional coverage was adopted
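A small sketch may help show how an assertion and a functional-coverage cross complement each other. The example below is hypothetical and written in plain C++ for consistency with the rest of the flow described in the deck; the FIFO scenario, `Fill`, `Op`, `is_legal`, and `CrossCoverage` are invented for illustration, and a real flow would more likely use SystemVerilog assertions and covergroups.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical white-box view of a FIFO inside a block.
enum class Fill { Empty, Partial, Full };
enum class Op { Push, Pop };

// Assertion-style check: popping an empty FIFO or pushing a full one
// is an implementation-level error an API-level test would not see.
bool is_legal(Fill fill, Op op) {
    if (fill == Fill::Empty && op == Op::Pop) return false;
    if (fill == Fill::Full && op == Op::Push) return false;
    return true;
}

// Functional coverage: cross of fill level and operation, counting only
// the four legal combinations as bins.
class CrossCoverage {
public:
    void sample(Fill fill, Op op) {
        assert(is_legal(fill, op));  // assertion fires on illegal stimulus
        hit_.insert(std::make_pair(fill, op));
    }
    std::size_t bins_hit() const { return hit_.size(); }
    static constexpr std::size_t total_bins() { return 4; }
    double percent() const {
        return 100.0 * static_cast<double>(bins_hit()) / static_cast<double>(total_bins());
    }
private:
    std::set<std::pair<Fill, Op>> hit_;
};
```

The assertion catches behaviour that must never happen, while the coverage object shows whether the random tests ever reached corner cases such as popping from a full FIFO.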
Visualization
It is graphics, after all
  • Nice to see pretty pictures for what you are drawing
Figure: two overlapping textured triangles, with depth
Visualization
Corruptions become easier to see; recognize patterns
Figure: color corruption
Summary
AMD + ATI = positioned for success
Graphics business/technology has many challenges
Market window is everything
Techniques mostly leverage standard industry practice, with some twists
  • Reference-model-based flow
  • High quality is required
    – Rely on coverage, constrained-random, etc.
  • H/W and S/W are both key to product success
    – Seamless integration required
We are growing
  • Always looking for good people!
Shaw.Yang@amd.com
Gary.Greenstein@amd.com
Backup Slides