Digital Image Processing With GPU
By: Aniruddha Marathe
Agenda
What’s a GPU?
What should you expect from this presentation?
What’s the motivation?
The GPU Pipeline
Programming the GPU
Applications
Performance
What Should You Expect From This Presentation?
A talk centered on the architecture of the underlying hardware rather than the algorithms that run on it.
What’s the motivation?
Image processing algorithms:
• Involve large volumes of specific types of data,
• Need high computational power (possibly parallel),
• Have real-time processing requirements (in most applications).
A CPU alone often cannot fulfill these needs.
What’s a GPU?
GPU – Graphics Processing Unit
A specialized co-processor
Very efficient for:
• Fast parallel floating-point processing
• Single Instruction, Multiple Data (SIMD) operations
• High computation per memory access
Not as efficient for:
• Double precision
• Logical operations on integer data
• Branching-intensive operations
• Random-access, memory-intensive operations
What’s a GPU?
Dedicated graphics rendering device for: personal computer, server, game console, mobile device.
GPU chips:
• 90%: integrated on the motherboard (low end)
• 10%: add-on video card (low to high end)
Memory:
• Dedicated video RAM
• Shared system RAM
GPU: Designed for?
As an image rendering device:
• Highly parallel processor
• High-bandwidth memory
Advanced rendering capabilities:
• Multi-texturing effects
• Realistic light and shadow effects
• Post-processing visual effects
Originally in consumer PCs for gaming.
Some Definitions
• Vertex: a data structure for a point in a mesh, containing position, normal and texture coordinates
• Fragment: a pixel, possibly sub-pixel, of a rasterized image
• Shaders: small programs run on the GPU at specific stages of the GPU pipeline
GPU pipeline
[Diagram: on the CPU side, Program/API -> Driver; across the Bus; on the GPU side, GPU Front End -> Vertex Processing -> Primitive Assembly -> Rasterization & Interpolation -> Fragment Processing -> Raster Operations -> Framebuffer]
GPU pipeline: Program/API
• Program: your program
• API: either the OpenGL or the DirectX interface
GPU pipeline: Driver
• Black box: implementations are company secrets
• Largest bottleneck in many GPU programs
GPU pipeline: GPU Front End
• Receives commands and data from the driver
• Communication bridge between the CPU and the GPU
• Pulls geometry information from system memory
• Outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)
• The PCI Express bus helps at this stage
GPU pipeline: Vertex Processing
• Receives vertices from the GPU Front End in object space and outputs them in screen space
• No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping)
• Normals, texture coordinates, etc. are also transformed
• Programmable
[Diagram: Vertex Processor]
• Inputs: vertex attributes (POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE), the shader, textures
• Outputs: data for rasterization and data for interpolation (POSITION, PSIZE, FOG, TEXCOORD[0-7], COLOR[0-1])
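The object-space to screen-space step can be sketched in plain C. This is only an illustration under stated assumptions: the `Vec4` and `Mat4` types and all function names are hypothetical (they are not OpenGL or DirectX calls), the matrix is assumed to be an already combined model-view-projection matrix, and the viewport is assumed to start at pixel (0, 0).

```c
#include <stddef.h>

/* Hypothetical types for illustration; not part of OpenGL or DirectX. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;   /* row-major: m[row][col] */

/* Multiply a 4x4 matrix by a column vector. */
Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v)
{
    Vec4 r;
    r.x = a->m[0][0]*v.x + a->m[0][1]*v.y + a->m[0][2]*v.z + a->m[0][3]*v.w;
    r.y = a->m[1][0]*v.x + a->m[1][1]*v.y + a->m[1][2]*v.z + a->m[1][3]*v.w;
    r.z = a->m[2][0]*v.x + a->m[2][1]*v.y + a->m[2][2]*v.z + a->m[2][3]*v.w;
    r.w = a->m[3][0]*v.x + a->m[3][1]*v.y + a->m[3][2]*v.z + a->m[3][3]*v.w;
    return r;
}

/* Map one object-space vertex to screen space:
   1. apply the combined model-view-projection matrix (assumed given),
   2. divide by w to get normalized device coordinates in [-1, 1],
   3. scale to a viewport of width x height pixels. */
Vec4 vertex_to_screen(const Mat4 *mvp, Vec4 object_pos, float width, float height)
{
    Vec4 clip = mat4_mul_vec4(mvp, object_pos);
    Vec4 s;
    s.x = (clip.x / clip.w * 0.5f + 0.5f) * width;
    s.y = (clip.y / clip.w * 0.5f + 0.5f) * height;
    s.z = clip.z / clip.w * 0.5f + 0.5f;   /* depth mapped to [0, 1] */
    s.w = clip.w;
    return s;
}

/* One output vertex per input vertex: the 1:1 mapping of this stage. */
void process_vertices(const Mat4 *mvp, const Vec4 *in, Vec4 *out,
                      size_t count, float width, float height)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = vertex_to_screen(mvp, in[i], width, height);
}
```

On a GPU the loop body would run for many vertices at once; the sequential loop here only stands in for that parallelism.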
GPU pipeline: Primitive Assembly
• Compiles vertices into points, lines and/or polygons
GPU pipeline: Rasterization & Interpolation
• Rasterization: determines which fragments are covered by each triangle or other primitive
• Interpolation: produces per-fragment values from the per-vertex data
[Diagram: Rasterizer and Interpolator]
• Inputs: primitive type (from the Primitive Assembler), data for rasterization and data for interpolation (POSITION, PSIZE, FOG, TEXCOORD[0-7], COLOR[0-1])
• Rasterizer output (rasterized data): DEPTH, barycentric coordinates
• Interpolator output (interpolated data): TEXCOORD[0-7], COLOR[0-1]
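A minimal sketch of how barycentric coordinates can serve both as a coverage test and as interpolation weights, written in plain C. The names `Vec2`, `edge`, `barycentric` and `interpolate` are hypothetical, the triangle is assumed to be non-degenerate, and perspective correction is deliberately left out.

```c
typedef struct { float x, y; } Vec2;

/* Twice the signed area of triangle (a, b, c). */
float edge(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Compute the barycentric weights of point p in triangle (v0, v1, v2).
   Returns 1 if p is inside or on an edge (the fragment is covered),
   0 otherwise. Assumes the triangle has non-zero area. */
int barycentric(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p, float w[3])
{
    float area = edge(v0, v1, v2);
    w[0] = edge(v1, v2, p) / area;   /* weight of v0 */
    w[1] = edge(v2, v0, p) / area;   /* weight of v1 */
    w[2] = edge(v0, v1, p) / area;   /* weight of v2 */
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

/* Interpolate one per-vertex attribute (for example one texture
   coordinate or one color channel) at the fragment position. */
float interpolate(const float w[3], float a0, float a1, float a2)
{
    return w[0] * a0 + w[1] * a1 + w[2] * a2;
}
```

Dividing by the signed area keeps the weights positive inside the triangle for either vertex winding order.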
GPU pipeline: Fragment Processing
• Programmable
[Diagram: Fragment Processor]
• Inputs: interpolated data (TEXCOORD[0-7], COLOR[0-1]), rasterized data (DEPTH), the shader, textures
• Outputs: COLOR[0-3] and DEPTH, the data for raster operations, with texture and lighting information
GPU pipeline: Raster Operations
• Depth checking: check the framebuffer to see if a lesser depth already exists (Z-buffer); limited programmability
• Blending: use the alpha channel to combine the incoming color with the color already in the framebuffer; limited programmability
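The two raster operations above can be sketched in plain C as follows. This is an illustration, not how any particular GPU implements them: the `Framebuffer` and `Color` types and `write_fragment` are hypothetical, a smaller depth value is assumed to mean "closer", and the blend shown is the common "source over" formula.

```c
typedef struct { float r, g, b, a; } Color;

/* Hypothetical framebuffer: one depth and one color value per pixel. */
typedef struct {
    int width, height;
    float *depth;   /* smaller value = closer to the viewer (assumption) */
    Color *color;
} Framebuffer;

/* Depth check followed by alpha blending for one fragment.
   Returns 1 if the fragment was written, 0 if it was discarded. */
int write_fragment(Framebuffer *fb, int x, int y, float depth, Color src)
{
    int i = y * fb->width + x;

    /* Depth check: a lesser (or equal) depth already exists, so discard. */
    if (fb->depth[i] <= depth)
        return 0;

    /* Blending: weight the incoming color by its alpha and the stored
       color by (1 - alpha). */
    Color dst = fb->color[i];
    fb->color[i].r = src.a * src.r + (1.0f - src.a) * dst.r;
    fb->color[i].g = src.a * src.g + (1.0f - src.a) * dst.g;
    fb->color[i].b = src.a * src.b + (1.0f - src.a) * dst.b;
    fb->color[i].a = src.a + (1.0f - src.a) * dst.a;
    fb->depth[i] = depth;
    return 1;
}
```

Real APIs expose these steps as a small set of selectable modes rather than as free-form code, which is the "limited programmability" of this stage.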
Example
Code Snippet (OpenGL)
    …
    glBegin(GL_TRIANGLES);
        glTexCoord2f(1,0); glVertex3f(0,1,0);
        glTexCoord2f(0,1); glVertex3f(-1,-1,0);
        glTexCoord2f(0,0); glVertex3f(1,-1,0);
    glEnd();
    …
[Diagram sequence: successive slides follow the triangle through the pipeline: as a binary command stream sent to the GPU (01001001100…), in the viewing frustum, in screen space, and finally in the framebuffer]
Broader View
[Diagram: NVIDIA GeForce 8800 block diagram (Application, Data Assembler, Vtx Thread Issue, Prim Thread Issue, Frag Thread Issue, Setup / Rstr / ZCull, Thread Processor, eight SP/L1/TF clusters, six L2/FB partitions) shown beside the OpenGL pipeline (Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Frame Buffer)]
Correspondence (By Color)
[Diagram: the NVIDIA GeForce 8800 block diagram color-matched to the OpenGL pipeline stages (Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization (fragment assembly), Fragment operations, Framebuffer)]
Legend:
• Application-programmable parallel processor
• Fixed-function assembly processors
• Fixed-function framebuffer operations
Slide annotation: "this was missing"
Streaming Processors, Texture Units, and On-chip Caches
A modern GPU has more ALUs
NVIDIA G80 GPU Architecture Overview
16 multiprocessor blocks. Each block has:
• 8 streaming processors (16 x 8 = 128 in total)
• 16 KB shared memory
• 64 KB constant cache
• 8 KB texture cache
Shared memory: 2-cycle latency
Device memory: 300-cycle latency
Programmability in the GPU
In a simplified view, three programmable stages:
• Vertex engine
• Fragment engine
• Texture load/filter engine
Programmability in the GPU
For non-graphics applications, two programmable blocks running serially:
• Vertex processor
• Fragment processor
Programmability in the GPU
Both vertex and fragment processors:
• Support FP32 operands and intermediate values
• Use the texture unit as a random-access data fetch unit at 35 GB/sec
The programmer can write programs that are executed for every vertex as well as for every fragment
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications
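For image processing, "a program executed for every fragment" amounts to a small function evaluated once per output pixel. The sketch below emulates that in plain C with a 3x3 box filter as the example program. It is an assumption-laden illustration: the `Texture` type and all function names are hypothetical, the image is single-channel float, and the sequential loops stand in for what the fragment processors would do in parallel.

```c
/* Hypothetical single-channel float image standing in for a texture. */
typedef struct {
    int width, height;
    const float *texels;
} Texture;

/* Texture fetch with clamp-to-edge addressing. */
float fetch(const Texture *t, int x, int y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= t->width)  x = t->width - 1;
    if (y >= t->height) y = t->height - 1;
    return t->texels[y * t->width + x];
}

/* The "fragment program": computes one output pixel as the mean of
   its 3x3 neighborhood (a simple smoothing filter). */
float box3x3_fragment(const Texture *t, int x, int y)
{
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += fetch(t, x + dx, y + dy);
    return sum / 9.0f;
}

/* Run the fragment program once per output pixel. Each call is
   independent of the others, which is what makes the work parallel. */
void run_fragment_program(const Texture *in, float *out)
{
    for (int y = 0; y < in->height; ++y)
        for (int x = 0; x < in->width; ++x)
            out[y * in->width + x] = box3x3_fragment(in, x, y);
}
```

The input is read-only and each output pixel is written exactly once, mirroring the texture-in, framebuffer-out pattern of the fragment stage.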
NVIDIA - CUDA
CUDA – ‘Compute Unified Device Architecture’ – a Parallel Computing Architecture developed by NVIDIA.
NVIDIA provides a GPU processing library for programming the GeForce 8800 GPUs.
‘C’ Style programming.
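To give a feel for the C-style model, here is a plain C emulation of how a CUDA kernel typically maps threads to data elements. It is not CUDA code and does not run on a GPU: `ThreadId`, `invert_kernel` and `launch_invert` are hypothetical stand-ins, with the struct fields mirroring CUDA's built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x`, and the nested loops replacing a real kernel launch.

```c
/* Hypothetical stand-in for CUDA's built-in blockIdx.x, blockDim.x
   and threadIdx.x. */
typedef struct { int block_idx, block_dim, thread_idx; } ThreadId;

/* Kernel body: each thread handles one pixel, inverting an 8-bit
   gray value. */
void invert_kernel(ThreadId t, const unsigned char *in,
                   unsigned char *out, int n)
{
    int i = t.block_idx * t.block_dim + t.thread_idx;   /* global index */
    if (i < n)               /* threads past the end of the data do nothing */
        out[i] = (unsigned char)(255 - in[i]);
}

/* Sequential stand-in for a kernel launch. Returns the number of
   blocks needed to cover n elements. */
int launch_invert(const unsigned char *in, unsigned char *out,
                  int n, int block_dim)
{
    int blocks = (n + block_dim - 1) / block_dim;   /* round up */
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < block_dim; ++t) {
            ThreadId id = { b, block_dim, t };
            invert_kernel(id, in, out, n);
        }
    return blocks;
}
```

In real CUDA the two loops disappear: the hardware schedules the blocks and threads, and only the kernel body is written by the programmer.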
Time For Some Applications!
Fast De-noising of Images - 1
Fast De-noising of Images - 2
Fast Border Recognition (From GPU4Vision)
Performance
The NVIDIA G80 GPU:
• 128 streaming floating-point processors @ 1.5 GHz
• 1.5 GB of shared RAM with 86 GB/s bandwidth
• 320 GFLOPS on one chip (single precision)
NVIDIA G80 GPU vs. Intel Core 2 Duo
Yannick Allusse et al. page 52
Let’s Get Back To Image Processing!
Paper: GPU based Saliency Map for High-Fidelity Selective Rendering
Idea:
• A GPU implementation that calculates the image preview of a 3D scene and generates the saliency map, which highlights the objects of importance in the scene.
• A parallel selective rendering algorithm that exploits the human visual attention process using the saliency map.
Overview of the Framework
[Diagram: Input, Preview, Saliency Map, Selective Renderer]
Working of the Algorithm
Rendering Final Image
Selective rendering fine-tunes the output image by using the object importance information from the saliency map.
Processing of the output image is performed in parallel by multiple processors.
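One way to picture "using the object importance information" is to spend more rendering samples on salient pixels. The sketch below is purely illustrative and is not the paper's method: the linear mapping, the function names and the sample bounds are all assumptions, and the saliency values are assumed to be normalized to [0, 1].

```c
/* Hypothetical mapping from a saliency value in [0, 1] to a number of
   rendering samples for that pixel. The linear rule is an assumption
   made for illustration, not taken from the paper. */
int samples_for_pixel(float saliency, int min_samples, int max_samples)
{
    if (saliency < 0.0f) saliency = 0.0f;
    if (saliency > 1.0f) saliency = 1.0f;
    return min_samples
         + (int)(saliency * (float)(max_samples - min_samples) + 0.5f);
}

/* Total sample budget for an image, given its saliency map. */
long total_samples(const float *saliency_map, int pixel_count,
                   int min_samples, int max_samples)
{
    long total = 0;
    for (int i = 0; i < pixel_count; ++i)
        total += samples_for_pixel(saliency_map[i], min_samples, max_samples);
    return total;
}
```

Because each pixel's sample count depends only on its own saliency value, the per-pixel work can be distributed across processors independently.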
Test Scenes
Scene 1 Scene 2 Scene 3 Scene 4
Preview:
Saliency Map:
Selective Rendering:
Performance
NVIDIA 6600 GT GPU vs. P4 3.4 GHz CPU
At a resolution of 768 x 768, the GPU-based approach is approximately 70x faster than the CPU-based approach.
Final Remarks
GPUs provide:
• Parallel processing capability on large volumes of specific types of data,
• High computational power compared to CPUs,
• Programmability for graphics as well as non-graphics applications.
Questions?
Thank You!!
Hope You Enjoyed It