Digital Image Processing With GPU

By: Aniruddha Marathe

Page 2: Agenda

• What should you expect from this presentation?
• What's the motivation?
• What's a GPU?
• The GPU pipeline
• Programming the GPU
• Applications
• Performance

Page 3: What Should You Expect From This Presentation?

A talk centered on the architecture of the underlying hardware rather than on the algorithms that run on it.

Page 4: What's the Motivation?

Image processing algorithms:

• involve large volumes of specific types of data,
• need high (possibly parallel) computational power,
• have real-time processing requirements (in most applications).

A general-purpose CPU alone often cannot meet these needs.

Page 5: What's a GPU?

GPU: Graphics Processing Unit, a specialized co-processor.

Very efficient for:
• fast parallel floating-point processing,
• Single Instruction, Multiple Data (SIMD) operations,
• high computation per memory access.

Not as efficient for:
• double-precision arithmetic,
• logical operations on integer data,
• branching-intensive operations,
• random-access, memory-intensive operations.
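To make "single instruction, multiple data" concrete, here is a minimal C sketch of the kind of per-pixel operation that suits a GPU. The function name `invert_image` and the 8-bit grayscale layout are assumptions for illustration. The loop runs on the CPU, but because every iteration is independent of the others, a GPU could assign each pixel to its own thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Invert an 8-bit grayscale image in place (hypothetical example).
 * Every iteration applies the same instruction to a different pixel and
 * no iteration depends on another: the data-parallel (SIMD) pattern a
 * GPU is built for. On a GPU, each pixel would map to its own thread. */
void invert_image(uint8_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        pixels[i] = (uint8_t)(255 - pixels[i]);
    }
}
```

A branch-heavy or pointer-chasing loop, by contrast, would fall into the "not as efficient" list above.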

Page 6: What's a GPU?

A dedicated graphics rendering device, found in personal computers, servers, game consoles and mobile devices.

GPU chips:
• 90%: integrated on the motherboard (low end),
• 10%: add-on video card (low to high end).

Memory: dedicated video RAM or shared system RAM.

Page 7: GPU: Designed for?

As an image rendering device:
• highly parallel processor,
• high-bandwidth memory.

Advanced rendering capabilities:
• multi-texturing effects,
• realistic light and shadow effects,
• post-processing visual effects.

Originally found in consumer PCs for gaming.

Page 8: Some Definitions

Vertex: a data structure for a point in a mesh, containing position, normal and texture coordinates.

Fragment: a pixel, possibly a sub-pixel, of a rasterized image.

Shaders: small programs run on the GPU at specific stages of the GPU pipeline.
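As an illustration only, the vertex definition above might be laid out in C as below. The `Vertex` type, its field names and the `VERTEX_FLOATS` constant are hypothetical; real applications choose their own layout and describe it to the graphics API.

```c
/* Hypothetical C layout for a mesh vertex: position, surface normal and
 * texture coordinates, stored as plain floats. */
typedef struct {
    float position[3];  /* x, y, z in object space */
    float normal[3];    /* surface normal */
    float texcoord[2];  /* u, v texture coordinates */
} Vertex;

/* Number of floats one tightly packed vertex occupies in a vertex buffer. */
enum { VERTEX_FLOATS = 3 + 3 + 2 };
```

For example, the triangle in the talk's OpenGL snippet could be stored as an array of three such `Vertex` values.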

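Page 9: The GPU Pipeline

Program/API → Driver → GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer

(Program/API and Driver run on the CPU; the bus connects them to the GPU, which runs every stage from the GPU Front End to the Framebuffer.)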


Page 11: Program/API

Program: your program.

API: either the OpenGL or the DirectX interface.


Page 13: Driver

A black box: implementations are company secrets.

The largest bottleneck in many GPU programs.


Page 15: GPU Front End

• Receives commands and data from the driver.
• Acts as the communication bridge between the CPU and the GPU.
• Pulls geometry information from system memory.
• Outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.).

The PCI Express bus helps at this stage.


Page 17: Vertex Processing

• Receives vertices from the GPU front end in object space and outputs them in screen space.
• No new vertices are created in this stage and no vertices are discarded (input and output have a 1:1 mapping).
• Normals, texture coordinates, etc. are also transformed.
• Programmable.

[Diagram: the vertex processor runs a shader, with access to textures, on each incoming vertex (POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE). It outputs POSITION, PSIZE, FOG, TEXCOORD[0-7] and COLOR[0-1] as data for rasterization and data for interpolation.]
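The object-space to screen-space step can be sketched on the CPU in C. This is a minimal sketch under stated assumptions: a hypothetical `transform_vertices` function, a row-major 4x4 model-view-projection matrix, vertices already inside the view volume (no clipping, so w is never zero) and OpenGL-style screen coordinates with the origin at the bottom left. It also shows the 1:1 mapping: exactly one output per input vertex.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Multiply each vertex by a row-major 4x4 model-view-projection matrix,
 * divide by w, then map the result to a width x height viewport.
 * One output vertex per input vertex: nothing is created or discarded. */
void transform_vertices(const float mvp[16], const Vec3 *in, Vec3 *out,
                        size_t count, float width, float height)
{
    for (size_t i = 0; i < count; ++i) {
        const Vec3 p = in[i];
        /* Object space -> clip space (homogeneous coordinates, w = 1). */
        float cx = mvp[0]  * p.x + mvp[1]  * p.y + mvp[2]  * p.z + mvp[3];
        float cy = mvp[4]  * p.x + mvp[5]  * p.y + mvp[6]  * p.z + mvp[7];
        float cz = mvp[8]  * p.x + mvp[9]  * p.y + mvp[10] * p.z + mvp[11];
        float cw = mvp[12] * p.x + mvp[13] * p.y + mvp[14] * p.z + mvp[15];
        /* Perspective divide: clip space -> normalized device coords [-1, 1]. */
        float nx = cx / cw, ny = cy / cw, nz = cz / cw;
        /* Viewport transform: NDC -> pixel coordinates, depth in [0, 1]. */
        out[i].x = (nx + 1.0f) * 0.5f * width;
        out[i].y = (ny + 1.0f) * 0.5f * height;
        out[i].z = (nz + 1.0f) * 0.5f;
    }
}
```

With an identity matrix and a 640 x 480 viewport, the origin lands at pixel (320, 240) with depth 0.5.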


Page 19: Primitive Assembly

Compiles vertices into points, lines and/or polygons.
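For the GL_TRIANGLES mode used in the talk's OpenGL example, primitive assembly amounts to grouping every three consecutive vertices into one triangle. A minimal C sketch, with hypothetical `Triangle` and `assemble_triangles` names:

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v[3]; } Triangle;

/* Group a GL_TRIANGLES-style vertex stream into triangles: vertices
 * 0-2 form the first triangle, 3-5 the second, and so on. Leftover
 * vertices that do not complete a triangle are ignored.
 * Returns the number of triangles written to `out`. */
size_t assemble_triangles(const Vec3 *verts, size_t count, Triangle *out)
{
    size_t n = count / 3;
    for (size_t t = 0; t < n; ++t) {
        for (int k = 0; k < 3; ++k) {
            out[t].v[k] = verts[3 * t + k];
        }
    }
    return n;
}
```

Other modes (strips, fans, lines, points) would use different grouping rules over the same vertex stream.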


Page 21: Rasterization & Interpolation

Rasterization: determines which fragments are covered by each triangle or other primitive.

Interpolation: computes the per-vertex data at each of those fragments.

[Diagram: the primitive assembler passes the primitive type to the rasterizer. From the incoming data for rasterization and data for interpolation (POSITION, PSIZE, FOG, TEXCOORD[0-7], COLOR[0-1]), the rasterizer produces rasterized data (DEPTH, barycentric coordinates) and the interpolator produces interpolated data (TEXCOORD[0-7], COLOR[0-1]).]
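One common way to rasterize and interpolate uses edge functions, which yield the barycentric coordinates named in the diagram. The C sketch below tests a single pixel position against a screen-space triangle and, if it is covered, linearly interpolates one per-vertex attribute. The function names are hypothetical, and real hardware also handles perspective correction and edge fill rules, which are omitted here.

```c
#include <stdbool.h>

typedef struct { float x, y; } Vec2;

/* Twice the signed area of triangle (a, b, p): the "edge function". */
static float edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* If p lies inside triangle (v0, v1, v2), write the attribute interpolated
 * from the per-vertex values (a0, a1, a2) to *attr_out and return true.
 * Uses plain linear (not perspective-correct) barycentric interpolation. */
bool rasterize_point(Vec2 v0, Vec2 v1, Vec2 v2,
                     float a0, float a1, float a2,
                     Vec2 p, float *attr_out)
{
    float area = edge(v0, v1, v2);
    if (area == 0.0f) return false;        /* degenerate triangle */
    float w0 = edge(v1, v2, p) / area;     /* barycentric weight of v0 */
    float w1 = edge(v2, v0, p) / area;     /* barycentric weight of v1 */
    float w2 = edge(v0, v1, p) / area;     /* barycentric weight of v2 */
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) return false;  /* outside */
    *attr_out = w0 * a0 + w1 * a1 + w2 * a2;
    return true;
}
```

Dividing by the signed area makes the weights sum to 1 for either triangle winding order.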


Page 23: Fragment Processing

Programmable.

[Diagram: the fragment processor runs a shader, with access to textures, on each fragment's interpolated data (TEXCOORD[0-7], COLOR[0-1]) and rasterized data (DEPTH). It outputs COLOR[0-3] and DEPTH: data for raster operations, with texture and lighting information.]
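Since a fragment shader is a small program run once per fragment, it can be written as an ordinary C function for illustration. This sketch assumes a hypothetical `Texture` type with nearest-texel lookup and clamped coordinates, and modulates the fetched texel by the interpolated vertex color. It is not a real shading language, and lighting is omitted for brevity.

```c
typedef struct { float r, g, b, a; } Color;

/* Hypothetical texture: width x height texels stored row by row. */
typedef struct { int width, height; const Color *texels; } Texture;

static float clamp01(float t)
{
    return t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
}

/* Nearest-texel lookup with (u, v) clamped to [0, 1]. */
static Color sample_nearest(const Texture *tex, float u, float v)
{
    int x = (int)(clamp01(u) * tex->width);
    int y = (int)(clamp01(v) * tex->height);
    if (x > tex->width - 1)  x = tex->width - 1;   /* u == 1 maps to last texel */
    if (y > tex->height - 1) y = tex->height - 1;
    return tex->texels[y * tex->width + x];
}

/* "Shader" for one fragment: fetch the texel at the interpolated texture
 * coordinate and multiply it by the interpolated vertex color. */
Color shade_fragment(const Texture *tex, float u, float v, Color vertex_color)
{
    Color t = sample_nearest(tex, u, v);
    Color out = { t.r * vertex_color.r, t.g * vertex_color.g,
                  t.b * vertex_color.b, t.a * vertex_color.a };
    return out;
}
```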


Page 25: Raster Operations

Depth checking: check the framebuffer to see whether a lesser depth already exists at that pixel (Z-buffer). Limited programmability.

Blending: use the alpha channel to combine the fragment's color with the color already in the framebuffer. Limited programmability.
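The two raster operations can be sketched for a single fragment in C. The `Framebuffer` struct and `raster_op` function are hypothetical. The depth test keeps the fragment only if it is closer than the stored depth (a "less than" comparison), and blending uses the common source-alpha / one-minus-source-alpha combination. Real hardware exposes these steps only through a limited set of configurable modes.

```c
#include <stdbool.h>

typedef struct { float r, g, b; } RGB;

/* Hypothetical framebuffer: one color and one depth value per pixel. */
typedef struct {
    int width;
    RGB *color;
    float *depth;   /* Z-buffer: smaller values are closer to the viewer */
} Framebuffer;

/* Raster operations for one fragment at (x, y) with color src, opacity
 * alpha and depth z. Returns true if the fragment was written. */
bool raster_op(Framebuffer *fb, int x, int y, RGB src, float alpha, float z)
{
    int i = y * fb->width + x;
    if (z >= fb->depth[i]) return false;   /* depth test: a lesser depth exists */
    fb->depth[i] = z;
    RGB dst = fb->color[i];
    /* Alpha blending with the color already in the framebuffer. */
    fb->color[i].r = alpha * src.r + (1.0f - alpha) * dst.r;
    fb->color[i].g = alpha * src.g + (1.0f - alpha) * dst.g;
    fb->color[i].b = alpha * src.b + (1.0f - alpha) * dst.b;
    return true;
}
```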

Page 26: Example

[Pipeline diagram: Program/API, Driver, Bus, GPU Front End, Vertex Processing, Primitive Assembly, Rasterization & Interpolation, Fragment Processing, Raster Operations, Framebuffer(s).]

Code snippet (OpenGL):

    …
    /* One textured triangle: each glTexCoord2f call sets the texture
       coordinate for the glVertex3f call that follows it. */
    glBegin(GL_TRIANGLES);
        glTexCoord2f(1, 0); glVertex3f( 0,  1, 0);
        glTexCoord2f(0, 1); glVertex3f(-1, -1, 0);
        glTexCoord2f(0, 0); glVertex3f( 1, -1, 0);
    glEnd();
    …

Page 27: Example

[Pipeline diagram; caption: "01001001100…" sent to the GPU.]

Page 28: Example

[Pipeline diagram; caption: viewing frustum.]

Page 29: Example

[Pipeline diagram; caption: screen space.]

Page 30: Example

[Pipeline diagram; caption: framebuffer.]

Page 31: Example

[Pipeline diagram; caption: framebuffer.]

Page 32: Broader View

[Figure: the OpenGL pipeline (Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Frame Buffer) shown beside the NVIDIA GeForce 8800 block diagram (Data Assembler; Vtx, Prim and Frag Thread Issue; Setup / Rstr / ZCull; a Thread Processor with eight clusters of streaming processors (SP), L1 caches and texture filtering (TF) units; six L2 cache / framebuffer (FB) partitions).]

Page 33: Correspondence (By Color)

[Figure: the same NVIDIA GeForce 8800 and OpenGL pipeline diagrams, color-coded to show which hardware units implement which pipeline stages; Rasterization is labeled "fragment assembly". Legend: application-programmable parallel processor; fixed-function assembly processors; fixed-function framebuffer operations. One stage is annotated "this was missing".]

Page 34: Streaming Processors, Texture Units, and On-chip Caches

Page 35: A Modern GPU Has More ALUs

Page 36: NVIDIA G80 GPU Architecture Overview

16 multiprocessor blocks. Each block has:
• 8 streaming processors
• 16 KB shared memory
• 64 KB constant cache
• 8 KB texture cache

Shared memory: 2-cycle latency. Device memory: 300-cycle latency.
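A back-of-the-envelope calculation shows why this latency gap matters for image filters. The model is an assumption for illustration: it only adds up read latencies and ignores the way a GPU hides latency by switching between threads. For a 3x3 filter each output pixel reads 9 inputs; if every read goes to device memory that is 9 x 300 = 2700 cycles, while loading the pixel once from device memory and then serving all 9 reads from shared memory costs 300 + 9 x 2 = 318 cycles. The function names below are hypothetical.

```c
/* Latencies from the G80 slide, in cycles. */
enum { SHARED_LATENCY = 2, DEVICE_LATENCY = 300 };

/* Naive model: every one of `reads` inputs is fetched from device memory. */
long device_only_cycles(long reads)
{
    return reads * DEVICE_LATENCY;
}

/* Naive model: `loads` device reads stage the data into shared memory,
 * then all `reads` are served from shared memory. */
long staged_cycles(long loads, long reads)
{
    return loads * DEVICE_LATENCY + reads * SHARED_LATENCY;
}
```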

Page 37: Programmability in the GPU

In a simplified view, there are three programmable stages:
• Vertex Engine
• Fragment Engine
• Texture Load/Filter Engine

Page 38: Programmability in the GPU

For non-graphics applications, there are two programmable blocks running serially:
• Vertex Processor
• Fragment Processor

Page 39: Programmability in the GPU

Both the vertex and the fragment processors:
• support FP32 operands and intermediate values,
• use the texture unit as a random-access data fetch unit at 35 GB/sec.

The programmer can write programs that are executed for every vertex as well as for every fragment. This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.

Page 40: NVIDIA CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA.

NVIDIA provides a GPU processing library for programming the GeForce 8800 GPUs.

C-style programming.
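In CUDA's C-style model, a kernel is launched over a grid of thread blocks, and each thread works out which data element it owns from the built-in variables `blockIdx.x`, `blockDim.x` and `threadIdx.x`. The plain C sketch below emulates that indexing with nested loops on the CPU. `launch_1d`, `brighten_kernel` and `BrightenArgs` are hypothetical names, and this is not the CUDA API itself: on a GPU the loops disappear and the threads run in parallel.

```c
#include <stddef.h>

typedef void (*kernel_fn)(size_t global_index, void *args);

/* Emulate a 1-D launch of grid_dim blocks with block_dim threads each. */
void launch_1d(kernel_fn kernel, size_t grid_dim, size_t block_dim, void *args)
{
    for (size_t block_idx = 0; block_idx < grid_dim; ++block_idx)
        for (size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx)
            /* Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA. */
            kernel(block_idx * block_dim + thread_idx, args);
}

typedef struct { unsigned char *pixels; size_t count; int delta; } BrightenArgs;

/* Example "kernel": add delta to one pixel, saturating to [0, 255]. The
 * bounds check matters because the last block may run past the image. */
void brighten_kernel(size_t i, void *p)
{
    BrightenArgs *a = (BrightenArgs *)p;
    if (i >= a->count) return;
    int v = a->pixels[i] + a->delta;
    a->pixels[i] = (unsigned char)(v > 255 ? 255 : (v < 0 ? 0 : v));
}
```

For a 5-pixel image and blocks of 4 threads, the launch needs (5 + 3) / 4 = 2 blocks, so 3 of the 8 threads fall past the end and return early.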

Page 41: Time For Some Applications!

Page 42: Fast De-noising of Images - 1

Page 43

Page 44: Fast De-noising of Images - 2

Page 45

Page 46: Fast Border Recognition (From GPU4Vision)

Page 47

Page 48: Performance

Page 49: The NVIDIA G80 GPU

• 128 streaming floating-point processors at 1.5 GHz
• 1.5 GB of shared RAM with 86 GB/s bandwidth
• 320 GFLOPS on one chip (single precision)

Page 50: NVIDIA G80 GPU vs. Intel Core 2 Duo

Page 51

Page 52: Let's Get Back To Image Processing!

Yannick Allusse et al. page 52

Page 53: Paper: GPU based Saliency Map for High-Fidelity Selective Rendering

Idea:
• a GPU implementation for calculating the image preview of a 3D scene and generating the saliency map that highlights the objects of importance in the scene;
• a parallel selective rendering algorithm that exploits the human visual attention process using the saliency map.

Page 54: Overview of the Framework

Input → Preview → Saliency Map → Selective Renderer

Page 55: Working of the Algorithm

Page 56: Rendering Final Image

Selective rendering fine-tunes the output image using the object importance information from the saliency map.

The output image is processed in parallel by multiple processors.
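The slides do not give the paper's formulas, so the following C sketch is purely hypothetical. It shows one plausible way a saliency map could steer rendering effort (more samples per pixel where saliency is high, using an assumed linear mapping) and one simple way to divide an image among several processors (contiguous bands of rows). Neither function is taken from the paper.

```c
/* Hypothetical linear mapping from saliency in [0, 1] to a per-pixel
 * sample budget: salient pixels get up to max_spp samples, the least
 * salient ones min_spp. Rounds to the nearest whole sample. */
int samples_for_pixel(float saliency, int min_spp, int max_spp)
{
    if (saliency < 0.0f) saliency = 0.0f;
    if (saliency > 1.0f) saliency = 1.0f;
    return min_spp + (int)(saliency * (float)(max_spp - min_spp) + 0.5f);
}

/* Hypothetical static work split: processor p (0-based) of num_procs
 * renders image rows [*first, *last) of an image `height` rows tall. */
void rows_for_processor(int height, int num_procs, int p, int *first, int *last)
{
    *first = (int)((long)height * p / num_procs);
    *last  = (int)((long)height * (p + 1) / num_procs);
}
```

Splitting by rows is the simplest choice; a real renderer might instead hand out small tiles dynamically, because salient regions cost more to render and would otherwise unbalance the processors.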

Page 57: Test Scenes

[Figure: four test scenes (Scene 1 to Scene 4), each shown as its preview, its saliency map and its selective rendering.]

Page 58: Performance

NVIDIA 6600GT GPU vs. Pentium 4 3.4 GHz CPU

At a resolution of 768 x 768, the GPU-based approach is approximately 70x faster than the CPU-based approach.

Page 59: Final Remarks

GPUs provide:
• parallel processing capability on large volumes of specific types of data,
• high computational power compared to CPUs,
• programmability for graphics as well as non-graphics applications.

Page 60: Questions?

Page 61: Thank You!!

Hope You Enjoyed It