Digital Image Processing With GPU

Post on 23-Feb-2016


By: Aniruddha Marathe

Agenda

• What should you expect from this presentation?
• What’s the motivation?
• What’s a GPU?
• The GPU Pipeline
• Programming the GPU
• Applications
• Performance

What Should You Expect From This Presentation?

A talk centered on the architecture of the underlying hardware rather than the algorithms that run on it.

What’s the motivation?

Image processing algorithms:
• Involve large volumes of specific types of data,
• Need high computational power (possibly parallel),
• Demand real-time processing (in most applications).

These needs are hard to meet with a CPU alone.

What’s a GPU?

GPU: Graphics Processing Unit, a specialized co-processor.

Very efficient for:
• Fast parallel floating-point processing
• Single Instruction, Multiple Data (SIMD) operations
• High computation per memory access

Not as efficient for:
• Double precision
• Logical operations on integer data
• Branching-intensive operations
• Random-access, memory-intensive operations
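To make "Single Instruction, Multiple Data" concrete for images, here is a minimal sketch in plain C. It is CPU code, not GPU code; the function name `scale_brightness` and the 8-bit grayscale layout are assumptions chosen for illustration. The same multiply-and-clamp is applied to every pixel, and no iteration depends on another, which is the property that lets a GPU process many pixels at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply the same operation to every pixel of an 8-bit grayscale image.
   Each iteration is independent of the others, so the loop body could
   run for all pixels in parallel. */
void scale_brightness(uint8_t *pixels, size_t count, float gain)
{
    for (size_t i = 0; i < count; ++i) {
        float v = (float)pixels[i] * gain;
        if (v > 255.0f) v = 255.0f;   /* clamp to the 8-bit range */
        if (v < 0.0f)   v = 0.0f;
        pixels[i] = (uint8_t)v;       /* truncates toward zero */
    }
}
```

Calling `scale_brightness(img, n, 2.0f)` doubles every pixel value and saturates at 255. The work per pixel is small but there is no branching on neighboring data, which matches the "very efficient for" list above.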

What’s a GPU?

Dedicated graphics rendering device for a personal computer, server, game console, or mobile device.

GPU chips:
• 90%: integrated on the motherboard (low end)
• 10%: add-on video card (low to high end)

Memory:
• Dedicated video RAM
• Shared system RAM

GPU: Designed for?

As an image rendering device:
• Highly parallel processor
• High-bandwidth memory

Advanced rendering capabilities:
• Multi-texturing effects
• Realistic light and shadow effects
• Post-processing visual effects

Originally in consumer PCs for gaming.

Some Definitions

Vertex: a data structure for a point in a mesh, containing position, normal and texture coordinates.

Fragment: a pixel, possibly a sub-pixel, of a rasterized image.

Shaders: small programs that run on the GPU at specific stages of the GPU pipeline.

GPU pipeline

[Diagram. CPU side: Program/API -> Driver -> Bus. GPU side: GPU Front End -> Vertex Processing -> Primitive Assembly -> Rasterization & Interpolation -> Fragment Processing -> Raster Operations -> Framebuffer]

GPU pipeline: Program/API

• Program: your program
• API: either the OpenGL or the DirectX interface


GPU pipeline: Driver

• Black box
• Implementations are company secrets
• Largest bottleneck in many GPU programs


GPU pipeline: GPU Front End

• Receives commands and data from the driver
• Communication bridge between the CPU and the GPU
• Pulls geometry information from system memory
• Outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)

The PCI Express bus helps at this stage.


GPU pipeline: Vertex Processing

• Receives vertices from the GPU Front End in object space and outputs them in screen space
• No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping)
• Normals, texcoords, etc. are also transformed
• Programmable

[Diagram: Vertex Processor running a shader, with textures as an additional input]
• Input (vertex): POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE
• Outputs (data for rasterization and data for interpolation): POSITION, PSIZE, FOG, TEXCOORD[0-7], COLOR[0-1]
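The 1:1 vertex mapping described for this stage can be sketched in plain C. This is an illustrative CPU-side sketch, not shader code: the type `Vec4`, the functions `mat4_mul_vec4` and `vertex_to_screen`, and the row-major matrix layout are all assumptions. It follows the slide's simplified view in which the stage takes an object-space position and produces a screen-space one, using a combined transform matrix, a perspective divide, and a viewport mapping.

```c
typedef struct { float x, y, z, w; } Vec4;

/* Multiply a row-major 4x4 matrix by a column vector. */
Vec4 mat4_mul_vec4(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3]  * v.w;
    r.y = m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7]  * v.w;
    r.z = m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11] * v.w;
    r.w = m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w;
    return r;
}

/* One vertex in, one vertex out: object space -> clip space -> screen space.
   Assumes the transformed w component is non-zero. */
Vec4 vertex_to_screen(const float mvp[16], Vec4 object_pos,
                      int width, int height)
{
    Vec4 clip = mat4_mul_vec4(mvp, object_pos);
    float inv_w = 1.0f / clip.w;          /* perspective divide */
    Vec4 screen;
    /* Map normalized coordinates in [-1, 1] to pixel coordinates. */
    screen.x = (clip.x * inv_w * 0.5f + 0.5f) * (float)width;
    screen.y = (clip.y * inv_w * 0.5f + 0.5f) * (float)height;
    screen.z =  clip.z * inv_w * 0.5f + 0.5f;   /* depth in [0, 1] */
    screen.w = clip.w;
    return screen;
}
```

With an identity matrix and a 640 x 480 viewport, the vertex (0, 1, 0) lands at the top center of the image and (-1, -1, 0) at the corner with pixel coordinates (0, 0). The function never creates or drops a vertex, which is the constraint the slide states.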


GPU pipeline: Primitive Assembly

Assembles vertices into points, lines and/or polygons.
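For the triangle case, primitive assembly amounts to grouping the vertex stream three at a time. The following plain-C sketch illustrates that grouping; the types `Vertex2` and `Triangle` and the function `assemble_triangles` are hypothetical names, and real hardware does this in fixed-function logic rather than in a loop.

```c
#include <stddef.h>

typedef struct { float x, y; } Vertex2;
typedef struct { Vertex2 a, b, c; } Triangle;

/* Group a flat vertex stream into independent triangles, three vertices
   per triangle. Leftover vertices (count % 3) are ignored.
   Returns the number of triangles written to out. */
size_t assemble_triangles(const Vertex2 *verts, size_t count,
                          Triangle *out, size_t out_capacity)
{
    size_t n = count / 3;
    if (n > out_capacity)
        n = out_capacity;
    for (size_t i = 0; i < n; ++i) {
        out[i].a = verts[3 * i];
        out[i].b = verts[3 * i + 1];
        out[i].c = verts[3 * i + 2];
    }
    return n;
}
```

Seven input vertices yield two triangles and one ignored vertex. Points and lines would be assembled the same way with group sizes of one and two.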


GPU pipeline: Rasterization & Interpolation

• Rasterization: determines, for each fragment, the respective area of the triangle or other primitive
• Interpolation: produces the interpolated data for each fragment

[Diagram: Rasterizer and Interpolator]
• Inputs from the Primitive Assembler: primitive type, data for rasterization and data for interpolation (POSITION, PSIZE, FOG, TEXCOORD[0-7], COLOR[0-1])
• Rasterized data: DEPTH
• Interpolated data: TEXCOORD[0-7], COLOR[0-1]
• Other label in the diagram: Barycentric Coordinates
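The diagram mentions barycentric coordinates; one common way to use them is to test whether a sample point lies inside a triangle and then to blend per-vertex attributes with the same weights. The sketch below shows that idea in plain C. The names `Point2`, `edge`, `barycentric` and `interpolate` are assumptions for illustration, and the slides do not say which exact method the hardware uses.

```c
typedef struct { float x, y; } Point2;

/* Twice the signed area of triangle (a, b, c). */
static float edge(Point2 a, Point2 b, Point2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Compute the barycentric weights of p with respect to triangle (a, b, c).
   w[0], w[1], w[2] belong to a, b, c respectively and sum to 1.
   Returns 1 if p is inside or on an edge, 0 if it is outside or the
   triangle is degenerate (zero area). */
int barycentric(Point2 a, Point2 b, Point2 c, Point2 p, float w[3])
{
    float area = edge(a, b, c);
    if (area == 0.0f)
        return 0;
    w[0] = edge(b, c, p) / area;
    w[1] = edge(c, a, p) / area;
    w[2] = edge(a, b, p) / area;
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

/* Blend one per-vertex attribute (for example a texture coordinate
   or a color channel) using the barycentric weights. */
float interpolate(const float w[3], float va, float vb, float vc)
{
    return w[0] * va + w[1] * vb + w[2] * vc;
}
```

A rasterizer built this way would call `barycentric` for each pixel center in the triangle's bounding box and emit a fragment wherever it returns 1. Dividing by the signed area makes the test independent of the triangle's winding order.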


GPU pipeline: Fragment Processing

• Programmable

[Diagram: Fragment Processor running a shader, with access to textures]
• Inputs: interpolated data (TEXCOORD[0-7], COLOR[0-1]) and rasterized data (DEPTH)
• Outputs: COLOR[0-3] and DEPTH, the data for raster operations with texture and lighting information
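A fragment program is, in essence, a small function evaluated once per fragment. The plain-C sketch below stands in for one: it looks up a texel at the interpolated texture coordinate and modulates it by the interpolated color. The types `Color` and `Texture` and the functions `sample_nearest` and `shade_fragment` are hypothetical, and nearest-neighbor sampling is an assumption chosen to keep the example short.

```c
typedef struct { float r, g, b, a; } Color;

typedef struct {
    int width, height;
    const Color *texels;   /* row-major, width * height entries */
} Texture;

/* Nearest-neighbor texture lookup; u and v are expected in [0, 1]
   and out-of-range coordinates are clamped to the border texel. */
Color sample_nearest(const Texture *tex, float u, float v)
{
    int x = (int)(u * (float)tex->width);
    int y = (int)(v * (float)tex->height);
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= tex->width)  x = tex->width - 1;
    if (y >= tex->height) y = tex->height - 1;
    return tex->texels[y * tex->width + x];
}

/* Stand-in for a fragment program: texture color times interpolated color. */
Color shade_fragment(const Texture *tex, float u, float v, Color vertex_color)
{
    Color t = sample_nearest(tex, u, v);
    Color out;
    out.r = t.r * vertex_color.r;
    out.g = t.g * vertex_color.g;
    out.b = t.b * vertex_color.b;
    out.a = t.a * vertex_color.a;
    return out;
}
```

For image processing, the same structure applies with the input image bound as the texture: each output pixel is computed by one call that reads the texels it needs.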


GPU pipeline: Raster Operations

Depth checking:
• Check the framebuffer to see whether a lesser depth already exists (Z-buffer)
• Limited programmability

Blending:
• Use the alpha channel to combine the new color with the colors already in the framebuffer
• Limited programmability
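The two raster operations can be sketched together in plain C. This is a simplified illustration with assumed names (`Rgba`, `Pixel`, `raster_op`): it uses a "less than" depth test and a conventional "source over" alpha blend, whereas real hardware lets the application choose among several fixed comparison and blend modes.

```c
typedef struct { float r, g, b, a; } Rgba;

typedef struct {
    Rgba  color;
    float depth;   /* smaller value = closer to the viewer */
} Pixel;

/* Depth-test an incoming fragment against the stored pixel and, if it
   passes, alpha-blend it over the stored color.
   Returns 1 if the pixel was updated, 0 if the fragment was rejected. */
int raster_op(Pixel *dst, Rgba src, float src_depth)
{
    if (src_depth >= dst->depth)
        return 0;                      /* a nearer fragment is already stored */

    float a = src.a;
    dst->color.r = src.r * a + dst->color.r * (1.0f - a);
    dst->color.g = src.g * a + dst->color.g * (1.0f - a);
    dst->color.b = src.b * a + dst->color.b * (1.0f - a);
    dst->color.a = a + dst->color.a * (1.0f - a);
    dst->depth   = src_depth;          /* remember the new nearest depth */
    return 1;
}
```

A half-transparent red fragment drawn in front of a black pixel gives a dark red result, and a later fragment that lies farther away is discarded without touching the stored color.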

Example

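Code Snippet (OpenGL)

    …
    /* One textured triangle: each glVertex3f emits a vertex that
       carries the texture coordinate set just before it. */
    glBegin(GL_TRIANGLES);
      glTexCoord2f(1,0); glVertex3f(0,1,0);
      glTexCoord2f(0,1); glVertex3f(-1,-1,0);
      glTexCoord2f(0,0); glVertex3f(1,-1,0);
    glEnd();
    …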

[Figure sequence: the example traced through the pipeline diagram, one slide per step. Slide captions, in order: "01001001100…. GPU", "viewing frustum", "screen space", "framebuffer".]

Broader View

[Figure: NVIDIA GeForce 8800 block diagram shown next to the OpenGL pipeline.
GeForce 8800 labels: Application, Data Assembler, Vtx Thread Issue, Prim Thread Issue, Frag Thread Issue, Setup / Rstr / ZCull, Thread Processor, eight SP/SP/L1/TF clusters, six L2/FB partitions.
OpenGL pipeline labels: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Frame Buffer.]

Correspondence (By Color)

[Figure: the same GeForce 8800 block diagram and OpenGL pipeline, color-coded to show which hardware blocks correspond to which pipeline stages. Rasterization is labeled "Rasterization (fragment assembly)", and an annotation on the slide reads "this was missing".
Legend: application-programmable parallel processor; fixed-function assembly processors; fixed-function framebuffer operations.]

Streaming Processors, Texture Units, and On-chip Caches

A modern GPU has many more ALUs than a CPU.

NVIDIA G80 GPU Architecture Overview

16 multiprocessor blocks (16 × 8 = 128 streaming processors in total). Each block has:
• 8 streaming processors
• 16 KB shared memory
• 64 KB constant cache
• 8 KB texture cache

Latency:
• Shared memory: 2 cycles
• Device memory: 300 cycles

Programmability in the GPU

In a simplified view, there are three programmable stages:
• Vertex engine
• Fragment engine
• Texture load/filter engine

For non-graphics applications, there are two programmable blocks running serially:
• Vertex processor
• Fragment processor

Both the vertex and fragment processors:
• Support FP32 operands and intermediate values
• Use the texture unit as a random-access data fetch unit at 35 GB/sec

The programmer can write programs that are executed for every vertex as well as for every fragment. This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.

NVIDIA - CUDA

CUDA – ‘Compute Unified Device Architecture’ – a Parallel Computing Architecture developed by NVIDIA.

NVIDIA provides a GPU processing library for programming the GeForce 8800 GPUs.

‘C’ Style programming.
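Real CUDA code needs NVIDIA's toolchain and a supported GPU, so the following is only a plain-C emulation of the data-parallel style, not CUDA itself. A "kernel" function is written for a single element, and a launcher calls it once per (block, thread) index pair. The names `invert_kernel` and `launch_1d` are hypothetical and are not part of the CUDA API; on a GPU the iterations of the two loops would run concurrently instead of one after another.

```c
#include <stddef.h>
#include <stdint.h>

/* Work for one logical thread: invert one pixel of an 8-bit image. */
static void invert_kernel(size_t block, size_t thread, size_t block_size,
                          uint8_t *pixels, size_t count)
{
    size_t i = block * block_size + thread;   /* global element index */
    if (i < count)                            /* guard the partial last block */
        pixels[i] = (uint8_t)(255 - pixels[i]);
}

/* Serial stand-in for a kernel launch over a 1D grid of blocks.
   block_size must be greater than zero. */
void launch_1d(uint8_t *pixels, size_t count, size_t block_size)
{
    size_t blocks = (count + block_size - 1) / block_size;   /* round up */
    for (size_t b = 0; b < blocks; ++b)
        for (size_t t = 0; t < block_size; ++t)
            invert_kernel(b, t, block_size, pixels, count);
}
```

The bounds check inside the kernel matters because the number of elements is rarely a multiple of the block size, so the last block usually contains threads with nothing to do.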

Time For Some Applications!

Fast De-noising of Images - 1

Fast De-noising of Images - 2

Fast Border Recognition (from GPU4Vision)

Performance

The NVIDIA G80 GPU:
• 128 streaming floating-point processors @ 1.5 GHz
• 1.5 GB shared RAM with 86 GB/s bandwidth
• 320 GFLOPS on one chip (single precision)

[Chart: NVIDIA G80 GPU vs. Intel Core 2 Duo. Source: Yannick Allusse et al., page 52]

Let’s Get Back To Image Processing!

Paper: GPU based Saliency Map for High-Fidelity Selective Rendering

Idea:
• A GPU implementation that calculates the image preview of a 3D scene and generates the saliency map that highlights the objects of importance in the scene.
• A parallel selective rendering algorithm that exploits the human visual attention process using the saliency map.

Overview of the Framework

[Diagram: Input, Preview, Saliency Map, Selective Renderer]

Working of the Algorithm

Rendering Final Image

Selective rendering fine-tunes the output image by using the object importance information from the saliency map.

The output image is processed in parallel by multiple processors.
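The slides do not say how the saliency map steers the renderer, so the following plain-C sketch shows only one plausible scheme, stated as an assumption: a linear mapping from a saliency value in [0, 1] to a number of rendering samples per pixel, so that important regions receive more work. The functions `samples_for_pixel` and `build_sample_budget` are hypothetical and are not taken from the paper.

```c
#include <stddef.h>

/* Map a saliency value in [0, 1] to a sample count between min_samples
   and max_samples (rounded to nearest). Out-of-range saliency is clamped.
   This linear mapping is an illustrative assumption. */
int samples_for_pixel(float saliency, int min_samples, int max_samples)
{
    if (saliency < 0.0f) saliency = 0.0f;
    if (saliency > 1.0f) saliency = 1.0f;
    return min_samples
         + (int)(saliency * (float)(max_samples - min_samples) + 0.5f);
}

/* Fill a per-pixel sample budget from a saliency map. Each entry is
   computed independently, so the loop could be split across processors. */
void build_sample_budget(const float *saliency, size_t pixel_count,
                         int min_samples, int max_samples, int *budget)
{
    for (size_t i = 0; i < pixel_count; ++i)
        budget[i] = samples_for_pixel(saliency[i], min_samples, max_samples);
}
```

With a range of 1 to 17 samples, a pixel of saliency 0.5 gets 9 samples while a background pixel of saliency 0 gets only 1, which is where the rendering time is saved.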

Test Scenes

[Figure: Scenes 1 to 4, each shown as Preview, Saliency Map, and Selective Rendering result]

Performance

NVIDIA GeForce 6600 GT GPU vs. Pentium 4 3.4 GHz CPU

At a resolution of 768 x 768, the GPU-based approach is approximately 70x faster than the CPU-based approach.

Final Remarks

GPUs provide:
• Parallel processing capability on large volumes of specific types of data,
• High computational power compared to CPUs,
• Programmability for graphics as well as non-graphics applications.

Questions ?

Thank You!!

Hope You Enjoyed It