
CREATING A DECISION FRAMEWORK FOR OpenCL USAGE

Graham Brown

CTO, Corel Corporation

2 | Creating a Decision Framework for OpenCL Usage | June 15, 2011

AGENDA

OpenCL Overview

Corel’s View of Optimization

Sample of Corel’s Decision Framework

Additional Considerations


OpenCL OVERVIEW


QUICK POLL: WHO IS WORKING WITH OR INVESTIGATING OpenCL, OR HAS A COLLEAGUE WHO IS DOING SO?


OpenCL | KEY POINTS

Acronym for Open Computing Language

– Allows cross-platform parallel programming

– Open and royalty-free standard

– Improves application speed and responsiveness

– Can leverage all computing resources (CPU, GPU, APU)

Source: Khronos.org


OpenCL | GOALS

Near transparent use of computing resources

Use GPU resources for non-graphics processing without impacting power usage or graphics rendering speed

Data or task-based parallel processing

Familiar / compatible with existing programming models

Source: Khronos.org


OpenCL | History – from A to K

Created by Apple

Standardized (v1.0) in 2008; v1.1 released in 2010

Wide industry participation

Maintained and evolved by Khronos

Source: Khronos.org

http://www.khronos.org


GPU PROCESSING | IHV Support

AMD

– GPU first supported as ATI Stream SDK

– AMD APP SDK 2.4 released April 2011, fully conformant with OpenCL 1.1, includes CPU support

Intel

– OpenCL support initially added for Sandy Bridge

– Fully conformant OpenCL 1.1 SDK beta released May 2011

NVIDIA

– GPU first supported on CUDA platform

– OpenCL 1.0 support ships with production drivers


OpenCL | Code Sample

Think C99, minus:

– recursion

– function pointers

– variable-length arrays

But with:

+ math functions (matrix / vector)

+ extensions for “work-item” support, vector types and image types

+ image manipulation ops (AMD)

+ H.264 encode (Intel)

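OpenCL Kernel:

// Global work size (NDRange): (img_width, img_height), one work-item per pixel
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP |
                               CLK_FILTER_NEAREST;

// Converts an RGBA image with unsigned-integer channels to gray using the
// ITU-R BT.601 luma weights; the gray value is written to all four channels.
__kernel void RGBA2GRAY(__read_only image2d_t RGBAImg,
                        __write_only image2d_t GrayImg)
{
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    uint4 rgba = read_imageui(RGBAImg, sampler, coord);
    float4 frgba = convert_float4(rgba);

    // Vector literals need an explicit (float4) / (uint4) cast; a bare
    // (a, b, c, d) is a comma expression that yields only the last value.
    float4 coef = (float4)(0.299f, 0.587f, 0.114f, 0.0f);
    float res = dot(frgba, coef);

    uint res_int = convert_uint_sat(round(res));
    uint4 res_vector = (uint4)(res_int, res_int, res_int, res_int);

    write_imageui(GrayImg, coord, res_vector);
}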
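For validation, it can help to keep a plain C version of the kernel that runs on the CPU. The sketch below is a hypothetical reference: the name `rgba2gray_ref` and the tightly packed 8-bit RGBA buffer layout are assumptions, not part of the original. It applies the same weights and, like the kernel, writes the gray value to all four channels. Small differences from GPU output are still possible because the floating-point operation order may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU reference for the RGBA2GRAY kernel.
 * Assumes tightly packed 8-bit RGBA pixels (4 bytes each) in both buffers. */
void rgba2gray_ref(const uint8_t *rgba, uint8_t *gray,
                   size_t width, size_t height)
{
    assert(rgba != NULL && gray != NULL);
    for (size_t i = 0; i < width * height; ++i) {
        const uint8_t *p = rgba + 4 * i;
        /* Same weights as the kernel's coef vector; alpha gets weight 0.
         * The weights sum to 1, so res stays within about [0, 255]. */
        float res = 0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2];
        /* Add 0.5 and truncate: matches round() for non-negative values,
         * apart from rare floating-point edge cases. */
        uint8_t g = (uint8_t)(res + 0.5f);
        /* Like the kernel, write the gray value to all four channels. */
        gray[4 * i + 0] = g;
        gray[4 * i + 1] = g;
        gray[4 * i + 2] = g;
        gray[4 * i + 3] = g;
    }
}
```

A pure red pixel (255, 0, 0) maps to 76, since 0.299 × 255 is about 76.2.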


OpenCL | Drawbacks and Challenges

DRAWBACKS

– Not quite C or C++

– Hardware-specific tuning required

– May require four or more code paths to maximize performance

CHALLENGES

– Initial implementation costs

– Code complexity

– Support costs

– Install package complexity


SO WHY USE GPU / OpenCL?


VIDEOSTUDIO X4 – Software vs. OpenCL Encode

Clip: “The Treasure Hunters” (length: 199 seconds)

OpenCL encode: 137.5 seconds (69% of real time)

Software encode: 322.4 seconds (162% of real time)

Test system:

– OS: Windows 7 Ultimate 64-bit

– CPU: AMD Phenom II X6 1055T, 2.8 GHz

– Graphics: AMD Radeon HD 6870

– Memory: 4.00 GB
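The percentages above are encode time divided by clip length, so values under 100% mean the encode ran faster than real time. A small sketch of that arithmetic (the function names are hypothetical):

```c
#include <assert.h>

/* Encode time as a percentage of clip length; under 100 means the
 * encode finished faster than real time. */
double percent_of_realtime(double encode_seconds, double clip_seconds)
{
    assert(clip_seconds > 0.0);
    return 100.0 * encode_seconds / clip_seconds;
}

/* How many times faster the accelerated encode was than the software one. */
double speedup(double software_seconds, double accelerated_seconds)
{
    assert(accelerated_seconds > 0.0);
    return software_seconds / accelerated_seconds;
}
```

With the figures above, `percent_of_realtime(137.5, 199.0)` is about 69.1, `percent_of_realtime(322.4, 199.0)` is about 162.0, and `speedup(322.4, 137.5)` is about 2.34, so the OpenCL encode finished roughly 2.3x faster on this test system.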


OpenCL | In the future

Transparent data and kernel switching between devices (CPU and GPU)

Operation of kernels on CPU or GPU without re-compilation

Corel’s Wishlist:

– Speedy transition of extensions to core OpenCL spec

– IHV alignment on rapidly moving OpenCL forward

– Continued focus on C99 “Likeness”

– Focus on tools by all IHVs


COREL’S VIEW OF OPTIMIZATION


COREL | A brief history

Entrepreneur start-up

Products over a 26-year history:

– Laser printers, SCSI cards

– CorelDRAW grew out of an extension to a DTP product

– Desktop Video Conferencing

– Linux OS

– Hundreds of Windows apps, many from acquisitions

Private / Public / Private / Public / Private(!) ownership

= Adept at change management!


WITH CREDIT TO WINSTON CHURCHILL

CHURCHILL: “Study history, study history. In history lies all the secrets of statecraft.”

COREL: “Study history, study history. In history lies all the secrets of optimization!”



TOP 3 TAKEAWAYS:

– TAKE POST-MORTEMS SERIOUSLY, AND IMPLEMENT PROCESS CHANGES

– COMMIT TO KEY STAFF

– “OPTIMIZATION SHOULD BE ALL ABOUT THE USER”


CASE STUDY | CorelDRAW 7

“IT’S ALL ABOUT THE USER”

Success, where optimization was focused on our users:

– “Real-Time” editing in CorelDRAW

– Increased battery life in WinDVD

Failure, where it was not:

– Re-architecture / optimization not focused on explicit user benefit


FOR OPTIMIZATION, IT’S REALLY ALL ABOUT THE USER


CASE STUDY | CorelDRAW 7

Test Image

500 objects, 2,000 points

Optimization Result:

– Display speed increased 10x

Engineer’s Test Drawing


CASE STUDY | CorelDRAW 7

Validation Image

9 x 568 objects, 9 x 3,888 points

Optimization Result:

– Display speed increased 10x



CASE STUDY | CorelDRAW 7

Litmus Test:

6,254 objects, 52,378 points

Before Optimization:

– Display time ~2 minutes

After Optimization:

– Display time ~90 sec



CASE STUDY | CorelDRAW 7

What Happened?

Good News: In the end, “The Huntress” achieved the same 10x speed-up, displaying in about 10 seconds

Cause of discrepancy:

– Train creator used nested hierarchy of objects

– Huntress creator used linear object list

KEY TAKEAWAY: Without knowing all user scenarios, we can’t predict bottlenecks


CASE STUDY | …AND CorelDRAW X5

Only 429 objects, 25,061 points

– but mesh fills created a new bottleneck!


SO: “WHY OpenCL?”


TRENDS IN DIGITAL MEDIA – COREL’S VIEWPOINT

Trend | User Impact | Enhanced by OpenCL*
GPU & CPU advances | More processing power
Mobile computing | More data on the cloud
Social networking | More cloud-based fix & edit
Touch-based UIs | Increased demand for instant feedback
More 3D | More data to process
Streaming video | Processing time extends wait
Enhanced user experience | Rising user expectations

*Corel’s current optimization plans – does not reflect general OpenCL applicability


WHAT MATTERS TO OUR DIGITAL MEDIA USERS? | Many Variables

Open / Save Speed

Rendering / Encoding Speed

Battery Life

Execution of operations:

– Real-time preview of effects

– Image correction operations


PaintShop Pro – Original Image


PaintShop Pro – 3 Clicks / 30 seconds later

- Straighten

- Smart Photo Fix

- Local Tone Map


GUIDING OPTIMIZATION DECISIONS

Corel’s approach: Employ a framework to guide optimization decisions

Case Study: VideoStudio Encoding Optimization

– Data Flow Block Diagram

– Stated Objectives / desired outcomes (from PM / UED)

– Decision Tree (backed by detailed spreadsheet)


Case Study: Data Flow Block Diagram – Encoding / Rendering in VideoStudio

[Block diagram: original file(s) → decoding → video render → encoding → encoded file, with each stage shown in both a CPU and a GPU version]

– CPU stages: CPU decoding; CPU video render (effect / composite / scaling / de-interlace / color conversion); CPU encoding / mixing

– GPU stages: GPU decoding (DXVA acceleration); GPU/D3D video render (effect / composite / scaling / de-interlace / color conversion); GPU encoding (H.264 / MPEG-2)

Legend (transfer types): system memory transfer; system/video memory transfer; video memory transfer
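One way to read the legend is that every hop between a CPU stage and a GPU stage costs a system/video memory transfer. The sketch below counts those crossings for a given stage placement. It is a simplified, hypothetical model (the `Device` enum and function name are invented) that assumes CPU stages work in system memory, GPU stages in video memory, and both the original and encoded files sit in system memory; transfers that stay within one memory type are not counted.

```c
#include <assert.h>

/* Hypothetical placement of a pipeline stage. */
typedef enum { DEV_CPU, DEV_GPU } Device;

/* Counts system <-> video memory crossings along
 * original file -> decode -> render -> encode -> encoded file.
 * Model assumptions: CPU stages use system memory, GPU stages use video
 * memory, and both files are in system memory (treated as DEV_CPU). */
int count_sys_video_transfers(Device decode, Device render, Device encode)
{
    const Device path[5] = { DEV_CPU, decode, render, encode, DEV_CPU };
    int transfers = 0;
    for (int i = 1; i < 5; ++i) {
        if (path[i] != path[i - 1])
            ++transfers;
    }
    return transfers;
}
```

Under this model an all-CPU pipeline needs no crossings, an all-GPU pipeline needs two (upload and download), and alternating GPU decode, CPU render and GPU encode needs four, which is one reason a mixed pipeline can give back much of the GPU's advantage.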


Case Study: Clearly Stated Objectives

Clarity around optimization objectives from the user’s perspective is a key input

Some examples:

– Decrease time-to-render on target platforms by 50%, and ensure all platforms are at least 10% faster

– Make the XYZ effect real-time for any platform that meets our minimum spec

– Etc.
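Stating objectives this precisely makes them testable. As an illustration, the first example objective could be checked against measured render times as below. The `PlatformResult` struct and function name are hypothetical, and “at least 10% faster” is assumed to mean a 10% reduction in render time (reading it as a 10% throughput gain would give a slightly looser bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-platform measurement. */
typedef struct {
    double baseline_seconds; /* time-to-render before optimization */
    double new_seconds;      /* time-to-render after optimization */
    bool is_target;          /* true for a target platform */
} PlatformResult;

/* True if every target platform renders in at most 50% of its baseline
 * time and every platform renders in at most 90% of its baseline time
 * ("at least 10% faster" read as a 10% cut in render time). */
bool meets_render_objective(const PlatformResult *results, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        assert(results[i].baseline_seconds > 0.0);
        double ratio = results[i].new_seconds / results[i].baseline_seconds;
        if (results[i].is_target && ratio > 0.5)
            return false;   /* missed the 50% goal on a target platform */
        if (ratio > 0.9)
            return false;   /* missed the 10% floor on some platform */
    }
    return true;
}
```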


Case Study: Encode Path Decision Tree

[Flowchart: starts at “Request for performance improvement”; each decision has YES / NO branches]

Decision points:

– Design & code currently optimized in C / C++?

– Will optimized design / native code address needs?

– Can the data or operation be parallelized?

– Is the operation relatively easy to implement in “standard” OpenCL?

– Is the required functionality available in 1 or more external proprietary libraries?

– Performance-critical area, or risk / QA effort warrant using extension / library?

– Does performance difference warrant splitting code paths?

– At least one solution using OpenCL?

– Is the latest version of OpenCL supported on all required HW?

Outcomes:

– Proceed with design / native code optimization

– Focus on CPU-based optimization*

– Proceed with OpenCL implementation

– Provide 2 or more implementations, based upon hardware

– Implement on latest version of OpenCL

– Implement on latest common version of OpenCL

– Implement solutions optimized by HW

– Proceed with non-OpenCL implementation


Case Study: Encode Path Decision Tree

Can the data or operation be parallelized? If NO: focus on CPU-based optimization*

Parallelism will usually be a key factor in the decision to use OpenCL; however, exceptions always exist (for example, when battery usage is critical).


Case Study: Encode Path Decision Tree

Performance-critical area, or dev risk warrants using extension / library?

– YES: Provide 2 or more implementations, based upon hardware

– NO: Proceed with OpenCL implementation

Maintaining a single code base will always be preferable, but performance will sometimes warrant forking the code.


Case Study: Encode Path Decision Tree

Is the latest version of OpenCL supported on all required HW?

– YES: Implement on latest version of OpenCL

Developers will usually push to use the latest / greatest version of any technology, but that may not be the correct answer for our users.
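Branches like these can be written down as a small function, which is one way to keep a decision tree reviewable next to the code it governs. This is a simplified, hypothetical chain of three branches from the encode-path tree: the enum and field names are invented, the real tree asks further questions between these branches, and the “latest common version” outcome comes from the full tree rather than from a single slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, named after boxes in the encode-path tree. */
typedef enum {
    FOCUS_CPU_OPTIMIZATION,       /* "Focus on CPU-based optimization*" */
    MULTIPLE_HW_IMPLEMENTATIONS,  /* "Provide 2 or more implementations, based upon hardware" */
    OPENCL_LATEST_VERSION,        /* "Implement on latest version of OpenCL" */
    OPENCL_LATEST_COMMON_VERSION  /* "Implement on latest common version of OpenCL" */
} Outcome;

/* Hypothetical answers to three of the tree's questions. */
typedef struct {
    bool parallelizable;     /* Can the data or operation be parallelized? */
    bool critical_or_risky;  /* Performance-critical, or dev risk warrants extension / library? */
    bool latest_on_all_hw;   /* Latest OpenCL version supported on all required HW? */
} Answers;

/* Simplified chain of three branches; the full tree asks more questions. */
Outcome encode_path_decision(Answers a)
{
    if (!a.parallelizable)
        return FOCUS_CPU_OPTIMIZATION;
    if (a.critical_or_risky)
        return MULTIPLE_HW_IMPLEMENTATIONS;
    return a.latest_on_all_hw ? OPENCL_LATEST_VERSION
                              : OPENCL_LATEST_COMMON_VERSION;
}
```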


Case Study: Reminder

This is one of the VideoStudio decision trees

– Helps to focus where efforts should be directed for one class of optimization

Other apps or workflows will have their own decision trees

Presentation materials will be shared online


CLOSING CTO MUSINGS

Post-mortems are good – in-process checklists, prepared in advance, are better!

Objectives checklist (check each item):

– Optimizing on features that matter to users? (battery performance, open/save, rendering, encoding, other)

– Positive impact on largest / most important group of users?

– Is OpenCL the right tool? Options: OpenCL kernel libraries (e.g. AML), optimizing existing code, optimizing existing design

– What’s changed? (availability of technology, or support for it)

– Are we keeping a reference copy of code to validate results?

– Are we comparison-checking results?

– Did we validate results with real user artifacts AND real users?
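The reference-copy and comparison-checking items can be partly automated. A minimal sketch, assuming results are 8-bit buffers such as image channels: compare the accelerated output against the reference copy and count elements that differ by more than a tolerance. The function name and tolerance policy are hypothetical; a small nonzero tolerance allows for rounding differences between CPU and GPU floating-point paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts elements whose absolute difference from the reference exceeds
 * `tolerance`; a return value of 0 means the results agree. */
size_t count_mismatches(const uint8_t *reference, const uint8_t *candidate,
                        size_t n, unsigned tolerance)
{
    assert(n == 0 || (reference != NULL && candidate != NULL));
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned diff = reference[i] > candidate[i]
                            ? (unsigned)(reference[i] - candidate[i])
                            : (unsigned)(candidate[i] - reference[i]);
        if (diff > tolerance)
            ++mismatches;
    }
    return mismatches;
}
```

Only real user artifacts and real users can cover the final item; automated checks catch regressions, not missing scenarios.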


CLOSING CTO MUSINGS – CONT’D

Post-mortems are good – in-process checklists, prepared in advance, are better…

…and hardware vendors collaborating to advance OpenCL is great!


THANK YOU!


Presentation Back-up Materials

Corel Stop Motion Video: Treasure Hunters

http://vimeo.com/20610210

http://youtu.be/xFDs-_CJHm8

Corel Stop Motion Secrets: Meet the Filmmaker (interview with John Huang)

http://vimeo.com/20066276

http://youtu.be/6zwO8Pp--WQ

Corel Time-Lapse Video: Time in Motion

http://vimeo.com/20068138

http://youtu.be/hHvmiLpIsIY

Additional movies by two WordPerfect project leaders:

http://www.truedimensions.com/timesage/movies.htm