
OpenVX Webinar June 16, 2016

Kari Pulli, Intel

Radhakrishna Giduthuri, AMD

Frank Brill, NVIDIA

© Copyright Khronos Group 2016 - Page 1

Vision Acceleration

Kari Pulli, Sr. Principal Engineer, Intel

Khronos Open Standards

Khronos is an industry consortium of over 100 companies creating royalty-free, open standard APIs that enable software to access hardware acceleration for graphics, parallel compute and vision.

[Figure: Khronos open standards sit between software and silicon.]

Vision Processing Power Efficiency

• Vision processing just on CPU is too expensive
- Especially on battery-powered devices
• GPUs are more power-efficient
- They were architected for efficient pixel handling
• Traditional cameras have dedicated hardware
- ISP = Image Signal Processor, on all SoCs today
• But how to program specialized processors?
- Performance and functional portability

[Figure: vision processing efficiency chart. Power efficiency (X1, X10, X100) plotted against computation flexibility for multi-core CPU, GPU compute, vision DSPs and dedicated hardware. Application areas shown: advanced sensors, wearables.]

OpenVX: Low-Power Vision Acceleration

• Higher-level abstraction API
- Targeted at real-time mobile and embedded platforms
• Performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs, ISPs, dedicated hardware, …
• Extends portable vision acceleration to very low-power domains
- Doesn't require a high-power CPU/GPU complex

[Figure: software stack with the Application on top of Middleware, on top of a Vision Engine that drives several Accelerators. Shown next to the vision processing efficiency chart: power efficiency (X1, X10, X100) against computation flexibility for multi-core CPU, GPU compute, vision DSPs and dedicated hardware.]

OpenVX Graphs

• OpenVX developers express a graph of image operations ('Nodes')
- Nodes can be on any hardware or processor, coded in any language
- For example, on a GPU, nodes may be implemented in OpenCL
• Minimizes host interaction during frame-rate graph execution
- The host processor can set up a graph, which can then execute almost autonomously

[Figure: example OpenVX graph running from Camera Input to Rendering Output. OpenVX nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Track. Data: RGB Frame, YUV Frame, Gray Frame, Pyr(t), Ftr(t-1), Array of Keypoints, Array of Features.]

OpenVX Framework Efficiency

• Memory Management: reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel Merging: replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph Scheduling: split the graph execution across the whole system (CPU / GPU / dedicated HW)
- Faster execution or lower power consumption
• Data Tiling: execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory

OpenVX and OpenCV are Complementary

Implementation
- OpenCV: community-driven open source library
- OpenVX: open standard API designed to be implemented by hardware vendors

Conformance
- OpenCV: extensive OpenCV test suite but no formal Adopters program
- OpenVX: implementations must pass a defined conformance test suite to use the trademark

Consistency
- OpenCV: available functions can vary depending on implementation / platform
- OpenVX: all core functions must be available in all conformant implementations

Scope
- OpenCV: very wide, 1000s of imaging and vision functions
- OpenVX: tight focus on core hardware-accelerated functions for mobile vision, but extensible

Efficiency
- OpenCV: memory-based architecture; each operation reads and writes to memory
- OpenVX: graph-based execution; optimizable computation and data transfer

Typical Use Case
- OpenCV: rapid experimentation and prototyping, especially on desktop
- OpenVX: production development and deployment on a wide range of mobile and embedded devices

OpenVX 1.0 Shipping, OpenVX 1.1 Released!

• Multiple OpenVX 1.0 implementations shipping (spec released in October 2014)
- Open source sample implementation and conformance tests available
• OpenVX 1.1 specification released in May 2016
- Expands node functionality and enhances the graph framework
- Sample source and conformance tests will be updated to OpenVX 1.1 soon
• OpenVX is extensible
- Implementers can add their own nodes at any time to meet customer and market needs

[Figure: implementer logos; the legend marks those that provided results for conformance tests.]

OpenVX Technical Overview
Khronos Webinar

khronos.org/openvx

Radhakrishna Giduthuri | AMD

OpenVX Components

Context (vx_context)
• Data Objects: vx_image, vx_pyramid, vx_array, vx_lut, vx_remap, vx_scalar, vx_threshold, vx_distribution, vx_matrix, vx_convolution, vx_delay, vx_object_array
• Kernels (vx_kernel): built-in vision functions, vendor extensions, user-defined
• Miscellaneous: directives, hints, logging, performance measurements

Graphs (vx_graph)
• Nodes (vx_node): kernel instances, parameters, completion callback functions
• Virtual Data: vx_image, vx_pyramid, vx_array, vx_object_array

Extensions: Tiling, XML Schema

Context

• Context

- The OpenVX world: must be created first

- All objects belong to a context

#include <VX/vx.h>

...

vx_context context = vxCreateContext();

* See “VX/vx_api.h” for framework API function definitions.


Error Management

• Methods return a status
- vx_status returned: VX_SUCCESS when no error

if( vxProcessGraph( graph ) != VX_SUCCESS ) { /* Error */ }

• Explicit status check
- Object creation: use vxGetStatus to check the object

vx_context context = vxCreateContext();
if( vxGetStatus( (vx_reference)context ) != VX_SUCCESS ) { /* Error */ }

• More info from the log callback

void logCallback( vx_context c, vx_reference r, vx_status s,
                  const vx_char string[] )
{ /* Do something */ }
...
vxRegisterLogCallback( context, logCallback, vx_false_e );
...
vxAddLogEntry( reference, VX_ERROR_INVALID_VALUE, "specified value is out of range" );

* See “VX/vx_types.h” for type definitions and error codes.

Data objects

vx_image img = vxCreateImage( context, 640, 400, VX_DF_IMAGE_RGB );

// Use the image

vxReleaseImage( &img );

• The application gets only references to objects, not the objects
- References should be released by the application when not needed
- Ref-counted objects are destroyed by OpenVX when no longer referenced
• Object-oriented behavior
- Strongly typed (good for safety-critical applications)
- OpenVX objects are really pointers to structs
- Any object may be cast to a vx_reference, e.g., for passing to vxGetStatus()
• Opaque
- Access to content is explicit and temporary (map/unmap or copy)
- No permanent pointer to internal data
- Needed to handle complex memory hierarchies, e.g., DSP local memory and GPU dedicated memory
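The reference semantics listed above can be illustrated with a toy model in plain C. This is not OpenVX code and says nothing about how any implementation is written: `toy_object`, `toy_create`, `toy_retain` and `toy_release` are hypothetical names invented for this sketch. It only shows why a release call in the style of vxReleaseImage( &img ) takes the address of the reference: the call can clear the caller's handle and destroy the object once the last reference is gone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opaque, reference-counted object. */
typedef struct toy_object {
    unsigned ref_count;
} toy_object;

/* Create an object holding one reference for the caller (NULL on failure). */
toy_object *toy_create(void) {
    toy_object *obj = malloc(sizeof *obj);
    if (obj != NULL) {
        obj->ref_count = 1;
    }
    return obj;
}

/* Take an additional reference to the same object. */
toy_object *toy_retain(toy_object *obj) {
    assert(obj != NULL);
    obj->ref_count++;
    return obj;
}

/* Drop one reference and clear the caller's handle.
 * Returns 1 if the object was destroyed, 0 if other references remain. */
int toy_release(toy_object **ref) {
    assert(ref != NULL && *ref != NULL);
    toy_object *obj = *ref;
    *ref = NULL;                  /* the caller's handle is no longer usable */
    if (--obj->ref_count == 0) {
        free(obj);
        return 1;
    }
    return 0;
}
```

In this model, releasing one of two references leaves the object alive for the other holder, and only the final release frees it.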


Enumerated Data Types

C data type                   Enumeration
vx_uint8 (basic data type)    VX_TYPE_UINT8
vx_int16                      VX_TYPE_INT16
vx_uint16                     VX_TYPE_UINT16
vx_int32                      VX_TYPE_INT32
vx_float32                    VX_TYPE_FLOAT32
vx_enum                       VX_TYPE_ENUM
vx_rectangle_t (struct)       VX_TYPE_RECTANGLE
vx_keypoint_t                 VX_TYPE_KEYPOINT
…                             …
vx_image (opaque object)      VX_TYPE_IMAGE

Data Object Creation

vx_image img = vxCreateImage( ctx, 640, 400, VX_DF_IMAGE_UYVY ); // supports 13 standard formats

vx_pyramid pyr = vxCreatePyramid( ctx, levels, VX_SCALE_PYRAMID_HALF, 640, 400, VX_DF_IMAGE_U8 );

vx_array arr = vxCreateArray( ctx, VX_TYPE_KEYPOINT, capacity ); // array of vx_keypoint_t[]

vx_lut lut = vxCreateLUT( ctx, VX_TYPE_UINT8, 256 ); // 8-bit look-up table

vx_remap remap = vxCreateRemap( ctx, src_width, src_height, dst_width, dst_height );

vx_float32 scalar_initial_value = 1.25f;

vx_scalar scalar = vxCreateScalar( ctx, VX_TYPE_FLOAT32, &scalar_initial_value );

vx_matrix mat = vxCreateMatrix( ctx, VX_TYPE_FLOAT32, columns, rows );

vx_delay delay = vxCreateDelay( ctx, (vx_reference)pyr, num_slots ); // pyr is an exemplar

vx_object_array obj_arr = vxCreateObjectArray( ctx, (vx_reference)pyr, count );

vx_distribution dist = vxCreateDistribution( ctx, num_bins, offset, range );

OpenVX Graph

vx_context context = vxCreateContext();

vx_image input = vxCreateImage( context, 640, 480, VX_DF_IMAGE_U8 );

vx_image output = vxCreateImage( context, 640, 480, VX_DF_IMAGE_U8 );

vx_graph graph = vxCreateGraph( context );

vx_image intermediate = vxCreateVirtualImage( graph, 640, 480, VX_DF_IMAGE_U8 );

vx_node F1 = vxF1Node( graph, input, intermediate );

vx_node F2 = vxF2Node( graph, intermediate, output );

vxVerifyGraph( graph );

while(...) {

// … write to input image …

vxProcessGraph( graph );

// … read from output image …

}

[Figure: inside the context, the graph connects input -> F1 -> intermediate -> F2 -> output.]

* Use #include <VX/vx.h> for OpenVX header files

OpenVX 1.1 Built-in Vision Functions

Kernels

• Pixel-wise Functions: Add, Subtract, Multiply, AbsDiff, And, Or, Xor, Not, Magnitude, Phase, Threshold, TableLookup, ConvertDepth, ChannelExtract, ChannelCombine, ColorConvert, AccumulateImage, AccumulateSquaredImage, AccumulateWeightedImage
• Reduction Functions: Histogram, MeanStdDev, MinMaxLoc
• Complex Functions: CannyEdgeDetector, EqualizeHist, FastCorners, HarrisCorners, IntegralImage, OpticalFlowPyrLK
• Filtering Functions: Box3x3, Convolve, Dilate3x3, Erode3x3, Gaussian3x3, Median3x3, Sobel3x3, GaussianPyramid, NonLinearFilter, LaplacianPyramid, LaplacianReconstruct
• Geometric Functions: Remap, ScaleImage, WarpAffine, WarpPerspective, HalfScaleGaussian

* See “VX/vx_nodes.h” for functions to create kernel instances (nodes) in a graph.

Rectangle

typedef struct _vx_rectangle_t {

vx_uint32 start_x; /*!< \brief The Start X coordinate. */

vx_uint32 start_y; /*!< \brief The Start Y coordinate. */

vx_uint32 end_x; /*!< \brief The End X coordinate. */

vx_uint32 end_y; /*!< \brief The End Y coordinate. */

} vx_rectangle_t;

[Figure: a rectangle inside an image. The start coordinate lies inside the rectangle; the end coordinate lies outside it.]

Type enumeration: VX_TYPE_RECTANGLE
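Because the start coordinate is inside the rectangle and the end coordinate is outside it, width and height are plain differences with no "+1". The sketch below uses a local `rect_t` that mirrors the field layout of vx_rectangle_t so that it compiles without the OpenVX headers; `rect_t` and the three helper functions are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the vx_rectangle_t layout shown above (illustration only). */
typedef struct {
    uint32_t start_x, start_y;   /* inclusive: first pixel inside the rectangle */
    uint32_t end_x, end_y;       /* exclusive: first pixel outside the rectangle */
} rect_t;

/* Width in pixels: end is exclusive, so no +1 is needed. */
uint32_t rect_width(const rect_t *r) {
    assert(r->end_x >= r->start_x);
    return r->end_x - r->start_x;
}

/* Height in pixels, by the same rule. */
uint32_t rect_height(const rect_t *r) {
    assert(r->end_y >= r->start_y);
    return r->end_y - r->start_y;
}

/* Non-zero if pixel (x, y) lies inside the rectangle. */
int rect_contains(const rect_t *r, uint32_t x, uint32_t y) {
    return x >= r->start_x && x < r->end_x &&
           y >= r->start_y && y < r->end_y;
}
```

With this convention a full 640x400 image is described by { 0, 0, 640, 400 }, and pixel (640, 0) is already outside it.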


Keypoints

typedef struct _vx_keypoint_t {

vx_int32 x; // keypoint x-coordinate

vx_int32 y; // keypoint y-coordinate

vx_float32 strength; // strength of keypoint

vx_float32 scale;

vx_float32 orientation;

vx_int32 tracking_status; // zero indicates lost point. Initialized to 1 by detectors

vx_float32 error;

} vx_keypoint_t;

[Figure: a keypoint marked on an image.]

Type enumeration: VX_TYPE_KEYPOINT
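As a small illustration of the tracking_status field, the sketch below counts the keypoints that are still tracked, treating zero as "lost" as stated in the struct comment. The local `keypoint_t` mirrors the vx_keypoint_t layout so the example compiles without the OpenVX headers; `keypoint_t` and `count_tracked` are hypothetical names, not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the vx_keypoint_t layout shown above (illustration only). */
typedef struct {
    int32_t x, y;
    float   strength;
    float   scale;
    float   orientation;
    int32_t tracking_status;   /* zero indicates a lost point */
    float   error;
} keypoint_t;

/* Count the keypoints whose tracking_status is non-zero. */
size_t count_tracked(const keypoint_t *kp, size_t n) {
    size_t tracked = 0;
    for (size_t i = 0; i < n; i++) {
        if (kp[i].tracking_status != 0) {
            tracked++;
        }
    }
    return tracked;
}
```

An application drawing tracked features would typically skip the entries this function does not count.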


Array Data Object

vx_array vxCreateArray (

vx_context context,

vx_enum item_type, // VX_TYPE_KEYPOINT, VX_TYPE_UINT32, ...

vx_size capacity

);

[Figure: array slots indexed 0 to capacity-1; num_items marks how many slots are in use.]

vx_rectangle_t rect[8];   // filled in by the application
vx_size num_items = 0;

vx_array array = vxCreateArray( context, VX_TYPE_RECTANGLE, 64 );

// remove all items from the array and add 8 items
vxTruncateArray( array, 0 );
vxAddArrayItems( array, 8, &rect[0], sizeof(vx_rectangle_t) );

// get the number of items in the array by querying the array attribute
vxQueryArray( array, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items) );
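The relationship between capacity and num_items can be modelled in a few lines of plain C. This is a toy model, not OpenVX code: `toy_array`, `toy_add_items` and `toy_truncate` are hypothetical names, and the failure behaviour (reject the call and leave the array unchanged) is an assumption made for this sketch rather than a statement about what vxAddArrayItems or vxTruncateArray return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAPACITY 64

/* Fixed-capacity array: only the first num_items slots are in use. */
typedef struct {
    int    items[TOY_CAPACITY];
    size_t num_items;
} toy_array;

/* Append count items; returns -1 and changes nothing if they do not fit. */
int toy_add_items(toy_array *a, size_t count, const int *src) {
    if (count > TOY_CAPACITY - a->num_items) {
        return -1;
    }
    memcpy(&a->items[a->num_items], src, count * sizeof src[0]);
    a->num_items += count;
    return 0;
}

/* Shrink to new_count items; truncation cannot grow the array. */
int toy_truncate(toy_array *a, size_t new_count) {
    if (new_count > a->num_items) {
        return -1;
    }
    a->num_items = new_count;
    return 0;
}
```

Truncating to 0 and then adding 8 items, as in the slide, leaves num_items at 8 while capacity stays at 64.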


Array Data Access

• Access limited in time

- vxMapArrayRange: get access (Read, Write, Read & Write)

- vxUnmapArrayRange: release the access

vx_map_id map_id;

void * ptr;

vx_size num_items = 0, stride = 0;

vxQueryArray( arr, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items) );

vxMapArrayRange( arr, 0, num_items, &map_id, &stride, &ptr,

VX_READ_AND_WRITE, VX_MEMORY_TYPE_HOST, 0 );

// Access data in ptr

vxUnmapArrayRange( arr, map_id );

• Copy using application controlled address and memory layout

- vxCopyArrayRange: copy (Read or Write)

vxQueryArray( arr, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items) );

vxCopyArrayRange( arr, 0, num_items, sizeof(my_array[0]), &my_array[0],

VX_READ_ONLY, VX_MEMORY_TYPE_HOST );


Image Access (1/2) : Overview

• Copy using application controlled address and memory layout

- vxCopyImagePatch: copy (Read or Write)

vx_imagepatch_addressing_t addr = { /* to fill stride_x & stride_y */ };

vx_rectangle_t rect = { 0u, 0u, width, height };

vxCopyImagePatch( img, &rect, plane, &addr, my_array,

VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST );

• Access limited in time

- vxMapImagePatch: get access (Read, Write, Read & Write)

- vxUnmapImagePatch: release the access

vx_map_id map_id;

void * ptr;

vx_imagepatch_addressing_t addr;

vx_rectangle_t rect = { 0u, 0u, width, height };

vxMapImagePatch( img, &rect, plane, &map_id, &addr, &ptr,

VX_READ_AND_WRITE, VX_MEMORY_TYPE_HOST, VX_NOGAP_X );

// Access data in ptr

vxUnmapImagePatch( img, map_id );


Image Access (2/2) : Memory Layout

typedef struct _vx_imagepatch_addressing_t {

vx_uint32 dim_x;

vx_uint32 dim_y;

vx_int32 stride_x;

vx_int32 stride_y;

vx_uint32 scale_x;

vx_uint32 scale_y;

vx_uint32 step_x;

vx_uint32 step_y;

} vx_imagepatch_addressing_t;

Field meanings:
- dim_x: number of (logical) pixels in a row of the patch
- dim_y: number of (logical) pixels in a column of the patch
- stride_x: number of bytes between the beginnings of 2 successive pixels
- stride_y: number of bytes between the beginnings of 2 successive lines
- Sub-sampling: 1 physical pixel every 'step' logical pixels
- scale = VX_SCALE_UNITY / step
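The stride and scale fields combine into a byte offset for a logical pixel. The following is a minimal sketch of that arithmetic in plain C, assuming the offset is stride_y times the physical row plus stride_x times the physical column, with the physical index obtained as logical index * scale / unity. `patch_addr_t` mirrors the struct above so that the code compiles without the OpenVX headers; `SCALE_UNITY` is a local constant assumed to equal VX_SCALE_UNITY (1024), and `pixel_offset` is a hypothetical helper, not an API function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE_UNITY 1024u   /* assumed to match VX_SCALE_UNITY */

/* Local mirror of vx_imagepatch_addressing_t (illustration only). */
typedef struct {
    uint32_t dim_x, dim_y;        /* logical patch size in pixels */
    int32_t  stride_x, stride_y;  /* bytes between pixels / between lines */
    uint32_t scale_x, scale_y;    /* SCALE_UNITY / step */
    uint32_t step_x, step_y;      /* logical pixels per physical pixel */
} patch_addr_t;

/* Byte offset of logical pixel (x, y) from the start of the patch. */
ptrdiff_t pixel_offset(const patch_addr_t *a, uint32_t x, uint32_t y) {
    assert(x < a->dim_x && y < a->dim_y);
    uint32_t px = (x * a->scale_x) / SCALE_UNITY;  /* physical column */
    uint32_t py = (y * a->scale_y) / SCALE_UNITY;  /* physical row */
    return (ptrdiff_t)py * a->stride_y + (ptrdiff_t)px * a->stride_x;
}
```

For a tightly packed 640x400 RGB patch (stride_x = 3, stride_y = 1920, step = 1), pixel (2, 1) sits at byte 1926. For a plane sub-sampled by 2 in both directions (step = 2, scale = 512), logical pixels (4, 2) and (5, 3) map to the same physical pixel.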


Feature Tracking Example

[Figure: feature tracking pipeline. The input image from the camera is converted to grayscale and a pyramid is computed; the Harris detector or optical flow (LK) then produces the keypoint array (x, y, …) at t=N as computed data. A vx_delay of pyramids and a vx_delay of keypoints keep the old copies from the previous iteration (t=N-1). Object types shown: vx_context, vx_graph, vx_node, vx_image, vx_pyramid, vx_array, vx_delay.]

Pyramid Data Object

vx_pyramid vxCreatePyramid (

vx_context context,

vx_size levels,

vx_float32 scale, // VX_SCALE_PYRAMID_HALF or VX_SCALE_PYRAMID_ORB

vx_uint32 width,

vx_uint32 height,

vx_df_image format // VX_DF_IMAGE_U8

);

[Figure: pyramid levels shrinking from Level 0 (base) through Level 1 and Level 2 to Level 3.]

Example:

vx_pyramid pyramid = vxCreatePyramid(context, …);

// get image at pyramid level 2

vx_image img2 = vxGetPyramidLevel( pyramid, 2 );

vxReleaseImage( &img2 );

vxReleasePyramid( &pyramid );
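To make the level sizes concrete, the sketch below computes the dimension of a given level for a half-scale pyramid (the VX_SCALE_PYRAMID_HALF case). It is plain C and independent of the OpenVX headers. Rounding odd sizes up is an assumption of this sketch; the image returned by vxGetPyramidLevel is the authoritative source for the real dimensions. `half_pyramid_dim` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Dimension of `level` in a half-scale pyramid whose level 0 has size `base`.
 * Odd sizes are rounded up at each step (an assumption of this sketch). */
uint32_t half_pyramid_dim(uint32_t base, unsigned level) {
    uint32_t dim = base;
    for (unsigned i = 0; i < level; i++) {
        dim = (dim + 1u) / 2u;
    }
    return dim;
}
```

For a 640x400 base image and four levels, this gives 640x400, 320x200, 160x100 and 80x50.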


Delay Data Object

vx_delay vxCreateDelay (

vx_context context,

vx_reference exemplar,

vx_size count

);

Example:

vx_pyramid exemplar = vxCreatePyramid(context, …);

vx_delay pyr_delay = vxCreateDelay(context, (vx_reference)exemplar, 2);

vxReleasePyramid(&exemplar);

vx_pyramid pyr_0 = (vx_pyramid)vxGetReferenceFromDelay(pyr_delay, 0);

vx_pyramid pyr_1 = (vx_pyramid)vxGetReferenceFromDelay(pyr_delay, -1);

vxAgeDelay(pyr_delay);
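The indexing used above (0 for the current slot, -1 for the previous one) and the effect of aging can be modelled as a small ring buffer. This is a toy model in plain C, not the OpenVX implementation: `toy_delay`, `toy_delay_get` and `toy_delay_age` are hypothetical names, and integers stand in for the pyramids. It assumes that aging moves the object at index 0 to index -1 and recycles the oldest slot as the new index 0.

```c
#include <assert.h>

#define DELAY_SLOTS 2

/* Ring buffer: `head` is the physical position of logical slot 0. */
typedef struct {
    int slot_data[DELAY_SLOTS];   /* stand-in payload, e.g. one pyramid per slot */
    int head;
} toy_delay;

/* Access a slot by logical index: 0 = current, -1 = previous, and so on. */
int *toy_delay_get(toy_delay *d, int index) {
    assert(index <= 0 && index > -DELAY_SLOTS);
    int phys = (d->head - index) % DELAY_SLOTS;   /* -index is >= 0 */
    return &d->slot_data[phys];
}

/* Age the delay: the slot at index 0 becomes index -1, and the oldest
 * slot is recycled as the new index 0. No payload is copied. */
void toy_delay_age(toy_delay *d) {
    d->head = (d->head + DELAY_SLOTS - 1) % DELAY_SLOTS;
}
```

Only the head position changes when the delay ages, which is why flipping the "previous" and "current" pyramids once per frame costs no data copy in this model.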


Data Objects

[Figure: data objects of the feature tracking example, all inside one vx_context. The input image from the camera is a vx_image. A vx_delay of pyramids holds the vx_pyramid at t=N-1 and at t=N. A vx_delay of keypoints holds the keypoint vx_array (x, y, …) at t=N-1 (old copy kept from the previous iteration) and at t=N (computed data).]

Harris Graph

[Figure: Harris graph. Inside one vx_graph, four vx_node instances are shown: ColorConvert, ChannelExtract, GaussianPyramid and HarrisCorners (with additional parameters). The input vx_image from the camera is RGB; the intermediate images are a virtual IYUV vx_image and a virtual U008 vx_image. Outputs go to the vx_pyramid at t=N in the vx_delay of pyramids and to the keypoint vx_array (x, y, …) at t=N in the vx_delay of keypoints; the t=N-1 slots keep the old copies from the previous iteration.]

Vision Functions in a Graph

• RGB -> YUV (VX_DF_IMAGE_RGB to VX_DF_IMAGE_IYUV)

vxColorConvertNode( graph, input_rgb_image, harris_yuv_image );

• YUV -> Y (VX_DF_IMAGE_IYUV to VX_DF_IMAGE_U8)

vxChannelExtractNode( graph, harris_yuv_image, VX_CHANNEL_Y, harris_gray_image );

• Harris corner

vxHarrisCornersNode( graph, harris_gray_image, strength_thresh, min_distance,
                     sensitivity, gradient_size, block_size,
                     keypoint_array_output, NULL );

- strength_thresh : 0.0005f
- min_distance : 5.0f
- sensitivity : 0.04f
- gradient_size : 3
- block_size : 3

Optical Flow Graph

[Figure: Optical Flow graph. Inside one vx_graph, four vx_node instances are shown: ColorConvert, ChannelExtract, GaussianPyramid and OpticalFlowPyrLK (with additional parameters). The input vx_image from the camera is RGB; the intermediate images are a virtual IYUV vx_image and a virtual U008 vx_image. The vx_delay of pyramids provides the vx_pyramid at t=N-1 and t=N, and the vx_delay of keypoints provides the keypoint vx_array (x, y, …) at t=N-1 (old copy from the previous iteration) and receives the computed array at t=N.]

Execute a Graph in Loop to Process Input

• Before executing the Harris & Optical Flow graphs
- vxVerifyGraph API should return VX_SUCCESS (outside the loop)
• Inside the loop, process each image from the input video sequence
- Write pixels from the input video into the input RGB image
- Execute graphs using the vxProcessGraph API: the Harris graph for the 1st image of the video sequence, the Optical Flow graph from the 2nd image onwards
- Read previous and current keypoints and draw each item: use the vxGetReferenceFromDelay API to get the previous and current keypoint arrays
- Flip the previous and current pyramids and keypoints in the delay objects: use the vxAgeDelay API, which automatically triggers flipping of previous and current pyramids in all the graphs
• After the processing loop
- Query VX_GRAPH_PERFORMANCE (named VX_GRAPH_ATTRIBUTE_PERFORMANCE in OpenVX 1.0) for performance measurements
- Release all objects, making sure to release the context at the end

Summary

• OpenVX is a low-level programming framework that enables software developers to efficiently access computer vision hardware acceleration with both functional and performance portability.
• OpenVX contains:
- a library of predefined and customizable vision functions
- a graph-based execution model to combine functions, enabling both task- and data-independent execution
- a set of memory objects that abstract the physical memory
• OpenVX is defined as a C API
- object-oriented design
- synchronous and asynchronous execution model
- extend functionality using enums and callbacks

Useful Links: www.khronos.org/registry/vx and github.com/rgiduthuri/openvx_tutorial