OpenVX webinar no animations - The Khronos … Nodes can be on any hardware or processor coded in...
© Copyright Khronos Group 2016 - Page 2
Khronos Open Standards
[Figure labels: Software, Silicon]
Khronos is an Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for graphics, parallel compute and vision
Vision Processing Power Efficiency
• Vision processing just on CPU is too expensive
- Especially on battery-powered devices
• GPUs are more power-efficient
- They were architected for efficient pixel handling
• Traditional cameras have dedicated hardware
- ISP = Image Signal Processor – on all SOCs today
• But how to program specialized processors?
- Performance and Functional Portability
[Figure: Vision Processing Efficiency. Power Efficiency (X1, X10, X100) versus Computation Flexibility for Multi-core CPU, GPU Compute, Vision DSPs, and Dedicated Hardware. Additional labels: Advanced Sensors, Wearables.]
OpenVX – Low-Power Vision Acceleration
• Higher-level abstraction API
- Targeted at real-time mobile and embedded platforms
• Performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs, ISPs, Dedicated hardware, …
• Extends portable vision acceleration to very low-power domains
- Doesn’t require high-power CPU/GPU Complex
[Figure: Application, Middleware, Vision Engine, and three Accelerator blocks, shown beside the Vision Processing Efficiency chart (Power Efficiency versus Computation Flexibility).]
OpenVX Graphs
• OpenVX developers express a graph of image operations (‘Nodes’)
- Nodes can be on any hardware or processor coded in any language
- For example, on GPU, nodes may be implemented in OpenCL
• Minimizes host interaction during frame-rate graph execution
- Host processor can set up the graph, which can then execute almost autonomously
[Figure: OpenVX Graph of OpenVX Nodes. Nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Track. Data: Camera Input, RGB Frame, YUV Frame, Gray Frame, Pyr t, Ftr t-1, Array of Keypoints, Array of Features, Rendering Output.]
OpenVX Framework Efficiency
• Memory Management
- Reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel Merging
- Replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph Scheduling
- Split the graph execution across the whole system: CPU / GPU / dedicated HW
- Faster execution or lower power consumption
• Data Tiling
- Execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory
OpenVX and OpenCV are Complementary
• Implementation
- OpenCV: Community-driven open source library
- OpenVX: Open standard API designed to be implemented by hardware vendors
• Conformance
- OpenCV: Extensive OpenCV Test Suite but no formal Adopters program
- OpenVX: Implementations must pass defined conformance test suite to use trademark
• Consistency
- OpenCV: Available functions can vary depending on implementation / platform
- OpenVX: All core functions must be available in all conformant implementations
• Scope
- OpenCV: Very wide, 1000s of imaging and vision functions
- OpenVX: Tight focus on core hardware accelerated functions for mobile vision – but extensible
• Efficiency
- OpenCV: Memory-based architecture. Each operation reads and writes to memory
- OpenVX: Graph-based execution. Optimizable computation and data transfer
• Typical Use Case
- OpenCV: Rapid experimentation and prototyping - especially on desktop
- OpenVX: Production development & deployment on wide range of mobile and embedded devices
OpenVX 1.0 Shipping, OpenVX 1.1 Released!
• Multiple OpenVX 1.0 Implementations shipping – spec in October 2014
- Open source sample implementation and conformance tests available
• OpenVX 1.1 Specification released in May 2016
- Expands node functionality AND enhances graph framework
- Sample source and conformance tests will be updated to OpenVX 1.1 soon
• OpenVX is EXTENSIBLE
- Implementers can add their own nodes at any time to meet customer and market needs
[Figure legend: marked entries = provided results for conformance tests]
OpenVX Technical Overview
Khronos Webinar
khronos.org/openvx
Radhakrishna Giduthuri | AMD
OpenVX Components
• Context (vx_context)
• Data Objects
- vx_image, vx_pyramid, vx_array, vx_lut, vx_remap, vx_scalar, vx_threshold, vx_distribution, vx_matrix, vx_convolution, vx_delay, vx_object_array
• Kernels (vx_kernel)
- Built-in vision functions, Vendor extensions, User-defined
• Graphs (vx_graph)
• Nodes (vx_node)
- Kernel instances, parameters, completion callback functions
• Virtual Data
- vx_image, vx_pyramid, vx_array, vx_object_array
• Miscellaneous
- Directives, Hints, Logging, Performance Measurements
• Extensions
- Tiling, XML Schema
Context
• Context
- OpenVX world: needs to be created first
- All objects belong to a context
#include <VX/vx.h>
...
vx_context context = vxCreateContext();
* See “VX/vx_api.h” for framework API function definitions.
Error Management
• Methods return a status
- vx_status returned: VX_SUCCESS when no error
if( vxProcessGraph( graph ) != VX_SUCCESS ) { /* Error */ }
• Explicit status check
- Object creation: use vxGetStatus to check the object
vx_context context = vxCreateContext();
if( vxGetStatus( (vx_reference)context ) != VX_SUCCESS ) { /* Error */ }
• More info from the log callback
void logCallback( vx_context c, vx_reference r, vx_status s,
const vx_char string[] )
{ /* Do something */ }
...
vxRegisterLogCallback( context, logCallback, vx_false_e );
...
vxAddLogEntry( reference, VX_ERROR_INVALID_VALUE, "specified value is out of range" );
* See “VX/vx_types.h” for type definitions and error codes.
Data objects
vx_image img = vxCreateImage( context, 640, 400, VX_DF_IMAGE_RGB );
// Use the image
vxReleaseImage( &img );
• The application gets only references to objects, not the objects
- References should be released by the application when not needed
- Ref-counted object destroyed by OpenVX when not referenced any more
• Object-Oriented Behavior
- Strongly typed (good for safety-critical applications)
- OpenVX object references are really pointers to structs
- Any object may be cast to a vx_reference, e.g., for passing to vxGetStatus()
• Opaque
- Access to content explicit and temporary (map/unmap or copy)
  - No permanent pointer to internal data
- Needed to handle complex memory hierarchies
  - DSP local memory
  - GPU dedicated memory
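The reference semantics described on this slide can be modelled in a few lines of plain C. This is a minimal sketch under stated assumptions, not OpenVX code: `object_t`, `handle_t`, `object_create`, `object_retain`, `object_release` and `live_objects` are hypothetical names invented for illustration. It shows an object that is destroyed only when its last reference is released, and a release call that clears the caller's handle, which is why the release functions on the slides take the address of the reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an opaque, reference-counted object (not an OpenVX type). */
typedef struct { int ref_count; } object_t;
typedef object_t *handle_t;

static int live_objects = 0; /* number of objects not yet destroyed */

/* Create an object; the caller owns one reference. */
static handle_t object_create(void) {
    handle_t h = malloc(sizeof *h);
    if (h != NULL) {
        h->ref_count = 1;
        ++live_objects;
    }
    return h;
}

/* Take an additional reference to an existing object. */
static void object_retain(handle_t h) {
    assert(h != NULL);
    ++h->ref_count;
}

/* Drop one reference and clear the caller's handle; the object is
   destroyed only when the last reference goes away. */
static void object_release(handle_t *h) {
    assert(h != NULL && *h != NULL);
    if (--(*h)->ref_count == 0) {
        free(*h);
        --live_objects;
    }
    *h = NULL;
}
```

In this model, releasing one of two handles leaves the object alive; only the second release frees it.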
Enumerated Data Types
C data type                  Enumeration
vx_uint8 (basic data type)   VX_TYPE_UINT8
vx_int16                     VX_TYPE_INT16
vx_uint16                    VX_TYPE_UINT16
vx_int32                     VX_TYPE_INT32
vx_float32                   VX_TYPE_FLOAT32
vx_enum                      VX_TYPE_ENUM
…
vx_rectangle_t (struct)      VX_TYPE_RECTANGLE
vx_keypoint_t                VX_TYPE_KEYPOINT
…                            …
vx_image (opaque object)     VX_TYPE_IMAGE
Data Object Creation
vx_image img = vxCreateImage( ctx, 640, 400, VX_DF_IMAGE_UYVY ); // supports 13 standard formats
vx_pyramid pyr = vxCreatePyramid( ctx, levels, VX_SCALE_PYRAMID_HALF, 640, 400, VX_DF_IMAGE_U8 );
vx_array arr = vxCreateArray( ctx, VX_TYPE_KEYPOINT, capacity ); // array of vx_keypoint_t[]
vx_lut lut = vxCreateLUT( ctx, VX_TYPE_UINT8, 256 ); // 8-bit look-up table
vx_remap remap = vxCreateRemap( ctx, src_width, src_height, dst_width, dst_height );
vx_float32 scalar_initial_value = 1.25f;
vx_scalar scalar = vxCreateScalar( ctx, VX_TYPE_FLOAT32, &scalar_initial_value );
vx_matrix mat = vxCreateMatrix( ctx, VX_TYPE_FLOAT32, columns, rows );
vx_delay delay = vxCreateDelay( ctx, (vx_reference)pyr, num_slots ); // pyr is an exemplar
vx_object_array obj_arr = vxCreateObjectArray( ctx, (vx_reference)pyr, count );
vx_distribution dist = vxCreateDistribution( ctx, num_bins, offset, range );
OpenVX Graph
vx_context context = vxCreateContext();
vx_image input = vxCreateImage( context, 640, 480, VX_DF_IMAGE_U8 );
vx_image output = vxCreateImage( context, 640, 480, VX_DF_IMAGE_U8 );
vx_graph graph = vxCreateGraph( context );
vx_image intermediate = vxCreateVirtualImage( graph, 640, 480, VX_DF_IMAGE_U8 );
vx_node F1 = vxF1Node( graph, input, intermediate );
vx_node F2 = vxF2Node( graph, intermediate, output );
vxVerifyGraph( graph );
while(...) {
// … write to input image …
vxProcessGraph( graph );
// … read from output image …
}
[Figure: inside the context, the graph connects input, F1, intermediate, F2, and output.]
* Use #include <VX/vx.h> for OpenVX header files
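The verify-once, process-many pattern in the listing above can be illustrated without the OpenVX headers. The following is a minimal sketch under stated assumptions: `graph_t`, `node_t`, `graph_add_node`, `graph_verify`, `graph_process` and the two toy kernels are hypothetical names, and the model runs nodes in insertion order on plain byte buffers. Refusing to run an unverified graph is a simplification of this sketch and not a statement about what vxProcessGraph does; a real implementation is also free to reorder, merge, or tile nodes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

typedef void (*kernel_fn)(const unsigned char *in, unsigned char *out, size_t n);

typedef struct {
    kernel_fn fn;
    const unsigned char *in;
    unsigned char *out;
} node_t;

typedef struct {
    node_t nodes[MAX_NODES];
    size_t num_nodes;
    size_t num_pixels;
    int verified;
} graph_t;

/* Two toy kernels standing in for F1 and F2. */
static void invert(const unsigned char *in, unsigned char *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = (unsigned char)(255 - in[i]);
}
static void binarize(const unsigned char *in, unsigned char *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = (unsigned char)(in[i] >= 128 ? 255 : 0);
}

/* Append a node; any change to the graph invalidates earlier verification. */
static int graph_add_node(graph_t *g, kernel_fn fn,
                          const unsigned char *in, unsigned char *out) {
    assert(g != NULL);
    if (g->num_nodes == MAX_NODES) return -1;
    g->nodes[g->num_nodes++] = (node_t){ fn, in, out };
    g->verified = 0;
    return 0;
}

/* One-time check done outside the frame loop. */
static int graph_verify(graph_t *g) {
    for (size_t i = 0; i < g->num_nodes; ++i) {
        const node_t *nd = &g->nodes[i];
        if (nd->fn == NULL || nd->in == NULL || nd->out == NULL) return -1;
    }
    g->verified = 1;
    return 0;
}

/* Run every node in insertion order; this model rejects unverified graphs. */
static int graph_process(const graph_t *g) {
    if (!g->verified) return -1;
    for (size_t i = 0; i < g->num_nodes; ++i)
        g->nodes[i].fn(g->nodes[i].in, g->nodes[i].out, g->num_pixels);
    return 0;
}
```

The intermediate buffer is only ever touched by the two kernels, which is the property that lets a real framework keep virtual data out of host-visible memory.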
OpenVX 1.1 Built-in Vision Functions
Kernels
• Pixel-wise Functions
- Add, Subtract, Multiply, AbsDiff, And, Or, Xor, Not, Magnitude, Phase, Threshold, TableLookup, ColorDepth, ChannelExtract, ChannelCombine, ColorConvert, AccumulateImage, AccumulateSquaredImage, AccumulateWeightedImage
• Reduction Functions
- Histogram, MeanStdDev, MinMaxLoc
• Complex Functions
- CannyEdgeDetector, EqualizeHist, FastCorners, HarrisCorners, IntegralImage, OpticalFlowPyrLK
• Filtering Functions
- Box3x3, Convolve, Dilate3x3, Erode3x3, Gaussian3x3, Median3x3, Sobel3x3, GaussianPyramid, NonLinearFilter, LaplacianPyramid, LaplacianReconstruct
• Geometric Functions
- Remap, ScaleImage, WarpAffine, WarpPerspective, HalfScaleGaussian
* See “VX/vx_nodes.h” for functions to create kernel instances (nodes) in a graph.
Rectangle
typedef struct _vx_rectangle_t {
vx_uint32 start_x; /*!< \brief The Start X coordinate. */
vx_uint32 start_y; /*!< \brief The Start Y coordinate. */
vx_uint32 end_x; /*!< \brief The End X coordinate. */
vx_uint32 end_y; /*!< \brief The End Y coordinate. */
} vx_rectangle_t;
[Figure: a rectangle within an image; the start coordinate is inside the rectangle, the end coordinate is outside.]
Type enumeration: VX_TYPE_RECTANGLE
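Because the start coordinate lies inside the rectangle and the end coordinate lies outside, width and height are plain differences and the containment test is half-open. Here is a minimal sketch, assuming a local `rect_t` that mirrors the fields of vx_rectangle_t with `uint32_t` in place of vx_uint32 so that it builds without the OpenVX headers; the helper functions are hypothetical and not part of the OpenVX API.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in mirroring the fields of vx_rectangle_t (assumption). */
typedef struct {
    uint32_t start_x, start_y; /* first column/row inside the rectangle */
    uint32_t end_x, end_y;     /* first column/row outside the rectangle */
} rect_t;

/* Hypothetical helpers for illustration only. */
static uint32_t rect_width(const rect_t *r) {
    assert(r->end_x >= r->start_x);
    return r->end_x - r->start_x;
}

static uint32_t rect_height(const rect_t *r) {
    assert(r->end_y >= r->start_y);
    return r->end_y - r->start_y;
}

/* Half-open test: start is included, end is excluded. */
static int rect_contains(const rect_t *r, uint32_t x, uint32_t y) {
    return x >= r->start_x && x < r->end_x &&
           y >= r->start_y && y < r->end_y;
}
```

With this convention, a rectangle `{ 0, 0, width, height }` covers a whole image, which matches how the image access slides initialize their rectangles.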
Keypoints
typedef struct _vx_keypoint_t {
vx_int32 x; // keypoint x-coordinate
vx_int32 y; // keypoint y-coordinate
vx_float32 strength; // strength of keypoint
vx_float32 scale;
vx_float32 orientation;
vx_int32 tracking_status; // zero indicates lost point. Initialized to 1 by detectors
vx_float32 error;
} vx_keypoint_t;
[Figure: a key-point within an image]
Type enumeration: VX_TYPE_KEYPOINT
Array Data Object
vx_array vxCreateArray (
vx_context context,
vx_enum item_type, // VX_TYPE_KEYPOINT, VX_TYPE_UINT32, ...
vx_size capacity
);
[Figure: array slots 0 … capacity-1, with num_items marking how many slots are in use]
vx_array array = vxCreateArray( context, VX_TYPE_RECTANGLE, 64 );
…
// remove all items from array and add 8 items
vxTruncateArray( array, 0 );
vxAddArrayItems( array, 8, &rect[0], sizeof(vx_rectangle_t) );
…
// get the number of items in the array by querying an array attribute
vxQueryArray(array, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items));
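The relationship between capacity and num_items can be modelled in plain C. This is a minimal sketch under stated assumptions: `array_t`, `item_t`, `array_add` and `array_truncate` are hypothetical stand-ins, the capacity is fixed at compile time, and the error behaviour (reject an add that would exceed capacity, reject a truncate that would grow the array) is the model's own choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 64

typedef struct { int x, y; } item_t; /* stand-in item type */

typedef struct {
    item_t items[CAPACITY];
    size_t num_items; /* slots 0 .. num_items-1 are in use */
} array_t;

/* Append count items; fails without changes if capacity would be exceeded. */
static int array_add(array_t *a, size_t count, const item_t *src) {
    assert(a != NULL && src != NULL);
    if (count > CAPACITY - a->num_items) return -1;
    memcpy(&a->items[a->num_items], src, count * sizeof(item_t));
    a->num_items += count;
    return 0;
}

/* Shrink to new_num_items; truncation can never grow the array. */
static int array_truncate(array_t *a, size_t new_num_items) {
    assert(a != NULL);
    if (new_num_items > a->num_items) return -1;
    a->num_items = new_num_items;
    return 0;
}
```

Truncating to zero and then adding items, as the slide does, reuses the same storage without reallocating.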
Array Data Access
• Access limited in time
- vxMapArrayRange: get access (Read, Write, Read & Write)
- vxUnmapArrayRange: release the access
vx_map_id map_id;
vx_size num_items = 0, stride = 0;
void * ptr;
vxQueryArray( arr, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items) );
vxMapArrayRange( arr, 0, num_items, &map_id, &stride, &ptr,
VX_READ_AND_WRITE, VX_MEMORY_TYPE_HOST, 0 );
// Access data in ptr
vxUnmapArrayRange( arr, map_id );
• Copy using application controlled address and memory layout
- vxCopyArrayRange: copy (Read or Write)
vxQueryArray( arr, VX_ARRAY_NUMITEMS, &num_items, sizeof(num_items) );
vxCopyArrayRange( arr, 0, num_items, sizeof(my_array[0]), &my_array[0],
VX_READ_ONLY, VX_MEMORY_TYPE_HOST );
Image Access (1/2) : Overview
• Copy using application controlled address and memory layout
- vxCopyImagePatch: copy (Read or Write)
vx_imagepatch_addressing_t addr = { /* to fill stride_x & stride_y */ };
vx_rectangle_t rect = { 0u, 0u, width, height };
vxCopyImagePatch( img, &rect, plane, &addr, my_array,
VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST );
• Access limited in time
- vxMapImagePatch: get access (Read, Write, Read & Write)
- vxUnmapImagePatch: release the access
vx_map_id map_id;
void * ptr;
vx_imagepatch_addressing_t addr;
vx_rectangle_t rect = { 0u, 0u, width, height };
vxMapImagePatch( img, &rect, plane, &map_id, &addr, &ptr,
VX_READ_AND_WRITE, VX_MEMORY_TYPE_HOST, VX_NOGAP_X );
// Access data in ptr
vxUnmapImagePatch( img, map_id );
Image Access (2/2) : Memory Layout
typedef struct _vx_imagepatch_addressing_t {
vx_uint32 dim_x;
vx_uint32 dim_y;
vx_int32 stride_x;
vx_int32 stride_y;
vx_uint32 scale_x;
vx_uint32 scale_y;
vx_uint32 step_x;
vx_uint32 step_y;
} vx_imagepatch_addressing_t;
[Figure: memory layout of an image patch]
- dim_x: number of (logical) pixels in a row
- dim_y: number of (logical) pixels in a column
- stride_x: number of bytes between the beginning of 2 successive pixels
- stride_y: number of bytes between the beginning of 2 successive lines
- Sub-sampling: 1 physical pixel every ‘step’ logical pixels
- scale = VX_SCALE_UNITY / step
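The addressing fields combine into a byte offset for a logical pixel (x, y). The sketch below is illustrative only and rests on assumptions: `patch_addr_t` is a local mirror of vx_imagepatch_addressing_t built from standard fixed-width types, `SCALE_UNITY` is assumed to equal the value of VX_SCALE_UNITY (1024), `pixel_offset` is a hypothetical helper, and strides are taken to be non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE_UNITY 1024u /* assumption: same value as VX_SCALE_UNITY */

/* Local mirror of the addressing structure shown on the slide. */
typedef struct {
    uint32_t dim_x, dim_y;       /* logical pixels per row / column */
    int32_t  stride_x, stride_y; /* bytes between pixels / lines */
    uint32_t scale_x, scale_y;   /* SCALE_UNITY / step */
    uint32_t step_x, step_y;     /* logical pixels per physical pixel */
} patch_addr_t;

/* Byte offset of logical pixel (x, y) from the start of the patch. */
static size_t pixel_offset(const patch_addr_t *a, uint32_t x, uint32_t y) {
    assert(x < a->dim_x && y < a->dim_y);
    assert(a->stride_x >= 0 && a->stride_y >= 0);
    /* Map logical coordinates to physical ones, rounding down. */
    size_t px = (size_t)(((uint64_t)a->scale_x * x) / SCALE_UNITY);
    size_t py = (size_t)(((uint64_t)a->scale_y * y) / SCALE_UNITY);
    return py * (size_t)a->stride_y + px * (size_t)a->stride_x;
}
```

For a full-resolution 8-bit plane (step 1, scale 1024) the formula reduces to `y * stride_y + x * stride_x`. For a plane sub-sampled by 2 in both directions (step 2, scale 512), each 2x2 block of logical pixels maps to the same physical sample.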
Feature Tracking Example
[Figure: feature tracking data flow inside a vx_context. Steps (vx_node): Convert to Grayscale, Compute Pyramid, Harris Detector, Optical Flow (LK), grouped into two vx_graph objects. Data: Input Image from CAMERA (vx_image); grayscale vx_image; vx_delay of pyramids holding the vx_pyramid at t=N-1 and at t=N; vx_delay of keypoints holding the keypoint array (x, y, …) (vx_array) at t=N-1 (copy of data from previous iteration, keep old copy) and at t=N (computed data).]
Pyramid Data Object
vx_pyramid vxCreatePyramid (
vx_context context,
vx_size levels,
vx_float32 scale, // VX_SCALE_PYRAMID_HALF or VX_SCALE_PYRAMID_ORB
vx_uint32 width,
vx_uint32 height,
vx_df_image format // VX_DF_IMAGE_U8
);
[Figure: pyramid levels from Level 0 (base) through Level 1, Level 2, and Level 3]
Example:
vx_pyramid pyramid = vxCreatePyramid(context, …);
…
// get image at pyramid level 2
vx_image img2 = vxGetPyramidLevel( pyramid, 2 );
…
vxReleaseImage( &img2 );
…
vxReleasePyramid( &pyramid );
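To get a feel for how large each level of a half-scale pyramid is, the dimensions can be estimated by repeated halving. This is a rough sketch, not the specification's definition: `half_pyramid_dim` is a hypothetical helper, and rounding up at each halving is an assumption. In real code the reliable approach is to query the dimensions of the image returned by vxGetPyramidLevel.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated size of one dimension at the given level of a half-scale
   pyramid. Assumption: each halving rounds up. */
static uint32_t half_pyramid_dim(uint32_t base, unsigned level) {
    assert(base > 0u);
    uint32_t d = base;
    for (unsigned i = 0; i < level; ++i) {
        d = (d + 1u) / 2u;
    }
    return d;
}
```

Under this assumption, a 640x400 base image would give 320x200 at level 1, 160x100 at level 2, and 80x50 at level 3.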
Delay Data Object
vx_delay vxCreateDelay
(
vx_context context,
vx_reference exemplar,
vx_size count
);
Example:
vx_pyramid exemplar = vxCreatePyramid(context, …);
vx_delay pyr_delay = vxCreateDelay(context, (vx_reference)exemplar, 2);
vxReleasePyramid(&exemplar);
…
vx_pyramid pyr_0 = (vx_pyramid)vxGetReferenceFromDelay(pyr_delay, 0);
vx_pyramid pyr_1 = (vx_pyramid)vxGetReferenceFromDelay(pyr_delay, -1);
…
vxAgeDelay(pyr_delay);
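The indexing used with the delay (0 for the current slot, -1 for the previous one) and the effect of aging can be modelled as a small ring buffer. This is a minimal sketch under stated assumptions: `delay_t`, `delay_init`, `delay_ref` and `delay_age` are hypothetical names, each slot holds a plain `int` instead of an OpenVX object, and aging is modelled as a rotation in which the data at index 0 moves to index -1 and the oldest slot becomes the new index 0.

```c
#include <assert.h>

#define MAX_SLOTS 8

typedef struct {
    int slots[MAX_SLOTS]; /* stand-in payload, e.g. a frame number */
    int count;            /* number of slots in the delay */
    int head;             /* physical slot currently seen as index 0 */
} delay_t;

static void delay_init(delay_t *d, int count) {
    assert(count > 0 && count <= MAX_SLOTS);
    d->count = count;
    d->head = 0;
    for (int i = 0; i < count; ++i) d->slots[i] = -1;
}

/* index is 0 for the current slot, -1 for the previous one,
   down to -(count - 1) for the oldest. */
static int *delay_ref(delay_t *d, int index) {
    assert(index <= 0 && index > -d->count);
    return &d->slots[(d->head - index) % d->count];
}

/* Rotate by one: the oldest slot becomes index 0 (ready to be
   overwritten) and every other slot moves one step into the past. */
static void delay_age(delay_t *d) {
    d->head = (d->head + d->count - 1) % d->count;
}
```

No data is copied when the delay ages; only the mapping from logical index to physical slot changes, which is what makes the flip cheap inside a frame loop.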
Data Objects
[Figure: data objects of the feature tracking example inside a vx_context. Input Image from CAMERA (vx_image); vx_delay of pyramids holding the vx_pyramid at t=N-1 and at t=N; vx_delay of keypoints holding the keypoint array (x, y, …) (vx_array) at t=N-1 (copy of data from previous iteration, keep old copy) and at t=N (computed data).]
Harris Graph
[Figure: Harris graph (vx_graph) inside a vx_context. Nodes (vx_node): ColorConvert, ChannelExtract, GaussianPyramid, HarrisCorners (with additional parameters). Data: Input Image from CAMERA, vx_image (RGB); vx_image (virtual IYUV); vx_image (virtual U008); vx_delay of pyramids holding the vx_pyramid at t=N-1 and at t=N; vx_delay of keypoints holding the keypoint array (x, y, …) (vx_array) at t=N-1 (copy of data from previous iteration, keep old copy) and at t=N (computed data).]
Vision Functions in a Graph
• RGB -> YUV (VX_DF_IMAGE_RGB to VX_DF_IMAGE_IYUV)
vxColorConvertNode( graph, input_rgb_image, harris_yuv_image );
• YUV -> Y (VX_DF_IMAGE_IYUV to VX_DF_IMAGE_U8)
vxChannelExtractNode( graph, harris_yuv_image, VX_CHANNEL_Y, harris_gray_image );
• Harris corner
vxHarrisCornersNode( graph, harris_gray_image, strength_thresh, min_distance,
sensitivity, gradient_size, block_size,
keypoint_array_output, NULL );
- strength_thresh : 0.0005f
- min_distance : 5.0f
- sensitivity : 0.04f
- gradient_size : 3
- block_size : 3
Optical Flow Graph
[Figure: Optical Flow graph (vx_graph) inside a vx_context. Nodes (vx_node): ColorConvert, ChannelExtract, GaussianPyramid, OpticalFlowPyrLK (with additional parameters). Data: Input Image from CAMERA, vx_image (RGB); vx_image (virtual IYUV); vx_image (virtual U008); vx_delay of pyramids holding the vx_pyramid at t=N-1 and at t=N; vx_delay of keypoints holding the keypoint array (x, y, …) (vx_array) at t=N-1 (copy of data from previous iteration, keep old copy) and at t=N (computed data).]
Execute a Graph in Loop to Process Input
• Before executing Harris & Optical Flow Graphs
- vxVerifyGraph API should return VX_SUCCESS (outside the loop)
• Inside the loop -- process each image from input video sequence
- Write pixels from input video into input RGB image
- Execute Graphs using vxProcessGraph API
  - Execute Harris Graph for the 1st image from video sequence
  - Execute Optical Flow Graph from 2nd image onwards
- Read previous and current keypoints and draw each item
  - Use vxGetReferenceFromDelay API to get previous and current keypoint arrays
- Flip the previous and current pyramid and keypoints in delay objects
  - Use vxAgeDelay API
  - This will automatically trigger flipping of previous and current pyramids in all the graphs
• After the processing loop
- Query VX_GRAPH_ATTRIBUTE_PERFORMANCE for performance measurements
- Release all objects -- make sure to release context at the end
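The control flow of the loop described above can be written down as a skeleton. This is only a sketch of the ordering, with assumptions stated plainly: `run_tracking_loop` and `loop_stats_t` are hypothetical, and the counters stand in for the real calls (processing the Harris graph, processing the Optical Flow graph, and aging the delays), so the block runs without an OpenVX implementation.

```c
#include <assert.h>

/* Counters standing in for the real API calls made inside the loop. */
typedef struct {
    int harris_runs; /* times the Harris graph would be processed */
    int flow_runs;   /* times the Optical Flow graph would be processed */
    int age_calls;   /* times the delay objects would be aged */
} loop_stats_t;

static loop_stats_t run_tracking_loop(int num_frames) {
    assert(num_frames >= 0);
    loop_stats_t s = { 0, 0, 0 };
    for (int frame = 0; frame < num_frames; ++frame) {
        /* 1. write the new frame into the input RGB image (omitted) */
        if (frame == 0) {
            ++s.harris_runs; /* 2a. first frame: detect keypoints */
        } else {
            ++s.flow_runs;   /* 2b. later frames: track keypoints */
        }
        /* 3. read previous and current keypoints and draw (omitted) */
        ++s.age_calls;       /* 4. flip current and previous data */
    }
    return s;
}
```

For a five-frame sequence this ordering runs the Harris graph once, the Optical Flow graph four times, and ages the delays after every frame.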
Summary
• OpenVX is a low-level programming framework that enables software developers to efficiently access computer vision hardware acceleration with both functional and performance portability.
• OpenVX contains:
- a library of predefined and customizable vision functions
- a graph-based execution model to combine functions, enabling both task- and data-independent execution
- a set of memory objects that abstract the physical memory
• OpenVX is defined as a C API
- object-oriented design
- synchronous and asynchronous execution model
- extend functionality using enums and callbacks
Useful Links: www.khronos.org/registry/vx and github.com/rgiduthuri/openvx_tutorial