Vision BOF - SIGGRAPH 2014
Transcript of Vision BOF - SIGGRAPH 2014
© Copyright Khronos Group 2014 - Page 1
Open Standard APIs for Vision and Camera Processing
Neil Trevett, Vice President Mobile Ecosystem, NVIDIA
President, Khronos Group
Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera processing
Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs Every Day…
Khronos Standards
• Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing
• 3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format with compression
• Acceleration in HTML5
- 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
• Sensor Processing
- Vision Acceleration
- Camera Control
- Sensor Fusion
Over 100 companies defining royalty-free APIs to connect software to silicon
Visual Computing = Graphics PLUS Vision
Real-time GPU Compute Research project on CUDA-enabled laptop
High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence
P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA
[Diagram labels: Imagery, Data, Vision Processing, Graphics Processing]
Enhanced sensor and vision capability deepens the interaction between real and virtual worlds
Mobile Visual Computing = New Experiences
Augmented Reality
Face, Body and Gesture Tracking
Computational Photography and Videography
3D Scene/Object Reconstruction
Need for advanced sensors and the GPU throughput to process them
Vision Pipeline Challenges and Opportunities
Sensor Proliferation: diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
- GPS
- WiFi (fingerprint)
- Cellular trilateration
- NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
(19 sensors in total)
Growing Camera Diversity: capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Diverse Vision Processors: driving for high performance and low power
• Camera ISPs
• Dedicated vision IP blocks
• DSPs and DSP arrays
• Programmable GPUs
• Multi-core CPUs
Flexible sensor and camera control to generate required image stream
Use best processing available for image stream processing – with code portability
Control/fuse vision data by/with all other sensor data on device
Vision Processing Power Efficiency
• Depth sensors = significant processing
- Generate/use environmental information
• Wearables will need ‘always-on’ vision
- With smaller thermal limit / battery than phones!
• GPUs have 10x the imaging power efficiency of CPUs
- GPUs architected for efficient pixel handling
• Traditional cameras have dedicated hardware
- ISP = Image Signal Processor – on all SOCs today
• SOCs have space for more transistors
- But can’t turn on at same time = Dark Silicon
• Potential for dedicated sensor/vision silicon
- Can trigger full CPU/GPU complex
[Chart: Power Efficiency vs. Computation Flexibility. Multi-core CPU X1, GPU Compute X10, Dedicated Hardware X100; Advanced Sensors and Wearables are marked on the chart]
But how to program specialized processors?
Performance and Functional Portability
OpenVX – Power Efficient Vision Acceleration
• Out-of-the-Box vision acceleration framework
- Low-power, real-time, mobile and embedded
• Performance portability for diverse hardware
- ISPs, Dedicated vision blocks, DSPs and DSP arrays, GPUs, Multi-core CPUs
• Suited for low-power, always-on acceleration
- Can run solely on dedicated vision hardware
• Foundational API for vision acceleration
- Can be used by middleware or applications
• Complementary to OpenCV
- Which is great for prototyping
• Khronos open source sample implementation
- To be released with final specification
[Diagram labels: Application; OpenCV open source library; Other higher-level CV libraries; Open source sample implementation; Hardware vendor implementations]
OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
• VXU Utility Library for access to single nodes
- Easy way to start using OpenVX by calling each node independently
• EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs is not mandated
[Diagram: Example OpenVX Graph. Labels: Native Camera Control; four OpenVX Nodes; Downstream Application Processing]
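To illustrate why node fusion saves memory traffic, here is a minimal Python sketch. It does not use the OpenVX API: the function names (run_unfused, run_fused) and the three per-pixel operations are hypothetical, and it assumes every node is a simple pixel-wise operation. It only shows that a fused chain produces the same image while writing one buffer instead of one per node.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
PointOp = Callable[[int], int]


def run_unfused(image: Image, ops: List[PointOp]) -> Tuple[Image, int]:
    """Run each node over the whole image, writing one full buffer per node."""
    buffers_written = 0
    current = image
    for op in ops:
        current = [[op(p) for p in row] for row in current]
        buffers_written += 1
    return current, buffers_written


def run_fused(image: Image, ops: List[PointOp]) -> Tuple[Image, int]:
    """Apply the whole chain to each pixel, so only the final buffer is written."""
    def chain(p: int) -> int:
        for op in ops:
            p = op(p)
        return p
    return [[chain(p) for p in row] for row in image], 1


# Hypothetical per-pixel nodes, for illustration only.
def brighten(p: int) -> int:
    return min(p + 40, 255)


def invert(p: int) -> int:
    return 255 - p


def threshold(p: int) -> int:
    return 255 if p >= 128 else 0


image = [[0, 100, 200], [50, 150, 250]]
ops = [brighten, invert, threshold]
unfused, n_unfused = run_unfused(image, ops)
fused, n_fused = run_fused(image, ops)
print(fused, n_unfused, n_fused)
```

A real implementation decides fusion and tiling inside the vendor runtime; the application only describes the graph.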
OpenVX 1.0 Function Overview
• Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
• Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
• Core Computer Vision
- Pyramid computation
- Integral Image computation
• Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
OpenVX 1.0 defines framework for creating, managing and executing graphs
Focused set of widely used functions that are readily accelerated
Implementers can add functions as extensions
Widely used extensions adopted into future versions of the core
OpenVX Specification Evolution
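As an illustration of one function in the list above, the following Python sketch computes an integral image (summed-area table) and uses it to sum a rectangle in constant time. It is a plain reference version, not the OpenVX call; the names integral_image and box_sum, and the zero-padded first row and column, are choices made for this example.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of pixels in columns x0..x1-1 and rows y0..y1-1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
print(total, lower_right)
```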
Example Graph - Stereo Machine Vision
[Diagram: OpenVX Graph. Labels: Camera 1; Camera 2; Stereo Rectify with Remap (two nodes); Compute Depth Map (User Node); Image Pyramid; Delay; Compute Optical Flow; Detect and track objects (User Node); Object coordinates]
Tiling extension enables user nodes (extensions) to also optimally run in local memory
OpenVX and OpenCV are Complementary
• Governance
- OpenCV: Community driven open source with no formal specification
- OpenVX: Formal specification defined and implemented by hardware vendors
• Conformance
- OpenCV: No conformance tests for consistency and every vendor implements different subset
- OpenVX: Full conformance test suite / process creates a reliable acceleration platform
• Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability
• Scope
- OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
• Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory
- OpenVX: Graph-based execution; optimizable computation, data transfer
• Use Case
- OpenCV: Rapid experimentation
- OpenVX: Production development & deployment
OpenVX Participants and Timeline
• Provisional 1.0 specification released November 2013 for industry feedback
- An update to the provisional spec published in July
• OpenVX 1.0 final release planned for 2014
- With conformance tests
• Itseez is working group chair (the convener of OpenCV)
- Qualcomm and TI are specification editors
NVIDIA VisionWorks Uses OpenVX
• VisionWorks library contains diverse vision and imaging primitives
• Leverages OpenVX for optimized primitive execution
• Can extend VisionWorks nodes through CUDA accelerated primitives
• Provided with sample library of fully accelerated pipelines
[Diagram labels: Applications and Middleware; Tegra K1; CUDA Libraries; VisionWorks Primitives; Classifier; Corner Detection; 3rd Party; Vision Pipeline Samples; Object Detection; 3rd Party Pipelines …; SLAM; VisionWorks Framework]
OpenCL – Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
- Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs and hardware
- Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
- Platform Layer API to query, select and initialize compute devices
- Kernel language - Subset of ISO C99 + language extensions
- C Runtime API to build and execute kernels across multiple devices
[Diagram: OpenCL Kernel Code blocks targeting CPU, CPU, GPU, DSP and HW]
OpenCL as Parallel Language Backend
JavaScript binding for initiation of OpenCL C kernels
River Trail: Language extensions to JavaScript
MulticoreWare open source project on Bitbucket
OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
Harlan: High level language for GPU programming
Compiler directives for Fortran, C and C++
Java language extensions for parallelism
PyOpenCL: Python wrapper around OpenCL
Language for image processing and computational photography
Embedded array language for Haskell
SPIR: Standard Portable Intermediate Representation (extending LLVM for parallel computation)
SPIR 2.0 Released here at SIGGRAPH
Mixamo - Avatar Videoconferencing
• Real time facial animation capture on mobile – ported directly from PC
• Animate an avatar while conferencing
• Full GPU acceleration of vision processing using OpenCL
NVIDIA Tegra K1 Development Board
Khronos APIs for Vision Processing
GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1)
- Pervasively available on almost any mobile device or OS
- Easy integration into graphics apps – no compute API interop needed
- Program in GLSL not C
- Limited to acceleration on a single GPU
OpenCL: General Purpose Heterogeneous Programming Framework
- Flexible, low-level access to any devices with OpenCL compiler
- Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
- Open standard for any device or OS – being used as a backend by many languages and frameworks
- Needs full compiler stack and IEEE precision
OpenVX: Out of the Box Vision Framework - Operators and graph framework library
- Can run on dedicated hardware – no compiler needed
- Easier performance portability to diverse hardware
- Suited for low-power, always-on acceleration
- Fixed set of operators – but can be extended
It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices
Kari Pulli, NVIDIA Research
Advanced Camera Control Use Cases
• High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
• Subject isolation and depth detection
- High-speed burst with individual frame control over focus
• Rolling shutter elimination
- High-precision intra-frame synchronization between camera and motion sensor
• Augmented Reality
- 60Hz, low-latency capture with motion sensor synchronization
- Multiple Region of Interest (ROI) capture
- Synchronized stereo sensors for scene scaling
- Detailed feedback on camera operation per frame
• Time-of-flight or structured light depth camera processing
- Aligned stacking of data from multiple sensors
Typical Imaging Pipeline
• Pre-processing is non-existent in basic use-cases
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
[Diagram: Lens, Color Filter Array, CMOS sensor, Pre-processing, Image Signal Processor (ISP), Post-processing, App. Data format labels: Bayer, RGB, YUV. Feedback label: Lens, sensor, aperture control]
High Dynamic Range (HDR)
• HDR works by combining differing exposures into the same image
• A variety of methods for HDR, based on application
- Multiple frame HDR (requires frame memory)
- Interlace HDR
- Multiple Zone HDR
• HDR requires
- Precise control over camera parameters (exposure)
- Fast capture and processing of multiple images
- Note: with interlace HDR, only 1 image is needed
[Diagram: Short exposure, Long exposure and Optional mid exposure frames feed HDR processing]
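A minimal sketch of multiple-frame HDR merging, in Python. It is one common approach rather than the method of any particular product: it assumes a linear sensor response, 8-bit pixel values, and known exposure times, and it uses a simple hat-shaped weight so that clipped or very dark samples contribute little. The names hat_weight, merge_hdr and simulate_capture are hypothetical.

```python
from typing import List

Frame = List[List[int]]


def hat_weight(value: int, saturation: int = 255) -> float:
    """Weight is 1 at mid-range and falls to 0 at black and at saturation."""
    mid = saturation / 2.0
    return max(0.0, 1.0 - abs(value - mid) / mid)


def merge_hdr(frames: List[Frame], exposure_times: List[float],
              saturation: int = 255) -> List[List[float]]:
    """Estimate per-pixel radiance as a weighted mean of value / exposure_time."""
    height, width = len(frames[0]), len(frames[0][0])
    shortest = min(range(len(frames)), key=lambda i: exposure_times[i])
    radiance = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weighted_sum = 0.0
            weight_total = 0.0
            for frame, t in zip(frames, exposure_times):
                w = hat_weight(frame[y][x], saturation)
                weighted_sum += w * (frame[y][x] / t)
                weight_total += w
            if weight_total > 0.0:
                radiance[y][x] = weighted_sum / weight_total
            else:
                # Every sample is black or clipped: trust the shortest exposure.
                radiance[y][x] = frames[shortest][y][x] / exposure_times[shortest]
    return radiance


def simulate_capture(scene: List[List[float]], t: float) -> Frame:
    """Hypothetical linear sensor that clips at 255 (for the demo only)."""
    return [[min(255, round(r * t)) for r in row] for row in scene]


scene = [[10.0, 100.0, 1000.0]]
times = [0.1, 1.0]  # short and long exposure, arbitrary units
frames = [simulate_capture(scene, t) for t in times]
merged = merge_hdr(frames, times)
print(frames, merged)
```

In the demo the brightest pixel clips in the long exposure, so its estimate comes entirely from the short exposure.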
Image stitching, panoramic images
• Made with
• Requires processing of multiple images
• Requires position / geometry information
• Requires control of camera (e.g. AE lock)
Typical Burst Sequence Applications
Pipelined Sensor Model
• Traditional one-shot sensor model
- Need to know which parameters were used
- Resetting the pipeline between shots is slow
• Viewfinding / video mode:
- Pipelined, high frame rate
- Settings changes take effect later
• Need new model for Computational Photography
- Need parameterized SEQUENCE of images to feed advanced algorithms
• Real image sensors are pipelined
- While one frame exposing
- Next one is being prepared
- Previous one is being read out
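The "settings changes take effect later" behavior can be sketched with a small Python simulation. This is a toy model, not any real camera driver: the class name PipelinedSensor and the two-frame latency are assumptions chosen for illustration.

```python
from collections import deque


class PipelinedSensor:
    """Toy model of a streaming sensor whose settings lag by a fixed number of frames."""

    def __init__(self, initial_exposure_ms: float, latency_frames: int = 2):
        # Frames already in flight were configured with the old setting.
        self._in_flight = deque([initial_exposure_ms] * latency_frames)
        self._pending = initial_exposure_ms

    def set_exposure(self, exposure_ms: float) -> None:
        """Request a new exposure; it only applies to frames not yet in flight."""
        self._pending = exposure_ms

    def next_frame(self) -> float:
        """Return the exposure actually used for the frame that is read out now."""
        self._in_flight.append(self._pending)
        return self._in_flight.popleft()


sensor = PipelinedSensor(initial_exposure_ms=10.0, latency_frames=2)
observed = [sensor.next_frame()]
sensor.set_exposure(40.0)
observed += [sensor.next_frame() for _ in range(3)]
print(observed)
```

The application cannot tell from the frame alone which setting produced it, which is the problem the request-based model addresses.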
Need for Camera Control API - OpenKCAM
• Advanced control of ISP and camera subsystem – with cross-platform portability
- Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
- Cross sensor synch: e.g. synch of camera and MEMS sensors
- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
[Diagram labels: Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture; Image Signal Processor (ISP) with Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF); EGLStreams; Image/Vision Applications]
OpenKCAM API Requirements
• Provide functional portability for advanced camera applications
- Reduce extreme fragmentation for ISVs wanting more than point and shoot
• Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
• Control multiple sensors with synch and alignment
- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Enhanced per frame detailed control
- Format flexibility, Region of Interest (ROI) selection
• Global timing & synchronization
- E.g. Between cameras and MEMS sensors
• Flexible processing/streaming
- Multiple input and output streams with RAW, Bayer or YUV Processing
- Streaming of rows (not just frames)
Enable advanced camera functionality not available on current platforms
OpenKCAM is FCAM-based
• FCAM (2010) Stanford/Nokia, open source
• Capture stream of camera images with precision control
- A pipeline that converts requests into image stream
- All parameters packed into the requests - no visible state
- Programmer has full control over sensor settings for each frame in stream
• Control over focus and flash
- No hidden daemon running
• Control ISP
- Can access supplemental statistics from ISP if available
• No global state
- State travels with image requests
- Every pipeline stage may have different state
- Enables fast, deterministic state changes
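The "state travels with image requests" idea can be sketched in Python as follows. This is loosely inspired by the description above and is not the FCAM or OpenKCAM API: the Shot, Frame and RequestPipeline names and their fields are hypothetical. Each request carries every parameter, and each returned frame is tagged with the request that produced it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass(frozen=True)
class Shot:
    """A complete capture request: all parameters travel with it."""
    exposure_ms: float
    gain: float
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    """A captured frame, tagged with the exact request that produced it."""
    shot: Shot
    sequence: int


class RequestPipeline:
    """Toy pipeline that turns queued requests into frames, in order."""

    def __init__(self) -> None:
        self._queue: Deque[Shot] = deque()
        self._sequence = 0

    def capture(self, shot: Shot) -> None:
        self._queue.append(shot)

    def get_frame(self) -> Frame:
        shot = self._queue.popleft()
        frame = Frame(shot=shot, sequence=self._sequence)
        self._sequence += 1
        return frame


# An exposure-bracketed burst: three requests, no global camera state.
pipeline = RequestPipeline()
burst = [Shot(2.0, 1.0), Shot(8.0, 1.0), Shot(32.0, 1.0, flash=True)]
for shot in burst:
    pipeline.capture(shot)
frames = [pipeline.get_frame() for _ in burst]
print([(f.sequence, f.shot.exposure_ms, f.shot.flash) for f in frames])
```

Because the parameters arrive with the frame, downstream algorithms such as HDR merging never have to guess which settings were in effect.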
OpenKCAM Design Philosophy
• C-language API starting from proven designs
- e.g. FCAM, Android camera platform
• Design alignment with widely used hardware standards
- e.g. MIPI CSI
• Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not required
• Provide support for vendor-specific extensions
Potential Adoption on Android
• Android exposes Java camera APIs to developers
- Controls underlying Camera HAL
• Camera HAL v1 API simplified basic point and shoot apps
- Difficult or impossible to do much else
• Camera HAL v3 API is a fundamentally different API
- Streams-based to enable more sophisticated camera applications
FCAM: open source project developed by Nokia and Stanford
Camera API HAL V3 adopts many FCAM ideas and can use EGL in its implementation
OpenKCAM builds on FCAM with a goal of being forward compatible with Android architecture
OpenKCAM may be used to IMPLEMENT Android Camera HAL – and provide an advanced native camera API in NDK
Participating Companies and Milestones
Apr13
Jul13
Group charter approved
3Q14
Sample implementation and tests
1Q15
Specification ratification
OpenKCAM Working Group
• Royalty free API for portable access to advanced mobile camera functionality
- Reduce fragmentation and encourage more advanced camera applications
• Control for the new wave of sensors to enable advanced imaging and vision
- Multiple sensors, depth cameras, synchronized sensors
• Provide sophisticated camera functionality not available on today’s platforms
- But work to enable easy adoption by platform vendors
• Eager to contribute? Join Khronos OpenKCAM WG!
- http://www.khronos.org/camera
• Mikaël Bourges-Sévenier
Neil Trevett, NVIDIA
Sensor Industry Fragmentation …
Sensor Types
• Basic sensor data:
- Acceleration, Magnetic Field, Angular Rates
- Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light
- Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer
• Sensor fusion
- Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration
- Position
• Context awareness
- Device Motion: general movement of the device: still, free-fall, …
- Carry: how the device is being held by a user: in pocket, in hand, …
- Posture: how the body holding the device is positioned: standing, sitting, step, …
- Transport: about the environment around the device: in elevator, in car, …
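As a small example of deriving fused values from basic sensor data, the Python sketch below splits raw accelerometer samples into a gravity estimate and linear acceleration using a first-order low-pass filter. This is a widely used simple technique, not the StreamInput API; the function name split_gravity and the smoothing factor alpha = 0.8 are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def split_gravity(samples: Sequence[Vec3],
                  alpha: float = 0.8) -> List[Tuple[Vec3, Vec3]]:
    """Return (gravity, linear_acceleration) for each accelerometer sample.

    Gravity is tracked with a low-pass filter; linear acceleration is the
    remainder after subtracting the gravity estimate from the raw sample.
    """
    if not samples:
        return []
    gravity = list(samples[0])  # seed the filter with the first reading
    results = []
    for sample in samples:
        gravity = [alpha * g + (1.0 - alpha) * a for g, a in zip(gravity, sample)]
        linear = [a - g for a, g in zip(sample, gravity)]
        results.append((tuple(gravity), tuple(linear)))
    return results


# Device at rest, then a brief push along the x axis (values in m/s^2).
readings = [(0.0, 0.0, 9.81), (0.0, 0.0, 9.81), (2.0, 0.0, 9.81)]
fused = split_gravity(readings)
for gravity, linear in fused:
    print(gravity, linear)
```

A production fusion engine would also use the gyroscope and magnetometer; the low-pass split is only the simplest case.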
Low-level Sensor Abstraction API
• Apps Need Sophisticated Access to Sensor Data
- Without coding to specific sensor hardware
• Apps request semantic sensor information
- StreamInput defines possible requests, e.g.
- Read Physical or Virtual Sensors e.g. “Game Quaternion”
- Context detection e.g. “Am I in an elevator?”
• StreamInput processing graph provides optimized sensor data stream
- High-value, smart sensor fusion middleware can connect to apps in a portable way
• Apps can gain ‘magical’ situational awareness
• Advanced Sensors Everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
Sensor Discoverability
Sensor Code Portability
StreamInput: Platform Integration
Applications
OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion)
Middleware (E.g. Context-awareness engines, gaming engines)
Low-level native API defines portable access to fused sensor data stream and context-awareness
Flexible native API to integrate where needed depending on existing platform sensor stacks
[Diagram labels: Sensor; Sensor; …; Sensor Hub; Sensor Hub]
Sensor OSP Announcement
• Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput
- Sensor Platforms is StreamInput Spec Editor
EGL 1.5 Released at GDC 2014
• EGL 1.5 brings functionality from multiple extensions into core
- Increased reliability and portability
• EGLImages
- Sharing textures and renderbuffers
• Context Robustness
- Defending against malicious code
• EGLSync objects
- Improved OpenGL/OpenCL interop
• Platform extensions
- Standardized interactions for multiple OS e.g. Android and 64-bit platforms
• sRGB colorspace rendering
[Diagram labels: Applications; OS and Display Platforms]
Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization
API Interop: EGL provides efficient transfer of data and events between Khronos APIs
Potential EGL Future Directions
• EGLImageStream extensions are very powerful today
- But need wider implementation in drivers
- Stream other types of data – unformatted buffers for metadata and more
- GPU-to-GPU streaming and invoking client API activities directly from other client APIs without CPU intervention
• Separation of traditional context/surface functionality from “hub” functionality
• Support for new Khronos APIs where appropriate
- Streaming video + image processing + display use case
Khronos APIs for Augmented Reality
Advanced Camera Control and stream generation
3D Rendering and Video Composition on GPU
Audio Rendering
Application on CPUs, GPUs and DSPs
Sensor Fusion
Vision Processing
MEMS Sensors
EGLStream - stream data between APIs
Precision timestamps on all sensor samples
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
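One reason precision timestamps matter is that camera frames and motion sensor samples arrive at different rates and must be paired up. The Python sketch below matches each frame to the nearest sensor sample by timestamp. It is an illustrative helper, not part of any Khronos API: the function names, the microsecond unit and the sample values are assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_sample(sample_times_us: Sequence[int], t_us: int) -> int:
    """Return the index of the sample whose timestamp is closest to t_us.

    sample_times_us must be sorted in ascending order and non-empty.
    Ties go to the earlier sample.
    """
    if not sample_times_us:
        raise ValueError("no sensor samples to match against")
    i = bisect_left(sample_times_us, t_us)
    if i == 0:
        return 0
    if i == len(sample_times_us):
        return len(sample_times_us) - 1
    before, after = sample_times_us[i - 1], sample_times_us[i]
    return i if (after - t_us) < (t_us - before) else i - 1


def align_frames(frame_times_us: Sequence[int],
                 sample_times_us: Sequence[int]) -> List[int]:
    """For each frame timestamp, pick the index of the closest sensor sample."""
    return [nearest_sample(sample_times_us, t) for t in frame_times_us]


gyro_times_us = [0, 5000, 10000, 15000, 20000]  # hypothetical 200 Hz gyroscope
frame_times_us = [4000, 12600, 33000]           # hypothetical camera frames
matches = align_frames(frame_times_us, gyro_times_us)
print(matches)
```

Real systems usually interpolate between the two neighboring samples rather than picking one, but that still depends on both streams sharing a common clock.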
Summary
• Khronos is building a family of interoperating APIs for portable and power-efficient vision processing
• OpenVX 1.0 has been provisionally released and non-members are invited to provide feedback on the forums - http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General
• OpenKCAM and StreamInput APIs are currently in design and complement and integrate with OpenVX
• Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!
- $15K annual membership fee for access to all Khronos API working groups
- Well-defined IP framework protects your IP and conformant implementations
• www.khronos.org