T1.1 - Analysis of acceleration opportunities and virtualization requirements in industrial applications
Bologna, April 2012
UNIBO


Android and accelerators?

• Android is the most widely used operating system for mobile devices
– Linux-based
– Open source

• Which applications running on Android-based devices could benefit from HW acceleration (GPPA, HWPU)?
– Smartphones have a camera and increasingly powerful image processing capabilities
– Innovative and attractive apps leverage their portability and ubiquity

Computer Vision

• Computer Vision is a branch of computer science that includes many techniques to extract, characterize, and interpret information in visual images

• Scientific and industrial communities are showing a growing interest in developing Computer Vision (CV) algorithms on embedded systems

Augmented Reality

• Augmented reality (AR) is a live view of a real-world environment with virtual objects superimposed upon (or composited with) the current scene
– Semantic context
– Real-time constraints

• Layar is an augmented reality browser for Android and iOS

– It uses sensor data (camera, compass, GPS, and accelerometer) to identify the user's location and field of view

– It shows geo-located points of interest (POI) organized in layers
– As of September 2011, Layar had 2993 layers
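The idea of selecting geo-located POI from the user's location and field of view can be sketched as below. This is only an illustration under stated assumptions, not Layar's actual implementation: the function names, the use of the compass heading as the view direction, and the angular field-of-view parameter are all hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def in_field_of_view(user, heading_deg, fov_deg, poi):
    """True if the POI lies within +/- fov_deg/2 of the compass heading.

    user and poi are (latitude, longitude) pairs in degrees (assumed format).
    """
    b = bearing_deg(user[0], user[1], poi[0], poi[1])
    # Signed angular difference folded into [-180, 180).
    diff = (b - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0
```

For example, a user at (0, 0) facing east (heading 90) with a 60 degree field of view would see a POI at (0, 1) but not one located due west.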

AR Algorithms

• A primary issue in augmented reality applications is image registration, that is, the process of deriving real-world coordinates from images

• A first step in image registration is the detection of feature points using suitable algorithms

• OpenCV is a C/C++ library that includes many CV algorithms, including feature detectors
– An Android build is available!

Feature extraction kernels

Android – OpenCV API Reference (http://opencv.itseez.com/)
• features2d – Feature detection and description

• SIFT – Scale Invariant Feature Transform [Yuan09]
• SURF – Speeded Up Robust Features [Bay06]
• FAST – Detects corners using the FAST algorithm [Rosten10]

[Yuan09] Yuan, Y., Shi, C. “Object tracking using SIFT features and mean shift”, Computer Vision and Image Understanding, 2009

[Bay06] Bay, H., Tuytelaars, T., Van Gool, L. “SURF: Speeded Up Robust Features”, 9th European Conference on Computer Vision, 2006

[Rosten10] Rosten, E., Porter, R., Drummond, T. “Faster and Better: A Machine Learning Approach to Corner Detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105-119, Jan. 2010

Example: FAST Algorithm

• For each image point p, FAST examines the 16 pixels on a circle of radius 3 centered at p

• A feature is detected iff the intensities of at least n contiguous pixels on the circle are all above or all below the intensity of p by more than a threshold t

• Most feature detector algorithms are inherently parallel, as they verify some properties for each point in the current image
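The segment test described above can be sketched in plain Python as follows. This is a minimal, unoptimized illustration and not the OpenCV implementation: the function names are hypothetical, the image is assumed to be a list of rows of grayscale intensities, and the default n=9 is an assumption (the slide only says "at least n").

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3,
# listed in order around the circle so neighbours in the list are contiguous.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as circular."""
    if all(flags):
        return len(flags)
    best = run = 0
    for f in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if f else 0
        best = max(best, run)
    return best


def is_fast_corner(img, x, y, t, n=9):
    """Segment test at (x, y): n contiguous ring pixels all brighter or all darker."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > p + t for v in ring]
    darker = [v < p - t for v in ring]
    return (longest_circular_run(brighter) >= n
            or longest_circular_run(darker) >= n)


def detect_fast(img, t, n=9):
    """Apply the test independently to every pixel at least 3 away from the border."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

Each call to `is_fast_corner` reads only the input image, which is why the per-pixel loop in `detect_fast` can be distributed across threads or offloaded to an accelerator.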

Embedded Platforms for Benchmarking

• LG Optimus 2x (consumer smartphone)
– CPU: Dual-Core Cortex-A9
– Frequency: 1 GHz per core
– L1 Cache (I/D): 32 KB / 32 KB per core
– L2 Cache: 1 MB shared
– Main Memory: 1 GB LPDDR2-667

• Pandaboard (low-cost dev board)
– CPU: Dual-Core Cortex-A9
– Frequency: 1 GHz per core
– L1 Cache (I/D): 32 KB / 32 KB per core
– L2 Cache: 1 MB shared
– Main Memory: 1 GB LPDDR2-400

• DragonBoard (advanced dev board)
– CPU: Dual-Core Scorpion
– Frequency: 1.2 GHz per core
– L1 Cache (I/D): 32 KB / 32 KB per core
– L2 Cache: 512 KB shared
– Main Memory: 1 GB LPDDR2-333 ISM

Feature Detection on Embedded Platforms

• This figure shows the speed-up for a scalable version of FAST on three different platforms
– Fine-grained data-level parallelization: the main computation loop divides the image into multiple horizontal bands, which gives a regular memory access pattern
– The measured speed-up is very limited
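The band decomposition described above can be sketched as follows. This is an assumption-laden illustration, not the benchmarked code: `split_into_bands`, `scan_band`, and `detect_parallel` are hypothetical names, `kernel` stands in for any per-pixel feature test such as FAST, and the 3-pixel border is assumed from the radius of the FAST circle.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bands(first_row, last_row, num_bands):
    """Split rows [first_row, last_row) into contiguous, near-equal bands."""
    total = max(0, last_row - first_row)
    base, extra = divmod(total, num_bands)
    bands, start = [], first_row
    for i in range(num_bands):
        size = base + (1 if i < extra else 0)  # spread the remainder rows
        bands.append((start, start + size))
        start += size
    return bands


def scan_band(img, band, kernel, border):
    """Run the per-pixel kernel over one horizontal band, row by row."""
    row_start, row_end = band
    width = len(img[0])
    return [(x, y) for y in range(row_start, row_end)
            for x in range(border, width - border) if kernel(img, x, y)]


def detect_parallel(img, kernel, num_threads, border=3):
    """One band per thread; bands are disjoint, so no locking is needed."""
    bands = split_into_bands(border, len(img) - border, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(lambda b: scan_band(img, b, kernel, border), bands))
    # Concatenating in band order reproduces the serial row-major order.
    return [pt for part in parts for pt in part]
```

In standard CPython the threads here illustrate the partitioning rather than a real speed-up, since pure-Python loops are serialized by the global interpreter lock. A native implementation would apply the same decomposition with OS threads.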

Fine-Grained Data-Level Parallelization

• We tested the same version of FAST using a multi-core virtual platform
– The experimental speed-up is closer to the ideal one when the number of threads becomes comparable with the number of cores

• The number of cores is limited (max 4 in the current generation)
– A viable solution to exploit scalability is the use of accelerators
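The relation between measured and ideal speed-up can be made concrete with a small sketch. The model below is an assumption, not taken from the slides: it treats the ideal speed-up as bounded by the smaller of the thread count and the core count, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Measured speed-up: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def ideal_speedup(num_threads, num_cores):
    """Assumed ideal: threads beyond the available cores add nothing."""
    return min(num_threads, num_cores)


def efficiency(t_serial, t_parallel, num_threads, num_cores):
    """Fraction of the assumed ideal speed-up that was actually achieved."""
    return speedup(t_serial, t_parallel) / ideal_speedup(num_threads, num_cores)
```

Under this model, running 8 threads on a 4-core device can reach a speed-up of at most 4, which is the ceiling that motivates offloading to accelerators.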

Other applications

• QoS requirements
• Virtualization specification