Image Processing and Computer Vision in iOS

Image Processing and Computer Vision in iOS Oge Marques, PhD [email protected] Uberaba, MG - Brazil 18 December 2013

Description

Invited talk at iOS Day 2013 (Uberaba, MG, Brazil)

Transcript of Image Processing and Computer Vision in iOS

Page 1: Image Processing and Computer Vision in iOS

Image Processing and Computer Vision in iOS

Oge Marques, PhD
[email protected]

Uberaba, MG - Brazil

18 December 2013

Page 2: Image Processing and Computer Vision in iOS

Mobile image processing and computer vision applications are coming of age. There are many opportunities for building successful apps that may improve the way we capture, organize, edit, share, annotate, and retrieve images and videos using iPhone and iPad.

Take-home message

Page 3: Image Processing and Computer Vision in iOS

Disclaimer #1

•  I'm a teacher, researcher, graduate advisor, author, …

•  … not a developer

Page 4: Image Processing and Computer Vision in iOS

Disclaimer #2

•  I'm a trained engineer

•  … not an artist / designer

Page 5: Image Processing and Computer Vision in iOS

Disclaimer #3

•  I'm an Apple fan!
– Since 2001…
•  4 iPods, 3 iPhones, 2 iPads, 2 iMacs, 4 MacBooks, and more (AirPort, AirPort Express, Apple TV, etc.)
– Since 2010…
•  Created and co-taught iOS Programming classes at FAU

Page 6: Image Processing and Computer Vision in iOS
Page 7: Image Processing and Computer Vision in iOS

In 2013…

•  1.4 billion people have a smartphone with camera

•  350 million photos uploaded to Facebook every day

•  Instagram reaches 150 million users, with a total of 16 billion photos shared and 1 billion likes each day

Page 8: Image Processing and Computer Vision in iOS

In 2013…

•  "Selfie" was Oxford Dictionaries' Word of the Year

Page 9: Image Processing and Computer Vision in iOS

And speaking of new words…

Page 10: Image Processing and Computer Vision in iOS

Background and Motivation

•  The maturity and popularity of image processing and computer vision techniques and algorithms

•  The unprecedented success of mobile devices, particularly the iPhone and the iPad

Two sides of the coin

Page 11: Image Processing and Computer Vision in iOS

Motivation

•  Rich capabilities of iPhone/iPad for image and video processing

•  Apple support for image and multimedia: frameworks, libraries, etc.

•  Third-party support for iPhone-based development: open APIs, OpenCV, etc.

•  Success stories and ever-growing market

Page 12: Image Processing and Computer Vision in iOS

Motivation

•  Q: Why DIP and CV?
•  A: Because they are still relevant and growing fields whose techniques can help solve many problems.

•  Q: Why iOS / mobile?
•  A: Because some problems are better solved in that context, and some still need to be solved in a way that is consistent with ergonomics (devices' size, etc.) and user needs (a "quick fix" plus filter before sharing).

Page 13: Image Processing and Computer Vision in iOS

Example: a natural use case for CBIR

•  Content-Based Image Retrieval (CBIR) using the "Query-By-Example" (QBE) paradigm
– The example is right there, in front of the user!

Excerpt from IEEE Signal Processing Magazine, July 2011, p. 62:

■ The mobile client processes the query image, extracts features, and transmits feature data. The image-retrieval algorithms run on the server using the feature data as query.

■ The mobile client downloads data from the server, and all image matching is performed on the device.

One could also imagine a hybrid of the approaches mentioned above. When the database is small, it can be stored on the phone, and image-retrieval algorithms can be run locally [8]. When the database is large, it has to be placed on a remote server and the retrieval algorithms are run remotely.

In each case, the retrieval framework has to work within stringent memory, computation, power, and bandwidth constraints of the mobile device. The size of the data transmitted over the network needs to be as small as possible to reduce network latency and improve user experience. The server latency has to be low as we scale to large databases. This article reviews the recent advances in content-based image retrieval with a focus on mobile applications. We first review large-scale image retrieval, highlighting recent progress in mobile visual search. As an example, we then present the Stanford Product Search system, a low-latency interactive visual search system. Several sidebars in this article invite the interested reader to dig deeper into the underlying algorithms.

ROBUST MOBILE IMAGE RECOGNITION

Today, the most successful algorithms for content-based image retrieval use an approach that is referred to as bag of features (BoFs) or bag of words (BoWs). The BoW idea is borrowed from text retrieval. To find a particular text document, such as a Web page, it is sufficient to use a few well-chosen words. In the database, the document itself can be likewise represented by a bag of salient words, regardless of where these words appear in the text. For images, robust local features take the analogous role of visual words. Like text retrieval, BoF image retrieval does not consider where in the image the features occur, at least in the initial stages of the retrieval pipeline. However, the variability of features extracted from different images of the same object makes the problem much more challenging.

A typical pipeline for image retrieval is shown in Figure 2. First, the local features are extracted from the query image. The set of image features is used to assess the similarity between query and database images. For mobile applications, individual features must be robust against geometric and photometric distortions encountered when the user takes the query photo from a different viewpoint and with different lighting compared to the corresponding database image.

Next, the query features are quantized [9]–[12]. The partitioning into quantization cells is precomputed for the database, and each quantization cell is associated with a list of database images in which the quantized feature vector appears somewhere. This inverted file circumvents a pairwise comparison of each query feature vector with all the feature vectors in the database and is the key to very fast retrieval. Based on the number of features they have in common with the query image, a short list of potentially similar images is selected from the database.

Finally, a geometric verification (GV) step is applied to the most similar matches in the database. The GV finds a coherent spatial pattern between features of the query image and the candidate database image to ensure that the match is plausible. Example retrieval systems are presented in [9]–[14].

For mobile visual search, there are considerable challenges to provide the users with an interactive experience. Current deployed systems typically transmit an image from the client to the server, which might require tens of seconds. As we scale to large databases, the inverted file index becomes very large, with memory swapping operations slowing down the feature-matching stage. Further, the GV step is computationally expensive and thus increases the response time. We discuss each block of the retrieval pipeline in the following, focusing on how to meet the challenges of mobile visual search.

[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.

[FIG2] A pipeline for image retrieval (blocks: Query Image → Feature Extraction → Feature Matching → Geometric Verification, matched against a Database). Local features are extracted from the query image. Feature matching finds a small set of images in the database that have many features in common with the query image. The GV step rejects all matches with feature locations that cannot be plausibly explained by a change in viewing position.

MOBILE IMAGE-RETRIEVAL APPLICATIONS POSE A UNIQUE SET OF CHALLENGES.
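The inverted-file mechanism described in the excerpt can be sketched in a few lines of Objective-C. This is a toy illustration using Foundation containers only; the integer visual-word ids and string image ids are hypothetical stand-ins for the quantized local descriptors a real system would compute:

```objc
#import <Foundation/Foundation.h>

// Toy inverted file: maps a visual-word id (NSNumber) to the set of
// database image ids (NSString) whose quantized features contain it.
static NSMutableDictionary *invertedFile;

static void indexImage(NSString *imageId, NSArray *wordIds) {
    for (NSNumber *word in wordIds) {
        NSMutableSet *postings = invertedFile[word];
        if (!postings) {
            postings = [NSMutableSet set];
            invertedFile[word] = postings;
        }
        [postings addObject:imageId];
    }
}

// Scores candidates by the number of visual words shared with the query;
// the top-scoring images would then go on to geometric verification.
static NSCountedSet *queryShortList(NSArray *queryWordIds) {
    NSCountedSet *votes = [[NSCountedSet alloc] init];
    for (NSNumber *word in queryWordIds) {
        for (NSString *imageId in invertedFile[word]) {
            [votes addObject:imageId];
        }
    }
    return votes;
}
```

Because a query word touches only its own postings list, the query is never compared against every database descriptor, which is exactly why the inverted file is the key to fast retrieval.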

Page 14: Image Processing and Computer Vision in iOS

Example: Stanford DIP class

•  Course page: http://www.stanford.edu/class/ee368/index.html
•  YouTube playlist

Page 15: Image Processing and Computer Vision in iOS

iPhone photo apps

•  400+ photo- and video-related apps available in the iTunes Store
– Entire sites for reviews, discussions, etc.
– Subcategories include:
•  Camera enhancements
•  Image editing and processing
•  Image sharing
•  Image printing, wireless transfer, etc.

Page 16: Image Processing and Computer Vision in iOS

An app about apps

Page 17: Image Processing and Computer Vision in iOS

iPhone photo apps

•  Fresh from the oven…

Page 18: Image Processing and Computer Vision in iOS

iPhone photo apps

Page 19: Image Processing and Computer Vision in iOS

iPhone photo apps

Page 20: Image Processing and Computer Vision in iOS

Developing DIP/CV apps for iOS

•  Checklist:
– Get a Mac running OS X
– Sign up to become a registered iOS developer
– Download / install Xcode and the latest version of the iOS SDK
– Download / install the iOS Simulator
– Learn Objective-C and the basics of iOS programming
– Get an iPhone, iPod Touch, or iPad (optional)

Page 21: Image Processing and Computer Vision in iOS

Developing DIP/CV apps for iOS

•  Topics to study in greater depth:
– The main classes that you need to understand in order to develop basic applications involving images, the camera, and the photo library on the iPhone are:
•  UIImageView
•  UIImagePickerController
•  UIImage
– Check out the documentation for the AV Foundation framework
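As a minimal sketch of how these classes fit together (assuming a view controller that adopts UIImagePickerControllerDelegate and UINavigationControllerDelegate, with a hypothetical UIImageView outlet named imageView; error handling omitted):

```objc
// Present the camera if available, otherwise fall back to the photo library.
- (IBAction)pickImage:(id)sender {
    UIImagePickerController *picker = [[UIImagePickerController alloc] init];
    picker.sourceType =
        [UIImagePickerController isSourceTypeAvailable:UIImagePickerControllerSourceTypeCamera]
            ? UIImagePickerControllerSourceTypeCamera
            : UIImagePickerControllerSourceTypePhotoLibrary;
    picker.delegate = self;
    [self presentViewController:picker animated:YES completion:nil];
}

// Delegate callback: grab the picked UIImage and display it.
- (void)imagePickerController:(UIImagePickerController *)picker
didFinishPickingMediaWithInfo:(NSDictionary *)info {
    UIImage *image = info[UIImagePickerControllerOriginalImage];
    self.imageView.image = image;
    [picker dismissViewControllerAnimated:YES completion:nil];
}
```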

Page 22: Image Processing and Computer Vision in iOS

Developing DIP/CV apps for iOS

•  Topics to study in greater depth (cont'd):
– Learn about Core Image and its main classes:
•  CIFilter: a mutable object that represents an effect. A filter object has at least one input parameter and produces an output image.
•  CIImage: an immutable object that represents an image.
•  CIContext: an object through which Core Image draws the results produced by a filter.
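A minimal sketch of that division of labor, using the built-in CISepiaTone filter (chosen arbitrarily) on a hypothetical source image inputUIImage:

```objc
#import <CoreImage/CoreImage.h>

CIImage *inputImage = [CIImage imageWithCGImage:inputUIImage.CGImage];

// Configure the filter: at least one input parameter, one output image.
CIFilter *sepia = [CIFilter filterWithName:@"CISepiaTone"];
[sepia setValue:inputImage forKey:kCIInputImageKey];
[sepia setValue:@0.8 forKey:kCIInputIntensityKey];

// Nothing is rendered until a CIContext draws the result.
CIContext *context = [CIContext contextWithOptions:nil];
CIImage *output = sepia.outputImage;
CGImageRef cgImage = [context createCGImage:output fromRect:output.extent];
UIImage *result = [UIImage imageWithCGImage:cgImage];
CGImageRelease(cgImage);
```

Note that filter chains are evaluated lazily: no pixels are computed until the CIContext renders, which is part of how Core Image achieves near-real-time performance.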

Page 23: Image Processing and Computer Vision in iOS

Core Image

•  Image processing and analysis technology designed to provide near-real-time processing for still and video images.
•  Hides the details of low-level graphics processing by providing an easy-to-use API.
•  Available on iOS since iOS 5 (October 2011).

Page 24: Image Processing and Computer Vision in iOS

OpenCV

•  OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision.
•  OpenCV is released under a BSD license; it is free for both academic and commercial use.
•  Goal: to provide a simple-to-use computer vision infrastructure that helps people build fairly sophisticated vision applications quickly.
•  The library has 2000+ optimized algorithms.
– It is used around the world, with >2M downloads and >40K people in the user group.

Page 25: Image Processing and Computer Vision in iOS

OpenCV

•  5 main components:
1. CV: basic image processing and higher-level computer vision algorithms
2. ML: machine learning algorithms
3. HighGUI: I/O routines and functions for storing and loading video and images
4. CXCore: basic data structures and content upon which the three components above rely
5. CvAux: defunct areas and experimental algorithms; not well documented

Page 26: Image Processing and Computer Vision in iOS

OpenCV and iOS

Example

Contrast with equivalent functionality using Core Image
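For example, a Canny edge-detection sketch in an Objective-C++ (.mm) file, assuming the opencv2 framework is linked; the location of the ios.h conversion helpers (UIImageToMat / MatToUIImage) varies across OpenCV versions:

```objc
#import <opencv2/opencv.hpp>
#import <opencv2/highgui/ios.h> // UIImageToMat / MatToUIImage helpers

// Converts a UIImage to a cv::Mat, runs Canny edge detection, converts back.
UIImage *cannyEdges(UIImage *input) {
    cv::Mat src, gray, edges;
    UIImageToMat(input, src);                  // RGBA cv::Mat
    cv::cvtColor(src, gray, cv::COLOR_RGBA2GRAY);
    cv::Canny(gray, edges, 50, 150);           // low/high hysteresis thresholds
    return MatToUIImage(edges);
}
```

The Core Image counterpart would chain built-in CIFilter effects instead of calling library functions on pixel buffers directly, which is the contrast this slide suggests exploring.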

Page 27: Image Processing and Computer Vision in iOS

A bit of advice…

•  Go beyond the DIP/CV and iOS boxes
– Learn about ergonomics, human factors, human psychology, HCI, and UX
•  Don't reinvent the wheel!
– Reuse code and ideas whenever possible
•  Avoid the trap of building solutions looking for problems
•  Tackle ONE problem and solve it well!
•  Beware of competition
•  Beware of narrow windows of opportunity and ephemeral success: timing is everything!

Page 28: Image Processing and Computer Vision in iOS

Learn more about it

•  iOS Programming
– Apple online documentation
– Core Image
•  OpenCV
– Official website
– "Learning OpenCV" book
– "Instant OpenCV for iOS" book
•  Our work
– SlideShare (WVC 2011)
– Upcoming book (Springer Briefs, 2014)

Page 29: Image Processing and Computer Vision in iOS

Concluding thoughts

•  Mobile image processing, image search, and computer vision-based apps have a promising future.
•  There is a great need for good solutions to specific problems.

•  I hope this talk has provided a good starting point and many useful pointers.

•  I look forward to working with some of you!

Page 30: Image Processing and Computer Vision in iOS

Let's get to work!

•  Which computer vision or image processing app would you like to build?

•  Contact me with ideas: [email protected]