SIFT Guest Lecture by Jiwon Kim

96
SIFT • Guest Lecture by Jiwon Kim http://www.cs.washington.edu/homes/jwkim/
  • date post

    23-Jan-2016
  • Category

    Documents

  • view

    217
  • download

    0

Transcript of SIFT Guest Lecture by Jiwon Kim

Page 1: SIFT Guest Lecture by Jiwon Kim

SIFT

• Guest Lecture by Jiwon Kim• http://www.cs.washington.edu/homes/jwkim/

Page 2: SIFT Guest Lecture by Jiwon Kim

SIFT Features andIts Applications

Page 3: SIFT Guest Lecture by Jiwon Kim

Autostitch Demo

Page 4: SIFT Guest Lecture by Jiwon Kim

Autostitch

• Fully automatic panorama generation– Input: set of images– Output: panorama(s)

• Uses SIFT (Scale-Invariant Feature Transform) to find/align images

Page 5: SIFT Guest Lecture by Jiwon Kim

1. Solve for homography

Page 6: SIFT Guest Lecture by Jiwon Kim

1. Solve for homography

Page 7: SIFT Guest Lecture by Jiwon Kim

1. Solve for homography

Page 8: SIFT Guest Lecture by Jiwon Kim

2. Find connected sets of images

Page 9: SIFT Guest Lecture by Jiwon Kim

2. Find connected sets of images

Page 10: SIFT Guest Lecture by Jiwon Kim

2. Find connected sets of images

Page 11: SIFT Guest Lecture by Jiwon Kim

3. Solve for camera parameters

• New images initialised with rotation, focal length of best matching image

Page 12: SIFT Guest Lecture by Jiwon Kim

3. Solve for camera parameters

• New images initialised with rotation, focal length of best matching image

Page 13: SIFT Guest Lecture by Jiwon Kim

4. Blending the panorama

• Burt & Adelson 1983– Blend frequency bands over range

Page 14: SIFT Guest Lecture by Jiwon Kim

Low frequency ( > 2 pixels)

High frequency ( < 2 pixels)

2-band Blending

Page 15: SIFT Guest Lecture by Jiwon Kim

Linear Blending

Page 16: SIFT Guest Lecture by Jiwon Kim

2-band Blending

Page 17: SIFT Guest Lecture by Jiwon Kim

So, what is SIFT?

• Scale-Invariant Feature Transform• David Lowe at UBC• Scale/rotation invariant• Currently best known feature descriptor• Many real-world applications

– Object recognition– Panorama stitching– Robot localization– Video indexing– …

Page 18: SIFT Guest Lecture by Jiwon Kim

Example: object recognition

Page 19: SIFT Guest Lecture by Jiwon Kim

SIFT properties

• Locality: features are local, so robust to occlusion and clutter

• Distinctiveness: individual features can be matched to a large database of objects

• Quantity: many features can be generated for even small objects

• Efficiency: close to real-time performance

Page 20: SIFT Guest Lecture by Jiwon Kim

SIFT algorithm overview

1. Feature detection– Detect points that can be repeatably

selected under location/scale change

2. Feature description– Assign orientation to detected feature

points– Construct a descriptor for image patch

around each feature point

3. Feature matching

Page 21: SIFT Guest Lecture by Jiwon Kim

1. Feature detection

• Detect points stable under location/scale change

– Build continuous space (x, y, scale)– Approximated by multi-scale Difference-of-

Gaussian pyramid– Select maxima/minima in (x, y, scale)

Page 22: SIFT Guest Lecture by Jiwon Kim

1. Feature detection

Page 23: SIFT Guest Lecture by Jiwon Kim

1. Feature detection

• Localize extrema by fitting a quadratic

1) Sub-pixel/sub-scale interpolation using Taylor expansion

2) Take derivative and set to zero

Page 24: SIFT Guest Lecture by Jiwon Kim

1. Feature detection• Discard low-contrast/edge points

1) Low contrast: discard keypoints with < threshold

2) Edge points: high contrast in one direction, low in the other compute principal curvatures from eigenvalues of 2x2 Hessian matrix, and limit ratio

)ˆ(xD

Page 25: SIFT Guest Lecture by Jiwon Kim

1. Feature detection• Example

(a) 233x189 image(b) 832 DOG extrema(c) 729 left after peak value threshold(d) 536 left after testing ratio of principle curvatures

Page 26: SIFT Guest Lecture by Jiwon Kim

2. Feature description

– Create histogram of local gradient directions computed at selected scale

– Assign canonical orientation at peak of smoothed histogram

0 2

• Assign orientation to keypoints

Page 27: SIFT Guest Lecture by Jiwon Kim

2. Feature description• Construct SIFT descriptor

– Create array of orientation histograms– 8 orientations x 4x4 histogram array = 128

dimensions

Page 28: SIFT Guest Lecture by Jiwon Kim

2. Feature description• Advantage over simple correlation

– Gradients less sensitive to illumination change

– Gradients may shift: robust to deformation, viewpoint change

Page 29: SIFT Guest Lecture by Jiwon Kim

Performance: stability to noise

• Match features after random change in image scale & orientation, with differing levels of image noise

• Find nearest neighbor in database of 30,000 features

Page 30: SIFT Guest Lecture by Jiwon Kim

Performance:stability to affine change

• Match features after random change in image scale & orientation, with 2% image noise, and affine distortion

• Find nearest neighbor in database of 30,000 features

Page 31: SIFT Guest Lecture by Jiwon Kim

Performance: distinctiveness

• Vary size of database of features, with 30 degree affine change, 2% image noise

• Measure % correct for single nearest neighbor match

Page 32: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• For each feature in A, find nearest neighbor in B

A B

Page 33: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• Nearest neighbor search too slow for large database of 128-dimenional data

• Approximate nearest neighbor search:– Best-bin-first [Beis et al. 97]: modification to k-d

tree algorithm– Use heap data structure to identify bins in

order by their distance from query point

• Result: Can give speedup by factor of 1000 while finding nearest neighbor (of interest) 95% of the time

Page 34: SIFT Guest Lecture by Jiwon Kim

3. Feature matching• Reject false matches

– Compare distance of nearest neighbor to second nearest neighbor

– Common features aren’t distinctive, therefore bad– Threshold of 0.8 provides excellent separation

Page 35: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• Now, given feature matches…– Find an object in the scene– Solve for homography (panorama)– …

Page 36: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• Example: 3D object recognition

Page 37: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• 3D object recognition– Assume affine transform: clusters of size >=3– Looking for 3 matches out of 3000 that agree

on same object and pose: too many outliers for RANSAC or LMS

– Use Hough Transform• Each match votes for a hypothesis for object

ID/pose• Voting for multiple bins & large bin size allow for

error due to similarity approximation

Page 38: SIFT Guest Lecture by Jiwon Kim

3. Feature matching• 3D object recognition: solve for pose

– Affine transform of [x,y] to [u,v]:

– Rewrite to solve for transform parameters:

Page 39: SIFT Guest Lecture by Jiwon Kim

3. Feature matching

• 3D object recognition: verify model1) Discard outliers for pose solution in prev step2) Perform top-down check for additional

features3) Evaluate probability that match is correct

a) Use Bayesian model, with probability that features would arise by chance if object was not present

b) Takes account of object size in image, textured regions, model feature count in database, accuracy of fit [Lowe 01]

Page 40: SIFT Guest Lecture by Jiwon Kim

Planar recognition• Training images

Page 41: SIFT Guest Lecture by Jiwon Kim

Planar recognition

• Reliably recognized at a rotation of 60° away from the camera

• Affine fit approximates perspective projection

• Only 3 points are needed for recognition

Page 42: SIFT Guest Lecture by Jiwon Kim

3D object recognition• Training images

Page 43: SIFT Guest Lecture by Jiwon Kim

3D object recognition

• Only 3 keys are needed for recognition, so extra keys provide robustness

• Affine model is no longer as accurate

Page 44: SIFT Guest Lecture by Jiwon Kim

Recognition under occlusion

Page 45: SIFT Guest Lecture by Jiwon Kim

Illumination invariance

Page 46: SIFT Guest Lecture by Jiwon Kim

Applications of SIFT

• Object recognition• Panoramic image stitching• Robot localization• Video indexing• …

• The Office of the Past– Document tracking and recognition

Page 47: SIFT Guest Lecture by Jiwon Kim

Location recognition

Page 48: SIFT Guest Lecture by Jiwon Kim

Robot Localization

Page 49: SIFT Guest Lecture by Jiwon Kim

Map continuously built over time

Page 50: SIFT Guest Lecture by Jiwon Kim

Locations of map features in 3D

Page 51: SIFT Guest Lecture by Jiwon Kim

Sony Aibo

SIFT usage:

Recognize charging station

Communicate with visual cards

Teach object recognition

Page 52: SIFT Guest Lecture by Jiwon Kim

The Office of the Past• Paper everywhere

Page 53: SIFT Guest Lecture by Jiwon Kim

Unify physical andelectronic desktops

• Recognize video of paper on physical desktop– Tracking– Recognition– Linking

Video camera

Desktop

Page 54: SIFT Guest Lecture by Jiwon Kim

Unify physical andelectronic desktops

• Applications– Find lost documents– Browse remote

desktop– Find electronic

version– History-based

queries

Video camera

Desktop

Page 55: SIFT Guest Lecture by Jiwon Kim

Example input video

Page 56: SIFT Guest Lecture by Jiwon Kim

Demo – Remote desktop

Page 57: SIFT Guest Lecture by Jiwon Kim

System overviewVideo camera

DeskUser

Computer

Page 58: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk

Page 59: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Page 60: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Track & recognize

Page 61: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Track & recognize

T T+1

Desk Desk

Internal representation

Page 62: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Track & recognize

T T+1

Internal representation

Scene Graph

Desk Desk

Page 63: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Track & recognize

T T+1

Internal representation

Where is my W-2?

Desk Desk

Page 64: SIFT Guest Lecture by Jiwon Kim

System overview

Video of desk Images from PDF

Track & recognize

T T+1

Desk Desk

Internal representation

Where is my W-2?

Answer

Page 65: SIFT Guest Lecture by Jiwon Kim

Assumptions

• Document– Corresponding electronic copy exists– No duplicates of same document

Page 66: SIFT Guest Lecture by Jiwon Kim

Assumptions

• Document– Corresponding electronic copy exists– No duplicates of same document

• Motion– 3 event types: move/entry/exit– One document at a time– Only topmost document can move

Page 67: SIFT Guest Lecture by Jiwon Kim

Non-assumptions

• Desk need not be initially empty

Page 68: SIFT Guest Lecture by Jiwon Kim

Non-assumptions

• Desk need not be initially empty• Stacks may overlap

Page 69: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Page 70: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Event Detection

before after

Page 71: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Event Detection

Event Interpretation

“A document moved from (x1,y1) to (x2,y2)”

before after

Page 72: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Event Detection

Event Interpretation

“A document moved from (x1,y1) to (x2,y2)”

Document Recognition

before after

File1.pdf

File2.pdf

File3.pdf

Page 73: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Event Detection

Event Interpretation

“A document moved from (x1,y1) to (x2,y2)”

Document Recognition

before after

File1.pdf

File2.pdf

File3.pdf

Scene Graph Update

Desk Desk

Page 74: SIFT Guest Lecture by Jiwon Kim

Algorithm overviewInput

Frames… …

Event Detection

Event Interpretation

“A document moved from (x1,y1) to (x2,y2)”

Document Recognition

before after

File1.pdf

File2.pdf

File3.pdf

Scene Graph Update

Desk Desk

SIFT

Page 75: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 76: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 77: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 78: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 79: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 80: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 81: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 82: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 83: SIFT Guest Lecture by Jiwon Kim

Document tracking example

before after

Page 84: SIFT Guest Lecture by Jiwon Kim

Document tracking example

Motion: (x,y,θ)

before after

Page 85: SIFT Guest Lecture by Jiwon Kim

Document Recognition

File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf

• Match against PDF image database

Page 86: SIFT Guest Lecture by Jiwon Kim

Document Recognition• Performance analysis

– Tested 20 pages against database of 162 pages

Page 87: SIFT Guest Lecture by Jiwon Kim

Document Recognition• Performance analysis

– Tested 20 pages against database of 162 pages

– ~200x300 pixels per document for reliable match

Document Resolution

Recognition Rate

Page 88: SIFT Guest Lecture by Jiwon Kim

Document Recognition• Performance analysis

– Tested 20 pages against database of 162 pages

– ~200x300 pixels per document for reliable match

Document Resolution

Recognition Rate

300

0.9

Page 89: SIFT Guest Lecture by Jiwon Kim

Results

• Input video– ~40 minutes– 1024x768 @ 15 fps– 22 documents, 49 events

• Running time– Video processed offline– No optimization– A few hours for entire video

Page 90: SIFT Guest Lecture by Jiwon Kim

Demo – Paper tracking

Page 91: SIFT Guest Lecture by Jiwon Kim

Photo sorting example

Page 92: SIFT Guest Lecture by Jiwon Kim

Photo sorting example

Page 93: SIFT Guest Lecture by Jiwon Kim

Demo – Photo sorting

Page 94: SIFT Guest Lecture by Jiwon Kim

Future work

• Enhance realism– Handle more realistic desktops– Real-time performance

• More applications– Support other document tasks

• E.g., attach reminder, cluster documents

– Beyond documents• Other 3D desktop objects, books/CD’s

Page 95: SIFT Guest Lecture by Jiwon Kim

Summary

• SIFT is:– Scale/rotation invariant local feature– Highly distinctive– Robust to occlusion, illumination

change, 3D viewpoint change– Efficient (real-time performance)– Suitable for many useful applications

Page 96: SIFT Guest Lecture by Jiwon Kim

References

• Distinctive image features from scale-invariant keypoints – David G. Lowe, International Journal of Computer Vision,

60, 2 (2004), pp. 91-110• Recognising panoramas

– Matthew Brown and David G. Lowe, International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003), pp. 1218-25.

• Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops – Jiwon Kim, Steven M. Seitz and Maneesh Agrawala, ACM

Symposium on User Interface Software and Technology (UIST 2004), pp. 99-107.