Computer Vision T-61.5070 (5 cr) P

…users.ics.aalto.fi/jorma/cv-slides.pdf
Computer Vision T-61.5070 (5 cr) P Spring 2015 Lectures: Jorma Laaksonen Exercises: Rao Muhammad Anwer Slide set draft March 30, 2015


Computer Vision T-61.5070 (5 cr) P

Spring 2015

Lectures: Jorma Laaksonen

Exercises: Rao Muhammad Anwer

Slide set draft March 30, 2015


LECTURE #1, 12.1.2015
1. GENERAL INFORMATION
1.1 Passing the course
1.2 Enrollment
1.3 Notices
1.4 Lectures
1.5 Exercises
1.6 Exceptions in lecture and exercise times
1.7 Book
1.8 Additional material
1.9 Exams
1.10 Obligatory course assignment
1.11 Feedback from the course
1.12 Become a summer trainee at ICS Department?
2. Course learning goals
3. Introduction
3.1 What does computer vision stand for? (1)
3.2 What is computer vision needed for?
3.3 Why is computer vision difficult? (1.2/)

3.4 What are the essential parts of a CV system?
3.5 Image representation and analysis (1.3/)
3.6 Some useful vocabulary
4. Digital image
4.1 Basic properties and definitions (2.1)
4.2 Digitization of images (2.2)
4.3 Metric properties of a digital image (2.3.1)
4.4 Noise in images (2.3.6/2.3.5)
5. Mathematical tools and notations
5.1 Dirac distribution and convolution (3.1.2/2.1.2)
5.2 Image as a linear system (3.2.1/2.1.5)
5.3 2-dimensional Fourier transform (3.2.4/2.1.3)
5.4 Convolution theorem (3.2.4/2.1.3)
5.5 Image as a stochastic process (3.3/2.1.4)
6. 3D vision
6.1 Difficulties of 3D vision (11/9)
LECTURE #2, 19.1.2015
6.2 Strategies of 3D vision (11.1/9.1)
6.3 Marr’s theory (11.1.1/9.1.1)

6.4 Active versus passive computer vision (11.1.2/9.1.2)
6.5 3D projection geometry (11.2.1/9.2.1)
6.6 Geometry of single perspective camera (11.3.1/9.2.2)
6.7 Two cameras (stereopsis) (11.5.1/9.2.5)
6.8 Shape from stereo vision (11.5.5/9.2.5)
6.9 Point correspondence in stereo vision (11.6.1/9.2.11)
6.10 Active acquisition of range images (11.6.2/9.2.12)
6.11 Radiometry in 3D vision (11.7.1,3.4.5/9.3)
6.12 Shape from X (12/10)
6.13 Shape from motion (12.1.1/10.1.1)
LECTURE #3, 26.1.2015
6.14 Shape from texture (12.1.2/10.1.2)
6.15 Models of 3D world (12.2/10.2)
6.16 Line labeling algorithm (12.2.2/10.2.2)
6.17 More models of 3D world (12.2.4,5/10.2.4,5)
6.18 On recognition of 3D objects (12.3/10.3)
6.19 Goad’s algorithm (12.3.2/10.3.2)
6.20 Model-based 3D recognition from intensity images (12.3.3/10.3.3)

6.21 2D view-based representations for 3D (12.4/10.4)
7. Data structures
7.1 Introduction (4.1/3.1)
7.2 Traditional data structures (4.2/3.2)
7.3 Hierarchic data structures (4.3/3.3)
7.4 Co-occurrence matrix (4.2.1/3.2.1)
LECTURE #4, 2.2.2015
7.5 Integral image matrix (4.2.1/3.2.1)
7.6 Chain structures (4.2.2/3.2.2)
7.7 Topological data structures (4.2.3/3.2.3)
7.8 Relational database structures (4.2.4/3.2.4)
7.9 Hierarchical data structures (4.3/3.3)
8. Pre-processing
8.1 Brightness value changes in single pixels (5.1/4.1)
8.2 Geometric co-ordinate transformations (5.2.1/4.2.1)
8.3 Brightness interpolation (5.2.2/4.2.2)
8.4 Local pre-processing (5.3.1/4.3.1)
8.5 Additional constraints for local averaging (?/4.3.1)
8.6 Local neighborhood in edge detection (5.3.2/4.3.2)

8.7 Edge detection by derivative approximation (5.3.2/4.3.2)
8.8 Marr-Hildreth edge detector (5.3.3/4.3.3)
LECTURE #5, 9.2.2015
8.9 Scale-space methods (5.3.4/4.3.4)
8.10 Canny edge detector (5.3.5/4.3.5)
8.11 Parametric edge models (5.3.6/4.3.6)
8.12 Edges in multi-channel images (5.3.7/4.3.7)
8.13 Other local neighborhood operations (5.3.9/4.3.8)
8.14 Corner and interest point detection (5.3.10/4.3.8)
8.15 Adaptive local pre-processing (?/4.3.9)
8.16 Frequency domain image restoration (5.3.8,5.4/4.4)
9. Morphology
9.1 Basic notations and operations (13.1/11.1)
9.2 Dilation ⊕ (fill, grow) (13.3.1/11.3.1)
9.3 Erosion (shrink, reduce) (13.3.2/11.3.2)
9.4 Some properties of dilation and erosion
9.5 Opening and closing • (13.3.4/11.3.4)
9.6 Gray-scale dilation and erosion (13.4/11.4)

LECTURE #6, 23.2.2015
9.7 Skeletons and maximal ball
9.8 Hit-or-miss ⊗, thinning, thickening (13.3.3,13.5.3)
9.9 Golay alphabets
9.10 Quench function and ultimate erosion (13.5.4/11.5.4)
9.11 Ultimate erosion and distance functions (11.5.5/13.5.5)
9.12 Geodesic transformations (13.5.6/11.5.6)
9.13 Morphological reconstruction (13.5.7/11.5.7)
9.14 Granulometry (13.6/11.6)
9.15 Morphological segmentation, watersheds (13.7/11.7)
10. Texture
10.1 Properties of natural textures
LECTURE #7, 27.2.2015
10.2 Statistical texture descriptions (15.1.1,14.1.1)
10.3 Co-occurrence matrices (15.1.2,14.1.2)
10.4 Co-occurrence matrices – an example
10.5 Haralick features from co-occurrence matrix
10.6 Edge frequency (15.1.3,14.1.3)

10.7 Run length statistics (15.1.4,14.1.4)
10.8 Laws’ texture energy measures (15.1.5,14.1.5)
10.9 Other statistical methods (15.1.6–8,14.1.6–7)
10.10 Syntactic texture descriptions (15.2.1,14.2.1)
10.11 Graph grammars (15.2.2,14.2.2)
10.12 Primitive grouping and hierarchical textures (15/14.2.3)
10.13 Hybrid texture description methods (15.3,14.3)
10.14 Application areas for texture analysis (15.4,14.4)
LECTURE #8, 2.3.2015
11. Segmentation
11.1 Thresholding methods in segmentation (6.1/5.1)
11.2 Edge-based segmentation (6.2/5.2)
11.3 Border detection as graph searching (6.2.4/5.2.4)

LECTURE #9, 9.3.2015
11.4 Hough transforms (6.2.6/5.2.6)
11.5 Region-based segmentation (6.3/5.3)
LECTURE #10, 16.3.2015
11.6 Segmentation from template matching (6.4/5.4)
12. Shape description
12.1 Methods and stages in image analysis (8/6)
12.2 Region identification from pixel labels (8.1/6.1)
12.3 Boundary-based description (8.2/6.2)
LECTURE #11, 23.3.2015
12.4 Region-based description (8.3/6.3)
13. Object recognition
13.1 Knowledge representation (9.1/7.1)
LECTURE #12, 30.3.2015
13.2 Statistical pattern recognition (9.2/7.2)
13.3 Neural network classifiers (9.3/7.3)
13.4 Syntactic pattern recognition (9.4/7.4)

13.5 Recognition as graph matching (9.5/7.5)
13.6 Optimization techniques (9.6/7.6)
14. Image understanding
14.1 Control strategies (10.1/8.1)
14.2 Active contour models aka snakes (7.2/8.2)
14.3 Point distribution models, PDMs (10.3/8.3)
14.4 Principal component analysis, PCA (3.2.10/8.3)
14.5 Example: metacarpal bones, PCA+PDM (3.2.10/8.3)
14.6 Pattern recognition in image understanding (10.5/8.4)
14.7 Scene labeling and constraint propagation (10.7/8.5)
14.8 Semantic image segmentation (10.8/8.6)
15. Motion analysis
15.1 Differential motion analysis methods (16.1/15.1)
15.2 Optical flow (16.2/15.2)
15.3 Optical flow in motion analysis (16.2.4/15.2.4)
15.4 Correspondence of interest points (16.3/15.3)
EXAM GUIDE

LECTURE #1, 12.1.2015

Learning goals: After this lecture the student should be able to

• understand the practical arrangements of the course

• understand what computer vision means

• recall the basic image acquisition and representation principles

• understand basic spatial properties of image pixels

• recall Fourier transform and convolution theorem

• understand the difficulties of 3D vision


1. GENERAL INFORMATION

1.1 Passing the course

The course can be passed by completing the obligatory course assignment and passing an exam.

1.2 Enrollment

Enroll in WebOodi: https://oodi.aalto.fi/w/opintjakstied.jsp?Tunniste=T-61.5070&html=1.

1.3 Notices

Announcements concerning the course are posted on the web at https://noppa.aalto.fi/noppa/kurssi/t-61.5070.

Emailed news notices for the courses one attends can be ordered at https://noppa.aalto.fi/noppa/asetukset/uutiset.


1.4 Lectures

Lectures are given on Mondays at 10–12 o’clock in lecture hall T6 by docent D.Sc. (Tech.) Jorma Laaksonen (mailto:[email protected]), room B304.

Lecture notes are available after the lecture in Noppa at https://noppa.aalto.fi/noppa/kurssi/t-61.5070/luennot.

Before the lecture one can read the lecture notes of spring 2014 in Noppa at https://noppa.aalto.fi/noppa/kurssi/t-61.5070/materiaali.

1.5 Exercises

Exercises are held on Fridays at 12–14 o’clock in lecture hall T5, starting 23.1.2015, by Ph.D. Rao Muhammad Anwer (mailto:[email protected]), room A321.

Exercise papers are available prior to the exercise in Noppa at https://noppa.aalto.fi/noppa/kurssi/t-61.5070/viikkoharjoitukset.


1.6 Exceptions in lecture and exercise times

There will be some exceptions in lecture and exercise times:

• Friday 16.1. no exercise

• Friday 27.2. lecture instead of exercise


1.7 Book

Milan Sonka, Vaclav Hlavac and Roger Boyle: Image Processing, Analysis and Machine Vision. Two editions are available and can be used, either:

4th Edition, Thomson, 2015, ISBN 978-1-133-59369-0 (international edition)

or:

3rd Edition, Thomson, 2008, ISBN 978-0-495-24428-7 (international student edition)

A photocopied sample copy of the book is available for short loans in a gray drawer in secretary Tarja Pihamaa’s room B326.

In the 4th/3rd ed. book, chapter 14 is skipped. In the 2nd ed. book, chapters 12 and 13 are skipped.


1.8 Additional material

Lecture notes and exercise papers with answers will be distributed as PDF files for download in Noppa at https://noppa.aalto.fi/noppa/kurssi/t-61.5070/materiaali.

1.9 Exams

There will be at least three exams: on Tuesday 7 April 2015, one in the autumn and one in January 2016.

Use WebOodi to register for the exam!

In the exam, there will be five tasks, each worth 6 points, so the maximum will be 30 points. 11 points will suffice for passing the course. One of the tasks is a long textual question, one is based on some exercise tasks and one consists of six short questions.


1.10 Obligatory course assignment

An obligatory course assignment has to be completed and accepted by the course assistant for passing the course. The assignment will be graded as accepted/rejected. Further instructions concerning the practices will be given by the assistant. Monday 13 April 2015 is the deadline for submitting the assignment.

One cannot participate in the exams after April 2015 unless the obligatory course assignment has been passed.

Further instructions will be visible in February at https://noppa.aalto.fi/noppa/kurssi/t-61.5070/harjoitustyot.

In all questions related to the exercise work, please contact the course assistant (mailto:[email protected]).

1.11 Feedback from the course

After the lectures have ended, one can give feedback on the course.


1.12 Become a summer trainee at ICS Department?

The Department of Information and Computer Science at Aalto University is recruiting summer interns to participate in world-class research.

Prerequisites: successful studies in information and computer science, mathematics, or bioinformatics, and interest in scientific research work.

The call for applications is open until 26 January 2015.

More information: http://dept.ics.aalto.fi/calls/summer2015/.


2. Course learning goals

After the course the students should:

• understand 2D image formation from a 3D scene

• know the basic representation forms of 2D and 3D visual data

• understand the fundamentals of texture analysis

• be familiar with common image segmentation methods

• know fundamental image content description, analysis and classification techniques

• have advanced understanding of digital edge detection, morphology and non-linear filtering


3. Introduction

3.1 What does computer vision stand for? (1)

• qualitative / quantitative explanation of images

• structural / statistical recognition of objects

3.2 What is computer vision needed for?

• quality control in manufacturing

• medical diagnostics

• robot control

• surveillance cameras

• analysis of remote sensing (satellite) imagery

• intelligence/espionage applications

• image databases

• optical character recognition

• biometrics


3.3 Why is computer vision difficult? (1.2/)

• loss of information in 3D → 2D projection

• interpretation of data by a model is problematic

• noise is inherently present in measurements

• there is way too much data

• measured brightness is only weakly related to the world’s properties

• most methods rely on local analysis of a global view


3.4 What are the essential parts of a CV system?

• low-level image processing

– noise reduction

– sharpening

– edge detection

– scale, rotation and location normalization

– compression

– feature extraction

• segmentation

• high-level “understanding”

– model fitting

– hypothesis testing

– classification

– feedback to preprocessing


3.5 Image representation and analysis (1.3/)

Many different intermediate image content representations can be used.

[Figure: levels of image representation; labels: Objects, Scale, Scene, 2D image, Digital image, Image with features (Edgels, Interest points, Texture, Regions)]


3.6 Some useful vocabulary

• heuristic / heuristics = badly justified, but useful

• a priori information = something known, e.g. by an expert

• syntactic = structure described with symbols and rules

• semantic = content or meaning described or explained

• top down = starting from the whole, moving towards details

• bottom up = starting from details, moving towards the whole


4. Digital image

4.1 Basic properties and definitions (2.1)

• continuous / discrete / digital image

• intensity / depth image

• monochromatic / multispectral image

• photometry: intensity, brightness, gray levels

• colorimetry: analysis of color (wavelength) information

• resolution: spatial / spectral / radiometric / temporal


4.2 Digitization of images (2.2)

• sampling

• resolution

• 2D sampling interval ∆x,∆y

• sampling points, sampling grid

• band-limited spectrum

• Shannon’s sampling theorem

• quantization


4.3 Metric properties of a digital image (2.3.1)

• distance $D(p, q)$ is a metric iff:

1) $D(p, q) = 0 \Leftrightarrow p = q$ (identity)

2) $D(p, q) > 0 \Leftrightarrow p \neq q$ (non-negativity)

3) $D(p, q) = D(q, p)$ (symmetry)

4) $D(p, q) \leq D(p, r) + D(r, q) \;\; \forall r$ (triangular inequality)

• distances $D(p, q)$ between points $p = (i, j)$ and $q = (h, k)$:

$$D_E((i, j), (h, k)) = \sqrt{(i - h)^2 + (j - k)^2}$$

$$D_4((i, j), (h, k)) = |i - h| + |j - k|$$

$$D_8((i, j), (h, k)) = \max\{|i - h|, |j - k|\}$$

$$D_{QE}((i, j), (h, k)) = \max\{|i - h|, |j - k|\} + (\sqrt{2} - 1) \min\{|i - h|, |j - k|\}$$
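The four distances above can be compared numerically with a short sketch. The function names (`d_euclid`, `d4`, `d8`, `d_qe`) are illustrative choices, not names from the course material; pixels are assumed to be given as integer (row, column) pairs.

```python
import math

def d_euclid(p, q):
    """Euclidean distance D_E between pixels p = (i, j) and q = (h, k)."""
    (i, j), (h, k) = p, q
    return math.hypot(i - h, j - k)

def d4(p, q):
    """City-block distance D_4: only horizontal and vertical steps."""
    (i, j), (h, k) = p, q
    return abs(i - h) + abs(j - k)

def d8(p, q):
    """Chessboard distance D_8: diagonal steps cost the same as axis steps."""
    (i, j), (h, k) = p, q
    return max(abs(i - h), abs(j - k))

def d_qe(p, q):
    """Quasi-Euclidean distance D_QE: diagonal steps cost sqrt(2)."""
    (i, j), (h, k) = p, q
    di, dj = abs(i - h), abs(j - k)
    return max(di, dj) + (math.sqrt(2) - 1) * min(di, dj)

if __name__ == "__main__":
    p, q = (0, 0), (3, 4)
    print(d_euclid(p, q), d4(p, q), d8(p, q), d_qe(p, q))
```

For any pair of pixels, D_8 ≤ D_E ≤ D_QE ≤ D_4, which is one way to sanity-check an implementation.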


Distance transform aka chamfering algorithm

1) pixel $p$ in the object: $F(p) := 0$, otherwise $F(p) := \infty$

2) scanning top to bottom, left to right, causal 4- or 8-neighborhood AL:

$$F(p) := \min_{q \in AL} \big( F(p),\; D(p, q) + F(q) \big)$$

3) scanning bottom to top, right to left, causal 4- or 8-neighborhood BR:

$$F(p) := \min_{q \in BR} \big( F(p),\; D(p, q) + F(q) \big)$$

AL and BR:

AL AL BR
AL p  BR
AL BR BR

D8():

1 1 1
1 p 1
1 1 1

D4():

2 1 2
1 p 1
2 1 2

Example: binary image

0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 1 1 0 0 0 1 0
0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0

and its distance transform

5 4 4 3 2 1 0 1
4 3 3 2 1 0 1 2
3 2 2 2 1 0 1 2
2 1 1 2 1 0 1 2
1 0 0 1 2 1 0 1
1 0 1 2 3 2 1 0
1 0 1 2 3 3 2 1
1 0 1 2 3 4 3 2
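The two scanning passes can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `chamfer_distance_transform` is hypothetical, and the half-masks are an assumption that uses the usual raster-scan split (neighbors already visited in the current pass), which may differ in detail from the AL/BR figure. With the D4 local distances (1 for axis neighbors, 2 for diagonal ones) it reproduces the distance values of the 8×8 example.

```python
INF = float("inf")

def chamfer_distance_transform(image, metric="D4"):
    """Two-pass distance transform of a binary image (1 = object pixel)."""
    rows, cols = len(image), len(image[0])
    diag = 2 if metric == "D4" else 1  # local cost of a diagonal step
    # Causal half-masks as (row offset, column offset, local distance).
    forward = [(-1, -1, diag), (-1, 0, 1), (-1, 1, diag), (0, -1, 1)]
    backward = [(1, 1, diag), (1, 0, 1), (1, -1, diag), (0, 1, 1)]

    # Step 1: F(p) = 0 inside the object, infinity elsewhere.
    F = [[0 if image[r][c] else INF for c in range(cols)] for r in range(rows)]

    def sweep(row_order, col_order, mask):
        # F(p) := min(F(p), D(p, q) + F(q)) over the neighbors q in the mask.
        for r in row_order:
            for c in col_order:
                for dr, dc, cost in mask:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        F[r][c] = min(F[r][c], F[rr][cc] + cost)

    # Step 2: top to bottom, left to right.
    sweep(range(rows), range(cols), forward)
    # Step 3: bottom to top, right to left.
    sweep(range(rows - 1, -1, -1), range(cols - 1, -1, -1), backward)
    return F
```

Passing `metric="D8"` switches the diagonal cost to 1 and yields the chessboard distance instead.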


Adjacency of pixels

• 4- or 8-neighbors of pixels

• segmentation into regions on basis of adjacency

• path between pixels: simple/non-simple/closed

• contiguous pixels have a path between them

• being contiguous: reflexive, symmetric and transitive

• simply contiguous = no holes, multiply contiguous = has holes

• connectivity paradoxes
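One connectivity paradox can be made concrete by counting contiguous regions under both neighborhood definitions. The sketch below is illustrative only; `count_components` and the 3×3 test image are assumptions made for this example. A diagonal line of object pixels is a single region under 8-adjacency, yet the background on both sides of it is also a single 8-connected region, so the "line" does not separate the background.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(image, value, neighbors):
    """Count contiguous regions of pixels equal to `value` (breadth-first search)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

if __name__ == "__main__":
    for name, nb in (("4-adjacency", N4), ("8-adjacency", N8)):
        print(name,
              "object regions:", count_components(diagonal, 1, nb),
              "background regions:", count_components(diagonal, 0, nb))
```

A common way around the paradox is to use 8-adjacency for the object and 4-adjacency for the background (or vice versa).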


Segmentation, borders/boundaries and edges

• segmentation: region / object / background / holes

• border/boundary is related to binary images

• edges are local properties of grayscale images: strength and direction

• crack edge: interpixel difference between 4-neighbor pixels

Topological properties

• rubber sheet and rubber band operations and invariances

• convex hull and its deficits: lakes and bays

Histograms (2.3.2)


4.4 Noise in images (2.3.6/2.3.5)

• white / Gaussian

• additive: $f(x, y) = g(x, y) + \nu(x, y)$

• multiplicative: $f = g + \nu g = g(1 + \nu) \approx g\nu$

• quantization noise

• impulse noise = salt and pepper noise

• structural noise: clutter, speckle
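Two of the noise models in the list can be simulated with the standard library alone. The sketch below is an illustration under stated assumptions: images are nested lists of gray levels, the function names are hypothetical, and the impulse values 0 and 255 assume an 8-bit gray scale.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Additive noise: f(x, y) = g(x, y) + nu(x, y), with nu ~ N(0, sigma^2)."""
    return [[g + rng.gauss(0.0, sigma) for g in row] for row in image]

def add_salt_and_pepper(image, prob, rng, low=0, high=255):
    """Impulse noise: each pixel is replaced by `low` or `high` with probability `prob`."""
    noisy = []
    for row in image:
        new_row = []
        for g in row:
            if rng.random() < prob:
                new_row.append(rng.choice((low, high)))
            else:
                new_row.append(g)
        noisy.append(new_row)
    return noisy

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    image = [[100] * 5 for _ in range(3)]
    print(add_gaussian_noise(image, 5.0, rng))
    print(add_salt_and_pepper(image, 0.2, rng))
```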


5. Mathematical tools and notations

5.1 Dirac distribution and convolution (3.1.2/2.1.2)

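• 2-dimensional Dirac distribution $\delta(x, y)$:

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(x, y) \, dx \, dy = 1, \qquad \delta(x, y) = 0 \;\; \forall (x, y) \neq (0, 0)$$

• 2-dimensional convolution $f * h$:

$$\begin{aligned}
g(x, y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b) \, h(x - a, y - b) \, da \, db \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x - a, y - b) \, h(a, b) \, da \, db \\
&= (f * h)(x, y) = (h * f)(x, y)
\end{aligned}$$

• 2-dimensional sampling:

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b) \, \delta(a - x, b - y) \, da \, db = f(x, y)$$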
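For digital images the integrals become finite sums. The following sketch of a discrete "full" 2D convolution is an illustration only (the function name `convolve2d` and the nested-list image format are assumptions); it can be used to check commutativity and the discrete counterpart of the sampling property, where convolving with a unit impulse returns the image unchanged.

```python
def convolve2d(f, h):
    """Full discrete 2D convolution g = f * h of two nested lists."""
    fr, fc = len(f), len(f[0])
    hr, hc = len(h), len(h[0])
    g = [[0] * (fc + hc - 1) for _ in range(fr + hr - 1)]
    for a in range(fr):
        for b in range(fc):
            for s in range(hr):
                for t in range(hc):
                    # f(a, b) h(x - a, y - b) with x = a + s and y = b + t
                    g[a + s][b + t] += f[a][b] * h[s][t]
    return g

if __name__ == "__main__":
    f = [[1, 2], [3, 4]]
    h = [[0, 1], [1, 0]]
    delta = [[1]]  # discrete unit impulse
    print(convolve2d(f, h))
    print(convolve2d(f, h) == convolve2d(h, f))  # commutativity
    print(convolve2d(f, delta) == f)             # impulse leaves f unchanged
```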


5.2 Image as a linear system (3.2.1/2.1.5)

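• linear operator $\mathcal{L}$:

$$\mathcal{L}\{a f_1 + b f_2\} = a \mathcal{L}\{f_1\} + b \mathcal{L}\{f_2\}$$

• image representation by a point spread function:

$$\begin{aligned}
g(x, y) &= \mathcal{L}\{f(x, y)\} \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b) \, \mathcal{L}\{\delta(x - a, y - b)\} \, da \, db \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b) \, h(x - a, y - b) \, da \, db \\
&= (f * h)(x, y)
\end{aligned}$$

$$G(u, v) = F(u, v) H(u, v)$$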


5.3 2-dimensional Fourier transform (3.2.4/2.1.3)

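• forward and backward (inverse) transforms:

$$\mathcal{F}\{f(x, y)\} = F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, e^{-2\pi i (xu + yv)} \, dx \, dy$$

$$\mathcal{F}^{-1}\{F(u, v)\} = f(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u, v) \, e^{2\pi i (xu + yv)} \, du \, dv$$

• linearity of the Fourier transform:

$$\mathcal{F}\{a f_1(x, y) + b f_2(x, y)\} = a F_1(u, v) + b F_2(u, v)$$

• translation of the origin:

$$\mathcal{F}\{f(x - a, y - b)\} = F(u, v) \, e^{-2\pi i (au + bv)}$$

$$\mathcal{F}\{f(x, y) \, e^{2\pi i (u_0 x + v_0 y)}\} = F(u - u_0, v - v_0)$$

• symmetry, if $f(x, y) \in \mathbb{R}$:

$$F(-u, -v) = F^*(u, v) = \mathrm{Real}\{F(u, v)\} - i \, \mathrm{Imag}\{F(u, v)\}$$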
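As a worked step, the first translation property follows directly from the definition of the forward transform by the change of variables $x' = x - a$, $y' = y - b$ (a sketch of the standard argument, assuming the integrals exist):

```latex
\begin{aligned}
\mathcal{F}\{f(x - a,\, y - b)\}
  &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
     f(x - a,\, y - b)\, e^{-2\pi i (xu + yv)}\, dx\, dy \\
  &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
     f(x', y')\, e^{-2\pi i ((x' + a)u + (y' + b)v)}\, dx'\, dy' \\
  &= e^{-2\pi i (au + bv)}
     \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
     f(x', y')\, e^{-2\pi i (x'u + y'v)}\, dx'\, dy' \\
  &= F(u, v)\, e^{-2\pi i (au + bv)}
\end{aligned}
```

A shift in the image domain therefore changes only the phase of the spectrum; the magnitude $|F(u, v)|$ is unaffected.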


5.4 Convolution theorem (3.2.4/2.1.3)

• duality of the convolution:

$$\mathcal{F}\{(f * h)(x, y)\} = F(u, v) H(u, v)$$

$$\mathcal{F}\{f(x, y) \, h(x, y)\} = (F * H)(u, v)$$

• is equivalent to

if $g(x, y) = f(x, y) * h(x, y)$

then $G(u, v) = F(u, v) H(u, v)$

• and

if $g(x, y) = f(x, y) \, h(x, y)$

then $G(u, v) = F(u, v) * H(u, v)$
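The first half of the theorem can be checked numerically in its discrete form. The sketch below is an assumption-laden illustration: it uses a naive 2D discrete Fourier transform and circular (periodic) convolution, which is the discrete setting in which the product rule holds exactly; `dft2` and `circular_convolve2d` are hypothetical helper names.

```python
import cmath

def dft2(f):
    """Naive 2D discrete Fourier transform of a nested list (O(N^4), small inputs only)."""
    rows, cols = len(f), len(f[0])
    F = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2 * cmath.pi * (x * u / rows + y * v / cols)
                    total += f[x][y] * cmath.exp(1j * angle)
            F[u][v] = total
    return F

def circular_convolve2d(f, h):
    """Circular 2D convolution of two equally sized nested lists."""
    rows, cols = len(f), len(f[0])
    return [[sum(f[a][b] * h[(x - a) % rows][(y - b) % cols]
                 for a in range(rows) for b in range(cols))
             for y in range(cols)]
            for x in range(rows)]

def max_theorem_error(f, h):
    """Largest deviation between DFT(f * h) and DFT(f) DFT(h) over all (u, v)."""
    G = dft2(circular_convolve2d(f, h))
    F, H = dft2(f), dft2(h)
    return max(abs(G[u][v] - F[u][v] * H[u][v])
               for u in range(len(f)) for v in range(len(f[0])))

if __name__ == "__main__":
    f = [[1, 2, 0], [4, 5, 6]]
    h = [[0, 1, 0], [2, 0, 1]]
    print(max_theorem_error(f, h))  # should be at rounding-error level
```

In practice a fast Fourier transform would replace the naive `dft2`; the quadruple loop is kept here only because it mirrors the definition.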


5.5 Image as a stochastic process (3.3/2.1.4)

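• entropy $H = -\sum_{k=1}^{n} p(a_k) \log_2 p(a_k)$

• average value $\mu_f(x, y, \omega_i) = E\{f(x, y, \omega_i)\} = \int_{-\infty}^{\infty} z \, p_1(z; x, y, \omega_i) \, dz$

• stationarity $\Rightarrow \mu_f(x, y, \omega_i) = \mu_f(\omega_i)$

• crosscorrelation

$$R_{fg}(a, b, \omega_i) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x + a, y + b, \omega_i) \, g(x, y, \omega_i) \, dx \, dy$$

$$\mathcal{F}\{R_{fg}(a, b, \omega_i)\} = F^*(u, v) \, G(u, v) \quad \text{(stat.)}$$

• autocorrelation

$$R_{ff}(a, b, \omega_i) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x + a, y + b, \omega_i) \, f(x, y, \omega_i) \, dx \, dy$$

$$\mathcal{F}\{R_{ff}(a, b, \omega_i)\} = F^*(u, v) \, F(u, v) = |F(u, v)|^2 \quad \text{(stat.)}$$

• $f(x, y)$’s power spectrum = spectral density: $S_{ff}(u, v) = \mathcal{F}\{R_{ff}(a, b)\}$

• ergodicity $\Leftrightarrow \mu_f(x, y, \omega_i) = f(x, y)$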
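The entropy formula at the top of this list can be evaluated for a digital image by estimating the probabilities $p(a_k)$ from the gray-level histogram. A minimal sketch, assuming the image is given as a flat list of gray levels (`entropy_bits` is an illustrative name):

```python
import math
from collections import Counter

def entropy_bits(pixels):
    """Entropy H = -sum_k p(a_k) log2 p(a_k), with p(a_k) taken from the histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    print(entropy_bits([0, 0, 255, 255]))  # two equally likely levels
    print(entropy_bits([7] * 10))          # constant image
    print(entropy_bits([0, 1, 2, 3]))      # four equally likely levels
```

A constant image carries no information (H = 0), while n equally likely gray levels give the maximum log2(n) bits.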


6. 3D vision

6.1 Difficulties of 3D vision (11/9)

• the camera projects 3D to 2D and unique inversion doesn’t exist

• complicated correspondence between measured intensities and the scene

• objects occlude themselves and each other

• noise and time complexity of algorithms

General questions (11.1/9.1)

• a priori knowledge about the image characteristics being searched for

• selection of the form of presentation, its influence on interpretations

• image interpretation: mapping from internal structures to the world


LECTURE #2, 19.1.2015

Learning goals: After this lecture the student should be able to

• understand the difficulties of 3D vision

• explain Marr’s theory on bottom-up vision systems

• understand the calculation of 3D-to-2D projection

• explain stereopsis and shape from stereo vision

• explain shape from X with radiometry


6.2 Strategies of 3D vision (11.1/9.1)

• bottom-up reconstruction

– the most general solution for any problem

– biological motivation

– Marr, 1982

• top-down recognition

– model-based vision

– a special solution for a specific problem

– engineering point of view

• 2D substitutes

– geon-based 2D approaches with qualitative features

– alignment of 2D views


6.3 Marr’s theory (11.1.1/9.1.1)

The three levels of information processing

• computational theory: logic or strategy for performing a task

• representation and algorithm: details on data and its processing

• implementation: programs and hardware

Stages of a bottom-up vision system according to Marr

• 2D image: input data

• primal sketch: detection of significant intensity changes (edges)

• 2.5D sketch: reconstruction of a depth map

• 3D representation: movement to object-centered description

– the last stage matches a top-down step

– a priori knowledge can be used for regularization


6.4 Active versus passive computer vision (11.1.2/9.1.2)

• classical computer vision viewpoint: static, passive cameras

• robot systems can make use of active perception

• a system can actively acquire information it needs

• many ill-posed vision tasks become well-posed

Other dichotomies

• qualitative versus quantitative vision

• purposive vision versus precise description techniques


6.5 3D projection geometry (11.2.1/9.2.1)

[Figure: perspective projection, with labels: image plane, horizon, optical axis / gaze direction, base plane, vanishing point (epipole), focal point]

• 3D world is mapped to a 2D plane

• in perspective projection, parallel lines in the scene meet at a vanishing point (epipole) in the image


6.6 Geometry of single perspective camera (11.3.1/9.2.2)

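[Figure: geometry of a single perspective camera, with labels: world coordinates (X, Y, Z), camera coordinates (Xc, Yc, Zc), image coordinates (u, v, w), rotation R and shift t, focal point, focal length f, optical axis, optical ray, scene point, projected point, image plane, principal point labelled [0, 0, -f] and [u0, v0, 0]]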

• all points along an optical ray project to the same point

• single perspective camera system includes four coordinate systems:

– world coordinates $\mathbf{X} = (X, Y, Z[, 1])^T$

– camera coordinates $\mathbf{X}_c = (X_c, Y_c, Z_c[, 1])^T$

– image Euclidean coordinates $\mathbf{u}_i = (u_i, v_i, w_i)^T$

– image affine coordinates $\mathbf{u} = (u, v, w)^T$


Transformations between coordinate systems

A scene point X is transformed to camera coordinates with the extrinsic camera calibration parameters shift t and rotation R in the 3-dimensional non-homogeneous case:

$$\mathbf{X}_c = R(\mathbf{X} - \mathbf{t})$$

The same can be written in homogeneous form where Xc and X are 4-dimensional by augmentation of “1”.

$$\mathbf{X}_c = \begin{bmatrix} R & -R\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{X}$$


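The camera coordinate point Xc is projected to the image plane in Euclidean coordinates by the non-homogeneous equations:

$$u_i = \frac{X_c f}{Z_c}, \qquad v_i = \frac{Y_c f}{Z_c}$$

In homogeneous coordinates this becomes:

$$\mathbf{u}_i \simeq \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \mathbf{X}_c$$

(The homography or collineation symbol $\simeq$ means that the equation holds up to unknown scale.)

It will be easier to assume first that f = 1 and introduce its true value later. Then:

$$\mathbf{u}_i \simeq \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \mathbf{X}_c$$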


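The last step from image Euclidean coordinates to image affine coordinates can be expressed with the intrinsic calibration matrix K as:

$$\mathbf{u} \simeq K\mathbf{u}_i = \begin{bmatrix} f & s & -u_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{u}_i$$

If $g \neq f$, then the scaling along the x and y axes will be different. If $s \neq 0$, it determines the shear of the axes in the image plane.

All transformation stages can be concatenated:

$$\mathbf{u} \simeq K \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & -R\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{X} = K\,[R \mid -R\mathbf{t}]\,\mathbf{X} = M\mathbf{X}$$

where M is the projection matrix.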
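The chain from world coordinates to affine image coordinates can be sketched in a few lines of plain Python. The function names and all numeric values below are hypothetical illustrations, not part of the course material; K is filled in with the sign convention used above.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def project(X, R, t, K):
    """Project a 3D world point X to affine image coordinates (u, v).

    Steps: Xc = R (X - t), ideal perspective projection with f = 1,
    then u ~ K ui, and finally division by the homogeneous scale.
    """
    Xc = mat_vec(R, [x - ti for x, ti in zip(X, t)])  # camera coordinates
    ui = Xc                      # [I | 0] drops the homogeneous 1
    u = mat_vec(K, ui)           # homogeneous affine image coordinates
    return u[0] / u[2], u[1] / u[2]

# Hypothetical example values (assumptions for illustration only).
f, g, s, u0, v0 = 2.0, 2.0, 0.0, -1.0, -1.0
K = [[f, s, -u0],
     [0.0, g, -v0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no rotation
t = [0.0, 0.0, -4.0]                                      # shift t

u, v = project([2.0, 1.0, 0.0], R, t, K)
# A second point on the same optical ray projects to the same image point.
u_same, v_same = project([4.0, 2.0, 4.0], R, t, K)
```

The final division removes the unknown scale, which is why every point along one optical ray gives the same (u, v).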


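6.7 Two cameras (stereopsis) (11.5.1/9.2.5)

[Figure: epipolar geometry of two cameras with centres C and C′, with labels: scene point X, image points u and u′, epipoles e and e′, epipolar lines l and l′, baseline, epipolar plane, left image, right image]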

• in a general setting the two cameras can see each other

• the line connecting the cameras is called the baseline

• the cameras and the object point determine the epipolar plane

• the baseline intersects the image planes at the epipoles e and e′, and the epipolar plane intersects them along the epipolar lines l and l′

• in the rectified configuration the cameras have parallel axes


6.8 Shape from stereo vision (11.5.5/9.2.5)

• projection geometry: perspective / orthographic

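[Figure: rectified stereo geometry in the xz plane, with labels: cameras C and C′ on the line z = 0 at distance h on each side of x = 0, local image origins x_l = 0 and x_r = 0, focal length f, scene point X = [x, y, z]^T, image coordinates u and u′]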

• z can be solved from similar right-angled triangles:

$$\frac{u}{f} = -\frac{h + x}{z}, \qquad \frac{u'}{f} = \frac{h - x}{z} \quad\Rightarrow\quad z = \frac{2hf}{u' - u} = \frac{2hf}{d}$$
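As a small numeric illustration of the triangle relation above, the following sketch computes z from a pair of image coordinates. The function name and the sample values are assumptions made for this example only.

```python
def depth_from_disparity(u, u_prime, h, f):
    """Return z = 2hf / d, where d = u' - u is the disparity."""
    d = u_prime - u
    if d == 0:
        raise ValueError("zero disparity: the point is infinitely far away")
    return 2.0 * h * f / d

# Hypothetical set-up: half-baseline h, focal length f, point at (x, z).
h, f = 0.1, 0.05
x, z = 0.2, 2.0
u = -f * (h + x) / z          # u / f = -(h + x) / z
u_prime = f * (h - x) / z     # u' / f = (h - x) / z

z_recovered = depth_from_disparity(u, u_prime, h, f)
```

Since z is inversely proportional to d, a fixed error in the measured disparity costs more depth accuracy for distant points than for near ones.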


6.9 Point correspondence in stereo vision (11.6.1/9.2.11)

The correspondence of points in camera pair views is constrained by:

• epipolar constraint

• uniqueness constraint (almost always)

• symmetry constraint

• photometric (intensity) compatibility constraint

• geometric similarity constraints

• disparity smoothness constraint

• feature compatibility (same discontinuity) constraint

• disparity search range / small disparity values

• disparity gradient limit

• ordering constraint (almost always)


Matching based on correlation

Multiple approaches exist:

• best match for each pixel without any special interest points

• each pixel/block is associated with the best matching pixel/block

• the resulting disparity function may be sparse, can be made denser

• edge detection can be applied prior to correlation

• gradual refinement of the resolution

• projection of a dot pattern on the scene


PMF algorithm

• matching based on candidate pairs of similar visual feature points

• a set of feature points (e.g. edges) is extracted from each image

– SIFT or SURF features?

• epipolar constraint: the y coordinate is the same in matching points

• uniqueness constraint: one-to-one matching

[Figure: matches A and B shown in the left, cyclopean and right images, with horizontal separations x_l, x_c and x_r]


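• cyclopean separation S(A,B) is the distance between A and B in the cyclopean image:

$$S(A,B) = \sqrt{\left(\frac{a_{xl} + a_{xr}}{2} - \frac{b_{xl} + b_{xr}}{2}\right)^2 + (a_y - b_y)^2} = \sqrt{\tfrac{1}{4}(x_l + x_r)^2 + (a_y - b_y)^2}$$

where $x_l = a_{xl} - b_{xl}$ and $x_r = a_{xr} - b_{xr}$ are the horizontal separations of A and B in the left and right images

• the difference D(A,B) in disparity between matches A and B is:

$$D(A,B) = (a_{xl} - a_{xr}) - (b_{xl} - b_{xr}) = x_l - x_r$$

• the disparity gradient Γ(A,B) should be as small as possible:

$$\Gamma(A,B) = \frac{D(A,B)}{S(A,B)} = \frac{x_l - x_r}{\sqrt{\tfrac{1}{4}(x_l + x_r)^2 + (a_y - b_y)^2}}$$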

• small disparity gradient ⇔ coherence between A and B
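The three quantities above translate directly into code. This is a minimal sketch; the function name and the sample coordinates are hypothetical, and the y coordinate of each match is taken to be the same in both images (epipolar constraint).

```python
import math

def disparity_gradient(a_left, a_right, b_left, b_right):
    """Disparity gradient of two matches A and B.

    Each argument is an (x, y) image point of match A or B in the
    left or right image.
    """
    x_l = a_left[0] - b_left[0]      # separation in the left image
    x_r = a_right[0] - b_right[0]    # separation in the right image
    dy = a_left[1] - b_left[1]
    separation = math.sqrt(0.25 * (x_l + x_r) ** 2 + dy ** 2)   # S(A, B)
    if separation == 0:
        raise ValueError("A and B coincide in the cyclopean image")
    difference = x_l - x_r                                       # D(A, B)
    return difference / separation

# Hypothetical matches: A = (10, 5) / (8, 5), B = (6, 5) / (5, 5).
gamma = disparity_gradient((10, 5), (8, 5), (6, 5), (5, 5))
```

Here x_l = 4 and x_r = 3, so S = 3.5, D = 1 and the gradient is 1/3.5.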


6.10 Active acquisition of range images (11.6.2/9.2.12)

• shape from X techniques are generally passive

• range image / depth map can be obtained with explicit methods

• laser light: reflection, delay, phase shift

• laser stripe finders

• laser stripe is projected on the object

• object or stripe is shifted and/or rotated

• radar images, ultrasound images

• Moiré interference patterns give relative distance information


6.11 Radiometry in 3D vision (11.7.1,3.4.5/9.3)

• radiometry, photometry, shape from shading

• humans can perceive distances from intensity changes

• measured intensity depends on surface reflectance and direction

• light source(s) have effect on the measured intensity

• radiometric methods are generally quite unreliable

• reflectance function R(Ω) in spherical angle coordinates Ω

– Lambertian matte

– specular (mirror) surface

• surface distance z(x, y), surface gradient space

– normal vector of the surface: $(p(x, y), q(x, y)) = \left(\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right)$

• reflectance map $R(p(x, y), q(x, y))$

• shade smoothness constraint: $(\nabla p(x, y))^2$ and $(\nabla q(x, y))^2$ small

• good match of f(x, y) and R(p(x, y), q(x, y)) reveals z(x, y)


6.12 Shape from X (12/10)

• shape from stereo

• shape from shading

• shape from motion

• shape from optical flow

• shape from texture

• shape from focus

• shape from de-focus

• shape from vergence

• shape from contour


6.13 Shape from motion (12.1.1/10.1.1)

• human eyes and brain use motion information very efficiently

• Ullman’s experiment with virtual coaxial cylinders

• static background is generally assumed

• objects need to be rigid, ie. their shape must not change

• objects can move and rotate, they have six degrees of freedom

• sequence of images captures motion of the object

• assume we can match N correspondence points (x, y) in all images

• images from different times are equivalent to different projections

• 3 projections × 4 matched points ⇒ 1 interpretation

• maximum likelihood estimator: bundle adjustment

• random sampling with outlier removal: RANSAC


LECTURE #3, 26.1.2015

Learning goals: After this lecture the student should be able to

• explain shape from X with motion

• explain the principle of optical flow

• understand the basics of shape from texture

• describe computer vision models of the 3D world

• explain the Line labeling and Goad’s algorithms


Shape from motion: optical flow

• depending on the lighting, human eye can distinguish 15–20 frames/s

• assume static background, camera moving in z direction with speed v

• all points (x, y) move continuously, optical flow field $\left(\frac{dx}{dt}, \frac{dy}{dt}\right)$

• spherical polar coordinates (r, θ, ϕ)

• depth map r(x, y) and surface directions can be solved

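[Figure: spherical polar coordinates (r, θ, ϕ) of a point (x, y, z)]

$$\frac{dx}{dt} = 0, \qquad \frac{dy}{dt} = 0, \qquad \frac{dz}{dt} = -v$$

$$\frac{dr}{dt} = -v\cos\varphi, \qquad \frac{d\theta}{dt} = 0, \qquad \frac{d\varphi}{dt} = \frac{v(1 - \cos^2\varphi)}{r\sin\varphi} = \frac{v\sin\varphi}{r}$$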
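The last relation can be turned around to solve the range r from a measured angular flow. The sketch below checks this numerically for one simulated point; it assumes that ϕ is measured from the z axis (z = r cos ϕ), which is consistent with dr/dt = −v cos ϕ, and all names and values are illustrative.

```python
import math

def range_from_flow(v, phi, dphi_dt):
    """Solve r from dphi/dt = v sin(phi) / r."""
    return v * math.sin(phi) / dphi_dt

# Simulate one static point seen by a camera moving along z with speed v.
v, dt = 1.0, 1e-6
x, y, z = 1.0, 0.0, 2.0
phi_before = math.atan2(math.hypot(x, y), z)           # angle from the z axis
phi_after = math.atan2(math.hypot(x, y), z - v * dt)   # dz/dt = -v
dphi_dt = (phi_after - phi_before) / dt                 # finite-difference flow

r_estimate = range_from_flow(v, phi_before, dphi_dt)
r_true = math.sqrt(x * x + y * y + z * z)
```

Repeating this for every pixel with a measured flow vector would give the depth map r(x, y) mentioned above, provided the camera speed v is known.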


6.14 Shape from texture (12.1.2/10.1.2)

• human eyes and brain use texture information very efficiently

• texture primitives (texels) are distorted ⇒ texture gradient

• distance, slant = z angle, tilt = rotation in xy

• circles are perceived as ellipses:

– slant: ratio of ellipse axis lengths

– tilt: direction of ellipse axes

• shape from texture ≈ shape from shading


6.15 Models of 3D world (12.2/10.2)

• 3D models of the world have two very different uses:

– reconstruction of the model by the actual object (↑)

– recognition of the actual object by the model (↓)

• model creation can be compared with CAD systems

• wire models vs. surface models vs. volumetric models

• the completeness and uniqueness of presentation

[Figure: mappings between the world and its models in three cases: general case, complete model, unique model]


6.16 Line labeling algorithm (12.2.2/10.2.2)

• Roberts 1965, Clowes 1971, Huffman 1971

• for modelling of (only) blocks world

• each 3D edge is a meeting of two planar faces

• each 3D vertex is a meeting of three planar faces

• each 3D vertex can be seen in four different types of junction

• 22 different 2D to 3D vertex interpretations exist

• all interpretations of all detected 2D edges can be listed

• both ends of an edge need to have same interpretation (convex/concave)

• global coherence of interpretation for all edges and surfaces


6.17 More models of 3D world (12.2.4,5/10.2.4,5)

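• Constructive Solid Geometry (CSG): cuboid, cylinder, sphere, cone and half-space

• volumetric models: voxels or super-quadrics:

$$\left(\left(\frac{x}{a_1}\right)^{2/\varepsilon_h} + \left(\frac{y}{a_2}\right)^{2/\varepsilon_h}\right)^{\varepsilon_h/\varepsilon_v} + \left(\frac{z}{a_3}\right)^{2/\varepsilon_v} = 1$$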

• generalized cylinders

• surface models: surfaces+edges+graph

• surface triangulation, eg. Delaunay triangulation

• surface modelling with quadric model:

$$\sum_{i,j,k \in \{0,1,2\}} a_{ijk}\, x^i y^j z^k = 0$$


6.18 On recognition of 3D objects (12.3/10.3)

• top-down: sensor data matched with existing model

• sensor data is often limited to part of the object

• also matching needs to be based on partial object model

• part of the model is used to formulate a matching hypothesis

• matching can be performed on data or feature level


6.19 Goad’s algorithm (12.3.2/10.3.2)

• recovers coordinates (location and rotation) of a known 3D object

• the object is known as a wire model; edges are detected in the image

• distance to the camera is known (and therefore also the size)

• the object is fully visible in narrow field of view

• the 5 degrees of freedom of the camera are quantized

• matching is done edge by edge

• the camera's location gets more precise on each iteration

• matching choices → branching points

• no choices left → backtracking

• preprocessing of the model can be used to speed up the process


6.20 Model-based 3D recognition from intensity images (12.3.3/10.3.3)

• description of curved surfaces more difficult than linear ones

• one image doesn’t provide enough information

• a partial model can be created from the image

• the partial model can be compared with stored full models

• surface features, typically eg. curvature

• surface characterization, partitioning of the surface

• invariances to projection, rotation and shift


• topographic description, topographic primal sketch:

– partial differentials of intensity in all pixels

– 5-dimensional description for each pixel

– 10 pixel types

– invariant to brightness and contrast changes


6.21 2D view-based representations for 3D (12.4/10.4)

• viewer-centered representation (as compared to object-centered)

• characteristic images stored for all different viewpoints

• 2D projections of all surfaces and vertices

• creation of an aspect graph


Geons as 2D view-based representation (12.4.3/10.4.3)

• geons (GEOmetrical iONs): 36 enumerated models with the following attributes:

– edge: straight / curved

– symmetry: rotational / reflective / asymmetric

– size variation: constant / expanding / varying

– spine: straight / curved

Other techniques (12.4.4/10.4.4)

• use of multiple stored 2D views

• 2D reference views

• creation of a virtual view


7. Data structures

7.1 Introduction (4.1/3.1)

• data structures pass information from one abstraction level to another

• different information abstractions call for different data structures

• data structures and algorithms are always coupled

– “iconic” pixel image

– edges detected in image to match object borders

– image segmented in regions

– geometric representations

– relational models


7.2 Traditional data structures (4.2/3.2)

• matrices

– spatial and neighborhood relations

– binary / grayscale / multichannel

– use of different resolutions leads to hierarchic structure

– co-occurrence matrix, integral image matrix

• chain codes

• topological descriptions

• relational structures

7.3 Hierarchic data structures (4.3/3.3)

• matrix pyramids and tree pyramids

• quadtrees


7.4 Co-occurrence matrix (4.2.1/3.2.1)

• 2-dimensional generalization of histogram

• joint distribution of graylevel values of neighboring pixels

$$C_r(z_1, z_2) = \#\{\,((x_1, y_1), (x_2, y_2)) : f(x_1, y_1) = z_1,\ f(x_2, y_2) = z_2,\ (x_1, y_1)\; r\; (x_2, y_2)\,\}$$

• if relation r is =, then Cr(z, z) is histogram

• typically r is a shift: x2 = x1 + ∆x, y2 = y1 + ∆y

• often r is assumed symmetric → Cr is symmetric

• measurement of edges of specific orientation and values

• texture analysis

• intermediate representation for feature extraction
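A co-occurrence matrix for a shift relation can be counted with two nested loops. The sketch below is a minimal pure-Python illustration; the function name, the argument layout (image[y][x] = f(x, y)) and the tiny test image are assumptions.

```python
def cooccurrence(image, dx, dy, levels):
    """Count gray-level pairs (z1, z2) of pixels related by the shift (dx, dy).

    image is a list of rows holding integers in range(levels).
    """
    C = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y1 in range(height):
        for x1 in range(width):
            x2, y2 = x1 + dx, y1 + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                C[image[y1][x1]][image[y2][x2]] += 1
    return C

image = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 1]]
C_right = cooccurrence(image, dx=1, dy=0, levels=3)   # right-hand neighbor
C_equal = cooccurrence(image, dx=0, dy=0, levels=3)   # relation "="
```

With the zero shift every pixel is paired with itself, so the diagonal of C_equal is the ordinary histogram, as stated in the list above.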


LECTURE #4, 2.2.2015

Learning goals: After this lecture the student should be able to

• use different kinds of data structures for computer vision algorithms

• use different kinds of image preprocessing methods

• understand Marr-Hildreth edge detector


7.5 Integral image matrix (4.2.1/3.2.1)

• Calculating sums like

$$S(x_0, y_0, x_1, y_1) = \sum_{x=x_0}^{x_1} \sum_{y=y_0}^{y_1} f(x, y)$$

exhaustively over a range of x0, y0, x1, y1 is time consuming.

• An efficient solution is to use the integral image:

$$ii_f(x, y) = \sum_{i=0}^{x} \sum_{j=0}^{y} f(i, j)$$

• Then the calculation of S() reduces to three additions:

$$S(x_0, y_0, x_1, y_1) = ii_f(x_1, y_1) - ii_f(x_1, y_0 - 1) - ii_f(x_0 - 1, y_1) + ii_f(x_0 - 1, y_0 - 1)$$

• Used successfully eg. in Viola-Jones face detector.
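The integral image and the box sum can be sketched as follows. The function names are hypothetical, the image is stored as a list of rows (f[y][x]), and values outside the image are treated as zero so that x0 = 0 or y0 = 0 work as well.

```python
def integral_image(f):
    """Return ii with ii[y][x] = sum of f[j][i] over all i <= x, j <= y."""
    height, width = len(f), len(f[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += f[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of f over x0..x1, y0..y1 (inclusive) from four look-ups."""
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0
    return at(x1, y1) - at(x1, y0 - 1) - at(x0 - 1, y1) + at(x0 - 1, y0 - 1)

f = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
ii = integral_image(f)
lower_right = box_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

The integral image is built in one pass over the image, after which every rectangular sum costs the same regardless of the rectangle size.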


7.6 Chain structures (4.2.2/3.2.2)

• used often for describing object boundaries

• chain-coded object is not bound to any specific location

• chain code aka Freeman code aka F-code

– directions between adjacent 4- or 8-neighbor pixels

– starting point has to be fixed

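[Figure: 8-neighbor direction codes arranged around the centre pixel as 3 2 1 / 4 · 0 / 5 6 7, and an example boundary coded as 00077665555556600000006...]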

• vectors between chained pixels can also be longer than one

• chains can be either closed or open

• rotations are easy to implement with chain codes
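A short sketch of decoding and rotating an 8-neighbor chain code. It assumes the direction numbering shown in the figure (0 pointing right, codes increasing counter-clockwise) and a y axis that grows upwards; the names and the example chain are hypothetical.

```python
# Step vectors for codes 0..7: 0 = right, then counter-clockwise.
# Assumption: y grows upwards; negate dy for image row coordinates.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode(start, code):
    """Return the list of pixel positions visited by a chain code string."""
    x, y = start
    points = [(x, y)]
    for c in code:
        dx, dy = STEPS[int(c)]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def rotate(code, quarter_turns):
    """Rotate the chain by multiples of 90 degrees counter-clockwise."""
    return "".join(str((int(c) + 2 * quarter_turns) % 8) for c in code)

square = "0246"                      # right, up, left, down: a closed chain
rotated = rotate(square, 1)
points = decode((0, 0), square)
rotated_points = decode((0, 0), rotated)
```

Adding a constant modulo 8 is all a rotation needs, and the starting point is supplied separately, which is why the code itself is independent of location. Only multiples of 90 degrees map the pixel grid exactly onto itself, so the sketch restricts itself to those.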


7.7 Topological data structures (4.2.3/3.2.3)

• graphs G = (V,E)

• nodes $V = \{v_1, v_2, \dots, v_n\}$

• edges $E = \{e_1, e_2, \dots, e_m\}$

• degree of a node

• weighted or evaluated graph: costs associated to nodes and edges

• region adjacency graph and region map

[Figure: a region map with regions numbered 0 to 5 and the corresponding region adjacency graph]


7.8 Relational database structures (4.2.4/3.2.4)

Nr.  Object      Color        Start row  Start column  Inside of
1    sun         white        5          40            2
2    sky         blue         0          0             -
3    cloud       gray         20         180           2
4    tree trunk  brown        95         75            6
5    tree crown  green        53         63            -
6    hill        light green  97         0             -
7    pond        blue         100        160           6

[Figure: example scene with the regions numbered 1 to 7 as in the table]


7.9 Hierarchical data structures (4.3/3.3)

Pyramids (4.3.1/3.3.1)

• matrix or M pyramids

– series of matrices $M_L, M_{L-1}, \dots, M_0$

– $M_L$ = original image

– $M_0$ = 1 pixel

– $M_{i-1}$ = 1/4 of $M_i$

• tree or T pyramids

– graph where nodes are placed in layers

– each layer matches a matrix in M pyramid

– each node has 4 child nodes

– method for calculating values in parent nodes is needed

– total number of nodes $N^2\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots\right) \approx 1.33\,N^2$
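A matrix pyramid can be built by repeatedly halving the resolution. In this sketch the parent value is the mean of its four children, which is only one possible choice for the parent-value method mentioned above; the function names and the test image are illustrative, and a square image with side 2^L is assumed.

```python
def reduce_level(M):
    """Halve the resolution by averaging each 2x2 block of M."""
    n = len(M) // 2
    return [[(M[2 * y][2 * x] + M[2 * y][2 * x + 1]
              + M[2 * y + 1][2 * x] + M[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n)]
            for y in range(n)]

def matrix_pyramid(image):
    """Return [M_L, M_(L-1), ..., M_0] for a square image of side 2**L."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels

image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
levels = matrix_pyramid(image)
total_nodes = sum(len(M) ** 2 for M in levels)   # 16 + 4 + 1
```

For this 4 × 4 image the pyramid holds 21 values, just under the bound 1.33 · 16 ≈ 21.3.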


Quadtrees (4.3.2/3.3.2)

• resemble T pyramids, but:

• only heterogeneous nodes are divided

• non-balanced tree

• sensitive to small changes in input images

• bounding to object coordinates

• paths can be coded as symbol strings


8. Pre-processing

• pixel image → pixel image

• processing remains on low abstraction level

• data enhancement for later processing stages

• different pre-processing techniques:

– changing brightness value of a single pixel

– geometric transformations

– local neighborhood methods

– frequency domain operations

• or another taxonomy:

– image enhancement

– image restoration


• or third division on basis of a priori information:

– no information about the properties of the error

– some knowledge about the data acquisition

– error properties are estimated from the image itself


8.1 Brightness value changes in single pixels (5.1/4.1)

• brightness corrections

– correction of optics and acquisition $f(i, j) = e(i, j)\,g(i, j)$

– a reference image from flat surface in constant lighting c,

$$g(i, j) = \frac{f(i, j)}{e(i, j)} = \frac{c\,f(i, j)}{f_c(i, j)}$$

• gray-scale transformations

– $q = T(p)$, $T : [p_0, p_k] \to [q_0, q_k]$

– equalization of histogram H(i)

$$q = T(p) = \frac{q_k - q_0}{N^2} \sum_{i=p_0}^{p} H(i) + q_0$$

– logarithmic gray-scale transformation T (p) = c1 log p+ c2

– pseudo-color transformation
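The histogram equalization formula above amounts to a lookup table built from the cumulative histogram. The following is a minimal sketch with hypothetical names; the pixel count takes the place of N², and results are rounded to integers, which is an assumption about the output format.

```python
def equalization_table(image, p0, pk, q0, qk):
    """Build the lookup table T(p) for p in [p0, pk] from the image histogram."""
    histogram = [0] * (pk - p0 + 1)
    count = 0
    for row in image:
        for p in row:
            histogram[p - p0] += 1
            count += 1                     # count plays the role of N^2
    table, cumulative = {}, 0
    for p in range(p0, pk + 1):
        cumulative += histogram[p - p0]    # sum of H(i) for i = p0..p
        table[p] = round((qk - q0) * cumulative / count + q0)
    return table

image = [[0, 0, 1],
         [1, 1, 1],
         [2, 3, 3]]
T = equalization_table(image, p0=0, pk=3, q0=0, qk=255)
equalized = [[T[p] for p in row] for row in image]
```

Gray levels that occur often (here level 1) are pushed far apart from their neighbors on the output scale, while rare levels are compressed.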


8.2 Geometric co-ordinate transformations (5.2.1/4.2.1)

• restoration of image’s distorted geometry

[Figure: transform T mapping a grid in the (x, y) plane to a distorted grid in the (x′, y′) plane]

• coordinate transform (x′, y′) = T (x, y)

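• Jacobian value of the transform

$$J = \left|\frac{\partial(x', y')}{\partial(x, y)}\right| = \begin{vmatrix} \dfrac{\partial x'}{\partial x} & \dfrac{\partial x'}{\partial y} \\[2mm] \dfrac{\partial y'}{\partial x} & \dfrac{\partial y'}{\partial y} \end{vmatrix}$$

• general polynomial form

$$x' = \sum_{r=0}^{m} \sum_{k=0}^{m-r} a_{rk}\, x^r y^k, \qquad y' = \sum_{r=0}^{m} \sum_{k=0}^{m-r} b_{rk}\, x^r y^k$$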


• bilinear transform

x′ = a0 + a1x+ a2y + a3xy

y′ = b0 + b1x+ b2y + b3xy

J = a1b2 − a2b1 + (a1b3 − a3b1)x+ (a3b2 − a2b3)y

• affine transform

x′ = a0 + a1x+ a2y

y′ = b0 + b1x+ b2y

J = a1b2 − a2b1

• rotation: $x' = x\cos\phi + y\sin\phi$, $y' = -x\sin\phi + y\cos\phi$, $J = 1$

• scaling: $x' = ax$, $y' = by$, $J = ab$

• skewing: $x' = x + y\tan\phi$, $y' = y$, $J = 1$


8.3 Brightness interpolation (5.2.2/4.2.2)

• inverse mapping (x, y) = T−1(x′, y′)

• nearest-neighbor interpolation: f1(x, y) = gs(round(x), round(y))

[Figure: nearest-neighbor interpolation kernel h(x), nonzero on the interval from −0.5 to 0.5]

• linear interpolation:

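$$l = \lfloor x \rfloor, \quad k = \lfloor y \rfloor, \quad a = x - l, \quad b = y - k$$

$$f_2(x, y) = (1 - a)(1 - b)\,g_s(l, k) + a(1 - b)\,g_s(l + 1, k) + (1 - a)\,b\,g_s(l, k + 1) + a\,b\,g_s(l + 1, k + 1)$$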

[Figure: linear interpolation kernel h(x), nonzero on the interval from −1 to 1]

• bi-cubic interpolation
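Linear (bilinear) interpolation weights the four surrounding samples by their distance to (x, y). The sketch below takes l and k as the integer parts of x and y, so that the weights a and b stay between 0 and 1; the function name and the sample grid are hypothetical, and (x, y) must lie inside the grid so that l + 1 and k + 1 are valid indices.

```python
import math

def bilinear(gs, x, y):
    """Interpolate the sampled image gs at a real-valued position (x, y).

    gs is a list of rows, so gs[k][l] holds the sample g_s(l, k).
    """
    l, k = math.floor(x), math.floor(y)
    a, b = x - l, y - k
    return ((1 - a) * (1 - b) * gs[k][l]
            + a * (1 - b) * gs[k][l + 1]
            + (1 - a) * b * gs[k + 1][l]
            + a * b * gs[k + 1][l + 1])

gs = [[0, 10],
      [20, 30]]
centre = bilinear(gs, 0.5, 0.5)     # mean of the four samples
quarter = bilinear(gs, 0.25, 0.0)   # a quarter of the way from 0 to 10
```

In a geometric transformation this function would be called at the positions (x, y) = T⁻¹(x′, y′) produced by the inverse mapping.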


8.4 Local pre-processing (5.3.1/4.3.1)

• masks, convolutions, filtering/filtration

• smoothing and gradient operators

• linear and non-linear methods

• edge-preserving smoothing

• sequential smoothing, noise suppression $\sigma^2/n$

• spatial averaging

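$$h = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \qquad h = \frac{1}{10}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \qquad h = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$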
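Applying a 3 × 3 averaging mask is a weighted sum over each pixel's neighborhood. The sketch below is a plain-Python illustration with hypothetical names; border pixels are simply copied, which is one of several possible border policies.

```python
def filter3x3(g, h):
    """Apply a 3x3 mask h to the interior pixels of g; borders are copied."""
    height, width = len(g), len(g[0])
    out = [row[:] for row in g]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = sum(h[j][i] * g[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

BOX = [[1 / 9.0] * 3 for _ in range(3)]            # first mask above
WEIGHTED = [[1 / 16.0, 2 / 16.0, 1 / 16.0],        # third mask above
            [2 / 16.0, 4 / 16.0, 2 / 16.0],
            [1 / 16.0, 2 / 16.0, 1 / 16.0]]

spike = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
box_result = filter3x3(spike, BOX)
weighted_result = filter3x3(spike, WEIGHTED)
```

Both masks are symmetric, so the weighted sum used here gives the same result as a true convolution with a flipped mask. The centre-weighted mask keeps more of the isolated spike (2.25 instead of 1), that is, it smooths less aggressively.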


8.5 Additional constraints for local averaging (?/4.3.1)

• only for limited grayvalue range

• only for limited range of grayvalue changes

• only for small gradient magnitudes

• in proportion to inverse of gradient magnitude

$$\delta(i, j, m, n) = \frac{1}{|g(m, n) - g(i, j)|}$$

$$h(i, j, m, n) = 0.5\,\frac{\delta(i, j, m, n)}{\sum_{(m,n)} \delta(i, j, m, n)}$$


Rotating mask in averaging (5.3.1/4.3.1)

• the neighborhood that produces the lowest variance is selected

Non-linear methods

• median filtering

• filterings based on ranks and order statistics

• non-linear mean filters

• homomorphic filtering


8.6 Local neighborhood in edge detection (5.3.2/4.3.2)

• gradient has direction and magnitude

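$$|\mathrm{grad}\; g(x, y)| = \sqrt{\left(\frac{\partial g}{\partial x}\right)^2 + \left(\frac{\partial g}{\partial y}\right)^2}$$

$$\psi = \arg\left(\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}\right)$$

• Laplacian

$$\nabla^2 g(x, y) = \frac{\partial^2 g(x, y)}{\partial x^2} + \frac{\partial^2 g(x, y)}{\partial y^2}$$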

• image sharpening, unsharp masking

f(i, j) = g(i, j) + CS(i, j)

• approximation of derivatives

• zero-crossings of the second derivative

• parametric fitting


8.7 Edge detection by derivative approximation (5.3.2/4.3.2)

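• Roberts

$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$

• Prewitt

$$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix} \quad \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \quad \cdots$$

• Sobel

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad \begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 1 \\ -2 & -1 & 0 \end{bmatrix} \quad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad \cdots$$

• Laplace

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \begin{bmatrix} 2 & -1 & 2 \\ -1 & -4 & -1 \\ 2 & -1 & 2 \end{bmatrix} \quad \begin{bmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix}$$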
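As an illustration of how such masks are used, the sketch below evaluates two of the Sobel masks at one pixel and combines the responses into a gradient magnitude and direction. The masks are applied without flipping (correlation), rows are indexed downwards, and all names and the test image are assumptions made for this example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # positive when the right side is brighter
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]    # positive when the rows above are brighter

def mask_response(g, h, x, y):
    """Weighted sum of the 3x3 neighborhood of pixel (x, y) in image g."""
    return sum(h[j][i] * g[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel(g, x, y):
    """Return (magnitude, direction) of the estimated gradient at (x, y)."""
    gx = mask_response(g, SOBEL_X, x, y)
    gy = mask_response(g, SOBEL_Y, x, y)
    return math.hypot(gx, gy), math.atan2(gy, gx)

# Vertical step edge between the second and third columns.
g = [[0, 0, 10, 10] for _ in range(4)]
magnitude, direction = sobel(g, 1, 1)
```

For this step edge only the horizontal mask responds, so the estimated direction is 0, pointing across the edge towards the brighter side.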


8.8 Marr-Hildreth edge detector (5.3.3/4.3.3)

• edge detection from second derivative zero-crossings

• image smoothing by a Gaussian kernel

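$$G(x, y; \sigma) = e^{-\frac{x^2 + y^2}{2\sigma^2}}, \qquad G(r; \sigma) = e^{-\frac{r^2}{2\sigma^2}}$$

• calculation of Laplace image

$$\nabla^2 [G(x, y; \sigma) * f(x, y)]$$

• association order of the operators is changed: Laplace of Gaussian, LoG

$$[\nabla^2 G(x, y; \sigma)] * f(x, y)$$

• algebraic solution of the second derivative

$$G''(r; \sigma) = \frac{1}{\sigma^2}\left(\frac{r^2}{\sigma^2} - 1\right) e^{-\frac{r^2}{2\sigma^2}}$$

$$h(x, y; \sigma) = c\left(\frac{x^2 + y^2 - \sigma^2}{\sigma^4}\right) e^{-\frac{x^2 + y^2}{2\sigma^2}}$$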


• “Mexican hat” function

• proper mask size is approximately 6σ × 6σ · · · 10σ × 10σ

• operator can be separated in x- and y-directions

• resembles the operation of the human eye

• ∇2G can be approximated by the difference of two Gaussians, DoG
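A sampled LoG mask can be generated directly from the expression for h(x, y; σ) given in this section. The sketch below keeps that form as written (note that differentiating the 2D Gaussian term by term gives 2σ² instead of σ² in the numerator); the function name, the choice c = 1 and the mask size are assumptions.

```python
import math

def log_mask(sigma, size, c=1.0):
    """Sample h(x, y; sigma) = c (x^2 + y^2 - sigma^2) / sigma^4
    * exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid (size odd)."""
    half = size // 2
    mask = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            row.append(c * (r2 - sigma ** 2) / sigma ** 4
                       * math.exp(-r2 / (2.0 * sigma ** 2)))
        mask.append(row)
    return mask

sigma = 1.5
mask = log_mask(sigma, size=9)     # 9 = 6 * sigma, lower end of the range above
centre = mask[4][4]                # value at x = y = 0
rim = mask[4][0]                   # value at x = -4, y = 0
```

The negative centre surrounded by a positive rim is the “Mexican hat” profile (here with the hat upside down); edges are then found at the zero-crossings of the image filtered with this mask.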


LECTURE #5, 9.2.2015

Learning goals: After this lecture the student should be able to

• understand the basic principle of scale-space methods

• understand the principle of the Canny edge detector

• describe parametric edge models

• describe Moravec detector

• explain basic morphological operations


8.9 Scale-space methods (5.3.4/4.3.4)

• smoothing parameter, eg. σ, is varied to produce a family of images

• 1) curves can be analyzed at multiple scales

• 2) scale-space filtering: f(x, y) image to a set of F (x, y, σ) images

– convolution with Gaussian function (1-dim. case)

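$$G(x, \sigma) = e^{-\frac{x^2}{\sigma^2}}, \qquad F(x, \sigma) = G(x, \sigma) * f(x)$$

– edges from second derivative’s zero-crossings

$$\frac{\partial^2 F(x, \sigma_0)}{\partial x^2} = 0, \qquad \frac{\partial^3 F(x, \sigma_0)}{\partial x^3} \neq 0$$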

– different qualitative information with different σ, interval tree


8.10 Canny edge detector (5.3.5/4.3.5)

• optimal for step-shape edges in additive white noise

– detection: edges are not missed, no spurious responses

– localization: located and actual positions near each other

– uniqueness: single edge doesn’t produce multiple responses

1. image f is convolved with σ-scale Gaussian function

2. local edge’s normal direction is estimated in each pixel

$$\mathbf{n} = \frac{\nabla(G * f)}{|\nabla(G * f)|}$$

3. 2nd derivative’s zero-crossings are located in the normal direction

$$\frac{\partial^2}{\partial \mathbf{n}^2}\, G * f = 0, \quad \text{non-maximal suppression}$$


4. edge thresholding with hysteresis

• generalization of thresholding with high and low thresholds

• weak edge pixels are supported by strong nearby edge pixels

• only strong changes are detected, increased signal-to-noise ratio

5. edge information is collected with different σ values

• edge feature synthesis from small σ to large σ

• differences between prediction and reality give true information
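Step 4 can be sketched as follows, assuming the gradient magnitude is already available as a list of rows and that "nearby" means 8-connected. The function name, thresholds and sample data are hypothetical.

```python
from collections import deque

def hysteresis(magnitude, low, high):
    """Keep pixels >= high, plus pixels >= low that are 8-connected to a kept pixel."""
    rows, cols = len(magnitude), len(magnitude[0])
    edge = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels seed the search.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edge[r][c] = True
                queue.append((r, c))
    # Weak pixels are accepted only when reached from an accepted pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not edge[rr][cc] and magnitude[rr][cc] >= low):
                    edge[rr][cc] = True
                    queue.append((rr, cc))
    return edge

magnitude = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
edges = hysteresis(magnitude, low=4, high=8)
```

The weak pixels next to the strong one are kept, while the isolated weak pixel in the last column is dropped.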


8.11 Parametric edge models (5.3.6/4.3.6)

• a facet model is estimated for each pixel, e.g.:

$$g(x, y) = c_1 + c_2 x + c_3 y + c_4 x^2 + c_5 xy + c_6 y^2 + c_7 x^3 + c_8 x^2 y + c_9 x y^2 + c_{10} y^3$$

• least-squares methods in matching

• extreme points and values of derivatives are solved from the parameters

• sub-pixel localization

8.12 Edges in multi-channel images (5.3.7/4.3.7)

• edges can be solved for each channel separately

• scalar value can be obtained from the sum or maximum value

• channel difference or ratio can also be used

• the Roberts gradient has a 2 × 2 × n-sized generalization

– only magnitude, no direction information is produced


8.13 Other local neighborhood operations (5.3.9/4.3.8)

• some methods fall under morphological operations

• narrow lines detected with matched masks of different orientation

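$$f(i, j) = \max\left\{0,\ \max_k\, (h_k * g)(i, j)\right\}$$

$$h_1 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
h_2 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$

$$h_3 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & -1 & 2 & -1 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
h_4 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \cdots$$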


Line thinning (?/4.3.8)

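• maximum of gradient in line’s normal direction, non-maximal suppression

• conditional change of pixel value: centermost 1→0 if the mask matches

$$\begin{bmatrix} 1 & x & 0 \\ 1 & 1 & 0 \\ x & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} x & 1 & 1 \\ 0 & 1 & x \\ 0 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} x & 1 & x \\ 1 & 1 & x \\ x & x & 0 \end{bmatrix} \quad
\begin{bmatrix} x & 1 & x \\ x & 1 & 0 \\ 0 & x & 0 \end{bmatrix}$$

$$\begin{bmatrix} 0 & 0 & 0 \\ x & 1 & 0 \\ 1 & 1 & x \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & x \\ 0 & 1 & 1 \\ 0 & x & 1 \end{bmatrix} \quad
\begin{bmatrix} x & x & 0 \\ 1 & 1 & x \\ x & 1 & x \end{bmatrix} \quad
\begin{bmatrix} 0 & x & x \\ x & 1 & 1 \\ x & 1 & x \end{bmatrix}$$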


Filling of broken lines (?/4.3.8)

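• conditional change of pixel value: centermost 0→1 if the mask matches

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$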


8.14 Corner and interest point detection (5.3.10/4.3.8)

• Moravec detector

$$MO(i, j) = \frac{1}{8} \sum_{k=i-1}^{i+1} \sum_{l=j-1}^{j+1} |g(k, l) - g(i, j)|$$

• the facet model can be used for detecting corner points

$$f(x, y) = c_1 + c_2 x + c_3 y + c_4 x^2 + c_5 xy + c_6 y^2 + c_7 x^3 + c_8 x^2 y + c_9 x y^2 + c_{10} y^3$$

– Zuniga-Haralick operator

$$ZH(i, j) = \frac{-2\,(c_2^2 c_6 - c_2 c_3 c_5 - c_3^2 c_4)}{(c_2^2 + c_3^2)^{3/2}}$$

– Kitchen-Rosenfeld operator

• Harris corner detector
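The Moravec formula above translates almost directly into code. This is a minimal sketch for interior pixels only (no border handling), with a hypothetical test image.

```python
def moravec(g, i, j):
    """MO(i, j): sum of |g(k, l) - g(i, j)| over the 3x3 window, divided by 8."""
    total = 0
    for k in range(i - 1, i + 2):
        for l in range(j - 1, j + 2):
            total += abs(g[k][l] - g[i][j])   # the center term contributes 0
    return total / 8

image = [
    [0, 0, 0, 0],
    [0, 8, 8, 0],
    [0, 8, 8, 0],
    [0, 0, 0, 0],
]
corner_response = moravec(image, 1, 1)        # corner of the bright square
flat_response = moravec([[3] * 3 for _ in range(3)], 1, 1)
```

At the corner of the bright square five of the eight neighbors differ by 8, so the response is 5.0; in a constant region it is 0.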


8.15 Adaptive local pre-processing (?/4.3.9)

• each pixel’s local neighborhood and background is solved

• neighborhood by grayvalues, texture, motion, . . .

• region growing from a seed point by using 8-neighbors

$$|f(k, l) - f(i, j)| \le T_1, \qquad \frac{|f(k, l) - f(i, j)|}{f(i, j)} \le T_2$$

• growing of background around the neighborhood, e.g. constant width

• different neighborhood for each pixel, except redundant seed points

• noise reduction, histogram processing, contrast enhancement

$$c = \frac{F - B}{F + B}, \qquad f'(i, j) = \frac{B(1 + c')}{1 - c'}$$


8.16 Frequency domain image restoration (5.3.8,5.4/4.4)

• correction of degradations caused by image formation

• deterministic / stochastic methods

• image degradation process h(a, b, i, j), additive noise ν(i, j)

$$g(i, j) = s\left( \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(a, b)\, h(a, b, i, j)\, da\, db \right) + \nu(i, j)$$

g(i, j) = (h ∗ f)(i, j) + ν(i, j)

G(u, v) = H(u, v)F (u, v) +N(u, v)

– camera or object motion: $H(u, v) = \dfrac{\sin(\pi V T u)}{\pi V u}$

– wrong lens focus: $H(u, v) = \dfrac{J_1(ar)}{ar}$ (Bessel function $J_1$)

– atmospheric turbulence: $H(u, v) = e^{-c (u^2 + v^2)^{5/6}}$

• inverse filtration:

$$\hat{F}(u, v) = \frac{G(u, v)}{H(u, v)}$$

• Wiener filtration:

$$\hat{F}(u, v) = \frac{H^*(u, v)\, G(u, v)}{|H(u, v)|^2 + \dfrac{S_{\nu\nu}(u, v)}{S_{ff}(u, v)}}$$
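The two restoration filters can be compared for a single frequency coefficient. This is a sketch assuming the spectra G, H and the power spectra are already known at that frequency; the numeric values are made up for illustration.

```python
def inverse_filter(G, H):
    """Estimate of F(u, v) as G(u, v) / H(u, v) for one frequency coefficient."""
    return G / H

def wiener_filter(G, H, S_nn, S_ff):
    """Estimate of F(u, v) as H* G / (|H|^2 + S_nn / S_ff) for one coefficient."""
    return H.conjugate() * G / (abs(H) ** 2 + S_nn / S_ff)

G = complex(2.0, 1.0)      # hypothetical degraded spectrum value
H = complex(0.5, -0.5)     # hypothetical degradation transfer function value
no_noise = wiener_filter(G, H, S_nn=0.0, S_ff=1.0)
with_noise = wiener_filter(G, H, S_nn=0.5, S_ff=1.0)
```

With zero noise power the Wiener estimate coincides with the inverse filter; as the noise-to-signal ratio grows, the estimate is attenuated instead of amplifying noise where |H| is small.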


9. Morphology

• processing of binary images with logical/set operations

• image is treated as a set of pixels

• generalizations to gray-level images exist

Application areas: (13.1/11.1)

• pre-processing: reduction of binary noise

• shape extraction or enhancement

• qualitative description of objects

Operations:

• dilation & erosion

• opening & closing

• hit-or-miss: thinning & thickening

• conditional operations


9.1 Basic notations and operations (13.1/11.1)

• point set $E^2$ = Euclidean 2-dimensional space

• discrete point set $Z^2$ or $Z^3$

• subset ⊂, ⊃, intersection ∩, union ∪

• empty set $\emptyset$, complement $(\cdot)^C$, difference $X \setminus Y = X \cap Y^C$

• symmetrical set or rational set or transpose $\breve{B} = \{-b : b \in B\}$

• operator $\Psi()$ has a dual operator $\Psi^*()$: $\Psi(X) = [\Psi^*(X^C)]^C$

• structuring element, isotropic structuring element

• origin / reference point / current pixel


Translation or shift

$$X_h = \{p \in E^2 : p = x + h \text{ for some } x \in X\}$$


Quantitative morphological operations (13.2/11.2)

• compatibility with translation: $\Psi(X_h) = [\Psi(X)]_h$

• compatibility with change of scale: $\Psi(X) = \lambda \Psi(\tfrac{1}{\lambda} X)$

• local knowledge: $[\Psi(X \cap Z)] \cap Z' = \Psi(X) \cap Z'$

• upper semi-continuity


9.2 Dilation ⊕ (fill, grow) (13.3.1/11.3.1)

• expands the image, fills gaps

• $X \oplus B = \{p \in E^2 : p = x + b,\ x \in X \text{ and } b \in B\}$


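• union of translations: $X \oplus B = \bigcup_{b \in B} X_b$

• commutative: $X \oplus B = B \oplus X$

• associative: $X \oplus (B \oplus D) = (X \oplus B) \oplus D$

• invariant to translation: $X_h \oplus B = (X \oplus B)_h$

• increasing transformation: $X \subseteq Y \Rightarrow X \oplus B \subseteq Y \oplus B$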
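Binary dilation can be sketched directly from the union-of-translations form, assuming images and structuring elements are finite sets of (row, column) tuples. The helper names and example sets are illustrative.

```python
def translate(X, h):
    """X_h = {x + h : x in X} for points given as (row, col) tuples."""
    return {(x[0] + h[0], x[1] + h[1]) for x in X}

def dilate(X, B):
    """Dilation of X by B as the union of the translations X_b over b in B."""
    result = set()
    for b in B:
        result |= translate(X, b)
    return result

X = {(0, 0), (0, 1), (2, 0)}
B = {(0, 0), (0, 1)}            # origin plus its right-hand neighbor
dilated = dilate(X, B)
```

The same two functions can be used to check the listed properties on small examples, e.g. commutativity and invariance to translation.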


9.3 Erosion ⊖ (shrink, reduce) (13.3.2/11.3.2)

• makes the image smaller, removes details

• $X \ominus B = \{p \in E^2 : p + b \in X \text{ for all } b \in B\}$


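• $(0, 0) \in B \Rightarrow X \ominus B \subseteq X$

• $D \subseteq B \Rightarrow X \ominus B \subseteq X \ominus D$

• intersection of translations: $X \ominus B = \bigcap_{b \in B} X_{-b}$

• invariant to translations: $X_h \ominus B = (X \ominus B)_h$, $X \ominus B_h = (X \ominus B)_{-h}$

• increasing transformation: $X \subseteq Y \Rightarrow X \ominus B \subseteq Y \ominus B$

• non-commutative: $X \ominus B \neq B \ominus X$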
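A sketch of binary erosion on finite point sets, following the definition above (a point p survives if p + b lies in X for every b in B). The candidate-generation trick and the example sets are assumptions made for this illustration.

```python
def erode(X, B):
    """Erosion of X by B: all p with p + b in X for every b in B (finite sets)."""
    b0 = next(iter(B))
    # Any p in the result satisfies p + b0 in X, so p = x - b0 for some x in X.
    candidates = {(x[0] - b0[0], x[1] - b0[1]) for x in X}
    return {p for p in candidates
            if all((p[0] + b[0], p[1] + b[1]) in X for b in B)}

X = {(0, 0), (0, 1), (0, 2), (1, 1)}
B = {(0, 0), (0, 1)}            # contains the origin, so the result is inside X
eroded = erode(X, B)
```

Only points that also have a right-hand neighbor in X survive, so the result is a subset of X, as the first property in the list states.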


9.4 Some properties of dilation and erosion

Duality

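$$(X \ominus Y)^C = X^C \oplus \breve{Y}$$

Combination laws

$$(X \cap Y) \oplus B \subseteq (X \oplus B) \cap (Y \oplus B)$$

$$B \oplus (X \cap Y) \subseteq (X \oplus B) \cap (Y \oplus B)$$

$$B \oplus (X \cup Y) = (X \cup Y) \oplus B$$

$$B \oplus (X \cup Y) = (X \oplus B) \cup (Y \oplus B)$$

$$(X \cap Y) \ominus B = (X \ominus B) \cap (Y \ominus B)$$

$$B \ominus (X \cap Y) \supseteq (B \ominus X) \cap (B \ominus Y)$$

$$(X \cup Y) \ominus B \supseteq (X \ominus B) \cup (Y \ominus B)$$

$$B \ominus (X \cup Y) = (B \ominus X) \cap (B \ominus Y)$$

Association laws

$$(X \oplus B) \oplus D = X \oplus (B \oplus D)$$

$$(X \ominus B) \ominus D = X \ominus (B \oplus D)$$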


9.5 Opening ∘ and closing • (13.3.4/11.3.4)

• combinations of dilation and erosion

• opening removes non-connected points

• closing fills in holes and gaps

• area is preserved approximately in the operations

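• $X \circ B = (X \ominus B) \oplus B$, $\quad X \bullet B = (X \oplus B) \ominus B$

• operations are each other’s duals: $(X \bullet B)^C = X^C \circ \breve{B}$

• opening and closing are idempotent operations

$$X \circ B = (X \circ B) \circ B, \qquad X \bullet B = (X \bullet B) \bullet B$$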

• one may say that X is open / closed with respect to B
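Opening and closing can be sketched by composing set-based dilation and erosion. The point-set representation, function names and the small test set are assumptions for illustration.

```python
def dilate(X, B):
    """Dilation: all sums x + b with x in X and b in B."""
    return {(x[0] + b[0], x[1] + b[1]) for x in X for b in B}

def erode(X, B):
    """Erosion: all p with p + b in X for every b in B (finite sets)."""
    b0 = next(iter(B))
    candidates = {(x[0] - b0[0], x[1] - b0[1]) for x in X}
    return {p for p in candidates
            if all((p[0] + b[0], p[1] + b[1]) in X for b in B)}

def opening(X, B):
    """Erosion followed by dilation."""
    return dilate(erode(X, B), B)

def closing(X, B):
    """Dilation followed by erosion."""
    return erode(dilate(X, B), B)

B = {(0, 0), (0, 1)}
X = {(0, 0), (0, 1), (0, 3), (2, 2)}   # a 2-pixel run, a gap, two isolated pixels
opened = opening(X, B)
closed = closing(X, B)
```

Here the opening keeps only the 2-pixel run, the closing fills the one-pixel gap at (0, 2), and applying either operation a second time changes nothing.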


9.6 Gray-scale dilation and erosion (13.4/11.4)

• above operations work with binary images only

• generalizations to gray-scale images exist

• gray-scale dilation as max operation:

$$(f \oplus k)(x) = \max\{f(x - z) + k(z) : z \in K,\ x - z \in F\}$$

• gray-scale erosion as min operation:

$$(f \ominus k)(x) = \min\{f(x + z) - k(z) : z \in K\}$$

• Point set $A \subseteq E^n$, $n = 3$

• $A$’s support $F = \{x \in E^{n-1} : (x, y) \in A \text{ for some } y \in E\}$

• $A$’s top surface $T[A](x) = \max\{y : (x, y) \in A\}$

• $f(x)$’s umbra $U[f] = \{(x, y) \in F \times E : y \le f(x)\}$

• gray-scale dilation $f \oplus k = T\{U[f] \oplus U[k]\}$

• gray-scale erosion $f \ominus k = T\{U[f] \ominus U[k]\}$
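The max and min forms can be sketched for a 1-D signal. This assumes f is a list indexed 0..n-1, the structuring function k is a dict from offset z to k(z), and that erosion simply skips offsets that fall outside the signal (a border-handling choice not specified on the slide).

```python
def gray_dilate(f, k):
    """Max of f(x - z) + k(z) over offsets z in K with x - z inside the signal."""
    n = len(f)
    return [max(f[x - z] + kz for z, kz in k.items() if 0 <= x - z < n)
            for x in range(n)]

def gray_erode(f, k):
    """Min of f(x + z) - k(z) over offsets z in K with x + z inside the signal."""
    n = len(f)
    return [min(f[x + z] - kz for z, kz in k.items() if 0 <= x + z < n)
            for x in range(n)]

f = [1, 3, 2, 5, 4]
flat = {-1: 0, 0: 0, 1: 0}      # flat structuring function on K = {-1, 0, 1}
dilated = gray_dilate(f, flat)
eroded = gray_erode(f, flat)
```

With a flat structuring function the two operations reduce to a running maximum and a running minimum over the window K.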


LECTURE #6, 23.2.2015

Learning goals: After this lecture the student should be able to

• understand skeletons and the maximal ball

• understand the quench and distance functions and ultimate erosion

• know the basics of geodesic transformations, reconstruction and granulometry as morphological operations

• be familiar with the basic concepts of texture in image analysis

• understand autocorrelation based texture features


9.7 Skeletons and maximal ball

Homotopic transforms (13.5.1/11.5.1)

• don’t change topological relations

• homotopic tree, that shows neighborhood relations, remains the same

Skeletons (13.5.2/11.5.2)

• medial axis transform

• grassfire metaphor

• formation with maximal balls

– the result can be non-homotopic

• homotopic skeleton can be extracted with morphological thinnings

• easy to understand in Euclidean world – discrete world is difficult


Maximal ball B(p, r)

• unit ball $B$ or $1B$ contains the origin and the points at distance 1 from it

• $nB$ is $B$’s $(n - 1)$th successive dilation with itself

$$nB = \underbrace{B \oplus \cdots \oplus B}_{n \text{ times}}$$

• ball B(p, r), shape B located in p with radius r, is maximal if

– B(p, r) ⊆ X and

– there cannot be a larger ball B′ so that B(p, r) ⊂ B′ ⊆ X

– for all B′ it holds B ⊆ B′ ⊆ X =⇒ B′ = B

• skeleton by maximal balls:

$$S(X) = \{p \in X : \exists r \ge 0,\ B(p, r) \text{ is } X\text{’s maximal ball}\}$$

$$S(X) = \bigcup_{n=0}^{\infty} \Big( (X \ominus nB) \setminus (X \ominus nB) \circ B \Big)$$
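The union formula for the skeleton can be sketched on finite point sets. This assumes the 4-neighborhood cross as the unit ball B and uses the association law to obtain each further erosion from the previous one; names and the test shape are illustrative.

```python
def dilate(X, B):
    """Dilation: all sums x + b with x in X and b in B."""
    return {(x[0] + b[0], x[1] + b[1]) for x in X for b in B}

def erode(X, B):
    """Erosion: all p with p + b in X for every b in B (finite sets)."""
    b0 = next(iter(B))
    candidates = {(x[0] - b0[0], x[1] - b0[1]) for x in X}
    return {p for p in candidates
            if all((p[0] + b[0], p[1] + b[1]) in X for b in B)}

def skeleton(X, B):
    """Union over n of the erosion of X by nB minus the opening of that erosion."""
    result = set()
    eroded = set(X)                        # erosion by 0B is X itself
    while eroded:
        opened = dilate(erode(eroded, B), B)
        result |= eroded - opened
        eroded = erode(eroded, B)          # erosion by (n+1)B from erosion by nB
    return result

B = {(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)}     # assumed unit ball (4-neighbors)
X = {(r, c) for r in range(3) for c in range(3)}   # 3x3 square
S = skeleton(X, B)
```

For the 3 × 3 square this yields the four corners and the center, which also illustrates that the maximal-ball skeleton need not be connected.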


9.8 Hit-or-miss ⊗, thinning ⊘, thickening ⊙ (13.3.3,13.5.3)

• composite structuring element is an ordered pair B = (B1, B2)

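• $X \otimes B = \{x : B_1 \subset X \text{ and } B_2 \subset X^C\}$

• $X \otimes B = (X \ominus B_1) \cap (X^C \ominus B_2) = (X \ominus B_1) \setminus (X \oplus \breve{B}_2)$

• $X \oslash B = X \setminus (X \otimes B)$

• $X \odot B = X \cup (X \otimes B)$

• thinning and thickening are dual transformations

$$(X \odot B)^C = X^C \oslash B^*, \qquad B^* = (B_2, B_1)$$

• sequential thinnings / thickenings with Golay alphabets

$$X \oslash \{B_{(i)}\} = ((((X \oslash B_{(1)}) \oslash B_{(2)}) \cdots \oslash B_{(i)}) \cdots)$$

$$X \odot \{B_{(i)}\} = ((((X \odot B_{(1)}) \odot B_{(2)}) \cdots \odot B_{(i)}) \cdots)$$

• homotopic skeleton is ready when thinning is idempotent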


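9.9 Golay alphabets

• thinning with L element (4-neighbors)

$$L_{(1)} = \begin{bmatrix} 0 & 0 & 0 \\ * & 1 & * \\ 1 & 1 & 1 \end{bmatrix}, \quad L_{(2)} = \begin{bmatrix} * & 0 & * \\ 1 & 1 & 0 \\ 1 & 1 & * \end{bmatrix}, \quad \cdots$$

• thinning with E element (4-neighbors)

$$E_{(1)} = \begin{bmatrix} * & * & * \\ 0 & 1 & 0 \\ * & 0 & * \end{bmatrix}, \quad E_{(2)} = \begin{bmatrix} * & 0 & * \\ 0 & 1 & * \\ * & 0 & * \end{bmatrix}, \quad \cdots$$

• thinning with M element (4-neighbors)

$$M_{(1)} = \begin{bmatrix} * & 0 & * \\ * & 1 & * \\ 1 & 1 & 1 \end{bmatrix}, \quad M_{(2)} = \begin{bmatrix} * & 0 & * \\ 1 & 1 & 0 \\ 1 & 1 & * \end{bmatrix}, \quad \cdots$$

• thinning with D and thickening with Dt element (4-neighbors)

$$D_{(1)} = \begin{bmatrix} * & 0 & * \\ 0 & 1 & 1 \\ * & 0 & * \end{bmatrix}, \quad D_{(2)} = \begin{bmatrix} 0 & 0 & * \\ 0 & 1 & 1 \\ * & 1 & 1 \end{bmatrix}, \quad \cdots$$

• thickening with C element (4-neighbors)

$$C_{(1)} = \begin{bmatrix} 1 & 1 & * \\ 1 & 0 & * \\ * & * & * \end{bmatrix}, \quad C_{(2)} = \begin{bmatrix} * & 1 & 1 \\ * & 0 & 1 \\ * & * & * \end{bmatrix}, \quad \cdots$$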


9.10 Quench function and ultimate erosion (13.5.4/11.5.4)

• quench function $q_X(p)$:

$$X = \bigcup_{p \in S(X)} \big(p + q_X(p) B\big)$$

• $q_X(p)$’s regional maxima points = ultimate erosion $\mathrm{Ult}(X)$

• ultimate erosion can be used to extract markers in objects

• original object can be reconstructed from markers

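• marker set $B \subseteq A \implies$ reconstruction $\rho_A(B)$

• ultimate erosion can be expressed as

$$\mathrm{Ult}(X) = \bigcup_{n \in \mathbb{N}} \Big( (X \ominus nB) \setminus \rho_{X \ominus nB}\big(X \ominus (n + 1)B\big) \Big)$$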


9.11 Ultimate erosion and distance functions (13.5.5/11.5.5)

• distance function $\mathrm{dist}_X(p)$ is $p$’s distance from $X^C$:

$$\forall p \in X: \quad \mathrm{dist}_X(p) = \min\{n \in \mathbb{N} : p \notin (X \ominus nB)\}$$

• ultimate erosion is the set of $\mathrm{dist}_X(p)$’s regional maxima points

• maximal ball skeleton is the set of $\mathrm{dist}_X(p)$’s local maxima points

• each connected component $X_i$ of set $X$ has an influence zone

$$Z(X_i) = \{p \in Z^2 : \forall i \neq j,\ d(p, X_i) \le d(p, X_j)\}$$

• skeleton by influence zones (SKIZ) is the set of boundary pixels of the influence zones $Z(X_i)$


9.12 Geodesic transformations (13.5.6/11.5.6)

• geodesic transformations are restricted inside subset X

• interpixel distances dX(x, y) measured along paths inside X

• a geodesic ball located at p with radius n inside X

$$B_X(p, n) = \{p' \in X : d_X(p, p') \le n\}$$

• $Y$’s geodesic dilation $\delta_X^{(n)}$ with $n$-radius ball inside $X$

$$\delta_X^{(n)}(Y) = \bigcup_{p \in Y} B_X(p, n) = \{p' \in X : \exists p \in Y,\ d_X(p, p') \le n\}$$

• corresponding geodesic erosion $\varepsilon_X^{(n)}$

$$\varepsilon_X^{(n)}(Y) = \{p \in Y : B_X(p, n) \subseteq Y\} = \{p \in Y : \forall p' \in X \setminus Y,\ d_X(p, p') > n\}$$

• result of a geodesic operation is always a subset of $X$

• geodesic dilation with unit ball: $\delta_X^{(1)}(Y) = (Y \oplus B) \cap X$

• geodesic dilation with $n$-radius ball:

$$\delta_X^{(n)} = \underbrace{\delta_X^{(1)}(\delta_X^{(1)}(\delta_X^{(1)}(\cdots)))}_{n \text{ times}}$$


9.13 Morphological reconstruction (13.5.7/11.5.7)

• geodesic dilations can be used to implement reconstruction

• start with marker set Y inside object X

• dilation with geodesic ball grows Y while restricting it inside X

• only components of X that contain a marker are reconstructed

• many markers inside one component =⇒ geodesic SKIZ inside the component

• reconstruction can be generalized for gray-scale images

• gray-scale image is interpreted as a stack of binary images obtained by thresholding
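Binary reconstruction can be sketched as a unit geodesic dilation repeated until the marker set stops growing. This assumes point sets of (row, column) tuples and a 4-neighborhood unit ball containing the origin; the names and data are illustrative.

```python
def reconstruct(X, Y, B):
    """Iterate the unit geodesic dilation (Y dilated by B, intersected with X)."""
    current = set(Y) & set(X)
    while True:
        grown = {(p[0] + b[0], p[1] + b[1]) for p in current for b in B} & X
        if grown == current:       # stable: the marked components are complete
            return current
        current = grown

B = {(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)}   # assumed unit ball (4-neighbors)
X = {(0, 0), (0, 1), (0, 2), (3, 0), (3, 1)}     # two separate components
Y = {(0, 1)}                                      # marker in the first component
rho = reconstruct(X, Y, B)
```

Only the component that contains the marker is recovered; the second component is left out.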


9.14 Granulometry (13.6/11.6)

• granulometry measures the sizes of objects or particles

• a size histogram is created that describes distribution of particle sizes

• particle sizes resolved by openings/erosions with an increasing ball

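• $\psi_n(X)$ is $X$ after opening with $n$-sized ball

$$\psi_0(X) \supseteq \psi_1(X) \supseteq \psi_2(X) \supseteq \cdots$$

• pattern spectrum or granulometric curve $PS_\Psi(X)(n)$:

$$PS_\Psi(X)(n) = m[\psi_n(X)] - m[\psi_{n-1}(X)], \quad \forall n > 0$$

• granulometric function $G_\Psi(X)(x)$:

$$x \in X, \quad G_\Psi(X)(x) = \min\{n > 0 : x \notin \psi_n(X)\}$$

$$PS_\Psi(X)(n) = \mathrm{card}\{p : G_\Psi(X)(p) = n\}$$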


9.15 Morphological segmentation, watersheds (13.7/11.7)

• morphological segmentation is suitable for binary particles

• markers are first extracted inside the particles

• watershed method is then used for reconstructing the particles

• areas between the watersheds are “basins of increasing water”

• geodesic influence zones and SKIZ can produce incorrect segments

• watershed segmentation may produce a better result


10. Texture

Some examples of textured real-world surface images:


10.1 Properties of natural textures

• surface shape / surface structure / surface image

• physical origin is very often 3-dimensional

• texture analysis uses 2-dimensional images

• effect of lighting and light directions?

• direction/orientation of the texture or is it unoriented?

• texture primitives / texture elements, texels

• spatial relations between primitives, dependency on the scale

• tone and structure

• fine / coarse texture, weak / strong texture

• can there exist constant texture in a natural image?

• statistical and structural descriptions, hybrid descriptions

• human eye’s ability to recognize textures, textons


LECTURE #7, 27.2.2015

Learning goals: After this lecture the student should be able to

• understand co-occurrence based texture features

• explain how edge frequencies are related to texture

• use Laws’ texture energy measures

• understand basic syntactic texture description methods

• discuss hierarchical and hybrid texture description techniques


10.2 Statistical texture descriptions (15.1.1,14.1.1)

• formation of a statistical feature vector

• one feature vector can describe a large area or a single pixel

• use of pixel-wise feature vectors:

– comparison between neighboring pixels, clustering

– averaging inside areas of nearly constant values, segmentation

• generally statistics of second order

• methods based on spatial frequencies

• autocorrelation function

$$C_{ff}(p, q) = \frac{MN}{(M - p)(N - q)} \cdot \frac{\sum_{i=1}^{M-p} \sum_{j=1}^{N-q} f(i, j)\, f(i + p, j + q)}{\sum_{i=1}^{M} \sum_{j=1}^{N} f^2(i, j)}$$

$$C_{ff}(r) = C_{ff}(p, q), \qquad r^2 = p^2 + q^2$$
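The autocorrelation formula can be evaluated directly for small images. This sketch assumes a list-of-rows image with 0-based indices and non-negative shifts p, q; the striped test image is made up.

```python
def autocorrelation(f, p, q):
    """C_ff(p, q) for an M x N image f, following the formula above."""
    M, N = len(f), len(f[0])
    shifted = sum(f[i][j] * f[i + p][j + q]
                  for i in range(M - p) for j in range(N - q))
    energy = sum(v * v for row in f for v in row)
    return (M * N * shifted) / ((M - p) * (N - q) * energy)

stripes = [[1, 0, 1, 0],
           [1, 0, 1, 0],
           [1, 0, 1, 0],
           [1, 0, 1, 0]]       # vertical stripes with period 2
c00 = autocorrelation(stripes, 0, 0)
c01 = autocorrelation(stripes, 0, 1)
c02 = autocorrelation(stripes, 0, 2)
```

The value drops to 0 at a shift of one pixel and returns to 1 at a shift of two, which reflects the period of the texture.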


• optical Fourier transform

• discrete Fourier or Hadamard transform

• partitioning of Fourier spectrum for feature calculation

[Figure: two partitionings of the Fourier spectrum in the (u, v) plane]

• for example, 28 spatial frequency-domain features


10.3 Co-occurrence matrices (15.1.2,14.1.2)

• 2-dimensional generalizations of 1-dimensional histograms

• second order statistics of two nearby pixel values

• parameters: distance d, angle φ

• symmetric / asymmetric definition

$$P_{0^\circ, d}(a, b) = \big|\{[(k, l), (m, n)] \in D : k - m = 0,\ |l - n| = d,\ f(k, l) = a,\ f(m, n) = b\}\big|$$

$$P_{45^\circ, d}(a, b) = \big|\{[(k, l), (m, n)] \in D : (k - m = d,\ l - n = -d) \lor (k - m = -d,\ l - n = d),\ f(k, l) = a,\ f(m, n) = b\}\big|$$

$$P_{90^\circ, d}(a, b) = \big|\{[(k, l), (m, n)] \in D : |k - m| = d,\ l - n = 0,\ f(k, l) = a,\ f(m, n) = b\}\big|$$

$$P_{135^\circ, d}(a, b) = \big|\{[(k, l), (m, n)] \in D : (k - m = d,\ l - n = d) \lor (k - m = -d,\ l - n = -d),\ f(k, l) = a,\ f(m, n) = b\}\big|$$


10.4 Co-occurrence matrices – an example

Gray-scale image, 4 intensity levels:

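$$\begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 2 & 2 & 3 & 3 \end{bmatrix}$$

Co-occurrence matrices:

$$P_{0^\circ, 1} = \begin{bmatrix} 4 & 2 & 1 & 0 \\ 2 & 4 & 0 & 0 \\ 1 & 0 & 6 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \qquad
P_{135^\circ, 1} = \begin{bmatrix} 2 & 1 & 3 & 0 \\ 1 & 2 & 1 & 0 \\ 3 & 1 & 0 & 2 \\ 0 & 0 & 2 & 0 \end{bmatrix}$$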
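The example matrices can be reproduced with a short counting routine. This sketch assumes k indexes rows and l indexes columns, and uses the symmetric definition in which every pixel pair is counted in both orders; the function name and offset convention are illustrative.

```python
def cooccurrence(image, dk, dl, levels):
    """Symmetric co-occurrence counts for pixel pairs separated by (dk, dl)."""
    rows, cols = len(image), len(image[0])
    P = [[0] * levels for _ in range(levels)]
    for k in range(rows):
        for l in range(cols):
            m, n = k + dk, l + dl
            if 0 <= m < rows and 0 <= n < cols:
                a, b = image[k][l], image[m][n]
                P[a][b] += 1      # each pair is counted in both orders,
                P[b][a] += 1      # which makes the matrix symmetric
    return P

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
P_0 = cooccurrence(image, 0, 1, levels=4)      # 0 degrees, d = 1
P_135 = cooccurrence(image, 1, 1, levels=4)    # 135 degrees, d = 1
```

Offset (0, 1) pairs horizontal neighbors and offset (1, 1) pairs neighbors along the main diagonal, matching the 0° and 135° definitions.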


10.5 Haralick features from co-occurrence matrix

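• energy: $\sum_{a,b} P_{\phi,d}^2(a, b)$

• entropy: $\sum_{a,b} P_{\phi,d}(a, b) \log P_{\phi,d}(a, b)$

• maximum probability: $\max_{a,b} P_{\phi,d}(a, b)$

• contrast: $\sum_{a,b} |a - b|^{\kappa} P_{\phi,d}^{\lambda}(a, b)$

• inverse difference moment: $\sum_{a,b;\, a \neq b} \dfrac{P_{\phi,d}^{\lambda}(a, b)}{|a - b|^{\kappa}}$

• correlation: $\dfrac{\sum_{a,b} [ab\, P_{\phi,d}(a, b)] - \mu_x \mu_y}{\sigma_x \sigma_y}$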
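A few of these features can be sketched as follows. The code assumes that the raw counts are first normalized to probabilities, and it picks κ = 2 and λ = 1 as defaults; both choices are assumptions, since the slide leaves them open. Correlation is omitted for brevity.

```python
import math

def haralick(P, kappa=2, lam=1):
    """A few features from a co-occurrence matrix P of raw counts."""
    total = sum(sum(row) for row in P)
    p = [[v / total for v in row] for row in P]     # normalize to probabilities
    cells = [(a, b, p[a][b]) for a in range(len(p)) for b in range(len(p))]
    return {
        "energy": sum(v * v for _, _, v in cells),
        # No leading minus sign, as in the list above, so the value is <= 0.
        "entropy": sum(v * math.log(v) for _, _, v in cells if v > 0),
        "max_probability": max(v for _, _, v in cells),
        "contrast": sum(abs(a - b) ** kappa * v ** lam for a, b, v in cells),
        "inverse_difference_moment": sum(v ** lam / abs(a - b) ** kappa
                                         for a, b, v in cells if a != b),
    }

P_0 = [[4, 2, 1, 0],
       [2, 4, 0, 0],
       [1, 0, 6, 1],
       [0, 0, 1, 2]]
features = haralick(P_0)
```

The input is the 0° matrix of the example in 10.4, so the features describe how strongly horizontally adjacent pixels agree.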


10.6 Edge frequency (15.1.3,14.1.3)

• average gradient magnitude can be calculated with varying scale d:

$$g(d) = |f(i, j) - f(i + d, j)| + |f(i, j) - f(i - d, j)| + |f(i, j) - f(i, j + d)| + |f(i, j) - f(i, j - d)|$$

• compare with the autocorrelation function: minima of g(d) correspond to maxima of the autocorrelation and vice versa

• first and second order edge statistics can be characterized:

– coarseness: finer texture ∼ higher number of edge pixels

– contrast: higher contrast ∼ stronger edges

– randomness: entropy of edge magnitude histogram

– directivity: histogram of edge directions


• more edge statistic features

– linearity: sequential edge pairs with same direction

– periodicity: parallel edge pairs with same direction

– size: parallel edge pairs with opposite directions


10.7 Run length statistics (15.1.4,14.1.4)

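• $B(a, r)$: number of runs/primitives of length $r$ and value $a$ in an $M \times N$ image

• total number of runs: $K = \sum_{a=1}^{L} \sum_{r=1}^{N_r} B(a, r)$

• short primitives emphasis: $\dfrac{1}{K} \sum_{a=1}^{L} \sum_{r=1}^{N_r} \dfrac{B(a, r)}{r^2}$

• long primitives emphasis: $\dfrac{1}{K} \sum_{a=1}^{L} \sum_{r=1}^{N_r} B(a, r)\, r^2$

• gray-level uniformity: $\dfrac{1}{K} \sum_{a=1}^{L} \Big( \sum_{r=1}^{N_r} B(a, r) \Big)^2$

• primitive length uniformity: $\dfrac{1}{K} \sum_{r=1}^{N_r} \Big( \sum_{a=1}^{L} B(a, r) \Big)^2$

• primitive percentage: $\dfrac{K}{\sum_{a=1}^{L} \sum_{r=1}^{N_r} r\, B(a, r)} = \dfrac{K}{MN}$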
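Run length statistics can be sketched as below, assuming runs are measured along rows only and gray levels are used directly as dictionary keys. The function names and the test image are illustrative.

```python
def run_lengths(image):
    """B[(a, r)]: number of horizontal runs of gray level a and length r."""
    B = {}
    for row in image:
        start = 0
        for j in range(1, len(row) + 1):
            # A run ends at the end of the row or where the value changes.
            if j == len(row) or row[j] != row[start]:
                key = (row[start], j - start)
                B[key] = B.get(key, 0) + 1
                start = j
    return B

def run_features(image):
    """Total number of runs K and a few of the features listed above."""
    B = run_lengths(image)
    M, N = len(image), len(image[0])
    K = sum(B.values())
    return {
        "K": K,
        "short_emphasis": sum(n / r ** 2 for (a, r), n in B.items()) / K,
        "long_emphasis": sum(n * r ** 2 for (a, r), n in B.items()) / K,
        "primitive_percentage": K / (M * N),
        "covered_pixels": sum(r * n for (a, r), n in B.items()),
    }

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = run_features(image)
```

The entry covered_pixels is only a consistency check: the runs partition the image, so the sum of r·B(a, r) equals MN, which is why the primitive percentage reduces to K/MN.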


10.8 Laws’ texture energy measures (15.1.5,14.1.5)

• Laws’ texture energy masks can measure

– grayvalues

– edges

– spots

– waves

• three one-dimensional masks:

L3 = (1, 2, 1), E3 = (−1, 0, 1), S3 = (−1, 2,−1)

• their one-dimensional convolutions:

L3 ∗ L3 = L5 = (1, 4, 6, 4, 1)

L3 ∗ E3 = E5 = (−1,−2, 0, 2, 1)

L3 ∗ S3 = S5 = (−1, 0, 2, 0,−1)

S3 ∗ S3 = R5 = (1,−4, 6,−4, 1)

E3 ∗ S3 = W5 = (1,−2, 0, 2,−1)


• two-dimensional outer-products of the one-dimensional masks, e.g.:

$$L_5^T \times S_5 = \begin{bmatrix} -1 & 0 & 2 & 0 & -1 \\ -4 & 0 & 8 & 0 & -4 \\ -6 & 0 & 12 & 0 & -6 \\ -4 & 0 & 8 & 0 & -4 \\ -1 & 0 & 2 & 0 & -1 \end{bmatrix}$$

• energy (squared sum) of the response is calculated after convolution

• 25 masks can be used to create 25-dimensional feature vector
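The 5-element masks and the 2-D example mask can be regenerated from the three basic masks. This is a small sketch with hand-written convolution and outer-product helpers; the helper names are not from the slides.

```python
def convolve(a, b):
    """Full 1-D discrete convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def outer(col, row):
    """2-D mask whose entry (i, j) is col[i] * row[j]."""
    return [[c * r for r in row] for c in col]

L3, E3, S3 = [1, 2, 1], [-1, 0, 1], [-1, 2, -1]
L5, E5, S5 = convolve(L3, L3), convolve(L3, E3), convolve(L3, S3)
R5, W5 = convolve(S3, S3), convolve(E3, S3)
L5S5 = outer(L5, S5)            # the 5x5 example mask
```

Taking all ordered pairs of the five 1-D masks in the same way gives the 25 two-dimensional masks mentioned above.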


10.9 Other statistical methods (15.1.6–8,14.1.6–7)

• fractal texture description

• wavelets, Gabor transforms, wavelet energy signatures

• morphological methods: erosion, opening

• texture transform f(x, y) −→ g(x, y)

• autoregression texture models

• peak and valley method

• Markov random fields


10.10 Syntactic texture descriptions (15.2.1,14.2.1)

• description of a surface with a set of texture primitives and rules

• real-world textures are non-deterministic

• shape chain grammars

– texture synthesis

– terminal symbols Vt

– non-terminal symbols Vn

– start symbol S

– set of rules R

[Figure: example shape chain grammar showing Vt, Vn, S and the rules R]


10.11 Graph grammars (15.2.2,14.2.2)

• comparison between 2D texture primitive graphs

• recognition of a set of visual primitives

• thresholding of distances between texture primitives

• formation of a graph describing the texture

• comparison between graph of input image and stored grammar models

1) 1D chains of the graph compared with the grammar

2) stochastic grammar of graphs

3) direct graph comparison


10.12 Primitive grouping and hierarchical textures (15/14.2.3)

• many textures are in fact hierarchical

• can be studied in different scales

• bottom-up texture primitive grouping

• detection of homogeneous texture regions


10.13 Hybrid texture description methods (15.3,14.3)

• combinations of statistical and syntactic approaches

• weak textures:

– division of the image into homogeneous regions

– statistical analysis of region shapes and sizes

• strong textures:

– spatial relations between texture primitives

– primitive sizes one pixel or larger

• hierarchical multi-level description of textures


10.14 Application areas for texture analysis (15.4,14.4)

• remote sensing data:

– yield of crops and forests

– localization of diseased forests

– vegetation type classification

– land cover typification

– recognition of cloud types

• X-ray diagnostics: lung diseases, etc.

• industrial quality inspection, e.g. in paper mills


LECTURE #8, 2.3.2015

Learning goals: After this lecture the student should be able to

• understand the role and importance of segmentation in computer vision

• apply thresholding-based segmentation methods

• understand the basics of edge-based segmentation

• implement border tracing algorithms for binary images

• understand the concept of extended boundary

• analyze the difficulties of border detection in gray-scale images

• use the A-algorithm


11. Segmentation

• splitting image into semantically meaningful regions

• complete segmentation

– disjoint regions correspond uniquely with objects in the image

– information from higher-level processing stages

• partial segmentation

– similarity between pixels and regions, homogeneity

• segmentation methods

– thresholding: global knowledge concerning the whole image

– edge-based segmentation

– region-based segmentation

– template matching


11.1 Thresholding methods in segmentation (6.1/5.1)

• complete segmentation into $S$ regions $R_1, \ldots, R_S$:

$$R = \bigcup_{i=1}^{S} R_i \quad \text{and} \quad R_i \cap R_j = \emptyset,\ \forall i \neq j$$

• selection of a global threshold T

• values larger / smaller than T are background / object

• difficult to find an efficient global solution

• ⇒ segmentation in partial images

• background or object can also be a range of values

• creation of an edge image with a narrow range of values

• many simultaneous value ranges


Thresholding can be based on

• grayvalues

• gradient

• texture

• motion

• something else

Threshold selection methods (6.1.1/5.1.1)

• histogram analysis and filtering, possibly in many scales

• is the total area of the objects known?

• uni-, bi- or multi-modal histogram?

• local maxima, minimum distance between them

• histogram extracted from small gradient pixels only

• uni-modal histogram from large gradient pixels


Optimal thresholding (6.1.2/5.1.2)

• model of the distribution needed

• fitting of normal distributions in the histogram

• iterative selection of the parameters

• initial guess for the background from image corners
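The iterative idea can be sketched with a simplified scheme that alternates between splitting the pixels at T and resetting T to the midpoint of the two class means. This is a mean-based simplification, not a fit of normal distributions, and the way the corner pixels enter the initial guess is an assumption of this sketch.

```python
def iterative_threshold(values, corner_values, max_iter=100):
    """Iterate T = (mean of pixels <= T + mean of pixels > T) / 2 until stable."""
    def mean(xs):
        return sum(xs) / len(xs)

    # Initial guess: midpoint between the corner (background) mean and the global mean.
    T = (mean(corner_values) + mean(values)) / 2
    for _ in range(max_iter):
        low = [v for v in values if v <= T]
        high = [v for v in values if v > T]
        if not low or not high:
            break
        new_T = (mean(low) + mean(high)) / 2
        if new_T == T:
            break
        T = new_T
    return T

pixels = [10, 12, 11, 13, 200, 205, 198, 12, 11, 202]   # hypothetical gray values
T = iterative_threshold(pixels, corner_values=[10, 12])
```

For this clearly bimodal data the iteration settles after two passes at the midpoint between the dark and bright class means.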

Segmentation of multi-spectral images (6.1.3/5.1.3)

• each channel segmented separately

– channel-wise histogram peaks with lower and upper limits

– union of boundaries from all channels

– iterative division of the created regions

• multi-dimensional histograms

• classification of n-dimensional pixels


Thresholding in hierarchical data structures (6.1.4/-)

• computational efficiency

• removal of noise

• lowering in data pyramid onto higher-resolution level

• detection of important pixels on all levels in 3 × 3 neighborhoods

• threshold is fixed on higher levels

• same or updated threshold is used on lower levels


11.2 Edge-based segmentation (6.2/5.2)

• edge-based methods are important historically and in practice

• edges need to be developed to boundaries and regions

• importance of a priori knowledge

• comparison between detected edges and model predictions

• methods and topics:

– thresholding of gradient magnitude or another edge measure

– non-maximal suppression

– hysteresis

– relaxation

– border tracing

– use of location information

– region construction from borders


Edge relaxation (6.2.2/5.2.2)

• comparison of edge information between pixels

• iteration until coherence between neighboring pixels reached

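• crack edges

[Figure: crack edge e with the neighboring crack edges a, b, c at one end and d, g, f at the other; labels h and i also appear]

• types of crack edges

[Figure: example crack edge types 0-0, 1-1, 2-0 and 3-3 for an edge e]

• each crack edge has a confidence value 0 ≤ c^(k)(e) ≤ 1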


• classification of crack edge types for each end (a, b, c and d, g, f)

a ≥ b ≥ c, m = max(a, 0.1)

vertex type = the k in {0, 1, 2, 3} for which type(k) is largest

type(0) = (m − a)(m − b)(m − c)

type(1) = a(m − b)(m − c)

type(2) = ab(m − c)

type(3) = abc

• modification of each crack edge confidence value

– 0–0, 0–2, 0–3: c^(k)(e) decreases

– 0–1: c^(k)(e) increases slightly

– 1–1: c^(k)(e) increases strongly

– 1–2, 1–3: c^(k)(e) increases moderately

– 2–2, 2–3, 3–3: no change

• iteration should not be too long, possibly non-linear roundings towards 0 and 1
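The vertex-type classification and the direction of the confidence update can be sketched as below. The function names `vertex_type` and `confidence_action` are hypothetical, and the actual size of each update step is not specified on the slide, so the sketch only reports the direction of the change.

```python
def vertex_type(a, b, c, q=0.1):
    """Return the vertex type 0..3 from the three confidences at one end."""
    a, b, c = sorted((a, b, c), reverse=True)  # enforce a >= b >= c
    m = max(a, q)                              # q = 0.1 as on the slide
    scores = [
        (m - a) * (m - b) * (m - c),  # type(0): no edge continues
        a * (m - b) * (m - c),        # type(1): one edge continues
        a * b * (m - c),              # type(2): two edges continue
        a * b * c,                    # type(3): three edges continue
    ]
    return max(range(4), key=scores.__getitem__)


def confidence_action(type_i, type_j):
    """Direction of the confidence change for an edge of type i-j."""
    pair = tuple(sorted((type_i, type_j)))
    if pair in {(0, 0), (0, 2), (0, 3)}:
        return "decrease"
    if pair in {(2, 2), (2, 3), (3, 3)}:
        return "unchanged"
    return "increase"  # 0-1, 1-1, 1-2, 1-3
```

Sorting inside `vertex_type` is a convenience so that the caller need not pre-order the three confidences.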


Border tracing (6.2.3/5.2.3)

• aim is to find a route around the object

• simple for binary images: ≈ change of presentation to chain code

• difficult for gray-scale images

• inner/outer border/boundary have different lengths

• 8- and 4-neighbors mixed on the opposite sides of the boundary

• starting from the top-left corner of the object

[Figure: direction codes 0–7 for 8-connectivity and 0–3 for 4-connectivity]

• outer boundary from the tested non-object pixels

• one pixel can belong to the boundary more than once: length?


Extended boundary

• neighboring regions don’t have common inner/outer boundary pixels

• solution from extended boundary

• in top and left follows inner boundary

• in bottom and right follows outer boundary

• boundary length is intuitively correct

• border tracing algorithm with 12 basic 3×3-sized cases


Border detection in gray-scale images

• much more difficult than for binary images

• gradient or other edge image is created first

– current boundary direction is continued to locate more edge pixels

– gradient directions are compared between neighboring pixels

– gray-value tells whether we are inside or outside the object

• turns and weak edges cause difficulties

• closed boundary can remain undetected

• heuristic search

– starting from the strongest edges

– continuation in backward and forward directions


11.3 Border detection as graph searching (6.2.4/5.2.4)

• “A-algorithm”

• some amount of a priori knowledge needed: starting and end points assumed known for detecting optimal path between them

• directed weighted graph

• list of all open nodes, each node listed at most once

• full path cost estimate f(ni) = g(ni) + h(ni)

• the lowest cost estimate f(ni) expanded to new nodes

• if forward direction is known, one tries to follow it

• straightening of the path and image with geometric warping?


Optimal and heuristic search

• generally g(ni) is the real cost up to that node

• the estimate of the remaining cost, written h̃(ni) here to distinguish it from the true remaining cost h(ni), can affect the speed of search

• if h̃(ni) = 0, optimal result is guaranteed

• if h̃(ni) > h(ni), some result is obtained fast

• if 0 < h̃(ni) < h(ni), optimal result, iff c(np, nq) ≥ h̃(np) − h̃(nq)

• if h̃(ni) = h(ni), optimal result with minimal computation

• cost function f() can contain

– strength of edges, inverse of gradient

– difference between gradient directions of succeeding nodes

– distance to a priori assumed location

– distance to end point
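A minimal sketch of the graph search described above, assuming the graph is given as a dictionary mapping each node to a list of (successor, cost) pairs. The names `a_algorithm`, `graph` and `h` are illustrative only. With the default h = 0 the search reduces to a uniform-cost search, which matches the case where an optimal result is guaranteed.

```python
import heapq


def a_algorithm(graph, start, goal, h=lambda node: 0.0):
    """Return (path, cost) of the cheapest path found from start to goal."""
    open_heap = [(h(start), 0.0, start)]  # entries are (f, g, node)
    best_g = {start: 0.0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)  # lowest f = g + h first
        if node in closed:
            continue  # stale duplicate of an already expanded node
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g
        closed.add(node)
        for successor, cost in graph.get(node, ()):
            new_g = g + cost
            if successor not in best_g or new_g < best_g[successor]:
                best_g[successor] = new_g
                parent[successor] = node
                heapq.heappush(open_heap, (new_g + h(successor), new_g, successor))
    return None, float("inf")
```

Because expanded nodes are never reopened in this sketch, a non-zero estimate should satisfy the consistency condition c(np, nq) ≥ h(np) − h(nq) for the result to stay optimal. Nodes must be mutually comparable (for example strings) so that heap ties can be broken.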


Node pruning for search speedup

• is the total cost estimate too high?

• is the average cost per path length too high?

• favoring of the largest minimum or smallest maximum along path?

• favoring of the smallest increase?

– breadth first → depth first

– a lower bound of the cost is obtained: h(ni)

Search for a closed boundary

• selecting one pixel as both starting and end node

• starting into opposite directions

• paths meet (hopefully) on the opposite side


Border detection as dynamic programming (6.2.5/5.2.5)

• principle of optimality: all subpaths of an optimal path are also optimal

• one can use a set of starting and end nodes

• entering direction and cumulated cost stored in each node

[Figure: dynamic programming example on a graph with nodes A–I between start and end; intermediate labels D(B,2), E(A,2), F(B,1), then G(E,5), H(F,3), I(E,7)]

C(x_k^{m+1}) = min_i ( C(x_i^m) + g^m(i, k) )

C(x_k^{m+1}) = min_{i=−1,0,1} ( C(x_{k+i}^m) + g^m(i, k) )

min( C(x^1, x^2, · · · , x^M) ) = min_{k=1,...,n} ( C(x_k^M) )
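The recurrence with predecessors i = −1, 0, 1 can be sketched as follows. As a simplifying assumption, the partial cost g^m(i, k) is taken to be just the cost of the destination node, so the input is a table `cost[m][k]` of node costs per layer m. The function name `dp_border` is hypothetical.

```python
def dp_border(cost):
    """Return (total cost, node index per layer) of the cheapest path."""
    n_layers, n = len(cost), len(cost[0])
    cumulative = [list(cost[0])]   # C(x_k^1) is the node cost itself
    back = [[None] * n]            # entering direction stored per node
    for m in range(1, n_layers):
        row, pointers = [], []
        for k in range(n):
            # Predecessors k-1, k, k+1 in the previous layer, if they exist.
            candidates = [(cumulative[m - 1][k + i], k + i)
                          for i in (-1, 0, 1) if 0 <= k + i < n]
            best, previous = min(candidates)
            row.append(best + cost[m][k])
            pointers.append(previous)
        cumulative.append(row)
        back.append(pointers)
    # Cheapest end node in the last layer, then trace the pointers back.
    k = min(range(n), key=cumulative[-1].__getitem__)
    total = cumulative[-1][k]
    path = [k]
    for m in range(n_layers - 1, 0, -1):
        k = back[m][k]
        path.append(k)
    return total, path[::-1]
```

For border detection the node costs would typically be low where the edge strength is high, for example an inverted gradient magnitude.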


LECTURE #9, 9.3.2015

Learning goals: After this lecture the student should be able to

• describe border detection as a dynamic programming problem

• use Hough transform

• understand the general techniques of region-based segmentation

• use splitting and merging methods


11.4 Hough transforms (6.2.6/5.2.6)

• search for shapes in parameter space

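[Figure: line y = kx + q through the points A = (x1, y1) and B = (x2, y2) in the image plane (x, y); in the parameter space (k, q) the lines q = −k x1 + y1 and q = −k x2 + y2 intersect at C = (k′, q′)]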

• arbitrary curve equation f(x, a) = 0, e.g. x cos θ + y sin θ − s = 0

• rather few than many parameters, only object’s size and shift

• limiting the search with a priori knowledge, e.g. edge direction

r^2 = (x1 − a)^2 + (x2 − b)^2

a = x1 −R cos(ψ(x))

b = x2 −R sin(ψ(x))

ψ(x) ∈ [φ(x)−∆φ, φ(x) + ∆φ]
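A minimal sketch of the accumulation step for straight lines, using the normal form x cos θ + y sin θ − s = 0 mentioned above. The function name `hough_lines`, the number of angle bins and the quantization step for s are illustrative assumptions.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, s_step=1.0):
    """Vote in a (theta index, quantized s) accumulator for each point."""
    accumulator = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            s = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(t, round(s / s_step))] += 1
    return accumulator
```

Peaks in the accumulator correspond to lines supported by many points. Neighboring cells often receive nearly as many votes, so a practical detector would smooth the accumulator or apply local-maximum detection before picking lines.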


Generalized Hough transform

• non-parametric shape relative to a reference point (Fig 6.38/5.39)

• edge direction determines possible locations of the reference point

• finds the size S and rotation τ of an object of the known shape

• gradient magnitude ∆A, R-table:

φ1: (r_1^1, α_1^1), (r_1^2, α_1^2), . . . , (r_1^{n1}, α_1^{n1})

φ2: (r_2^1, α_2^1), (r_2^2, α_2^2), . . . , (r_2^{n2}, α_2^{n2})

· · ·

φk: (r_k^1, α_k^1), (r_k^2, α_k^2), . . . , (r_k^{nk}, α_k^{nk})

x1^R = x1 + r(φ) S cos(α(φ) + τ)

x2^R = x2 + r(φ) S sin(α(φ) + τ)

A(x^R, τ) = A(x^R, τ) + ∆A


Fuzzy Hough transform (6.2.6/?)

• first the generalized Hough transform as earlier

• reference point of the fuzzy model is solved first

• exact locations of the borders are specified iteratively

• previous video frame produces the initial model for the next frame

Benefits of Hough transforms

• also applicable to partially occluded objects

• many objects can be detected concurrently

• tolerant to noise

• parallel implementation is possible


Border detection using location information (6.2.7/5.2.7)

• a priori knowledge about location, possibly from lower resolution

• search in the neighborhood of the assumed border

• sometimes starting and ending points are known

• recursive splitting in two

• “divide and conquer”

[Figure: divide-and-conquer border detection between points A and B, with intermediate points labeled 1–4]


Region construction from borders (6.2.8/5.2.8)

• comparing borders with threshold contours

– multiple thresholds

– good matchings are sought for

– partial edges are expanded to full borders

• superslice method

– searching for the opposite border

– orthogonal direction to the edge direction

– up to maximum distance M

– edge directions need to fulfill

π/2 < |(φ(x) − φ(y)) mod (2π)| < 3π/2

– intermediate points are marked

– filtering with 3×3 mask


11.5 Region-based segmentation (6.3/5.3)

• important to define homogeneity criterion for a region

• homogeneity inside regions, heterogeneity between them

Region merging (6.3.1/5.3.1)

• initial state: separate pixels, each in its own segment

• combining first neighboring pixels with same gray-scale values

• conditional merging of adjacent regions


Region merging as state space search

• super grid, × = image pixel, • = crack edge:

[Figure: super grid in which image pixels × are interleaved with crack edge elements •]

• thresholding of significant edges

v_ij = 0, if |f(x_i) − f(x_j)| < T1; v_ij = 1, otherwise

• counting the number W of weak edges on the separating boundary

• removing (melting) separating boundary if

W / min(l_i, l_j) ≥ T2 or W / l ≥ T3 or W ≥ T4


Region splitting (6.3.2/5.3.2)

• starting from the whole image in one segment

• segments are split into smaller ones according to some criteria

• criteria can include e.g. histogram peaks and existing prominent edges

Splitting and merging (6.3.3/5.3.3)

• quadtrees

• how is merging of nodes 03, 1, 30 and 310 implemented in Fig 6.46/5.47?

• use of overlapping trees

– each child linked to the most probable parent

– content of the parents recalculated after reassignment of children


Single-pass split and merge

• top-to-bottom, left-to-right scan of the image plane

• 12 templates of size 2×2 pixels

• criterion can be e.g. mean of 4 pixel variance

• each pixel is given a segment label from a neighbor, or a new one

• possible contradictions are solved online or afterwards

• segments are merged if they are homogeneous

H(R1 ∪R2) = TRUE

|m1 −m2| < T

• sensitive to the order of operation, i.e. scanning pattern


LECTURE #10, 16.3.2015

Learning goals: After this lecture the student should be able to

• understand and implement template matching

• characterize different shape description techniques

• implement some boundary-based shape descriptors

• explain the principle of Fourier descriptors

• understand segment-based boundary descriptions


Watershed segmentation (6.3.4/5.3.4)

• analogous to geographical watersheds and water basins

• edge/border areas assumed to have larger values than inner parts

• “water” is allowed to rise, i.e. the used threshold is increased

• pixels are merged in the basin areas

• too low watersheds are raised with “flood dams”

Region growing post-processing (6.3.5/5.3.5)

• bottom-up segmentation results are seldom optimal as such

• many different heuristic methods can be used

• comparison between output of region growing and detected edges


11.6 Segmentation from template matching (6.4/5.4)

• correlating with a partial image

• different matching criteria:

C1(u, v) = 1 / max_{(i,j)∈V} |f(i + u, j + v) − h(i, j)|

C2(u, v) = 1 / Σ_{(i,j)∈V} |f(i + u, j + v) − h(i, j)|

C3(u, v) = 1 / Σ_{(i,j)∈V} (f(i + u, j + v) − h(i, j))^2

• correlating in the Fourier plane?

• order of matching is important: efficient termination of summation?

• processing on different resolutions

• more precise search around points that match well
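The three matching criteria can be sketched as below for images stored as lists of rows. The function names are hypothetical. One deliberate deviation is labeled in the code: 1 is added to each denominator so that a perfect match does not divide by zero, which is an assumption and not part of the formulas as written on the slide.

```python
def match_scores(f, h, u, v):
    """Return (C1, C2, C3) for template h placed at offset (u, v) in f."""
    diffs = [f[i + u][j + v] - h[i][j]
             for i in range(len(h)) for j in range(len(h[0]))]
    abs_diffs = [abs(d) for d in diffs]
    # Assumption: +1 in each denominator avoids division by zero.
    c1 = 1.0 / (1 + max(abs_diffs))
    c2 = 1.0 / (1 + sum(abs_diffs))
    c3 = 1.0 / (1 + sum(d * d for d in diffs))
    return c1, c2, c3


def best_match(f, h):
    """Exhaustively search all offsets and return the one maximizing C3."""
    offsets = [(u, v)
               for u in range(len(f) - len(h) + 1)
               for v in range(len(f[0]) - len(h[0]) + 1)]
    return max(offsets, key=lambda p: match_scores(f, h, *p)[2])
```

The exhaustive search is the simplest option; the early termination and multi-resolution ideas listed above would reduce the number of offsets that need a full summation.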


12. Shape description

• 3D or 2D shape described

• description for (qualitative) recognition / (quantitative) analysis

• characterizations of the methods

– input representation form: boundary / area

– object reconstruction ability

– incomplete shape description ability

– mathematical / heuristic techniques

– statistical / syntactic descriptions

– invariances to shift, rotation, scaling and resolution changes


12.1 Methods and stages in image analysis (8/6)


12.2 Region identification from pixel labels (8.1/6.1)

• two-pass algorithm

– if pixel label exists above or left, it is used

– if label does not exist, new one is assigned

– if above and left have different labels, regions are marked for combination

– second pass combines regions that have more than one label

• can be formed directly from run-length encoding

• can be formed from quadtree representation
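The two-pass algorithm above can be sketched for a binary image with 4-connectivity. The function name `label_regions` is hypothetical, and the label equivalences are recorded with a small union-find table, which is one common choice and not necessarily the bookkeeping used in the course book.

```python
def label_regions(image):
    """Label 4-connected foreground regions of a binary image (list of rows)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[label] for union-find; index 0 is the background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: take the label from above or left, or create a new one.
    for r in range(h):
        for c in range(w):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = left
                root_up, root_left = find(up), find(left)
                if root_up != root_left:  # mark the regions for combination
                    parent[max(root_up, root_left)] = min(root_up, root_left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))
                labels[r][c] = len(parent) - 1

    # Second pass: replace each label by a compact label of its root.
    remap = {}
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels
```

A U-shaped region is a useful check, because its two arms first receive different labels that are merged only when the scan reaches the bottom row.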


12.3 Boundary-based description (8.2/6.2)

• coordinates for boundary representation: xy, rφ or nθ

[Figure: boundary coordinate systems: rectangular (x, y), polar (r, φ) and tangential (n, θ)]

• 4/8 chain codes, difference code, what is the starting point?

• geometric representations

– boundary length

– direction histogram

– curvature ∼ number of turns


– bending energy BE = (1/L) Σ_{k=1}^{L} c^2(k)

– signature, normal distance to opposite border point

– chord distribution

[Figure: chord between two boundary points with displacements ∆x and ∆y]


Fourier descriptors (8.2.3/6.2.3)

• Fourier transform of the boundary coordinates

z(t) = Σ_n T_n e^{int}, t = 2πs/L

T_n = (1/L) ∫_0^L z(s) e^{−i(2π/L)ns} ds

• discrete case

a_n = (1/(L − 1)) Σ_{m=1}^{L−1} x_m e^{−i(2π/(L−1))nm}

b_n = (1/(L − 1)) Σ_{m=1}^{L−1} y_m e^{−i(2π/(L−1))nm}

• rotation invariance r_n = (|a_n|^2 + |b_n|^2)^{1/2}

• scale invariance w_n = r_n / r_1

• tangent coordinates
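A minimal sketch of the discrete descriptors w_n, assuming the boundary is given as a list of distinct (x, y) points so that the list length plays the role of L − 1 in the sums. The function name `fourier_descriptors` is hypothetical.

```python
import cmath
import math


def fourier_descriptors(boundary, n_max):
    """Return [w_1, ..., w_n_max] with w_n = r_n / r_1 for a closed boundary."""
    count = len(boundary)  # corresponds to L - 1 in the sums

    def coefficient(values, n):
        return sum(v * cmath.exp(-2j * math.pi * n * m / count)
                   for m, v in enumerate(values, start=1)) / count

    xs = [x for x, _ in boundary]
    ys = [y for _, y in boundary]
    # r_n combines the magnitudes of a_n and b_n (rotation invariant).
    r = [math.hypot(abs(coefficient(xs, n)), abs(coefficient(ys, n)))
         for n in range(1, n_max + 1)]
    return [r_n / r[0] for r_n in r]  # division by r_1 removes scale
```

Because the n = 0 term is left out, the descriptors are also unaffected by translation. The sketch assumes r_1 is non-zero, which holds for ordinary simple closed boundaries.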


Boundary description with segment sequences (8.2.4/6.2.4)

• polygonal representation by split&merge

• tolerance interval method

[Figure: tolerance interval method: boundary points x1, x2, x3 approximated by line segments within tolerance e]


• recursive boundary splitting

• division in constant curvature pieces

[Figure: boundary divided into constant-curvature pieces labeled a, b, c and d]

• scale-space methods

• curvature primal sketch


LECTURE #11, 23.3.2015

Learning goals: After this lecture the student should be able to

• use B-splines for boundary description

• explain what 3D shape invariants are

• use some scalar region-based descriptors

• calculate moments of binary shapes

• understand region decomposition with convex hull, skeletons and shape primitives

• analyze the difficulties of region neighbor definitions

• understand the general problem setting of object recognition

• name some principles of knowledge representation


B-spline representation (8.2.5/6.2.5)

• piecewise polynomial curves

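x(s) = Σ_{i=0}^{n} v_i B_i(s)

[Figure: four example curves, panels labeled n=1, n=2, n=3 and n=3]

• most often 3rd degree polynomials

[Figure: cubic basis functions B_{i−1}(s), B_i(s), B_{i+1}(s) and B_{i+2}(s) on the interval from i to i+1, with the values 4/6, 1/6 and 0 marked]

C0(t) = t^3 / 6

C1(t) = (−3t^3 + 3t^2 + 3t + 1) / 6

C2(t) = (3t^3 − 6t^2 + 4) / 6

C3(t) = (−t^3 + 3t^2 − 3t + 1) / 6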
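The four cubic blending polynomials can be evaluated directly, as sketched below. The function names are hypothetical, and the pairing of C3, C2, C1, C0 with the control points v_{i−1}, v_i, v_{i+1}, v_{i+2} is an assumption based on C0 starting from zero and C3 decaying to zero on the segment.

```python
def cubic_bspline_blends(t):
    """Return (C0, C1, C2, C3) for a local parameter t in [0, 1]."""
    c0 = t ** 3 / 6
    c1 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    c2 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    c3 = (-t ** 3 + 3 * t ** 2 - 3 * t + 1) / 6
    return c0, c1, c2, c3


def bspline_point(v_prev, v_i, v_next, v_next2, t):
    """Point on the segment between knots i and i+1 (assumed pairing)."""
    c0, c1, c2, c3 = cubic_bspline_blends(t)
    return tuple(c3 * a + c2 * b + c1 * c + c0 * d
                 for a, b, c, d in zip(v_prev, v_i, v_next, v_next2))
```

The blends sum to one for every t, so the curve stays inside the convex hull of the four active control points. At t = 0 they take the values 0, 1/6, 4/6 and 1/6.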


Other contour-based shape descriptions (8.2.6/6.2.6)

• Hough transforms

• moments

• fractal descriptions

• morphological methods

• geometrical correlation function

• shape recognition with neural networks


3D shape invariants (8.2.7/6.2.7)

• 3D descriptions that are invariant to changes in projection

• for example: cross ratio of four collinear points I = ((A − C)(B − D)) / ((A − D)(B − C))

[Figure: four collinear points A, B, C, D and their projections A′, B′, C′, D′ on another line]

• active and timely topic of research
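The invariance of the cross ratio can be checked numerically, as in the sketch below. Points on a line are represented by scalar coordinates, and a projection between two lines is modeled by a one-dimensional projective map x → (px + q)/(rx + s). The function names and the default coefficients are illustrative assumptions.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio I of four collinear points given by scalar positions."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def projective_map(x, p=2.0, q=1.0, r=0.5, s=3.0):
    """1D projective transformation; requires p*s - q*r != 0."""
    return (p * x + q) / (r * x + s)
```

Applying `projective_map` to all four points leaves `cross_ratio` unchanged up to rounding, although distances and ratios of distances between the points do change.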


12.4 Region-based description (8.3/6.3)

• description of the region as a whole or in parts

• skeletons, division of regions

• characteristics of the descriptions:

– shift and rotation invariant descriptions

– invariant to small changes in region shapes

– intuitive techniques

– many descriptions fit mostly for structural/syntactic recognition


Simple scalar descriptors (8.3.1/6.3.1)

• area can be calculated from chain code coordinates:

A = (1/2) | Σ_{k=0}^{n−1} (i_k j_{k+1} − i_{k+1} j_k) |

• Euler’s number (Genus, Euler-Poincaré) ν = S − N

• horizontal and vertical projections, height and width from them

• eccentricity: ratio between the maximum dimension and its perpendicular dimension

• elongatedness: A/(2d)^2

• rectangularity: maximum of the ratio of the area and surrounding rectangle

• direction can be calculated from moments: θ = (1/2) tan^{−1}( 2µ_11 / (µ_20 − µ_02) )

• compactness: l^2/A
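The area formula and the compactness measure can be sketched as follows, assuming the boundary is available as an ordered list of (i, j) vertex coordinates, with the index k + 1 wrapping around to the first vertex. The function names are hypothetical.

```python
def polygon_area(vertices):
    """Area from ordered boundary coordinates (i_k, j_k), closed implicitly."""
    n = len(vertices)
    total = 0
    for k in range(n):
        i_k, j_k = vertices[k]
        i_next, j_next = vertices[(k + 1) % n]
        total += i_k * j_next - i_next * j_k
    return abs(total) / 2


def compactness(border_length, area):
    """Compactness l^2 / A; smallest for a circle (4*pi)."""
    return border_length ** 2 / area
```

For a 4 × 4 square the area is 16 and the border length 16, giving a compactness of 16, somewhat above the circle's 4π ≈ 12.57.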


Moments in shape description (8.3.2/6.3.2)

• moments m_pq = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} i^p j^q f(i, j)

• central moments µ_pq = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} (i − m_10/m_00)^p (j − m_01/m_00)^q f(i, j)

• scaled central moments η_pq = µ′_pq / (µ′_00)^{(p+q)/2 + 1}, where µ′_pq = µ_pq / α^{p+q+2}

• normalized un-scaled central moments ϑ_pq = µ_pq / (µ_00)^{(p+q)/2 + 1}

• Hu’s moment invariants

ϕ1 = ϑ_20 + ϑ_02

ϕ2 = (ϑ_20 − ϑ_02)^2 + 4ϑ_11^2

ϕ3 = (ϑ_30 − 3ϑ_12)^2 + (3ϑ_21 − ϑ_03)^2

ϕ4 = (ϑ_30 + ϑ_12)^2 + (ϑ_21 + ϑ_03)^2

• boundary moments from the distance to center of mass: m_r = (1/N) Σ_{i=1}^{N} z(i)^r
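The moment definitions can be sketched for an image stored as a list of rows, where f(i, j) is the value in row i and column j. The function names are hypothetical, and only the first two of Hu's invariants are computed to keep the example short.

```python
def raw_moment(f, p, q):
    """m_pq = sum over all pixels of i^p * j^q * f(i, j)."""
    return sum(i ** p * j ** q * value
               for i, row in enumerate(f) for j, value in enumerate(row))


def central_moment(f, p, q):
    """mu_pq, computed relative to the center of mass."""
    m00 = raw_moment(f, 0, 0)
    i_c = raw_moment(f, 1, 0) / m00
    j_c = raw_moment(f, 0, 1) / m00
    return sum((i - i_c) ** p * (j - j_c) ** q * value
               for i, row in enumerate(f) for j, value in enumerate(row))


def normalized_moment(f, p, q):
    """Normalized un-scaled central moment: mu_pq / mu_00^((p+q)/2 + 1)."""
    return central_moment(f, p, q) / central_moment(f, 0, 0) ** ((p + q) / 2 + 1)


def hu_phi1_phi2(f):
    """First two of Hu's moment invariants."""
    t20 = normalized_moment(f, 2, 0)
    t02 = normalized_moment(f, 0, 2)
    t11 = normalized_moment(f, 1, 1)
    return t20 + t02, (t20 - t02) ** 2 + 4 * t11 ** 2
```

For a binary shape m_00 is simply the pixel count, and the two invariants stay the same when the shape is shifted inside the image.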


Convex hull of region (8.3.3/6.3.3)

Region concavity tree

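[Figure: region S with concavity regions S1–S5 and their sub-regions S11, S12, S51, S52, and the corresponding concavity tree with root S, children S1–S5 and leaves S11, S12, S51, S52]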


Region representation with a skeleton (8.3.4/6.3.4)

• skeleton, medial axis transform, thinning

• skeleton extraction by thinning

– Hi(R): inner boundary of region R

– Ho(R): outer boundary of region R

– S(R) ⊂ R: pixels of R whose 8-neighbors all belong to Hi(R) ∪ R^C

Rnew = S(Rold) ∪ [Rold − Hi(Rold)] ∪ [Ho(S(Rold)) ∩ Rold]

[Figure: thinning example with region pixels marked X and *]


Skeleton construction from medial axis

• medial axis: same minimum distance to at least two boundaries

• distance stored in the skeleton pixel

Region graph construction

• pixel types: end points, node points, normal points

• end and node points −→ graph nodes

• normal points −→ graph arcs


Region decomposition into shape primitives (8.3.5/6.3.5)

• region is segmented in primary convex sub-regions or kernels

• mutual relations of the sub-regions are described with a graph

• each graph node contains the following information:

– type of the node (primary sub-regions or kernel)

– number of vertices

– area

– main axis direction

– center of gravity


Region neighborhood graph (8.3.6/6.3.6)

• representation of the relations of sub-regions of a region (or image)

• sub-regions don’t need to be adjacent

• expressions for spatial relations:

– to the left/right of, above/below

– close to, between

• examples of definitions of “A is to the left of B”

– all Ai are to the left of all Bi

– at least one Ai is to the left of some Bi

– A’s center of gravity is to the left of that of B

– previous AND A’s rightmost pixel is to the left of B’s rightmost pixel

[Figure: example configurations of regions A and B]


13. Object recognition

• many machine vision tasks involve object recognition

• structural versus statistical versus soft computing methods

13.1 Knowledge representation (9.1/7.1)

• simple methods for complex data

• descriptions, features

• grammars, languages

• predicate logic

• production rules

• fuzzy logic

• semantic nets

• frames, scripts

[Figure: semantic net example with relations inside, below and left-of, and properties circular, horizontal and vertical]


LECTURE #12, 30.3.2015

Learning goals: After this lecture the student should be able to

• understand the general problem setting of image understanding

• be familiar with different control strategies in image understanding

• use active contour models, point distribution models and principal component analysis

• use statistical pattern recognition methods in image understanding

• explain discrete and probabilistic scene labeling

• understand semantic image segmentation

• use simple differential motion analysis techniques

• understand the basic principles of optical flow

• name methods used for finding interest point correspondence


13.2 Statistical pattern recognition (9.2/7.2)

• classification of quantitative object descriptions

• object classes, classification, classifiers

• classification function, discrimination function

• pattern, pattern space, pattern vector

• feature, feature space, feature vector

• (linear) separability, clustering

• minimum distance principle

• error criterion, optimal Bayes classifier

• training set, validation set, testing set

• probability density estimation methods

• direct optimization / regression methods

• support vector machines

• clustering: K-means, ISODATA


Dichotomies of statistical pattern recognition

[Figure (Jain&Mao 1994): tree of dichotomies. Branching criteria: number of training samples (infinite / finite), training samples labeled / unlabelled, form of density function known / unknown, number of pattern classes known / unknown. Leaves: Bayes decision rule, “optimal” rules, plug-in rules, density estimation, k-NN rules, mixture resolving, cluster analysis]


13.3 Neural network classifiers (9.3/7.3)

• supervised / unsupervised learning

• parametric / semi-parametric / non-parametric methods

• prototype-based classifiers, support vector machines

• perceptron, non-linear feed-forward networks

• error back-propagation

• competitive learning, self-organizing map

• recognition as an optimization task, Hopfield net

• hybrid classifiers


13.4 Syntactic pattern recognition (9.4/7.4)

• classification of qualitative object descriptions

• primitives and the relational structure between them

• rules of thumb concerning primitives

– small number, but enough for appropriate object representation

– easily segmentable and recognizable

– should correspond with significant elements of the object

• main groups of grammars:

– general, context-sensitive, context-free, regular

• non-deterministic, stochastic, fuzzy

• top-down / bottom-up matching

• pruning of the search tree, backtracking

• syntactic classifier learning, grammar inference: enumeration, induction


13.5 Recognition as graph matching (9.5/7.5)

• exact matching of graphs, isomorphism

– graph–graph

– graph–sub-graph

– sub-graph–sub-graph

– graph partitioning

• non-exact matching

– similarity measures between two graphs

– Levenshtein distance between strings

– deletions, insertions and substitutions


• spring / energy minimization models
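The Levenshtein distance listed under non-exact matching can be sketched with the standard dynamic programming formulation, where deletions, insertions and substitutions each cost one. The function name `levenshtein` is hypothetical, and unit costs are an assumption; weighted variants are equally possible.

```python
def levenshtein(s, t):
    """Minimum number of deletions, insertions and substitutions from s to t."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # deletion of a
                current[j - 1] + 1,          # insertion of b
                previous[j - 1] + (a != b),  # substitution, free if equal
            ))
        previous = current
    return previous[-1]
```

The same recurrence applies to any sequences of primitives, for example chain codes or syntactic shape descriptions, because only equality tests between elements are used.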


13.6 Optimization techniques (9.6/7.6)

• parameters used for object description need to be optimized

• difficult due to typically non-convex objective functions

f : D → R, fmin(x) = min_{x∈D} f(x), fmax(x) = max_{x∈D} f(x)

• natural and real number parameters

• mutual dependencies between parameters

• iterative optimization methods

• high probability of getting stuck in local extrema

• genetic algorithms

• simulated annealing


14. Image understanding

• image interpretation, scene analysis

• even humans need practicing

• the highest and most difficult stage of computer vision

• interaction between lower and higher level processing stages needed

• top-down hypotheses: formulation, testing, correction

Topics:

• different control strategies

• active contour models

• pattern recognition in image understanding

• scene labeling and constraint propagation

• semantic segmentation and understanding

14.1 Control strategies (10.1/8.1)

• controlling the interaction between processing stages

• parallel / serial execution

• bottom-up / top-down in data and abstraction hierarchy

• non-hierarchical blackboard/daemon control

• hybrid approaches

System example: coronary angiograms (10.1.5/8.1.5)

1) ↓ interactive detection of the vessel centerline

2) ↑ image edge detection in high resolution

3) ↑ local edge direction estimation

4) ↓ cost matrix for pairwise edge directions

5) ↑ low-resolution image and cost matrix

6) ↓ searching low-resolution symmetric border pairs

7) ↓ accurate border positions in high resolution

8) ↑ transform from the straightened image to the original

9) ↓ diagnosis

14.2 Active contour models aka snakes (7.2/8.2)

• minimization of the spline model’s energy, iterative search

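• total energy $E^*_{snake} = \int_0^1 E_{snake}(v(s))\,ds$

• internal energy $E_{int} = \alpha(s) \left| \frac{dv}{ds} \right|^2 + \beta(s) \left| \frac{d^2 v}{ds^2} \right|^2$

• image energy $E_{image} = w_{line} E_{line} + w_{edge} E_{edge} + w_{term} E_{term}$

• line energy $E_{line} = f(x, y)$

• edge energy $E_{edge} = -|\nabla f(x, y)|^2$

• termination energy $E_{term} = \frac{\partial \psi}{\partial n_R} = \frac{\partial^2 g / \partial n_R^2}{\partial g / \partial n}$

• boundary conditions $E_{con}(v(s))$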

• stabilization

• snake stretching and fitting

• inflating balloon
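
To make the internal energy term concrete, the sketch below evaluates a discretized $E_{int}$ for a closed contour using finite differences. It assumes unit spacing between contour points and constant $\alpha$ and $\beta$; the function name is chosen for this example. The image and constraint energies are left out.

```python
def snake_internal_energy(points, alpha=1.0, beta=1.0):
    """Discrete internal energy of a closed contour given as a list of (x, y) tuples."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x_prev, y_prev = points[i - 1]        # wraps around for i == 0
        x, y = points[i]
        x_next, y_next = points[(i + 1) % n]
        # |dv/ds|^2 approximated by the forward difference v[i+1] - v[i]
        d1 = (x_next - x) ** 2 + (y_next - y) ** 2
        # |d2v/ds2|^2 approximated by v[i+1] - 2 v[i] + v[i-1]
        d2 = (x_next - 2 * x + x_prev) ** 2 + (y_next - 2 * y + y_prev) ** 2
        total += alpha * d1 + beta * d2
    return total


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shrunk = [(0.5 * x, 0.5 * y) for x, y in square]
print(snake_internal_energy(square), snake_internal_energy(shrunk))
```

The shrunk contour has lower internal energy, which is why a snake without image forces or a balloon force tends to contract.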

14.3 Point distribution models, PDMs (10.3/8.3)

• PDMs can be used for semi-parametric shape representation

• set of M similar training shapes

• N landmark points extracted from boundary of each training shape

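• each boundary produces a 2N-dimensional point distribution vector

$x = (x_1, y_1, x_2, y_2, \ldots, x_N, y_N)^T$

• point distribution vector can be translated, scaled and rotated

$T_{s,\theta,t_x,t_y}(x) = s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$

• point distribution vector $x_2$ aligned with model $x_1$ by minimizing

$\min_{s,\theta,t_x,t_y} E = \| x_1 - T_{s,\theta,t_x,t_y}(x_2) \|$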
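
The alignment of two point distribution vectors has a closed-form least-squares solution in 2D. The sketch below is one way to compute it, assuming the unweighted squared Euclidean norm as the error and treating each landmark as a complex number; the function names `align_similarity` and `transform` are chosen for this example.

```python
import cmath


def align_similarity(x1, x2):
    """Find (s, theta, tx, ty) so that T(x2) is closest to x1 in the least-squares sense.

    x1 and x2 are flat lists (x_1, y_1, ..., x_N, y_N) with corresponding landmarks.
    """
    p = [complex(x1[i], x1[i + 1]) for i in range(0, len(x1), 2)]
    q = [complex(x2[i], x2[i + 1]) for i in range(0, len(x2), 2)]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # With centered points, scaling and rotation are one complex factor c = s * exp(i * theta)
    num = sum((pi - p_mean) * (qi - q_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(qi - q_mean) ** 2 for qi in q)
    c = num / den
    t = p_mean - c * q_mean  # translation that matches the centroids
    return abs(c), cmath.phase(c), t.real, t.imag


def transform(x, s, theta, tx, ty):
    """Apply T_{s,theta,tx,ty} to every landmark of a flat point distribution vector."""
    c = cmath.rect(s, theta)
    out = []
    for i in range(0, len(x), 2):
        z = c * complex(x[i], x[i + 1]) + complex(tx, ty)
        out += [z.real, z.imag]
    return out


square = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
model = transform(square, 2.0, 0.3, 1.0, -2.0)
print(align_similarity(model, square))
```

In the usage example the model is an exact similarity transform of the square, so the recovered parameters should match the ones used to create it up to rounding.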

14.4 Principal component analysis, PCA (3.2.10/8.3)

• Hotelling / Karhunen-Loeve transform, KLT

• PCA can be used for fitting point distribution models

• dimensionality reduction for a high-dimensional data set

• eigenvectors of the data’s covariance matrix used in linear transform

• the linear transform is $y = A(x - m_x)$

• in PCA, rows of the transform matrix $A$ are eigenvectors $e_{x,i}$ of $C_x$

• according to the eigenequation $C_x e_{x,i} = \lambda_{x,i} e_{x,i}$

• $C_x$ is the covariance matrix of the data set $x$ and $m_x$ is its mean

$C_x = E\{(x - m_x)(x - m_x)^T\}$

• the inverse transform is $x' = A^T y + m_x$

• squared reconstruction error $E\|x' - x\|^2$ is minimized by PCA
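
The transform pair above can be tried out with a small pure-Python sketch that keeps only the first principal component. The leading eigenvector of the covariance matrix is estimated with power iteration, which is a simplification chosen for this example (a linear algebra library would normally be used), and all function names are hypothetical.

```python
import math


def pca_first_component(data, iters=200):
    """Return the mean m_x and the leading unit eigenvector of the covariance C_x."""
    n, d = len(data), len(data[0])
    m = [sum(col) / n for col in zip(*data)]
    C = [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in data) / n for b in range(d)]
         for a in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector and that C is not the zero matrix.
    e = [1.0] * d
    for _ in range(iters):
        e = [sum(C[a][b] * e[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in e))
        e = [v / norm for v in e]
    return m, e


def project(x, m, e):
    """y = A (x - m_x) with A consisting of the single row e."""
    return sum(ei * (xi - mi) for ei, xi, mi in zip(e, x, m))


def reconstruct(y, m, e):
    """x' = A^T y + m_x."""
    return [ei * y + mi for ei, mi in zip(e, m)]


# Hypothetical data lying on a line in 3D, so one component is enough
data = [(1.0 + t, 3.0 + 2.0 * t, 5.0) for t in (-2, -1, 0, 1, 2)]
m, e = pca_first_component(data)
x = data[4]
print(reconstruct(project(x, m, e), m, e))
```

Because the toy data has only one direction of variation, the one-component reconstruction reproduces each sample; with real shape data the discarded components contribute to the reconstruction error.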

14.5 Example: metacarpal bones, PCA+PDM (3.2.10/8.3)

14.6 Pattern recognition in image understanding (10.5/8.4)

• formation of a statistical feature vector for each pixel

– SIFT, SURF, HoG features

• pixel matching / classification / clustering

• utilization of context information

– noise reduction, e.g. by median filtering or histograms

– second classification of each pixel and its neighborhood

– merging of homogeneous regions before classification

– feature extraction from pixel neighborhoods

– combination of spectral and spatial information

[Figure: diagrams with boxes labeled "image" and "labels"]

14.7 Scene labeling and constraint propagation (10.7/8.5)

• aiming at consistent interpretation of the image

• discrete / probabilistic labeling

• regions, attributes, relations

• regions $R_i$, $i = 1, \ldots, N$, labels $\Omega = \{\omega_1, \ldots, \omega_R\}$

• moving from local constraints to image level

• relaxation in constraint propagation

• discrete relaxation

– attributes are discrete Boolean values: is / is not

– first all regions are given all labels

– impossible labels are removed one by one

Discrete relaxation: example (10.7.1/8.5.1)

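[Figure: a scene of six regions numbered 1–6. Initially every region carries the full label set WTPDB; after relaxation the remaining labels are B, W, T, P and D (twice).]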

a. window (W) is rectangular

b. table (T) is rectangular

c. drawer (D) is rectangular

d. phone (P) is above table

e. drawer is inside table

f. background (B) is adjacent to the border

Probabilistic relaxation (10.7.2/8.5.2)

• always produces some solution

• support for label ωk in region θi at iteration step s:

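$Q^s(\theta_i = \omega_k) = \sum_{j=1}^{N} c_{ij}\, q_j^s(\theta_i = \omega_k), \qquad \sum_{j=1}^{N} c_{ij} = 1$

$\phantom{Q^s(\theta_i = \omega_k)} = \sum_{j=1}^{N} c_{ij} \sum_{l=1}^{R} r(\theta_i = \omega_k, \theta_j = \omega_l)\, P^s(\theta_j = \omega_l)$

• linear relaxation

$P^0(\theta_i = \omega_k) = P(\theta_i = \omega_k \mid X_i)$

$P^{s+1}(\theta_i = \omega_k) = Q^s(\theta_i = \omega_k) \quad \forall i, k$

• non-linear relaxation

$P^{s+1}(\theta_i = \omega_k) = \frac{1}{K} P^s(\theta_i = \omega_k)\, Q^s(\theta_i = \omega_k)$

$K = \sum_{l=1}^{R} P^s(\theta_i = \omega_l)\, Q^s(\theta_i = \omega_l)$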
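
One step of the non-linear relaxation can be sketched directly from the formulas. In this minimal example the probabilities are stored as `P[i][k]`, the weights as `c[i][j]`, and the compatibility $r$ is passed as a function `r(i, k, j, l)`; these data layouts, the function names and the toy numbers are assumptions made for illustration.

```python
def support(P, c, r, i, k):
    """Q^s(theta_i = omega_k): weighted support from all regions j."""
    n, n_labels = len(P), len(P[0])
    return sum(c[i][j] * sum(r(i, k, j, l) * P[j][l] for l in range(n_labels))
               for j in range(n))


def nonlinear_step(P, c, r):
    """Compute P^{s+1} from P^s with the normalized product rule."""
    new_P = []
    for i in range(len(P)):
        Q = [support(P, c, r, i, k) for k in range(len(P[i]))]
        K = sum(p * q for p, q in zip(P[i], Q))  # assumed non-zero here
        new_P.append([p * q / K for p, q in zip(P[i], Q)])
    return new_P


# Toy setting: two regions, two labels, equal labels are more compatible
P0 = [[0.6, 0.4], [0.9, 0.1]]
c = [[0.0, 1.0], [1.0, 0.0]]


def r(i, k, j, l):
    return 1.0 if k == l else 0.2


P1 = nonlinear_step(P0, c, r)
print(P1)
```

In the toy setting the first label gains probability in both regions, because each region supports the label that its neighbor already favors.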

Relaxation as optimization problem

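Maximization of F:

$F = \sum_{k=1}^{R} \sum_{i=1}^{N} P(\theta_i = \omega_k) \sum_{j=1}^{N} c_{ij} \sum_{l=1}^{R} r(\theta_i = \omega_k, \theta_j = \omega_l)\, P(\theta_j = \omega_l)$

subject to

$\sum_{k=1}^{R} P(\theta_i = \omega_k) = 1 \quad \forall i, \qquad P(\theta_i = \omega_k) > 0 \quad \forall i, k$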

Image interpretation as tree search (10.7.3/8.5.3)

• number of image regions = number of layers in search tree

• leaves of the tree correspond to different full image labelings

14.8 Semantic image segmentation (10.8/8.6)

• region adjacency graph and its dual

[Figure: an image with regions 1–5, its region adjacency graph and the dual graph]

• iterative updating of data structures

• semantic region growing

• merging of adjacent regions

• aiming at maximizing objective function F

• the most probable interpretation is always fixed

15. Motion analysis

• a collection of diverse problem settings and algorithms

– detection of motion

– detection of a moving object

– extraction of 3D properties of the object

• assumptions concerning the object’s motion

– the maximal speed is known

– the maximal acceleration is small

– the motion is uniform / the object is rigid

– mutual correspondence between reference points

15.1 Differential motion analysis methods (16.1/15.1)

• difference image

$d(i, j) = \begin{cases} 0 & \text{if } |f_1(i, j) - f_2(i, j)| \le \varepsilon \\ 1 & \text{otherwise} \end{cases}$

• object–background, object–another object, object–object, noise

• cumulative difference image

$d_{cum}(i, j) = \sum_{k=1}^{n} a_k\, |f_1(i, j) - f_k(i, j)|$

• static reference image and its composition from pieces
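
Both difference images are simple pixelwise operations. The sketch below implements them for images stored as nested lists of grayvalues; the function names and the tiny test frames are made up for illustration.

```python
def difference_image(f1, f2, eps):
    """d(i, j) = 0 where the frames differ by at most eps, 1 elsewhere."""
    return [[0 if abs(a - b) <= eps else 1 for a, b in zip(row1, row2)]
            for row1, row2 in zip(f1, f2)]


def cumulative_difference(frames, weights):
    """d_cum(i, j) = sum_k a_k |f_1(i, j) - f_k(i, j)| with frames[0] as the reference f_1."""
    ref = frames[0]
    height, width = len(ref), len(ref[0])
    d_cum = [[0.0] * width for _ in range(height)]
    for frame, a_k in zip(frames, weights):
        for i in range(height):
            for j in range(width):
                d_cum[i][j] += a_k * abs(ref[i][j] - frame[i][j])
    return d_cum


f1 = [[10, 10], [10, 10]]
f2 = [[10, 50], [10, 10]]
f3 = [[10, 10], [10, 90]]
print(difference_image(f1, f2, eps=5))
print(cumulative_difference([f1, f2, f3], weights=[1, 1, 2]))
```

Giving later frames larger weights $a_k$, as in the usage example, makes recent motion stand out in the cumulative image.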

15.2 Optical flow (16.2/15.2)

• it is assumed that

– each point’s illumination is constant

– neighboring points have similar grayvalues

• modeling f() by using Taylor’s series

$f(x + dx, y + dy, t + dt) = f(x, y, t) + f_x\,dx + f_y\,dy + f_t\,dt + O(\partial^2)$

• locating matching image areas with different t

$f(x + dx, y + dy, t + dt) = f(x, y, t) \;\Rightarrow\; -f_t = f_x \dot{x} + f_y \dot{y}$

• we aim at solving the speed vector for each pixel

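$c = (\dot{x}, \dot{y}) = (u, v) \;\Rightarrow\; -f_t = f_x u + f_y v = c \cdot \nabla f$

• smoothness conditions incorporated with Lagrange coefficient λ

$E^2(x, y) = (f_x u + f_y v + f_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2)$

• solution, where $\bar{u}$ and $\bar{v}$ are the mean values of $u$ and $v$ in a neighborhood of $(x, y)$

$u = \bar{u} - f_x \frac{P}{D}, \quad v = \bar{v} - f_y \frac{P}{D}, \quad P = f_x \bar{u} + f_y \bar{v} + f_t, \quad D = \lambda^2 + f_x^2 + f_y^2$

• relaxation with Gauss–Seidel iteration

$u^k(i, j) = \bar{u}^{k-1}(i, j) - f_x(i, j) \frac{P(i, j)}{D(i, j)}$

$v^k(i, j) = \bar{v}^{k-1}(i, j) - f_y(i, j) \frac{P(i, j)}{D(i, j)}$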
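
The iteration can be sketched in pure Python for small images. The example takes the derivative images $f_x$, $f_y$, $f_t$ as given, uses the mean of the four nearest neighbors with replicated borders for the local averages, and evaluates $P$ as $f_x \bar{u} + f_y \bar{v} + f_t$, i.e. the data term of $E^2$ at the neighborhood means. These choices and the function names are assumptions of this sketch.

```python
def neighbor_mean(a, i, j):
    """Mean of the 4-neighbors of a[i][j], replicating values at the border."""
    h, w = len(a), len(a[0])
    return (a[max(i - 1, 0)][j] + a[min(i + 1, h - 1)][j]
            + a[i][max(j - 1, 0)] + a[i][min(j + 1, w - 1)]) / 4.0


def optical_flow(fx, fy, ft, lam=1.0, iters=100):
    """Estimate the flow (u, v) for every pixel from the derivative images."""
    h, w = len(fx), len(fx[0])
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                # Values are updated in place, so pixels visited earlier in
                # this sweep already contribute their new values.
                u_bar = neighbor_mean(u, i, j)
                v_bar = neighbor_mean(v, i, j)
                P = fx[i][j] * u_bar + fy[i][j] * v_bar + ft[i][j]
                D = lam ** 2 + fx[i][j] ** 2 + fy[i][j] ** 2
                u[i][j] = u_bar - fx[i][j] * P / D
                v[i][j] = v_bar - fy[i][j] * P / D
    return u, v


# Hypothetical derivatives of a horizontal ramp moving one pixel per frame to the right
h, w = 4, 5
fx = [[1.0] * w for _ in range(h)]
fy = [[0.0] * w for _ in range(h)]
ft = [[-1.0] * w for _ in range(h)]
u, v = optical_flow(fx, fy, ft)
print(u[0][0], v[0][0])
```

For the ramp in the usage example the estimate approaches $u = 1$, $v = 0$, which is the flow that satisfies $-f_t = f_x u + f_y v$ with a perfectly smooth field.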

15.3 Optical flow in motion analysis (16.2.4/15.2.4)

• four elementary movement types

– translation at constant distance

– translation in depth: approaching / drawing away

– rotation with axis aligned with view axis

– rotation with axis perpendicular to view axis

• perspective image $(x', y') = \left( \frac{x_0 + ut}{z_0 + wt}, \frac{y_0 + vt}{z_0 + wt} \right)$

• focus of expansion, FOE: $x'_{FOE} = \left( \frac{u}{w}, \frac{v}{w} \right)$

• D(t) = 2D distance from the FOE in the image plane

• speed in the image plane V(t) = dD/dt

$\frac{D(t)}{V(t)} = \frac{z(t)}{w(t)}$

• z distance can be solved for any pixel

$z_2(t) = \frac{z_1(t)\, V_1(t)\, D_2(t)}{D_1(t)\, V_2(t)}$

• for all points it holds that

$x(t) = \frac{x'(t)\, w(t)\, D(t)}{V(t)}, \quad y(t) = \frac{y'(t)\, w(t)\, D(t)}{V(t)}, \quad z(t) = \frac{w(t)\, D(t)}{V(t)}$
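
The depth relations follow from $D(t)/V(t) = z(t)/w(t)$ and can be checked numerically. The sketch below uses made-up values for $w$, $D$ and $V$; the function names are chosen for this example.

```python
def depth_from_foe(D, V, w):
    """z = w * D / V, from D(t) / V(t) = z(t) / w(t)."""
    return w * D / V


def relative_depth(z1, D1, V1, D2, V2):
    """Depth of a second point from a known depth z1: z2 = z1 V1 D2 / (D1 V2)."""
    return z1 * V1 * D2 / (D1 * V2)


def world_point(x_img, y_img, D, V, w):
    """World coordinates (x, y, z) of an image point (x', y')."""
    z = depth_from_foe(D, V, w)
    return x_img * z, y_img * z, z


w = 2.0                                   # assumed speed along the view axis
z1 = depth_from_foe(D=10.0, V=4.0, w=w)
z2 = depth_from_foe(D=6.0, V=1.0, w=w)
print(z1, z2, relative_depth(z1, 10.0, 4.0, 6.0, 1.0))
```

The second function needs no knowledge of $w$: one known depth is enough to scale the depths of all other points that share the same motion.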

15.4 Correspondence of interest points (16.3/15.3)

• interest points are detected and traced in video frames

• a priori knowledge about the maximal speed

• a sparse field of speed vectors is formed

• selection of the interest points

– special pixels: edges, corners

– e.g. Moravec detector

– or Zuniga–Haralick / Kitchen–Rosenfeld detector

– Laplacian or Difference of Gaussians (LoG/DoG)

– Determinant of Hessian (DoH)

• matching of the interest points

– first non-1-1 matching of points $x_m$ and $y_n$

– each point pair has probability of match $P_{mn}$

– consistency between the closest neighbor pairs, relaxation

• explicit markers used e.g. in crash test dummies

• 2D dynamic programming can be applied in matching

EXAM GUIDE

In the exam you may have a pen, paper and a calculator capable of trigonometric and logarithmic calculations. No tables or formula books are needed.

Third Edition book (x.xx/· · · )

The importance of the Third Edition book’s chapters in Spring 2015’s teaching and exam:

• Chapters 1–6 belong to the course’s central content.

– sections 2.4–2.5, 3.2.5–3.2.9, 3.4.3, 5.3.11, 6.5 were not treated

• Chapter 7 presents material mostly beyond the course requirements.

– only sections 7.2–7.2.1 were treated

• Chapter 8 belongs to the course’s central content.

– section 8.2.7 was not treated in detail

• Chapter 9 belongs to pattern recognition and neural networks courses. The most important sections for Computer Vision course are 9.1, 9.4 and 9.5.

– section 9.5.1’s algorithms were not treated

• Chapter 10 is central content.

– sections 10.2, 10.4, 10.6, 10.8.2, 10.9–10.10 were not treated

• Chapters 11–12 were treated superficially compared to the amount of text in the book. Lecture slides have references to book sections and give a hint what parts were treated and which were not.

• Chapter 13 is central content.

• Chapter 14 belongs to digital image processing course and is not included in Computer Vision course’s exam.

• Chapters 15–16 are central content.

– sections 15.1.6–15.1.8, 16.4–16.6 were not treated

Second Edition book (· · · /x.xx)

The importance of the Second Edition book’s chapters in Spring 2015’s teaching and exam:

• Chapters 1–6 belong to the course’s central content.

– section 5.5 was not treated

– section 6.2.7 was not treated in detail

• Chapter 7 belongs to pattern recognition and neural networks courses. The most important sections for Computer Vision course are 7.1, 7.4 and 7.5.

– section 7.5.1’s algorithms were not treated

• Chapter 8 is central content.

– section 8.6.2 was not treated

– section 8.7 was not treated

• Chapters 9–10 were treated superficially compared to the amount of text in the book. Lecture slides have references to book sections and give a hint what parts were treated and which were not.

• Chapter 11 is central content.

• Chapters 12–13 belong to digital image processing course and are not included in Computer Vision course’s exam.

• Chapters 14–15 are central content.

– section 14.1.6 was not treated

– sections 15.3.3–15.4.1 were not treated

• Chapter 16 was not treated
