F 9.09.09 INF 4300 - Forsiden - Universitetet i Oslo: From pixels to objects - Object representation



Page 1:

INF 4300

From pixels to objects: Object representation

Anne Solberg (anne@ifi.uio.no)

Introduction to image analysis
From pixels to connected components

Representing spatial objects and Fourier descriptors
Chapter 11.1, Gonzalez and Woods


How can we create a program that recognizes the numbers?

• Goal: get the series of digits, e.g. 1415926535897…

Steps in the program:
1. Segment the image to find digit pixels.
2. Find angle of rotation and rotate back.

(learn Hough-transform (21.10) or morphology (7.10))

3. Create region objects – one object per digit or connected component.

4. Compute features describing the shape of the digits (23.9).

5. Train a classifier on many objects of each digit.

6. Classify a new object as the one with the highest probability.

Focus of this lecture

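Steps 1, 3 and 4 of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the course code: the helper names are made up, thresholding is a fixed global threshold, and rotation correction and the classifier are omitted.

```python
def segment(img, t=128):
    """Step 1: global threshold -> binary foreground mask."""
    return [[1 if p > t else 0 for p in row] for row in img]

def label_components(mask):
    """Step 3: 4-connected component labeling by flood fill."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                n += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and mask[cy][cx] and not labels[cy][cx]:
                        labels[cy][cx] = n
                        stack += [(cy+1, cx), (cy-1, cx), (cy, cx+1), (cy, cx-1)]
    return labels, n

def area(labels, lab):
    """Step 4: one simple region feature, the area of a labeled region."""
    return sum(row.count(lab) for row in labels)
```

Feeding a tiny image with two bright blobs through `segment` and `label_components` yields two region objects whose areas can then be used as (very weak) features.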

Page 2:

Example 2: recognize the music symbols

• Goal: interpret the notes and symbols to create a MIDI-file and play.

Steps in the program:
1. Segment the image to find symbol pixels.
2. Find angle of rotation and rotate back.
3. Find the note lines and remove them.
4. Create region objects for connected components.
5. Match each object with the known object types to recognize symbol type (whole note, quarter note, rest, bar, accidentals, etc.)
6. For all notes: find note height given its vertical position.
7. Create a MIDI file from this.


From pixels to features – text recognition

• Input to the classifier is normally a set of features derived from the image data, not the image data itself.

• Why can’t we just use all the grey level pixels as they are for text recognition?
– Objects correspond to regions. We need the spatial relationship between the pixels.
– For text recognition: the information is in shape, not grey levels.
(Figure: pipeline from segmentation to region feature extraction. Region features: area, perimeter, curvature, moment of inertia, …)


Page 3:

More on the segmentation step
• Segmentation goal: get one object per digit.
• What you have already learned:
– Global thresholding and e.g. Otsu’s method to estimate the threshold (INF 2310).
• But this image has a non-homogeneous background. After global thresholding it looks like this…

• Better alternatives for segmentation are:

– Estimate the background e.g. using morphology (Oct. 7 2009)

– Use different thresholds in local windows and let the threshold values vary over the image (e.g. apply Otsu’s method in local windows).
– Use more sophisticated segmentation methods.
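The local-window alternative can be sketched directly: compute Otsu's threshold per window instead of once globally. A toy implementation, not the toolbox version: it brute-forces all 256 candidate thresholds, and the "windows" are horizontal bands for brevity.

```python
def otsu_threshold(pixels):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    best_t, best_var = 0, -1.0
    n = len(pixels)
    for t in range(1, 256):
        fg = [p for p in pixels if p >= t]
        bg = [p for p in pixels if p < t]
        if not fg or not bg:
            continue
        # weighted squared distance between the two class means
        var = (len(fg)/n) * (len(bg)/n) * (sum(fg)/len(fg) - sum(bg)/len(bg))**2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def local_thresholds(image, block):
    """Apply Otsu per block of rows, so the threshold varies over the image."""
    out = []
    for y0 in range(0, len(image), block):
        rows = image[y0:y0+block]
        t = otsu_threshold([p for row in rows for p in row])
        out += [[1 if p >= t else 0 for p in row] for row in rows]
    return out
```

On an image whose background brightens from top to bottom, the per-band thresholds adapt, while a single global threshold would misclassify one of the bands.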


Introduction to object recognition

• How do we teach a computer to recognize a certain type of object in a scene?

• Humans are very good at this, but teaching a computer this is very difficult!

• Presently, we can only teach a computer to discriminate between certain types of objects, e.g. recognize printed or hand-written characters

• We have to train the algorithm by telling it which object types we want it to recognize, and show many examples of each object.


Page 4:

Typical image analysis tasks
• Preprocessing/noise filtering
• Segmentation
• Feature extraction

– Are the original image pixel values sufficient for classification, or do we need additional features?

– What kind of features do we use to discriminate between the object types involved?

• Exploratory feature analysis and selection
– Which features separate the object classes best?
– How many features are needed?
• Classification

– From a set of object examples with known class, decide on a method that separates objects of different types (e.g. different characters). For example: fit the parameters mean and variance in a Gaussian distribution.

– For new objects: assign each object/pixel to the class with the highest probability.


• Validating classifier accuracy

Object recognition by supervised classification

• Supervised classification: compare an object with a set of objects of known type (e.g. a set of characters)
• Train the system by showing it many representative objects of each type
– Build a database of typical feature values for each object type.

values for each object type.• Classify a new object:

– For all object types:• Compute the probability that the new

object belongs to each of the object types.

• Classify the object by assigning the object label with the highest probability

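The train/classify loop above can be made concrete with a one-dimensional Gaussian model per class. This is a deliberate simplification (real systems use feature vectors and covariance matrices), and all names here are illustrative.

```python
import math

def train(examples):
    """Build the 'database': per class, fit mean and variance of one
    scalar feature from labeled training examples."""
    model = {}
    for label, feats in examples.items():
        m = sum(feats) / len(feats)
        v = sum((f - m) ** 2 for f in feats) / len(feats)
        model[label] = (m, max(v, 1e-9))   # guard against zero variance
    return model

def classify(model, feat):
    """Assign the class label with the highest Gaussian log-likelihood."""
    def loglik(mv):
        m, v = mv
        return -0.5 * math.log(2 * math.pi * v) - (feat - m) ** 2 / (2 * v)
    return max(model, key=lambda lab: loglik(model[lab]))
```

Training on a few examples of two digit classes and classifying a new feature value picks the class under which that value is most probable, exactly the last bullet above.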

Page 5:

What is feature extraction?

• Devijver and Kittler (1982): ”Extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class variability”.
– Within-class pattern variability: variance between objects belonging to the same class.
– Between-class pattern variability: variance between objects from different classes.
(Figure: the characters A B C D E F G H I J as example classes.)


Feature extraction
• We will discriminate between different object classes based on a set of features.
• The features are chosen given the application.
• Normally, a large set of different features is investigated.
• Classifier design also involves feature selection - selecting the best subset out of a large feature set.
• Given a training set of a certain size, the dimensionality of the feature vector must be limited.
• Careful selection of features is the most important step in image classification!


Page 6:

Describing the shape of a segmented object
Assumptions:
• We have a segmented, labeled image.
• Each object that is to be described has been identified during segmentation.
– Ideally, one region in the segmented image should correspond to one object.
– One object should not be fragmented into several non-connected regions.
– Some small noise objects will often occur; these can often be removed later.
– This lecture and the lecture in two weeks will describe how to find connected region objects and how to find mathematical numbers representing their shapes.

More assumptions for shape description
• The image objects can be represented as:
– grey level image (whole regions)
– binary image (whole regions)
– contours (region boundaries)
– through a run length code
– through a chain code
– in cartesian coordinates
– in polar coordinates
– in some other coordinates
– as coefficients of some transform (e.g. Fourier)

Page 7:

What is ”shape”?
• A numerical description of the spatial configurations in the image.
• There is no generally accepted methodology of shape description.
• Location and description of high curvature points give essential information about the shape.
(Figure: three curvature points; ellipses with different size and orientation; objects with more irregular shape.)

Is invariance needed?
• Can we recognize independently of size, position and orientation?
• Translation invariance
• Scale invariance
• Rotation invariance, but what about 6 and 9?
• Warp invariance
• For grey-level regions: invariance to contrast and mean gray level

Page 8:

Region identification
• Goal of segmentation was to achieve complete segmentation; now the regions must be labeled.
• Search the image pixel by pixel, and sequentially number each foreground pixel you find according to the labeling of its neighbors.
• The algorithm is basically the same in 4-connectivity and 8-connectivity, the only difference being in the neighborhood mask shape.
• Result of region identification is a matrix the same size as the image, with integers representing each pixel’s region label.
• This description of regions will be the input to our shape descriptors.
• All pixels in a foreground object (connected component) get the same label value; pixels in the next object get a different label, etc.

• Matlab function implementing this is bwlabel

• XITE: regionAnalyse
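The sequential numbering described above is the classic two-pass scheme: copy a label from an already-visited neighbor, record equivalences where two provisional labels meet, then resolve them in a second pass. A Python sketch of what bwlabel does, here in 4-connectivity:

```python
def two_pass_label(mask):
    """Two-pass connected component labeling with an equivalence table
    (union-find). mask[y][x] == 1 for foreground; returns a label matrix."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}                       # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    nxt = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            up = labels[y-1][x] if y else 0
            left = labels[y][x-1] if x else 0
            if not up and not left:   # new provisional label
                nxt += 1
                parent[nxt] = nxt
                labels[y][x] = nxt
            else:
                labels[y][x] = up or left
                if up and left and find(up) != find(left):
                    parent[find(up)] = find(left)   # record equivalence
    # second pass: replace each provisional label by its representative
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                r = find(labels[y][x])
                remap.setdefault(r, len(remap) + 1)
                labels[y][x] = remap[r]
    return labels
```

A U-shaped object shows why the equivalence table is needed: its two arms receive different provisional labels that the second pass merges into one region.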


Creating the region objects
• Start in the upper left corner.
• Find first object pixel.
• Traverse the border of this object
– (recursive programming efficient)
• Continue with other pixels not yet visited
• Implementing this is time consuming (but interesting)


Page 9:

Algorithm for boundary finding (11.1)

1. Let the starting point b0 be the uppermost left foreground pixel. Let c0 be the west neighbor (c0 is background). Check all neighbors of b0, starting at c0 in clockwise direction. Let the first foreground neighbor be b1, and let c1 be the background point preceding b1 in the search.
2. Set b = b1 and c = c1.
3. Let the 8-neighbors of b, starting at c, be numbered n1, …, n8. Find the first foreground pixel nk (the one with the lowest index k).
4. Let b = nk and c = nk-1.
5. Repeat step 3 until b = b0 and the next boundary point is b1. The sequence of boundary points is the sequence of points that b traverses.
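A Python sketch of this boundary-following procedure, using (row, col) image coordinates. One simplification to note: the stop test here is just b == b0, which is adequate for simple blobs, whereas the full step 5 also checks that the next point is b1.

```python
# The 8 neighbor offsets of a pixel, clockwise, starting at "west"
OFFS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_boundary(mask):
    """Follow the outer boundary of the first foreground blob in mask
    (1 = foreground); returns the boundary points in traversal order."""
    h, w = len(mask), len(mask[0])
    fg = lambda y, x: 0 <= y < h and 0 <= x < w and mask[y][x]
    # step 1: b0 = uppermost-left foreground pixel, c0 = its west neighbor
    b0 = next((y, x) for y in range(h) for x in range(w) if mask[y][x])
    b, c = b0, (b0[0], b0[1] - 1)
    boundary = [b0]
    while True:
        # step 3: scan the 8 neighbors of b clockwise, starting at c
        start = OFFS.index((c[0] - b[0], c[1] - b[1]))
        for i in range(1, 9):
            dy, dx = OFFS[(start + i) % 8]
            if fg(b[0] + dy, b[1] + dx):
                # step 4: c becomes the background point preceding the hit
                pdy, pdx = OFFS[(start + i - 1) % 8]
                c = (b[0] + pdy, b[1] + pdx)
                b = (b[0] + dy, b[1] + dx)
                break
        # step 5, simplified: stop when we are back at the start point
        if b == b0:
            return boundary
        boundary.append(b)
```

On a 2x2 square the trace visits the four pixels clockwise; a single isolated pixel is returned as a one-point boundary.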


Software for generating regions from a segmented image

• Matlab: Functions bwlabel and regionprops are the starting point:

B1=bwlabel(B); % B is a segmented binary image with object pixels = 255
D=regionprops(B1,'area','boundingbox'); % compute area and bounding box for all regions in the image
imshow(label2rgb(B1)); % display the regions in different colors
aa=[D.Area];
% See help regionprops for a list of all descriptors.
% Typical use: get a list of pixels for each region.
g=bwperim(B,8); % find perimeter pixels of all regions in binary image B

• XITE – Image processing library/software in C
– http://www.ifi.uio.no/forskning/grupper/dsb/Programvare/Xite/
– Main program: xshow <filename.biff>
– Region tools program: regionAnalyse
– Main function to call from your C-program: regionYline – creates the region objects.

Page 10:

Contour representation
• Goal of contour methods is to describe contours in images.
• Hopefully, our contour detection method delivers a sequence of pixel coordinates in the image!
• The contour can be represented as:
– Cartesian coordinates
– Polar coordinates from a reference point (usually image origin)
– Chain code and a starting point
• Connectivity: 4- or 8-neighbors
• Note: chain code is very sensitive to noise, image resolution and object rotation.
• Not well suited for classification directly, but useful for computation of other features.

Chains
• Chains represent objects within the image or borders between an object and the background.
(Figure: object pixels on a grid.)
• How do we define border pixels?
– 4-neighbors
– 8-neighbors
• In which order do we check the neighbors? Can vary between algorithms.
• Chains represent the object pixels, but not their spatial relationship directly.

Page 11:

Chain codes
• Chain codes represent the boundary of a region.
• Chain codes are formed by following the boundary in a given direction (e.g. clockwise) with 4-neighbors or 8-neighbors.
• A code is associated with each direction.
• A code is based on a starting point, often the upper leftmost point of the object.
(Figure: the 4-directional code with directions 0–3, and the 8-directional code with directions 0–7.)

Chain codes
(Figure: a contour with marked start point and search direction, coded with the 8-directional code.)
Chain code: 00077665555556600000064444444222111111223444565221
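Generating the 8-directional chain code from a traced boundary is a lookup of each step's direction. A sketch; the code circle follows the slide's convention (0 = east, codes increasing counter-clockwise), and since the row index grows downwards in image coordinates, "north" is a row decrease.

```python
# 8-directional code: 0 = east, 1 = north-east, ..., 7 = south-east,
# expressed as (drow, dcol) with the row index growing downwards.
DIR_CODE = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
            (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary):
    """Chain code of a closed boundary given as a list of (row, col) points."""
    codes = []
    for i in range(len(boundary)):
        y0, x0 = boundary[i]
        y1, x1 = boundary[(i + 1) % len(boundary)]  # wrap to close the contour
        codes.append(DIR_CODE[(y1 - y0, x1 - x0)])
    return codes
```

A 2x2 square traced clockwise from its upper-left pixel yields east, south, west, north, i.e. the code 0642.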


Page 12:

Using all border points vs. resampling the border
• If chain codes are generated from every point on the border, the chain will be both long and can be sensitive to small disturbances along the border.
• The border is often resampled to a larger grid before chain coding is done. The accuracy of the chain code depends on the grid size.
• Comment: Chain codes are good for border representation, but often too noisy for shape recognition.

Comments about the chain code
• The chain code depends on the starting point.
• It can be normalized for start point by treating it as a circular/periodic sequence, and redefining the starting point so that the resulting number is of minimum magnitude.
• It can also be normalized for rotation by using the first difference of the chain code instead of the chain code itself.
– Code: 10103322
– First difference: 3133030
• This invariance is only valid if the boundary itself is invariant to rotation and scale.
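Both normalizations above are a few lines each. A sketch: minimum-magnitude rotation compared lexicographically, and the non-circular first difference that reproduces the slide's 4-directional example (pass mod=8 for an 8-directional code).

```python
def normalize_start(code):
    """Start-point normalization: among all circular rotations of the
    code, pick the one of minimum magnitude."""
    return min(code[i:] + code[:i] for i in range(len(code)))

def first_difference(code, mod=4):
    """Rotation normalization: direction change (counter-clockwise, modulo
    the number of directions) between consecutive codes, non-circular."""
    return [(b - a) % mod for a, b in zip(code, code[1:])]
```

For the code 10103322 this yields the first difference 3133030 from the slide; the same object rotated by a multiple of 90 degrees gives the same difference sequence.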

Page 13:

Chain code example with smoothing

Signature representations
A different way of thinking about the contour:
• A signature is a 1D functional representation of a boundary.
• It can be represented in several ways.
• Simple choice: distance vs. angle:
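The distance-vs-angle choice can be sketched directly: measure each boundary point's distance from the region centroid as a function of its angle. A minimal illustration (real signatures sample the boundary at equal angle intervals; here we just sort the given points by angle).

```python
import math

def signature(boundary):
    """Distance-vs-angle signature: (angle, distance-from-centroid) pairs
    for each (row, col) boundary point, sorted by angle."""
    cy = sum(y for y, x in boundary) / len(boundary)
    cx = sum(x for y, x in boundary) / len(boundary)
    return sorted((math.atan2(y - cy, x - cx), math.hypot(y - cy, x - cx))
                  for y, x in boundary)
```

For a circle the signature is a constant function of the angle; for the four corners of a square it is constant too, which is why the full boundary (not just corners) must be sampled to tell the two apart.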


Page 14:

Signature example

Normalization of signatures
• Normalization for starting point:
– Find a unique point on the boundary according to some criteria:
• Farthest away from the center (may not be unique)
• Obtain the chain code and normalize as described for chain codes.
• Normalization for scale:
– If the boundary is sampled at equal intervals in angle, changes in size only affect the amplitude (r).
• The function can be scaled so that it always spans the range [0,1].
– This is not robust for extreme values of min/max.
• Scaling by subtracting the mean and dividing by the standard deviation, (r − μ)/σ, is more robust.

Page 15:

Boundary segments
• The boundary can be decomposed into segments.
– Useful to extract information from concave parts of the objects.
• Convex hull H of set S is the smallest convex set containing S.
• The set difference H-S is called the convex deficiency D.
• If we trace the boundary and identify the points where we go in and out of the convex deficiency, these points can represent important border points characterizing the shape of the border.
• Border points are often noisy, and smoothing can be applied first. We can smooth only the border by taking the average of k boundary points.
Question: are these points useful for characterizing the digit 8? What about A?

Features extracted from the boundary
Useful features for shape characterization can be e.g.:
• Area of object minus area of convex deficiency
• Number of components of convex deficiency

Page 16:

Skeletons
• The skeleton of a region is defined by the medial axis transform: for a region R with border B, for every point p in R, find the closest point on B. The points p that have equal distance to two or more border points are on the skeleton.
• The skeleton S(A) of an object is the axis of the object (binary valued).
• The medial axis transform gives the distance to the border.

Algorithm for thinning to find the skeleton
• There are many algorithms in the literature; here is just one.
• Assume that region points have value 1 and background points 0.
• Define a contour point as a point that has value 1 and at least one 8-neighbor with value 0.
• Flag contour point p1 for deletion if:
a) 2 ≤ N(p1) ≤ 6
b) T(p1) = 1
c) p2⋅p4⋅p6 = 0
d) p4⋅p6⋅p8 = 0
N(p1) is the number of nonzero neighbors of p1.
T(p1) is the number of 0-1 transitions in the ordered sequence p2, p3, …, p8, p9, p2.

Page 17:

Thinning algorithm – cont.
• Step 2: change conditions c) and d) to
c’) p2⋅p4⋅p8 = 0
d’) p2⋅p6⋅p8 = 0
• Step 1 is applied to every border pixel in the region under consideration. If any of the conditions a)-d) are violated, the value of the point is unchanged.
– If all conditions are satisfied, the point is flagged. The point is not deleted until all points have been visited.
– After completion of step 1, the flagged points are deleted.
• Step 2 is then applied in the same manner as step 1.
• Step 1 + step 2 defines one iteration of the algorithm.
• More iterations are done until no further points are deleted.
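One possible Python reading of the two-step algorithm above (this is the Zhang-Suen style of thinning; the neighbor order p2…p9 runs clockwise from the north neighbor, and pixels on the image border itself are left untouched for simplicity):

```python
def thin(img):
    """Iterative two-step thinning. img: list of rows of 0/1,
    modified in place and returned."""
    h, w = len(img), len(img[0])

    def nbrs(y, x):
        # p2..p9: the 8-neighborhood, clockwise, starting at the north neighbor
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            flagged = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    p = nbrs(y, x)                 # p[0] = p2, ..., p[7] = p9
                    N = sum(p)                     # condition a)
                    T = sum(p[i] == 0 and p[(i+1) % 8] == 1 for i in range(8))
                    if step == 1:                  # conditions c) and d)
                        cd = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:                          # conditions c') and d')
                        cd = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= N <= 6 and T == 1 and cd:
                        flagged.append((y, x))
            for y, x in flagged:   # delete only after the whole pass
                img[y][x] = 0
            changed = changed or bool(flagged)
    return img
```

Running this on a 3-pixel-thick horizontal bar erodes it down to a thin residue along its middle, while conditions a) and b) prevent the last remaining pixels from being deleted entirely.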

Introduction to Fourier descriptors
• Suppose that we have an object S and that we are able to find the length of its contour. The contour should be a closed curve.
• We partition the contour into M segments of equal length, and thereby find M equidistant points along the contour of S.
• Traveling anti-clockwise along this contour at constant speed, we can collect a pair of waveforms x(k) and y(k). Any 1D signal representation can be used for these.
• If the speed is such that one circumnavigation of the object takes 2π, x(k) and y(k) will be periodic with period 2π.

Page 18:

Contour description using 1D Fourier transform
• The coordinates (x, y) of these M points are then put into a complex vector s:
s(k) = x(k) + iy(k), k ∈ [0, M-1]
• We view the x-axis as the real axis and the y-axis as the imaginary one for a sequence of complex numbers.
(Figure: a contour of M = 8 points numbered 0…7 from the start point, with samples read off as complex numbers, e.g. s(2) = 2 + 2i and s(3) = 3 + 3i.)
• The description of the object contour is changed, but all the information is preserved.
• We have transformed the contour problem from 2D to 1D.

Fourier coefficients from s(k)
• We perform a 1D forward Fourier transform:

a(u) = (1/M) Σ_{k=0}^{M-1} s(k) exp(-i2πuk/M)
     = (1/M) Σ_{k=0}^{M-1} s(k) [cos(2πuk/M) - i sin(2πuk/M)],  u ∈ [0, M-1]

• The complex coefficients a(u) are called the Fourier descriptors of the boundary.
• a(0) now contains the center of mass (average coordinates) of the object, and the coefficients a(1), a(2), …, a(M-1) will describe the object in increasing detail.
• Since a(0) is the center of mass, exclude it as a feature for object recognition.
• These features depend on rotation, scaling and starting point of the contour.
• For object recognition, we use only the N first coefficients (N < M).
• This corresponds to setting a(k) = 0, k > N-1.
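With s(k) in a complex array, the descriptors are one FFT call; np.fft.fft matches the sum above up to the 1/M factor. A sketch including the truncation to N coefficients; note that with the plain DFT, "low frequency" means bins near both ends of the array, so a symmetric keep-mask is one reasonable way to zero out the rest.

```python
import numpy as np

def fourier_descriptors(xs, ys):
    """a(u) = (1/M) * sum_k s(k) exp(-i 2*pi*u*k / M), s(k) = x(k) + i y(k)."""
    s = np.asarray(xs, float) + 1j * np.asarray(ys, float)
    return np.fft.fft(s) / len(s)

def reconstruct(a, n_keep):
    """Zero a(u) outside the n_keep lowest frequencies, then invert."""
    M = len(a)
    keep = np.zeros(M, dtype=bool)
    keep[:n_keep // 2 + 1] = True        # zero and positive frequencies
    keep[M - n_keep // 2:] = True        # negative frequencies
    s = np.fft.ifft(np.where(keep, a, 0)) * M
    return s.real, s.imag                # xs, ys
```

Keeping all M coefficients reproduces the contour exactly; keeping only a few gives the smoothed approximations shown on the next pages, and a(0) is the center of mass as stated above.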

Page 19:

Symbol reconstruction - Fourier
• When using the inverse Fourier transform, we get an approximation to the original contour:

ŝ(k) = Σ_{u=0}^{N-1} a(u) exp(i2πuk/M),  k ∈ [0, M-1]

• We have only used N features to reconstruct each component of ŝ(k).
• The number of points in the approximation is the same (M), but the number of coefficients (features) used to reconstruct each point is smaller (N < M).
• Use an even number of descriptors.
• The first 10-16 descriptors are found to be sufficient for character description. They can be used as features for classification.
• The Fourier descriptors can be invariant to translation and rotation if the coordinate system is appropriately chosen. All properties of 1D Fourier transform pairs (scaling, translation, rotation) can be applied.

Fourier descriptor example
Matlab DIPUM Toolbox:

b=boundaries(f);
b=b{1};
%size(b) tells that the contour is 65 pixels long
bim=bound2im(b,26,21); %must tell image dimension
z=frdescp(b);
zinv2=ifrdesc(z,2);
z2im=bound2im(zinv2,26,21);
imshow(z2im);

(Figure: image, 26x21 pixels; its boundary; reconstructions with 2, 4, 6, 8 and 20 coefficients.)

Page 20:

Fourier coefficients and invariance
• Translation affects only the center of mass (a(0)).
• Rotation only affects the phase of the coefficients.
• Scaling affects all coefficients in the same way, so ratios a(u1)/a(u2) are not affected.
• The start point affects the phase of the coefficients.
• Normalized coefficients can be obtained, but this is beyond the scope of this course.
– See e.g. Feature extraction methods for character recognition – a survey. Ø. D. Trier, A. Jain and T. Taxt, Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.


Remark about Fourier descriptors
• Objects with holes, like 8, 9 and 6, will only be described by the outer contour.
• The length of a(u) will depend on the number of boundary points (and the size of the object). We should resample the contour by selecting M evenly spaced points along the contour.