implementing.the.scale.invariant.feature.transform.sift.method.pdf



Implementing the Scale Invariant Feature Transform (SIFT) Method

YU MENG and Dr. Bernard Tiddeman (supervisor)

Department of Computer Science

University of St. Andrews

yumeng@dcs.st-and.ac.uk

Abstract

The SIFT algorithm [1] takes an image and transforms it into a collection of local feature vectors. Each of these feature vectors is intended to be distinctive and invariant to scaling, rotation and translation of the image. In the original implementation these features are used to find distinctive objects in different images, and the transform can be extended to match faces in images. This report describes our own implementation of the SIFT algorithm and highlights potential directions for future research.

1 Introduction & Background

Face recognition is becoming increasingly important for many applications, including human-machine interfaces, multimedia, security, communication, visually mediated interaction and anthropomorphic environments. One of the most difficult problems is that the process of identifying a person from facial appearance has to be performed differently for each image, because there are so many conflicting factors altering facial appearance. Our implementation focuses on deriving SIFT features from an image and trying to use these features to perform face identification.

The approach to SIFT feature detection taken in our implementation is similar to the one taken by Lowe [1], which is used for object recognition. According to that work, the invariant features extracted from images can be used to perform reliable matching between different views of an object or scene. The features have been shown to be invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination. The approach is efficient at feature extraction and is able to identify large numbers of features.



Face identification must discriminate between different faces. Lowe's paper [1] presents nearest-neighbour template matching, where the nearest neighbour is defined as the keypoint with the minimum Euclidean distance between invariant descriptor vectors. However, different strategies suit different problems. If there are multiple training images of the same face, then we define the second-closest neighbour as the closest neighbour that is known to come from a different face than the first.
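The matching rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than our implementation: descriptors are plain lists, `euclidean` and `match_keypoint` are hypothetical names, and rejecting a match when the nearest distance exceeds 0.8 times the second-closest distance is an assumption borrowed from the distance-ratio test in [1].

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_keypoint(query, training, max_ratio=0.8):
    """Return the face label matched by one query descriptor, or None.

    training: list of (face_label, descriptor) pairs, possibly several per face.
    The nearest neighbour is the training descriptor at minimum Euclidean
    distance; the second-closest neighbour is the closest one whose label
    differs from the nearest neighbour's. A match is accepted only if
    d1 / d2 <= max_ratio (0.8 is an assumed threshold, the value used in [1]).
    """
    ranked = sorted(training, key=lambda item: euclidean(query, item[1]))
    best_label, best_desc = ranked[0]
    d1 = euclidean(query, best_desc)
    # Closest neighbour known to come from a different face than the first.
    other = next((desc for label, desc in ranked[1:] if label != best_label), None)
    if other is None:
        return best_label  # only one face in the training set: nothing to compare
    d2 = euclidean(query, other)
    if d2 == 0 or d1 / d2 > max_ratio:
        return None  # ambiguous: another face is almost as close
    return best_label

training = [("alice", [0.0, 0.0, 1.0]), ("alice", [0.0, 0.1, 1.0]), ("bob", [1.0, 1.0, 0.0])]
print(match_keypoint([0.0, 0.05, 1.0], training))  # prints: alice
print(match_keypoint([0.5, 0.5, 0.5], training))   # prints: None
```

Because two training images of the same face are allowed, the second image of "alice" is skipped when looking for the second-closest neighbour, so it cannot make a correct match look ambiguous.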

Because of the variability in facial appearance, statistical methods are often used to help with the localisation of facial features. Helmer and Lowe [2] have taken an approach which is time-efficient and can be used with complicated models. They use SIFT features grouped using a probabilistic model initialised with a few parts of an object. They learn the parts incrementally, trying to add possible parts to the model during training, and then use the Expectation-Maximization (EM) algorithm to update the model.

Our method is implemented in the following stages: creating the Difference of Gaussian pyramid, extrema detection, noise elimination, orientation assignment, descriptor computation and keypoint matching.

Figure 1: Bottom: on the left is the Gaussian pyramid, with neighbouring images separated by a constant scale factor. These are subtracted to give the DoG pyramid on the right. Top: the Gaussian with σ twice that of the original is subsampled and used to construct the next octave.



2 Creating the Difference of Gaussian Pyramid

The first stage is to construct a Gaussian "scale space" function from the input image [1]. This is formed by convolution (filtering) of the original image with Gaussian functions of varying widths. The difference of Gaussian (DoG), D(x, y, σ), is calculated as the difference between two filtered images, one with k multiplied by the scale of the other:

$$ D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1} $$

These images, L(x, y, σ), are produced from the convolution of Gaussian functions, G(x, y, kσ), with an input image, I(x, y):

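$$ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{2} $$

$$ G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} \tag{3} $$

The effect of convolving with two Gaussian functions of different widths is most easily found by converting to the Fourier domain, in which convolution becomes multiplication, i.e.

$$ g_{\sigma_i} * g_{\sigma} * f \;\longleftrightarrow\; \hat{g}_{\sigma_i}\, \hat{g}_{\sigma}\, \hat{f} \tag{4} $$

where $g_\sigma$ is a Gaussian of width σ, $f$ is the image and the hat denotes the Fourier transform.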

The Fourier transform of a Gaussian function, $e^{-ax^2}$, is given by:

$$ \mathcal{F}_x\left[e^{-ax^2}\right](t) = \sqrt{\frac{\pi}{a}}\, e^{-\pi^2 t^2 / a} \tag{5} $$

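Substituting this into equation (4) and equating it to a convolution with a single Gaussian of width kσ, we get the following. Here we work in one dimension, which suffices because the Gaussian is separable, and take $a = 1/2\sigma^2$ for a normalised Gaussian of width σ, so that $\hat{g}_\sigma(t) = e^{-2\pi^2\sigma^2 t^2}$:

$$ e^{-2\pi^2 \sigma_i^2 t^2}\, e^{-2\pi^2 \sigma^2 t^2} = e^{-2\pi^2 k^2 \sigma^2 t^2} \tag{6} $$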

Performing the multiplication of the two exponentials on the left of this equation and comparing the coefficients of $t^2$ gives:

$$ \sigma_i^2 + \sigma^2 = k^2 \sigma^2 \tag{7} $$

and so we get:

$$ \sigma_i = \sigma \sqrt{k^2 - 1} \tag{8} $$

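This subtle point is not made clear in the original paper, and it is important because, after subsampling the low-pass filtered images to form the lower levels of the pyramid, we no longer have access to the original image at the appropriate resolution, and so we cannot filter with $g_{k\sigma}$ directly. Instead, each image in an octave is obtained by blurring the previous one with the incremental width $\sigma_i$.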
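Equation (8) can be checked numerically. The sketch below (Python with NumPy; the base scale σ = 1.6 and s = 3 are assumed values, and the helper names are hypothetical) blurs a sampled Gaussian of width σ once more with width $\sigma_i$ and compares the result with a single Gaussian of width kσ.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled, normalised 1-D Gaussian g_sigma on the offsets -radius..radius."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def incremental_sigma(sigma, k):
    """Width sigma_i of the extra blur that takes scale sigma to k * sigma, eq. (8)."""
    return sigma * np.sqrt(k ** 2 - 1.0)

sigma, k = 1.6, 2.0 ** (1.0 / 3.0)  # assumed base scale and s = 3
radius = 30
sigma_i = incremental_sigma(sigma, k)

# Blurring with g_sigma and then g_sigma_i should equal one blur with g_{k*sigma}.
two_step = np.convolve(gaussian_kernel(sigma, radius), gaussian_kernel(sigma_i, radius))
one_step = gaussian_kernel(k * sigma, 2 * radius)
max_error = np.abs(two_step - one_step).max()
print(f"sigma_i = {sigma_i:.4f}, max difference = {max_error:.1e}")
```

The full convolution of two kernels of radius 30 has radius 60, which is why the single-step kernel is built with `2 * radius`.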

Each octave of scale space is divided into an integer number, s, of intervals, and we let $k = 2^{1/s}$. We produce s+3 images for each octave in order to form s+2 difference of Gaussian (DoG) images, so that each DoG level searched in the extrema detection step has one scale interval above it and one below it. In this experiment we set s to 3, following experimental results in [1] that suggest this number produces the most stable keypoints. Once a complete octave has been processed, we subsample the Gaussian image that has twice the initial value of σ by taking every second pixel in each row and column. This greatly improves the efficiency of the algorithm at lower scales. The process is shown in Figure 1.
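A minimal NumPy sketch of the octave construction described above, assuming a base scale of σ = 1.6 and that the input already carries that blur; `gaussian_blur` and `build_dog_pyramid` are hypothetical helpers, not our code. Each level is produced from the previous one with the incremental blur of equation (8), adjacent levels are subtracted as in equation (1), and the level with twice the initial σ is subsampled to start the next octave.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge padding (numpy only, for illustration)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows, then columns; 'valid' mode trims the padding off again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def build_dog_pyramid(image, num_octaves=3, s=3, sigma0=1.6):
    """Return (gaussians, dogs): per octave, s + 3 Gaussian images and s + 2 DoG images.

    sigma0 = 1.6 is an assumed base scale, and the input is assumed to already
    carry a blur of sigma0.
    """
    k = 2.0 ** (1.0 / s)
    gaussians, dogs = [], []
    base = np.asarray(image, dtype=float)
    for _ in range(num_octaves):
        octave = [base]
        for i in range(1, s + 3):
            sigma_prev = sigma0 * k ** (i - 1)
            # Blurring by sigma_prev * sqrt(k^2 - 1) takes sigma_prev to k * sigma_prev, eq. (8).
            octave.append(gaussian_blur(octave[-1], sigma_prev * np.sqrt(k ** 2 - 1)))
        gaussians.append(octave)
        dogs.append([upper - lower for lower, upper in zip(octave, octave[1:])])  # eq. (1)
        # octave[s] has sigma = 2 * sigma0: keep every second pixel in each row and column.
        base = octave[s][::2, ::2]
    return gaussians, dogs

image = np.random.default_rng(0).random((64, 64))
gaussians, dogs = build_dog_pyramid(image)
print([len(o) for o in gaussians], [len(o) for o in dogs], [o[0].shape for o in gaussians])
```

With s = 3 each octave holds six Gaussian images and five DoG images, and each new octave starts at half the resolution of the previous one.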

3 Extrema Detection

This stage finds the extrema in the DoG pyramid. To detect the local maxima and minima of D(x, y, σ), each point is compared with all 26 of its neighbours (Figure 2): eight in its own image and nine in each of the scales above and below. If its value is the minimum or maximum of these, the point is an extremum. We then improve the localisation of the keypoint to subpixel accuracy by using a second-order Taylor series expansion. This gives the offset of the true extremum as:

$$ \mathbf{z} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}} \tag{9} $$

where D and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from the sample point.
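The 26-neighbour test and the Taylor-series refinement of equation (9) can be sketched as follows. The derivatives are approximated with central finite differences, which is an assumption (the report does not say how they are computed), and the function names are hypothetical. The small demo builds a synthetic DoG stack whose true peak is known, so the recovered offset can be checked.

```python
import numpy as np

def is_extremum(dogs, s, y, x):
    """True if dogs[s][y, x] is strictly above or below all 26 scale-space neighbours."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
    neighbours = np.delete(cube.ravel(), 13)  # index 13 is the centre sample itself
    value = dogs[s][y, x]
    return bool(np.all(value > neighbours) or np.all(value < neighbours))

def refine_offset(dogs, s, y, x):
    """Sub-pixel offset z = -(d2D/dx2)^-1 dD/dx in (x, y, scale) order, eq. (9).

    Derivatives are approximated with central finite differences.
    """
    def D(ds, dy, dx):
        return dogs[s + ds][y + dy, x + dx]

    grad = 0.5 * np.array([D(0, 0, 1) - D(0, 0, -1),
                           D(0, 1, 0) - D(0, -1, 0),
                           D(1, 0, 0) - D(-1, 0, 0)])
    c = D(0, 0, 0)
    dxx = D(0, 0, 1) - 2 * c + D(0, 0, -1)
    dyy = D(0, 1, 0) - 2 * c + D(0, -1, 0)
    dss = D(1, 0, 0) - 2 * c + D(-1, 0, 0)
    dxy = 0.25 * (D(0, 1, 1) - D(0, 1, -1) - D(0, -1, 1) + D(0, -1, -1))
    dxs = 0.25 * (D(1, 0, 1) - D(1, 0, -1) - D(-1, 0, 1) + D(-1, 0, -1))
    dys = 0.25 * (D(1, 1, 0) - D(1, -1, 0) - D(-1, 1, 0) + D(-1, -1, 0))
    hessian = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    return -np.linalg.solve(hessian, grad)

# Synthetic DoG stack: a quadratic peak centred at x = 10.3, y = 8.2, scale index 1.1.
ys, xs = np.mgrid[0:20, 0:20].astype(float)
dogs = [-((xs - 10.3) ** 2 + (ys - 8.2) ** 2 + (si - 1.1) ** 2) for si in range(3)]
print(is_extremum(dogs, 1, 8, 10), refine_offset(dogs, 1, 8, 10))
```

For a quadratic surface the finite differences are exact, so the refined offset recovers the sub-pixel peak position (0.3, 0.2, 0.1) relative to the sample point.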

4 Keypoint Elimination

This stage attempts to eliminate points from the candidate list of keypoints by finding those that have low contrast or are poorly localised on an edge [1]. The value of the DoG function at the refined extremum is given by:

$$ D(\mathbf{z}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{z} \tag{10} $$

If $|D(\mathbf{z})|$ is below a threshold value, the point is excluded.

Figure 2: An extremum is defined as any value in the DoG greater than (or less than) all its neighbours in scale space.



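To eliminate poorly localised extrema we use the fact that, in these cases, the difference of Gaussian function has a large principal curvature across the edge but a small curvature in the perpendicular direction. A 2x2 Hessian matrix, H, computed at the location and scale of the keypoint, is used to find the curvature. Because the eigenvalues of H are proportional to the principal curvatures, their ratio can be checked efficiently from the trace and determinant of H, without computing the eigenvalues explicitly:

$$ H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \tag{11} $$

$$ \frac{(D_{xx} + D_{yy})^2}{D_{xx} D_{yy} - D_{xy}^2} < \frac{(r + 1)^2}{r} \tag{12} $$

where r is the maximum allowed ratio between the larger and the smaller principal curvature. If inequality (12) fails, the keypoint is removed from the candidate list.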
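The two rejection tests can be sketched as below, again with finite-difference derivatives. The contrast threshold of 0.03 (for pixel values in [0, 1]) and r = 10 are assumed values taken from [1]; this report does not state the values it uses. `keep_keypoint` and `interpolated_value` are hypothetical names.

```python
import numpy as np

def interpolated_value(d_centre, grad, offset):
    """D(z) = D + 1/2 (dD/dx)^T z, eq. (10)."""
    return d_centre + 0.5 * float(np.dot(grad, offset))

def keep_keypoint(dog, y, x, contrast, contrast_threshold=0.03, r=10.0):
    """Low-contrast test on |D(z)| and edge test of inequality (12) for one keypoint.

    dog is the DoG image at the keypoint's scale and contrast is D(z) from eq. (10).
    contrast_threshold = 0.03 (pixel values in [0, 1]) and r = 10 are assumed values
    taken from [1]; they are not stated in this report.
    """
    if abs(contrast) < contrast_threshold:
        return False  # low contrast
    dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
    dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
    dxy = 0.25 * (dog[y + 1, x + 1] - dog[y + 1, x - 1]
                  - dog[y - 1, x + 1] + dog[y - 1, x - 1])
    trace, det = dxx + dyy, dxx * dyy - dxy ** 2
    if det <= 0:
        return False  # principal curvatures differ in sign (or one is zero)
    return bool(trace ** 2 / det < (r + 1) ** 2 / r)

ys, xs = np.mgrid[0:11, 0:11].astype(float)
blob = 1.0 - 0.1 * ((xs - 5) ** 2 + (ys - 5) ** 2)          # equal curvature both ways
ridge = 1.0 - 0.1 * (xs - 5) ** 2 - 0.001 * (ys - 5) ** 2   # edge-like: one curvature tiny
print(keep_keypoint(blob, 5, 5, blob[5, 5]), keep_keypoint(ridge, 5, 5, ridge[5, 5]))
```

The blob has a curvature ratio of 1 and is kept, while the ridge has a ratio of 100 and fails inequality (12), which is exactly the edge response the test is designed to remove.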

5 Orientation Assignment

This step aims to assign a consistent orientation to each keypoint based on local image properties. An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint, as illustrated in Figure 3. A 16x16 square is chosen in this implementation. The orientation histogram has 36 bins covering the 360 degree range of orientations [1]. The gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences:

$$ m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2} \tag{13} $$

$$ \theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{14} $$

Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint.
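Equations (13) and (14) and the weighted 36-bin histogram can be sketched in NumPy as follows. The 16x16 window, 36 bins and σ = 1.5 × scale come from the text; using `arctan2` instead of a plain arctangent (so that the full 360 degree range is recovered), leaving border pixels at zero, and assigning each sample to a single bin without interpolation are our assumptions for the sketch.

```python
import numpy as np

def gradient_mag_ori(L):
    """Pixel-difference gradient magnitude m and orientation theta, eqs. (13) and (14).

    Rows index y and columns index x; border pixels are left at zero.
    """
    L = np.asarray(L, dtype=float)
    m = np.zeros_like(L)
    theta = np.zeros_like(L)
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    m[1:-1, 1:-1] = np.hypot(dx, dy)
    # arctan2 keeps the quadrant that arctan(dy / dx) would lose; map to [0, 2*pi).
    theta[1:-1, 1:-1] = np.arctan2(dy, dx) % (2 * np.pi)
    return m, theta

def orientation_histogram(L, y, x, scale, half_width=8, num_bins=36):
    """36-bin orientation histogram over a 16x16 window around (x, y).

    Each sample adds its gradient magnitude times a Gaussian weight with
    sigma = 1.5 * scale, centred on the keypoint.
    """
    m, theta = gradient_mag_ori(L)
    sigma = 1.5 * scale
    hist = np.zeros(num_bins)
    for yy in range(y - half_width, y + half_width):
        for xx in range(x - half_width, x + half_width):
            if not (0 <= yy < m.shape[0] and 0 <= xx < m.shape[1]):
                continue
            weight = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
            b = int(theta[yy, xx] / (2 * np.pi) * num_bins) % num_bins
            hist[b] += weight * m[yy, xx]
    return hist

ramp = np.tile(np.arange(32.0), (32, 1))  # intensity increases with x
print(np.argmax(orientation_histogram(ramp, 16, 16, scale=2.0)))    # gradient at 0 degrees
print(np.argmax(orientation_histogram(ramp.T, 16, 16, scale=2.0)))  # gradient at 90 degrees
```

On a horizontal ramp every gradient points along +x, so all the weight lands in bin 0; transposing the ramp moves it to bin 9 (90 degrees).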

Figure 3: Left: the point in the middle of the left figure is the keypoint candidate. The orientations of the points in the square area around this point are precomputed using pixel differences. Right: each bin in the histogram covers 10 degrees, so the 36 bins cover the whole 360 degrees. The value of each bin is the sum of the magnitudes of all the points whose precomputed orientation falls within that bin.


Peaks in the orientation histogram correspond to dominant directions of local gradients. We locate the highest peak in the histogram and use this peak, and any other local peak whose height is at least 80% of the highest peak, to create a keypoint with that orientation. Some points will therefore be assigned multiple orientations if there are multiple peaks of similar magnitude. A Gaussian is fitted to the 3 histogram values closest to each peak to interpolate the peak's position for better accuracy.
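Peak selection and refinement might look like the following sketch. It relies on the fact that fitting a Gaussian to three positive values is equivalent to fitting a parabola to their logarithms; the small constant that guards against empty bins, the bin layout and the function name are assumptions.

```python
import numpy as np

def dominant_orientations(hist, peak_ratio=0.8):
    """Interpolated orientations (radians) of the highest peak and of every other
    local peak that reaches peak_ratio (80%) of its height.

    Bin i is assumed to cover [i, i + 1) * 2*pi / n. Fitting a Gaussian to three
    values is the same as fitting a parabola to their logarithms, which is how the
    peak position is refined here.
    """
    hist = np.asarray(hist, dtype=float)
    n = len(hist)
    bin_width = 2 * np.pi / n
    threshold = peak_ratio * hist.max()
    eps = 1e-12  # keeps the logarithm finite for empty bins
    angles = []
    for i in range(n):
        left, centre, right = hist[i - 1], hist[i], hist[(i + 1) % n]  # circular
        if centre < threshold or centre <= left or centre <= right:
            continue
        l, c, r = np.log([left + eps, centre + eps, right + eps])
        denom = l - 2 * c + r
        shift = 0.5 * (l - r) / denom if denom != 0 else 0.0
        angles.append(((i + 0.5 + shift) * bin_width) % (2 * np.pi))
    return angles

# A Gaussian-shaped histogram whose true peak lies 0.3 bins right of bin 10's centre.
bins = np.arange(36)
hist = np.exp(-((bins + 0.5) - 10.8) ** 2 / (2 * 1.5 ** 2))
print(np.degrees(dominant_orientations(hist)))
```

Because the demo histogram is itself Gaussian, the log-parabola fit is exact and recovers the peak at 108 degrees, between the centres of bins 10 and 11.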

This computes the location, orientation and scale of the SIFT features that have been found in the image. These features respond strongly to corners and intensity gradients. For face images, the SIFT features appear mostly in the eyes, nostrils, the top of the nose and the corners of the lips. In Figure 4, keypoints are indicated as arrows. The length of each arrow indicates the magnitude of the contrast at the keypoint, and the arrows point from the dark to the bright side.

6 Descriptor Computation

In this stage, a descriptor that is as distinctive as possible is computed for the local image region at each candidate keypoint. The image gradient magnitudes and orientations are sampled around the keypoint location. These values are illustrated with small arrows at each sample location in the left image of Figure 5. A Gaussian weighting function with σ related to the scale of the keypoint is used to assign a weight to each magnitude; in this implementation we use a σ equal to one half of the width of the descriptor window. To achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. This process is indicated in Figure 5. In our implementation, a 16x16 sample array is computed and a histogram with 8 bins is used, so a descriptor contains 16x16x8 elements in total.
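A simplified descriptor can be sketched as below. It assumes the 4x4-subregion layout shown in Figure 5 (also the layout used in [1]), which yields 4 × 4 × 8 = 128 values; it samples the nearest pixel instead of interpolating, and it normalises the result to unit length as in [1]. These choices and the function name are assumptions for illustration.

```python
import numpy as np

def sift_descriptor(L, y, x, theta, half_width=8, num_sub=4, num_bins=8):
    """Sketch: rotated 16x16 sample window, 4x4 subregions, 8 bins each (128 values).

    Samples use the nearest pixel (no interpolation) and are weighted by a Gaussian
    whose sigma is half the window width.
    """
    L = np.asarray(L, dtype=float)
    h, w = L.shape
    sigma = half_width  # one half of the 2 * half_width window width
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cell = 2 * half_width // num_sub  # samples per subregion side
    desc = np.zeros((num_sub, num_sub, num_bins))
    for v in range(-half_width, half_width):
        for u in range(-half_width, half_width):
            # Rotate the window offset (u, v) by the keypoint orientation.
            xi = int(round(x + u * cos_t - v * sin_t))
            yi = int(round(y + u * sin_t + v * cos_t))
            if not (1 <= xi < w - 1 and 1 <= yi < h - 1):
                continue
            gx = L[yi, xi + 1] - L[yi, xi - 1]
            gy = L[yi + 1, xi] - L[yi - 1, xi]
            weight = np.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
            # Gradient orientation measured relative to the keypoint orientation.
            rel = (np.arctan2(gy, gx) - theta) % (2 * np.pi)
            b = int(rel / (2 * np.pi) * num_bins) % num_bins
            desc[(v + half_width) // cell, (u + half_width) // cell, b] += weight * np.hypot(gx, gy)
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    # Unit-length normalisation follows [1]; it is an assumption here.
    return desc / norm if norm > 0 else desc

image = np.random.default_rng(1).random((40, 40))
d = sift_descriptor(image, 20, 20, theta=0.7)
print(d.shape, round(float(np.linalg.norm(d)), 6))
```

Rotating both the sample grid and the gradient angles by the keypoint orientation is what makes the histogram contents independent of how the image is rotated.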

Figure 4: Most of the features appear on the eyes, the nostrils, the top of the nose, the corners of the mouth and the earlobes.



7 Transformation

In this stage, some matching tests are run to test the repeatability and stability of the SIFT features. An image and a transformed version of the image are used, as indicated in Figure 6. The features of the two images are computed separately. Then each keypoint in the original image (the model image) is compared to every keypoint in the transformed image using the descriptors computed in the previous stage. For each comparison, one feature is picked in each image: f1 is the descriptor array for a keypoint in the original image and f2 is the descriptor array for a keypoint in the transformed image. The matching error for each pair of keypoints is computed by equation (15), which sums the absolute differences between the elements of f1 and f2.

If the numbers of features in the two images are n1 and n2, then there are n1 × n2 possible pairs altogether. These pairs are sorted in ascending order of matching error, and the first two qualified pairs of keypoints are chosen to set the transformation.
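Computing the error of equation (15) for all n1 × n2 pairs and sorting them can be sketched as follows. Reading "qualified" as "not sharing a keypoint with the best pair" is our assumption, as are the function names.

```python
import numpy as np

def rank_pairs(desc1, desc2):
    """All n1 * n2 keypoint pairs (i, j, error), sorted by ascending matching error.

    The error of a pair is the sum of absolute differences between the two
    descriptor arrays, as in eq. (15).
    """
    desc1 = np.asarray(desc1, dtype=float).reshape(len(desc1), -1)
    desc2 = np.asarray(desc2, dtype=float).reshape(len(desc2), -1)
    errors = np.abs(desc1[:, None, :] - desc2[None, :, :]).sum(axis=2)  # shape (n1, n2)
    order = np.argsort(errors, axis=None, kind="stable")
    rows, cols = np.unravel_index(order, errors.shape)
    return [(int(i), int(j), float(errors[i, j])) for i, j in zip(rows, cols)]

def two_best_pairs(ranked):
    """First two pairs that share no keypoint (an assumed reading of 'qualified')."""
    first = ranked[0]
    for pair in ranked[1:]:
        if pair[0] != first[0] and pair[1] != first[1]:
            return first, pair
    return None  # fewer than two usable pairs

desc1 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
desc2 = [[1, 0, 0.1], [0, 1, 0], [0, 0, 1]]
ranked = rank_pairs(desc1, desc2)
print(two_best_pairs(ranked))
```

Flattening each descriptor first means the same code works whether a descriptor is stored as a vector or as a multi-dimensional array.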

To apply the transform, the following function is introduced into the implementation. The transform gives the mapping of a model image point (x, y) to a transformed image point (u, v) in terms of an image scaling, s, an image rotation, θ, and an image translation, $[t_x, t_y]^T$ [5]:

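$$ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \tag{16} $$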
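Because equation (16) has four unknowns and each point pair gives two equations, the two best pairs determine the transform exactly. The sketch below solves it by treating points as complex numbers, so that rotation and scaling become a single complex multiplication; this solution method and the function names are our assumptions, not necessarily how the implementation computes it.

```python
import cmath
import math

def similarity_from_two_pairs(p1, q1, p2, q2):
    """Solve eq. (16) for (s, theta, t_x, t_y) from two model -> image point pairs.

    Writing points as complex numbers z = x + iy and w = u + iv, eq. (16) reads
    w = a*z + t with a = s*exp(i*theta) and t = t_x + i*t_y, so two pairs fix a and t.
    """
    z1, z2 = complex(*p1), complex(*p2)
    w1, w2 = complex(*q1), complex(*q2)
    if z1 == z2:
        raise ValueError("the two model points must be distinct")
    a = (w2 - w1) / (z2 - z1)
    t = w1 - a * z1
    return abs(a), cmath.phase(a), t.real, t.imag

def apply_similarity(s, theta, tx, ty, x, y):
    """Map a model point (x, y) to (u, v) with eq. (16)."""
    u = s * math.cos(theta) * x - s * math.sin(theta) * y + tx
    v = s * math.sin(theta) * x + s * math.cos(theta) * y + ty
    return u, v

params = (2.0, math.pi / 6, 5.0, -3.0)  # s, theta, t_x, t_y used to fake two matches
q1 = apply_similarity(*params, 1.0, 0.0)
q2 = apply_similarity(*params, 0.0, 1.0)
print(similarity_from_two_pairs((1.0, 0.0), q1, (0.0, 1.0), q2))
```

Once the four parameters are known, `apply_similarity` can map every model keypoint into the transformed image, which is how the remaining matches can be checked against the estimated transform.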

Figure 5: Left: the gradient magnitude and orientation at each sample point in a square region around the keypoint location. These are weighted by a Gaussian window, indicated by the overlaid circle. Right: the image gradients are added to orientation histograms. Each histogram includes 8 directions, indicated by the arrows, and is computed from 4x4 subregions. The length of each arrow corresponds to the sum of the gradient magnitudes near that direction within the region.

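$$ \mathrm{error}(f_1, f_2) = \sum_{i} \sum_{j} \left| f_1(i, j) - f_2(i, j) \right| \tag{15} $$

where the sums run over all elements of the two descriptor arrays.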



8 Results

Results from our implementation are shown in Figures 3, 4 and 6. We can currently calculate the SIFT features for an image, and we have experimented with some simple matching schemes between images. Noise handling is an essential part of our approach, since noise could result in inefficient or false matching. However, we have chosen parameters which should help to keep the feature matching robust to noise in this implementation.

9 Conclusion and Future Work

The SIFT features described in our implementation are computed at edges, and they are invariant to image scaling and rotation and robust to the addition of noise. They are useful because of their distinctiveness, which enables keypoints to be matched correctly between faces. This is achieved by using our gradient-based edge detector and the local descriptors computed around the keypoints.

Edges are poorly defined and usually hard to detect, but large numbers of keypoints can still be extracted from typical images, so we can still perform feature matching even when the faces are small. Sometimes the images are too smooth to yield enough features for matching, and in that case a small face may not be recognised from the training images. The recognition performance could be improved by adding new SIFT features or by varying feature sizes and offsets.

Figure 6: The first image is the original image with its own features. The second image is the transformation of the original with the features after the operation. The third one is the transformation of the original image with the features before the operation.

In the next step, we will try to perform face identification using the nearest-neighbour or second-closest-neighbour algorithm, which is a good method for keypoint matching. Another useful method recognises faces by learning a statistical model [2]. In this method, a probabilistic model is used to recognise the faces, and an Expectation-Maximization (EM) algorithm is used to learn its parameters in a Maximum Likelihood framework. Hopefully, we can limit the model to a small number of parts, which is efficient for matching faces.

With the useful information represented in the feature-based model, there are many possible directions for applications. We may try to track people through a camera by tracking SIFT features. We may also render 3D pictures using the SIFT features extracted from still images; to achieve that, we could try to fit the SIFT features of a specific face to a 3D face model we have created. That is one of the attractive aspects of the invariant local feature method we are using.

References

[1] Lowe, D.G. 2004. Distinctive Image Features from Scale-Invariant Keypoints. January 5, 2004.

[2] Helmer, S., Lowe, D.G. Object Class Recognition with Many Local Features.

[3] Gordon, I., Lowe, D.G. Scene Modelling, Recognition and Tracking with Invariant Image Features.

[4] Mikolajczyk, K., Schmid, C. 2002. An affine invariant interest point detector. In European Conference on Computer Vision (ECCV), Copenhagen, Denmark, pp. 128-142.

[5] Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682-688.

[6] Matching Images with Different Resolutions.

[7] Mikolajczyk, K., Schmid, C. 2001. Indexing based on scale invariant interest points. International Conference on Computer Vision, Vancouver, Canada (July 2001), pp. 525-531.

[8] Mikolajczyk, K., Schmid, C. 2002. An Affine Invariant Interest Point Detector. ECCV.

[9] Forsyth, D.A., Ponce, J. 2003. Computer Vision: A Modern Approach. New Jersey: Prentice Hall.
