
  • 8/17/2019 implementing.the.scale.invariant.feature.transform.sift.method.pdf


    Implementing the Scale Invariant Feature Transform (SIFT) Method

    YU MENG and Dr. Bernard Tiddeman (supervisor)

    Department of Computer Science

    University of St Andrews

    yumeng@dcs.st-and.ac.uk

    Abstract

    The SIFT algorithm [1] takes an image and transforms it into a collection of local feature vectors. Each of these feature vectors is supposed to be distinctive and invariant to any scaling, rotation or translation of the image. In the original implementation, these features are used to find distinctive objects in different images, and the transform can be extended to match faces in images. This report describes our own implementation of the SIFT algorithm and highlights potential directions for future research.

    1 Introduction & Background

    Face recognition is becoming increasingly important for many applications, including human-machine interfaces, multimedia, security, communication, visually mediated interaction and anthropomorphic environments. One of the most difficult problems is that the process of identifying a person from facial appearance has to be performed differently for each image, because there are so many conflicting factors that alter facial appearance. Our implementation focuses on deriving SIFT features from an image and using these features to perform face identification.

    The approach to SIFT feature detection taken in our implementation is similar to the one taken by Lowe [1], which is used for object recognition. According to this work, the invariant features extracted from images can be used to perform reliable matching between different views of an object or scene. The features have been shown to be invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise and change in illumination. The approach is efficient at feature extraction and is able to identify large numbers of features.


    Face identification is supposed to discriminate between different faces. In Lowe's paper [1], nearest-neighbour template matching is presented. The nearest neighbour is defined as the keypoint with the minimum Euclidean distance between invariant descriptor vectors. However, different problems call for different strategies. If there are multiple training images of the same face, then we define the second-closest neighbour as the closest neighbour that is known to come from a different face than the first.
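The matching rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: descriptors are plain lists, `match_face` and `accept` are hypothetical helper names, and the 0.8 distance-ratio threshold in `accept` is an assumption borrowed from the ratio test in Lowe [1], since this report does not state how the two distances are compared.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_face(query, training):
    """Nearest-neighbour match for one query descriptor.

    training: list of (face_label, descriptor) pairs, possibly several per face.
    Returns (label, d1, d2): the nearest face label, its distance, and the
    distance to the closest descriptor that comes from a *different* face.
    """
    ranked = sorted(training, key=lambda item: euclidean(query, item[1]))
    best_label, best_desc = ranked[0]
    d1 = euclidean(query, best_desc)
    # Second-closest neighbour: skip descriptors from the same face as the nearest one.
    d2 = next((euclidean(query, desc) for label, desc in ranked[1:]
               if label != best_label), math.inf)
    return best_label, d1, d2


def accept(d1, d2, ratio=0.8):
    """Assumed rule: accept only if the nearest neighbour is clearly closer."""
    return d1 < ratio * d2


training = [("alice", [0.0, 0.0]), ("alice", [0.1, 0.0]), ("bob", [1.0, 1.0])]
label, d1, d2 = match_face([0.02, 0.0], training)
print(label, accept(d1, d2))  # alice True
```

Defining the second neighbour by face label, rather than simply taking the second entry of the ranked list, keeps two training images of the same person from masking each other.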

    Because of the variability in facial appearance, statistical methods are often used to help with the localisation of facial features. Helmer et al. [2] have taken an approach which is time-efficient and can be used with complicated models. They use SIFT features grouped using a probabilistic model initialised with a few parts of an object. They learn the parts incrementally and try adding possible parts to the model during training. They then use the Expectation-Maximization (EM) algorithm to update the model.

    Our method is implemented in the following stages: Creating the Difference of Gaussian Pyramid, Extrema Detection, Noise Elimination, Orientation Assignment, Descriptor Computation and Keypoint Matching.

     Figure 1: Bottom: On the left is the Gaussian pyramid with neighbouring images separated by a constant scale factor. These are subtracted to give the DoG pyramid on the right. Top: The Gaussian with σ twice that of the original is subsampled and used to construct the next octave.


    2 Creating the Difference of Gaussian Pyramid

    The first stage is to construct a Gaussian "scale space" function from the input image [1]. This is formed by convolution (filtering) of the original image with Gaussian functions of varying widths. The difference of Gaussian (DoG), D(x, y, σ), is calculated as the difference between two filtered images, one with a scale k times that of the other:

    $$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (1)$$

    These images, L(x, y, σ), are produced from the convolution of Gaussian functions, G(x, y, σ), with an input image, I(x, y):

    $$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \qquad (2)$$
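Equations (1) and (2) translate almost directly into code. The sketch below is a pure-Python illustration, not the report's implementation: it assumes a greyscale image stored as a list of rows, truncates the Gaussian of equation (3) at about ±3σ and renormalises it, and exploits separability by blurring rows and then columns. Replicating border pixels is one of several reasonable edge policies and is an assumption here.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Sampled, normalised 1-D Gaussian; G(x, y, sigma) in (3) is separable."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # truncate at about +-3 sigma
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_1d(row, kernel):
    """Convolve one row with a symmetric kernel, replicating edge pixels."""
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # clamp index to the image border
            acc += w * row[idx]
        out.append(acc)
    return out


def blur(image, sigma):
    """Equation (2): L = G * I, as a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma)
    rows = [convolve_1d(row, kernel) for row in image]
    cols = [convolve_1d(list(col), kernel) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def difference_of_gaussian(image, sigma, k):
    """Equation (1): D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    wide = blur(image, k * sigma)
    narrow = blur(image, sigma)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(wide, narrow)]
```

On a flat image the DoG is zero everywhere, while a single bright pixel produces a negative DoG value at its centre, because the wider blur spreads its energy further.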

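    $$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} \qquad (3)$$

    The effect of convolving with two Gaussian functions of different widths is most easily found by converting to the Fourier domain, in which convolution becomes multiplication, i.e.

    $$\mathcal{F}[G_{\sigma_i} * G_{\sigma_0} * f] = \mathcal{F}[G_{\sigma_i}]\,\mathcal{F}[G_{\sigma_0}]\,\mathcal{F}[f] \qquad (4)$$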

    The Fourier transform of a Gaussian function, $e^{-ax^2}$, is given by:

    $$\mathcal{F}[e^{-ax^2}](t) = \sqrt{\frac{\pi}{a}}\, e^{-\pi^2 t^2 / a} \qquad (5)$$

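    Substituting this into equation (4), with $a = 1/(2\sigma^2)$ for a normalised Gaussian of width σ (so that the normalising constants cancel and its transform is $e^{-2\pi^2\sigma^2 t^2}$), and equating the result to a convolution with a single Gaussian of width $k\sigma_0$, we get:

    $$e^{-2\pi^2 \sigma_i^2 t^2}\, e^{-2\pi^2 \sigma_0^2 t^2} = e^{-2\pi^2 k^2 \sigma_0^2 t^2} \qquad (6)$$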

    Performing the multiplication of the two exponentials on the left of this equation and comparing the coefficients of $t^2$ gives:

    $$\sigma_i^2 + \sigma_0^2 = k^2 \sigma_0^2 \qquad (7)$$

    and so we get:

    $$\sigma_i = \sigma_0 \sqrt{k^2 - 1} \qquad (8)$$

    This subtle point is not made clear in the original paper, and it is important because, after subsampling the low-pass filtered images to form the lower levels of the pyramid, we no longer have access to the original image at the appropriate resolution, and so we cannot filter with $G_{k\sigma}$ directly.
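Equation (8) can be checked numerically. Variances add under convolution, so convolving a σ₀ kernel with a σᵢ = σ₀√(k² − 1) kernel should produce a kernel whose standard deviation is kσ₀. The 1-D sketch below assumes σ₀ = 1.6 and k = 2^{1/3} purely for illustration; the helper names are hypothetical.

```python
import math


def sampled_gaussian(sigma, radius):
    """Normalised 1-D Gaussian sampled at integer offsets -radius..radius."""
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_full(a, b):
    """Full discrete convolution of two kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def std_dev(kernel):
    """Standard deviation of a symmetric, normalised kernel about its centre."""
    c = (len(kernel) - 1) / 2
    return math.sqrt(sum(w * (i - c) ** 2 for i, w in enumerate(kernel)))


def incremental_sigma(sigma0, k):
    """Equation (8): extra blur that takes a sigma0 image to k * sigma0."""
    return sigma0 * math.sqrt(k * k - 1)


sigma0, k = 1.6, 2 ** (1 / 3)
sigma_i = incremental_sigma(sigma0, k)
combined = convolve_full(sampled_gaussian(sigma0, 12), sampled_gaussian(sigma_i, 12))
print(round(std_dev(combined), 3), round(k * sigma0, 3))  # both about 2.016
```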

    Each octave of scale space is divided into an integer number, s, of intervals, and we let $k = 2^{1/s}$. We produce s + 3 images for each octave in order to form s + 2 difference of Gaussian (DoG) images, so that each DoG image searched in the extrema detection step has a neighbouring DoG image one scale interval above and one below it. In this experiment, we set s to 3, following experimental results in [1] that suggest this number produces the most stable keypoints. Once a complete octave has been processed, we subsample the Gaussian image that has twice the initial value of σ by taking every second pixel in each row and column. This greatly improves the efficiency of the algorithm at lower scales. The process is shown in Figure 1.
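The per-octave bookkeeping described above can be sketched as follows. It assumes a base blur of σ₀ = 1.6 (the value used by Lowe [1]; this report does not state its own) together with s = 3. The helper names are hypothetical: `incremental_blurs` applies equation (8) between consecutive levels, and `downsample` keeps every second pixel in each row and column.

```python
import math


def octave_sigmas(sigma0, s):
    """Blur of each of the s + 3 Gaussian images in one octave, with k = 2 ** (1 / s)."""
    k = 2 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


def incremental_blurs(sigmas):
    """Extra blur taking level i to level i + 1 (equation (8) with sigma0 = sigmas[i])."""
    return [math.sqrt(b * b - a * a) for a, b in zip(sigmas, sigmas[1:])]


def downsample(image):
    """Keep every second pixel in each row and column."""
    return [row[::2] for row in image[::2]]


s, sigma0 = 3, 1.6                  # sigma0 = 1.6 is an assumed base blur
sigmas = octave_sigmas(sigma0, s)   # s + 3 = 6 Gaussian images
num_dog = len(sigmas) - 1           # s + 2 = 5 DoG images
next_base = sigmas[s]               # twice sigma0: this image is downsampled
print([round(v, 3) for v in sigmas], num_dog, round(next_base, 3))
```

After downsampling, the image at index s has, relative to the new pixel grid, the same blur σ₀ as the first image of the previous octave, so the same schedule can be reused for every octave.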

    3 Extrema Detection

    This stage finds the extrema in the DoG pyramid. To detect the local maxima and minima of D(x, y, σ), each point is compared with all 26 of its neighbours (Figure 2): eight in the same DoG image and nine in each of the DoG images one scale above and one scale below. If the value of the point is the minimum or maximum of this set, the point is an extremum. We then improve the localisation of the keypoint to subpixel accuracy by using a second-order Taylor series expansion. This gives the true extremum location as:

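    $$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}} \qquad (9)$$

    where D and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^T$ is the offset from the sample point.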
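A direct sketch of the 26-neighbour comparison follows. It assumes the DoG images of one octave are stored as a list of 2-D lists indexed `dog[scale][y][x]`; only interior positions are tested so that every neighbour exists. The function names are illustrative.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours."""
    value = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]  # 8 in the same image + 9 above + 9 below
    return value > max(neighbours) or value < min(neighbours)


def find_extrema(dog):
    """Scan interior positions of the middle DoG images of one octave."""
    found = []
    for s in range(1, len(dog) - 1):
        height, width = len(dog[s]), len(dog[s][0])
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

Because the first and last DoG images of the octave lack a neighbour on one side, only the s middle images are scanned.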

    4 Keypoint Elimination

    This stage attempts to eliminate some points from the candidate list of keypoints by finding those that have low contrast or are poorly localised on an edge [1]. The value of the DoG function at the interpolated extremum is given by:

    $$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^T \hat{\mathbf{x}} \qquad (10)$$

    If the magnitude of this function value at $\hat{\mathbf{x}}$ is below a threshold value, the point is excluded.
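The sub-pixel refinement of equation (9) and the contrast test built on equation (10) can be sketched as below. This is an illustration under stated assumptions: derivatives are estimated with central differences on the same `dog[scale][y][x]` layout used for extrema detection, the 3×3 system is solved with Cramer's rule, and the 0.03 threshold (for image values in [0, 1]) is the value Lowe [1] reports, since this report does not give its own.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m v = b by Cramer's rule; returns None if m is (nearly) singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        solution.append(det3(mc) / d)
    return solution


def refine(dog, s, y, x):
    """Offset x_hat of equation (9), ordered (x, y, scale), and D(x_hat) of (10)."""
    def D(ds, dy, dx):
        return dog[s + ds][y + dy][x + dx]

    c = D(0, 0, 0)
    g = [(D(0, 0, 1) - D(0, 0, -1)) / 2,   # dD/dx
         (D(0, 1, 0) - D(0, -1, 0)) / 2,   # dD/dy
         (D(1, 0, 0) - D(-1, 0, 0)) / 2]   # dD/dsigma
    dxx = D(0, 0, 1) - 2 * c + D(0, 0, -1)
    dyy = D(0, 1, 0) - 2 * c + D(0, -1, 0)
    dss = D(1, 0, 0) - 2 * c + D(-1, 0, 0)
    dxy = (D(0, 1, 1) - D(0, 1, -1) - D(0, -1, 1) + D(0, -1, -1)) / 4
    dxs = (D(1, 0, 1) - D(1, 0, -1) - D(-1, 0, 1) + D(-1, 0, -1)) / 4
    dys = (D(1, 1, 0) - D(1, -1, 0) - D(-1, 1, 0) + D(-1, -1, 0)) / 4
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-v for v in g])  # x_hat = -H^-1 g
    if offset is None:
        return None, None
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value


def keep_by_contrast(value, threshold=0.03):
    """Reject low-contrast points; the 0.03 threshold is an assumed value."""
    return value is not None and abs(value) >= threshold
```

For a DoG that is exactly quadratic around the sample point, the central differences are exact, so the refined offset lands on the true peak.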

     Figure 2: An extremum is defined as any value in the DoG greater than (or smaller than) all its neighbours in scale space.


    To eliminate poorly localised extrema, we use the fact that in these cases the difference of Gaussian function has a large principal curvature across the edge but a small curvature in the perpendicular direction. A 2×2 Hessian matrix, H, computed at the location and scale of the keypoint, is used to find the curvature. With the following formulas, the ratio of principal curvatures can be checked efficiently:

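    $$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (11)$$

    $$\frac{\operatorname{Tr}(H)^2}{\operatorname{Det}(H)} = \frac{(D_{xx} + D_{yy})^2}{D_{xx} D_{yy} - D_{xy}^2} < \frac{(r + 1)^2}{r} \qquad (12)$$

    where r is the largest allowed ratio between the larger and the smaller principal curvature. If inequality (12) fails, the keypoint is removed from the candidate list.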
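Inequality (12) needs only three second derivatives of one DoG image. The sketch below estimates them with finite differences and assumes r = 10, the value used by Lowe [1]; the report itself does not give r. A non-positive determinant means the curvatures have different signs (or one is zero), so such points are rejected as well. The function name is hypothetical.

```python
def passes_edge_test(layer, y, x, r=10.0):
    """Inequality (12) on one DoG image; r = 10 is an assumed curvature ratio."""
    c = layer[y][x]
    dxx = layer[y][x + 1] - 2 * c + layer[y][x - 1]
    dyy = layer[y + 1][x] - 2 * c + layer[y - 1][x]
    dxy = (layer[y + 1][x + 1] - layer[y + 1][x - 1]
           - layer[y - 1][x + 1] + layer[y - 1][x - 1]) / 4
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign (or zero): reject
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

A round blob passes, while a ridge that curves strongly in x but hardly at all in y fails, which is exactly the edge-like case the test is meant to remove.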

    5 Orientation Assignment

    This step aims to assign a consistent orientation to each keypoint based on local image properties. An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint, as illustrated in Figure 3. A 16×16 square is chosen in this implementation. The orientation histogram has 36 bins covering the 360 degree range of orientations [1]. The gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences:

    $$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2} \qquad (13)$$

    $$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \qquad (14)$$

    Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint.
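Equations (13) and (14) and the weighted 36-bin histogram can be sketched as follows. Assumptions: `L` is the Gaussian-blurred image closest to the keypoint's scale, stored as a list of rows; the 16×16 window is centred on the keypoint; and `math.atan2` replaces the plain arctangent of equation (14) so that orientations span the full 360 degrees rather than a 180-degree range. Function names are illustrative.

```python
import math


def gradient(L, y, x):
    """Equations (13)-(14): magnitude and orientation (degrees) from pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    # atan2 (not arctan of the ratio) keeps the quadrant, giving 0-360 degrees.
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return math.hypot(dx, dy), theta


def orientation_histogram(L, y0, x0, scale, bins=36, half_width=8):
    """Gaussian- and magnitude-weighted histogram over a 16x16 window at (y0, x0)."""
    sigma = 1.5 * scale  # window sigma is 1.5 times the keypoint scale
    bin_width = 360.0 / bins
    hist = [0.0] * bins
    for y in range(y0 - half_width, y0 + half_width):
        for x in range(x0 - half_width, x0 + half_width):
            if not (0 < y < len(L) - 1 and 0 < x < len(L[0]) - 1):
                continue  # pixel differences would fall outside the image
            m, theta = gradient(L, y, x)
            w = math.exp(-((y - y0) ** 2 + (x - x0) ** 2) / (2 * sigma * sigma))
            hist[int(theta // bin_width) % bins] += w * m
    return hist
```

On an intensity ramp that increases along the diagonal, every gradient points at 45 degrees, so all of the weight lands in bin 4 (40 to 50 degrees).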

     Figure 3: Left: The point in the middle of the left figure is the keypoint candidate. The orientations of the points in the square area around this point are precomputed using pixel differences. Right: Each bin in the histogram covers 10 degrees, so the 36 bins cover the whole 360 degrees. The value of each bin is the sum of the magnitudes of all the points whose precomputed orientation falls within that bin.


    Peaks in the orientation histogram correspond to dominant directions of local gradients. We locate the highest peak in the histogram and use this peak, and any other local peak within 80% of its height, to create a keypoint with that orientation. Some points will therefore be assigned multiple orientations if there are multiple peaks of similar magnitude. A Gaussian distribution is fitted to the 3 histogram values closest to each peak to interpolate the peak's position for better accuracy.
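Peak selection and interpolation might look like the sketch below. Fitting a Gaussian through three positive histogram values is equivalent to fitting a parabola through their logarithms, which is what the code does; the function name is hypothetical, histograms are treated as circular, and orientations are returned in degrees as the bin centre plus the fitted offset.

```python
import math


def dominant_orientations(hist, ratio=0.8):
    """Orientations (degrees) of local peaks within `ratio` of the highest peak."""
    n = len(hist)
    width = 360.0 / n
    top = max(hist)
    result = []
    for i, v in enumerate(hist):
        left, right = hist[(i - 1) % n], hist[(i + 1) % n]  # circular neighbours
        if v > left and v > right and v >= ratio * top:
            offset = 0.0
            if left > 0 and right > 0:
                # Gaussian through 3 points == parabola through their logarithms.
                a, b, c = math.log(left), math.log(v), math.log(right)
                offset = 0.5 * (a - c) / (a - 2 * b + c)
            result.append(((i + 0.5 + offset) * width) % 360.0)
    return result
```

If the histogram values really follow a Gaussian, the fit recovers its centre exactly; for example, a peak centred 0.3 bins past bin 10 of a 36-bin histogram is reported at 108 degrees.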

    This computes the location, orientation and scale of the SIFT features that have been found in the image. These features respond strongly to corners and intensity gradients. In face images, the SIFT features appear mostly on the eyes, the nostrils, the top of the nose and the corners of the lips. In Figure 4, keypoints are indicated as arrows. The length of each arrow indicates the magnitude of the contrast at the keypoint, and the arrows point from the dark side to the bright side.

    6 Descriptor Computation

    In this stage, a descriptor that is as distinctive as possible is computed for the local image region at each candidate keypoint. The image gradient magnitudes and orientations are sampled around the keypoint location. These values are illustrated with small arrows at each sample location in the left image of Figure 5. A Gaussian weighting function with σ related to the scale of the keypoint is used to assign a weight to each magnitude; in this implementation we use a σ equal to one half the width of the descriptor window. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. This process is indicated in Figure 5. In our implementation, a 16×16 sample array is computed and a histogram with 8 bins is used, so a descriptor contains 16×16×8 elements in total.
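The rotated, Gaussian-weighted descriptor can be sketched as below. This is not the report's code. The hypothetical `cells` parameter controls how the 16×16 samples are grouped: `cells=4` gives the 4×4 subregions drawn in Figure 5 (4×4×8 = 128 values), while `cells=16` keeps one histogram per sample, matching the 16×16×8 element count stated above. Nearest-pixel sampling and the final unit-length normalisation are simplifying assumptions that the text does not describe.

```python
import math


def descriptor(L, y0, x0, orientation_deg, cells=4, bins=8, size=16):
    """Rotated, Gaussian-weighted gradient histograms around keypoint (y0, x0).

    A size x size sample grid is split into cells x cells groups, each holding a
    `bins`-bin histogram, giving cells * cells * bins values in total.
    """
    theta = math.radians(orientation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = size / 2.0
    sigma = half  # Gaussian sigma equal to half the descriptor window width
    bin_width = 360.0 / bins
    desc = [0.0] * (cells * cells * bins)
    for i in range(size):
        for j in range(size):
            # Sample offset in the keypoint's rotated frame, centred on the keypoint.
            u, v = j - half + 0.5, i - half + 0.5
            # Rotate into image coordinates and take the nearest pixel (simplification).
            x = int(round(x0 + u * cos_t - v * sin_t))
            y = int(round(y0 + u * sin_t + v * cos_t))
            if not (0 < y < len(L) - 1 and 0 < x < len(L[0]) - 1):
                continue
            dx = L[y][x + 1] - L[y][x - 1]
            dy = L[y + 1][x] - L[y - 1][x]
            weight = math.exp(-(u * u + v * v) / (2 * sigma * sigma))
            # Gradient orientation relative to the keypoint orientation.
            rel = (math.degrees(math.atan2(dy, dx)) - orientation_deg) % 360.0
            b = int(rel // bin_width) % bins
            cell = (i * cells // size) * cells + (j * cells // size)
            desc[cell * bins + b] += weight * math.hypot(dx, dy)
    # Unit-length normalisation: an assumption, not described in the report.
    norm = math.sqrt(sum(d * d for d in desc)) or 1.0
    return [d / norm for d in desc]
```

Because orientations are measured relative to the keypoint orientation, rotating both the image gradient and the keypoint orientation by the same angle leaves the descriptor unchanged.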

     Figure 4: Most of the features appear on the eyes, the nostrils, the top of the nose, the corners of the mouth and the earlobes.


     Figure 5: Left: The gradient magnitude and orientation at each sample point in a square region around the keypoint location. These are weighted by a Gaussian window, indicated by the overlaid circle. Right: The image gradients are added to orientation histograms. Each histogram includes 8 directions, indicated by the arrows, and is computed over a 4×4 subregion. The length of each arrow corresponds to the sum of the gradient magnitudes near that direction within the region.

    7 Transformation

    In this stage, matching tests are run to measure the repeatability and stability of the SIFT features. An image and a transformed version of the image are used, as shown in Figure 6. The features of the two images are computed separately. Then each keypoint in the original image (model image) is compared to every keypoint in the transformed image using the descriptors computed in the previous stage. For each comparison, one feature is picked in each image: f1 is the descriptor array for one keypoint in the original image and f2 is the descriptor array for a keypoint in the transformed image. The matching error for each pair of keypoints (i, j) is computed by:

    $$\mathrm{error}(i, j) = \sum_{m} \left| f_1^{(i)}(m) - f_2^{(j)}(m) \right| \qquad (15)$$

    where m runs over the elements of the two descriptor arrays. If the number of features in the two images is n1 and n2, then there are n1 × n2 possible pairs altogether. These pairs are sorted in ascending order of matching error. Then the first two qualified pairs of keypoints are chosen to set the transformation.

    To apply the transform, the following function is introduced into the implementation. The transform gives the mapping of a model image point (x, y) to a transformed image point (u, v) in terms of an image scaling, s, an image rotation, θ, and an image translation, $(t_x, t_y)^T$ [5]:

    $$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \qquad (16)$$
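The matching and transform-estimation steps can be sketched as follows. Equation (15) becomes `l1_error`, all n1 × n2 pairs are ranked by it, and the four unknowns of equation (16) are recovered from two correspondences, since each point pair supplies two equations. What makes a pair "qualified" is not specified in the report, so the hypothetical `pick_two` assumes it means a pair that reuses neither keypoint of an already chosen pair; all function names here are illustrative.

```python
import math


def l1_error(f1, f2):
    """Equation (15): sum of absolute differences between two descriptors."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


def ranked_pairs(desc1, desc2):
    """All n1 * n2 keypoint pairs (error, i, j), sorted by ascending error."""
    return sorted((l1_error(f1, f2), i, j)
                  for i, f1 in enumerate(desc1) for j, f2 in enumerate(desc2))


def pick_two(pairs):
    """First two pairs that share no keypoint (assumed meaning of 'qualified')."""
    chosen = []
    for err, i, j in pairs:
        if all(i != ci and j != cj for _, ci, cj in chosen):
            chosen.append((err, i, j))
            if len(chosen) == 2:
                break
    return chosen


def similarity_from_two_pairs(p1, p2, q1, q2):
    """Solve equation (16) for (s, theta, t_x, t_y) given p1->q1 and p2->q2."""
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]
    wx, wy = q2[0] - q1[0], q2[1] - q1[1]
    s = math.hypot(wx, wy) / math.hypot(vx, vy)      # ratio of segment lengths
    theta = math.atan2(wy, wx) - math.atan2(vy, vx)  # difference of segment angles
    a, b = s * math.cos(theta), s * math.sin(theta)
    return s, theta, q1[0] - (a * p1[0] - b * p1[1]), q1[1] - (b * p1[0] + a * p1[1])


def apply_transform(params, x, y):
    """Map a model-image point (x, y) to (u, v) with equation (16)."""
    s, theta, tx, ty = params
    a, b = s * math.cos(theta), s * math.sin(theta)
    return a * x - b * y + tx, b * x + a * y + ty
```

Two correspondences are the minimum for a similarity transform; in practice a wrong pair near the top of the ranking would corrupt the estimate, which is one reason a stricter notion of "qualified" may be worth exploring.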


    8 Results

    Results from our implementation are shown in Figures 3, 4 and 6. We can currently calculate the SIFT features for an image and have experimented with some simple matching schemes between images. Noise adjustment is an essential part of our approach, since noise could result in inefficient or false matching. We have therefore chosen parameters that should help to keep the feature matching robust to noise in this implementation.

    9 Conclusion and Future Work

    The SIFT features described in our implementation are computed at the edges, and they are invariant to image scaling and rotation and robust to the addition of noise. They are useful because of their distinctiveness, which enables correct matching of keypoints between faces. This is achieved by using our gradient-based edge detector and the local descriptors computed around the keypoints.

    Edges are poorly defined and usually hard to detect, but large numbers of keypoints can still be extracted from typical images, so feature matching can still be performed even when the faces are small. Sometimes the images are too smooth to yield enough features for matching, and in that case a small face could go unrecognised from the training images. The recognition performance could be improved by adding new SIFT features or by varying feature sizes and offsets.

     Figure 6: The first image is the original image with its own features. The second image is the transformation of the original with the features after the operation. The third one is the transformation of the original image with the features before the operation.

    In the next step, we will try to perform face identification, using the nearest-neighbour or second-closest-neighbour algorithm, which is a good method for keypoint matching. Another useful method recognises faces by learning a statistical model [2]. In this method, a probabilistic model is used to recognise the faces, and an Expectation-Maximization (EM) algorithm is used to learn the parameters in a Maximum Likelihood framework. We hope to limit the model to a small number of parts, which would make it efficient for matching faces.

    With the information captured by the feature-based model, there are many possible directions for applications. We may try to track people through a camera by tracking SIFT features. We may render 3D pictures using the SIFT features extracted from still images; to achieve that, we could try to fit the SIFT features of a specific face to a 3D face model we have created. That is one of the attractive aspects of the invariant local feature method we are using.

    References

    [1] Lowe, D.G. 2004. Distinctive Image Features from Scale-Invariant Keypoints. January 5, 2004.

    [2] Helmer, S., Lowe, D.G. Object Class Recognition with Many Local Features.

    [3] Gordon, I., Lowe, D.G. Scene Modelling, Recognition and Tracking with Invariant Image Features.

    [4] Mikolajczyk, K., Schmid, C. 2002. An affine invariant interest point detector. In European Conference on Computer Vision (ECCV), Copenhagen, Denmark, pp. 128-142.

    [5] Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682-688.

    [6] Matching Images with Different Resolutions.

    [7] Mikolajczyk, K., Schmid, C. 2001. Indexing based on scale invariant interest points. International Conference on Computer Vision, Vancouver, Canada (July 2001), pp. 525-531.

    [8] Mikolajczyk, K., Schmid, C. 2002. An Affine Invariant Interest Point Detector. ECCV.

    [9] Forsyth, D.A., Ponce, J. 2003. Computer Vision: A Modern Approach. New Jersey: Prentice Hall.