Transcript of Jul09 Hinton Deeplearn

UCL Tutorial on: Deep Belief Nets
(An updated and extended version of my 2007 NIPS tutorial)
Geoffrey Hinton
Canadian Institute for Advanced Research
&
Department of Computer Science
University of Toronto

Schedule for the Tutorial
• 2.00 – 3.30 Tutorial part 1
• 3.30 – 3.45 Questions
• 3.45 – 4.15 Tea Break
• 4.15 – 5.45 Tutorial part 2
• 5.45 – 6.00 Questions

Some things you will learn in this tutorial
• How to learn multi-layer generative models of unlabelled data by learning one layer of features at a time.
  – How to add Markov Random Fields in each hidden layer.
• How to use generative models to make discriminative training methods work much better for classification and regression.
  – How to extend this approach to Gaussian Processes and how to learn complex, domain-specific kernels for a Gaussian Process.
• How to perform non-linear dimensionality reduction on very large datasets.
  – How to learn binary, low-dimensional codes and how to use them for very fast document retrieval.
• How to learn multilayer generative models of high-dimensional sequential data.

A spectrum of machine learning tasks

Typical Statistics:
• Low-dimensional data (e.g. less than 100 dimensions).
• Lots of noise in the data.
• There is not much structure in the data, and what structure there is can be represented by a fairly simple model.
• The main problem is distinguishing true structure from noise.

Artificial Intelligence:
• High-dimensional data (e.g. more than 100 dimensions).
• The noise is not sufficient to obscure the structure in the data if we process it right.
• There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model.
• The main problem is figuring out a way to represent the complicated structure so that it can be learned.

Historical background: First generation neural networks
• Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
  – There was a neat learning algorithm for adjusting the weights.
  – But perceptrons are fundamentally limited in what they can learn to do.
[Sketch of a typical perceptron from the 1960's: input units (e.g. pixels) feed non-adaptive hand-coded features, which feed output units (e.g. class labels such as "bomb" vs. "toy").]

Second generation neural networks (~1985)
[Diagram: back-propagate the error signal to get derivatives for learning the weights of adaptive hidden layers.]

A temporary digression
• Vapnik and his co-workers developed a very clever type of perceptron called a Support Vector Machine.
  – Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe.
    • The feature computes how similar a test example is to that training example.
  – Then a clever optimization technique is used to select the best subset of the features and to decide how to weight each feature when classifying a test case.
    • But it's just a perceptron and has all the same limitations.
• In the 1990's, many researchers abandoned neural networks with multiple adaptive hidden layers because Support Vector Machines worked better.


Overcoming the limitations of back-propagation
• Keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input.
  – Adjust the weights to maximize the probability that a generative model would have produced the sensory input.
  – Learn p(image), not p(label | image).
• If you want to do computer vision, first learn computer graphics.
• What kind of generative model should we learn?

Belief Nets
• A belief net is a directed acyclic graph composed of stochastic variables.
• We get to observe some of the variables and we would like to solve two problems:
• The inference problem: Infer the states of the unobserved variables.
• The learning problem: Adjust the interactions between variables to make the network more likely to generate the observed data.
[Diagram: stochastic hidden causes with directed connections down to visible effects.]
We will use nets composed of layers of stochastic binary variables with weighted connections. Later, we will generalize to other types of variable.

Stochastic binary units (Bernoulli variables)
• These have a state of 1 or 0.
• The probability of turning on is determined by the weighted input from other units (plus a bias):

    p(s_i = 1) = 1 / (1 + \exp(-b_i - \sum_j s_j w_{ji}))

[Graph: p(s_i = 1) as a logistic function of the total input, rising from 0 to 1.]
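A minimal NumPy sketch of this sampling rule (the function and argument names are illustrative, not from the tutorial):

    import numpy as np

    def sample_binary_units(s, W, b, rng=np.random.default_rng(0)):
        # p(on) = logistic(bias + weighted input from the other units)
        p_on = 1.0 / (1.0 + np.exp(-(b + s @ W)))
        return (rng.random(p_on.shape) < p_on).astype(float)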

Learning Deep Belief Nets
• It is easy to generate an unbiased example at the leaf nodes, so we can see what kinds of data the network believes in.
• It is hard to infer the posterior distribution over all possible configurations of hidden causes.
• It is hard to even get a sample from the posterior.
• So how can we learn deep belief nets that have millions of parameters?
[Diagram: stochastic hidden causes above visible effects.]

The learning rule for sigmoid belief nets
• Learning is easy if we can get an unbiased sample from the posterior distribution over hidden states given the observed data.
• For each unit, maximize the log probability that its binary state in the sample from the posterior would be generated by the sampled binary states of its parents:

    p_i = p(s_i = 1) = 1 / (1 + \exp(-\sum_j s_j w_{ji}))

    \Delta w_{ji} = \epsilon \, s_j (s_i - p_i)

where \epsilon is a learning rate.
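A short NumPy sketch of this delta rule, assuming we are handed a sampled parent vector and child states from the posterior (names are illustrative):

    import numpy as np

    def sbn_update(s_parents, s_children, W, eps=0.01):
        # delta w_ji = eps * s_j * (s_i - p_i), one entry per weight
        p_children = 1.0 / (1.0 + np.exp(-(s_parents @ W)))
        W += eps * np.outer(s_parents, s_children - p_children)
        return W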

Explaining away (Judea Pearl)
• Even if two hidden causes are independent, they can become dependent when we observe an effect that they can both influence.
  – If we learn that there was an earthquake, it reduces the probability that the house jumped because of a truck.
[Diagram: "truck hits house" and "earthquake" each have bias -10 and a weight of +20 to "house jumps", which has bias -20.]
posterior: p(1,1) = .0001, p(1,0) = .4999, p(0,1) = .4999, p(0,0) = .0001

Why it is usually very hard to learn sigmoid belief nets one layer at a time
• To learn W, we need the posterior distribution in the first hidden layer.
• Problem 1: The posterior is typically complicated because of "explaining away".
• Problem 2: The posterior depends on the prior as well as the likelihood.
  – So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.
• Problem 3: We need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!
[Diagram: data below several layers of hidden variables; W is the likelihood between the data and the first hidden layer, and the layers above supply the prior.]

Some methods of learning deep belief nets
• Monte Carlo methods can be used to sample from the posterior.
  – But it's painfully slow for large, deep models.
• In the 1990's people developed variational methods for learning deep belief nets.
  – These only get approximate samples from the posterior.
  – Nevertheless, the learning is still guaranteed to improve a variational bound on the log probability of generating the observed data.

The breakthrough that makes deep learning efficient
• To learn deep nets efficiently, we need to learn one layer of features at a time. This does not work well if we assume that the latent variables are independent in the prior:
  – The latent variables are not independent in the posterior, so inference is hard for non-linear models.
  – The learning tries to find independent causes using one hidden layer, which is not usually possible.
• We need a way of learning one layer at a time that takes into account the fact that we will be learning more hidden layers later.
  – We solve this problem by using an undirected model.

Two types of generative neural network
• If we connect binary stochastic neurons in a directed acyclic graph we get a Sigmoid Belief Net (Radford Neal, 1992).
• If we connect binary stochastic neurons using symmetric connections we get a Boltzmann Machine (Hinton & Sejnowski, 1983).
  – If we restrict the connectivity in a special way, it is easy to learn a Boltzmann machine.

Restricted Boltzmann Machines (Smolensky, 1986, called them "harmoniums")
• We restrict the connectivity to make learning easier.
  – Only one layer of hidden units.
    • We will deal with more layers later.
  – No connections between hidden units.
• In an RBM, the hidden units are conditionally independent given the visible states.
  – So we can quickly get an unbiased sample from the posterior distribution when given a data-vector.
  – This is a big advantage over directed belief nets.
[Diagram: a layer of hidden units j fully connected to a layer of visible units i, with no within-layer connections.]

The Energy of a joint configuration (ignoring terms to do with biases)

    E(v,h) = - \sum_{i,j} v_i h_j w_{ij}

where v_i is the binary state of visible unit i, h_j is the binary state of hidden unit j, and w_{ij} is the weight between units i and j. E(v,h) is the energy with configuration v on the visible units and h on the hidden units, and

    -\partial E(v,h) / \partial w_{ij} = v_i h_j
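As a sketch, the energy and its weight derivative are one line each in NumPy (names are illustrative):

    import numpy as np

    def rbm_energy(v, h, W):
        # E(v,h) = -sum_ij v_i h_j w_ij (bias terms ignored, as on the slide)
        return -float(v @ W @ h)

    def energy_weight_gradient(v, h):
        # -dE/dw_ij = v_i h_j, one entry per weight
        return np.outer(v, h)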

Weights, Energies, Probabilities
• Each possible joint configuration of the visible and hidden units has an energy.
  – The energy is determined by the weights and biases (as in a Hopfield net).
• The energy of a joint configuration of the visible and hidden units determines its probability:

    p(v,h) \propto e^{-E(v,h)}

• The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.

Using energies to define probabilities
• The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations:

    p(v,h) = e^{-E(v,h)} / \sum_{u,g} e^{-E(u,g)}

  where the denominator is the partition function.
• The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it:

    p(v) = \sum_h e^{-E(v,h)} / \sum_{u,g} e^{-E(u,g)}
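For an RBM small enough to enumerate, the two formulas can be checked directly; a brute-force sketch (illustrative, exponential in the number of units):

    import numpy as np
    from itertools import product

    def exact_rbm_distribution(W):
        # Enumerate every joint configuration of a tiny RBM and normalize,
        # exactly as the two formulas above specify.
        n_v, n_h = W.shape
        vs = [np.array(v) for v in product([0, 1], repeat=n_v)]
        hs = [np.array(h) for h in product([0, 1], repeat=n_h)]
        unnorm = np.array([[np.exp(v @ W @ h) for h in hs] for v in vs])
        Z = unnorm.sum()              # the partition function
        p_joint = unnorm / Z          # p(v,h)
        p_v = p_joint.sum(axis=1)     # sum over hidden configurations
        return p_joint, p_v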

A picture of the maximum likelihood learning algorithm for an RBM
[Diagram: alternating Gibbs sampling between the visible units i and the hidden units j, for t = 0, t = 1, t = 2, ..., t = infinity.]

    \Delta w_{ij} = \epsilon ( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^\infty )

A quick way to learn an RBM
[Diagram: start with a training vector on the visible units at t = 0, update the hidden units, then update the visible units to get a "reconstruction" at t = 1.]

    \Delta w_{ij} = \epsilon ( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 )
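A minimal sketch of this one-step contrastive divergence update for a binary RBM (a common practical variant that uses probabilities rather than samples for the reconstruction statistics; names are illustrative):

    import numpy as np

    def cd1_update(v0, W, b_v, b_h, eps=0.1, rng=np.random.default_rng(0)):
        # delta w_ij = eps * (<v_i h_j>^0 - <v_i h_j>^1) on a minibatch v0
        sigm = lambda x: 1.0 / (1.0 + np.exp(-x))
        p_h0 = sigm(v0 @ W + b_h)                    # hidden probabilities at t=0
        h0 = (rng.random(p_h0.shape) < p_h0) * 1.0   # sampled hidden states
        p_v1 = sigm(h0 @ W.T + b_v)                  # the "reconstruction"
        p_h1 = sigm(p_v1 @ W + b_h)                  # hidden probabilities at t=1
        n = len(v0)
        W += eps * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        b_v += eps * (v0 - p_v1).mean(axis=0)
        b_h += eps * (p_h0 - p_h1).mean(axis=0)
        return W, b_v, b_h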

How to learn a set of features that are good for reconstructing images of the digit 2
[Diagram: 50 binary feature neurons connected to a 16 x 16 pixel image. On the data (reality), increment the weights between an active pixel and an active feature; on the reconstruction (better than reality), decrement the weights between an active pixel and an active feature.]

The final 50 x 256 weights
Each neuron grabs a different feature.

How well can we reconstruct the digit images from the binary feature activations?
[Figure: data vs. reconstruction from activated binary features, for new test images from the digit class that the model was trained on, and for images from an unfamiliar digit class (the network tries to see every image as a 2).]

Three ways to combine probability density models (an underlying theme of the tutorial)
• Mixture: Take a weighted average of the distributions.
  – It can never be sharper than the individual distributions. It's a very weak way to combine models.
• Product: Multiply the distributions at each point and then renormalize (this is how an RBM combines the distributions defined by each hidden unit).
  – Exponentially more powerful than a mixture. The normalization makes maximum likelihood learning difficult, but approximations allow us to learn anyway.
• Composition: Use the values of the latent variables of one model as the data for the next model.
  – Works well for learning multiple layers of representation, but only if the individual models are undirected.

Training a deep network (the main reason RBM's are interesting)
• First train a layer of features that receive input directly from the pixels.
• Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer (a sketch follows below).
• It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of the training data.
  – The proof is slightly complicated.
  – But it is based on a neat equivalence between an RBM and a deep directed model (described later).
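A self-contained sketch of this greedy layer-wise recipe, reusing the CD-1 update from earlier (all names are illustrative):

    import numpy as np

    def sigm(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm_cd1(x, n_hid, epochs=10, eps=0.1, rng=np.random.default_rng(0)):
        # Fit one binary RBM with CD-1 and return (W, b_v, b_h).
        W = 0.01 * rng.standard_normal((x.shape[1], n_hid))
        b_v, b_h = np.zeros(x.shape[1]), np.zeros(n_hid)
        for _ in range(epochs):
            p_h0 = sigm(x @ W + b_h)
            h0 = (rng.random(p_h0.shape) < p_h0) * 1.0
            p_v1 = sigm(h0 @ W.T + b_v)
            p_h1 = sigm(p_v1 @ W + b_h)
            W += eps * (x.T @ p_h0 - p_v1.T @ p_h1) / len(x)
            b_v += eps * (x - p_v1).mean(axis=0)
            b_h += eps * (p_h0 - p_h1).mean(axis=0)
        return W, b_v, b_h

    def train_dbn_stack(data, layer_sizes):
        # Greedy stacking: each RBM's hidden activations become the
        # "pixels" that the next RBM is trained on.
        stack, x = [], data
        for n_hid in layer_sizes:
            W, b_v, b_h = train_rbm_cd1(x, n_hid)
            stack.append((W, b_v, b_h))
            x = sigm(x @ W + b_h)     # features of features
        return stack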

The generative model after learning 3 layers
• To generate data:
  1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
  2. Perform a top-down pass to get states for all the other layers.
So the lower-level bottom-up connections are not part of the generative model. They are just used for inference.
[Diagram: hidden layers h3, h2, h1 above the data; W_3 is the undirected top-level RBM, and W_2, W_1 are directed top-down weights.]

Why does greedy learning work? An aside: Averaging factorial distributions
• If you average some factorial distributions, you do NOT get a factorial distribution.
  – In an RBM, the posterior over the hidden units is factorial for each visible vector.
  – But the aggregated posterior over all training cases is not factorial (even if the data was generated by the RBM itself).

Why does greedy learning work?
• Each RBM converts its data distribution into an aggregated posterior distribution over its hidden units.
• This divides the task of modeling its data into two tasks:
  – Task 1: Learn generative weights that can convert the aggregated posterior distribution over the hidden units back into the data distribution, i.e. p(v | h, W).
  – Task 2: Learn to model the aggregated posterior distribution over the hidden units, i.e. p(h | W).
  – The RBM does a good job of task 1 and a moderately good job of task 2.
• Task 2 is easier (for the next RBM) than modeling the original data because the aggregated posterior distribution is closer to a distribution that an RBM can model perfectly.
[Diagram: data distribution on the visible units (Task 1, p(v | h, W)) below the aggregated posterior distribution on the hidden units (Task 2, p(h | W)).]

Why does greedy learning work?
The weights, W, in the bottom-level RBM define p(v|h) and they also indirectly define p(h). So we can express the RBM model as

    p(v) = \sum_h p(h) \, p(v|h)

If we leave p(v|h) alone and improve p(h), we will improve p(v). To improve p(h), we need it to be a better model of the aggregated posterior distribution over hidden vectors produced by applying W to the data.

Which distributions are factorial in a directed belief net?
• In a directed belief net with one hidden layer, the posterior over the hidden units p(h|v) is non-factorial (due to explaining away).
  – The aggregated posterior is factorial if the data was generated by the directed model.
• It's the opposite way round from an undirected model, which has factorial posteriors and a non-factorial prior p(h) over the hiddens.
• The intuitions that people have from using directed models are very misleading for undirected models.

Why does greedy learning fail in a directed module?
• A directed module also converts its data distribution into an aggregated posterior.
  – Task 1: The learning is now harder because the posterior for each training case is non-factorial.
  – Task 2 is performed using an independent prior. This is a very bad approximation unless the aggregated posterior is close to factorial.
• A directed module attempts to make the aggregated posterior factorial in one step.
  – This is too difficult and leads to a bad compromise. There is also no guarantee that the aggregated posterior is easier to model than the data distribution.
[Diagram: data distribution on the visible units (Task 1, p(v | h, W_1)) below the aggregated posterior distribution on the hidden units (Task 2, p(h | W_2)).]

A model of digit recognition
[Architecture: 2000 top-level neurons connected to 10 label neurons and to 500 neurons; below that, 500 neurons; below that, a 28 x 28 pixel image.]
The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
The model learns to generate combinations of labels and images.
To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

Fine-tuning with a contrastive version of the "wake-sleep" algorithm
After learning many layers of features, we can fine-tune the features to improve generation.
1. Do a stochastic bottom-up pass.
   – Adjust the top-down weights to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM.
   – Adjust the weights in the top-level RBM.
3. Do a stochastic top-down pass.
   – Adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

Show the movie of the network generating digits
(available at www.cs.toronto…)

Samples generated by letting the associative memory run with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.

Examples of correctly recognized handwritten digits that the neural network had never seen before.
It's very good!

How well does it discriminate on the MNIST test set with no extra information about geometric distortions?
• Generative model based on RBM's: 1.25%
• Support Vector Machine (Decoste et al.): 1.4%
• Backprop with 1000 hiddens (Platt)

Unsupervised "pre-training" also helps for models that have more data and better priors
• Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits.
• They also used convolutional multilayer neural networks that have some built-in local translational invariance.
Back-propagation alone: 0.49%
Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record)

Another view of why layer-by-layer learning works (Hinton, Osindero & Teh, 2006)
• There is an unexpected equivalence between RBM's and directed networks with many layers that all use the same weights.
  – This equivalence also gives insight into why contrastive divergence learning works.

An infinite sigmoid belief net that is equivalent to an RBM
• The distribution generated by this infinite directed net with replicated weights is the equilibrium distribution for a compatible pair of conditional distributions, p(v|h) and p(h|v), that are both defined by W.
  – A top-down pass of the directed net is exactly equivalent to letting a Restricted Boltzmann Machine settle to equilibrium.
  – So this infinite directed net defines the same distribution as an RBM.
[Diagram: an infinite stack of layers ..., v2, h1, v1, h0, v0, connected by alternating weight matrices W and W^T, etc.]

Inference in a directed net with replicated weights
• The variables in h0 are conditionally independent given v0.
  – Inference is trivial. We just multiply v0 by W transpose.
  – The model above h0 implements a complementary prior.
  – Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
• Inference in the directed net is exactly equivalent to letting a Restricted Boltzmann Machine settle to equilibrium starting at the data.
[Diagram: the same infinite stack ..., v2, h1, v1, h0, v0, with alternating W and W^T; inference fills in each layer from the one below.]

• The learning rule for a sigmoid belief net is:

    \Delta w_{ij} \propto s_j (s_i - p_i)

• With replicated weights this becomes:

    s_j^0 (s_i^0 - s_i^1)
    + s_i^1 (s_j^0 - s_j^1)
    + s_j^1 (s_i^1 - s_i^2)
    + ...
    = s_j^0 s_i^0 - s_j^\infty s_i^\infty

  – The intermediate terms cancel in pairs, leaving the Boltzmann machine learning rule.
[Diagram: the infinite stack with states s_i^0, s_j^0 at (v0, h0), s_i^1, s_j^1 at (v1, h1), s_i^2, s_j^2 at (v2, h2), etc.]

Learning a deep directed network
• First learn with all the weights tied.
  – This is exactly equivalent to learning an RBM.
  – Contrastive divergence learning is equivalent to ignoring the small derivatives contributed by the tied weights between deeper layers.
[Diagram: the infinite stack with all weights tied, which collapses to a single RBM with weights W between v0 and h0.]

• Then freeze the first layer of weights in both directions and learn the remaining weights (still tied together).
  – This is equivalent to learning another RBM, using the aggregated posterior distribution of h0 as the data.
[Diagram: W_frozen between v0 and h0 (in both directions), with a new tied weight matrix W being learned above, between h0 and v1.]

How many layers should we use and how wide should they be?
• There is no simple answer.
  – Extensive experiments by Yoshua Bengio's group (described later) suggest that several hidden layers is better than one.
  – Results are fairly robust against changes in the size of a layer, but the top layer should be big.
• Deep belief nets give their creator a lot of freedom.
  – The best way to use that freedom depends on the task.
  – With enough narrow layers we can model any distribution over binary vectors (Sutskever & Hinton, 2007).

What happens when the weights in higher layers become different from the weights in the first layer?
• The higher layers no longer implement a complementary prior.
  – So performing inference using the frozen weights in the first layer is no longer correct. But it's still pretty good.
  – Using this incorrect inference procedure gives a variational lower bound on the log probability of the data.
• The higher layers learn a prior that is closer to the aggregated posterior distribution of the first hidden layer.
  – This improves the network's model of the data.
• Hinton, Osindero and Teh (2006) prove that this improvement is always bigger than the loss in the variational bound caused by using less accurate inference.

An improved version of Contrastive Divergence learning (if time permits)
• The main worry with CD is that there will be deep minima of the energy function far away from the data.
  – To find these we need to run the Markov chain for a long time (maybe thousands of steps).
  – But we cannot afford to run the chain for too long for each update of the weights.
• Maybe we can run the same Markov chain over many weight updates (Neal, 1992).
  – If the learning rate is very small, this should be equivalent to running the chain for many steps and then doing a bigger weight update.

Persistent CD (Tijmen Tieleman, ICML 2008 & 2009)
• Use minibatches of 100 cases to estimate the first term in the gradient. Use a single batch of 100 fantasies to estimate the second term in the gradient.
• After each weight update, generate the new fantasies from the previous fantasies by using one alternating Gibbs update.
  – So the fantasies can get far from the data.
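A sketch of one persistent-CD update, assuming binary units and a persistent array of fantasy particles (names are illustrative):

    import numpy as np

    def pcd_update(v_data, fantasies, W, b_v, b_h, eps=0.01,
                   rng=np.random.default_rng(0)):
        # Positive statistics from a minibatch; negative statistics from
        # persistent fantasies advanced by one alternating Gibbs update.
        sigm = lambda x: 1.0 / (1.0 + np.exp(-x))
        p_h_data = sigm(v_data @ W + b_h)
        h = (rng.random((len(fantasies), W.shape[1]))
             < sigm(fantasies @ W + b_h)) * 1.0
        fantasies = (rng.random(fantasies.shape)
                     < sigm(h @ W.T + b_v)) * 1.0
        p_h_fant = sigm(fantasies @ W + b_h)
        W += eps * (v_data.T @ p_h_data / len(v_data)
                    - fantasies.T @ p_h_fant / len(fantasies))
        b_v += eps * (v_data.mean(axis=0) - fantasies.mean(axis=0))
        b_h += eps * (p_h_data.mean(axis=0) - p_h_fant.mean(axis=0))
        return fantasies, W, b_v, b_h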

Contrastive divergence as an adversarial game
• Why does persistent CD work so well with only 100 negative examples to characterize the whole partition function?
  – For all interesting problems the partition function is highly multi-modal.
  – How does it manage to find all the modes without starting at the data?

The learning causes very fast mixing
• The learning interacts with the Markov chain.
• Persistent Contrastive Divergence cannot be analysed by viewing the learning as an outer loop.
  – Wherever the fantasies outnumber the positive data, the free-energy surface is raised. This makes the fantasies rush around hyperactively.

How persistent CD moves between the modes of the model's distribution
• If a mode has more fantasy particles than data, the free-energy surface is raised until the fantasy particles escape.
  – This can overcome free-energy barriers that would be too high for the Markov chain to jump.
• The free-energy surface is being changed to help mixing in addition to defining the model.

Summary so far
• Restricted Boltzmann Machines provide a simple way to learn a layer of features without any supervision.
  – Maximum likelihood learning is computationally expensive because of the normalization term, but contrastive divergence learning is fast and usually works well.
• Many layers of representation can be learned by treating the hidden states of one RBM as the visible data for training the next RBM (a composition of experts).
• This creates good generative models that can then be fine-tuned.
  – Contrastive wake-sleep can fine-tune generation.

BREAK

Overview of the rest of the tutorial
• How to fine-tune a greedily trained generative model to be better at discrimination.
• How to learn a kernel for a Gaussian process.
• How to use deep belief nets for non-linear dimensionality reduction and document retrieval.
• How to learn a generative hierarchy of conditional random fields.
• A more advanced learning module for deep belief nets that contains multiplicative interactions.
• How to learn deep models of sequential data.

Fine-tuning for discrimination
• First learn one layer at a time greedily.
• Then treat this as "pre-training" that finds a good initial set of weights which can be fine-tuned by a local search procedure.
  – Contrastive wake-sleep is one way of fine-tuning the model to be better at generation.
• Backpropagation can be used to fine-tune the model for better discrimination.
  – This overcomes many of the limitations of standard backpropagation.

Why backpropagation works better with greedy pre-training: The optimization view
• Greedily learning one layer at a time scales well to really big networks, especially if we have locality in each layer.
• We do not start backpropagation until we already have sensible feature detectors that should already be very helpful for the discrimination task.
  – So the initial gradients are sensible and backprop only needs to perform a local search from a sensible starting point.

Why backpropagation works better with greedy pre-training: The overfitting view
• Most of the information in the final weights comes from modeling the distribution of input vectors.
  – The input vectors generally contain a lot more information than the labels.
  – The precious information in the labels is only used for the final fine-tuning.
  – The fine-tuning only modifies the features slightly to get the category boundaries right. It does not need to discover features.
• This type of backpropagation works well even if most of the training data is unlabeled.
  – The unlabeled data is still very useful for discovering good features.

First, model the distribution of digit images
[Architecture: 2000 units; 500 units; 500 units; 28 x 28 pixel image.]
The network learns a density model for unlabeled digit images. When we generate from the model we get things that look like real digits of all classes.
But do the hidden features really help with digit discrimination?
Add 10 softmaxed units to the top and do backpropagation.
The top two layers form a restricted Boltzmann machine whose free energy landscape should model the low-dimensional manifolds of the digits.

Results on the permutation-invariant MNIST task
• Very carefully trained backprop net with one or two hidden layers (Platt; Hinton): 1.6%
• SVM (Decoste & Schoelkopf, 2002): 1.4%
• Generative model of joint density of images and labels (+ generative fine-tuning): 1.25%
• Generative model of unlabelled digits followed by gentle backpropagation (Hinton & Salakhutdinov, Science 2006): 1.15%

Learning Dynamics of Deep Nets
The next 4 slides describe work by Yoshua Bengio's group.
[Figure: before fine-tuning | after fine-tuning.]

Effect of Unsupervised Pre-training
[Figure. Erhan et al., AISTATS 2009.]

Effect of Depth
[Figure: error vs. depth, with pre-training and without pre-training.]

Learning Trajectories in Function Space (a 2-D visualization produced with t-SNE)
• Each point is a model in function space.
• Color = epoch.
• Top: trajectories without pre-training. Each trajectory converges to a different local min.
• Bottom: trajectories with pre-training.
• No overlap!
Erhan et al., AISTATS 2009

Why unsupervised pre-training makes sense
[Two diagrams relating "stuff", image, and label. In the first, the label comes directly from the image; in the second, hidden "stuff" causes the image through a high-bandwidth pathway and the label through a low-bandwidth pathway.]
If image-label pairs were generated in the first way, it would make sense to try to go straight from images to labels. For example, do the pixels have even parity?
If image-label pairs are generated in the second way, it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.

Modeling real-valued data
• For images of digits it is possible to represent intermediate intensities as if they were probabilities by using "mean-field" logistic units.
  – We can treat intermediate values as the probability that the pixel is inked.
• This will not work for real images.
  – In a real image, the intensity of a pixel is almost always almost exactly the average of the neighboring pixels.
  – Mean-field logistic units cannot represent precise intermediate values.

Replacing binary variables by integer-valued variables (Teh and Hinton, 2001)
• One way to model an integer-valued variable is to make N identical copies of a binary unit.
• All copies have the same probability of being "on": p = logistic(x).
  – The total number of "on" copies is like the firing rate of a neuron.
  – It has a binomial distribution with mean N p and variance N p (1 - p).

A better way to implement integer values
• Make many copies of a binary unit.
• All copies have the same weights and the same adaptive bias, b, but they have different fixed offsets to the bias:

    b - 0.5,  b - 1.5,  b - 2.5,  b - 3.5,  ...

[Graph: the resulting response as a function of the total input x.]

A fast approximation
• Contrastive divergence learning works well for the sum of binary units with offset biases.
• It also works for rectified linear units. These are much faster to compute than the sum of many logistic units:

    \sum_{n=1}^{\infty} \text{logistic}(x + 0.5 - n) \approx \log(1 + e^x)

    output = max(0, x + randn * sqrt(logistic(x)))
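The noisy rectified-linear output rule is one line of NumPy (a sketch; the name is illustrative):

    import numpy as np

    def noisy_relu(x, rng=np.random.default_rng(0)):
        # output = max(0, x + randn * sqrt(logistic(x))), as on the slide
        logistic = 1.0 / (1.0 + np.exp(-x))
        return np.maximum(0.0, x + rng.standard_normal(x.shape) * np.sqrt(logistic))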

How to train a bipartite network of rectified linear units
• Just use contrastive divergence to lower the energy of data and raise the energy of nearby configurations that the model prefers to the data:

    \Delta w_{ij} = \epsilon ( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} )

3-D Object Recognition: The NORB dataset
Stereo-pairs of grayscale images of toy objects.
– 6 lighting conditions, 162 viewpoints
– Five object instances per class in the training set; a different set of five instances per class in the test set
– 24,300 training cases, 24,300 test cases
[Figure: the five classes (animals, humans, planes, trucks, cars) in the normalized-uniform version of NORB.]

Simplifying the data
• Each training case is a stereo-pair of 96x96 images.
  – The object is centered.
  – The edges of the image are mainly blank.
  – The background is uniform and bright.
• To make learning faster, I simplified the data:
  – Throw away one image.
  – Only use the middle 64x64 pixels of the other image.
  – Downsample to 32x32 by averaging 4 pixels.

Simplifying the data even more so that it can be modeled by rectified linear units
• The intensity histogram for each 32x32 image has a sharp peak for the bright background.
• Find this peak and call it zero.
• Call all intensities brighter than the background zero.
• Measure intensities downwards from the background intensity.

Test set error rates on NORB after greedy learning of one or two hidden layers using rectified linear units
Full NORB (2 images of 96x96)
• Logistic regression on the raw pixels: 20.5%
• Gaussian SVM (trained by Leon Bottou): 11.6%
• Convolutional neural net (Le Cun's group): 6.0% (convolutional nets have knowledge of translations built in)
Reduced NORB (1 image, 32x32)
• Logistic regression on the raw pixels: 30.2%
• Logistic regression on the first hidden layer: 14.9%
• Logistic regression on the second hidden layer: 10.2%

The receptive fields of some rectified linear hidden units.
[Figure.]

A standard type of real-valued visible unit
• We can model pixels as Gaussian variables. Alternating Gibbs sampling is still easy, though learning needs to be much slower.

    E(v,h) = \sum_{i \in vis} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{j \in hid} b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}

  – The first term is a parabolic containment function; the last term is the energy-gradient produced by the total input to a visible unit.
Welling et al. (2005) show how to extend RBM's to the exponential family. See also Bengio et al. (2007).
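A sketch of this Gaussian-visible energy in NumPy (names are illustrative; sigma is the per-pixel standard deviation):

    import numpy as np

    def gaussian_rbm_energy(v, h, W, b_v, b_h, sigma):
        # E(v,h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2) - sum_j b_j h_j
        #          - sum_ij (v_i / sigma_i) h_j w_ij
        containment = np.sum((v - b_v) ** 2 / (2.0 * sigma ** 2))
        return containment - b_h @ h - (v / sigma) @ W @ h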

A random sample of 10,000 binary filters learned by Alex Krizhevsky on a million 32x32 color images.
[Figure.]

Combining deep belief nets with Gaussian processes
• Deep belief nets can benefit a lot from unlabeled data when labeled data is scarce.
  – They just use the labeled data for fine-tuning.
• Kernel methods, like Gaussian processes, work well on small labeled training sets but are slow for large training sets.
• So when there is a lot of unlabeled data and only a little labeled data, combine the two approaches:
  – First learn a deep belief net without using the labels.
  – Then apply a Gaussian process model to the deepest layer of features. This works better than using the raw data.
  – Then use GP's to get the derivatives that are back-propagated through the deep belief net. This is a further win. It allows GP's to fine-tune complicated domain-specific kernels.

Learning to extract the orientation of a face patch (Salakhutdinov & Hinton, NIPS 2007)

The training and test sets for predicting face orientation
11,000 unlabeled cases; 100, 500, or 1000 labeled cases
The test cases are face patches from new people.

The root mean squared error in the orientation when combining GP's with deep belief nets

                  GP on the pixels | GP on top-level features | GP on top-level features with fine-tuning
    100 labels:        22.2        |          17.9            |          15.2
    500 labels:        17.2        |          12.7            |           7.2
    1000 labels:       16.3        |          11.2            |           6.4

Conclusion: The deep features are much better than the pixels. Fine-tuning helps a lot.

Deep Autoencoders (Hinton & Salakhutdinov, 2006)
• They always looked like a really nice way to do non-linear dimensionality reduction:
  – But it is very difficult to optimize deep autoencoders using backpropagation.
• We now have a much better way to optimize them:
  – First train a stack of 4 RBM's.
  – Then "unroll" them.
  – Then fine-tune with backprop.
[Architecture: a 28x28 image feeds 1000 neurons (W_1), then 500 neurons (W_2), then 250 neurons (W_3), then a code of 30 linear units (W_4); the decoder applies W_4^T, W_3^T, W_2^T, W_1^T to reconstruct the image.]

A comparison of methods for compressing digit images to 30 real numbers
[Figure: real data; 30-D deep autoencoder; 30-D logistic PCA; 30-D PCA.]

Retrieving documents that are similar to a query document
• We can use an autoencoder to find low-dimensional codes for documents that allow fast and accurate retrieval of similar documents from a large set.
• We start by converting each document into a "bag of words". This is a 2000-dimensional vector that contains the counts for each of the 2000 commonest words.

How to compress the count vector
• We train the neural network to reproduce its input vector as its output.
• This forces it to compress as much information as possible into the 10 numbers in the central bottleneck.
• These 10 numbers are then a good way to compare documents.
[Architecture: 2000 word counts (input vector) → 500 neurons → 250 neurons → 10 → 250 neurons → 500 neurons → 2000 reconstructed counts (output vector).]

Performance of the autoencoder at document retrieval
• Train on bags of 2000 words for 400,000 training cases of business documents.
  – First train a stack of RBM's. Then fine-tune with backprop.
• Test on a separate 400,000 documents.
  – Pick one test document as a query. Rank-order all the other test documents by using the cosine of the angle between codes.
  – Repeat this using each of the 400,000 test documents as the query (requires 0.16 trillion comparisons).
• Plot the number of retrieved documents against the proportion that are in the same hand-labeled class as the query document.
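The ranking step is a small piece of NumPy (a sketch; names are illustrative):

    import numpy as np

    def rank_by_code_similarity(query_code, codes):
        # Rank documents by the cosine of the angle between their codes
        # and the query's code; returns indices, most similar first.
        norms = np.linalg.norm(codes, axis=1) * np.linalg.norm(query_code)
        cosines = (codes @ query_code) / np.maximum(norms, 1e-12)
        return np.argsort(-cosines)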

[Figure: proportion of retrieved documents in the same class as the query, plotted against the number of documents retrieved.]

First compress all documents to 2 numbers using a type of PCA. Then use different colors for different document categories.
[Figure.]

First compress all documents to 2 numbers. Then use different colors for different document categories.
[Figure.]

Finding binary codes for documents
• Train an autoencoder using 30 logistic units for the code layer.
• During the fine-tuning stage, add noise to the inputs to the code units.
  – The "noise" vector for each training case is fixed, so we still get a deterministic gradient.
  – The noise forces their activities to become bimodal in order to resist the effects of the noise.
  – Then we simply round the activities of the 30 code units to 1 or 0.
[Architecture: 2000 word counts → 500 neurons → 250 neurons → 30 code units (+ noise) → 250 neurons → 500 neurons → 2000 reconstructed counts.]

Semantic hashing: Using a deep autoencoder as a hash-function for finding approximate matches (Salakhutdinov & Hinton, 2007)
[Diagram: a document is mapped by the hash function to a memory address; nearby addresses hold similar documents ("supermarket search").]

How good is a shortlist found this way?
• We have only implemented it for a million documents with 20-bit codes, but what could possibly go wrong?
  – A 20-D hypercube allows us to capture enough of the similarity structure of our document set.
• The shortlist found using binary codes actually improves the precision-recall curves of TF-IDF.
  – Locality-sensitive hashing (the fastest other method) is 50 times slower and has worse precision-recall curves.
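A sketch of the lookup side of semantic hashing: treat each 20-bit code as a memory address and collect documents within a small Hamming ball of the query code (illustrative names; the radius is a free choice):

    from collections import defaultdict
    from itertools import combinations

    def build_table(codes):
        # Index document ids by their binary codes (tuples of 0/1).
        table = defaultdict(list)
        for doc_id, code in enumerate(codes):
            table[tuple(code)].append(doc_id)
        return table

    def shortlist(query_code, table, radius=2):
        # Collect documents whose codes are within `radius` bit-flips of
        # the query code; no search through the corpus is needed.
        hits = []
        for r in range(radius + 1):
            for bits in combinations(range(len(query_code)), r):
                probe = list(query_code)
                for b in bits:
                    probe[b] = 1 - probe[b]
                hits.extend(table.get(tuple(probe), []))
        return hits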

Generating the parts of an object
• One way to maintain the constraints between the parts is to generate each part very accurately.
  – But this would require a lot of communication bandwidth.
• Sloppy top-down specification of the parts is less demanding
  – but it messes up relationships between features,
  – so use redundant features and use lateral interactions to clean up the mess.
• Each transformed feature helps to locate the others.
  – This allows a noisy channel.
[Diagram: pose parameters for a "square" lead to sloppy top-down activation of parts; features with top-down support are then cleaned up using known lateral interactions. It's like soldiers on a parade ground.]

Semi-restricted Boltzmann Machines
• We restrict the connectivity to make learning easier.
• Contrastive divergence learning requires the hidden units to be in conditional equilibrium with the visibles.
  – But it does not require the visible units to be in conditional equilibrium with the hiddens.
  – All we require is that the visible units are closer to equilibrium in the reconstructions than in the data.
• So we can allow connections between the visibles.
[Diagram: hidden units j above visible units i, with lateral connections among the visibles.]

Learning a semi-restricted Boltzmann Machine
[Diagram: data at t = 0, reconstruction at t = 1, as in the CD-1 picture.]

    \Delta w_{ij} = \epsilon ( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 )

The lateral connections between visibles are learned with the same kind of rule, using the pairwise visible statistics \langle v_i v_k \rangle at t = 0 and t = 1.

Learning in Semi-restricted Boltzmann Machines
• Method 1: To form a reconstruction, cycle through the visible units, updating each in turn using the top-down input from the hiddens plus the lateral input from the other visibles.
• Method 2: Use "mean field" visible units that have real values. Update them all in parallel.
  – Use damping to prevent oscillations:

    p_i^{t+1} = \lambda p_i^t + (1 - \lambda) \, \sigma(x_i^t)

  where x_i is the total input to unit i and \lambda is the damping.
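The damped update is a one-liner (a sketch; lam plays the role of λ):

    import numpy as np

    def damped_mean_field_step(p, total_input, lam=0.5):
        # p_i(t+1) = lam * p_i(t) + (1 - lam) * logistic(x_i(t))
        return lam * p + (1.0 - lam) / (1.0 + np.exp(-total_input))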

Results on modeling natural image patches using a stack of RBM's (Osindero and Hinton)
• Stack of RBM's learned one at a time.
• 400 Gaussian visible units that see whitened image patches.
  – Derived from 100,000 Van Hateren image patches, each 20x20.
• The hidden units are all binary.
  – The lateral connections are learned when they are the visible units of their RBM.
• Reconstruction involves letting the visible units of each RBM settle using mean-field dynamics.
  – The already decided states in the level above determine the effective biases during mean-field settling.
[Architecture: 1000 top-level units (no MRF); hidden MRF with 500 units; hidden MRF with 2000 units; 400 Gaussian units. Directed connections between layers, undirected lateral connections within layers.]

Without lateral connections
[Figure: real data | samples from model.]

With lateral connections
[Figure: real data | samples from model.]

A funny way to use an MRF
• The lateral connections form an MRF.
• The MRF is used during learning and generation.
• The MRF is not used for inference.
  – This is a novel idea, so vision researchers don't like it.
• The MRF enforces constraints. During inference, constraints do not need to be enforced because the data obeys them.
  – The constraints only need to be enforced during generation.
• Unobserved hidden units cannot enforce constraints.
  – To enforce constraints requires lateral connections or observed descendants.

Why do we whiten data?
• Images typically have strong pairwise correlations.
• Learning higher-order statistics is difficult when there are strong pairwise correlations.
  – Small changes in parameter values that improve the modeling of higher-order statistics may be rejected because they form a slightly worse model of the much stronger pairwise statistics.
• So we often remove the second-order statistics before trying to learn the higher-order statistics.

Whitening the learning signal instead of the data
• Contrastive divergence learning can remove the effects of the second-order statistics on the learning without actually changing the data.
  – The lateral connections model the second-order statistics.
  – If a pixel can be reconstructed correctly using second-order statistics, it will be the same in the reconstruction as in the data.
  – The hidden units can then focus on modeling higher-order structure that cannot be predicted by the lateral connections.
    • For example, a pixel close to an edge, where interpolation from nearby pixels causes incorrect smoothing.

Towards a more powerful, multi-linear stackable learning module
• So far, the states of the units in one layer have only been used to determine the effective biases of the units in the layer below.
• It would be much more powerful to modulate the pair-wise interactions in the layer below.
  – A good way to design a hierarchical system is to allow each level to determine the objective function of the level below.
• To modulate pair-wise interactions we need higher-order Boltzmann machines.

Higher-order Boltzmann machines (Sejnowski, ~1986)

Using higher-order Boltzmann machines to model image transformations (the unfactored version)
• A global transformation specifies which pixel goes to which other pixel.
• Conversely, each pair of similar-intensity pixels, one in each image, votes for a particular global transformation.
[Diagram: image(t) and image(t+1) gated by "image transformation" units.]

Factoring three-way multiplicative interactions

    E = - \sum_{i,j,h} s_i s_j s_h w_{ijh}        (unfactored, with cubically many parameters)

    E = - \sum_f \sum_{i,j,h} s_i s_j s_h w_{if} w_{jf} w_{hf}        (factored, with linearly many parameters per factor)

A picture of the low-rank tensor contributed by factor f
• Each layer is a scaled version of the same matrix.
• The basis matrix is specified as an outer product with typical term w_{if} w_{jf}.
• So each active hidden unit, h, contributes a scalar, w_{hf}, times the matrix specified by factor f.
[Diagram: the three weight vectors w_{if}, w_{jf}, w_{hf} of factor f.]

Inference with factored three-way multiplicative interactions

    E_f = - ( \sum_i s_i w_{if} ) ( \sum_j s_j w_{jf} ) ( \sum_h s_h w_{hf} )

is the energy contributed by factor f, and

    E_f(s_h = 1) - E_f(s_h = 0) = - w_{hf} ( \sum_i s_i w_{if} ) ( \sum_j s_j w_{jf} )

is how changing the binary state of unit h changes the energy contributed by factor f, which is all that unit h needs to know in order to do Gibbs sampling.
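A sketch of these two quantities in NumPy (illustrative shapes: W_if is n_i x n_f, W_jf is n_j x n_f, W_hf is n_h x n_f):

    import numpy as np

    def factored_energy(s_i, s_j, s_h, W_if, W_jf, W_hf):
        # E = -sum_f (sum_i s_i w_if)(sum_j s_j w_jf)(sum_h s_h w_hf)
        return -np.sum((s_i @ W_if) * (s_j @ W_jf) * (s_h @ W_hf))

    def gibbs_input_to_hiddens(s_i, s_j, W_if, W_jf, W_hf):
        # Minus the energy gap for each hidden unit:
        # sum_f w_hf (sum_i s_i w_if)(sum_j s_j w_jf), one value per unit,
        # which is the total input that determines p(s_h = 1).
        factor_messages = (s_i @ W_if) * (s_j @ W_jf)   # one product per factor
        return W_hf @ factor_messages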

Belief propagation
[Diagram: factor f with three vertices i, j, h and weight vectors w_{if}, w_{jf}, w_{hf}.]
The outgoing message at each vertex of the factor is the product of the weighted sums at the other two vertices.

Learning with factored three-way multiplicative interactions

    m_f^h = ( \sum_i s_i w_{if} ) ( \sum_j s_j w_{jf} )

is the message from factor f to unit h, and

    \Delta w_{hf} = \epsilon ( \langle s_h m_f^h \rangle_{data} - \langle s_h m_f^h \rangle_{model} )

[Figure: results on the "Roland" data.]

Modeling the correlational structure of a static image by using two copies of the image
• Each factor sends the squared output of a linear filter to the hidden units.
  – It is exactly the standard model of simple and complex cells. It allows complex cells to extract oriented energy.
  – The standard model drops out of doing belief propagation for a factored third-order energy function.
[Diagram: Copy 1 and Copy 2 of the image feed factor f through weight vectors w_{if} and w_{jf}; the factor sends w_{hf} to the hidden units.]

An advantage of modeling correlations between pixels rather than pixels
• During generation, a "vertical edge" unit can turn off the horizontal interpolation in a region without worrying about exactly where the intensity discontinuity will be.
  – This gives some translational invariance.
  – It also gives a lot of invariance to brightness and contrast.
  – So the "vertical edge" unit is like a complex cell.
• By modulating the correlations between pixels rather than the pixel intensities, the generative model can still allow interpolation parallel to the edge.

A principle of hierarchical systems
• Each level in the hierarchy should not try to micro-manage the level below.
• Instead, it should create an objective function for the level below and leave the level below to optimize it.
  – This allows the fine details of the solution to be decided locally, where the detailed information is available.
• Objective functions are a good way to do abstraction.

Time series models
• Inference is difficult in directed models of time series if we use non-linear distributed representations in the hidden units.
  – It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion capture data).
• So people tend to avoid distributed representations and use much weaker methods (e.g. HMM's).

Time series models
• If we really need distributed representations (which we nearly always do), we can make inference much simpler by using three tricks:
  – Use an RBM for the interactions between hidden and visible variables. This ensures that the main source of information wants the posterior to be factorial.
  – Model short-range temporal information by allowing several previous frames to provide input to the hidden units and to the visible units.
    • This leads to a temporal module that can be stacked.
  – So we can use greedy learning to learn deep models of temporal structure.

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)
• Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
• Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
  – We only represent changes in yaw because physics doesn't care about its value and we want to avoid circular variables.

The conditional RBM model (a partially observed CRF)
• Start with a generic RBM.
• Add two types of conditioning connections.
• Given the data, the hidden units at time t are conditionally independent.
• The autoregressive weights can model most short-term temporal structure very well, leaving the hidden units to model nonlinear irregularities (such as when the foot hits the ground).
[Diagram: visible frames at t-2, t-1, t; the hidden units j receive conditioning input from the past frames, and the current visibles i receive autoregressive input.]
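A sketch of the conditioning computation, assuming two past frames and illustrative names (A for visible-to-visible autoregressive weights, B for visible-to-hidden conditioning weights; neither name is from the tutorial):

    import numpy as np

    def crbm_dynamic_biases(v_past, A, B, b_v, b_h):
        # The past frames contribute a time-dependent bias to the current
        # visibles (via A) and to the hidden units (via B); the RBM between
        # the current visibles and hiddens is otherwise unchanged.
        past = np.concatenate(v_past)     # e.g. frames t-2 and t-1
        return b_v + A @ past, b_h + B @ past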

Causal generation from a learned model
• Keep the previous visible states fixed.
  – They provide a time-dependent bias for the hidden units.
• Perform alternating Gibbs sampling for a few iterations between the hidden units and the most recent visible units.
  – This picks new hidden and visible states that are compatible with each other and with the recent history.

Higher-level models
• Once we have trained the model, we can add layers like in a Deep Belief Network.
• The previous layer CRBM is kept, and its output, while driven by the data, is treated as a new kind of "fully observed" data.
• The next level CRBM has the same architecture as the first (though we can alter the number of units it uses) and is trained the same way.
• Upper levels of the network model more "abstract" concepts.
• This greedy learning procedure can be justified using a variational bound.
[Diagram: units i, j, k over frames t-2, t-1, t.]

Learning with "style" labels
• As in the generative model of handwritten digits (Hinton et al., 2006), style labels can be provided as part of the input to the top layer.
• The labels are represented by turning on one unit in a group of units, but they can also be blended.
[Diagram: units i, j, k, l over frames t-2, t-1, t, with label units feeding the top layer.]

Show demo's of multiple styles of walking
These can be found at www.cs.toronto.edu/~gwtaylor/

Readings on deep belief nets
A reading list (that is still being updated) can be found at www.cs.toronto.edu…