
Building high-level features using large-scale unsupervised learning
Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014)
University of California, Santa Barbara
Slides adapted from Andrew Ng (Stanford) and Nando de Freitas (UBC)

Agenda

1. Motivation
2. Approach
   1. Sparse Deep Auto-encoder
   2. Local Receptive Field
   3. L2 Pooling
   4. Local Contrast Normalization
   5. Overall Model
3. Parallelism
4. Evaluation
5. Discussion

1. MOTIVATION


Motivation

• Feature learning
• Supervised learning
  • Needs a large amount of labeled data
• Unsupervised learning
  • Example: build a face detector without having any labeled face images
• Building high-level features using unlabeled data

Motivation

• Previous work
  • Auto-encoders
  • Sparse coding
• Result: only low-level features are learned
• Reason: computational constraints
• Approach of this work: scale up the dataset, the model, and the computational resources

2.  APPROACH  


Sparse Deep Auto-encoder

• Auto-encoder
  • A neural network trained to reconstruct its input
  • Unsupervised learning
  • Trained with back-propagation
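As a concrete illustration of the bullets above, here is a minimal numpy sketch of a single-hidden-layer auto-encoder trained with back-propagation to reconstruct unlabeled inputs. It is not the paper's implementation; the layer sizes, learning rate, and random data are made up for the example.

```python
# Minimal auto-encoder sketch: encode with W1, decode with W2, and update both
# by back-propagating the squared reconstruction error (no labels involved).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))                 # 500 unlabeled examples, 64 inputs each

n_in, n_hidden, lr = 64, 16, 0.01
W1 = 0.1 * rng.standard_normal((n_in, n_hidden))   # encoder weights
W2 = 0.1 * rng.standard_normal((n_hidden, n_in))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    H = sigmoid(X @ W1)                            # hidden code
    X_hat = H @ W2                                 # reconstruction (linear output)
    err = X_hat - X
    grad_W2 = H.T @ err / len(X)                   # back-propagation
    grad_W1 = X.T @ ((err @ W2.T) * H * (1 - H)) / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print("mean squared reconstruction error:", np.mean(err ** 2))
```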


Sparse Deep Auto-encoder (cnt'd)

• Sparse coding
  • Input: images x(1), x(2), ..., x(m)
  • Learn: bases (features) f1, f2, ..., fk so that each input x can be approximately decomposed as x = Σj aj fj, where the aj are mostly zero ("sparse")
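A toy numpy sketch of this decomposition: given fixed bases, a mostly-zero coefficient vector for one input is found with a few iterations of soft-thresholded gradient descent (ISTA). The sizes, λ, and step size here are made up; in the actual method the bases themselves are also learned.

```python
# Toy sparse coding: find a sparse a such that x ≈ sum_j a_j * f_j.
import numpy as np

rng = np.random.default_rng(0)
k, d = 20, 16
F = rng.standard_normal((k, d))          # bases f1..fk as rows
x = 0.9 * F[3] - 0.5 * F[11]             # an input built from two bases

lam = 0.1                                # sparsity penalty
step = 1.0 / np.linalg.norm(F @ F.T, 2)  # a safe gradient step size
a = np.zeros(k)
for _ in range(200):
    grad = (a @ F - x) @ F.T             # gradient of 0.5 * ||x - a @ F||^2
    a = a - step * grad
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)   # soft threshold

# the largest coefficients should correspond to the bases used to build x
print("largest coefficients at indices:", np.argsort(-np.abs(a))[:2])
```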


Sparse Deep Auto-encoder (cnt'd)

• Sparse coding
  • Sparsity is obtained by adding a regularizer on the coefficients to the reconstruction objective
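The objective itself appeared only as a figure on the original slide. A standard sparse-coding formulation of "reconstruction error plus sparsity regularizer" (not necessarily the exact one shown) is:

```latex
\min_{\{f_j\},\,\{a^{(i)}\}} \;
\sum_{i=1}^{m} \Big\| x^{(i)} - \sum_{j=1}^{k} a_j^{(i)} f_j \Big\|_2^2
\;+\; \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} \big| a_j^{(i)} \big|
```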


Sparse Deep Auto-encoder (cnt'd)

• Sparse deep auto-encoder
  • Stacks multiple hidden layers, each designed to give the learned features a particular characteristic

Local Receptive Field

• Definition: each feature in the auto-encoder connects only to a small region of the layer below
• Goals
  • Learn features efficiently
  • Enable parallelism
• In effect, each feature is trained on small image patches
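A toy sketch of the idea, assuming an 18x18 receptive field on a 200x200 frame (the sizes used later in the overall model); the filter weights are random placeholders:

```python
# Local receptive field: a feature connects only to one small patch of the
# image instead of to every pixel.
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((200, 200))       # one 200x200 input frame
patch_size = 18

row, col = 40, 60                             # top-left corner of this feature's field
patch = image[row:row + patch_size, col:col + patch_size]

w = rng.standard_normal((patch_size, patch_size))   # weights only for this patch
feature = np.sum(w * patch)                   # the feature ignores the rest of the image
print("feature value:", feature)
```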


L2 Pooling

• Goal: robustness to small local distortions
• Approach: group similar features together to achieve invariance
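A minimal sketch of what an L2 pooling unit computes: the square root of the sum of squares of the simple-unit outputs in its group, which barely changes when the pattern shifts within the group. The numbers below are made up.

```python
# L2 pooling over one group of simple-unit outputs.
import numpy as np

simple_outputs = np.array([0.0, 0.9, 0.1, 0.0])       # one pooling group
shifted        = np.array([0.0, 0.1, 0.9, 0.0])       # same pattern, shifted by one unit

def l2_pool(h):
    return np.sqrt(np.sum(h ** 2))

print(l2_pool(simple_outputs), l2_pool(shifted))      # both ≈ 0.91: invariant to the shift
```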


Local Contrast Normalization

• Goal: robustness to variation in light intensity
• Approach: normalize the local contrast (subtract the local mean, then divide by the local standard deviation)
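A toy numpy sketch of local contrast normalization on a single 5x5 neighbourhood (the neighbourhood size used in the overall model). The paper's version uses a weighted Gaussian neighbourhood, so this only shows the basic idea.

```python
# Local contrast normalization: subtract the local mean, divide by the local
# standard deviation, so the output ignores the overall light level.
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5))
brighter = patch + 10.0                     # same patch under stronger lighting

def lcn(p, eps=1e-6):
    centered = p - p.mean()
    return centered / (p.std() + eps)

print(np.allclose(lcn(patch), lcn(brighter)))   # True: the lighting change is removed
```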


Overall Model

• Three stacked layers, each consisting of three sub-layers:
  • Simple (local filtering): 18x18 px receptive fields, 8 neurons (feature maps) per patch
  • Complex (L2 pooling): 5x5 px
  • LCN: 5x5 px


Overall Model

• Training: each layer is trained to reconstruct the input it receives
• Optimization function: reconstruction error plus a sparsity term on the pooled responses (see below)
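The optimization function was shown as a figure on the original slide. In the referenced paper (Quoc Le et al. [1]), each layer is trained by minimizing, over its encoding weights W_e and decoding weights W_d, an objective along the lines of:

```latex
\min_{W_d,\,W_e} \;
\sum_{i=1}^{m} \left(
  \big\| W_d W_e x^{(i)} - x^{(i)} \big\|_2^2
  \;+\; \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_j \big( W_e x^{(i)} \big)^2}
\right)
```

The first term is the reconstruction error; the second sums the L2-pooled responses over the k pooling units (H_j selects pooling group j), which encourages sparse, invariant features.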


Overall Model

• Complex model?

3.  PARALLELISM  


Asynchronous SGD

• Two recent lines of research in speeding up large learning problems:
  • Parallel/distributed computing
  • Online (and mini-batch) learning algorithms: stochastic gradient descent, perceptron, MIRA, stepwise EM
• How can we bring together the benefits of parallel computing and online learning?

Asynchronous SGD

• SGD: Stochastic Gradient Descent
  • Choose an initial parameter vector W and a learning rate α
  • Repeat until an approximate minimum is obtained:
    • Randomly shuffle the examples in the training set
    • For each example i, update W := W − α ∇Qi(W), where Qi is the loss on example i
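A minimal sketch of this loop for a least-squares objective (the model, data, and learning rate are made up for illustration):

```python
# Minimal stochastic gradient descent: shuffle the examples each pass and
# update the parameters after every single example.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.01 * rng.standard_normal(200)

w = np.zeros(5)                       # initial parameter vector W
alpha = 0.05                          # learning rate

for epoch in range(20):
    order = rng.permutation(len(X))   # randomly shuffle the training set
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i·w - y_i)^2
        w -= alpha * grad

print("learned parameters:", np.round(w, 2))
```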

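The asynchronous part means many workers run this SGD loop in parallel against a shared set of parameters, reading and writing without waiting for each other. Below is a toy, single-machine sketch of that pattern using threads; it illustrates the idea only and is not the distributed parameter-server system used in the paper.

```python
# Toy asynchronous SGD: several workers update one shared parameter vector
# without synchronizing with each other (a single-process stand-in for the
# distributed setting).
import numpy as np
import threading

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w

w = np.zeros(5)                 # shared parameters, updated by all workers
alpha = 0.02

def worker(seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(2000):
        i = local_rng.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]   # gradient w.r.t. the current shared w
        w[:] = w - alpha * grad           # write back without locking

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("learned parameters:", np.round(w, 2))
```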

Model Parallelism

• The weights are partitioned according to the locality of the image and stored on different machines
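A toy sketch of that partitioning: the first-layer weights are split by image region, so each "machine" (here just an entry in a list) holds only the filters for its own block of the image and computes its local features independently.

```python
# Toy model parallelism: split a 200x200 image into four 100x100 blocks and
# give each "machine" only the weights for its own block.
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((200, 200))
block = 100

machines = []                     # one weight matrix per image region
for r in (0, block):
    for c in (0, block):
        machines.append({"row": r, "col": c,
                         "weights": rng.standard_normal((block, block))})

local_features = [np.sum(m["weights"] *
                         image[m["row"]:m["row"] + block,
                               m["col"]:m["col"] + block])
                  for m in machines]
print("one feature per machine:", np.round(local_features, 2))
```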


4. EVALUATION


Evaluation

• Training data: 10M unlabeled YouTube frames of size 200x200 pixels
• 1B parameters
• 1,000 machines
• 16,000 cores

Experiment on Faces

• Test set
  • 37,000 images in total
  • 13,026 of them are face images
• Evaluation metric: accuracy of the best single neuron, obtained by thresholding its activation (see the sketch below)
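A sketch of how such a "best neuron" score can be computed, assuming the neurons' activations on the test images are already available. The activations and labels below are random placeholders.

```python
# For each neuron, sweep a threshold over its activations and keep the best
# classification accuracy; the "best neuron" is the one with the highest score.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_neurons = 1000, 50
activations = rng.standard_normal((n_images, n_neurons))   # placeholder activations
labels = rng.integers(0, 2, n_images)                      # 1 = face, 0 = not a face

def best_accuracy(act, labels):
    # try each observed activation value as a threshold
    return max(np.mean((act > t) == labels) for t in act)

scores = [best_accuracy(activations[:, j], labels) for j in range(n_neurons)]
print("best neuron:", int(np.argmax(scores)), "accuracy:", round(max(scores), 3))
```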


Experiment on Faces (cnt'd)

• Visualization
  • Top stimuli (images) for the face neuron
  • Optimal stimulus for the face neuron

Experiment on Faces (cnt'd)

• Invariance properties


Experiment on Cat/Human Body

• Test sets
  • Cat: 10,000 positive, 18,409 negative images
  • Human body: 13,026 positive, 23,974 negative images
• Accuracy of the best neuron on each test set

ImageNet Classification

• Task: recognizing images
• Dataset
  • 20,000 categories
  • 14M images
• Accuracy: 15.8% (previous state of the art: 9.3%)

5.  DISCUSSION  


Discussion

• Deep learning
  • Unsupervised feature learning
  • Learning multiple layers of representation
• Accuracy is increased by invariance (pooling) and contrast normalization
• Scalability

6.  REFERENCES  


References

1. Quoc Le et al., "Building High-level Features Using Large Scale Unsupervised Learning"
2. Nando de Freitas, "Deep Learning", URL: https://www.youtube.com/watch?v=g4ZmJJWR34Q
3. Andrew Ng, "Sparse autoencoder", URL: http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
4. Andrew Ng, "Machine Learning and AI via Brain Simulations", URL: https://forum.stanford.edu/events/2011slides/plenary/2011plenaryNg.pdf
5. Andrew Ng, "Deep Learning", URL: http://www.ipam.ucla.edu/publications/gss2012/gss2012_10595.pdf