
Page 1: Ml intro

Introduction to Machine Learning

Integrated Knowledge Solutions https://iksinc.wordpress.com/home/

[email protected]

Page 2: Ml intro

Agenda  

• What is machine learning?
• Why machine learning and why now?
• Machine learning terminology
• Overview of machine learning methods
• Machine learning to deep learning
• Summary and Q&A


Page 3: Ml intro

What is machine learning?


Page 4: Ml intro

What is Machine Learning?

• Machine learning deals with making computers learn to make predictions/decisions without being explicitly programmed. Instead, the system is shown a large number of examples of the underlying task and achieves learning by optimizing a performance criterion.


Page 5: Ml intro

An Example of Machine Learning: Credit Default Prediction

We have historical data about businesses and their delinquency. The data consists of 100 businesses, each characterized by two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted. Using machine learning, we can build a model to predict the probability that a given business will default.

[Scatter plot of the 100 businesses; x-axis 0-500, y-axis 0-100.]


Page 6: Ml intro

Logistic Regression

• The model used here is called the logistic regression model. Let's look at the following expression:

  p = e^(a0 + a1x1 + ... + akxk) / (1 + e^(a0 + a1x1 + ... + akxk))

  where x1, x2, ..., xk are the attributes.

• In our example, the attributes are business age and number of days of delinquency.
• The quantity p will always lie in the range 0-1 and thus can be interpreted as the probability of the outcome being default or no default.
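The expression above can be sketched in a few lines of Python (a minimal illustration; the function and variable names are my own, not from the slides):

```python
import math

def logistic_probability(intercept, coeffs, x):
    """p = e^z / (1 + e^z), where z = intercept + a1*x1 + ... + ak*xk."""
    z = intercept + sum(a * xi for a, xi in zip(coeffs, x))
    return math.exp(z) / (1 + math.exp(z))

# With all coefficients zero, z = 0 and p = 0.5
print(logistic_probability(0.0, [0.0, 0.0], [26, 58]))  # 0.5
```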


Page 7: Ml intro

Logistic Regression

• By simple rewriting, we get:

  log(p/(1-p)) = a0 + a1x1 + a2x2 + ... + akxk

• The ratio p/(1-p) is called the odds; its logarithm is the log odds.
• The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure.
• The learned parameters can then be deployed in the field to make predictions.


Page 8: Ml intro


Model Details and Performance

Plot of predicted default probability.

Only in rare cases do we get a 100% accurate model.


Page 9: Ml intro

Using the Model

• What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?

Plug the model parameters into the expression for p:

BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706

p = e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) ≈ 0.603
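The arithmetic above can be checked with a short Python snippet, using the parameter values from the slide:

```python
import math

# Model parameters: BUSAGE coefficient, DAYSDELQ coefficient, intercept
z = 0.008 * 26 + 0.102 * 58 - 5.706
p = math.exp(z) / (1 + math.exp(z))
print(round(p, 3))  # 0.603
```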


Page 10: Ml intro

Why Machine Learning and Why Now?


Page 11: Ml intro

Why Machine Learning?


Page 12: Ml intro

Buzz about Machine Learning

"Every  company  is  now  a  data  company,  capable  of  using  machine  learning  in  the  cloud  to  deploy  intelligent  apps  at  scale,  thanks  to  

three  machine  learning  trends:  data  flywheels,  the  algorithm  economy,  and  cloud-­‐hosted  

intelligence."  

Three  factors  are  making  machine  learning  hot.  These  are  cheap  data,  algorithmic  economy,  and  cloud-­‐based  solu)ons.  


Page 13: Ml intro

Data is getting cheaper

For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours.

Page 14: Ml intro

Algorithmic Economy


Page 15: Ml intro

Algorithm Economy Players in ML


Page 16: Ml intro

Cloud-Based Intelligence

Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.

Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and do general natural language processing tasks.


Page 17: Ml intro

An Example


Page 18: Ml intro

Apps for Excel


Page 19: Ml intro

Machine Learning Terminology


Page 20: Ml intro

Feature Vectors in ML

• A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring/obtaining such properties is called feature extraction. It is common to represent the properties of objects as feature vectors.

Sepal length, sepal width, petal length, petal width:

  x = [x1, x2, x3, x4]^T
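As a concrete sketch of feature extraction, the measured properties of one iris flower can be packed into an ordered feature vector (the measurements below are made up for illustration):

```python
def extract_features(flower):
    """Feature extraction (sketch): turn an object's measured properties
    into an ordered feature vector [x1, x2, x3, x4]."""
    return [flower["sepal_length"], flower["sepal_width"],
            flower["petal_length"], flower["petal_width"]]

# Hypothetical measurements in cm
iris = {"sepal_length": 5.1, "sepal_width": 3.5,
        "petal_length": 1.4, "petal_width": 0.2}
print(extract_features(iris))  # [5.1, 3.5, 1.4, 0.2]
```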


Page 21: Ml intro

Learning Styles

• Supervised Learning
  – Training data comes with answers, called labels
  – The goal is to produce labels for new data


Page 22: Ml intro

Supervised Learning Models

• Classification models
  – Predict whether a customer is likely to be lost to a competitor
  – Tag objects in a given image
  – Determine whether an incoming email is spam or not


Page 23: Ml intro

Supervised Learning Models

• Regression models
  – Predict the credit card balance of customers
  – Predict the number of 'likes' for a posting
  – Predict peak load for a utility given weather information


Page 24: Ml intro

Learning Styles

• Unsupervised Learning
  – Training data comes without labels
  – The goal is to group data into different categories based on similarities

Grouped Data


Page 25: Ml intro

Unsupervised Learning Models

• Segment/cluster customers into different groups
• Organize a collection of documents based on their content
• Make recommendations for products


Page 26: Ml intro

Learning Styles

• Reinforcement Learning
  – Training data comes without labels
  – The learning system receives feedback from its operating environment to know how well it is doing
  – The goal is to perform better


Page 27: Ml intro

Overview of Machine Learning Methods


Page 28: Ml intro

Walk Through an Example: Flower Classification

• Build a classification model to differentiate between two classes of flowers


Page 29: Ml intro

How Do We Go About It?

• Collect a large number of both types of flowers with the help of an expert
• Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.


Page 30: Ml intro

Scatter plot of 100 examples of flowers


Page 31: Ml intro

We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.

Page 32: Ml intro

Another possible boundary. This boundary cannot be expressed via a single equation; however, a tree structure can be used to express it. Note that this boundary does a better job of predicting the collected data.

Page 33: Ml intro

Yet another possible boundary. This boundary does prediction without any error. Is this a better boundary?


Page 34: Ml intro

Model Complexity

• There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
• A good design should avoid overfitting. How?
  – Divide the entire data into three sets:
    • Training set (about 70% of the total data). Use this set to build the model.
    • Test set (about 20% of the total data). Use this set to estimate the model accuracy after deployment.
    • Validation set (remaining 10% of the total data). Use this set to determine the appropriate settings for free parameters of the model. May not be required in some cases.
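The 70/20/10 split above can be sketched in plain Python: shuffle the examples once, then slice (a minimal illustration using example indices rather than real records):

```python
import random

data = list(range(100))       # stand-ins for 100 example records
random.seed(0)                # fixed seed so the split is reproducible
random.shuffle(data)

train = data[:70]             # ~70%: build the model
test = data[70:90]            # ~20%: estimate accuracy after deployment
validation = data[90:]        # ~10%: tune free parameters of the model
print(len(train), len(test), len(validation))  # 70 20 10
```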


Page 35: Ml intro

Measuring Model Performance

• True Positive: Correctly identified as relevant
• True Negative: Correctly identified as not relevant
• False Positive: Incorrectly labeled as relevant
• False Negative: Incorrectly labeled as not relevant

 

[Image: Cat vs. No Cat examples illustrating true positive, true negative, false positive, and false negative.]


Page 36: Ml intro

Precision, Recall, and Accuracy

• Precision
  – Percentage of positive labels that are correct
  – Precision = (# true positives) / (# true positives + # false positives)
• Recall
  – Percentage of positive examples that are correctly labeled
  – Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
  – Percentage of correct labels
  – Accuracy = (# true positives + # true negatives) / (# of samples)
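The three formulas translate directly into code; here is a minimal Python sketch for binary labels (1 = relevant, 0 = not relevant):

```python
def precision_recall_accuracy(y_true, y_pred):
    """Count the four outcome types, then apply the three formulas."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

# Made-up example: 5 samples with tp=2, tn=1, fp=1, fn=1
print(precision_recall_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```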


Page 37: Ml intro

Sum-of-Squares Error for Regression Models

For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and summing this number over all N examples:

  E = Σ_{n=1..N} (predicted_n - target_n)^2
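The sum-of-squares error can be computed in one line of Python (a minimal sketch with made-up predictions and targets):

```python
def sum_of_squares_error(predicted, target):
    """Sum over all examples of the squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# (1.0-1.5)^2 + (2.0-2.0)^2 + (3.0-2.0)^2 = 0.25 + 0 + 1 = 1.25
print(sum_of_squares_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 1.25
```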


Page 38: Ml intro

Bias and Variance

• Bias: expected difference between the model's prediction and the truth
• Variance: how much the model differs among training sets
• Model Scenarios
  – High Bias: Model makes inaccurate predictions on training data
  – High Variance: Model does not generalize to new datasets
  – Low Bias: Model makes accurate predictions on training data
  – Low Variance: Model generalizes to new datasets


Page 39: Ml intro

The Guiding Principle for Model Selection: Occam's Razor


Page 40: Ml intro

Model Building Algorithms

• Supervised learning algorithms
  – Linear methods
  – k-NN classifiers
  – Neural networks
  – Support vector machines
  – Decision trees
  – Ensemble methods


Page 41: Ml intro

Illustration of k-NN Model

Predicted label of test example with 1-NN model: Versicolor
Predicted label of test example with 3-NN model: Virginica

Test example
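The k-NN idea in the illustration fits in a few lines of Python; the points below are made up so that the 1-NN and 3-NN answers differ, mirroring the slide:

```python
import math
from collections import Counter

def knn_predict(train, test_point, k):
    """train: list of ((x, y), label) pairs.
    Return the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "Versicolor"),
         ((2.0, 2.0), "Virginica"),
         ((2.2, 2.0), "Virginica")]
print(knn_predict(train, (1.2, 1.2), k=1))  # Versicolor
print(knn_predict(train, (1.2, 1.2), k=3))  # Virginica
```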


Page 42: Ml intro

Illustration of Decision Tree Model

Petal width <= 0.8?
  Yes → Setosa
  No  → Petal length <= 4.75?
          Yes → Versicolor
          No  → Virginica

The decision tree is automatically generated by a machine learning algorithm.
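The tree above is just nested if/else logic; here it is written out in Python (thresholds taken from the slide, flower measurements in the example are made up):

```python
def classify_iris(petal_width, petal_length):
    """The decision tree from the slide as nested if/else."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

print(classify_iris(0.2, 1.4))  # Setosa
print(classify_iris(1.3, 4.0))  # Versicolor
print(classify_iris(2.0, 5.5))  # Virginica
```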

Page 43: Ml intro

Model Building Algorithms

• Unsupervised learning
  – k-means clustering
  – Agglomerative clustering
  – Self-organizing feature maps
  – Recommendation systems


Page 44: Ml intro

K-means Clustering

K-means: "by far the most popular clustering tool used nowadays in scientific and industrial applications" (Berkhin, 2006)

Choose the number of clusters, k, and initial cluster centers.

Page 45: Ml intro

K-means Clustering

Assign data points to clusters based on distance to cluster centers.

Page 46: Ml intro

K-means Clustering

Update cluster centers and reassign data points. K-means minimizes the sum of squared distances from data points to their cluster centers:

  minimize Σ_{n=1..N} ||x_n - center_n||^2
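The three steps shown across the last few slides (choose centers, assign points, update centers) can be sketched in plain Python. This is a minimal illustration with deterministic initialization (the first k points as centers), not the textbook random initialization:

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points: alternate assignment and update steps."""
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its assigned points
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Two made-up, well-separated blobs of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```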

Page 47: Ml intro

Illustration of Recommendation System


Page 49: Ml intro

Steps Towards a Machine Learning Project

• Collect data
• Explore data via scatter plots and histograms. Remove duplicates and data records with missing values
• Check for dimensionality reduction
• Build model (iterative process)
• Transport/integrate the model with an application


Page 50: Ml intro

Machine Learning to Deep Learning


Page 51: Ml intro

Machine Learning Limitation

• Machine learning methods operate on manually designed features.
• The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.

Feature Extractor → Trainable Classifier

Page 52: Ml intro

Processing Sensory Data is Hard

How do we bridge the gap between pixels and meaning via machine learning?

Page 53: Ml intro

Sensory Data Processing is Challenging

So why not build integrated learning systems that perform end-to-end learning, i.e., learn the representation as well as the classification from raw data without any engineered features?

Feature Learner → Trainable Classifier

An approach performing end-to-end learning, typically through a series of successive abstractions, is in a nutshell deep learning.

Page 54: Ml intro

An Example of Deep Learning Capability

SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.

Page 55: Ml intro

Summary  

• We have just skimmed the surface of machine learning
• The web is full of reading resources (free books, lecture notes, blogs, videos) to dig into machine learning
• Several open source software resources (R, RapidMiner, scikit-learn, etc.) let you learn via experimentation
• Applications based on vision, speech, and natural language processing are excellent candidates for deep learning


Page 56: Ml intro

[email protected] https://iksinc.wordpress.com/home/
