Machine Learning · 2012-01-03 · What can Machine Learning ... • Compared to say PHP 33 seconds...

Machine Learning Tom Maiaroto @shift8creative


What is Machine Learning?


Algorithms & Approaches

• Decision trees

• Random forests

• Artificial neural networks

• k-NN (nearest neighbour)

• Naive Bayesian classifier


So could machines one day rule the earth?

Maybe (OK, probably not).

What can Machine Learning do for Apps?

• Spam filtering

• Auto-tagging

• All sorts of categorization

• Sentiment analysis


Languages Commonly Used

• Java: Java-ML, WEKA, Apache Mahout, many more...

• Python: NLTK, scikit-learn, PyML, a good deal more...

• C++: libDAI, Armadillo, Orange, tons more...

...and then some others. See http://www.mloss.org


MongoDB Too!

• Map/Reduce

• Stored JavaScript

• Geo-spatial Indexing

• Replication


Geo-spatial Indexing

Did someone say nearest neighbour?

Design geeks, imagine the visualizations...
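The nearest-neighbour idea behind that joke is easy to sketch. Below is a minimal plain-JavaScript k-NN classifier, assuming flat (x, y) coordinates and Euclidean distance; the function name `knnClassify` and the sample points are made up for illustration. With real longitude/latitude data you would want spherical distance, and in MongoDB the ranking of candidates could come from a `$near` query on a geo-spatial index instead of a full scan.

```javascript
// Minimal k-nearest-neighbour classifier over 2D points (hypothetical sketch).
// Assumes flat (x, y) coordinates and Euclidean distance.
function knnClassify(points, query, k) {
  // Rank every labelled point by its distance to the query and keep the k closest.
  const nearest = points
    .map((p) => ({ label: p.label, dist: Math.hypot(p.x - query.x, p.y - query.y) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Majority vote; on a tie, the label whose nearest point is closest wins.
  const votes = new Map();
  for (const { label } of nearest) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  let best = null;
  for (const [label, count] of votes) {
    if (best === null || count > votes.get(best)) best = label;
  }
  return best;
}

// Made-up sample data: two clusters of labelled places.
const points = [
  { x: 0, y: 0, label: "cafe" },
  { x: 1, y: 0, label: "cafe" },
  { x: 0, y: 1, label: "cafe" },
  { x: 10, y: 10, label: "bar" },
  { x: 11, y: 10, label: "bar" },
];

console.log(knnClassify(points, { x: 0.5, y: 0.5 }, 3)); // prints cafe
```

Sorting the whole list is fine for a toy; the point of an index is to avoid exactly that scan.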


Replication

• Store massive amounts of data

• Distributed performance benefits

• Dedicated databases for calculations   

All the obvious benefits.


Map/Reduce

It's the brain.

It's not just for aggregation.

 It's faster than you might think.

It runs in the database.
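Here is roughly how MongoDB's map/reduce model fits together, emulated in memory so it runs without a server. As in MongoDB, `map` sees the current document as `this` and calls `emit(key, value)`, emitted values are grouped by key, and `reduce(key, values)` folds each group; keys with a single value skip `reduce`, matching MongoDB's documented behaviour. `runMapReduce`, `mapWords` and `reduceSum` are hypothetical names, not part of any MongoDB API.

```javascript
// In-memory emulation of MongoDB-style map/reduce (hypothetical helper, not the driver API).
// As in MongoDB, `map` sees the current document as `this` and calls a global `emit`.
let emit; // rebound by runMapReduce while mapping

function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc);

  const results = {};
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with a single value; do the same here.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

// Word count: emit 1 per word...
function mapWords() {
  for (const word of this.text.toLowerCase().split(/\W+/)) {
    if (word) emit(word, 1);
  }
}

// ...then sum the ones for each word.
function reduceSum(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

const docs = [{ text: "the cat sat" }, { text: "The dog sat" }];
console.log(runMapReduce(docs, mapWords, reduceSum));
// { the: 2, cat: 1, sat: 2, dog: 1 }
```

`mapWords` is a regular `function` rather than an arrow function on purpose: arrow functions ignore the `this` supplied by `call`.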


Map/Reduce

In the computer...


Example Time! It's simple... Just take this...

Just kidding...   

Let's Break Down a Naive Bayes Classifier


Classification/Naive Bayes: Training the System

Simple...

$inc


Classification/Naive Bayes

Just Keep Count of Words per Category
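A sketch of what counting words with `$inc` could look like, assuming a hypothetical schema of one document per word, such as `{ _id: "galaxy", counts: { cs: 3 } }`. `trainingUpdates` turns a labelled text into filter/update pairs; with the Node.js driver each pair could be sent as `collection.updateOne(filter, update, { upsert: true })`. So that the sketch runs on its own, `applyLocally` mimics an upserting `$inc` against a `Map` instead of a database. Both helper names are made up.

```javascript
// Hypothetical schema: one document per word, e.g. { _id: "galaxy", counts: { cs: 3 } }.
// Training turns every word of a labelled text into an $inc on counts.<category>.
function trainingUpdates(text, category) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  return words.map((word) => ({
    filter: { _id: word },
    update: { $inc: { ["counts." + category]: 1 } },
  }));
}

// Local stand-in for an upserting $inc so the sketch runs without a server.
// With the Node.js driver each pair would instead be sent as
// collection.updateOne(filter, update, { upsert: true }).
function applyLocally(store, { filter, update }) {
  if (!store.has(filter._id)) {
    store.set(filter._id, { _id: filter._id, counts: {} });
  }
  const doc = store.get(filter._id);
  for (const [path, amount] of Object.entries(update.$inc)) {
    const category = path.slice("counts.".length); // "counts.cs" -> "cs"
    doc.counts[category] = (doc.counts[category] || 0) + amount;
  }
  return store;
}

const store = new Map();
trainingUpdates("Stars and galaxies", "cs").forEach((u) => applyLocally(store, u));
trainingUpdates("Movie stars", "cae").forEach((u) => applyLocally(store, u));
console.log(store.get("stars").counts); // { cs: 1, cae: 1 }
```

Because `$inc` creates missing fields and upserts create missing documents, training never needs a read-modify-write round trip.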


Classification/Naive Bayes: Reduce
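The reduce code on this slide did not survive the transcript. As a stand-in, assume map emits one `{ category: count }` object per word; reduce then merges those objects by summing per category. MongoDB may call reduce again on its own earlier output, so its documentation asks for reduce to return the same shape it receives and to be associative, commutative and idempotent, which plain summing satisfies. `reduceCounts` is a hypothetical name.

```javascript
// Hypothetical reduce for per-category word counts.
// Each emitted value looks like { cs: 2, cae: 1 }; the result has the same shape,
// so MongoDB can safely feed it back into reduce as a partial result.
function reduceCounts(key, values) {
  const total = {};
  for (const counts of values) {
    for (const [category, n] of Object.entries(counts)) {
      total[category] = (total[category] || 0) + n;
    }
  }
  return total;
}

console.log(reduceCounts("stars", [{ cs: 1 }, { cae: 2 }, { cs: 3 }]));
// { cs: 4, cae: 2 }

// Re-reducing a partial result gives the same answer as reducing everything at once.
const partial = reduceCounts("stars", [{ cs: 1 }, { cae: 2 }]);
console.log(reduceCounts("stars", [partial, { cs: 3 }]));
// { cs: 4, cae: 2 }
```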


Classification/Naive Bayes: Finalize
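The finalize step turns counts into per-category scores, but the formula itself is not in the transcript. One common choice is multinomial naive Bayes with add-one (Laplace) smoothing in log space: score(c) = log P(c) + the sum over the text's words w of log((count(w, c) + 1) / (words(c) + V)), where words(c) is the number of word occurrences seen for category c and V is the vocabulary size. The sketch below assumes a hypothetical model shape (`docCounts` per category, `wordCounts[word][category]`). In MongoDB, `finalize(key, reducedValue)` runs once per key, and collection-wide totals like these could be passed in through the map/reduce `scope` option; here the scoring is a plain function so it runs standalone.

```javascript
// Multinomial naive Bayes with add-one (Laplace) smoothing: one common formula,
// not necessarily the one used in the talk. The model shape is hypothetical:
//   docCounts[category]        = training texts per category
//   wordCounts[word][category] = word occurrences per category
function scoreCategories(model, text) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const categories = Object.keys(model.docCounts);
  const totalDocs = categories.reduce((sum, c) => sum + model.docCounts[c], 0);
  const vocabSize = Object.keys(model.wordCounts).length;

  // Total word occurrences seen per category during training.
  const wordsInCategory = {};
  for (const c of categories) wordsInCategory[c] = 0;
  for (const counts of Object.values(model.wordCounts)) {
    for (const [c, n] of Object.entries(counts)) {
      if (c in wordsInCategory) wordsInCategory[c] += n;
    }
  }

  const scores = {};
  for (const c of categories) {
    // Log space keeps long texts from underflowing to zero.
    let score = Math.log(model.docCounts[c] / totalDocs);
    for (const w of words) {
      const counts = Object.prototype.hasOwnProperty.call(model.wordCounts, w)
        ? model.wordCounts[w]
        : {};
      score += Math.log(((counts[c] || 0) + 1) / (wordsInCategory[c] + vocabSize));
    }
    scores[c] = score;
  }
  return scores;
}

// Made-up model using the talk's category codes (cs = science, cae = arts and entertainment).
const model = {
  docCounts: { cs: 2, cae: 2 },
  wordCounts: {
    galaxy: { cs: 3 },
    telescope: { cs: 2 },
    movie: { cae: 3 },
    actor: { cae: 2 },
  },
};

const scores = scoreCategories(model, "A new telescope photographs a galaxy");
const best = Object.keys(scores).reduce((a, b) => (scores[a] >= scores[b] ? a : b));
console.log(best); // prints cs
```

Values such as `wordsInCategory` and `vocabSize` depend only on the trained model, so they are natural candidates for the caching mentioned later in the talk.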


Classification/Naive Bayes: Call the Command


Classification/Naive Bayes: Results

You can see the total number of words.

You can also see word counts per category.

...and, of course, the scores per category (cae = arts and entertainment, cs = science, ...).


Classification/Naive Bayes

• Accurate even with little training

• MongoDB on a small VM: took 1.7 seconds

• Compared to, say, PHP: 33 seconds, then timed out

• More training data == exponentially faster than PHP


Classification/Naive Bayes

• This wasn't even a full map/reduce

• Your mileage will vary depending on the formula

• You can cache certain values for speed

• Don't forget about stored JavaScript (but use it wisely)


Porter Stemming Algorithm

Thank you, Martin Porter

http://tartarus.org/martin/PorterStemmer


Porter Stemming Algorithm

• Implementations exist for nearly every programming language

• MongoDB will, of course, use the JavaScript version

• Decent execution time

Porter Stemming Algorithm

• About 2.5x faster than a PHP class

• 663x faster than in a web browser

• 7x slower than the PHP PECL extension
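Porter's full algorithm is five steps of suffix rules with measure-based conditions, too long to reproduce here, so this shows only step 1a (plural endings), written from the published rules rather than taken from the implementations on Martin Porter's page. `porterStep1a` is a hypothetical name.

```javascript
// Step 1a of Martin Porter's stemming algorithm only (plural endings).
// The full algorithm continues with steps 1b through 5, which add
// measure-based conditions; this just shows the suffix-rewriting style.
function porterStep1a(word) {
  if (word.endsWith("sses")) return word.slice(0, -2); // caresses -> caress
  if (word.endsWith("ies")) return word.slice(0, -2); // ponies -> poni
  if (word.endsWith("ss")) return word; // caress -> caress
  if (word.endsWith("s")) return word.slice(0, -1); // cats -> cat
  return word;
}

console.log(["caresses", "ponies", "ties", "caress", "cats"].map(porterStep1a));
// [ 'caress', 'poni', 'ti', 'caress', 'cat' ]
```

Stems such as "poni" need not be real words; they only need to be consistent, so that related forms land on the same key when words are counted per category.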


Real World Application

Social Harvest

Analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.

www.social-harvest.com

Who doesn't like pie charts?


Follow [email protected]

www.social-harvest.com  

www.union-of-rad.com

 Thank You!