Apache Mahout Algorithms
Transcript of Apache Mahout Algorithms
Mahout Algorithms
Mahmut Karakaya
Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo
What mahout means: elephant rider in Hindi
What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Mahout In Action
- [email protected] 0.9 (1-Feb-2014)
Who uses Mahout
Mahout in Apache Foundation
overstock.com saves $2m a year
Judd Bagley Saum Noursalehi
Others
- Weka (Machine Learning Library)
- Lenskit (Grouplens)
- EasyRec (REST API)
- Write it yourself :)
Need to know ML?
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
Data Model (u,i,r)
Similarity
Cosine Similarity
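As a minimal sketch of what cosine similarity computes, here it is in plain Python rather than through Mahout's own classes. The two rating vectors are hypothetical, and returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical ratings of two items by the same three users.
item_a = [5.0, 3.0, 4.0]
item_b = [4.0, 3.0, 5.0]
similarity = cosine_similarity(item_a, item_b)
```

Vectors that point the same way score 1, orthogonal vectors score 0, so the measure ignores how large the ratings are and looks only at their pattern.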
Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
Item Based - Similarity Matrix (Item-Item)
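A small sketch of how an item-item similarity matrix can be built from (userId, itemId, rating) triples. It is plain Python, not Mahout code. The triples are made up, and treating a missing rating as zero in the cosine is an assumption of this example.

```python
import math
from collections import defaultdict

# Hypothetical (userId, itemId, rating) triples.
ratings = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "B", 2.0), (2, "C", 5.0),
    (3, "B", 4.0), (3, "C", 4.0),
]

def item_vectors(triples):
    """Group ratings by item: item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, item, rating in triples:
        vectors[item][user] = rating
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; missing ratings count as 0."""
    dot = sum(r * v[user] for user, r in u.items() if user in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(triples):
    """Similarity for every ordered pair of distinct items."""
    vectors = item_vectors(triples)
    items = sorted(vectors)
    return {(i, j): cosine(vectors[i], vectors[j])
            for i in items for j in items if i != j}

sim = similarity_matrix(ratings)
```

The matrix is symmetric, so `sim[("A", "B")]` and `sim[("B", "A")]` hold the same value.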
Item Based - Predict
- Weighted Sum: r^(3,1) = 2 * 0.91 + ...
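A hedged sketch of the weighted-sum prediction in plain Python. Only the first term (rating 2, similarity 0.91) comes from the slide. The second rating and similarity are invented to complete the example, and dividing by the sum of absolute similarities is the common normalization, assumed here because the slide elides the rest of the formula.

```python
def predict_weighted_sum(user_ratings, similarities):
    """Estimate a rating for a target item.

    user_ratings: item -> rating given by this user
    similarities: item -> similarity between that item and the target item
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        s = similarities.get(item, 0.0)
        numerator += s * rating
        denominator += abs(s)
    # No similar rated items means no basis for a prediction.
    return numerator / denominator if denominator else None

# User 3 rated items 2 and 3; we predict the rating for item 1.
user_3 = {2: 2.0, 3: 4.0}           # rating 4.0 is hypothetical
sims_to_item_1 = {2: 0.91, 3: 0.5}  # similarity 0.5 is hypothetical
estimate = predict_weighted_sum(user_3, sims_to_item_1)
```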
Item Based
Item Based.. Why in Mahout
- Generic recommender like User Based
- User Based similarity matrix is heavier
Singular Value Decomposition (SVD)
SVDRecommender
Factorization
Factorizer
Singular Value Decomposition (SVD)
Let's say m = 10K, n = 1K, k = 10:
m * n → m * k + n * k
10M → 100K + 10K
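The storage arithmetic can be checked with a few lines of Python. The function names are hypothetical, and the dot-product prediction is the usual way a factorized model is read, shown here as an assumption and not as Mahout's API.

```python
def dense_cells(m, n):
    """Cells in the full m x n user-item rating matrix."""
    return m * n

def factor_cells(m, n, k):
    """Cells in an m x k user-factor matrix plus an n x k item-factor matrix."""
    return m * k + n * k

def predict_from_factors(user_factors, item_factors):
    """Predicted rating as the dot product of two k-dimensional factor rows."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

m, n, k = 10_000, 1_000, 10
dense = dense_cells(m, n)        # 10,000,000
factors = factor_cells(m, n, k)  # 100,000 + 10,000 = 110,000
```

With these numbers the factorized form is roughly 90 times smaller than the dense matrix.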
Singular Value Decomposition (SVD)
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=10
SVD.. Why in Mahout
- Won Netflix Prize
- Parallelizable by row, column
Map / Reduce Mapper
Input
- 1.txt: Hello
- 2.txt: Hello Hello
Output
- Map1: Hello,1
- Map2: Hello,1 Hello,1
Map / Reduce Reducer
- Hello,3
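The word count on these slides can be simulated in plain Python to show the three phases. This is a single-process sketch, not Hadoop code. The `shuffle` step stands in for the grouping Hadoop performs between map and reduce, and the file contents follow the slide's 1.txt and 2.txt example.

```python
from collections import defaultdict

def map_words(text):
    """Mapper: emit (word, 1) for every word in one input file."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group mapper output by key, as the framework does before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

files = {"1.txt": "Hello", "2.txt": "Hello Hello"}
mapped = [pair for text in files.values() for pair in map_words(text)]
counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
# counts == {"Hello": 3}
```

Each mapper sees only its own file, which is why the map phase can run in parallel across machines.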
Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
Map / Reduce ItemBased: Map 1
Map / Reduce ItemBased: Reduce 1
Map / Reduce ItemBased: Map 2
Map / Reduce ItemBased: Reduce 2
Map / Reduce ItemBased
Map / Reduce.. Why in Mahout
Clustering
- KMeans Clustering (SM, MR)
- Fuzzy kMeans (SM, MR)
- Canopy Clustering (SM, MR)
- Dirichlet (SM, MR)
Kmeans
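A minimal single-machine sketch of the k-means loop in plain Python, not Mahout's implementation. The points and starting centroids are hypothetical, the iteration count is fixed for simplicity, and an empty cluster keeps its old centroid by a choice made for this example.

```python
import math

def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        clusters[closest(p, centroids)].append(p)
    return clusters

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update a fixed number of times."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, assign(points, centroids)

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
```

Passing the starting centroids explicitly keeps the run deterministic. Real implementations usually pick them randomly or with a seeding step.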
Clustering Evaluation
Clustering Intra Distance
Clustering Inter Distance
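One common way to put numbers on these two measures, sketched in plain Python. The definitions are assumptions of this example, since the slides do not spell them out: intra-cluster distance is taken as the average pairwise distance inside one cluster, and inter-cluster distance as the average distance between cluster centroids.

```python
import math
from itertools import combinations

def intra_cluster_distance(cluster):
    """Average pairwise distance between points of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a cluster with fewer than two points has no spread
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def inter_cluster_distance(centroids):
    """Average pairwise distance between cluster centroids."""
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical clustering result.
tight = intra_cluster_distance([(0.0, 0.0), (0.0, 1.0)])
apart = inter_cluster_distance([(0.0, 0.0), (3.0, 4.0)])
```

A good clustering has a small intra-cluster distance and a large inter-cluster distance.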
Clustering.. Why in Mahout
- Sparsity
- ~10M of 11M users registered 1 Sony product
- Group Recommendation
- Cluster Based Recommendation
Create WishList Experience
- Mahout (SVD)
- Play
- Heroku
- MongoLab
- REST
http://recommenderplaybbs.herokuapp.com/
Thank you