1 ©MapR Technologies 2013- Confidential
Apache Mahout
How it's good, how it's awesome, and where it falls short
What is Mahout?
“Scalable machine learning”
– not just Hadoop-oriented machine learning
– not entirely, that is. Just mostly.
Components
– math library
– clustering
– classification
– decompositions
– recommendations
What is Right and Wrong with Mahout?
Components
– recommendations
– math library
– clustering
– classification
– decompositions
– other stuff
All the stuff that isn’t there
Mahout Math
Mahout Math
Goals are
– basic linear algebra
– statistical sampling
– good clustering
– decent speed
– extensibility, especially for sparse data
But not
– totally badass speed
– a comprehensive set of algorithms
– optimization, root finders, quadrature
Matrices and Vectors
At the core:
– DenseVector, RandomAccessSparseVector
– DenseMatrix, SparseRowMatrix
Highly composable API
Important ideas:
– view*, assign and aggregate
– iteration
m.viewDiagonal().assign(v)
Assign
Matrices
Matrix assign(double value);
Matrix assign(double[][] values);
Matrix assign(Matrix other);
Matrix assign(DoubleFunction f);
Matrix assign(Matrix other, DoubleDoubleFunction f);
Vectors
Vector assign(double value);
Vector assign(double[] values);
Vector assign(Vector other);
Vector assign(DoubleFunction f);
Vector assign(Vector other, DoubleDoubleFunction f);
Vector assign(DoubleDoubleFunction f, double y);
Views
Matrices
Matrix viewPart(int[] offset, int[] size);
Matrix viewPart(int row, int rlen, int col, int clen);
Vector viewRow(int row);
Vector viewColumn(int column);
Vector viewDiagonal();
Vectors
Vector viewPart(int offset, int length);
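The key property of these views is that they share storage with the backing matrix, so writing through a view mutates the matrix itself. A minimal plain-Java sketch of that idea, with no Mahout dependency (the class and method names here are mine, not Mahout's):

```java
// Sketch of view semantics: a "diagonal view" that reads and writes
// the backing 2-D array in place, the way viewDiagonal() does.
public class DiagonalViewSketch {
    final double[][] data;              // backing storage, row-major

    DiagonalViewSketch(int n) { data = new double[n][n]; }

    // read/write one diagonal element without copying
    double getDiag(int i) { return data[i][i]; }
    void setDiag(int i, double v) { data[i][i] = v; }

    // like m.viewDiagonal().assign(value)
    void assignDiag(double v) {
        for (int i = 0; i < data.length; i++) data[i][i] = v;
    }

    // like m.viewDiagonal().zSum(): the trace of the matrix
    double diagSum() {
        double s = 0;
        for (int i = 0; i < data.length; i++) s += data[i][i];
        return s;
    }

    public static void main(String[] args) {
        DiagonalViewSketch m = new DiagonalViewSketch(3);
        m.assignDiag(2.0);                  // diagonal becomes 2, 2, 2
        System.out.println(m.diagSum());    // prints 6.0
    }
}
```

Because the view writes through to `data`, later reads of the matrix see the assignment; Mahout's real view classes wrap a Matrix rather than a raw array, but the composability is the same.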
Examples
The trace of a matrix:
m.viewDiagonal().zSum()
Random projection onto a low rank random matrix:
m.times(new DenseMatrix(1000, 3).assign(new Normal()))
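For readers without Mahout on the classpath, here is a plain-Java sketch of the same random projection (names and dimensions are mine): multiply an n x d data matrix by a d x k matrix of standard normal samples to get a k-dimensional embedding.

```java
import java.util.Random;

// Random projection: m.times(new DenseMatrix(d, k).assign(new Normal()))
// written out with plain arrays.
public class RandomProjectionSketch {
    static double[][] project(double[][] m, int k, long seed) {
        Random rnd = new Random(seed);
        int n = m.length, d = m[0].length;
        // d x k matrix of N(0,1) samples, like assign(new Normal())
        double[][] r = new double[d][k];
        for (int i = 0; i < d; i++)
            for (int j = 0; j < k; j++)
                r[i][j] = rnd.nextGaussian();
        // naive dense product m x r -> n x k
        double[][] out = new double[n][k];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k; j++)
                for (int l = 0; l < d; l++)
                    out[i][j] += m[i][l] * r[l][j];
        return out;
    }

    public static void main(String[] args) {
        double[][] m = new double[5][1000];   // 5 points in 1000 dimensions
        for (int i = 0; i < 5; i++) m[i][i] = 1.0;
        double[][] p = project(m, 3, 42L);
        System.out.println(p.length + " x " + p[0].length);  // prints 5 x 3
    }
}
```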
Recommenders
Examples of Recommendations
Customers buying books (Linden et al)
Web visitors rating music (Shardanand and Maes) or movies (Riedl, et al), (Netflix)
Internet radio listeners not skipping songs (Musicmatch)
Internet video watchers watching >30 s (Veoh)
Visibility in a map UI (new Google maps)
Recommendation Basics
History:
User  Thing
  1     3
  2     4
  3     4
  2     3
  3     2
  1     1
  2     1
Recommendation Basics
History as matrix:
      t1  t2  t3  t4
u1     1   0   1   0
u2     1   0   1   1
u3     0   1   0   1
(t1, t3) cooccur 2 times; (t1, t4), (t2, t4), and (t3, t4) each cooccur once
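These cooccurrence counts are exactly the off-diagonal entries of AᵀA, where A is the user-by-item history matrix above. A plain-Java sketch (class and method names are mine) that computes them:

```java
// Cooccurrence counting: c[i][j] = (A'A)[i][j] = number of users
// who interacted with both item i and item j.
public class CooccurrenceSketch {
    static int[][] cooccurrence(int[][] a) {
        int items = a[0].length;
        int[][] c = new int[items][items];
        for (int[] user : a)                     // each user's history row
            for (int i = 0; i < items; i++)
                for (int j = 0; j < items; j++)
                    c[i][j] += user[i] * user[j];
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {
            {1, 0, 1, 0},   // u1
            {1, 0, 1, 1},   // u2
            {0, 1, 0, 1},   // u3
        };
        int[][] c = cooccurrence(a);
        System.out.println(c[0][2]);  // prints 2: (t1, t3) cooccur twice
        System.out.println(c[0][3]);  // prints 1: (t1, t4) cooccur once
    }
}
```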
A Quick Simplification
Users who do h also do r
User-centric recommendations: r = Aᵀ(Ah)
Item-centric recommendations: r = (AᵀA)h
By associativity these are the same vector, but they are computed very differently.
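A plain-Java sketch of the user-centric form, using the history matrix from the earlier slide (names are mine, not Mahout's). Computing Aᵀ(Ah) needs only two matrix-vector products; the item-centric (AᵀA)h form gives the same answer but lets AᵀA be precomputed once and reused.

```java
import java.util.Arrays;

// r = A'(Ah): score items by how often they cooccur with the history h.
public class RecommendationSketch {
    // naive dense matrix-vector product y = M v
    static double[] matVec(double[][] m, double[] v) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++)
                y[i] += m[i][j] * v[j];
        return y;
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    public static void main(String[] args) {
        double[][] a = {        // user x item history matrix from the earlier slide
            {1, 0, 1, 0},
            {1, 0, 1, 1},
            {0, 1, 0, 1},
        };
        double[] h = {1, 0, 0, 0};  // a history containing only t1
        double[] r = matVec(transpose(a), matVec(a, h));  // A'(Ah)
        System.out.println(Arrays.toString(r));  // prints [2.0, 0.0, 2.0, 1.0]
    }
}
```

The score r[2] = 2 says that users who did t1 also did t3 twice, matching the cooccurrence counts on the previous slide.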
Clustering
An Example
Diagonalized Cluster Proximity
Parallel Speedup?
[Figure: time per point (μs) versus number of threads, comparing the threaded and non-threaded versions against perfect scaling]
Lots of Clusters Are Fine
Decompositions
Low Rank Matrix
Or should we see it differently?
Are these all scaled-up versions of the same column?
 1   2    5
 2   4   10
10  20   50
20  40  100
Low Rank Matrix
Matrix multiplication is designed to make this easy
We can see weighted column patterns, or weighted row patterns
All the same mathematically
[1 2 10 20]ᵀ x [1 2 5]
column pattern (or weights) x weights (or row pattern)
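A plain-Java sketch of this outer product (names are mine): multiplying the column pattern by the row pattern reproduces the whole 4 x 3 matrix from a rank-1 factorization.

```java
import java.util.Arrays;

// Rank-1 matrix: every entry is col[i] * row[j], so every row of the
// result is a scaled copy of the row pattern.
public class OuterProductSketch {
    static double[][] outer(double[] col, double[] row) {
        double[][] m = new double[col.length][row.length];
        for (int i = 0; i < col.length; i++)
            for (int j = 0; j < row.length; j++)
                m[i][j] = col[i] * row[j];
        return m;
    }

    public static void main(String[] args) {
        double[][] m = outer(new double[]{1, 2, 10, 20}, new double[]{1, 2, 5});
        // reproduces the slide's matrix exactly
        System.out.println(Arrays.deepToString(m));
    }
}
```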
Low Rank Matrix
What about here?
This is like before, but there is one exceptional value
 1    2    5
 2    4   10
10  100   50
20   40  100
Low Rank Matrix
OK … add in a simple fixer-upper
[1 2 10 20]ᵀ x [1 2 5]  +  [0 0 10 0]ᵀ x [0 8 0]
The first term is the rank-1 pattern as before; in the second term, [0 0 10 0]ᵀ picks which row gets fixed and [0 8 0] is the exception pattern.
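A plain-Java sketch checking this rank-2 decomposition (names are mine): the base rank-1 term plus the one-row fixer reproduces the matrix with the exceptional 100.

```java
import java.util.Arrays;

// Sum of two rank-1 terms: base pattern plus a correction that
// touches only one row, giving the matrix with the exceptional value.
public class RankTwoSketch {
    static double[][] outer(double[] col, double[] row) {
        double[][] m = new double[col.length][row.length];
        for (int i = 0; i < col.length; i++)
            for (int j = 0; j < row.length; j++)
                m[i][j] = col[i] * row[j];
        return m;
    }

    static double[][] add(double[][] a, double[][] b) {
        double[][] s = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                s[i][j] = a[i][j] + b[i][j];
        return s;
    }

    public static void main(String[] args) {
        double[][] base = outer(new double[]{1, 2, 10, 20}, new double[]{1, 2, 5});
        double[][] fix = outer(new double[]{0, 0, 10, 0}, new double[]{0, 8, 0});
        double[][] m = add(base, fix);
        // m[2][1] is now 20 + 80 = 100, the exceptional value
        System.out.println(Arrays.deepToString(m));
    }
}
```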
Random Projection
SVD Projection
Classifiers
Mahout Classifiers
Naïve Bayes
– high quality implementation
– uses idiosyncratic input format
– … but it is naïve
SGD
– sequential, not parallel
– auto-tuning has foibles
– learning rate annealing has issues
– definitely not state of the art compared to Vowpal Wabbit
Random forest
– scaling limits due to decomposition strategy
– yet another input format
– no deployment strategy
The stuff that isn’t there
What Mahout Isn’t
Mahout isn’t R, isn’t SAS
It doesn’t aim to do everything
It aims to scale a few problems of practical interest
The stuff that isn’t there is a feature, not a defect
Contact:
– [email protected]
– @ted_dunning
– @apachemahout
Slides and such: http://www.slideshare.net/tdunning
Hash tags: #mapr #apachemahout