Transcript of "Scalable and Fast Machine Learning" (Rice University)

Scalable and Fast Machine Learning

By Niketan Pansare (np6@rice.edu)

Thursday, March 20, 14

Terrorist Alert

Why is it important? A Machine Learning Algorithm (ML) scans the stream of uploaded videos and decides whether to raise a terrorist alert.

Challenges: 100 hours of video are uploaded to YouTube per minute! The ML algorithm therefore needs to be scalable & fast.

Goal: improve the performance of ML algorithms on MapReduce.

Machine Learning in a little more detail

A Machine Learning Algorithm (ML) maps a classification input $x_i$ to an output $y_i$ via a predictor $g : x \to y$.

Key idea of ML: learn the predictor $g$ such that the loss $L(y, g(x))$ over the data is minimized.

=> One way to do this is the gradient descent algorithm.

Machine Learning in a little more detail, but with an example

Consider "linear regression": learn $w$ using gradient descent such that

  predictor: $g = w^T x$
  loss: $L = (y - w^T x)^2$
  gradient: $\nabla L(w_i; x_j, y_j) = -2\,(y_j - w_i^T x_j)\, x_j$

Gradient descent setup:

  Initialize $w_0$
  while not converged {
    $w_{i+1} = w_i - \alpha\, E[\nabla L_{j \in B}]$
  }

Here $\alpha$ is the learning rate and $E[\nabla L_{j \in B}]$ is an estimate of the gradient of the loss computed from the datapoints in the set $B$.

For the regularized version, replace $w_i$ by $w_i(1 - \alpha\lambda)$.
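To make the setup concrete, here is a minimal sketch (assuming NumPy; the function name and default values are illustrative, not from the slides) of the update above applied to the linear-regression loss:

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.01, lam=0.0, batch_size=32, n_iters=1000):
    """Mini-batch gradient descent for the squared loss L = (y - w^T x)^2."""
    n, d = X.shape
    w = np.zeros(d)                                         # initialize w_0
    for _ in range(n_iters):
        B = np.random.choice(n, batch_size, replace=False)  # datapoints in set B
        resid = y[B] - X[B] @ w                             # (y_j - w^T x_j) for j in B
        grad = -2 * X[B].T @ resid / batch_size             # estimate of E[grad L, j in B]
        w = w * (1 - alpha * lam) - alpha * grad            # regularized update
    return w
```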

Examples other than linear regression:

SVM:
  $w_{i+1} = w_i - \alpha_i \lambda w_i$                      if $y_j w^T \phi(x_j) > 1$
  $w_{i+1} = w_i - \alpha_i (\lambda w_i - y_j \phi(x_j))$    otherwise

Gaussian mixture model: $w = [\pi;\ \mu_0 \ldots \mu_K;\ \sigma_0 \ldots \sigma_K]$ and $\nabla w = [\nabla\pi;\ \nabla\mu_0 \ldots \nabla\mu_K;\ \nabla\sigma_0 \ldots \nabla\sigma_K]$, with

  $\nabla\mu_k[j] = -\frac{1}{N} \sum_{n=0}^{N-1} frac \cdot \frac{X_n[j] - \mu_k[j]}{(\sigma_k[j])^2}$

  $\nabla\sigma_k[j] = -\frac{1}{N} \sum_{n=0}^{N-1} frac \cdot \left\{ \frac{(X_n[j] - \mu_k[j])^2}{(\sigma_k[j])^3} - \frac{1}{\sigma_k[j]} \right\}$

  $\nabla\pi_k = \frac{1}{N} \left\{ \sum_{n=0}^{N-1} |frac| \cdot \left[ \frac{1}{|\pi_k|} - \frac{1}{\sum_{k=1}^{K} |\pi_k|} \right] \right\} - 2 \left( \sum_{k=1}^{K} |\pi_k| - 1 \right) \frac{\pi_k}{|\pi_k|}$

  where $frac$ = getProbOfClusterK($k, X_n, w$).
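A minimal sketch of the SVM update above, assuming a linear kernel so that $\phi(x) = x$ (the function name is mine, not from the slides):

```python
import numpy as np

def svm_sgd_step(w, x_j, y_j, alpha_i, lam):
    """One SGD step for the L2-regularized hinge loss with phi(x) = x."""
    if y_j * (w @ x_j) > 1:                      # margin satisfied: only the regularizer acts
        return w - alpha_i * lam * w
    return w - alpha_i * (lam * w - y_j * x_j)   # margin violated: also move toward y_j * x_j
```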

Examples other than linear regression (continued):

Perceptron:
  $w_{i+1} = w_i + \alpha_i y_j \phi(x_j)$    if $y_j w^T \phi(x_j) \le 0$
  $w_{i+1} = w_i$                             otherwise

Adaline:
  $w_{i+1} = w_i + \alpha_i (y_j - w^T \phi(x_j))\, \phi(x_j)$

K-means (online update):
  $k^* = \arg\min_k (z_i - w_k)^2$
  $n_{k^*} = n_{k^*} + 1$
  $w_{k^*} = w_{k^*} + \frac{1}{n_{k^*}} (z_i - w_{k^*})$

Lasso (regularized least squares):
  $L = \lambda |w|_1 + \frac{1}{2} (y - w^T \phi(x))^2$
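As a small illustration, the online k-means update above can be written as follows (a sketch assuming NumPy, with centers as a K x d array and counts as a length-K array; the names are mine):

```python
import numpy as np

def online_kmeans_step(centers, counts, z_i):
    """Assign datapoint z_i to the nearest center and nudge that center."""
    k_star = int(np.argmin(((centers - z_i) ** 2).sum(axis=1)))  # k* = argmin_k (z_i - w_k)^2
    counts[k_star] += 1                                          # n_k* += 1
    centers[k_star] += (z_i - centers[k_star]) / counts[k_star]  # running mean of assigned points
    return k_star
```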

Related work (variants of GD)

Recall the gradient descent setup: initialize $w_0$; while not converged { $w_{i+1} = w_i - \alpha\, E[\nabla L_{j \in B}]$ }, where $\alpha$ is the learning rate and $E[\nabla L_{j \in B}]$ is the gradient estimate over the datapoints in set $B$.

3 key areas:
1. Vary the learning rate
2. Update different dimensions of w in parallel
3. Vary B

1. Vary the learning rate:
- Constant learning rate
- Start with a small learning rate => increase it if successive gradients point in the same direction, else decrease it
- Monitor the loss on a validation set and increase/decrease the rate accordingly
- Find the learning rate through a line search on a sample dataset
- Decrease the rate according to a heuristic formula: $\alpha_i = \frac{\alpha_0}{1 + \alpha_0 \lambda i}$
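The heuristic schedule is straightforward to state in code (a one-line sketch; the example values are illustrative):

```python
def heuristic_rate(alpha0, lam, i):
    """alpha_i = alpha_0 / (1 + alpha_0 * lambda * i): decays roughly like 1/i."""
    return alpha0 / (1 + alpha0 * lam * i)

# e.g. alpha0 = 0.5, lam = 0.1 gives rates 0.5, 0.476, 0.455, ... at i = 0, 1, 2, ...
```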

2. Update different dimensions of w in parallel:
- HogWild [Niu 2011] => if the data is sparse, just keep updating w without any locks.
- Distributed SGD [Gemulla 2011]
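A toy sketch of the HogWild idea: worker threads update a shared w with no locking. (Illustrative only: CPython's GIL serializes the bytecode, so real HogWild implementations use native threads, but the racy, lock-free update pattern is the same.)

```python
import threading
import numpy as np

def hogwild_sgd(X, y, w, alpha=0.01, n_threads=4, steps_per_thread=10000):
    """Threads race to update the shared weight vector w in place, lock-free."""
    n = X.shape[0]
    def worker():
        for _ in range(steps_per_thread):
            j = np.random.randint(n)
            grad = -2 * (y[j] - X[j] @ w) * X[j]   # squared-loss gradient at one datapoint
            w -= alpha * grad                      # unsynchronized in-place write
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```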

3. Vary B:
- B = entire dataset => Batch GD:
  $w_{i+1} = w_i - \alpha \left\{ \frac{1}{n} \sum_{j=1}^{n} \nabla L(w_i; x_j, y_j) \right\}$
- B = 1 random block => sum over b data items instead of n => Mini-batch GD
- B = 1 random datapoint => Stochastic GD => not suitable for MapReduce
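The three variants differ only in the index set B used for the gradient estimate. A small sketch (assuming NumPy and the squared loss; the names are illustrative):

```python
import numpy as np

def grad_estimate(X, y, w, B):
    """Gradient estimate over the datapoints in index set B, squared loss."""
    resid = y[B] - X[B] @ w
    return -2 * X[B].T @ resid / len(B)

# n, b = X.shape[0], 32
# Batch GD:      B = np.arange(n)                          # entire dataset
# Mini-batch GD: B = np.random.choice(n, b, replace=False) # one random block, b << n
# Stochastic GD: B = np.random.randint(n, size=1)          # one random datapoint
```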

Effect of #blocks on performance

Batch GD uses the entire dataset per update; mini-batch GD uses one block; a hybrid approach uses $k_i$ blocks at iteration $i$. (The slides show three convergence plots, each starting from $w_0$, for the gradient descent setup: initialize $w_0$; while not converged { $w_{i+1} = w_i - \alpha\, E[\nabla L_{j \in B}]$ }.)

My contribution: model this stopping criterion (how many blocks to process before moving on) in a principled approach.

High-level intuition about the "principled approach"

The user issues an aggregate query, e.g. avg(employee salary), to MapReduce.

A typical MapReduce program returns the answer, 1000, after 10 hours.

Online Aggregation (OLA), my previous work, instead returns a shrinking confidence interval: [500, 1600] after 10 seconds, [900, 1100] after 2 minutes, then [999, 1001], ..., [999.5, 1000.4], [999.8, 1000.3], [999.9, 1000.1], ..., [1000, 1000].

Idea: if acceptable accuracy is reached early, the query can be stopped.

The mini-batch gradient update $\theta_{j+1} = \theta_j - \alpha \sum_{i=0}^{b} \nabla f_i$ is itself such an aggregate, so the same idea applies.
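A minimal sketch of the OLA intuition (assuming a uniformly shuffled stream and a normal approximation for the interval; the estimator used in the actual OLA work is not specified in the slides):

```python
import math

def ola_running_interval(stream, z=1.96, report_every=1000):
    """Running mean of a sampled stream with a confidence interval that
    tightens as more data is seen, e.g. [900, 1100] -> ... -> [999, 1001]."""
    total, total_sq, n = 0.0, 0.0, 0
    for x in stream:
        total, total_sq, n = total + x, total_sq + x * x, n + 1
        if n % report_every == 0:
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            half = z * math.sqrt(var / n)      # half-width shrinks like 1/sqrt(n)
            yield (mean - half, mean + half)
```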

Summary: Scalable & Fast Machine Learning

- Work in progress
- Use OLA to implement GD on MapReduce
  => improve the performance of GD
  => improve the performance of Machine Learning
- Scalability using MapReduce

Modeling challenges:

- Modeling the surface
- Modeling the random walk

Vary B ... in a little more detail

(* Ignoring the divisor and the decreasing learning rate for readability; also, the variable "f" here represents the loss.)

Batch GD (entire dataset):
  Initialize $\theta_0$
  While not converged {
    $\theta_{j+1} = \theta_j - \alpha \sum_{i=0}^{n} \nabla f_i$
  }
  - More accurate estimate of the gradient of "f"
  - Fewer iterations of the while loop
  - Each iteration takes a long time

Mini-Batch GD (one random block $b$, with $b \ll n$):
  Initialize $\theta_0$
  While not converged {
    $\theta_{j+1} = \theta_j - \alpha \sum_{i=0}^{b} \nabla f_i$
  }
  - Less accurate estimate of the gradient of "f"
  - More iterations of the while loop
  - Each iteration takes much less time

Stochastic GD (one random datapoint $r$; not suitable for MapReduce):
  Initialize $\theta_0$
  While not converged {
    $\theta_{j+1} = \theta_j - \alpha \nabla f_r$
  }

Key idea: proceed to the next iteration once at least "k" datapoints have been processed, where k is found using a Bayesian model (the stopping criterion); see the sketch below.
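A sketch of this key idea (the choose_k helper is a placeholder I introduce for the Bayesian stopping model, whose details are not given in the slides; the loss is assumed squared for concreteness):

```python
import numpy as np

def gd_with_stopping_criterion(X, y, alpha=0.01, n_updates=50):
    """Each update consumes datapoints from a shuffled stream until at
    least k have been processed, then applies one mini-batch step."""
    def choose_k(j):
        return 256        # placeholder: in the talk, k comes from a Bayesian model

    n, d = X.shape
    theta = np.zeros(d)
    order = np.random.permutation(n)       # stream the data in random order
    pos = 0
    for j in range(n_updates):
        k, grad, seen = choose_k(j), np.zeros(d), 0
        while seen < k:                    # process at least k datapoints
            i = order[pos % n]
            pos += 1
            grad += -2 * (y[i] - X[i] @ theta) * X[i]
            seen += 1
        theta -= alpha * grad / seen       # one update of theta
    return theta
```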

References:
- http://www.eetimes.com/document.asp?doc_id=1273834
- http://www.youtube.com/