Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark

Zoltán Zvara, Gábor Hermann

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 688191.


About us
• Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI)
• Informatics Laboratory
• "Big Data – Momentum" research group
• "Data Mining and Search" research group
• Research group with strong industry ties
• Ericsson, Rovio, Portugal Telecom, etc.

Agenda
1. Recommendation systems and matrix factorization
2. Batch vs. online
3. Matrix factorization
   1. Online
   2. Batch + online
4. Solution in Spark & Flink
5. Conclusions


Recommendation systems


Recommendation with matrix factorization

[Figure: rating matrix R for the users Zoltán and Gábor and the movies Rogue One and Interstellar, approximated by the product of a user matrix U and an item matrix I: U ∙ I ≈ R.]

• Zoltán rated Rogue One with 5 stars
• Each user has a user vector and each item has an item vector of latent factors (e.g. level of action, level of drama, X factor)
• The factors are found by minimizing the regularized squared error over the known ratings:

$$\min_{u_*,\, i_*} \sum_{(p,q) \in \kappa_R} \left( r_{pq} - \mu - b_p - b_q - u_p i_q \right)^2 + \lambda \sum_{p \in \kappa_U} \left( \lVert u_p \rVert^2 + b_p^2 \right) + \lambda \sum_{q \in \kappa_I} \left( \lVert i_q \rVert^2 + b_q^2 \right)$$

• Would Gábor like Interstellar? The missing rating is predicted from Gábor's user vector (5, 4, -4) and Interstellar's item vector (3, 2, 5): 5 ∙ 3 + 4 ∙ 2 + (-4) ∙ 5 = 3
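The prediction step in the example can be sketched in a few lines of Scala. This is only an illustration, not the authors' code: the object and function names are made up, and the global mean and bias terms of the objective are left out so that the result matches the plain dot product shown on the slide.

```scala
object PredictSketch {
  // Factor vectors read off the slide's example (three latent factors each).
  val gaborUser: Vector[Double] = Vector(5.0, 4.0, -4.0)
  val interstellarItem: Vector[Double] = Vector(3.0, 2.0, 5.0)

  /** Dot product of two equally long factor vectors. */
  def dot(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same number of factors")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /** Predicted rating; bias terms are omitted in this sketch (an assumption). */
  def predict(user: Vector[Double], item: Vector[Double]): Double = dot(user, item)
}
```

Calling `PredictSketch.predict(PredictSketch.gaborUser, PredictSketch.interstellarItem)` reproduces the predicted rating of 3 from the slide.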

Batch training

[Figure: rating events of the form [user; item; time; rating] are collected in persistent storage; the rating matrix R is built from the stored events, and the user matrix U and item matrix I are trained on it.]

Online training

[Figure: rating events of the form [user; item; time; rating] arrive as a stream; each incoming rating updates the corresponding user vector in U and item vector in I.]
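The slides do not spell out the update rule, so the following is a sketch under the assumption that each incoming rating triggers one plain stochastic gradient descent (SGD) step on the squared error, without bias terms. The names `update`, `learningRate` and `lambda` are illustrative.

```scala
object OnlineSgdSketch {
  type Factors = Vector[Double]

  def dot(a: Factors, b: Factors): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** One SGD step for a single rating; returns the updated (user, item) vectors.
    * Both new vectors are computed from the old values. */
  def update(user: Factors, item: Factors, rating: Double,
             learningRate: Double, lambda: Double): (Factors, Factors) = {
    val err = rating - dot(user, item)
    val newUser = user.zip(item).map { case (u, i) => u + learningRate * (err * i - lambda * u) }
    val newItem = item.zip(user).map { case (i, u) => i + learningRate * (err * u - lambda * i) }
    (newUser, newItem)
  }
}
```

Only the two vectors touched by the rating change, which is what makes the online variant cheap per event.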


Batch + online combination

But how to scale?
• Spotify streamed 20 billion hours of music in 2015
• YouTube has over a billion users and billions of video views every day
• Use distributed data-analytics frameworks
• How can we combine batch + online?


Apache Spark vs. Apache Flink

Distributed online matrix factorization

[Figure: the user vectors of U and the item vectors of I are distributed, while rating events of the form [user; item; time; rating] arrive as a stream.]

To process a rating:
1. Need to co-locate the user vector and the item vector
2. Then update both vectors
3. Send updates

Process two ratings in parallel:
• Concurrent modification
• Similar problem with batch SGD
• Distributed SGD (Gemulla et al. 2011)
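One way to avoid concurrent modification, in the spirit of distributed SGD, is to process together only ratings that share neither a user nor an item. The sketch below assumes integer ids and a simple modulo partitioning into `k` blocks; both are illustrative choices, not necessarily what the authors' implementation does.

```scala
object DsgdStrataSketch {
  final case class Rating(user: Int, item: Int, value: Double)

  /** Block of the user matrix that owns this rating's user vector. */
  def userBlock(r: Rating, k: Int): Int = Math.floorMod(r.user, k)

  /** Block of the item matrix that owns this rating's item vector. */
  def itemBlock(r: Rating, k: Int): Int = Math.floorMod(r.item, k)

  /** Stratum index: within one stratum, ratings in different user blocks
    * also fall into different item blocks. */
  def stratum(r: Rating, k: Int): Int =
    Math.floorMod(itemBlock(r, k) - userBlock(r, k), k)

  /** Groups ratings as stratum -> user block -> ratings.
    * The inner groups of one stratum can be processed in parallel. */
  def schedule(ratings: Seq[Rating], k: Int): Map[Int, Map[Int, Seq[Rating]]] =
    ratings.groupBy(stratum(_, k)).map { case (s, rs) => s -> rs.groupBy(userBlock(_, k)) }
}
```

Strata are processed one after another; inside a stratum, each worker takes one user block, so no two workers ever touch the same user or item vector.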

Online MF in Spark

• updateStateByKey? Use batch DSGD for online updates! (discussion in issue SPARK-6407)

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // we have our input
    val ratings: DStream[Rating] = ...

    // need to represent factor matrices
    var users: RDD[(UserId, Vector)] = ...
    var items: RDD[(ItemId, Vector)] = ...

    // would like to have output like this
    val updateStream: DStream[Either[(UserId, Vector), (ItemId, Vector)]] =
      // use transform to allow RDD operations
      ratings.transform { (rs: RDD[Rating]) =>
        // compute updates
        val updates = batchDSGD(rs, users, items)
        // apply updates to get updated matrices
        users = applyUserUpdates(users, updates)
        items = applyItemUpdates(items, updates)
        updates
      }

Online MF in Spark
• Performance decreases over time
• Problem: tracking lineage graph
• Solution: use checkpointing
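Why the lineage grows can be shown with a toy model in plain Scala. This is not Spark code: `Dataset`, `Derived` and `run` are hypothetical stand-ins for a factor matrix that is re-derived from its predecessor in every mini-batch. In Spark itself the cut would be made with `RDD.checkpoint()` after setting a directory via `SparkContext.setCheckpointDir`.

```scala
object LineageSketch {
  /** Toy stand-in for a dataset that remembers how it was derived. */
  sealed trait Dataset { def lineageDepth: Int }
  final case class Materialized(name: String) extends Dataset { val lineageDepth: Int = 0 }
  final case class Derived(parent: Dataset) extends Dataset {
    val lineageDepth: Int = parent.lineageDepth + 1
  }

  /** Applies `batches` mini-batch updates and returns the longest lineage seen.
    * The lineage is cut every `checkpointEvery` batches (0 means never). */
  def run(batches: Int, checkpointEvery: Int): Int = {
    var users: Dataset = Materialized("initial")
    var maxDepth = 0
    for (b <- 1 to batches) {
      users = Derived(users) // each mini-batch derives new factors from the old ones
      maxDepth = math.max(maxDepth, users.lineageDepth)
      if (checkpointEvery > 0 && b % checkpointEvery == 0)
        users = Materialized(s"checkpoint-$b") // checkpoint: forget the history
    }
    maxDepth
  }
}
```

Without checkpoints the lineage is as long as the number of processed mini-batches; with a checkpoint every 10 batches it never exceeds 10.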

Online MF in Flink

[Figure: dataflow with two long-running operators with state, one holding the user vectors and one holding the item vectors, connected by a backward edge in the dataflow (stream loop).]

1. Rating event
2. Rating event & user vector
3. Apply update
4. User vector update

WARNING! Loops API (iterative streams) not mature enough yet, but there is ongoing effort
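The four steps can be simulated in plain Scala without any Flink dependency. The sketch below is a single-threaded model under stated assumptions: the two operator states are ordinary mutable maps, the backward edge is a queue, new vectors start from a fixed `init` value, and the update is an unregularized SGD step. None of these names come from the Flink API or from the authors' code.

```scala
import scala.collection.mutable

object StreamLoopSketch {
  type Factors = Vector[Double]
  final case class Rating(user: Int, item: Int, value: Double)

  def dot(a: Factors, b: Factors): Double = a.zip(b).map { case (x, y) => x * y }.sum

  // State of the two long-running operators (hypothetical in-memory maps).
  val userVectors = mutable.Map.empty[Int, Factors]
  val itemVectors = mutable.Map.empty[Int, Factors]
  // Backward edge: user vector updates travelling back to the user operator.
  val feedback = mutable.Queue.empty[(Int, Factors)]

  val init: Factors = Vector(0.1, 0.1, 0.1)
  val learningRate = 0.05

  /** Steps 1-2: the user operator attaches the current user vector to the rating. */
  def userOperator(r: Rating): (Rating, Factors) =
    (r, userVectors.getOrElseUpdate(r.user, init))

  /** Steps 3-4: the item operator applies the update and emits the new user vector. */
  def itemOperator(r: Rating, user: Factors): Unit = {
    val item = itemVectors.getOrElseUpdate(r.item, init)
    val err = r.value - dot(user, item)
    val newItem = item.zip(user).map { case (i, u) => i + learningRate * err * u }
    val newUser = user.zip(item).map { case (u, i) => u + learningRate * err * i }
    itemVectors(r.item) = newItem
    feedback.enqueue((r.user, newUser))
  }

  /** Drains the backward edge into the user operator's state. */
  def drainFeedback(): Unit = while (feedback.nonEmpty) {
    val (id, vec) = feedback.dequeue()
    userVectors(id) = vec
  }

  /** Sends one rating event through the whole loop. */
  def process(r: Rating): Unit = {
    val (rating, user) = userOperator(r)
    itemOperator(rating, user)
    drainFeedback()
  }
}
```

In a real streaming job the feedback arrives asynchronously, so a second rating for the same user may be processed before the update returns; the simulation sidesteps this by draining the queue after every event.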


Online MF: Spark vs. Flink

Combining batch + online in Spark
• Easy: can run batch training periodically on whole dataset

Combining batch + online in Flink
• Combining Flink Batch API with Streaming API
  • Could only do it with an external system
• Batch with Streaming API
  • Feasible!
  • Asynchronous training (Schelter et al. 2014)
• Batch + online
  • Both with Streaming API
  • Share matrices in common state
  • Parameter Server approach

Lessons learned

| | Flink | Spark |
| Implementation | More complex solution, harder to implement | Easier to use: could use batch for streaming |
| Generality | Can express finer-grained updates | Updates limited by mini-batch |
| Code stability | Some parts are not mature enough (e.g. Loops API) | More mature |
| Performance | Optimal for online learning, can perform well on batch | Not always optimal for online learning (e.g. online MF) |
| Handling data skew | Currently hard to relocate long-running operators | Periodic scheduling enables easier modification of partitioning |
| Machine learning | Incomplete ML library, plus other efforts for ML in Flink | Spark MLlib is mature and used in production |

Thank you for your attention

Zoltán Zvara
Gábor Hermann

Source code: https://github.com/gaborhermann/large-scale-recommendation


Measurements

Batch + online combination
• Last.fm dataset with 30M music listening events
• Around 45,000 users
• Weekly batch training
• Evaluation: weekly average, measured on every incoming listening event

Online MF: Spark vs. Flink
• Last.fm dataset with 30M music listening events, read from 12 Kafka partitions
• Spark batch duration: 5 sec
• Measured: time of processing X ratings
• DSGD algorithm
• Using 6 nodes, 4 cores each
• Spark 2.1.0, Flink 1.2.0

Batch on Flink Streaming
• MovieLens 1M movie rating dataset
• Using 6 nodes, 4 cores each