FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large-scale Machine Learning with Apache Flink
Theodore Vasiloudis, SICS
SICS Data Science Day, October 21st, 2015
Apache Flink
What is Apache Flink?
● Large-scale data processing engine
● Easy and powerful APIs for batch and real-time streaming analysis
● Backed by a very robust execution backend
  ○ true streaming dataflow engine
  ○ custom memory manager
  ○ native iterations
  ○ cost-based optimizer
What does Flink give us?
● Expressive APIs
● Pipelined stream processor
● Closed-loop iterations
Expressive APIs
● Main distributed data abstraction: DataSet
● Programs are written as functional-style transformations, which build a dataflow.
import org.apache.flink.api.scala._
case class Word(word: String, frequency: Int)
val env = ExecutionEnvironment.getExecutionEnvironment
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap(line => line.split(" ").map(word => Word(word, 1)))
  .groupBy("word").sum("frequency").print()
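The word-count dataflow can be mirrored on a local Scala collection, which makes it easy to check what each transformation contributes. This is a minimal sketch using only the standard library (no Flink); `wordCount` is a hypothetical helper, and unlike the slide's version it drops empty tokens produced by repeated spaces.

```scala
case class Word(word: String, frequency: Int)

// Local analogue of flatMap -> groupBy("word") -> sum("frequency") (hypothetical helper).
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(line => line.split(" ").filter(_.nonEmpty).map(w => Word(w, 1)).toSeq) // one record per token
    .groupBy(_.word)                                                              // group records by key
    .map { case (w, ws) => w -> ws.map(_.frequency).sum }                         // sum frequencies per key

val counts = wordCount(Seq("to be or not", "to be"))
println(counts)
```

In Flink the same three steps run as a distributed dataflow over a `DataSet`; here they run eagerly in memory.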
Pipelined Stream Processor
Iterate in the Dataflow
Iterate by looping
● A loop in the client submits one job per iteration step
● Data is reused by caching it in memory or on disk
Iterate in the Dataflow
Delta iterations
Learn more in Vasia’s Gelly talk!
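Delta iterations keep a solution set that is updated in place and a workset that holds only the elements changed in the previous superstep, so later supersteps touch less and less data. The sketch below illustrates the idea with connected components by minimum-label propagation, run on local collections rather than inside a Flink dataflow; the function name, the example graph and the `maxSteps` bound are hypothetical.

```scala
// Delta iteration sketch: connected components by min-label propagation.
// solution: vertex -> current component label (updated in place each superstep)
// workset:  vertices whose label changed in the previous superstep
def connectedComponents(vertices: Seq[Int], edges: Seq[(Int, Int)], maxSteps: Int = 100): Map[Int, Int] = {
  // Undirected adjacency lists
  val neighbors: Map[Int, Seq[Int]] =
    (edges ++ edges.map(_.swap)).groupBy(_._1).map { case (v, es) => v -> es.map(_._2) }
  var solution: Map[Int, Int] = vertices.map(v => v -> v).toMap // each vertex starts as its own component
  var workset: Set[Int] = vertices.toSet
  var step = 0
  while (workset.nonEmpty && step < maxSteps) {
    // Candidate labels are sent only from vertices in the workset
    val candidates: Seq[(Int, Int)] = for {
      v <- workset.toSeq
      n <- neighbors.getOrElse(v, Seq.empty)
    } yield n -> solution(v)
    // Keep the smallest candidate per target vertex, and only if it improves its label
    val updates: Map[Int, Int] = candidates
      .groupBy(_._1)
      .map { case (n, cs) => n -> cs.map(_._2).min }
      .filter { case (n, label) => label < solution(n) }
    solution = solution ++ updates // apply the delta to the solution set
    workset = updates.keySet       // next workset: only the vertices that changed
    step += 1
  }
  solution
}

val components = connectedComponents(1 to 6, Seq(1 -> 2, 2 -> 3, 4 -> 5))
println(components)
```

On this graph the workset shrinks from all six vertices to three, then to one, then empties, which is exactly the saving that delta iterations exploit.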
Large-scale Machine Learning
What do we mean?
● Small-scale learning
  ○ We have a small-scale learning problem when the active budget constraint is the number of examples.
● Large-scale learning
  ○ We have a large-scale learning problem when the active budget constraint is the computing time.
Source: Léon Bottou
Deep learning
What do we mean?
● What about the complexity of the problem?
“When you get to a trillion [parameters], you’re getting to something that’s got a chance of really understanding some stuff.” - Hinton, 2013
Source: Wired Magazine
What do we mean?
● We have a large-scale learning problem when the active budget constraint is the computing time and/or the model complexity.
FlinkML
● New effort to bring large-scale machine learning to Flink
● Goals:
  ○ Truly scalable implementations
  ○ Keep glue code to a minimum
  ○ Ease of use
FlinkML: Overview
● Supervised Learning
  ○ Optimization framework
  ○ SVM
  ○ Multiple linear regression
● Recommendation
  ○ Alternating Least Squares (ALS)
● Pre-processing
  ○ Polynomial features
  ○ Feature scaling
● sklearn-like ML pipelines
FlinkML API
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.regression.MultipleLinearRegression
// LabeledVector is a feature vector with a label (class or real value)
val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...
val mlr = MultipleLinearRegression()
  .setStepsize(0.01)
  .setIterations(100)
  .setConvergenceThreshold(0.001)
mlr.fit(trainingData)
// The fitted model can now be used to make predictions
val predictions: DataSet[LabeledVector] = mlr.predict(testingData)
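The three settings above (`setStepsize`, `setIterations`, `setConvergenceThreshold`) parameterize an iterative gradient-based solver. Below is a minimal local sketch of batch gradient descent for least squares driven by the same three knobs; it is not FlinkML's implementation, `Labeled` is a hypothetical stand-in for `LabeledVector`, and the stopping rule (relative change in the loss below the threshold) is an assumption made for illustration.

```scala
// Hypothetical local stand-in for LabeledVector: features plus a real-valued label.
case class Labeled(features: Array[Double], label: Double)

def dot(a: Array[Double], b: Array[Double]): Double =
  a.indices.map(i => a(i) * b(i)).sum

// Mean squared residual of the model (w, b) on the data.
def loss(data: Seq[Labeled], w: Array[Double], b: Double): Double =
  data.map(p => math.pow(dot(w, p.features) + b - p.label, 2)).sum / data.size

// Batch gradient descent for least squares (sketch, not FlinkML's code).
def fitLinear(data: Seq[Labeled], stepsize: Double, iterations: Int,
              convergenceThreshold: Double): (Array[Double], Double) = {
  val d = data.head.features.length
  var w = Array.fill(d)(0.0)
  var b = 0.0
  var prevLoss = loss(data, w, b)
  var i = 0
  var converged = false
  while (i < iterations && !converged) {
    // Gradient of (1/2) * mean squared residual with respect to w and b
    val residuals = data.map(p => dot(w, p.features) + b - p.label)
    val gradW = Array.tabulate(d)(j => data.indices.map(k => residuals(k) * data(k).features(j)).sum / data.size)
    val gradB = residuals.sum / data.size
    w = Array.tabulate(d)(j => w(j) - stepsize * gradW(j))
    b -= stepsize * gradB
    val newLoss = loss(data, w, b)
    // Assumed stopping rule: relative change in loss below the threshold
    converged = prevLoss > 0 && math.abs(prevLoss - newLoss) / prevLoss < convergenceThreshold
    prevLoss = newLoss
    i += 1
  }
  (w, b)
}

// Noise-free data from y = 2x + 1
val data = (0 to 4).map(x => Labeled(Array(x.toDouble), 2.0 * x + 1.0))
val (w, b) = fitLinear(data, stepsize = 0.1, iterations = 1000, convergenceThreshold = 1e-9)
println(s"w = ${w.mkString(", ")}, b = $b")
```

With a step size of 0.1 on this small data set the slowest error mode shrinks by about 3% per iteration, so 1000 iterations recover w ≈ 2 and b ≈ 1; a step size that is too large for the data's scale would make the same loop diverge.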
FlinkML Pipelines
import org.apache.flink.ml.preprocessing.{PolynomialFeatures, StandardScaler}
import org.apache.flink.ml.regression.MultipleLinearRegression
val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures().setDegree(3)
val mlr = MultipleLinearRegression()
// Construct pipeline of standard scaler, polynomial features and multiple linear regression
val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)
// Train pipeline
pipeline.fit(trainingData)
// Calculate predictions
val predictions = pipeline.predict(testingData)
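Chaining works by fitting each stage on the output of the stage before it, then passing the transformed data on to the final predictor. A minimal local sketch of that composition follows; the `Transformer` trait, `Scaler`, `Poly` and `chain` are hypothetical stand-ins written for illustration (not FlinkML classes), and `Poly` only expands a single input feature to keep the example short.

```scala
// Hypothetical transformer interface: learn parameters with fit, apply them with transform.
trait Transformer {
  def fit(data: Seq[Array[Double]]): Transformer
  def transform(data: Seq[Array[Double]]): Seq[Array[Double]]
}

// Standard scaler: per-feature zero mean and unit (population) standard deviation.
case class Scaler(mean: Array[Double] = Array.empty, std: Array[Double] = Array.empty) extends Transformer {
  def fit(data: Seq[Array[Double]]): Transformer = {
    val d = data.head.length
    val m = Array.tabulate(d)(j => data.map(_(j)).sum / data.size)
    val s = Array.tabulate(d)(j => math.sqrt(data.map(x => math.pow(x(j) - m(j), 2)).sum / data.size))
    Scaler(m, s.map(v => if (v == 0.0) 1.0 else v)) // avoid division by zero for constant features
  }
  def transform(data: Seq[Array[Double]]): Seq[Array[Double]] =
    data.map(x => Array.tabulate(x.length)(j => (x(j) - mean(j)) / std(j)))
}

// Polynomial features for one input feature: x -> (x, x^2, ..., x^degree).
case class Poly(degree: Int) extends Transformer {
  def fit(data: Seq[Array[Double]]): Transformer = this // nothing to learn
  def transform(data: Seq[Array[Double]]): Seq[Array[Double]] =
    data.map(x => Array.tabulate(degree)(k => math.pow(x(0), k + 1)))
}

// Fit each stage on the output of the previous fitted stage.
def chain(stages: Seq[Transformer], data: Seq[Array[Double]]): (Seq[Transformer], Seq[Array[Double]]) =
  stages.foldLeft((Seq.empty[Transformer], data)) { case ((fitted, current), stage) =>
    val f = stage.fit(current)
    (fitted :+ f, f.transform(current))
  }

val raw = Seq(Array(1.0), Array(2.0), Array(3.0))
val (fittedStages, features) = chain(Seq(Scaler(), Poly(3)), raw)
features.foreach(f => println(f.mkString(", ")))
```

A final predictor (multiple linear regression in the slide) would then be fit on `features`; keeping the fitted stages around is what lets the same transformations be replayed on test data at prediction time.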
State of the art in large-scale ML
Alternating Least Squares
R ≅ X ✕ Y, where R is the Users ✕ Items matrix
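Factorizing the rating matrix R into X and Y can be done by alternating between two regularized least-squares problems: with Y fixed, each user's row of X has a closed-form solution, and vice versa. The local sketch below assumes ratings given as (user, item, rating) triples, a regularization weight `lambda`, and a tiny Gaussian-elimination solver; it shows the naive (unblocked) variant only, and every name in it is hypothetical rather than part of FlinkML's blocked ALS.

```scala
import scala.util.Random

// Solve A x = b for a small dense system (Gaussian elimination with partial pivoting).
def solve(a0: Array[Array[Double]], b0: Array[Double]): Array[Double] = {
  val n = b0.length
  val a = a0.map(_.clone); val b = b0.clone
  for (col <- 0 until n) {
    val p = (col until n).maxBy(r => math.abs(a(r)(col)))
    val tmpRow = a(col); a(col) = a(p); a(p) = tmpRow
    val tmpB = b(col); b(col) = b(p); b(p) = tmpB
    for (r <- col + 1 until n) {
      val f = a(r)(col) / a(col)(col)
      for (c <- col until n) a(r)(c) -= f * a(col)(c)
      b(r) -= f * b(col)
    }
  }
  val x = new Array[Double](n)
  for (r <- n - 1 to 0 by -1)
    x(r) = (b(r) - (r + 1 until n).map(c => a(r)(c) * x(c)).sum) / a(r)(r)
  x
}

// One half-step: for each row u, minimize sum over its ratings (r - x_u . y_i)^2 + lambda * |x_u|^2.
def updateSide(ratings: Map[Int, Seq[(Int, Double)]], fixed: Array[Array[Double]],
               rows: Int, k: Int, lambda: Double): Array[Array[Double]] =
  Array.tabulate(rows) { u =>
    val obs = ratings.getOrElse(u, Seq.empty)
    // Normal equations: (sum y y^T + lambda I) x = sum r y
    val a = Array.tabulate(k, k)((p, q) =>
      obs.map { case (i, _) => fixed(i)(p) * fixed(i)(q) }.sum + (if (p == q) lambda else 0.0))
    val b = Array.tabulate(k)(p => obs.map { case (i, r) => r * fixed(i)(p) }.sum)
    solve(a, b)
  }

def als(triples: Seq[(Int, Int, Double)], users: Int, items: Int, k: Int,
        lambda: Double, iterations: Int, seed: Int = 42): (Array[Array[Double]], Array[Array[Double]]) = {
  val rnd = new Random(seed)
  val byUser = triples.groupBy(_._1).map { case (u, ts) => u -> ts.map(t => (t._2, t._3)) }
  val byItem = triples.groupBy(_._2).map { case (i, ts) => i -> ts.map(t => (t._1, t._3)) }
  var x = Array.fill(users, k)(0.0)
  var y = Array.fill(items, k)(0.1 + rnd.nextDouble()) // positive random init for item factors
  for (_ <- 1 to iterations) {
    x = updateSide(byUser, y, users, k, lambda) // fix Y, solve for X
    y = updateSide(byItem, x, items, k, lambda) // fix X, solve for Y
  }
  (x, y)
}

def rmse(triples: Seq[(Int, Int, Double)], x: Array[Array[Double]], y: Array[Array[Double]]): Double =
  math.sqrt(triples.map { case (u, i, r) =>
    math.pow(r - x(u).indices.map(p => x(u)(p) * y(i)(p)).sum, 2)
  }.sum / triples.size)

// Rank-1 example: r(u, i) = (u + 1) * (i + 1) for 3 users and 2 items
val observed = for (u <- 0 until 3; i <- 0 until 2) yield (u, i, (u + 1.0) * (i + 1.0))
val (xf, yf) = als(observed, users = 3, items = 2, k = 1, lambda = 0.01, iterations = 10)
println(rmse(observed, xf, yf))
```

Each half-step solves one small k ✕ k system per user or item; in the naive distributed version every such solve needs the full set of opposite-side factors for that row's ratings, which is the communication cost that blocking is meant to reduce.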
Naive Alternating Least Squares
Blocked Alternating Least Squares
Blocked ALS performance
FlinkML blocked ALS performance
Going beyond SGD in large-scale optimization
CoCoA: Communication Efficient Coordinate Ascent
● Beyond SGD → Use Primal-Dual framework
● Slow updates → Immediately apply local updates
● Average over batch size → Average over K (nodes) << batch size
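The three ideas in the CoCoA bullets can be seen in a small local simulation. This is a minimal sketch assuming a hinge-loss SVM without a bias term, data split round-robin across K simulated workers, and one cyclic pass of dual coordinate ascent (SDCA) per worker as the local solver; randomized coordinate selection is the usual choice, and a cyclic pass is used here only to keep the example deterministic. Each worker applies its dual updates immediately to its own copy of w, and the driver averages the K updates. All names and parameter values are illustrative.

```scala
// One labeled example: features x and label y in {-1, +1}.
case class Example(x: Array[Double], y: Double)

def dot(a: Array[Double], b: Array[Double]): Double = a.indices.map(i => a(i) * b(i)).sum

// Primal: lambda/2 |w|^2 + (1/n) sum hinge(y_i w.x_i), with w = (1/(lambda n)) sum alpha_i y_i x_i.
// Local solver: one pass of SDCA over a worker's partition, updating its local copy of w immediately.
def localSdca(data: IndexedSeq[Example], part: Seq[Int], alpha: Array[Double],
              w: Array[Double], lambda: Double): (Map[Int, Double], Array[Double]) = {
  val n = data.size
  val wLocal = w.clone
  var deltaAlpha = Map.empty[Int, Double]
  for (i <- part) {
    val Example(x, y) = data(i)
    val a = alpha(i) + deltaAlpha.getOrElse(i, 0.0)
    val grad = 1.0 - y * dot(x, wLocal)
    // Closed-form coordinate step for the hinge-loss dual, clipped to the box [0, 1]
    val newA = math.max(0.0, math.min(1.0, a + grad * lambda * n / dot(x, x)))
    val d = newA - a
    deltaAlpha = deltaAlpha.updated(i, deltaAlpha.getOrElse(i, 0.0) + d)
    for (j <- wLocal.indices) wLocal(j) += d * y * x(j) / (lambda * n) // apply the update immediately
  }
  (deltaAlpha, Array.tabulate(w.length)(j => wLocal(j) - w(j)))
}

def cocoa(data: IndexedSeq[Example], k: Int, lambda: Double, rounds: Int): Array[Double] = {
  val dim = data.head.x.length
  val alpha = Array.fill(data.size)(0.0)
  var w = Array.fill(dim)(0.0)
  val partitions = (0 until k).map(p => data.indices.filter(_ % k == p)) // round-robin split
  for (_ <- 1 to rounds) {
    val results = partitions.map(part => localSdca(data, part, alpha, w, lambda))
    // Average the K local updates (one communication round per outer iteration)
    for ((dAlpha, _) <- results; (i, d) <- dAlpha) alpha(i) += d / k
    w = Array.tabulate(dim)(j => w(j) + results.map(_._2(j)).sum / k)
  }
  w
}

val train = IndexedSeq(
  Example(Array(2.0, 2.0), 1.0), Example(Array(3.0, 1.0), 1.0), Example(Array(1.0, 3.0), 1.0),
  Example(Array(-2.0, -2.0), -1.0), Example(Array(-3.0, -1.0), -1.0), Example(Array(-1.0, -3.0), -1.0))
val w = cocoa(train, k = 2, lambda = 0.1, rounds = 20)
val accuracy = train.count(e => math.signum(dot(e.x, w)) == e.y).toDouble / train.size
println(s"w = ${w.mkString(", ")}, training accuracy = $accuracy")
```

Averaging the dual updates keeps every alpha inside its box constraint, because each averaged value is a convex combination of two feasible values; that is what makes the conservative 1/K combination safe without any step-size tuning.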
Primal-dual framework
Source: Smith (2014)
Immediately Apply Updates
Source: Smith (2014)
Average over nodes (K) instead of batches
Source: Smith (2014)
CoCoA: Communication Efficient Coordinate Ascent
CoCoA performance
Source: Jaggi (2014)
CoCoA performance
Available on FlinkML: SVM
Achieving model parallelism: The parameter server
● The parameter server is essentially a distributed key-value store with two basic commands: push and pull
  ○ push updates the model
  ○ pull retrieves a (lazily) updated model
● Allows us to store a model across multiple nodes, and to read and update it as needed.
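To make push and pull concrete, here is a minimal single-process sketch of a parameter server whose keys are sharded across a fixed number of simulated server nodes by key modulo. The `ParameterServer` class, its additive `push` semantics and the sharding rule are assumptions for illustration; `pull` here is synchronous, whereas a real deployment adds networking, lazy or asynchronous propagation, and fault tolerance.

```scala
import scala.collection.mutable

// Hypothetical in-memory parameter server: keys are sharded across `numServers` maps.
class ParameterServer(numServers: Int, dim: Int) {
  private val shards = Array.fill(numServers)(mutable.Map.empty[Int, Array[Double]])

  // Pick the shard responsible for a key (non-negative modulo).
  private def shardOf(key: Int): mutable.Map[Int, Array[Double]] =
    shards(((key % numServers) + numServers) % numServers)

  // push: add a worker's update to the stored parameter vector for this key.
  def push(key: Int, update: Array[Double]): Unit = {
    val current = shardOf(key).getOrElseUpdate(key, Array.fill(dim)(0.0))
    for (j <- 0 until dim) current(j) += update(j)
  }

  // pull: return a copy of the current parameters (zeros if the key was never pushed).
  def pull(key: Int): Array[Double] =
    shardOf(key).get(key).map(_.clone).getOrElse(Array.fill(dim)(0.0))

  def keysPerServer: Seq[Int] = shards.map(_.size).toSeq
}

val ps = new ParameterServer(numServers = 3, dim = 2)
// Two workers push gradient-like updates, two of them for the same key
ps.push(7, Array(0.5, -1.0))
ps.push(7, Array(0.25, 0.5))
ps.push(4, Array(1.0, 1.0))
val w7 = ps.pull(7)
println(w7.mkString(", "))
```

Because no single node holds the whole model, the model can grow beyond one machine's memory, and workers only pull the keys their data actually touches.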
Architecture of a parameter server communicating with groups of workers.
Source: Li (2014)
Comparison with other large-scale learning systems.
Source: Li (2014)
Dealing with stragglers: SSP Iterations
● BSP: Bulk Synchronous Parallel
  ○ Every worker needs to wait for the others to finish before starting the next iteration.
● ASP: Asynchronous Parallel
  ○ Every worker can work individually, updating the model as needed.
  ○ Can be fast, but can often diverge.
● SSP: Stale Synchronous Parallel
  ○ Relax constraints, so that the slowest workers can be up to K iterations behind the fastest ones.
  ○ Allows for progress while keeping convergence guarantees.
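The three synchronization models differ only in how far ahead of the slowest worker a fast worker may run. A minimal sketch of that check follows, assuming each worker reports a clock (its number of completed iterations) and a staleness bound `s`: a worker may start its next iteration only while its clock is at most `s` ahead of the slowest clock. With `s = 0` this reduces to a BSP barrier, and an unbounded `s` behaves like ASP. The function names and the two-worker straggler simulation are hypothetical.

```scala
// A worker that has completed clocks(worker) iterations may start the next one
// only if it is at most `staleness` iterations ahead of the slowest worker.
def canStart(worker: Int, clocks: Array[Int], staleness: Int): Boolean =
  clocks(worker) - clocks.min <= staleness

// Simulate a fast worker (0) next to a straggler (1) that never finishes an iteration.
// Returns how many iterations the fast worker completes out of `attempts` tries.
def fastWorkerProgress(staleness: Int, attempts: Int = 10): Int = {
  val clocks = Array(0, 0)
  for (_ <- 1 to attempts)
    if (canStart(0, clocks, staleness)) clocks(0) += 1 // run one iteration, then tick the clock
  clocks(0)
}

val bsp = fastWorkerProgress(staleness = 0)            // blocks at the barrier after one step
val ssp = fastWorkerProgress(staleness = 3)            // runs ahead, but only by a bounded amount
val asp = fastWorkerProgress(staleness = Int.MaxValue) // no bound: never waits
println(s"BSP: $bsp, SSP: $ssp, ASP: $asp")
```

With this convention the fast worker blocks once it has completed `s + 1` more iterations than the straggler, so the staleness bound trades idle time against how out of date the model a worker reads can be.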
Dealing with stragglers: SSP Iterations
Source: Ho et al. (2013)
SSP Iterations in Flink: Lasso Regression
Source: Peel et al. (2015)
To be merged soon into FlinkML
Current and future work on FlinkML
Coming soon
● Tooling
  ○ Evaluation & cross-validation framework
  ○ Predictive Model Markup Language (PMML)
● Algorithms
  ○ Quad-tree kNN search
  ○ Efficient streaming decision trees
  ○ k-means and extensions
  ○ Column-wise statistics, histograms
FlinkML Roadmap
● Hyper-parameter optimization
● More communication-efficient optimization algorithms
● Generalized Linear Models
● Latent Dirichlet Allocation
Future of Machine Learning on Flink
● Streaming ML
  ○ Flink already has SAMOA bindings.
  ○ We plan to kickstart the streaming ML library of Flink and develop new algorithms.
● “Computation-efficient” learning
  ○ Utilize hardware and develop novel systems and algorithms to achieve large-scale learning with modest computing resources.
Recent large-scale learning systems
Source: Xing (2015)
How to get here?
Demo?
Thank you.
References
● Flink Project: flink.apache.org
● FlinkML Docs: https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/
● Léon Bottou: Learning with Large Datasets
● Wired: Computer Brain Escapes Google's X Lab to Supercharge Search
● Smith: CoCoA AMPCAMP Presentation
● CMU Petuum: Petuum Project
● Jaggi (2014): "Communication-efficient distributed dual coordinate ascent", NIPS 2014.
● Li (2014): "Scaling distributed machine learning with the parameter server", OSDI 2014.
● Ho (2013): "More effective distributed ML via a stale synchronous parallel parameter server", NIPS 2013.
● Peel (2015): "Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism", IEEE BigData 2015.
● Xing (2015): "Petuum: A New Platform for Distributed Machine Learning on Big Data", KDD 2015.
I would like to thank professor Eric Xing for his permission to use parts of the structure from his great tutorial on large-scale machine learning: A New Look at the System, Algorithm and Theory Foundations of Distributed Machine Learning
“Demo”