Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel auto-tuning of machine learning algorithms
Gianmario Spacagna, [email protected]
16 October 2012

AgilOne, Inc., 1091 N Shoreline Blvd. #250, Mountain View, CA 94043
(877) 769-3047, (408) 404-0152 fax, [email protected]
Motivation

• Increase the revenue of cloud service providers → keep the cost curve linear w.r.t. the expected exponential income growth.
• Technically achievable through scalability:
  • Scalability in terms of resources → distributed parallel computing (Hadoop).
  • Scalability in terms of multi-tenancy → the same system running for several customers.
  • Scalability in terms of auto-configuration → avoiding manual tuning operations.

[Chart: income growing exponentially while cost stays linear.]
Good Workflow

Good data + ML algorithm → good results, with tuning (adjusting the configuration) feeding back into the algorithm.
General Tuning Diagram

Test data → run the algorithm with configuration X → are the results good?
• Yes → tuned.
• No → change configuration X and run again.
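As a sketch, this loop can be expressed in Scala as follows. All names here (`Config`, `tune`, the toy fitness function) are illustrative stand-ins, not the actual thesis API:

```scala
// Generic tuning loop: run the algorithm with a configuration, score the
// result, and either stop ("tuned") or ask for a new configuration.
case class Config(params: Map[String, Double])

def tune(initial: Config,
         run: Config => Double,          // runs the algorithm, returns a fitness score
         nextConfig: Config => Config,   // the "change configuration" step
         goodEnough: Double => Boolean,  // the "are results good?" test
         maxIters: Int = 100): Config = {
  var conf  = initial
  var iters = 0
  while (!goodEnough(run(conf)) && iters < maxIters) {
    conf = nextConfig(conf)
    iters += 1
  }
  conf // tuned (or best effort after maxIters)
}

// Toy example: step a single parameter x toward the score peak at x = 3.
val tuned = tune(
  initial    = Config(Map("x" -> 0.0)),
  run        = c => -math.abs(c.params("x") - 3.0),
  nextConfig = c => Config(Map("x" -> (c.params("x") + 0.5))),
  goodEnough = score => score > -0.25
)
```

The interesting design question, addressed by the components below, is what to plug in for `nextConfig` and how to parallelize the calls to `run`.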
Tuning of Machine Learning Algorithms

We need tuning when:
• A new algorithm or version is released.
• We want to improve accuracy and/or performance.
• A new customer comes and the system must be customized for the new dataset and requirements.

We need to make it smart, automatic and scalable!
Vision

The "magic box":

Request:
• Data set
• Application (prediction, clustering, classification, ...)
• Algorithm (ANN, LR, K-means, ...)
• Fitness metrics (std. dev., probability of false positives, clustering coeff., randomness, ...)
• Goal constraints (e.g. x > 0.9 and 0.3 < y < 0.5)

Response:
• Best algorithm
• Optimal configuration
• Metrics evaluation
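One possible Scala domain model for this request/response pair. The field names and types are assumptions made for illustration, not the actual design:

```scala
// Goal constraint such as "x > 0.9" or "0.3 < y < 0.5", encoded as a
// (min, max) interval on a named metric.
case class GoalConstraint(metric: String, min: Double, max: Double)

case class TuningRequest(
  datasetPath: String,
  application: String,              // e.g. "prediction", "clustering", "classification"
  algorithms: Seq[String],          // e.g. Seq("ANN", "LR", "K-Means")
  fitnessMetrics: Seq[String],      // e.g. Seq("std-dev", "clustering-coeff")
  constraints: Seq[GoalConstraint]
)

case class TuningResponse(
  bestAlgorithm: String,
  optimalConfiguration: Map[String, Double],
  metricEvaluations: Map[String, Double]
)

// Example request: x > 0.9 becomes the interval (0.9, +inf).
val req = TuningRequest(
  datasetPath    = "customers.csv",
  application    = "clustering",
  algorithms     = Seq("K-Means"),
  fitnessMetrics = Seq("clustering-coeff"),
  constraints    = Seq(GoalConstraint("x", 0.9, Double.PositiveInfinity))
)
```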
Architecture Design

• Upper Applications API: entry point for requests and responses.
• Initializer
• Controller
• Scheduler
• Executors (ANN, LR, K-Means), each with an Evaluator and a Data Sampler, running locally, on a Hadoop cluster, or on a cloud service.
Upper Applications API

Tasks:
• Interfaces the communication between the system and the upper applications layer.
• Parses requests and results and generates the related domain objects.

Possible data formats:
• JSON
• STDIN/OUT
Initializer

Tasks:
• Generates the initial set of configurations.

Possible implementations:
• Random points
• Latin Hypercube
• Dataset similarity
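As an illustration of the Latin Hypercube option, a minimal sampler sketch (function name and parameter ranges are assumed for the example): each parameter range is divided into n equal strata, one random point is drawn per stratum, and each dimension is shuffled independently so every stratum of every parameter is covered exactly once across the n configurations.

```scala
import scala.util.Random

// Latin Hypercube sampling of n initial configurations over the given
// per-parameter (lo, hi) ranges.
def latinHypercube(ranges: Seq[(Double, Double)], n: Int, rng: Random): Seq[Seq[Double]] = {
  val columns = ranges.map { case (lo, hi) =>
    val width = (hi - lo) / n
    // One sample inside each of the n strata, then shuffle the column.
    val samples = (0 until n).map(i => lo + (i + rng.nextDouble()) * width)
    rng.shuffle(samples)
  }
  // Row i of the result takes the i-th shuffled sample of every dimension.
  (0 until n).map(i => columns.map(_(i)))
}

val rng = new Random(42)
// 5 initial configurations for two parameters, e.g. k in [2, 10] and
// max iterations in [10, 100] (hypothetical K-Means parameters).
val initial = latinHypercube(Seq((2.0, 10.0), (10.0, 100.0)), 5, rng)
```

Compared with purely random points, this guarantees the initial configurations spread across the whole range of every parameter.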
Controller

Tasks:
• Compares and generates configurations.
• Decides the convergence of the tuning.
• Adapts the data sampling request.

Possible implementations:
• Random search
• Grid search
• Stochastic Kriging
• Genetic algorithms
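The two baseline controller strategies can be sketched side by side as follows (a toy fitness function stands in for a real algorithm run; all names are illustrative):

```scala
import scala.util.Random

// Random search: sample `trials` configurations uniformly from the ranges
// and keep the one with the highest fitness score.
def randomSearch(ranges: Seq[(Double, Double)], trials: Int,
                 score: Seq[Double] => Double, rng: Random): Seq[Double] = {
  val candidates = Seq.fill(trials) {
    ranges.map { case (lo, hi) => lo + rng.nextDouble() * (hi - lo) }
  }
  candidates.maxBy(score)
}

// Grid search: evaluate every point of a regular grid over the ranges.
def gridSearch(ranges: Seq[(Double, Double)], stepsPerAxis: Int,
               score: Seq[Double] => Double): Seq[Double] = {
  val axes = ranges.map { case (lo, hi) =>
    (0 until stepsPerAxis).map(i => lo + i * (hi - lo) / (stepsPerAxis - 1))
  }
  // Cartesian product of all axes.
  val grid = axes.foldLeft(Seq(Seq.empty[Double])) { (acc, axis) =>
    for (partial <- acc; v <- axis) yield partial :+ v
  }
  grid.maxBy(score)
}

// Toy fitness with its peak at configuration (1, 2).
val fitness = (c: Seq[Double]) => -(math.pow(c(0) - 1, 2) + math.pow(c(1) - 2, 2))
val best       = gridSearch(Seq((0.0, 2.0), (0.0, 4.0)), 5, fitness)
val bestRandom = randomSearch(Seq((0.0, 2.0), (0.0, 4.0)), 200, fitness, new Random(7))
```

Genetic algorithms and Stochastic Kriging replace these blind strategies with ones that exploit the scores of previous configurations, which is exactly the "compares and generates" task above.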
Scheduler

Tasks:
• Checks whether the requests are covered by the available services.
• Schedules and parallelizes request executions.
• Optimizes resources.
• Collects evaluated results.

Possible implementations:
• First available
• Oldest idle
• Load balanced
• Serialized (single node)
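Since each configuration can be evaluated independently, the scheduler's core job can be sketched with a fixed worker pool standing in for the available executors (a local stand-in for dispatching to Hadoop; names and structure are assumptions):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Four workers stand in for four available executor services.
val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

// Dispatch every request to the pool and collect the results in order.
def scheduleAll[A, B](requests: Seq[A])(run: A => B): Seq[B] = {
  implicit val ec: ExecutionContext = pool
  val futures = requests.map(r => Future(run(r)))
  Await.result(Future.sequence(futures), 1.minute)
}

// Evaluate four candidate "configurations" in parallel (toy workload).
val scores = scheduleAll(Seq(1.0, 2.0, 3.0, 4.0))(x => x * x)
pool.shutdown()
```

A "first available" policy falls out of the shared pool; "load balanced" and "oldest idle" would instead pick a specific executor per request based on its current load or idle time.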
Executor

Tasks:
• Executes the provided algorithm with the specified configuration.

Sub-components:
• Evaluator: evaluates results according to the specified fitness metrics.
• Data Sampler: down- and up-sampling of data.

Possible execution environments:
• Local execution
• Hadoop cluster
• Cloud service
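As an example of what the Evaluator does, here is one fitness metric a K-Means run could be scored on: the within-cluster sum of squares (an illustrative choice, not necessarily the thesis' actual metric set):

```scala
// Within-cluster sum of squares: for every point, the squared distance to
// its nearest centroid, summed over all points. Lower is better.
def withinClusterSS(points: Seq[Seq[Double]], centroids: Seq[Seq[Double]]): Double =
  points.map { p =>
    centroids.map(c => p.zip(c).map { case (a, b) => (a - b) * (a - b) }.sum).min
  }.sum

// Toy result of a K-Means run: two tight clusters, well-placed centroids.
val points    = Seq(Seq(0.0, 0.0), Seq(0.0, 1.0), Seq(10.0, 10.0))
val centroids = Seq(Seq(0.0, 0.5), Seq(10.0, 10.0))
val score = withinClusterSS(points, centroids)
```

The Controller compares such scores across configurations; the goal constraints from the request decide when a score counts as "good".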
Tuning Diagram, Revisited

The same loop as before, annotated with the responsible components:
• Test control (Initializer, Controller): propose configuration X and decide whether the results are good.
• Test execution (Scheduler, Executor): run the algorithm on the test data with configuration X.
SUNS: Simple, Unclever and Not Scalable

• Initializer: random points
• API: STDIN/OUT
• Controller: random search or grid search
• Scheduler: serialized
• Executor: K-Means with Evaluator, local execution
SNS: Smart but Not Scalable

• Initializer: Latin Hypercube
• API: STDIN/OUT or JSON
• Controller: genetic algorithm or Stochastic Kriging
• Scheduler: serialized
• Executor: K-Means with Evaluator, local execution
VSNS: Very Smart but Not Scalable

• Initializer: dataset similarity
• API: STDIN/OUT or JSON
• Controller: genetic algorithm or Stochastic Kriging
• Scheduler: serialized
• Executor: K-Means with Evaluator, local execution
VSS: Very Smart and Scalable

• Initializer: dataset similarity
• API: STDIN/OUT or JSON
• Controller: genetic algorithm or Stochastic Kriging
• Scheduler: first available
• Executor: K-Means with Evaluator, on Hadoop
VSVSO: Very Smart, Very Scalable and Optimized

• Initializer: dataset similarity
• API: STDIN/OUT or JSON
• Controller: genetic algorithm or Stochastic Kriging
• Scheduler: load balanced
• Executor: K-Means with Evaluator and Data Sampler, on Hadoop
Thesis

It is possible to build an intelligent system, based on genetic algorithms or Stochastic Kriging, that automatically selects and tunes machine learning algorithms such as K-Means and LR, parallelizing the work on a Hadoop cluster to scale in a cost-efficient manner.
Project Plan

In order of priority:

1. Design the entire application in Scala in a testable and expandable way.
2. Implement the genetic algorithm or the Stochastic Kriging controller.
3. Implement the Latin Hypercube initializer.
4. Test with local-instance algorithms (K-Means and/or LR).
5. Develop and test at least one algorithm in MapReduce fashion using Hadoop.
6. Test with the real AgilOne cluster of servers.
7. Implement the dataset similarity initializer.
8. Implement the Data Sampler.
Questions, feedback, suggestions?
Thank you!