Automating Machine Learning Workflows: A Report from the Trenches - Jose A. Ortega Ruiz @ PAPIs Connect
Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
- APIs: ML building blocks
- Abstraction layer over feature engineering
- Abstraction layer over algorithms
- Automation
Machine Learning Workflows
Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
Machine Learning Workflows for real
Jeannine Takaki, Microsoft Azure Team
Machine Learning Automation Today
from bigml.api import BigML

api = BigML()
project = api.create_project({"name": "ToyBoost"})
# `source` (path or URL of the training data) is not defined on the slide.
orig_source = api.create_source(source, {
    "name": "ToyBoost",
    "project": project["resource"]})
api.ok(orig_source)  # wait until the source is ready
orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
api.ok(orig_dataset)
# The slide reads api.get_dataset(trainset) with trainset undefined;
# starting from orig_dataset is an assumption.
trainset = api.get_dataset(orig_dataset)
for loop in range(0, 10):
    api.ok(trainset)
    model = api.create_model(trainset, {
        "name": "ToyBoost - Model%d" % loop,
        "objective_fields": ["letter"],
        "excluded_fields": ["weight"],
        "weight_field": "100011"})
    api.ok(model)
    batchp = api.create_batch_prediction(model, trainset, {
        "name": "ToyBoost - Result%d" % loop,
        "all_fields": True,
        "header": True})
    api.ok(batchp)
    batchp = api.get_batch_prediction(batchp)
    batchp_dataset = api.get_dataset(batchp["object"])
    # the scored rows become the training set of the next round
    trainset = api.create_dataset(batchp_dataset, {})
Machine Learning Automation Today
Problems of current solutions
- Complexity: lots of details outside the problem domain
- Reuse: no inter-language compatibility
- Scalability: client-side workflows hard to optimize
Not enough abstraction
Machine Learning Automation Tomorrow
Solution: Domain-specific languages
Domain-specific Expressions (sexps)
(if (missing? "height")
    (random-value "height")
    (field "height"))
(window "income" 10)
(within-percentiles? "age" 0.5 0.95)
(cond (> (field "score") (mean "score")) "above average"
      (< (field "score") (mean "score")) "below average"
      "mediocre")
Domain-specific Expressions (JSON)
["if", ["missing?", "height"],
       ["random-value", "height"],
       ["field", "height"]]
["window", "income", 10]
["within-percentiles?", "age", 0.5, 0.95]
["cond", [">", ["field", "score"], ["mean", "score"]], "above average",
         ["<", ["field", "score"], ["mean", "score"]], "below average",
         "mediocre"]
Abstraction via the Language
;; (if (missing? "height")
;;     (random-value "height")
;;     (field "height"))
(ensure-value "height")
(window "income" 10)
(within-percentiles? "age" 0.5 0.95)
;; (cond (> (field "score") (mean "score")) "above average"
;;       (< (field "score") (mean "score")) "below average"
;;       "mediocre")
(discretize "score" "above average" "below average" "mediocre")
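One way to read this slide is that `ensure-value` and `discretize` are shorthands that expand into the longer primitive expressions shown in the comments. The Python sketch below illustrates that idea on the JSON form of the expressions. The `MACROS` table and the `expand` function are hypothetical names introduced for illustration, not part of any BigML API, and the second `cond` branch uses `<` on the assumption that this is the comparison meant for "below average".

```python
def ensure_value(field):
    """Expand (ensure-value field) into the primitive if/missing? form."""
    return ["if", ["missing?", field],
            ["random-value", field],
            ["field", field]]


def discretize(field, above, below, otherwise):
    """Expand (discretize field ...) into a cond comparing against the mean."""
    return ["cond",
            [">", ["field", field], ["mean", field]], above,
            ["<", ["field", field], ["mean", field]], below,
            otherwise]


# Hypothetical macro table: shorthand name -> expansion function.
MACROS = {"ensure-value": ensure_value, "discretize": discretize}


def expand(expr):
    """Recursively replace shorthand forms with primitive expressions."""
    if not isinstance(expr, list) or not expr:
        return expr                      # literals are left untouched
    head, *args = expr
    args = [expand(arg) for arg in args]
    if isinstance(head, str) and head in MACROS:
        return expand(MACROS[head](*args))
    return [head, *args]


print(expand(["ensure-value", "height"]))
```

Because the expansion happens on plain nested lists, the same shorthand could in principle be offered by any client that can build JSON.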
Abstraction via the User Interface
Remote for efficiency and reuse, local for discoverability
Flatline: A DSL for Feature Engineering
- Domain-specific: new fields from an input sliding window as declarative expressions
- Simple syntax: JSON → s-expressions
- Efficient: full server-side implementation
- Discoverable: in-browser client-side implementation
- Reusable: the same expressions usable from any language binding
- Bonus: applicable to filtering
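To make the "simple syntax" and "declarative expressions" points concrete, here is a minimal Python sketch that parses an s-expression into the equivalent JSON list and evaluates it against a row. It is a toy with assumed semantics, not Flatline's implementation. The names `sexp_to_json` and `evaluate` are hypothetical, only a handful of operators are covered (`window` and `within-percentiles?` are parsed but not evaluated), and the `cond` example uses `<` for the "below average" branch.

```python
import json
import random
import re
from statistics import mean

TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()]|[^\s()]+')


def parse_atom(token):
    """Turn one token into a JSON-compatible value."""
    if token.startswith('"'):
        return json.loads(token)      # quoted string literal
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token                      # operator name, kept as a string


def sexp_to_json(text):
    """Parse a single s-expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1                  # skip the closing parenthesis
            return items
        return parse_atom(token)

    return read()


def evaluate(expr, row, rows, rng=random):
    """Evaluate a JSON-form expression for one row (assumed toy semantics)."""
    if not isinstance(expr, list):
        return expr                   # literal value
    op, *args = expr

    def ev(sub):
        return evaluate(sub, row, rows, rng)

    def present(name):
        return [r[name] for r in rows if r.get(name) is not None]

    if op == "field":
        return row.get(args[0])
    if op == "missing?":
        return row.get(args[0]) is None
    if op == "mean":
        return mean(present(args[0]))
    if op == "random-value":
        return rng.choice(present(args[0]))
    if op == "if":
        test, then, otherwise = args
        return ev(then) if ev(test) else ev(otherwise)
    if op == "cond":
        *clauses, default = args      # test/value pairs, then a default
        for test, value in zip(clauses[0::2], clauses[1::2]):
            if ev(test):
                return ev(value)
        return ev(default)
    if op in (">", "<", "="):
        left, right = ev(args[0]), ev(args[1])
        return {">": left > right, "<": left < right, "=": left == right}[op]
    raise ValueError(f"unsupported operator: {op}")


rows = [{"height": 170, "score": 10},
        {"height": None, "score": 20},
        {"height": 180, "score": 30}]

fill = sexp_to_json('(if (missing? "height") (random-value "height") (field "height"))')
label = sexp_to_json('(cond (> (field "score") (mean "score")) "above average"'
                     ' (< (field "score") (mean "score")) "below average"'
                     ' "mediocre")')

print(json.dumps(fill))
print([evaluate(label, row, rows) for row in rows])
```

Note that, as in the JSON slide, operator names and string literals both end up as JSON strings, so the position of a string in the list is what tells them apart.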
Machine Learning Workflows
A DSL for Machine Learning Workflows? Absolutely!
Machine Learning Workflows
Same problems, only worse...
- Complexity: hairy logic and control-flow
- Reuse: more complex algorithms and behaviour, very hard to port to other languages
- Scalability: lots of iterations and intermediate resources, very hard to make efficient on the client side
Machine Learning Workflows
WhizzML: same solution, only better...
WhizzML: A sexp-based, domain-specific language
(define apple
  "https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv")
(define source (create-and-wait-source {"remote" apple
                                        "name" "whizz"}))
(define dataset (create-and-wait-dataset {"source" source}))
(define anomaly (create-and-wait-anomaly {"dataset" dataset}))
(define input {"Open" 275 "High" 300 "Low" 250})
(define score
  (create-and-wait-anomalyscore {"anomaly" anomaly
                                 "input_data" input}))
(get (fetch score) "score")
WhizzML vs Flatline (as languages)
A better language:
- Better data structures (dictionaries, sets...)
- Better control-flow: (tail) recursion, iteration, loops
- Better abstraction: procedures
WhizzML: Lambda Abstraction
Abstraction
(define (score-stock name input)
  (let (base "https://s3.amazonaws.com/bigml-public/csv"
        stock (str base "/" name)
        source (create-and-wait-source {"remote" stock})
        dataset (create-and-wait-dataset {"source" source})
        anomaly (create-and-wait-anomaly {"dataset" dataset}))
    (create-and-wait-anomalyscore {"anomaly" anomaly
                                   "input_data" input})))
WhizzML: Reusable Procedures
Abstraction
(score-stock "aapl" {"Open" 275 "High" 300 "Low" 250})
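For comparison, a client-side counterpart of `score-stock` might look like the Python sketch below. It assumes an `api` object that behaves like the BigML Python bindings used earlier in the talk (`create_source`, `create_dataset`, `create_anomaly`, `create_anomaly_score`, `ok`); the function name `score_stock` and the `BASE` constant are introduced here for illustration only. Each step runs as a separate round trip from the client, which is the overhead the server-side procedure avoids.

```python
BASE = "https://s3.amazonaws.com/bigml-public/csv"  # base URL from the slide


def score_stock(api, name, input_data):
    """Create source, dataset, anomaly detector and score, waiting on each."""
    source = api.create_source(f"{BASE}/{name}")
    api.ok(source)                      # block until the resource is finished
    dataset = api.create_dataset(source)
    api.ok(dataset)
    anomaly = api.create_anomaly(dataset)
    api.ok(anomaly)
    score = api.create_anomaly_score(anomaly, input_data)
    api.ok(score)
    return score


# Usage (needs BigML credentials and network access, so it is commented out):
# from bigml.api import BigML
# result = score_stock(BigML(), "aapl", {"Open": 275, "High": 300, "Low": 250})
```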
WhizzML: Server-side fortes
A better server-side:
- Better reusability: scripts, executions and libraries as first-class ML resources
- Higher efficiency gains: automatic parallelism
- More opportunities for UI extensions
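The talk does not say how the automatic parallelism is implemented. As a rough illustration of the general idea only, the Python sketch below runs workflow steps in waves: every step whose dependencies are already satisfied is submitted to a thread pool at once. The `run_workflow` function and its `steps` and `deps` arguments are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(steps, deps, max_workers=4):
    """Run steps concurrently whenever their dependencies are complete.

    steps: name -> function taking the dict of results computed so far
    deps:  name -> iterable of step names that must finish first
    """
    results = {}
    pending = set(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            done = set(results)
            ready = [n for n in pending if set(deps.get(n, ())) <= done]
            if not ready:
                raise ValueError("dependency cycle or unknown dependency")
            # All ready steps are independent of each other: run them together.
            futures = {n: pool.submit(steps[n], dict(results)) for n in ready}
            for name, future in futures.items():
                results[name] = future.result()
            pending -= set(ready)
    return results


steps = {
    "a": lambda r: 1,
    "b": lambda r: 2,
    "c": lambda r: r["a"] + r["b"],   # needs both a and b
}
print(run_workflow(steps, {"c": ["a", "b"]}))
```

A server that sees the whole workflow can derive such a dependency graph itself, whereas a client-side script would have to encode it by hand.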
WhizzML Source Code as a Machine Learning Resource
{"library": {
   "imports": ["12343addb343f2890f23492d"],
   "source_code": "(define (mu2) (mu (g 3 8)))",
   "exports": [{"name": "mu2", "signature": []}]}}
{"script": {
   "parameters": [{"name": "remote_uri", "type": "string"},
                  {"name": "timeout", "type": "number",
                   "default": 10000}],
   "source_code":
     "(define id (create-source {\"remote\" remote_uri})) (wait id timeout)",
   "outputs": [{"name": "id", "type": "source-id"}]}}
Rich metadata, reuse and shareability of WhizzML code
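As an illustration of what such metadata makes possible, the Python sketch below checks a caller's inputs against a script's declared `parameters`, filling in defaults and rejecting wrong types before anything runs. The `resolve_inputs` function and the `TYPE_CHECKS` table are hypothetical, and only the two parameter types that appear on the slide are handled.

```python
# Hypothetical mapping from declared parameter types to Python checks.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def resolve_inputs(parameters, inputs):
    """Return the full input map for a script, or raise on invalid inputs."""
    known = {param["name"] for param in parameters}
    unknown = set(inputs) - known
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    resolved = {}
    for param in parameters:
        name = param["name"]
        if name in inputs:
            value = inputs[name]
        elif "default" in param:
            value = param["default"]     # fall back to the declared default
        else:
            raise ValueError(f"missing input: {name}")
        check = TYPE_CHECKS.get(param["type"])
        if check is not None and not check(value):
            raise TypeError(f"{name} should be of type {param['type']}")
        resolved[name] = value
    return resolved


# Parameter declarations copied from the script resource on the slide.
parameters = [{"name": "remote_uri", "type": "string"},
              {"name": "timeout", "type": "number", "default": 10000}]
uri = "https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv"
print(resolve_inputs(parameters, {"remote_uri": uri}))
```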
Executions as a Machine Learning Resource
{"execution": {"script_id": "1a2232bf3498f95dde",
               "username": "bittwidler",
               "tlp": 4,
               "resource_limits": {"total": 50,
                                   "source": 10,
                                   "dataset": 5,
                                   "model": 10},
               "max_execution_time": 3600,
               "max_execution_steps": 10000,
               "max_recursion_depth": 1024}}
WhizzML: Client-side fortes
A better client-side:
- Better interactive experience: read-eval-print loop
- Scripts usable from the user’s machine
- Interoperability: Java, JavaScript and NodeJS REPLs
- Challenge: behavioural coherence between server and client sides
Challenges
Solved
- Local REPL and remote shared implementation
- Automatic parallelization
- Error reporting
- Traceability: stack traces and stepwise execution
Open
- Better error management (dynamic typing, type inferencer)
- Resumable workflows
- Data locality: optimizing repeated access to the same datasets