DAWN: Infrastructure for Usable Machine Learning
Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia
It’s the Golden Age of Data
Incredible advances in image recognition, natural language processing, planning, info retrieval
Society-scale impact: autonomous vehicles, personalized medicine, combating human trafficking
No end in sight for advances in ML*
*for the best-funded, best-trained engineering teams
Building ML Products is Too Hard
Major successes (e.g., AlphaGo, ImageNet) require hundreds to thousands of engineers
Huge effort in data preparation, model tuning, experimentation, and productionizing
Domain experts cannot easily or cheaply build ML products
“Only a fraction of real-world ML systems is composed of ML code”
The DAWN Question
What if anyone with domain expertise could build their own production-quality ML products?
• Without a PhD in machine learning
• Without being an expert in systems
• Without understanding the latest hardware
It’s happened before
It’s happened before: Search
Before: decades of research on information retrieval, indexes, ranking, etc.
After: any developer can add search to an application by linking a library (e.g. Solr, Lucene); everyone (i.e., non-expert users) uses search
It’s happened before: SQL
Before: raw access to disk, manual layout of records, network databases (CODASYL)
After: SQL forms basis for transactional engines, data warehousing, business intelligence tools
Key idea: end-to-end systems that tackle the barriers to access & production use
The DAWN Stack
Pipeline stages: Data Acquisition, Feature Engineering, Model Training, Productionizing
Layers: Interfaces, Algorithms, Systems, Hardware
Components shown in the diagram:
• Snorkel
• DeepDive
• MacroBase (Streaming Data)
• NoScope (Video)
• AutoRec, SimDex (Recommendation)
• Data Fusion
• Mulligan (SQL+graph+ML)
• ModelQA
• ModelSnap
• …
• End-to-End Compilers: Weld, Delite
• New Hardware: FuzzyBit, Plasticine CGRA
• CPU, GPU, FPGA, Cluster, Mobile
Example: MacroBase for Continuous Analytics
End-to-end system to prioritize user attention
Multi-dimensional data streams → MacroBase → anomalies & explanations
Too much data for manual inspection
Even harder when data is streaming
MacroBase Summary
End-to-end system to prioritize user attention
• No ML expertise needed: MacroBase uses general models and tunes them automatically
• No separate step for production use
• Co-design from algorithms to HW
Open source: github.com/stanford-futuredata/macrobase
Early users: automotive, cloud, mobile apps, manufacturing
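To make the "anomalies & explanations" idea concrete, here is a minimal sketch. It assumes a median/MAD outlier score and a relative risk ratio as the explanation metric; the function names, the cutoff, and the sample data are illustrative assumptions, not MacroBase's actual API or defaults.

```python
import numpy as np

def mad_outliers(values, cutoff=3.0):
    """Flag points whose robust z-score (median/MAD based) exceeds cutoff."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    # 1.4826 rescales MAD to match a standard deviation under normality.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return np.abs(values - median) / scale > cutoff

def risk_ratio(attrs, is_outlier, value):
    """How much more likely a point is an outlier when attr == value."""
    exposed = np.asarray(attrs) == value
    a_out = int(np.sum(exposed & is_outlier))
    a_in = int(np.sum(exposed & ~is_outlier))
    b_out = int(np.sum(~exposed & is_outlier))
    b_in = int(np.sum(~exposed & ~is_outlier))
    exposed_rate = a_out / (a_out + a_in) if (a_out + a_in) else 0.0
    unexposed_rate = b_out / (b_out + b_in) if (b_out + b_in) else 0.0
    if unexposed_rate == 0.0:
        return float("inf") if exposed_rate > 0 else 0.0
    return exposed_rate / unexposed_rate

# Hypothetical readings: request latency tagged with a device type.
latencies = [10, 11, 9, 10, 12, 95, 10, 102, 11, 9]
devices = ["a", "a", "b", "b", "a", "c", "b", "c", "a", "b"]

outliers = mad_outliers(latencies)
explanations = {d: risk_ratio(devices, outliers, d) for d in sorted(set(devices))}
```

An attribute value with a high risk ratio (here device "c") is a candidate explanation for the anomalous readings.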
Weld: Rethinking the Interface to Data Analytics Libraries
Standard approach: users combine libraries using function calls that pass data via memory
Problem: for data-intensive apps, data movement cost dominates on modern hardware!
5-30x slowdowns in NumPy, Spark, TensorFlow, …
[Diagram: a chain of function calls (func1, func2, …)]
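The data movement problem can be illustrated with plain NumPy. This is a sketch under stated assumptions, not Weld code: `unfused` makes three full passes over memory and materializes two full-size temporaries, while the hypothetical `fused_blocked` helper processes cache-sized blocks so the intermediates never leave cache. The block size is an arbitrary illustrative choice.

```python
import numpy as np

def unfused(a, b, c):
    """Each call is a separate pass that reads and writes full arrays."""
    tmp1 = a * b          # pass 1: writes a full-size temporary
    tmp2 = tmp1 + c       # pass 2: reads it back, writes another
    return float(tmp2.sum())  # pass 3: reads the second temporary

def fused_blocked(a, b, c, block=4096):
    """Same result, but intermediates are only block-sized."""
    total = 0.0
    for start in range(0, a.size, block):
        s = slice(start, start + block)
        total += float(np.sum(a[s] * b[s] + c[s]))
    return total

rng = np.random.default_rng(0)
a, b, c = (rng.random(100_000) for _ in range(3))
slow = unfused(a, b, c)
fast = fused_blocked(a, b, c)
```

Both functions compute the same sum; the difference is only in how much data travels through main memory between steps.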
Weld’s Approach
Diverse analytics tasks: machine learning, SQL, graph algorithms, …
Common runtime: Weld IR
Diverse hardware platforms: CPUs, GPUs, FPGAs
Open source: weld.stanford.edu
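The idea of a common IR with a shared runtime can be sketched with a toy lazy pipeline. Everything here is a hypothetical illustration (the `LazyVec` class, the two "libraries", and `run_sum` are invented for this sketch and are unrelated to the real Weld IR): each library only appends operations to a shared plan, and the runtime executes the whole plan in a single pass across library boundaries.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

@dataclass
class LazyVec:
    """Toy IR: a data source plus a list of deferred map/filter ops."""
    source: Sequence[float]
    ops: List[Tuple[str, Callable]] = field(default_factory=list)

    def map(self, fn):
        return LazyVec(self.source, self.ops + [("map", fn)])

    def filter(self, pred):
        return LazyVec(self.source, self.ops + [("filter", pred)])

def run_sum(vec):
    """Toy runtime: applies every deferred op to each element in one pass."""
    total = 0.0
    for x in vec.source:
        keep = True
        for kind, fn in vec.ops:
            if kind == "map":
                x = fn(x)
            elif not fn(x):
                keep = False
                break
        if keep:
            total += x
    return total

# Two hypothetical libraries that emit IR instead of computing eagerly.
def lib_a_scale(vec, k):
    return vec.map(lambda x: x * k)

def lib_b_keep_positive(vec):
    return vec.filter(lambda x: x > 0)

plan = lib_b_keep_positive(lib_a_scale(LazyVec([-2.0, 1.0, 3.0]), 2.0))
result = run_sum(plan)
```

No intermediate vector is built between the two library calls; the work happens only when `run_sum` evaluates the combined plan.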
Results: Existing Frameworks
[Figure: three runtime charts. TPC-H (Q1, Q6): SparkSQL vs. Weld, runtime in secs. Vector Sum: NP vs. NExpr vs. Weld, runtime in secs. Logistic Regression (LR, 1T and 12T): TF vs. Hand-opt vs. Weld, runtime in secs on a log10 scale. Other labels: 1 Core, 12 Cores.]
Integration effort: 500 lines glue, 30 lines/operator
Results: Cross-Library Optimization
[Figure: two runtime charts. Pandas + NumPy: Current vs. Weld, no CLO vs. Weld, CLO vs. Weld, 12 core; runtime in sec on a log10 scale; annotated speedups 290x and 31x. Spark SQL UDF: Scala UDF vs. Weld; runtime in sec; annotated speedup 14x.]
CLO = cross-library optimization
Open source: weld.stanford.edu
NoScope: Fast CNN-Based Video Queries
Opportunity: CNNs allow more accurate queries on visual data than ever
Challenge: processing 1 video in real time requires a $1000 GPU
Result: same accuracy but 100-3000x faster through:
• Scene-specific distillation
• Temporal + spatial locality
bit.ly/NoScopeArxiv
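The two techniques named above can be sketched as a cascade. This is a simplified illustration under stated assumptions, not NoScope's implementation: a mean-squared-error check stands in for exploiting temporal locality, a cheap scoring function stands in for the scene-specific distilled model, and the expensive reference model runs only when the cheap score is uncertain. All names and thresholds are hypothetical.

```python
import numpy as np

def cascade_labels(frames, cheap_model, reference_model,
                   diff_threshold=1e-3, low=0.2, high=0.8):
    """Label frames, calling the expensive model only when needed."""
    labels, reference_calls = [], 0
    anchor_frame, anchor_label = None, None
    for frame in frames:
        unchanged = (anchor_frame is not None and
                     float(np.mean((frame - anchor_frame) ** 2)) < diff_threshold)
        if unchanged:
            # Temporal locality: reuse the label of the last scored frame.
            label = anchor_label
        else:
            score = cheap_model(frame)  # stand-in for a distilled model
            if score >= high:
                label = True
            elif score <= low:
                label = False
            else:
                # Cheap model is unsure: fall back to the full model.
                label = bool(reference_model(frame))
                reference_calls += 1
            anchor_frame, anchor_label = frame, label
        labels.append(label)
    return labels, reference_calls

# Hypothetical 4x4 "frames" and toy models based on mean brightness.
frames = [np.zeros((4, 4)), np.zeros((4, 4)),
          np.full((4, 4), 0.5), np.ones((4, 4))]
labels, reference_calls = cascade_labels(
    frames,
    cheap_model=lambda f: float(f.mean()),
    reference_model=lambda f: f.mean() > 0.4,
)
```

In this toy run the expensive model is consulted for only one of the four frames; the speedup in a real system depends on how often the cheap stages can answer on their own.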
Training data is the key enabler and the key barrier to entry
How can we leverage data that’s expensive to label at scale?
github.com/HazyResearch/snorkel
Snorkel’s Approach: Weak Supervision
1) User writes labeling functions (LFs): short programs that may not always give the right label
• E.g. regex to search in text
2) Snorkel simultaneously learns the noise in the LFs and a noise-aware target model (e.g. LSTM)
4 hours of LF coding with bio experts matches months of hand-labeling
High-quality models from low-quality, scalable labeling functions

| System | NCBI Disease (F1) | CDR Disease (F1) | CDR Chem. (F1) |
|---|---|---|---|
| TaggerOne (Dogan, 2012)* | 81.5 | 79.6 | 88.4 |
| Snorkel: Logistic Regression | 79.1 | 79.6 | 88.4 |
| Snorkel: LSTM + Embeddings | 79.2 | 80.4 | 88.2 |
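A labeling function in the sense described above can be sketched in a few lines. This example does not use the Snorkel library, and it deliberately swaps in a plain majority vote where Snorkel learns the noise of each labeling function; the function names, regexes, and sentences are all illustrative assumptions.

```python
import re
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_disease_suffix(text):
    """Vote POSITIVE on words ending in common disease suffixes."""
    hit = re.search(r"\b\w+(itis|oma|osis)\b", text, re.I)
    return POSITIVE if hit else ABSTAIN

def lf_negation(text):
    """Vote NEGATIVE when the sentence negates a finding."""
    hit = re.search(r"\bno (evidence|sign)s? of\b", text, re.I)
    return NEGATIVE if hit else ABSTAIN

def lf_keyword(text):
    """Vote POSITIVE on generic disease keywords."""
    hit = re.search(r"\b(disease|syndrome)\b", text, re.I)
    return POSITIVE if hit else ABSTAIN

def majority_vote(votes):
    """Stand-in for Snorkel's learned model: ties and no votes abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts:
        return ABSTAIN
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

LFS = [lf_disease_suffix, lf_negation, lf_keyword]
sentences = [
    "Patient presents with chronic hepatitis.",
    "No evidence of disease.",
    "The weather was fine.",
    "There was no sign of infection.",
]
weak_labels = [majority_vote([lf(s) for lf in LFS]) for s in sentences]
```

The second sentence shows why noise modeling matters: two labeling functions disagree, and a simple vote can only abstain, whereas a model of each function's accuracy could still resolve the conflict.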
DAWN: machine learning for everyone via novel techniques and interfaces that span hardware, systems, and algorithms
Find out more at dawn.cs.stanford.edu
Peter Bailis, Chris Ré, Kunle Olukotun, Matei Zaharia