Scaling Distributed Machine Learning with the · Overview of machine learning raw data training...

Scaling Distributed Machine Learning with the

Mu Li


query: “osdi” search engine

advertiser charge

👆

user


Machine learning is concerned with systems that can learn from data


[Figure: Ad click prediction. Training data size (TB) and annual revenue (billion $) by year, 2010 to 2014.]

We will report results using 1,000 machines!


Overview of machine learning

raw data → training data → machine learning system → model

example by feature matrix: features osdi, www.usenix.org, user_mu_li with entries 1 1 1

(key,value) pairs

✦ scale to industry problems
✦ efficient communication
✦ fault tolerance
✦ easy to use

Scale of Industry problems

✦ 100 billion examples
✦ 10 billion features
✦ 1T to 1P training data
✦ 100 to 1,000 machines
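A minimal sketch of how raw feature strings could become rows of the example by feature matrix, stored sparsely as (key, value) pairs. The helper name `build_sparse_rows` and the second example (`user_other`) are hypothetical; the slides only show the single row with entries 1 1 1.

```python
def build_sparse_rows(raw_examples):
    """Map each raw example (a list of feature strings) to a sparse row.

    Returns (rows, feature_ids): rows is a list of {column: 1.0} dicts and
    feature_ids maps each feature string to its column index.
    """
    feature_ids = {}
    rows = []
    for features in raw_examples:
        row = {}
        for name in features:
            # Assign the next free column the first time a feature is seen.
            column = feature_ids.setdefault(name, len(feature_ids))
            row[column] = 1.0
        rows.append(row)
    return rows, feature_ids


# First example mirrors the slide; the second one is made up for illustration.
rows, feature_ids = build_sparse_rows([
    ["osdi", "www.usenix.org", "user_mu_li"],
    ["osdi", "user_other"],
])
```

Only the nonzero entries are stored, which is what keeps a matrix with billions of feature columns manageable.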


Industry size machine learning problems


Data and model partition

Training data: partitioned across worker machines
Model: partitioned across server machines
Workers push to and pull from the server machines
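The push and pull interface can be sketched as a toy in-process key-value store. This is an illustration only, not the system's API: the class names are hypothetical, keys are sharded by a simple modulo rather than the consistent hashing mentioned later in the talk, and a push is assumed to add its values to the stored ones.

```python
class ServerShard:
    """One server machine holding a slice of the model."""

    def __init__(self):
        self.store = {}

    def push(self, pairs):
        # Assumption: pushed values are added to the stored values.
        for key, value in pairs.items():
            self.store[key] = self.store.get(key, 0.0) + value

    def pull(self, keys):
        return {key: self.store.get(key, 0.0) for key in keys}


class ServerGroup:
    """Routes each key to the shard that owns it (modulo sharding, for brevity)."""

    def __init__(self, num_servers):
        self.shards = [ServerShard() for _ in range(num_servers)]

    def _shard(self, key):
        return self.shards[key % len(self.shards)]

    def push(self, pairs):
        for key, value in pairs.items():
            self._shard(key).push({key: value})

    def pull(self, keys):
        result = {}
        for key in keys:
            result.update(self._shard(key).pull([key]))
        return result


servers = ServerGroup(num_servers=2)
servers.push({0: 1.0, 1: 2.0, 5: 0.5})   # worker 0 pushes
servers.push({1: 1.0})                   # worker 1 pushes
pulled = servers.pull([1, 5, 7])
```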


Example: distributed gradient descent

Workers pull the working set of model
Iterate until stop
    workers compute gradients
    workers push gradients
    update model
    workers pull updated model

Worker machines

Server machines
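The loop above can be simulated in a single process. This is a sketch under stated assumptions: the loss is a squared loss chosen only to keep the example short, the "servers" are one dictionary, workers run one after another to imitate a synchronous round, and all function names are hypothetical.

```python
def working_set(rows):
    """Keys that actually occur in one worker's share of the data."""
    return {key for row in rows for key in row}


def local_gradient(rows, labels, weights):
    """Gradient of 0.5 * (w.x - y)^2 summed over one worker's examples."""
    gradient = {}
    for row, label in zip(rows, labels):
        residual = sum(v * weights.get(k, 0.0) for k, v in row.items()) - label
        for k, v in row.items():
            gradient[k] = gradient.get(k, 0.0) + residual * v
    return gradient


def distributed_gradient_descent(worker_data, num_iters, learning_rate):
    model = {}  # stands in for the server machines
    for _ in range(num_iters):
        pushed = []
        for rows, labels in worker_data:
            # Pull only the working set (the updated model from the last round).
            weights = {k: model.get(k, 0.0) for k in working_set(rows)}
            # Compute the local gradient and push it.
            pushed.append(local_gradient(rows, labels, weights))
        # Servers update the model with every pushed gradient.
        for gradient in pushed:
            for k, g in gradient.items():
                model[k] = model.get(k, 0.0) - learning_rate * g
    return model


worker_data = [
    ([{0: 1.0}], [1.0]),   # worker 0: one example touching key 0
    ([{1: 1.0}], [2.0]),   # worker 1: one example touching key 1
]
model = distributed_gradient_descent(worker_data, num_iters=20, learning_rate=0.5)
```

Each worker touches only the keys in its own data, which is why pulling the working set is enough.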


Industry size machine learning problems

Efficient communication


Challenges for data synchronization

✦ Massive communication traffic
  ★ frequent access to the shared model
✦ Expensive global barriers
  ★ between iterations


Task

✦ a push / pull / user defined function (an iteration)
✦ “execute-after-finished” dependency
✦ executed asynchronously

[Figure: timelines of the CPU-intensive gradient computation and the network-intensive push & pull for iter 0 and iter 1, under synchronous and asynchronous execution.]
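A rough sketch of asynchronous tasks with an "execute-after-finished" dependency, using Python's standard `concurrent.futures`. The helper `submit_after` is hypothetical and is not the system's interface; dependencies must be submitted before the tasks that wait on them.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_after(executor, fn, deps=()):
    """Submit fn so that it runs only after every future in deps has finished."""
    def run():
        for dep in deps:
            dep.result()  # block until the dependency is done
        return fn()
    return executor.submit(run)


log = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # Iteration 1's computation depends only on iteration 0's computation,
    # so it may overlap with iteration 0's push & pull.
    compute0 = submit_after(pool, lambda: log.append("compute 0"))
    push_pull0 = submit_after(pool, lambda: log.append("push_pull 0"), deps=[compute0])
    compute1 = submit_after(pool, lambda: log.append("compute 1"), deps=[compute0])
    push_pull1 = submit_after(pool, lambda: log.append("push_pull 1"), deps=[compute1])
```

The relative order of "push_pull 0" and "compute 1" in `log` is not fixed, which is the point: the CPU-intensive and network-intensive steps are free to overlap.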


Flexible consistency

✦ Trade-off between algorithm efficiency and system performance

[Figure: task dependency graphs for three consistency models: sequential (tasks 1 to 4), eventual (tasks 1 to 4), and 1-bounded delay (tasks 1 to 5).]
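One way to read the three models is as a rule for which earlier tasks block a new one. The sketch below assumes that a delay bound τ means task t waits for every task up to t − τ − 1, so τ = 0 gives sequential and an infinite τ gives eventual consistency; the function name is hypothetical.

```python
import math


def blocking_tasks(task, max_delay):
    """Tasks (numbered from 1) that must finish before `task` may start."""
    if math.isinf(max_delay):
        return []  # eventual: no task ever waits
    last_blocking = task - max_delay - 1
    return list(range(1, last_blocking + 1))


sequential = blocking_tasks(4, 0)          # waits for tasks 1, 2, 3
bounded = blocking_tasks(4, 1)             # waits for tasks 1, 2
eventual = blocking_tasks(4, math.inf)     # waits for nothing
```

A larger bound lets more tasks run concurrently (better system performance) at the cost of working with staler parameters (lower algorithm efficiency).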


Results for bounded delay

[Figure: Ad click prediction. Time (hour), split into computing and waiting, for bounded delay values 0, 1, 2, 4, 8, 16. Annotations: "sequential" and "best trade-off".]


User-defined filters

✦ Selectively communicate (key, value) pairs
✦ E.g., the KKT filter
  ★ send pairs if they are likely to affect the model
  ★ >95% keys are filtered in the ad click prediction task
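The slides do not define the KKT filter, so the following is only a simplified sketch of the idea for an L1-regularized model: a weight that is currently zero stays zero when its gradient magnitude does not exceed the penalty, so that pair is unlikely to affect the model and need not be sent. The function name and the exact rule are assumptions.

```python
def kkt_filter(gradients, weights, l1_penalty):
    """Keep only the (key, gradient) pairs that are likely to change the model."""
    kept = {}
    for key, grad in gradients.items():
        weight_is_zero = weights.get(key, 0.0) == 0.0
        if weight_is_zero and abs(grad) <= l1_penalty:
            continue  # the optimality condition already holds at zero; skip
        kept[key] = grad
    return kept


to_send = kkt_filter(
    gradients={1: 0.05, 2: 0.5, 3: -0.01},
    weights={3: 0.2},          # keys 1 and 2 currently have zero weight
    l1_penalty=0.1,
)
```

Key 1 is dropped, key 2 is sent because its gradient is large, and key 3 is sent because its weight is already nonzero.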


Industry size machine learning problems

Efficient communication

Fault tolerance


Machine learning job logs in a three-month period:

[Figure: failure rate (%) versus #machine × time (hour), shown at 100, 1000, and 10000.]


Fault tolerance

✦ Model is partitioned by consistent hashing
✦ Default replication: Chain replication (consistent, safe)
  worker 0 → server 0: push x; server 0 → server 1: push f(x); server 1 → server 0: ack; server 0 → worker 0: ack
✦ Option: Aggregation reduces backup traffic (algo specific)
  worker 0 → server 0: push x; worker 1 → server 0: push y; server 0 → server 1: push f(x+y); server 1 → server 0: ack; server 0 → workers: ack, ack

implemented by efficient vector clock
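A small sketch of partitioning keys by consistent hashing, with each key's backup placed on the next server along the ring, loosely mirroring the server 0 → server 1 chain in the diagram. The class name, the use of MD5, and one ring position per server are illustrative choices, not details from the talk.

```python
import bisect
import hashlib


def ring_position(name):
    """Place a string on the hash ring."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, backups=1):
        self.ring = sorted((ring_position(s), s) for s in servers)
        self.positions = [position for position, _ in self.ring]
        self.backups = backups

    def owners(self, key):
        """Return [primary, backup, ...]: the servers clockwise from the key."""
        start = bisect.bisect_right(self.positions, ring_position(str(key)))
        count = min(self.backups + 1, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


servers = ["server0", "server1", "server2"]
ring = HashRing(servers)
chain = ring.owners("feature:osdi")

# Removing a server outside the chain leaves this key's owners unchanged.
uninvolved = [s for s in servers if s not in chain]
smaller_ring = HashRing([s for s in servers if s != uninvolved[0]])
```

Only keys whose chain includes a failed or removed server need to move, which is the property that makes this partitioning attractive for fault tolerance.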


Industry size machine learning problems

Efficient communication

Fault tolerance Easy to use


(Key, value) vectors for the shared parameters

✦ Good for programmers: Matches mental model
✦ Good for system: Expose optimizations based upon structure of data

math: sparse vector with entries at i1, i2, i3
(key, value) store: (i1, ) (i2, ) (i3, )

Example: computing gradient
gradient = data^T × ( − label × 1 / ( 1 + exp(label × data × model) ) )
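The gradient expression can be spelled out with sparse rows stored as (key, value) dictionaries. This is a plain-Python sketch of the formula, not the system's vector interface; labels are assumed to be +1 or −1 and the function name is hypothetical.

```python
import math


def logistic_gradient(data, labels, model):
    """gradient = data^T × (−label × 1 / (1 + exp(label × data × model))).

    data: list of sparse rows {key: value}; labels: +1 or -1 per row;
    model: {key: weight}. Returns the gradient as a sparse {key: value} dict.
    """
    gradient = {}
    for row, label in zip(data, labels):
        # label × (data × model) for this example.
        margin = label * sum(v * model.get(k, 0.0) for k, v in row.items())
        # math.exp overflows for very large margins; acceptable for a sketch.
        scale = -label / (1.0 + math.exp(margin))
        # Accumulate this example's column of data^T × (...).
        for k, v in row.items():
            gradient[k] = gradient.get(k, 0.0) + scale * v
    return gradient


grad = logistic_gradient(data=[{0: 1.0}, {0: 1.0, 2: 2.0}], labels=[1, -1], model={})
```

With an all-zero model every margin is 0, so each example contributes −label / 2 times its feature values.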


Industry size machine learning problems

Efficient communication

Fault tolerance Easy to use Evaluation


Sparse Logistic Regression

✦ Predict whether ads will be clicked or not
✦ Baseline: two systems in production

                    Method      Consistency            LOC
System-A            L-BFGS      Sequential             10K
System-B            Block PG    Sequential             30K
Parameter Server    Block PG    Bounded Delay + KKT    300

✦ 636T real ads data
  ★ 170 billion examples, 65 billion features
✦ 1,000 machines with 16,000 cores


Progress

[Figure: error (large to small) versus time (hour; axis ticks 0.1, 1, 10) for System A, System B, and Parameter Server.]


Time decomposition

[Figure: time (hour, axis 0 to 5), split into computing and waiting, for System-A, System-B, and Parameter Server.]


Topic Modeling (“LDA”)

✦ Gradient descent with eventual consistency
✦ 5B users’ click logs; group users into 1,000 groups based on URLs they clicked

[Figure: error (large to small) versus time (hour, 0 to 70) for 1,000 machines and 6,000 machines; annotation: 4x speed up.]
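As a quick piece of arithmetic on the figure's annotation: going from 1,000 to 6,000 machines is a 6x increase in resources for a 4x speed up. The helper below is hypothetical and only computes that ratio.

```python
def scaling_efficiency(speedup, machines_before, machines_after):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    ideal_speedup = machines_after / machines_before
    return speedup / ideal_speedup


efficiency = scaling_efficiency(speedup=4.0, machines_before=1000, machines_after=6000)
```

That works out to about two thirds of linear scaling.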


Largest experiments of related systems

[Figure: model size (10^4 to 10^11) versus #cores (10^1 to 10^5) for the largest experiments of REEF (LR), VW (LR), Naiad (LR), Petuum (Lasso), MLbase (LR), Graphlab (LDA), YahooLDA (LDA), Distbelief (DNN), Adam (DNN), and Parameter Server (Sparse LR, LDA).]

Data were collected in April ’14


Industry size machine learning problems

Efficient communication

Fault tolerance Easy to use

Code available at parameterserver.org

Evaluation


Q&A