Fast, General Parallel Computation for Machine Learning
Robin Elizabeth Yancey and Norm Matloff
University of California at Davis
P2PS Workshop, ICPP 2018
Outline
• Motivation.
• Software Alchemy.
• Theoretical foundations.
• Empirical investigation.
Motivation
Characteristics of machine learning (ML) algorithms:
• Big Data: in an n × p (cases × features) dataset, both n AND p are large.
• Compute-intensive algorithms: sorting, k-NN, matrix inversion, iteration.
• Not generally embarrassingly parallel (EP). (An exception: Random Forests – grow different trees within different processes.)
• Memory problems: The computation may not fit on a single machine (esp. in R or on GPUs).
Parallel ML: Desired Properties
• Simple, easily implementable. (And easily understood by non-techies.)
• As general in applicability as possible.
Software Alchemy
alchemy:
The medieval forerunner of chemistry... concerned particularly with attempts to convert base metals into gold... a seemingly magical process of transformation...
Software Alchemy (cont’d.)
• “Alchemical”: Converts non-EP problems to statistically equivalent EP problems.
• Developed independently by (Matloff, JSS, 2013) and several others. EP: No programming challenge. :-)
• Not just Embarrassingly Parallel but also Embarrassingly Simple. :-)
Software Alchemy (cont’d)
• Break the data into chunks, one chunk per process.
• Apply the procedure, e.g. neural networks (NNs), to each chunk, using off-the-shelf SERIAL algorithms.
• In the regression case (continuous response variable), take the final estimate as the average of the chunked estimates.
• In the classification case (categorical response variable), do “voting.”
• If we have some kind of parametric model (incl. NNs), we can average the parameter values across chunks.
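A minimal sketch of this recipe in Python with NumPy. The linear-model procedure, chunk count, data sizes, and function names here are illustrative assumptions, not from the talk; the talk's own implementation is the R partools package.

```python
import numpy as np

def chunked_lm(X, y, q):
    """Software Alchemy, regression/parametric case: fit each chunk with a
    serial off-the-shelf method (here, ordinary least squares), then
    average the chunked coefficient estimates."""
    chunks = np.array_split(np.arange(len(y)), q)
    coefs = [np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for idx in chunks]
    return np.mean(coefs, axis=0)

def vote(chunk_preds):
    """Classification case: majority vote across the q chunk predictions.
    chunk_preds has shape (q, n_new), with integer class labels."""
    preds = np.asarray(chunk_preds)
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Tiny demo on synthetic data (illustrative sizes).
rng = np.random.default_rng(0)
n, p, q = 10_000, 3, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)
print(np.round(chunked_lm(X, y, q), 1))  # close to [1.0, -2.0, 0.5]
```

In a real run, each element of the list comprehension in `chunked_lm` would execute in its own process (or on its own machine), with only the small coefficient vectors sent back for averaging.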
Theory
• Theorem: Say the rows of the data matrix are i.i.d. and the output of the procedure is asymptotically normal. Then the Software Alchemy estimator is fully statistically efficient, i.e. it has the same asymptotic variance.
• The conditions of the theorem could be relaxed.
• Can do some informal analysis of speedup (next slide).
Theory (cont’d.)
Say the original algorithm has time complexity O(n^c).
• Then the Software Alchemy time for q processes is O((n/q)^c) = O(n^c/q^c), a speedup of q^c.
• If c > 1, we get a superlinear speedup!
• In fact, even if the chunked computation is done serially, the time is O(q(n/q)^c) = O(n^c/q^(c-1)), a speedup of q^(c-1), a win if c > 1.
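Plugging illustrative numbers into the cost model confirms the arithmetic (c = 2 and q = 8 are my own choices, not from the talk):

```python
def work(n, c):
    # Cost model: the serial algorithm does about n**c units of work.
    return n ** c

n, c, q = 1_000_000, 2, 8          # illustrative: quadratic algorithm, 8 processes

full = work(n, c)                  # original serial cost: n^c
one_chunk = work(n // q, c)        # each process's cost: (n/q)^c
all_chunks_serial = q * one_chunk  # chunks run back to back: q(n/q)^c

print(full / one_chunk)            # speedup q^c -> 64.0
print(full / all_chunks_serial)    # speedup q^(c-1) -> 8.0
```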
Theory (cont’d.)
Although...
• SA time is technically max_i chunktime_i. If there is large variance across chunk times, this may result in a speedup of < q^c.
• If the number of features p is a substantial fraction of n, the asymptotic convergence may not have quite kicked in yet.
• If the full algorithm time is not just O(f(n)) but O(g(n, p)), e.g. we need a p × p matrix inversion, then the speedup is limited.
• The above analysis ignores overhead time for distributing the data. However, we advocate permanently distributed data anyway (Hadoop, Spark, our partools package).
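The first caveat — wall time is the slowest chunk, not the average — can be seen with the same cost model; the chunk sizes below are made up for illustration.

```python
n, c, q = 1_000_000, 2, 8          # illustrative: quadratic algorithm, 8 processes
full = n ** c                      # original serial cost

balanced = [(n // q) ** c] * q     # equal chunks: the ideal case
sizes = [50_000] * 7 + [650_000]   # one oversized chunk (illustrative skew)
unbalanced = [s ** c for s in sizes]

print(full / max(balanced))        # ideal speedup q^c = 64.0
print(full / max(unbalanced))      # well under 64: the big chunk dominates
```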
![Page 40: Fast, General Parallel Computation for Machine Learning€¦ · Fast, General Parallel Computation for Machine Learning Robin Elizabeth Yancey and Norm Matlo University of California](https://reader034.fdocuments.us/reader034/viewer/2022050219/5f648035d6725873727928f9/html5/thumbnails/40.jpg)
Fast, GeneralParallel
Computationfor MachineLearning
RobinElizabethYancey and
Norm MatloffUniversity ofCalifornia at
Davis
Theory (cont.d)
Although...
• SA time is technically maxi chunktimei . If large variance,this would may result in speedup of < qc .
• If number of features p is a substantial fraction of n, theasymptotic convergence may not have quite kicked in yet.
• If full algorithm time is not just O(f (n))) but O(g(n, p),e.g. need p × p matrix inversion, then speedup is limited.
• Above analysis ignores overhead time for distributing thedata. However, we advocate permanently distributed dataanyway (Hadoop, Spark, our partools package).
Other Issues
• How many chunks? Having too many means the chunks are too small for the asymptotics.
• Impact of tuning parameters.
• E.g., in neural nets the user must choose the number of hidden layers, the number of units per layer, etc. (Feng, 2016) has so many tuning parameters that the paper has a separate table to summarize them.
• Performance may depend crucially on the settings of those parameters.
• What if the best tuning parameter settings for the chunks are not the same as the best for the full data?
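The chunk-count trade-off can be made concrete with a toy timing model: for an algorithm whose training time is O(n^c), the ideal SA speedup with q chunks is q^c, but since SA time is the maximum of the chunk times, one straggler chunk cuts the speedup down. All numbers below are illustrative, not measurements:

```python
import numpy as np

def sa_speedup(n, q, c, slowdowns):
    """Toy SA speedup model for an O(n**c) training-time algorithm.
    Each chunk ideally takes (n/q)**c versus n**c for the full data,
    so the ideal speedup is q**c; SA time is max_i chunktime_i, so
    per-chunk slowdown factors reduce the realized speedup."""
    full_time = n ** c
    chunk_times = (n / q) ** c * np.asarray(slowdowns)
    return full_time / np.max(chunk_times)

n, c, q = 1_000_000, 2, 4
ideal = sa_speedup(n, q, c, np.ones(q))                      # q**c = 16
straggler = sa_speedup(n, q, c, [1.0, 1.0, 1.0, 2.0])        # halved to 8
```

A chunk that runs twice as slow as the others halves the realized speedup, which is the variance effect noted in the theory slides.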
Empirical Investigation
• Recommender systems
• Famous example: Predict the rating user i would give to movie j, based on what i has said about other movies and what ratings j got from other users.
  • Maximum Likelihood
  • Matrix factorization
  • k-NN model
• General ML applications
  • Logistic regression
  • Neural networks
  • Random forests
  • k-NN
Recommender Systems Datasets
• MovieLens: User ratings of movies. We used the 1 million- and 20 million-record versions.
• Book Crossings: Book reviews, about 1 million records.
• Jester: Joke reviews, about 6 million records.
• No optimization of tuning parameters; focus is just on run time.
• No data cleaning.
• Timings on a quad-core machine with hyperthreading.
Prediction Methods
• MLE: Rating of item j by user i is

  Y_ij = μ + γ′X_i + α_i + β_j + ε_ij

  where X_i is a vector of covariates for user i (e.g. age), and μ + α_i and μ + β_j are the user and item means.
• Nonnegative matrix factorization: Find low-rank matrices W and H such that the matrix A of all Y_ij, observed or not, is approximately WH. Fill in missing values from the latter.
• k-Nearest Neighbor: The k users with ratings patterns closest to that of user i and who have rated item j are collected, and the average of their item-j ratings is computed.

Report: Scatter, train and test times, MAPE or proportion correctly classified.
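The k-NN scheme in the last bullet can be sketched directly. This is a toy illustration, not the experiment code; it assumes a dense user-by-item ratings matrix with NaN for missing entries and Euclidean distance over commonly rated items:

```python
import numpy as np

def knn_predict(ratings, user, item, k):
    """Predict ratings[user, item]: among users who rated `item`,
    find the k whose rating patterns are closest to `user`'s
    (Euclidean distance on commonly rated items), and average
    their ratings of `item`."""
    candidates = [u for u in range(ratings.shape[0])
                  if u != user and not np.isnan(ratings[u, item])]

    def dist(u):
        both = ~np.isnan(ratings[user]) & ~np.isnan(ratings[u])
        if not both.any():
            return np.inf
        return np.linalg.norm(ratings[user, both] - ratings[u, both])

    nearest = sorted(candidates, key=dist)[:k]
    return np.mean([ratings[u, item] for u in nearest])

# Rows are users, columns are items; NaN marks an unrated item.
R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 2.0],
              [4.0, 5.0, 3.0],
              [1.0, 1.0, 5.0]])
pred = knn_predict(R, user=0, item=2, k=2)  # averages users 1 and 2
```

Here users 1 and 2 have the rating patterns closest to user 0, so the prediction is the mean of their item-2 ratings.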
NMF, MovieLens 20M
chunks   scatter   train.    pred.   mean abs. error
full     -         34.046    0.346   0.649
2        13.49     18.679    0.647   0.647
4        21.86     10.444    1.113   0.656

Table: NMF Model, MovieLens 20M Data
Approaching linear speedup.
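As a quick check of the "approaching linear" claim, the training-time speedups implied by the table work out as follows:

```python
# Training-time speedup relative to the full-data fit
# (times taken from the NMF / MovieLens 20M table above).
full = 34.046
speedup_2 = full / 18.679   # 2 chunks
speedup_4 = full / 10.444   # 4 chunks
print(round(speedup_2, 2), round(speedup_4, 2))  # 1.82 3.26
```

So 2 chunks give about 1.82x and 4 chunks about 3.26x on training time, i.e. close to linear, with the scatter cost avoided entirely when the data are kept permanently distributed.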
k-NN, Jester Data
# of chunks   time (sec)   mean abs. error
full          259.601      4.79
2             76.440       4.60
4             58.133       4.36
8             81.185       3.89

Table: k-NN Model, Jester Data

Superlinear speedup for 2 and 4 chunks. Note the improved accuracy, probably due to a nonoptimal k in the full set.
MLE, Book Crossings
chunks   scatter   train.     pred.   mean abs. error
full     -         1114.155   0.455   2.67
2        5.101     685.757    0.455   2.72
4        11.134    423.018    1.173   2.77
8        10.918    246.668    1.470   2.82

Table: MLE Model, Book Crossings Data

Sublinear speedup due to matrix inversion, but still faster at 8 chunks.
MLE, MovieLens Data
chunks   scatter   train.    pred.   mean abs. error
full     -         99.028    0.267   0.710
2        4.503     100.356   0.317   0.737
4        2.596     73.055    0.469   0.752
8        8.408     100.356   0.483   0.764

Table: MLE Model, MovieLens 1M Data

Speedup limited due to matrix inversion.
General ML Applications
Methods: Logistic regression; neural nets; k-NN; random forests.

Datasets:

• NYC taxi data: Trip times, fares, location, etc.
• Forest cover data: Predict type of ground cover from satellite data.
• Last.fm: Popularity of songs.
Logit, NYC Taxi Data
# of chunks   time     prop. correct class.
full          40.641   0.694
2             38.753   0.694
4             23.501   0.694
8             14.320   0.694

Table: Logistic Model, NYC Taxi Data

We have matrix inversion here too, but still get a speedup at 8 threads (and up to 32 on another machine, with 16 cores).
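Chunked logistic regression under SA amounts to fitting the model on each chunk and averaging the coefficient vectors. A toy numpy sketch (the actual experiments used our partools package in R; the gradient-descent fitter here is just a stand-in):

```python
import numpy as np

def logit_fit(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression; returns coefficients."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 3))
# Labels from a true logistic model with coefficients (1, -2, 0.5).
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.logistic(size=4000) > 0).astype(float)

w_full = logit_fit(X, y)
# Software Alchemy: fit on each chunk, then average the coefficients.
w_sa = np.mean([logit_fit(Xc, yc)
                for Xc, yc in zip(np.array_split(X, 4),
                                  np.array_split(y, 4))], axis=0)
```

The averaged chunk estimate tracks the full-data estimate closely, while each chunk fit works on a quarter of the rows.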
NNs, Last.fm Data
# of chunks   time      mean abs. error
full          486.259   221.41
2             325.567   211.94
4             254.306   210.15
8             133.495   221.41

Table: Neural nets, Last.fm data, 5 hidden layers

Sublinear speedup, but still improving at 8 chunks. Better prediction with 2 and 4 chunks; tuning was thus suboptimal in the full case.
k-NN, NYC Taxi Data
# of chunks   time     mean abs. error
full          87.463   456.00
2             48.110   451.08
4             25.75    392.13
8             17.413   424.36

Table: k-NN, NYC Taxi Data

Superlinear speedup at 4 chunks, with better prediction error; was k too large in the full set?
RF, Forest Cover Data
# of chunks   time      prop. correct class.
full          841.884   0.955
2             485.171   0.941
4             236.518   0.919
6             194.803   0.911

Table: Random Forests, Forest Cover Data

As noted, random forests are embarrassingly parallel (EP) anyway, but the result is still interesting.
GPU Settings
Use of Software Alchemy with GPUs:

• In a multi-GPU setting, chunking is a natural solution, hence SA.
• If GPU memory is insufficient, use SA serially. We may still get a speedup (per the earlier slide).
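The serial use of SA under memory constraints might look like the following hypothetical sketch; `chunk_paths` and `load` are illustrative names, and in a real GPU setting each chunk would be copied to the device, trained on, and freed before the next:

```python
import numpy as np

def sa_serial(chunk_paths, load, estimator):
    """Process chunks one at a time so only a single chunk is ever
    resident in (GPU) memory; average the per-chunk estimates."""
    estimates = []
    for path in chunk_paths:
        chunk = load(path)          # move one chunk into memory
        estimates.append(estimator(chunk))
        del chunk                   # free it before loading the next
    return np.mean(estimates, axis=0)

# Toy stand-in: "paths" are just pre-split arrays, "load" is identity.
data = np.arange(12.0)
chunks = np.array_split(data, 3)
est = sa_serial(chunks, lambda c: c, np.mean)  # mean of chunk means
```

Peak memory is one chunk rather than the full dataset, and per the earlier analysis the per-chunk fits can still sum to less time than one full-data fit.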
Conclusions, Comments
• Software Alchemy is extremely simple and statistically valid: it gives the same statistical accuracy.
• We generally got linear or even superlinear speedup on most recommender-system and other ML algorithms.
• We used our partools package, which is based on a "Leave It There" philosophy: keep an object distributed as long as possible, including as a distributed file. Thus no scatter time is needed.