Machine Learning Pipelines

MACHINE LEARNING PIPELINES

Evan R. Sparks, Graduate Student, AMPLab

With: Shivaram Venkataraman, Tomer Kaftan, Gylfi Gudmundsson, Michael Franklin, Benjamin Recht, and others!

WHAT IS MACHINE LEARNING?

“Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.”

– Wikipedia

Model

Data

ML PROBLEMS

• Real data often not ∈ R^d

• Real data not well-behaved according to my algorithm.

• Features need to be engineered.

• Transformations need to be applied.

• Hyperparameters need to be tuned.

[Figure: SVM input compared with real data]

SYSTEMS PROBLEMS

• Datasets are huge.

• Distributed computing is hard.

• Mapping common ML techniques to distributed setting may be untenable.

WHAT IS MLBASE?

• Distributed Machine Learning - Made Easy!

• Spark-based platform to simplify the development and usage of large-scale machine learning.

A STANDARD MACHINE LEARNING PIPELINE

Right?

Data → Train Classifier → Model

A STANDARD MACHINE LEARNING PIPELINE

That’s more like it!

Data → Feature Extraction → Train Linear Classifier → Model

Test Data → Model → Predictions

A REAL PIPELINE FOR IMAGE CLASSIFICATION

Inspired by Coates & Ng, 2012

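[Figure: pipeline diagram with the following nodes]

• Feature Extractor: ImageParser → Normalizer → Convolver → SymmetricRectifier → Pooler

• Convolver filters: PatchExtractor → Patch Whitener → Patch Selector

• Training: Data, LabelExtractor, Linear Solver, Model

• Testing: Test Data, LabelExtractor, Feature Extractor, ErrorComputer, Test Error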

A SIMPLE EXAMPLE

• Load up some images.

• Featurize.

• Apply a transformation.

• Fit a linear model.

• Evaluate on test data.

Replicates the Fast Food Features pipeline (Le et al., 2012).

PIPELINES API

• A pipeline is made of nodes which have an expected input and output type.

• Nodes fit together in a sensible way.

• Pipelines are just nodes.

• Nodes should be things that we know how to scale.
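
The points above can be sketched in a few lines. This is a minimal illustration in plain Python, not the MLbase API: the `Node` class, its `then` method, and the toy `tokenize` and `count` stages are all hypothetical names made up for this example. It shows typed nodes that chain together, and that the chained result is itself a node.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


class Node(Generic[A, B]):
    """A pipeline stage that maps an input of type A to an output of type B."""

    def __init__(self, fn: Callable[[A], B], name: str = "node") -> None:
        self.fn = fn
        self.name = name

    def __call__(self, value: A) -> B:
        return self.fn(value)

    def then(self, nxt: "Node[B, C]") -> "Node[A, C]":
        # Chaining two nodes yields another Node, so a pipeline is itself a node.
        return Node(lambda value: nxt(self(value)), f"{self.name} -> {nxt.name}")


# Hypothetical toy stages, named here only for illustration.
tokenize: Node[str, list] = Node(lambda text: text.lower().split(), "tokenize")
count: Node[list, int] = Node(len, "count")

pipeline = tokenize.then(count)
print(pipeline.name, pipeline("Go Bears go"))
```

Because `then` returns a `Node`, a finished pipeline can be reused as one stage of a larger pipeline. A type checker can flag stages whose output and input types do not match.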

WHAT’S IN THE TOOLBOX?

Nodes

• Images - Patches, Gabor Filters, HoG, Contrast Normalization

• Text - n-grams, lemmatization, TF-IDF, POS, NER

• General Purpose - ZCA Whitening, FFT, Scaling, Random Signs, Linear Rectifier, Windowing, Pooling, Sampling, QR Decomposition

• Statistics - Borda Voting, Linear Mapping, Matrix Multiply

• ML - Linear Solvers, TSQR, Cholesky Solver, MLlib

• Speech and more - coming soon!

Pipelines

• Example pipelines across domains - CIFAR, MNIST, ImageNet, ACL Argument Extraction, TIMIT.

Spark

MLlib GraphX ml-matrix Featurizers StatsUtils

Pipelines

Hyper Parameter Tuning Libraries

Stay Tuned!

MLI

A REAL PIPELINE FOR IMAGE CLASSIFICATION

Inspired by Coates & Ng, 2012

[Figure: the same image classification pipeline diagram as on the earlier slide]

YOU’RE GOING TO BUILD THIS!!

BEAR WITH ME

Photo: Andy Rouse, (c) Smithsonian Institute

COMPUTER VISION CRASH COURSE

SVM Model

FEATURE EXTRACTION

[Figure: training half of the pipeline diagram]

• Feature Extractor: ImageParser → Normalizer → Convolver → SymmetricRectifier → Pooler

• Convolver filters: PatchExtractor → Patch Whitener → Patch Selector

• Training: Data, LabelExtractor, Linear Solver, Model

NORMALIZATION

• Moves pixel values from [0, 255] to [-1.0, 1.0].

• Why? Math!

• -1 * -1 = 1, 1 * 1 = 1

• If I overlay two pixels and both are strongly bright or both strongly dark, their product will be close to 1; if either is mid-gray it will be close to 0, and if they are opposites it will be close to -1.

• Necessary for whitening.

[Figure: pixel value scale, 0 maps to -1 and 255 maps to +1]
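
A minimal sketch of the normalization step, assuming a plain linear map (x / 127.5 - 1), which the slide implies but does not spell out. The function names are made up for illustration, and images are stored as lists of rows.

```python
def normalize_pixel(value: float) -> float:
    """Linearly map a pixel value from [0, 255] to [-1.0, 1.0]."""
    return value / 127.5 - 1.0


def normalize_image(image):
    """Apply the mapping to every pixel of an image stored as a list of rows."""
    return [[normalize_pixel(p) for p in row] for row in image]


image = [[0, 255], [51, 204]]
normalized = normalize_image(image)
print(normalized)

# Pixels at the same extreme multiply to +1, opposite extremes to -1.
print(normalized[0][0] * normalized[0][0])  # black * black
print(normalized[0][0] * normalized[0][1])  # black * white
```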

PATCH EXTRACTION

• Image patches become our “visual vocabulary”

• Intuition from text classification.

• If I’m trying to classify a document as “sports” - I’d look for words like “football”, “batter”, etc.

• For images - classifying pictures as “face” - I’m looking for things that look like eyes, ears, noses, etc.

Visual Vocabulary
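
A minimal sketch of pulling patches out of an image, assuming square patches taken at every position of a 2-D list of rows. The `extract_patches` name and its `size` and `stride` parameters are illustrative only. The whitening and selection steps from the pipeline diagram are not shown.

```python
def extract_patches(image, size, stride=1):
    """Return every size x size patch of a 2-D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            # Slice `size` rows, then `size` columns from each of them.
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append(patch)
    return patches


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
patches = extract_patches(image, size=2)
print(len(patches))   # 4 patches from a 3 x 3 image
print(patches[0])     # [[1, 2], [4, 5]]
```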

CONVOLUTION

• A convolution filter applies a weighted average to sliding patches of data.

• Can be used for lots of things - finding edges, blurring, etc.

• Normalized input: image, ear filter.

• Output: new image - close to 1 for areas that look like the ear filter.

• Apply many of these simultaneously.
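
A minimal sketch of one filter sliding over one image, assuming no padding and no kernel flip (strictly a cross-correlation, a common convention in image feature pipelines). The `convolve2d` name and the toy data are illustrative. The filter is a 2 x 2 checker pattern scaled by 1/4 so that a perfect match scores exactly 1.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Normalized image with values in [-1, 1]; the top-left corner matches the filter.
image = [[1, -1, -1],
         [-1, 1, -1],
         [-1, -1, -1]]
checker = [[0.25, -0.25],
           [-0.25, 0.25]]
print(convolve2d(image, checker))  # [[1.0, -0.5], [-0.5, 0.5]]
```

Applying k filters means calling this once per filter, which gives k filtered images.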

LINEAR RECTIFICATION

• For each feature, x, given some a (= 0.25):

• x_new = max(x - a, 0)

• What does it do?

• Removes a bunch of noise: any response at or below a becomes 0.
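
The formula above as a short sketch. The `rectify` name is illustrative; the default a = 0.25 is the value from the slide.

```python
def rectify(x: float, a: float = 0.25) -> float:
    """Shift the feature down by a and clip negative results to 0."""
    return max(x - a, 0.0)


features = [-0.8, 0.1, 0.25, 0.9]
# Only the strong response (0.9) survives; the rest become 0.
print([rectify(x) for x in features])
```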

POOLING

• convolve(image, k filters) => k filtered images.

• Lots of info - super granular.

• Pooling lets us break the (filtered) images into regions and sum.

• Think of the “sum” as how much an image quadrant is activated.

• Image summarized into 4*k numbers.

[Figure: example 2 x 2 grid of pooled sums: 0.5, 8 / 0, 2]
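
A minimal sketch of sum pooling over quadrants, assuming each filtered image is a 2-D list of rows split at its midpoints. The function names and toy values are illustrative only.

```python
def sum_pool_quadrants(image):
    """Sum a 2-D filtered image over its four quadrants (a 2 x 2 grid)."""
    rows, cols = len(image), len(image[0])
    mid_r, mid_c = rows // 2, cols // 2
    row_bands = [(0, mid_r), (mid_r, rows)]
    col_bands = [(0, mid_c), (mid_c, cols)]
    return [
        sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
        for (r0, r1) in row_bands
        for (c0, c1) in col_bands
    ]


def pool_all(filtered_images):
    """Concatenate the quadrant sums of k filtered images into 4 * k numbers."""
    features = []
    for image in filtered_images:
        features.extend(sum_pool_quadrants(image))
    return features


filtered = [
    [[1, 1, 0, 2], [1, 1, 2, 0], [0, 0, 3, 0], [0, 0, 0, 0]],
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
]
print(pool_all(filtered))  # k = 2 filtered images -> 8 numbers
```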

LINEAR CLASSIFICATION

WHY LINEAR CLASSIFIERS?

They’re simple. They’re fast. They’re well studied. They scale.

With the right features, they do a good job!

Data: A

Labels: b

Model: x

Hypothesis: Ax = b + error

Find the x that minimizes the error ||Ax - b||.
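
A minimal sketch of finding that x for a small problem. It assumes a single label column and a full-column-rank A, and it solves the normal equations (A^T A) x = A^T b by Gaussian elimination. That solver is chosen here only for brevity; the slides do not say which one the pipeline uses. The function name and toy data are illustrative.

```python
def solve_least_squares(A, b):
    """Return x minimizing ||Ax - b|| by solving the normal equations."""
    n = len(A[0])
    # Build the augmented system [A^T A | A^T b].
    system = [
        [sum(row[i] * row[j] for row in A) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(A, b))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(col + 1, n):
            factor = system[r][col] / system[col][col]
            for c in range(col, n + 1):
                system[r][c] -= factor * system[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(system[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (system[i][n] - known) / system[i][i]
    return x


# Three examples with a constant column plus one feature; labels follow b = 1 + 2 * feature.
A = [[1, 0], [1, 1], [1, 2]]
b = [1, 3, 5]
x = solve_least_squares(A, b)
print(x)  # approximately [1.0, 2.0]
```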

BACK TO OUR PROBLEM

• What is A in our problem?

• #images x #features (4f: 4 pooled values for each of f filters)

• What about x?

• #features x #classes

• For f < 10000, pretty easy to solve!

• Bigger - we have to get creative.

[Figure: A (10m x 100k) times x (100k x 1k) = b (10m x 1k)]
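
A rough size check for the shapes above. It assumes dense storage at 8 bytes per value and reads "m" as million and "k" as thousand; neither assumption is stated on the slide. The helper name is illustrative.

```python
def dense_size_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Memory needed to store a dense rows x cols matrix."""
    return rows * cols * bytes_per_value


TB = 10**12
shapes = {
    "A": (10_000_000, 100_000),  # images x features
    "x": (100_000, 1_000),       # features x classes
    "b": (10_000_000, 1_000),    # images x classes
}
for name, (rows, cols) in shapes.items():
    print(name, dense_size_bytes(rows, cols) / TB, "TB")
```

Under these assumptions A alone takes 8 TB, which is why a problem this size has to be spread across machines.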

TODAY’S EXERCISE

• Build 3 image classification pipelines - simple, intermediate, advanced.

• Qualitatively (with your eyes) and quantitatively (with statistics) compare their effectiveness.

ML PIPELINES

• Reusable, general purpose components.

• Built with distributed data in mind from day 1.

• Used together, they give a complex system composed of well-understood parts.

GO BEARS