Logistics - GitHub Pages · Logistics ØProject proposals are due by midnight tonight (two pages)...

Page 1:

Logistics

Ø Project proposals are due by midnight tonight (two pages)

Ø Email to instructors

Ø We will send feedback by next week

Ø In-class proposal presentations on Monday 3/11 (two weeks)
    Ø 4 minutes each

Ø Presentations should briefly answer the following questions
    Ø What is the problem and why is it important?
    Ø What are the key limitations of previous work?
    Ø What is your proposed approach?

Page 2:

AutoML

“Democratizing ML”

Joseph E. [email protected]

Page 3:

Machine Learning Lifecycle

[Figure: lifecycle diagram with three stages. Model Development: Offline Training Data, Data Collection, Cleaning & Visualization, Feature Eng. & Model Design, Training & Validation. Training: Training Pipelines, Live Data, Training, Validation, Trained Models. Inference: Prediction Service, Logic, End User Application, Query, Prediction, Feedback.]

Page 4:

[Figure: the Model Development stage of the lifecycle diagram (Offline Training Data, Data Collection, Cleaning & Visualization, Feature Eng. & Model Design, Training & Validation), grouped into Data Preparation and Model Design & Training.]

Page 5:

Model Development: Data Preparation

Raw Data

Training Data (columns: X1, X2, X3, Y)

Transformed Training Data (columns: X1 / X2, Sin(X2), OH(X3), Y)

Requires substantial domain expertise:
Ø Where are the data?
Ø What do the columns mean?
Ø How should they be coded?
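The column transformation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it assumes that OH(X3) stands for a one-hot encoding of a categorical column, and the rows, the category values, and the helper names `one_hot` and `transform_row` are all hypothetical.

```python
import math

def one_hot(value, categories):
    """Encode `value` as a 0/1 indicator list over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def transform_row(x1, x2, x3, y, categories):
    """Map a raw (X1, X2, X3, Y) row to (X1 / X2, sin(X2), OH(X3)..., Y).

    Assumes x2 is non-zero; real data would need a rule for that case.
    """
    return [x1 / x2, math.sin(x2)] + one_hot(x3, categories) + [y]

# Hypothetical training data: X1 and X2 numeric, X3 categorical, Y the label.
raw_rows = [
    (4.0, 2.0, "red", 1),
    (9.0, 3.0, "blue", 0),
    (1.0, 0.5, "red", 1),
]

# The category list is derived from the data and sorted for a stable column order.
categories = sorted({row[2] for row in raw_rows})
transformed = [transform_row(*row, categories) for row in raw_rows]

for row in transformed:
    print(row)
```

Each choice in `transform_row` (taking a ratio, applying a sine, treating X3 as categorical) encodes knowledge about what the columns mean, which is the domain expertise the slide refers to.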

Page 6:

Andrej Karpathy

Page 7:

Model Development: Design and Training

Model Parameters are “fit” during training.

What Kind of Model

Model:

Classic Models
Ø Linear Model
Ø SVM
Ø Decision Tree
Ø Random Forest
Ø …

Neural Network
Ø Conv. Net
Ø Recurrent Net
Ø Auto Encoder
Ø Graph Net
Ø …

Hyper-Parameters

Ø Regularization
Ø Batch size
Ø Structural Parameters
    Ø Hidden Units
    Ø Activations
    Ø Layer design
Ø Training Alg.
Ø Learning Rate
Ø Parallelism

Tuned using Hyper-Parameter Search

Tuned Using Expert Knowledge

Model Algorithm
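To make the distinction between fitted parameters and searched hyper-parameters concrete, here is a minimal sketch of a grid search in plain Python. It is an illustration under assumptions, not the method from the lecture: the synthetic data, the one-dimensional linear model, the grid values, and the helper names `fit_linear` and `mse` are all hypothetical. The weights `w` and `b` are fit by gradient descent, while the learning rate and L2 regularization strength are chosen by validation error.

```python
import random

def fit_linear(train, learning_rate, l2, epochs=200):
    """Fit y ~ w*x + b by gradient descent; w and b are the model parameters."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(model, data):
    """Mean squared error of a fitted (w, b) pair on a dataset."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Hypothetical synthetic data: y = 3x + 1 plus a little Gaussian noise.
rng = random.Random(0)
points = [(x / 10, 3.0 * (x / 10) + 1.0 + rng.gauss(0, 0.1)) for x in range(40)]
rng.shuffle(points)
train, valid = points[:30], points[30:]

# Hyper-parameters are not fit by gradient descent; each combination is
# trained separately and scored on held-out validation data.
grid = [(lr, l2) for lr in (0.001, 0.01, 0.1) for l2 in (0.0, 0.1, 1.0)]
results = {}
for lr, l2 in grid:
    model = fit_linear(train, lr, l2)
    results[(lr, l2)] = mse(model, valid)

best = min(results, key=results.get)
print("best (learning_rate, l2):", best, "validation MSE:", results[best])
```

Every grid point requires a full training run, which is why the cost of the search grows quickly with the number of hyper-parameters.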

Page 8:

Model Development Technologies

[Figure: the Model Development stage of the lifecycle diagram (Offline Training Data, Data Collection, Cleaning & Visualization, Feature Eng. & Model Design, Training & Validation), annotated with two technology groups: Data prep. and feature engineering; Model design and Training.]

Page 9:

Systems Research Opportunities

Ø Accelerate data collection and preparation
    Ø Automatic data discovery
    Ø Distributed data processing, esp. for image and video data
    Ø Data cleaning and schema driven auto-featurization

Ø Accelerate model selection and hyper-parameter search
    Ø Parallel and distributed execution
    Ø Data and feature caching across training runs

Ø Provenance
    Ø Track previous model development to inform future decisions
    Ø Connect errors in production with decisions in model development
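Two of the opportunities above, parallel execution and feature caching across training runs, can be sketched with the Python standard library. This is a toy illustration under stated assumptions: the raw data, the feature names, the scoring formula standing in for training, and the helpers `featurize` and `run_trial` are all hypothetical. A thread pool keeps the example short; CPU-bound training would more realistically use a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
import math

# Hypothetical raw rows (a, b); b is never zero.
RAW = tuple((float(i), float(i % 7 + 1)) for i in range(1, 101))

@lru_cache(maxsize=None)
def featurize(kind):
    """Compute one feature column; the cache lets later trials reuse it."""
    if kind == "ratio":
        return tuple(a / b for a, b in RAW)
    return tuple(math.sin(b) for _, b in RAW)

def run_trial(config):
    """Stand-in for one training run: read cached features, return a score."""
    kind, scale = config
    feats = featurize(kind)
    score = sum(scale * f for f in feats) / len(feats)
    return config, score

kinds = ("ratio", "sine")
configs = [(kind, scale) for kind in kinds for scale in (0.1, 1.0, 10.0)]

# Warm the cache once so each feature column is computed exactly one time.
for kind in kinds:
    featurize(kind)

# Run the trials concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_trial, configs))

print(featurize.cache_info())
for config, score in results:
    print(config, score)
```

Six trials share two feature computations here; in a real search the saving scales with the cost of featurization and the number of configurations tried.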