Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton...

59
Software Engineering for ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn

Transcript of Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton...

Page 1: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Software Engineering for ML/AI

Claire Le Goues Michael Hilton

Christopher Meiklejohn

Page 2: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Administrivia

• Homework 2 (Code Artifacts) due today.

Page 3: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Learning goals

• Identify differences between traditional software

development and development of ML systems.

• Understand the stages that comprise the typical ML

development pipeline.

• Identify challenges that must be faced within each

stage of the typical ML development pipeline.

3

Page 4: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Participation Survey

• YES in Zoom:

“I’ve a taken machine learning course.”

• NO in Zoom:

“I have not taken a machine learning course.”

4

Page 5: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Software Engineering and ML

Page 6: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Traditional Software Development

“It is easy. You just chip

away the stone that

doesn’t look like David.” –

(probably not) Michelangelo

Page 7: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

ML Development

• Observation

• Hypothesis

• Predict

• Test

• Reject or Refine

Hypothesis

Page 8: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Microsoft’s view of Software Engineering for ML

Page 9: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Three Fundamental Differences:

• Data discovery and management

• Customization and Reuse

• No incremental development of model itself

Page 10: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

CASE STUDY

Case study developed by

Christian Kästner

https://ckaestne.github.io/seai/

Page 11: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

WHAT CHALLENGES ARE THERE IN BUILDING AND DEPLOYING ML?

Page 12: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

12

Page 13: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

13

Page 14: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

14

Page 15: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Qualities of Interest?

15

A

B

C

Page 16: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

16

Page 17: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Qualities of Interest?

17

Page 18: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

MACHINE LEARNING PIPELINE

18

Page 19: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Typical ML Pipeline

• Statico Get labeled data (data collection, cleaning and, labeling)o Identify and extract features (feature engineering)o Split data into training and evaluation set o Learn model from training data (model training)o Evaluate model on evaluation data (model evaluation)o Repeat, revising features

• with production datao Evaluate model on production data; monitor (model monitoring)o Select production data for retraining (model training + evaluation)o Update model regularly (model deployment)

19

Page 20: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Example Data

UserId PickupLocation TargetLocation OrderTime PickupTime

5 …. … 18:23 18:31

20

Page 21: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Example Data

21

Page 22: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Learning Data

22

Page 23: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Feature Engineering

• Identify parameters of interest that a model may learn on

• Convert data into a useful form

• Normalize data

• Include context

• Remove misleading things

• In OCR/translation:

24

Page 24: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

25

Features?

Page 25: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

26

Features?

Page 26: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Feature Extraction

• In surge prediction:

o Location and time of past surges

o Events

o Number of people traveling to an area

o Typical demand curves in an area

o Demand in other areas

27

Page 27: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Data Cleaning

• Removing outliers

• Normalizing data

• Missing values

• …

28

Page 28: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Learning

• Build a predictor that best describes an outcome for

the observed features

29

Page 29: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Evaluation

• Prediction accuracy on learned data vs

• Prediction accuracy on unseen datao Separate learning set, not used for training

• For binary predictors: false positives vs. false negatives, precision vs. recall

• For numeric predictors: average (relative) distance between real and predicted value

• For ranking predictors: topK etc

30

Page 30: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

31

Evaluation Data?

Page 31: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

32

Evaluation Data?

Page 32: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Learning and Evaluating in Production

• Beyond static data sets, build telemetry

• Design challenge: identify mistakes in practice

• Use sample of live data for evaluation

• Retrain models with sampled live data regularly

• Monitor performance and intervene

33

Page 33: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

ML COMPONENT TRADEOFFS

34

Page 34: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Qualities of ML Components

• Accuracy

• Capabilities (e.g. classification, recommendation, clustering…)

• Amount of training data needed

• Inference latency

• Learning latency; incremental learning?

• Model size

• Explainable? Robust?

• …

35

Page 35: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Understanding Capabilities and Tradeoffs

• Deep Neural Networks • Decision Trees

36

Page 36: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

SYSTEM ARCHITECTURE CONSIDERATIONS

37

Page 37: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Where should the model live?

38

Glasses

Phone

Cloud

OCR Component

Translation Component

Page 38: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Where should the model live?

39

Car

Phone

Cloud

Surge Prediction

Page 39: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Considerations

• How much data is needed as input for the model?

• How much output data is produced by the model?

• How fast/energy consuming is model execution?

• What latency is needed for the application?

• How big is the model? How often does it need to be updated?

• Cost of operating the model? (distribution + execution)

• Opportunities for telemetry?

• What happens if users are offline?

40

Page 40: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Typical Designs

• Static intelligence in the product

o difficult to update

o good execution latency

o cheap operation

o offline operation

o no telemetry to evaluate and improve

• Client-side intelligence

o updates costly/slow, out of sync problems

o complexity in clients

o offline operation, low execution latency

41

Page 41: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Typical Designs

• Server-centric intelligence

o latency in model execution (remote calls)

o easy to update and experiment

o operation cost

o no offline operation

• Back-end cached intelligence

o precomputed common results

o fast execution, partial offline

o saves bandwidth, complicated updates

• Hybrid models

42

Page 42: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Other Considerations

• Coupling of ML pipeline parts

• Coupling with other parts of the system

• Ability for different developers and analysists to collaborate

• Support online experiments

• Ability to monitor

43

Page 43: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Reactive System Design Goals

• Responsive

o consistent, high performance

• Resilient

o maintain responsive in the face of failure, recovery, rollback

• Elastic

o scale with varying loads

44

Page 44: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Common Design Strategies

• Message-driven, lazy computation, functional programming

o asynchronous, message passing style

• Replication, containment, supervision

o replicate and coordinate isolated components, e.g. with containers

• Data streams, “infinite data”, immutable facts

o streaming technologies, data lakes

• See “big data systems” and “cloud computing”

45

Page 45: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

UPDATING MODELS

47

Page 46: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Updating Models

• Models are rarely static outside the lab

• Data drift, feedback loops, new features, new

requirements

• When and how to update models?

• How to version? How to avoid mistakes?

48

Page 47: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

49

Server

Data Stream(e.g. Kafka)

logsotherfeatures(e.g weather)

Data Lake

archival

Analytics(OLAP)

Stream Processing

Models

learning in nightly batch processing

incremental learning

• Latency and automation vary widely

• Heavily distributedserving

Page 48: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

50

Page 49: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

51

Update Strategy?

Page 50: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

52

Update Strategy?

Page 51: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

PLANNING FOR MISTAKES

54

Page 52: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Mistakes will happen

• No specification

• ML components detect patterns from data (real and spurious)

• Predictions are often accurate, but mistakes always possible

• Mistakes are not predicable or explainable or similar to human

mistakes

• Plan for mistakes

• Telemetry to learn about mistakes?

55

Page 53: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

How Models can Break

• System outage

• Model outage

o model tested? deployment and updates reliable? file corrupt?

• Model errors

• Model degradation

o data drift, feedback loops

56

Page 54: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Hazard Analysis

• Worst thing that can happen?

• Backup strategy? Undoable? Nontechnical compensation?

57

Page 55: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Mitigating Mistakes

• Investigating in ML

o e.g., more training data, better data, better features, better engineers

• Less forceful experience

o e.g., prompt rather than automate decisions, turn off

• Adjust learning parameters

o e.g., more frequent updates, manual adjustments

• Guardrails

o e.g., heuristics and constraints on outputs

• Override errors

o e.g., hardcode specific results

58

Page 56: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

59

Mistakes?

Page 57: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

60

Mistakes?

Page 58: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Telemetry

• Purpose:

o monitor operation

o monitor success (accuracy)

o improve models over time (e.g., detect new features)

• Challenges:

o too much data – sample, summarization, adjustable

o hard to measure – intended outcome not observable? proxies?

o rare events – important but hard to capture

o cost – significant investment must show benefit

o privacy – abstracting data

61

Page 59: Software Engineering for ML/AI · 2020. 12. 10. · ML/AI Claire Le Goues Michael Hilton Christopher Meiklejohn. Administrivia •Homework 2 (Code Artifacts) due today. Learning goals

Summary

• Machine learning in production systems is challenging

• Many tradeoffs in selecting ML components and in

integrating them in larger system

• Plan for updates

• Manage mistakes, plan for telemetry

64