Senior Modeling Architect, Netflix Jeffrey Wong Measurement in XP Final (Public… ·...

Jeffrey Wong Senior Modeling Architect, Netflix Measurement in an Experimentation Platform


Jeffrey Wong, Senior Modeling Architect, Netflix

Measurement in an Experimentation Platform

Research Engineering

JW

My Career at Netflix: Phase 1

Research

JW

My Career at Netflix: Phase 2

Engineering

Research

JW

My Career at Netflix: Phase 3

Engineering

Main Themes

1. Experiments at Netflix
2. Advancements in Measuring Causal Effects
3. Algorithmic Decision Making
4. Problems being worked on
5. Open Problems

Experiments at Netflix

Personalizing Artwork

[Diagram: inputs X1, X2, X3 with output Y, alongside inputs X1, X2, X3 with output ΔY]

Machine Learning for Correlation & Causation

Causal Effect

Decision

Measurement

The Dichotomy in Experimentation

The Engineering of Experiments

Random Allocations

[Diagram: random allocation into groups A and B]

Data & Statistics

             A      B
Engagement   0.9    0.91
Revenue      $10    $11

Advancements in Measuring Causal Effects

Regression Adjustment

Create a model that predicts streaming engagement

Regression Adjustment

Ask the model 2 questions:
1. Predict streaming engagement when a user is in the treatment
2. Predict streaming engagement when a user is in the control

Regression Adjustment

1. Use features to create predictions of the KPI
2. Get better accuracy on the causal effect
3. Regression models estimate treatment effects and are highly extendable
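A minimal sketch of regression adjustment, assuming a simulated experiment (the data, the `prior` streaming covariate, and the helper names `fit_ols` and `predict` are all hypothetical, not Netflix code). It fits one linear model, then asks it the two questions: predicted engagement for every user under treatment, and under control.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(beta, row):
    """Linear prediction for one feature row."""
    return sum(b * v for b, v in zip(beta, row))

# Simulated experiment (hypothetical data): engagement depends on prior
# streaming plus a constant treatment effect.
rng = random.Random(0)
TRUE_EFFECT = 0.5
users = []
for _ in range(2000):
    prior = rng.gauss(10.0, 3.0)   # pre-experiment streaming, a feature
    w = rng.randint(0, 1)          # random allocation: 1 = treatment
    y = 2.0 + 0.8 * prior + TRUE_EFFECT * w + rng.gauss(0.0, 1.0)
    users.append((prior, w, y))

# Model columns: intercept, treatment indicator, prior streaming.
beta = fit_ols([[1.0, w, prior] for prior, w, _ in users],
               [y for _, _, y in users])

# Question 1: everyone in treatment. Question 2: everyone in control.
pred_treatment = [predict(beta, [1.0, 1.0, prior]) for prior, _, _ in users]
pred_control = [predict(beta, [1.0, 0.0, prior]) for prior, _, _ in users]
adjusted_effect = sum(t - c for t, c in zip(pred_treatment, pred_control)) / len(users)

# Unadjusted difference in means, for comparison.
treated = [y for _, w, y in users if w == 1]
control = [y for _, w, y in users if w == 0]
naive_effect = sum(treated) / len(treated) - sum(control) / len(control)

print(f"adjusted: {adjusted_effect:.3f}  naive: {naive_effect:.3f}")
```

In this linear model the averaged difference equals the treatment coefficient. The two-question form is shown because it carries over to models where the effect is not a single coefficient.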

Heterogeneous Effects

Extend regression to ...
Automatically detect which segments have significant lift.

Discover opportunities for personalization
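A rough sketch of segment-level lift detection, assuming simulated data with two hypothetical segments (`US`, `CA`) and a plain two-sample z-test. A per-segment difference in means gives the same estimates as a regression with a full set of segment-by-treatment interactions. The 1.96 threshold and the function name `segment_lift` are illustrative choices, and no multiple-comparison correction is applied.

```python
import random
import statistics

def segment_lift(outcomes_t, outcomes_c):
    """Return (lift, z) for a difference in means between two arms."""
    lift = statistics.mean(outcomes_t) - statistics.mean(outcomes_c)
    se = (statistics.variance(outcomes_t) / len(outcomes_t)
          + statistics.variance(outcomes_c) / len(outcomes_c)) ** 0.5
    return lift, lift / se

# Hypothetical ground truth: the treatment only helps in one segment.
rng = random.Random(1)
true_lift = {"US": 1.0, "CA": 0.0}

data = {seg: {"T": [], "C": []} for seg in true_lift}
for seg, lift in true_lift.items():
    for _ in range(1000):
        arm = rng.choice("TC")                 # random allocation
        baseline = rng.gauss(5.0, 1.0)
        data[seg][arm].append(baseline + (lift if arm == "T" else 0.0))

results = {seg: segment_lift(arms["T"], arms["C"]) for seg, arms in data.items()}
significant = [seg for seg, (_, z) in results.items() if abs(z) > 1.96]

for seg, (lift, z) in results.items():
    print(f"{seg}: lift={lift:.2f} z={z:.1f}")
```

With many segments, a stricter threshold or a correction for multiple testing would be needed before acting on the flagged segments.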

Temporal Effects

Extend regression to ...
1. See how lift changes over time
2. Report treatment effects even if allocation rates changed midway

Intervention Time

Experiment Interactions

Extend regression to …
See breakdown of effects even when many experiments are running simultaneously

                           Experiment 2
                           Treatment   Control   Average
Experiment 1   Treatment   -2          1         -0.5
               Control     1           0         0.5
               Average     -0.5        0.5
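The arithmetic behind the Experiment 1 / Experiment 2 grid can be checked with a short sketch. It assumes the four inner numbers are cell-level values for each combination of arms and that the remaining numbers are row and column averages; the variable names are hypothetical.

```python
# Cell values keyed by (Experiment 1 arm, Experiment 2 arm).
# Assumption: these are per-cell values read off the slide's grid.
cell = {
    ("T", "T"): -2.0, ("T", "C"): 1.0,
    ("C", "T"): 1.0,  ("C", "C"): 0.0,
}

def average(values):
    return sum(values) / len(values)

# Marginal averages: average over the other experiment's arms.
exp1_avg = {a: average([cell[(a, b)] for b in "TC"]) for a in "TC"}
exp2_avg = {b: average([cell[(a, b)] for a in "TC"]) for b in "TC"}

# Main effect of each experiment, ignoring the other one.
exp1_main_effect = exp1_avg["T"] - exp1_avg["C"]
exp2_main_effect = exp2_avg["T"] - exp2_avg["C"]

# Effect of Experiment 1 within each arm of Experiment 2.
exp1_effect_given_exp2 = {b: cell[("T", b)] - cell[("C", b)] for b in "TC"}

# Interaction: how much Experiment 1's effect changes when Experiment 2 is on.
interaction = exp1_effect_given_exp2["T"] - exp1_effect_given_exp2["C"]

print(exp1_avg, exp2_avg, exp1_main_effect, exp1_effect_given_exp2, interaction)
```

Under this reading, Experiment 1 looks mildly negative on average (-1), but that hides a positive effect (+1) when Experiment 2 is in control and a negative one (-3) when it is in treatment.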

Algorithmic Decision Making

● Algorithms to measure causal effects
● Personalize causal effects by segment
● Understand long term vs short term effects
● Engineering system that scales for enterprises

Causal Effect

Decision

Measurement

The Dichotomy in Experimentation

Excerpt: Time, Context and Causality in Recommender Systems

Mathematical Engineering Challenges

Screenshot of https://triptoes1.github.io/tools-for-causality/

Mathematical Engineering at Netflix

Engineering high performance scientific libraries for causal inference in production

Mathematical Engineering at Netflix

1. Programmatic way to describe the causal effects problem

2. Generic way to compute the causal effect

3. Scalable Computation

1. Grammar for Causal Inference

2. Computing Causal Effects

3. Scalable Computation

Grammar for Causal Inference

Data, Model, Causal Effects

Causal Annotation

Potential Outcome (Treatment)

Potential Outcome (Control)

Causal Annotation

Models need to know structure and properties of data in order to have causal identification

1. Was there randomization? How was it controlled?

Causal Datasets

             Avg Streaming
Treatment    16
Control      6.67
Difference   9.33

Correct average treatment effect is 7.5

50%

75%
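A small worked example of why the raw difference can miss the correct average treatment effect when allocation rates differ. The eight-user dataset below is hypothetical, chosen so that its summaries match the slide's numbers (16, 6.67, 9.33, 7.5) with one segment allocated at 50% and the other at 75%; the talk's actual data is not shown in this transcript.

```python
# (segment, treated, streaming) -- hypothetical users.
# Segment A: 50% allocated to treatment. Segment B: 75% allocated.
users = [
    ("A", 1, 10), ("A", 1, 10), ("A", 0, 5), ("A", 0, 5),
    ("B", 1, 20), ("B", 1, 20), ("B", 1, 20), ("B", 0, 10),
]

def mean(values):
    return sum(values) / len(values)

# Naive comparison: pool everyone, ignoring how allocation was controlled.
avg_treatment = mean([y for _, w, y in users if w == 1])
avg_control = mean([y for _, w, y in users if w == 0])
naive_difference = avg_treatment - avg_control

# Conditional comparison: estimate the effect within each segment,
# then weight each segment by its share of all users.
stratified_effect = 0.0
for segment in sorted({s for s, _, _ in users}):
    in_segment = [(w, y) for s, w, y in users if s == segment]
    effect = (mean([y for w, y in in_segment if w == 1])
              - mean([y for w, y in in_segment if w == 0]))
    stratified_effect += effect * len(in_segment) / len(users)

print(round(avg_treatment, 2), round(avg_control, 2),
      round(naive_difference, 2), stratified_effect)
```

Segment B streams more and is over-represented in treatment, so the pooled treatment average is inflated. This is the kind of information a causal annotation has to carry for a model to condition on it.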

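Unconditional Randomization

[Causal graph with nodes y, W, e]

Conditional Randomization

[Causal graph with nodes y, W, X, e]

Instrumental Variable

[Causal graph with nodes y, W, Z, e]

Causal Datasets

Graph API

[Causal graph with nodes y, W, X, e]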

1. Grammar for Causal Inference

2. Computing Causal Effects

3. Scalable Computation

Computing Causal Effects

Create a model with causal identification that predicts streaming engagement. Ask it 2 questions:
1. Predict streaming engagement when a user is in the treatment
2. Predict streaming engagement when a user is in the control

Potential Outcomes, Counterfactual Scoring

General framework for many models! Models only need:

1. Causal Identification
2. .predict method
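Counterfactual scoring can be sketched as a function that works with any object exposing a `.predict` method. Everything here is hypothetical: the `ToyEngagementModel` class, its coefficients, the `tenure` feature, and the `average_treatment_effect` helper. Causal identification is assumed to hold and is not checked by the code.

```python
class ToyEngagementModel:
    """Stand-in for a fitted model; only the .predict method matters."""

    def predict(self, rows):
        # Hypothetical fitted relationship with a treatment-by-tenure interaction.
        return [
            1.0 + 2.0 * r["treatment"] + 0.5 * r["tenure"]
            + 0.1 * r["treatment"] * r["tenure"]
            for r in rows
        ]

def average_treatment_effect(model, rows, treatment_key="treatment"):
    """Score every row twice, as treated and as control, and average the gap."""
    as_treated = [{**r, treatment_key: 1} for r in rows]
    as_control = [{**r, treatment_key: 0} for r in rows]
    outcome_treated = model.predict(as_treated)   # potential outcome (treatment)
    outcome_control = model.predict(as_control)   # potential outcome (control)
    gaps = [t - c for t, c in zip(outcome_treated, outcome_control)]
    return sum(gaps) / len(gaps)

users = [
    {"treatment": 1, "tenure": 0},
    {"treatment": 0, "tenure": 10},
    {"treatment": 1, "tenure": 20},
]
effect = average_treatment_effect(ToyEngagementModel(), users)
print(effect)
```

Because the helper only overwrites the treatment column and calls `.predict`, the same function could be reused with a different model class without changes.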

1. Grammar for Causal Inference

2. Computing Causal Effects

3. Scalable Computation

Optimal Computation for Causal Effects

1. Optimize for sparse linear algebra
2. Optimize memory

Ambitions

1. 150M Netflix users
2. Hundreds of experiments
3. Return causal effects in 1 minute

Model training and scoring tend to operate on sparse matrices

Sparse Linear Algebra
1. Less than 10% nonzero
2. Matrix multiplication with 0s is redundant
3. Consumes excessive amount of memory
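To illustrate why skipping zeros saves both work and memory, here is a minimal compressed sparse row (CSR) matrix-vector product in plain Python. It is only a sketch of the general idea, not Eigen's implementation or the library described in the talk; the function names `to_csr` and `csr_matvec` are hypothetical.

```python
def to_csr(dense):
    """Store only the nonzero entries, row by row."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # where the next row starts
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, touching nonzeros only."""
    out = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        out.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return out

dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
]
x = [1, 2, 3, 4]

values, col_idx, row_ptr = to_csr(dense)
sparse_result = csr_matvec(values, col_idx, row_ptr, x)
dense_result = [sum(a * b for a, b in zip(row, x)) for row in dense]

print(values, col_idx, row_ptr, sparse_result)
```

Here 3 stored values replace 12 dense entries, and the product performs 3 multiplications instead of 12.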

Computational Library built around Eigen

Teach Researchers to think like Engineers

Vector-Matrix-Vector Multiplication

Good for Eigen::SparseMatrix iterators

Aggregating first reduces time by 45%!

Memory Optimizations

[Figure: blocks of repeated US and CA values]

Memory Optimizations

[Figure: blocks of repeated US and CA values]

Optimizations

1. Optimize for sparse linear algebra
2. Rearrange math to get much faster runtimes
3. Optimize for spatial locality

Causal Effects + Algorithmic Decision Making

Maintaining Control and Randomization

Making Decisions on Delayed Effects

Intervention Time

Making Decisions when Choices Change

Summary of Challenges

Measuring Causal Effects

1. Grammar for Causal Inference

2. Causal Annotation & Graph API

3. Optimal compute for Potential Outcomes

Decision Making

1. Maintaining controlled, randomized environments

2. Bandit algorithms with delayed effects

3. Making decisions when choices change

Netflix Careers: Software Engineer for Experimentation Platform

Thank You!
