An Architecture for Agile Machine Learning in Real-Time Applications

Post on 19-Aug-2015


Johann Schleier-Smith, if(we) Inc.

johann@ifwe.co | @jssmith | github.com/ifwe

August 11, 2015 | KDD, Sydney, Australia

• Profitable startup actively pursuing big opportunities in social apps

• Millions of users on existing products

• Thousands of social contacts per second

Overview

• Agile machine learning can be difficult—but brings big benefits

• Key challenges in deployment and feature engineering

• Solution: a single path to data

[Diagram: the machine learning cycle, split between production and development. Steps shown: serve personalized recommendations, data collection, study & understand, design new models & features, train & backtest, model updates.]

[Diagram: the steps that sit between "train & backtest" and "model updates": write spec, e-mail model to engineers, request engineering, "why did we want this?", Java development, new database schema, export to Excel, check parameters, QA, bug fixes, meetings, wait.]

• Shared path to data

• Shared feature definition code


Recommendation Engine for Dating Product

• >10 million candidates

• >1000 updates/sec

• Must be responsive to current activity

• Users expect instant query results

Model

• Decompose likelihood of match between vote outcomes and vote occurrence

• Logistic regression

• Real-time personalization through feature vector evolution

• Model parameters trained offline by data scientists

• Consider 1000s of features, select 50-100
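The model bullets above can be sketched as follows. This is a minimal illustration, not the production model: the `Model` and `Features` types, the feature names, and the ranking helper are assumptions made for the example. It shows the point of "feature vector evolution": the weights stay fixed between offline training runs, while each user's feature vector changes as events arrive, so scores move in real time.

```scala
object LogisticScoring {
  // Hypothetical feature vector: feature name -> current value.
  type Features = Map[String, Double]

  // Parameters trained offline; the structure is illustrative only.
  final case class Model(intercept: Double, weights: Map[String, Double])

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Logistic regression score. Features absent from the vector contribute zero.
  def score(model: Model, x: Features): Double = {
    val z = model.weights.foldLeft(model.intercept) {
      case (acc, (name, w)) => acc + w * x.getOrElse(name, 0.0)
    }
    sigmoid(z)
  }

  // Rank candidate ids by descending score.
  def rank(model: Model, candidates: Map[String, Features]): Seq[String] =
    candidates.toSeq
      .map { case (id, x) => (id, score(model, x)) }
      .sortWith(_._2 > _._2)
      .map(_._1)
}
```

Selecting 50-100 features out of thousands would correspond to keeping only those entries in `weights`; everything else is ignored at scoring time.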

[Diagram, built up in stages across production and development. Components: Application APIs & Business Logic; RDBMS; Data Warehouse / Hadoop; Streaming Logs; Exploratory Analysis; Training & Backtesting; Batch Predictions; Predictive Services / Ranking.]

[Diagram: events plotted over time feed aggregations (first( ), last( ), count( ), sum( ), max( ), avg( ), min( )), whose outputs form the machine learning input.]
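The aggregations named in the diagram can all be maintained incrementally, one event at a time. The sketch below assumes a hypothetical `Event` carrying a timestamp and a numeric value; neither the type nor the field names come from the slides.

```scala
object Aggregation {
  // Hypothetical event: a timestamp (epoch millis) and a numeric value.
  final case class Event(time: Long, value: Double)

  // Running aggregates, updated one event at a time.
  final case class Agg(
    first: Option[Double] = None,
    last: Option[Double] = None,
    count: Long = 0L,
    sum: Double = 0.0,
    min: Double = Double.PositiveInfinity,
    max: Double = Double.NegativeInfinity
  ) {
    def avg: Option[Double] = if (count == 0) None else Some(sum / count)

    def update(e: Event): Agg = Agg(
      first = first.orElse(Some(e.value)),
      last = Some(e.value),
      count = count + 1,
      sum = sum + e.value,
      min = math.min(min, e.value),
      max = math.max(max, e.value)
    )
  }

  // Assumes events are supplied in time order.
  def aggregate(events: Seq[Event]): Agg = events.foldLeft(Agg())(_.update(_))
}
```

Because `update` needs only the previous aggregate and the new event, the same code works for a batch fold over history and for a live stream.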

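Event History API

trait EventHistory {
  // Append a new event to the history.
  def publishEvent(e: Event): Unit

  // Deliver matching events between startTime and endTime to eventHandler.
  def getEvents(
    startTime: Date,
    endTime: Date,
    eventFilter: EventFilter,
    eventHandler: EventHandler
  ): Unit
}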


endTime = +∞ for real-time streaming
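To make the API concrete, here is a toy in-memory implementation. It is a sketch under several assumptions: the slides name `Event`, `EventFilter`, and `EventHandler` but do not define them, so their shapes below are invented for the example, as is the half-open interval `[startTime, endTime)`. It replays stored history only and does not stream live events.

```scala
import java.util.Date
import scala.collection.mutable.ArrayBuffer

object EventHistoryExample {
  // Assumed shapes: the slides name these types but do not define them.
  final case class Event(time: Date, kind: String, user: String)
  trait EventFilter { def accept(e: Event): Boolean }
  trait EventHandler { def onEvent(e: Event): Unit }

  // The trait from the slides.
  trait EventHistory {
    def publishEvent(e: Event): Unit
    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit
  }

  // Toy in-memory store; history replay only, no live streaming.
  class InMemoryEventHistory extends EventHistory {
    private val log = ArrayBuffer.empty[Event]

    def publishEvent(e: Event): Unit = { log += e; () }

    // Delivers events with startTime <= time < endTime, in time order.
    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit =
      log.toList
        .filter(e => !e.time.before(startTime) && e.time.before(endTime))
        .filter(eventFilter.accept)
        .sortBy(_.time.getTime)
        .foreach(eventHandler.onEvent)
  }
}
```

In this sketch an open-ended query could pass `new Date(Long.MaxValue)` as a stand-in for +∞; a real streaming implementation would keep the handler subscribed to new events rather than returning.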

[Diagram: a time-ordered stream of events updates the online feature state, which in turn provides the machine learning input.]

Events, in time order:

• Alice updates profile

• Bob opens app

• Bob sees Alice in recommendations

• Bob swipes yes on Alice

• Alice receives push notification

• Alice sees Bob in recommendations

• Alice sends message to Bob
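The Alice and Bob timeline can be turned into online feature state with a single update function. The sketch below models a subset of the events shown; the event classes, the per-user counters, and the feature names are all hypothetical choices for illustration, not the schema used at if(we).

```scala
object OnlineFeatureState {
  // Hypothetical event types mirroring part of the timeline.
  sealed trait Event { def time: Long }
  final case class ProfileUpdated(time: Long, user: String) extends Event
  final case class Recommended(time: Long, viewer: String, candidate: String) extends Event
  final case class VotedYes(time: Long, voter: String, candidate: String) extends Event
  final case class MessageSent(time: Long, sender: String, recipient: String) extends Event

  // Per-user counters; the chosen features are illustrative.
  final case class UserState(timesShown: Int = 0, yesReceived: Int = 0, messagesSent: Int = 0)

  type State = Map[String, UserState]

  private def modify(s: State, user: String)(f: UserState => UserState): State =
    s.updated(user, f(s.getOrElse(user, UserState())))

  // One pure step: previous state plus one event gives the next state.
  def update(s: State, e: Event): State = e match {
    case Recommended(_, _, c)      => modify(s, c)(u => u.copy(timesShown = u.timesShown + 1))
    case VotedYes(_, _, c)         => modify(s, c)(u => u.copy(yesReceived = u.yesReceived + 1))
    case MessageSent(_, sender, _) => modify(s, sender)(u => u.copy(messagesSent = u.messagesSent + 1))
    case _: ProfileUpdated         => s // no counter affected in this sketch
  }

  // Feature vector handed to the model for one user.
  def features(s: State, user: String): Map[String, Double] = {
    val u = s.getOrElse(user, UserState())
    val yesRate = if (u.timesShown == 0) 0.0 else u.yesReceived.toDouble / u.timesShown
    Map(
      "timesShown" -> u.timesShown.toDouble,
      "yesRate" -> yesRate,
      "messagesSent" -> u.messagesSent.toDouble
    )
  }
}
```

Folding the timeline through `update` yields the state at any point in time, and `features` reads the machine learning input from it.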

[Diagram, built up in stages across production and development. Components: RDBMS; Application APIs & Business Logic; Event History Repository; Real-Time State Updates; Ranking; State Updates; Exploratory Analysis; Training & Backtesting; Monitoring.]

Event History API

• Single path to data for real-time streaming and history

• Shared feature engineering code for development and production

• Team shares access to code and data

• Fine-grained alignment of feature state and prediction outcomes

• Temporally accurate modeling ensured (no looking ahead)
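One way to read "no looking ahead" is that training examples are built by replaying history in time order and snapshotting features at the moment a prediction would have been made, before the outcome is applied to the state. The sketch below illustrates that idea only; the `Shown` and `Vote` events, the `Counts` state, and the single yes-rate feature are assumptions for the example.

```scala
object Backtest {
  // Hypothetical events: a candidate is shown to a viewer, who later votes.
  sealed trait Event { def time: Long }
  final case class Shown(time: Long, viewer: String, candidate: String) extends Event
  final case class Vote(time: Long, viewer: String, candidate: String, yes: Boolean) extends Event

  // Illustrative online state: yes-votes and total votes received per candidate.
  final case class Counts(yes: Int = 0, total: Int = 0) {
    def yesRate: Double = if (total == 0) 0.0 else yes.toDouble / total
  }

  // Feature snapshotted at impression time, paired with the later outcome.
  final case class Example(yesRateAtImpression: Double, label: Boolean)

  // Replays events in time order. The feature is read when the candidate is
  // shown, before the vote updates the state, so no example sees its own outcome.
  def trainingExamples(events: Seq[Event]): Vector[Example] = {
    val start = (
      Map.empty[String, Counts],           // feature state per candidate
      Map.empty[(String, String), Double], // snapshots awaiting an outcome
      Vector.empty[Example]
    )
    val (_, _, examples) = events.sortBy(_.time).foldLeft(start) {
      case ((state, pending, out), Shown(_, v, c)) =>
        val snapshot = state.getOrElse(c, Counts()).yesRate
        (state, pending.updated((v, c), snapshot), out)
      case ((state, pending, out), Vote(_, v, c, yes)) =>
        val old = state.getOrElse(c, Counts())
        val next = state.updated(c, Counts(old.yes + (if (yes) 1 else 0), old.total + 1))
        pending.get((v, c)) match {
          case Some(x) => (next, pending - ((v, c)), out :+ Example(x, yes))
          case None    => (next, pending, out) // vote without a recorded impression
        }
    }
    examples
  }
}
```

The alignment between feature state and outcome is per impression here, which is one interpretation of "fine-grained alignment".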

• 15 new models released and tested within 6 months

• >30% cumulative improvement in usage shown in A/B testing

[Chart: daily unique users (axis 0 to 2,000,000), Apr 2013 to Apr 2014. Series: Matchers, Voters. Annotations: new model released, A/B test updated.]

• Open source implementation derived from if(we)’s proprietary platform

• Provides Scala DSL for building online features from event history

• Examples include dating recommendations and product search with learning to rank

• Not yet ready for scale or production

• Seeking collaborators

Antelope Open Source Vision

[Diagram: Production Serving (Ranking) and Data Science (R, Matlab, Python) both sit on Feature Engineering, which sits on the Event History API. Streaming data: Kafka, Storm. Historical data: S3, NFS, HDFS.]

Agile Machine Learning with Event History

• Solving deployment yields quick product cycles

  • All data saved and retrieved as time-ordered events

  • Single path to data for both historical and real-time access

  • Same feature engineering code used in development and production

• Agile success

  • Team shares access to code and data

  • Production product iterations measured in days rather than months
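The "same feature engineering code" point can be illustrated with one update function called from two places: a batch replay in development and a stateful listener in production. This is a sketch with hypothetical names (`Event`, `update`, `replay`, `LiveState`), not the Antelope API.

```scala
object SharedFeatureCode {
  // Hypothetical event and state: count of "vote" events per user.
  final case class Event(time: Long, user: String, kind: String)
  type State = Map[String, Int]

  // The single feature definition, shared by both paths.
  def update(s: State, e: Event): State =
    if (e.kind == "vote") s.updated(e.user, s.getOrElse(e.user, 0) + 1) else s

  // Development: replay stored history in one pass.
  def replay(history: Seq[Event]): State =
    history.sortBy(_.time).foldLeft(Map.empty[String, Int])(update)

  // Production: hold state and apply the same function as each event arrives.
  final class LiveState(initial: State = Map.empty) {
    private var state: State = initial
    def onEvent(e: Event): Unit = { state = update(state, e) }
    def current: State = state
  }
}
```

Feeding the same time-ordered events through `replay` and through `LiveState.onEvent` produces identical state, so a feature validated in backtesting behaves the same way once deployed.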

github.com/ifwe/antelope | @jssmith