Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising

Post on 11-Apr-2017

Building Real-time Analytics Engine with Kafka and Spark for Mobile Advertising

Mobile Advertising? - Social & Game

Authentic to Consumers

Authentic to Entertainment

Authentic to Engagement

Mobile Games

eMarketer, “US Mobile Phone Content Usage Metrics, 2013-2019.” February, 2015.

People Spend A Lot of Time Gaming

Over 55 minutes a day on average is spent playing mobile games

Minutes Spent in Mobile

Innovate Advertising as Reward Ads

● Free-to-Play (Freemium) app
● Only 2~5% of users make in-app purchases
● Publishers can give a “reward” to users who engage with ads
● Video + Game Economics + Reward

Mobile Video App Advertising

Advertiser Pays on Video View

Pub Paid

Tapjoy Profit

User Earns Reward

Video to Install

Video Install

reward No reward

Video to Install to Event

Video Install

reward

No reward

Event

- Level N
- Registration
- In-app purchase
- First Booking

Mobile Video App Advertising - Data Science

Video Views

Installs

Early Retention

Life Time Value

“Event”

Look-alike Model

Real-time Bidding Engine

Advertiser’s Return

“Investment”

Building a Data Science Platform

Bigger in Scale

Faster Serving

Smart and Smarter!

Data Product

Tapjoy’s Data Platform

Algo Serving Infrastructure

Data warehousing

300,000 RPM throughput

Bidding & Targeting & Personalization

<10 ms response time

20 TB daily addition

2.3 PB DUM

Cloud & On-Premise

In-house & SaaS

Batch-based & real-time

The Logic Stack

Data warehousing

HDFS / S3 / GS

Reporting

MPPs (BigQuery)

Algo Service

Batch + Streaming Hadoop / Spark

• Collect data, set rules
• Reduce data friction
• Improve signal-to-noise ratio
• Model training & iteration

• Deliver business insights
• Drive data awareness

• Apply ideas to product (online)
• Serve model output
• Drive revenue

Data Viz

A/B Testing

Data Viz

The Data Flow

Tapjoy’s Algo Service Engine (SOA)

● SOA (algo service) in Netty
● 320,000 lines of Java
● 99% of responses in < 20 ms at 200k-400k RPM

Ad Request

A/B test classification

Main Algo & pre-filters

Apply Logic Pipe

Response (offer list)

Video Bidding / Targeting / Persona / Lookalike

...

Biz logic filters
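The request flow on this slide (ad request, A/B test classification, main algo and pre-filters, logic pipe, offer-list response) can be pictured as a chain of stages. The following is only a minimal Python sketch of that shape, not Tapjoy's Java service: every name in it (`AdRequest`, `Offer`, `ab_bucket`, the three example stages) is a hypothetical placeholder, and hashing the device ID into a bucket is one common way to do A/B classification, assumed here for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class AdRequest:
    device_id: str
    country: str


@dataclass
class Offer:
    offer_id: str
    bid: float
    tags: Dict[str, str] = field(default_factory=dict)


# A stage takes the request, the A/B bucket, and the current offers,
# and returns the offers that survive (possibly reordered).
Stage = Callable[[AdRequest, str, List[Offer]], List[Offer]]


def ab_bucket(device_id: str, buckets: Sequence[str] = ("control", "treatment")) -> str:
    """Deterministically map a device to an A/B bucket (assumed scheme)."""
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


def prefilter_country(req: AdRequest, bucket: str, offers: List[Offer]) -> List[Offer]:
    """Pre-filter: drop offers targeted at a different country."""
    return [o for o in offers if o.tags.get("country") in (None, req.country)]


def rank_by_bid(req: AdRequest, bucket: str, offers: List[Offer]) -> List[Offer]:
    """Stand-in for the main algo: highest bid first."""
    return sorted(offers, key=lambda o: o.bid, reverse=True)


def biz_filter_positive_bid(req: AdRequest, bucket: str, offers: List[Offer]) -> List[Offer]:
    """Stand-in business-logic filter: keep only offers with a positive bid."""
    return [o for o in offers if o.bid > 0]


def serve(req: AdRequest, candidates: List[Offer], stages: List[Stage]) -> List[Offer]:
    """Classify the request once, then run the offers through each stage in order."""
    bucket = ab_bucket(req.device_id)
    offers = list(candidates)
    for stage in stages:
        offers = stage(req, bucket, offers)
    return offers
```

Keeping every stage behind the same signature means a new filter or model can be added to the list without touching the serving loop.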

Algo Service’s Data Components

Component | What’s in there | Purpose

Kafka | Raw activity logs | Everything starts here

Spark Streaming | ETL | ETL & algo feature updates

Aerospike | User Big Table (User DNA) | Real-time k-v lookups, i.e. Lookalike

MemSQL | Stripped-down raw user activity data | Device-level real-time aggregations; hot data sink; real-time reporting

Elasticsearch | Aggregates or unstructured logs | Cube aggregates or full-text search
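To make the division of labour between these components concrete, here is a small Python sketch of the "raw logs in, per-device features out" step. It is an illustration under stated assumptions only: a plain list stands in for a micro-batch read from Kafka, an in-memory dictionary stands in for the key-value store, and the event fields (`device_id`, `event_type`) are hypothetical rather than Tapjoy's actual log schema.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator

# Hypothetical in-memory stand-in for the per-device key-value table.
UserTable = DefaultDict[str, Counter]


def new_user_table() -> UserTable:
    return defaultdict(Counter)


def clean(raw_events: Iterable[dict]) -> Iterator[dict]:
    """ETL step: keep only events that carry the fields we need."""
    for event in raw_events:
        if event.get("device_id") and event.get("event_type"):
            yield event


def apply_micro_batch(table: UserTable, raw_events: Iterable[dict]) -> int:
    """Fold one micro-batch of raw logs into per-device counters.

    Returns the number of events that were applied.
    """
    applied = 0
    for event in clean(raw_events):
        table[event["device_id"]][event["event_type"]] += 1
        applied += 1
    return applied


def lookup(table: UserTable, device_id: str) -> Dict[str, int]:
    """Key-value style read; unknown devices return an empty feature map."""
    return dict(table.get(device_id, {}))
```

A serving process would call something like `lookup` on each ad request, while the streaming job keeps calling `apply_micro_batch` as new logs arrive.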

Mobile Video App Advertising - Data Science

Video Views

Installs

Early Retention

Life Time Value

“event”

Look-alike Model

Real-time Bidding Engine

Advertiser’s Return

“Investment”

Big Table / MemCache

Use Case 1 - Ad-Request Level Decision

Video Bid

# CVR

Spending History

max(views) > T(n)

...

User app usages

Kafka + Spark Streaming

S - App 1

S - App 2

S - App 3

S - App ..

S - App N

Lambda Batch

Use Case 1 - Ad-Request Level Decision

Video Bid

Kafka OR Spark Streaming

S - App

RAW DATA

Use Case 1 - Ad-Request Level Decision

High-throughput, low-latency queries run against 30 days of device-level data that is streamed into MemSQL.

The calculations are done on the fly, and the results serve as decision features.

Reference Join Subquery

Reference Join
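The 30-day, device-level lookup described on this slide might look roughly like the sketch below. The slides do not show the actual schema or queries, so everything here is assumed: the `events` table, its columns, and the `video_view` / `install` event names are hypothetical, and Python's built-in `sqlite3` is used purely as a stand-in so the example runs anywhere (MemSQL itself is queried with MySQL-compatible SQL).

```python
import sqlite3

SECONDS_PER_DAY = 86400


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical stripped-down activity table."""
    conn.execute(
        "CREATE TABLE events ("
        " device_id TEXT NOT NULL,"
        " event_type TEXT NOT NULL,"
        " ts INTEGER NOT NULL)"  # event time as epoch seconds
    )


def device_features(conn: sqlite3.Connection, device_id: str,
                    now_ts: int, window_days: int = 30) -> dict:
    """Aggregate one device's recent activity at request time."""
    cutoff = now_ts - window_days * SECONDS_PER_DAY
    row = conn.execute(
        """
        SELECT
            SUM(CASE WHEN event_type = 'video_view' THEN 1 ELSE 0 END),
            SUM(CASE WHEN event_type = 'install' THEN 1 ELSE 0 END)
        FROM events
        WHERE device_id = ? AND ts >= ?
        """,
        (device_id, cutoff),
    ).fetchone()
    # SUM over zero rows yields NULL, so fall back to 0.
    views = row[0] or 0
    installs = row[1] or 0
    # Conversion rate computed on the fly; 0.0 when there are no views.
    cvr = installs / views if views else 0.0
    return {"views": views, "installs": installs, "cvr": cvr}
```

Because the aggregate is computed at query time from raw rows, a new feature only needs a new query, not a new streaming job or a backfill.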

In Fact - One Fits All

Algo Serving

Kafka OR Spark Streaming

Real-Time Dashboard

Data Warehouse Hot Batch

Data Sink

Hot Batch

Realtime Query

Realtime Query

Conclusion

❖ Mobile advertising is all about knowing your audience
❖ Fast & accurate data is key to Data Science as a Service
❖ But “real time” is a relative word
❖ Try to simplify moving parts when it comes to streaming
    ➢ Difficult to debug
    ➢ Hard to backfill
❖ Generalized hot-data sink for stability and multi-purpose data storage

yohan.chin@tapjoy.com
robin.li@tapjoy.com