Hemispheres of Data




Page 1: Hemispheres of Data

Hemispheres of Data
FOX Audience Network
Brian Dolan, Director of Research Analytics

Page 2: Hemispheres of Data

What is FOX Audience Network?

• Formerly a division of FOX Interactive alongside sister company MySpace; we are now an independent ad network

• Exclusive consumer of MySpace profile data

• Owner of two massive data stores:
– ~500 TB Hadoop instance containing MySpace user data
– ~250 TB (1 PB with redundancy) of ad-serving events in a Greenplum data warehouse

Page 3: Hemispheres of Data

FAN's Data Challenge

• 3-5 billion ad-serving events captured each day, not counting hundreds of millions of dimension records

• Updating 30-50 million user profiles each day

• Training over 2,000 sophisticated mathematical models weekly against multi-TB data sets
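These volumes are easier to reason about as rates. The calculation below is a back-of-the-envelope sketch that assumes the event and profile figures are daily totals spread evenly over 24 hours; real ad traffic is peaky, so sustained peak rates would be higher than these averages.

```python
# Back-of-the-envelope rates implied by the slide's daily figures.
# ASSUMPTION: the counts are per-day totals, spread uniformly across the day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day: float) -> float:
    """Average per-second rate for a daily volume (ignores peaks and troughs)."""
    return count_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, low, high in [
        ("ad-serving events", 3e9, 5e9),
        ("user-profile updates", 30e6, 50e6),
    ]:
        print(f"{label}: {per_second(low):,.0f} to {per_second(high):,.0f} per second")
```

This prints roughly 34,722 to 57,870 ad-serving events per second and 347 to 579 profile updates per second, on average.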

Page 4: Hemispheres of Data

Data Character Varies Dramatically

• User Data
– Very Sparse
– Intermittent
– Unstructured and user-generated
– Untrustworthy
– Enormous

• Advertiser Data
– Dense
– Current
– Defined Business Dimensions
– Verified
– Enormous

Page 5: Hemispheres of Data

Not Separate, Isolated

Hadoop
– "I love Horror Movies!"
– "I need a cell phone"
– "It's Miley!"

Greenplum
– What is responding to my ad?
– How much revenue did I generate today?
– Is this campaign fatigued?
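On the Hadoop side, everything starts from free text like the user statements above. The sketch below shows regex-based interest tagging in the spirit of the "long strings parsed with regexp routines" the talk describes for user data. The tag names and patterns are hypothetical, not FAN's taxonomy, and running this inside a Hadoop job (for example as a Hadoop Streaming mapper reading lines from standard input) is an assumption that is not shown here.

```python
import re
from collections import Counter

# HYPOTHETICAL interest taxonomy: tag name -> compiled regex over free text.
# The real tags, patterns, and record layout are not described in the talk.
INTEREST_PATTERNS = {
    "horror_movies": re.compile(r"\bhorror\s+(?:movies?|films?)\b", re.IGNORECASE),
    "mobile_phones": re.compile(r"\b(?:cell|mobile|smart)\s*phones?\b", re.IGNORECASE),
    "miley_cyrus": re.compile(r"\bmiley\b", re.IGNORECASE),
}


def tag_text(text: str) -> set:
    """Return every interest tag whose pattern matches somewhere in the text."""
    return {tag for tag, pattern in INTEREST_PATTERNS.items() if pattern.search(text)}


def tag_counts(lines) -> Counter:
    """Single pass over the input: count how many lines mention each tag."""
    counts = Counter()
    for line in lines:
        counts.update(tag_text(line))
    return counts


if __name__ == "__main__":
    samples = ["I love Horror Movies!", "I need a cell phone", "It's Miley!"]
    for text in samples:
        print(f"{text!r} -> {sorted(tag_text(text))}")
    print(tag_counts(samples))
```

Each of the three sample statements maps to exactly one tag; an untrustworthy or sparse feed simply yields empty tag sets rather than errors.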

Page 6: Hemispheres of Data

Platform Tasks Also Differ Dramatically

• User Data
– Long strings parsed with regexp routines
– No more than three passes through the data
– Unreliable data feed where dimensions change weekly
– Complicated APIs

• Advertiser Data
– Hundreds of first-normal-form (1NF) dimension tables
– Self-joins are a routine task
– Views and temporary tables
– Reporting needs
– User management
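Self-joins show up naturally on the advertiser side, for example when answering "Is this campaign fatigued?". The sketch below uses Python's built-in sqlite3 module as a stand-in for the Greenplum warehouse (the SQL is plain and nothing in it is Greenplum-specific). The table and column names, the integer day numbering, the sample numbers, and the 20% click-through-rate (CTR) drop threshold are all hypothetical.

```python
import sqlite3

# HYPOTHETICAL schema: one dimension table plus one daily fact table.
# Names, columns, and the integer "day" encoding are illustrative assumptions.
SCHEMA = """
CREATE TABLE campaign (
    campaign_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE daily_stats (
    campaign_id INTEGER NOT NULL REFERENCES campaign(campaign_id),
    day         INTEGER NOT NULL,
    impressions INTEGER NOT NULL,
    clicks      INTEGER NOT NULL,
    PRIMARY KEY (campaign_id, day)
);
"""

# Self-join: pair each day's row with the same campaign's previous-day row,
# then keep pairs where CTR fell by more than the threshold.
FATIGUE_SQL = """
SELECT c.name,
       cur.day,
       1.0 * prev.clicks / prev.impressions AS prev_ctr,
       1.0 * cur.clicks / cur.impressions   AS cur_ctr
FROM daily_stats AS cur
JOIN daily_stats AS prev
  ON prev.campaign_id = cur.campaign_id
 AND prev.day = cur.day - 1
JOIN campaign AS c
  ON c.campaign_id = cur.campaign_id
WHERE prev.impressions > 0
  AND cur.impressions > 0
  AND 1.0 * cur.clicks / cur.impressions
      < (1.0 - ?) * prev.clicks / prev.impressions
ORDER BY c.name, cur.day
"""


def fatigued_campaigns(conn, drop_threshold=0.20):
    """Return (name, day, prev_ctr, cur_ctr) for day-over-day CTR drops above drop_threshold."""
    return conn.execute(FATIGUE_SQL, (drop_threshold,)).fetchall()


def build_demo_db():
    """In-memory database filled with made-up numbers, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO campaign VALUES (?, ?)",
                     [(1, "phones_fall"), (2, "horror_dvd")])
    conn.executemany("INSERT INTO daily_stats VALUES (?, ?, ?, ?)", [
        (1, 1, 100_000, 500), (1, 2, 100_000, 480), (1, 3, 100_000, 300),
        (2, 1, 50_000, 200), (2, 2, 50_000, 210),
    ])
    return conn


if __name__ == "__main__":
    for row in fatigued_campaigns(build_demo_db()):
        print(row)
```

With the demo data only `phones_fall` on day 3 is flagged: its CTR falls from 0.48% to 0.30%, a drop of more than 20%, while the day-2 dip (0.50% to 0.48%) stays under the threshold.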

Page 7: Hemispheres of Data

Communicating

• Now
– Flat files passed between the systems: BAD!

• Soon
– Hive to provide better-structured output from Hadoop
– Greenplum to release an HDFS reader/writer

• Message-bus methods can feed all the data to both systems, but transfers between them will always be necessary

• Don't drink the Kool-Aid
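While flat files remain the handoff between the two systems, one way to make them less fragile (the user-data feed's dimensions change weekly) is to ship a header line and have the receiving side reject any file whose layout has drifted. This is a minimal sketch assuming tab-delimited files; the column contract is hypothetical and this is not presented as FAN's actual process.

```python
import csv
import io

# HYPOTHETICAL column contract for an extract passed from Hadoop to the warehouse.
# The column names are illustrative, not FAN's actual layout.
EXPECTED_COLUMNS = ["user_id", "segment", "score"]


def write_extract(rows, out):
    """Write tab-delimited rows preceded by a header line the receiver can check."""
    writer = csv.DictWriter(out, fieldnames=EXPECTED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_extract(inp):
    """Read a tab-delimited extract, refusing files whose header breaks the contract."""
    reader = csv.DictReader(inp, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(
            f"unexpected columns {reader.fieldnames}; expected {EXPECTED_COLUMNS}")
    return [
        {"user_id": row["user_id"], "segment": row["segment"],
         "score": float(row["score"])}
        for row in reader
    ]


if __name__ == "__main__":
    buf = io.StringIO()
    write_extract([{"user_id": "u1", "segment": "horror_movies", "score": 0.9}], buf)
    buf.seek(0)
    print(read_extract(buf))
```

Failing loudly on a changed header is cheap insurance, but it only detects drift; the structured formats and direct HDFS access mentioned above are what remove the fragile step.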