Big Data: Guidelines and Examples for the Enterprise Decision Maker

Buzz Moschetti, Solutions Architect, MongoDB
#MongoDB

description

This presentation covers how to use MongoDB with Hadoop to leverage big data within your company.

Transcript of Big Data: Guidelines and Examples for the Enterprise Decision Maker

Page 1: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Big Data: Examples and Guidelines for the Enterprise Decision Maker

Solutions Architect, MongoDB

Buzz Moschetti

#MongoDB

Page 2: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Who is your Presenter?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorganChase and Bear Stearns before that
• Over 25 years of designing and building systems
  • Big and small
  • Super-specialized to broadly useful in any vertical
  • “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Still programming – using emacs, of course

Page 3: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Agenda
• (Occasionally) Brutal Truths about Big Data
• Review of Directed Content Business Architecture
• A Simple Technical Implementation

Page 4: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Truths
• Clear definition of Big Data still maturing
• Efficiently operationalizing Big Data is non-trivial
  • Developing, debugging, understanding MapReduce
  • Cluster monitoring & management, job scheduling/recovery
  • If you thought regular ETL Hell was bad…
• Big Data is not about math/set accuracy
  • The last 25,000 items in a 25,497,612 set “don’t matter”
• Big Data questions are best asked periodically
  • “Are we there yet?”
• Realtime means … realtime

Page 5: Big Data: Guidelines and Examples for the Enterprise Decision Maker

It’s About The Functions, not the Terms

DON’T ASK:
• Is this an operations or an analytics problem?
• Is this online or offline?
• What query language should we use?
• What is my integration strategy across tools?

ASK INSTEAD:
• Am I incrementally addressing data (esp. writes)?
• Am I computing a precise answer or a trend?
• Do I need to operate on this data in realtime?
• What is my holistic architecture?

Page 6: Big Data: Guidelines and Examples for the Enterprise Decision Maker

What We’re Going to “Build” today

Realtime Directed Content System
• Based on what users click, “recommended” content is returned in addition to the target
• The example is sector (manufacturing, financial services, retail) neutral
• System dynamically updates behavior in response to user activity

Page 7: Big Data: Guidelines and Examples for the Enterprise Decision Maker

The Participants and Their Roles

Directed Content System
• Customers
• Content Creators: Generate and tag content from a known domain of tags
• Management/Strategy: Make decisions based on trends and other summarized data
• Analysts/Data Scientists: Operate on data to identify trends and develop tag domains
• Developers/ProdOps: Bring it all together: apps, SDLC, integration, etc.

Page 8: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Priority #1: Maximizing User value

Considerations/Requirements

• Maximize realtime user value and experience
• Provide management reporting and trend analysis
• Engineer for Day 2 agility on recommendation engine
• Provide scrubbed click history for customer
• Permit low-cost horizontal scaling
• Minimize technical integration
• Minimize technical footprint
• Use conventional and/or approved tools
• Provide a RESTful service layer
• …

Page 9: Big Data: Guidelines and Examples for the Enterprise Decision Maker

The Architecture

[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 10: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Complementary Strengths

mongoDB / App(s)
• Standard design paradigm (objects, tools, 3rd party products, IDEs, test drivers, skill pool, etc.)
• Language flexibility (Java, C#, C++, Python, Scala, …)
• Webscale deployment model
  • appservers, DMZ, monitoring
• High performance rich shape CRUD

Hadoop / MapReduce
• MapReduce design paradigm
• Node deployment model
• Very large set operations
• Computationally intensive, longer duration
• Read-dominated workload

Page 11: Big Data: Guidelines and Examples for the Enterprise Decision Maker

“Legacy” Approach: Somewhat unidirectional

[Diagram: App(s), mongoDB, Hadoop MapReduce]
• Extract data from mongoDB and other sources nightly (or weekly)
• Run analytics
• Generate reports for people to read
• Where’s the feedback?

Page 12: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Somewhat better approach

[Diagram: App(s), mongoDB, Hadoop MapReduce]
• Extract data from mongoDB and other sources nightly (or weekly)
• Run analytics
• Generate reports for people to read
• Move important summary data back to mongoDB for consumption by apps

Page 13: Big Data: Guidelines and Examples for the Enterprise Decision Maker

…but the overall problem remains:

• How do you integrate and operate in realtime upon both periodically generated data and current realtime data?

• Lackluster integration between OLTP and Hadoop

• It’s not just about the database: you need a realtime profile and profile update function

Page 14: Big Data: Guidelines and Examples for the Enterprise Decision Maker

The legacy problem in pseudocode

onContentClick() {
    String[] tags = content.getTags();
    Resource[] r = f1(database, tags);
}

• Realtime intraday state not well-handled

• Baselining is a different problem than click handling

Page 15: Big Data: Guidelines and Examples for the Enterprise Decision Maker

The Right Approach

• Users have a specific Profile entity

• The Profile captures trend analytics as baselining information

• The Profile has per-tag “counters” that are updated with each interaction / click

• Counters plus baselining are passed to fetch function

• The fetch function itself could be dynamic!
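A minimal Java sketch of the Profile idea above, under stated assumptions: the class name ProfileSketch, its fields, and the toy hotTags fetch function are all hypothetical and do not come from the presentation. It only illustrates per-tag counters, a batch-produced baseline, and a fetch function that is passed in and can therefore be swapped at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical in-memory model of a user Profile; all names are illustrative.
final class ProfileSketch {
    // Per-tag click counters, updated on every interaction.
    final Map<String, Integer> counters = new HashMap<>();
    // Per-tag baseline, assumed to be written by the periodic analytics run.
    final Map<String, Double> baseline = new HashMap<>();

    // Realtime path: bump the counter for each tag on the clicked content.
    void recordClick(List<String> tags) {
        for (String tag : tags) {
            counters.merge(tag, 1, Integer::sum);
        }
    }

    // Counters plus baseline are handed to whatever fetch function is supplied.
    List<String> recommend(
            BiFunction<Map<String, Integer>, Map<String, Double>, List<String>> fetch) {
        return fetch.apply(counters, baseline);
    }

    // Toy fetch function: tags whose live count exceeds their baseline, sorted by name.
    static List<String> hotTags(Map<String, Integer> counters, Map<String, Double> baseline) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            if (e.getValue().doubleValue() > baseline.getOrDefault(e.getKey(), 0.0)) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        ProfileSketch bob = new ProfileSketch();
        bob.baseline.put("PETS", 1.0);
        bob.recordClick(Arrays.asList("PETS"));
        bob.recordClick(Arrays.asList("PETS", "SPICE"));
        System.out.println(bob.recommend(ProfileSketch::hotTags));
    }
}
```

Passing the fetch function as a parameter is one way to get the "dynamic" behavior the slide mentions; a real system would more likely select it by name from configuration or from the Profile itself.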

Page 16: Big Data: Guidelines and Examples for the Enterprise Decision Maker

24 hours in the life of The System

• Assume some content has been created and tagged

• Two systematized tags: Pets & PowerTools

Page 17: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Monday, 1:30AM EST

• Fetch all user Profiles from mongoDB; load into Hadoop
  • Or skip if using the mongoDB-Hadoop connector!
[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 18: Big Data: Guidelines and Examples for the Enterprise Decision Maker

mongoDB-Hadoop MapReduce Example

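import java.io.IOException;
import java.util.Date;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

public class ProfileMapper
        extends Mapper<Object, BSONObject, IntWritable, IntWritable> {
    @Override
    public void map(final Object pKey, final BSONObject pValue, final Context pContext)
            throws IOException, InterruptedException {
        // Each input value is one user Profile document.
        String user = (String) pValue.get("user");      // available, not used below
        Date d1 = (Date) pValue.get("lastUpdate");      // available, not used below
        BSONObject tags = (BSONObject) pValue.get("tags");
        if (tags == null || tags.keySet().isEmpty()) {
            return;                                     // no tags: avoid dividing by zero
        }
        int count = 0;
        for (String tag : tags.keySet()) {
            // Total the click-history entries across all of the user's tags.
            List<?> hist = (List<?>) ((BSONObject) tags.get(tag)).get("hist");
            if (hist != null) {
                count += hist.size();
            }
        }
        int avg = count / tags.keySet().size();
        pContext.write(new IntWritable(count), new IntWritable(avg));
    }
}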

Page 19: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Monday, 1:45AM EST

• Grind through all content data and user Profile data to produce:
  • Tags based on feature extraction (vs. creator-applied tags)
  • Trend baseline per user for tags Pets and PowerTools
• Load Profiles with new baseline back into mongoDB
  • Or skip if using the mongoDB-Hadoop connector!
[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 20: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Monday, 8AM EST

• User Bob logs in and his Profile is retrieved from mongoDB
• Bob clicks on Content X, which is already tagged as “Pets”
• Bob has clicked on Pets-tagged content many times
• Adjust Profile for tag “Pets” and save back to mongoDB
• Analysis = f(Profile)
• Analysis can be “anything”; it is simply a result. It could trigger an ad, a compliance alert, etc.
[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 21: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Monday, 8:02AM EST

• Bob clicks on Content Y, which is already tagged as “Spices”
• “Spices” is a new tag type for Bob
• Adjust Profile for tag “Spices” and save back to mongoDB
• Analysis = f(Profile)
[Diagram: App(s), mongoDB, Hadoop MapReduce]
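The two click events above can be sketched in Java as follows. This is an illustration only: ClickHandlerSketch, Click, and the choice of "most-clicked tag" as the analysis function are hypothetical, the in-memory map stands in for the Profile document, and the save back to mongoDB is deliberately left out.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical click handler; the map stands in for a stored Profile document.
final class ClickHandlerSketch {
    // One click-history entry: when it happened and what was clicked.
    static final class Click {
        final Instant ts;
        final String url;
        Click(Instant ts, String url) { this.ts = ts; this.url = url; }
    }

    // Tag name -> click history for that tag.
    final Map<String, List<Click>> tags = new LinkedHashMap<>();

    // Adjust the Profile for a tag; a tag never seen before gets a fresh history.
    void onContentClick(String tag, String url, Instant now) {
        tags.computeIfAbsent(tag, t -> new ArrayList<>()).add(new Click(now, url));
        // A real handler would persist the updated Profile here.
    }

    // "Analysis = f(Profile)": in this sketch, f reports the most-clicked tag.
    String analysis() {
        String best = null;
        int max = 0;
        for (Map.Entry<String, List<Click>> e : tags.entrySet()) {
            if (e.getValue().size() > max) {
                max = e.getValue().size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ClickHandlerSketch bob = new ClickHandlerSketch();
        Instant eightAm = Instant.parse("2014-01-06T13:00:00Z");
        bob.onContentClick("PETS", "url1", eightAm);
        bob.onContentClick("PETS", "url2", eightAm.plusSeconds(30));
        bob.onContentClick("SPICE", "url3", eightAm.plusSeconds(120)); // new tag for Bob
        System.out.println(bob.analysis());
    }
}
```

The existing-tag and new-tag cases go through the same code path, which is why the slides can describe both clicks with the same "adjust Profile" step.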

Page 22: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Profile in Detail
{
  user: "Bob",
  personalData: { zip: "10024", gender: "M" },
  tags: {
    PETS: {
      algo: "A4",
      baseline: [0, 0, 10, 4, 1322, 44, 23, … ],
      hist: [
        { ts: datetime1, url: url1 },
        { ts: datetime2, url: url2 }
        // 100 more
      ]
    },
    SPICE: {
      hist: [
        { ts: datetime3, url: url3 }
      ]
    }
  }
}

Page 23: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Tag-based algorithm detail
getRecommendedContent(profile, ["PETS", other]) {
    if algo for a tag available {
        filter = algo(profile, tag);
    }
    fetch N recommendations (filter);
}

A4(profile, tag) {
    weight = get tag ("PETS") global weighting;
    adjustForPersonalBaseline(weight, "PETS" baseline);
    if "PETS" clicked more than 2 times in past 10 mins then weight += 10;
    if "PETS" clicked more than 10 times in past 2 days then weight += 3;
    return new filter({"PETS", weight}, globals)
}
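The A4 rules above could be rendered in Java roughly as follows. The thresholds, windows, and boosts (+10, +3) come from the slide; everything else is an assumption. In particular the slide does not say how adjustForPersonalBaseline works, so this sketch simply takes a precomputed baseline adjustment and adds it, and the names A4Sketch, clicksWithin, and weight are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical rendering of the A4 weighting rules for a single tag.
final class A4Sketch {
    // Count clicks that fall inside the window ending at 'now'.
    static long clicksWithin(List<Instant> hist, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return hist.stream()
                .filter(ts -> ts.isAfter(cutoff) && !ts.isAfter(now))
                .count();
    }

    // Start from the global weight plus an (assumed additive) personal baseline
    // adjustment, then apply the two recency boosts from the slide.
    static int weight(int globalWeight, int baselineAdjustment,
                      List<Instant> hist, Instant now) {
        int weight = globalWeight + baselineAdjustment;
        if (clicksWithin(hist, now, Duration.ofMinutes(10)) > 2) {
            weight += 10;   // more than 2 clicks in the past 10 minutes
        }
        if (clicksWithin(hist, now, Duration.ofDays(2)) > 10) {
            weight += 3;    // more than 10 clicks in the past 2 days
        }
        return weight;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-01-06T13:00:00Z");
        List<Instant> petsHist = Arrays.asList(
                now.minusSeconds(60), now.minusSeconds(120), now.minusSeconds(180));
        // Three clicks in the last ten minutes trigger only the short-window boost.
        System.out.println(weight(5, 0, petsHist, now)); // prints 15
    }
}
```

Because the function reads only the tag's click history and two numbers, it can run in the request path against the Profile already loaded for the user.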

Page 24: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Tuesday, 1AM EST

[Diagram: App(s), mongoDB, Hadoop MapReduce]
• Fetch all user Profiles from mongoDB; load into Hadoop
  • Or skip if using the mongoDB-Hadoop connector!

Page 25: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Tuesday, 1:30AM EST

• Grind through all content data and user Profile data to produce:
  • Tags based on feature extraction (vs. creator-applied tags)
  • Trend baseline for Pets and PowerTools and Spice
• Data can be specific to individual or by group
• Load baseline back into mongoDB
  • Or skip if using the mongoDB-Hadoop connector!
[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 26: Big Data: Guidelines and Examples for the Enterprise Decision Maker

New Profile in Detail
{
  user: "Bob",
  personalData: { zip: "10024", gender: "M" },
  tags: {
    PETS: {
      algo: "A4",
      baseline: [0, 0, 10, 4, 1322, 44, 23, … ],
      hist: [
        { ts: datetime1, url: url1 },
        { ts: datetime2, url: url2 }
        // 100 more
      ]
    },
    SPICE: {
      baseline: [0],
      hist: [
        { ts: datetime3, url: url3 }
      ]
    }
  }
}

Page 27: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Tuesday, 1:35AM EST

• Perform maintenance on user Profiles
  • Click history trimming (variety of algorithms)
  • “Dead tag” removal
  • Update of auxiliary reference data
[Diagram: App(s), mongoDB, Hadoop MapReduce]
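Two of the maintenance steps above can be sketched in Java as follows. The slide mentions a variety of trimming algorithms without naming one, so "keep the N most recent entries" is an assumption, as is treating a tag with an empty history as "dead". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical nightly maintenance helpers for a Profile's tag histories.
final class ProfileMaintenanceSketch {
    // Keep only the 'keep' most recent entries; assumes hist is ordered oldest-first.
    static <T> List<T> trimHistory(List<T> hist, int keep) {
        int from = Math.max(0, hist.size() - keep);
        return new ArrayList<>(hist.subList(from, hist.size()));
    }

    // Drop tags whose history is empty ("dead" under this sketch's definition).
    static <T> void removeDeadTags(Map<String, List<T>> tags) {
        tags.values().removeIf(List::isEmpty);
    }

    public static void main(String[] args) {
        Map<String, List<String>> tags = new HashMap<>();
        tags.put("PETS", trimHistory(Arrays.asList("url1", "url2", "url3"), 2));
        tags.put("POWERTOOLS", new ArrayList<>());
        removeDeadTags(tags);
        System.out.println(tags.keySet());
    }
}
```

Running this kind of cleanup in the batch window keeps Profile documents bounded in size, so the realtime click path stays cheap.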

Page 28: Big Data: Guidelines and Examples for the Enterprise Decision Maker

New Profile in Detail
{
  user: "Bob",
  personalData: { zip: "10022", gender: "M" },
  tags: {
    PETS: {
      algo: "A4",
      baseline: [ 1322, 44, 23, … ],
      hist: [
        { ts: datetime1, url: url1 }
        // 50 more
      ]
    },
    SPICE: {
      algo: "Z1",
      baseline: [0],
      hist: [
        { ts: datetime3, url: url3 }
      ]
    }
  }
}

Page 29: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Feel free to run the baselining more frequently

… but avoid “Are We There Yet?”

[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 30: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Nearterm / Realtime Questions & Actions

With respect to the Customer:
• What has Bob done over the past 24 hours?
• Given an input, make a logic decision in 100ms or less

With respect to the Provider:
• What are all current users doing or looking at?
• Can we nearterm correlate single events to shifts in behavior?

Page 31: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Longterm / Not Realtime Questions & Actions

With respect to the Customer:
• Any way to explain historic performance / actions?
• What are recommendations for the future?

With respect to the Provider:
• Can we correlate multiple events from multiple sources over a long period of time to identify trends?
• What is my entire customer base doing over 2 years?
• Show me a time vs. aggregate tag hit chart
• Slice and dice and aggregate tags vs. XYZ
• What tags are trending up or down?

Page 32: Big Data: Guidelines and Examples for the Enterprise Decision Maker

The Key To Success: It is One System

[Diagram: App(s), mongoDB, Hadoop MapReduce]

Page 33: Big Data: Guidelines and Examples for the Enterprise Decision Maker

Webex Q&A