HP Discover: Real Time Insights from Big Data

Slides from HP Discover Europe, 10-12 December 2013, covering systems architecture, use cases, and real time interactive visualization.

Transcript of HP Discover: Real Time Insights from Big Data

Page 1: HP Discover: Real Time Insights from Big Data
Page 2: HP Discover: Real Time Insights from Big Data

Billions of Rows, Millions of Insights

Right Now

Developing a Landscape for Real Time Information

Page 3: HP Discover: Real Time Insights from Big Data

Spil Games: A leader in online gaming

• 180 million monthly and 12 million daily players

• >50 websites, local in 15 languages

• A rich source of data about traffic, content, and consumers

• Battling changing consumer expectations on content delivery (the Netflix effect)

Page 4: HP Discover: Real Time Insights from Big Data

Big data created big paradigm shifts

• Highly consistent
• Highly connectable
• Inflexible
• Slow

• Open
• Adaptive/Evolving
• Inconsistent

You always need both

Traditionally, we define data based on what we expect

With big data, we capture first and define later

Capture

Explore

Define

Apply + Track
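The capture, explore, define sequence on this slide can be sketched in a few lines. This is a minimal illustration, not Spil Games' pipeline: the event fields, the `raw_log` list, and the helper names `explore` and `define` are all invented for the example.

```python
import json

# Capture: keep raw events exactly as they arrive, with no schema enforced.
# These events and their fields are hypothetical.
raw_log = [
    '{"event": "game_start", "user": 1, "game": "chess"}',
    '{"event": "game_end", "user": 1, "game": "chess", "score": 120}',
    '{"event": "page_view", "user": 2, "url": "/home", "lang": "nl"}',
]


def explore(lines):
    """Explore: count how often each field occurs in the captured data."""
    counts = {}
    for line in lines:
        for key in json.loads(line):
            counts[key] = counts.get(key, 0) + 1
    return counts


def define(lines, columns):
    """Define: project raw events onto a schema chosen after the fact.

    Fields that an event does not carry become None.
    """
    return [
        tuple(json.loads(line).get(col) for col in columns)
        for line in lines
    ]


field_counts = explore(raw_log)
rows = define(raw_log, ["event", "user", "game"])
```

Because the raw log is never altered, a different set of columns can be defined later without re-capturing anything, which is the flexibility the slide contrasts with defining data up front.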

Page 5: HP Discover: Real Time Insights from Big Data

VELOCITY

VARIETY

VERACITY

What is big data?

VALUE: The Only V that Matters

Big data also brings new challenges: the four Vs

Page 6: HP Discover: Real Time Insights from Big Data

Velocity: What is real time?

Traditional ETL

• Once a day
• Once a week
• Delayed

“Real Time”

• Faster than human perception
• <200 milliseconds

“In Time”

Information is available fast enough to influence decisions

• While in the shop/on the site (minutes)
• While the query runs (seconds)
• While the page loads (milliseconds)

The Velocity Continuum
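One way to read the distinction on this slide: "real time" is a fixed latency threshold, while "in time" depends on which decision the information has to reach. The sketch below encodes that reading. The 0.2 second limit is the slide's "<200 milliseconds"; the concrete decision windows are assumptions, since the slide only gives orders of magnitude.

```python
REAL_TIME_LIMIT_S = 0.2  # the slide's "<200 milliseconds"

# Decision windows in seconds. The slide only says minutes, seconds and
# milliseconds; the exact values here are assumptions for illustration.
DECISION_WINDOWS_S = {
    "on_site": 5 * 60,   # while in the shop / on the site
    "query": 5.0,        # while the query runs
    "page_load": 0.5,    # while the page loads
}


def is_real_time(latency_s):
    """True if the latency is below the fixed real-time threshold."""
    return latency_s < REAL_TIME_LIMIT_S


def is_in_time(latency_s, decision):
    """True if the information arrives within the decision's window."""
    return latency_s <= DECISION_WINDOWS_S[decision]
```

Under these assumed windows, data that is 30 seconds old is in time for an on-site decision but too late to influence a running query, even though it is nowhere near real time.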

Page 7: HP Discover: Real Time Insights from Big Data

How big data drives value at Spil

Informing Decisions

Making Decisions

• Day to day business reporting

• Analytical reporting for self-service analysis

• Business analytics for advising decisions

• Descriptive models to explain our business

• Customer Lifetime Value
• Marketing ROI

• Customer content recommendations

• Email campaign targeting

• Site learning and optimization

• System monitoring and alerting

Page 8: HP Discover: Real Time Insights from Big Data

Unstructured data intake

Unstructured data storage

Structured data storage

Human interface layer

Predictive analytics tools

SELECT A, B, SUM(C) FROM X GROUP BY 1, 2

• High Query Performance
• Denormalized
• Scalable; high concurrency

• Cheap
• Flexible Schema
• Easy Management

• Scalable
• Schemaless or adaptive schema
• Resilient

• Highly Flexible
• Simple to use
• In-tool metadata

• Not memory constrained
• Flexible inputs/outputs
• Easy iteration

The pieces needed for a big data stack
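The query shown on this slide is the typical shape of work a structured store has to serve: group a wide, denormalized table by a few dimensions and aggregate a measure. The sketch below runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in so the example is runnable, and the table contents are invented. The slide's `GROUP BY 1, 2` refers to the first two selected columns by position; the sketch names them explicitly.

```python
import sqlite3

# In-memory stand-in for a structured store; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a TEXT, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [
        ("nl", "chess", 3),
        ("nl", "chess", 4),
        ("nl", "pool", 1),
        ("de", "chess", 5),
    ],
)

# Same shape as the slide's query: two dimensions, one summed measure.
result = conn.execute(
    "SELECT a, b, SUM(c) FROM x GROUP BY a, b ORDER BY a, b"
).fetchall()
conn.close()
```

Each output row is one (a, b) combination with its total, which is the form dashboards and self-service tools in the human interface layer usually consume.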

Page 9: HP Discover: Real Time Insights from Big Data

The nuts and bolts of our big data tech

Page 10: HP Discover: Real Time Insights from Big Data

Why we chose our tech

• Affordable
• Highly available and resilient

• Extremely fast development due to SQL
• Excellent query performance = lazy optimization

• Right price
• Easy (and fun!) development
• Excellent library availability

• Industry standard for Map/Reduce
• Cheap storage for “data lake”

• Easy integration with existing tech

Page 11: HP Discover: Real Time Insights from Big Data

How much data do we handle?

Ingestion

Through Map/Reduce: 1.4 Billion Events/Day (200 Million Rows/Day into DWH)

Through ETL: 100-200 Million Rows/Day into DWH

Persistence

Map/Reduce: 20 Billion Rows

Vertica: 50 Billion Rows

Long Term Storage: All of 2013 Events

Usage

Predictive models: >500 million scores per day

ETLs to Production DBs: >10 Models

Reporting: 150 Dashboards, 80 data sources

Queries: >2000 per day
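To put the daily figures on this slide in perspective, they can be converted to average per-second rates. The arithmetic below uses only numbers from the slide. It assumes load is spread evenly over the day, which real traffic is not, so actual peaks would be higher; the figures marked ">" on the slide are treated as lower bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes taken from the slide.
daily_volumes = {
    "map_reduce_events": 1_400_000_000,
    "predictive_scores": 500_000_000,  # slide says ">500 million"
    "queries": 2_000,                  # slide says ">2000"
}

# Average rate per second, assuming a perfectly even load (an assumption).
per_second = {
    name: count / SECONDS_PER_DAY for name, count in daily_volumes.items()
}

for name, rate in per_second.items():
    print(f"{name}: {rate:,.2f} per second")
```

On average this works out to roughly 16,000 events and at least 5,800 model scores every second, against about one query every 43 seconds.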

Page 12: HP Discover: Real Time Insights from Big Data

What it drives for us every day

Demographic Prediction

Multivariate Testing/Site Optimization

Page 13: HP Discover: Real Time Insights from Big Data

Q&A + Demo