2011 x.commerce Innovate Data Alchemy


The New Alchemy: Turning Data into Gold

Developers are leading the charge to turn consumer behavior into profitable solutions. By accessing and analyzing the explosion of data from consumer activities, any developer can create the personalized, relevant products and services that customers demand and merchants urgently need. We will discuss how to acquire, store, and mine information, and how to design analytics-focused software and build data-driven software engines.

Transcript of 2011 x.commerce Innovate Data Alchemy

Page 1: 2011 x.commerce Innovate Data Alchemy


Page 2: 2011 x.commerce Innovate Data Alchemy

Every Second – in over 50,000 Categories

Page 3: 2011 x.commerce Innovate Data Alchemy

eBay Analytics


>50 TB/day new data

>150 PB/day

>100 Trillion pairs of information

Millions of queries/day

>7500 business users & analysts

>50k chains of logic

24x7x365 99.98+% Availability

turning over a TB every second

Structured/Unstructured

Near-Real-time

>100k data elements

Always online

Processed

Page 4: 2011 x.commerce Innovate Data Alchemy

Big

Page 5: 2011 x.commerce Innovate Data Alchemy

Detail

Page 6: 2011 x.commerce Innovate Data Alchemy

Designing for the Unknown
•  >85% of analytical workload is NEW & Unknown
•  The metrics you know are cheap
•  The metrics you don’t know are expensive – but high in potential ROI
•  Exploration & Testing are core pillars of an analytics-driven organization

Page 7: 2011 x.commerce Innovate Data Alchemy

incremental storage

Volume

DATA

Page 8: 2011 x.commerce Innovate Data Alchemy

processing

change

incremental storage

Volume

Velocity DATA

Page 9: 2011 x.commerce Innovate Data Alchemy

structured

incremental storage

processing

change

Volume

Variety Velocity DATA

semi-structured un-structured

Page 10: 2011 x.commerce Innovate Data Alchemy

www.wallpapertimes.com

$’s per year in incremental revenue

Value > Cost

Page 11: 2011 x.commerce Innovate Data Alchemy

•  Data Growing Faster

Page 12: 2011 x.commerce Innovate Data Alchemy
Page 13: 2011 x.commerce Innovate Data Alchemy

•  Impact

Page 14: 2011 x.commerce Innovate Data Alchemy
Page 15: 2011 x.commerce Innovate Data Alchemy
Page 16: 2011 x.commerce Innovate Data Alchemy


questions later

structure later

($0.04/GB, $80/2TB)

single HDFS instances >50PB

Value > Cost

Data
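The two disk-price figures on this slide are the same number expressed two ways, and they give a feel for why "store now, ask questions later" can satisfy Value > Cost. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 1,000 GB) and counts raw drive price only, ignoring replication, servers, power, and operations, none of which the slide quantifies. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the slide's disk-price figures.
# Assumption: decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB).
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000


def price_per_gb(drive_price_usd: float, drive_size_tb: float) -> float:
    """Price of one gigabyte given the price and size of a single drive."""
    return drive_price_usd / (drive_size_tb * GB_PER_TB)


def raw_disk_cost(petabytes: float, usd_per_gb: float) -> float:
    """Raw drive cost only; replication and hardware overhead are excluded."""
    return petabytes * GB_PER_PB * usd_per_gb


per_gb = price_per_gb(80, 2)  # the slide's "$80/2TB"
print(f"${per_gb:.2f}/GB")
print(f"${raw_disk_cost(50, per_gb):,.0f} in raw drives for 50 PB")
```

Real cluster cost would be a multiple of the raw-drive figure once copies and machines are included, so treat the result as a lower bound.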

Page 17: 2011 x.commerce Innovate Data Alchemy
Page 18: 2011 x.commerce Innovate Data Alchemy

Synonyms derived from top queries in item query clusters
•  texas instruments ba ii plus / ba ii plus
•  brighton handbag / brighton purse
•  lenovo x200 / thinkpad x200
•  king bedspread / king coverlet
•  rockabilly dress / swing dress
•  1963 ford falcon / 63 falcon
•  jessica simpson hair extensions / jessica simpson hairdo

Abbreviations/acronyms derived from query transitions
•  stanford ky / stanford kentucky
•  dc sub / dc subwoofer
•  snowboard helmet l / snowboard helmet large
•  motorcycle cam / motorcycle camera
•  diamond amp / diamond amplifier
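The slide does not say how the abbreviation pairs were extracted, so the following is only one plausible sketch of mining them from query transitions (a user's query followed by their rewritten query). It assumes a transition is a candidate when exactly one token changes and the shorter token is an in-order subsequence of the longer one with the same first letter, which covers both "sub" / "subwoofer" and "ky" / "kentucky". The function names, the `min_count` threshold, and the sample log are all hypothetical.

```python
from collections import Counter


def is_subsequence(short: str, long_: str) -> bool:
    """True if the characters of `short` appear in `long_` in order."""
    remaining = iter(long_)
    return all(ch in remaining for ch in short)


def abbreviation_candidate(before: str, after: str):
    """Return (short, long) if exactly one token was expanded, else None."""
    a, b = before.split(), after.split()
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return None
    short, long_ = diffs[0]
    if (len(short) < len(long_)
            and short[0] == long_[0]
            and is_subsequence(short, long_)):
        return short, long_
    return None


def mine_abbreviations(transitions, min_count=2):
    """Keep candidate pairs seen at least `min_count` times in the log."""
    counts = Counter()
    for before, after in transitions:
        cand = abbreviation_candidate(before, after)
        if cand:
            counts[cand] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical transition log built from the slide's examples.
log = [
    ("dc sub", "dc subwoofer"),
    ("dc sub", "dc subwoofer"),
    ("stanford ky", "stanford kentucky"),
    ("king bedspread", "king coverlet"),  # a synonym, not an abbreviation
]
print(mine_abbreviations(log))
```

The frequency threshold is the design choice that matters here: a single transition proves little, while a rewrite repeated by many users is much stronger evidence.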

Page 19: 2011 x.commerce Innovate Data Alchemy

Toys and Hobbies
ATC > Artist trading card in Art
ATC > Automatic Tool Change in Business and Industrial
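The point of this slide is that the same acronym expands differently depending on the category being searched. A minimal sketch of that idea is a lookup keyed by (category, token) rather than by token alone. The table below contains only the two expansions that are legible on the slide; the function name and the sample queries are hypothetical, and a production system would presumably learn such a table from data rather than hard-code it.

```python
# Category-scoped acronym table: the key is (category, lowercase token).
ACRONYMS = {
    ("Art", "atc"): "artist trading card",
    ("Business and Industrial", "atc"): "automatic tool change",
}


def expand_query(query: str, category: str) -> str:
    """Expand known acronyms for this category; leave other tokens as-is."""
    tokens = query.lower().split()
    return " ".join(ACRONYMS.get((category, t), t) for t in tokens)


print(expand_query("ATC holder", "Art"))
print(expand_query("ATC spindle", "Business and Industrial"))
print(expand_query("ATC", "Books"))  # no entry for this category: unchanged
```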

Page 20: 2011 x.commerce Innovate Data Alchemy
Page 21: 2011 x.commerce Innovate Data Alchemy
Page 22: 2011 x.commerce Innovate Data Alchemy

Service

Offline Online

Editorial

Big Data Store NoSQL

Small Data

Code

Clients

Search

Selling

Others…

Behavioral Logs

Document Data

<3 milliseconds per query
1.2 billion queries per day
1,000’s of queries per second per machine

Human Judgment
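The three service figures on this slide can be related with simple arithmetic. The sketch below is not from the talk: it assumes traffic is spread evenly over the day (real traffic peaks well above the average) and takes 3 ms as the per-query upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 1_200_000_000  # "1.2 billion queries per day"
latency_s = 0.003                # "<3 milliseconds per query", upper bound

# Average load across the whole service, assuming evenly spread traffic.
avg_qps = queries_per_day / SECONDS_PER_DAY

# Throughput of one worker handling queries strictly one at a time.
serial_qps_per_worker = 1 / latency_s

print(f"average load: {avg_qps:,.0f} queries/second")
print(f"one serial worker at 3 ms: {serial_qps_per_worker:,.0f} queries/second")
```

Under these assumptions, reaching thousands of queries per second on one machine implies several queries in flight at once, since a single serial worker at 3 ms per query tops out in the hundreds.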

Page 23: 2011 x.commerce Innovate Data Alchemy
Page 24: 2011 x.commerce Innovate Data Alchemy

German Compound Words
•  German compound words can be arbitrarily created and extremely long
   Adidastrainingsanzug (Adidas track suit)
   Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz (beef labeling regulation & delegation of supervision law)
•  Syntactically, words can be combined and split in many ways.
•  Some words shouldn’t be de-compounded.
   beiden (both) – bei (at) den (the)
•  Too many candidates for
   Granitpflastersteine (granite paving stones)
   Granit (granite) pflastersteine (cobblestones)
   Granit (granite) pflaster (paving/band-aid) steine (stones)
•  Binding characters
   Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de)
   Hochzeitschuhe (129 hits on ebay.de)
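The problems listed on this slide (several valid splits, words that must stay whole, and optional binding characters) can be illustrated with a small dictionary-based decompounder. This is a sketch under stated assumptions, not the method used at eBay: the toy lexicon, the `keep_whole` stop list, and the single binding character "s" are all hypothetical, and choosing among the candidates (for example by hit counts) is deliberately left out.

```python
def split_candidates(word, lexicon, linkers=("s",), keep_whole=frozenset()):
    """Enumerate every segmentation of `word` into two or more lexicon
    entries, optionally dropping a binding character between parts."""
    word = word.lower()
    if word in keep_whole:
        return [[word]]  # protected word: never de-compound
    results = []

    def walk(rest, parts):
        if not rest:
            if len(parts) > 1:
                results.append(parts)
            return
        for end in range(1, len(rest) + 1):
            head, tail = rest[:end], rest[end:]
            if head in lexicon:
                walk(tail, parts + [head])
            # Treat a trailing linker as glue, e.g. hochzeit + s + schuhe.
            for link in linkers:
                stem = head[:-len(link)]
                if tail and head.endswith(link) and stem in lexicon:
                    walk(tail, parts + [stem])

    walk(word, [])
    return results


# Toy lexicon built only from words that appear on the slide.
LEXICON = {"granit", "pflaster", "steine", "pflastersteine",
           "hochzeit", "schuhe", "bei", "den"}

print(split_candidates("Granitpflastersteine", LEXICON))
print(split_candidates("Hochzeitsschuhe", LEXICON))
print(split_candidates("Hochzeitschuhe", LEXICON))
print(split_candidates("beiden", LEXICON, keep_whole={"beiden"}))
```

With this lexicon, Granitpflastersteine yields both splits shown on the slide, and the two spellings of the wedding-shoes compound reduce to the same parts, which is what lets a search engine treat them as one query. Without the stop list, beiden would be wrongly split into bei and den.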

Page 25: 2011 x.commerce Innovate Data Alchemy

Data Warehouse + Behavioral

Data Warehouse

Semi-Structured SQL++

Structured SQL

Low End Enterprise-class System

Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets

Production Data Warehousing Large Concurrent User-base

Enterprise-class System

Unstructured Java/C++/Pig/Hive

Structure the Unstructured Detect Patterns

Commodity Hardware System

8+PB 60+PB 40+PB

Hadoop

Analyze & Report

Discover & Explore

Page 26: 2011 x.commerce Innovate Data Alchemy
Page 27: 2011 x.commerce Innovate Data Alchemy

Brian knows the satisfaction and importance of good search results, and his team is responsible for ensuring that the millions of queries entered on the eBay website provide just that. The words “Did you mean…?” are incredibly meaningful to Brian as he combs through a universe of queries altered by synonyms, acronyms, attributes, and expansions. He’s been doing this sort of work since he joined eBay nine years ago. Brian has loved technology ever since junior high school, when he played the game “Lunar Lander” on a paper teletype before video games existed, and pulled pranks in the local Radio Shack. When Brian gets outside, he goes backpacking on Mount Whitney, enters triathlons, and walks on water (barefoot water skiing).

Page 28: 2011 x.commerce Innovate Data Alchemy
Page 29: 2011 x.commerce Innovate Data Alchemy