Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics @ Wish Powered by Fluentd & MongoDB


Transcript of Real Time Data Analytics with MongoDB and Fluentd at Wish

Page 1: Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics @ Wish

Powered by Fluentd & MongoDB

Page 2: Real Time Data Analytics with MongoDB and Fluentd at Wish

Hi

I’m Adam.

Page 3: Real Time Data Analytics with MongoDB and Fluentd at Wish

Wish ♥︎ MongoDB

• Primary database since 2011

• 67x mongod

• AWS → bare metal (SSDs ftw!)

Page 4: Real Time Data Analytics with MongoDB and Fluentd at Wish

What’s Wish?

• Mobile eCommerce

• 30M+ users worldwide

• Top 10 iOS & Android

Page 5: Real Time Data Analytics with MongoDB and Fluentd at Wish

Experiment

’cause otherwise you’re just guessing…

Page 6: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 7: Real Time Data Analytics with MongoDB and Fluentd at Wish

Hypothesis

“Billing Zip” is confusing outside America

Page 8: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 9: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 10: Real Time Data Analytics with MongoDB and Fluentd at Wish

Data

Compare checkout conversions for international Android users

Page 11: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 12: Real Time Data Analytics with MongoDB and Fluentd at Wish

Conclusion

~7% boost in mobile sales
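A rough sketch of the comparison behind an experiment like this one: compute the checkout conversion rate per bucket, then the relative lift of the variant over control. The function names and the counts are made up for illustration; they are not Wish's data or code.

```python
def conversion_rate(checkouts, purchases):
    """Fraction of checkout sessions that ended in a purchase."""
    if checkouts == 0:
        return 0.0  # avoid division by zero for empty buckets
    return purchases / checkouts


def relative_lift(control_rate, variant_rate):
    """Relative change of the variant over control, e.g. 0.07 for +7%."""
    if control_rate == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (variant_rate - control_rate) / control_rate


# Hypothetical counts, for illustration only (not Wish's data).
control = conversion_rate(checkouts=10_000, purchases=1_000)
variant = conversion_rate(checkouts=10_000, purchases=1_070)
lift = relative_lift(control, variant)
print(f"control={control:.3f} variant={variant:.3f} lift={lift:+.1%}")
```

A real report would also need a significance test before calling a winner; that part is left out here.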

Page 13: Real Time Data Analytics with MongoDB and Fluentd at Wish

Goal

Frictionless analytics to everyone

Page 14: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 15: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 16: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 17: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 18: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 19: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 20: Real Time Data Analytics with MongoDB and Fluentd at Wish

Request Logs = Source of Truth

Page 21: Real Time Data Analytics with MongoDB and Fluentd at Wish

{'contest_impressions’:'53060fbd34067e4d6cee70f4,535ad13a7360465e2ca799f8,528b714df689996fdb574800,525976a71c23882ab3b73ecb,5285df6db5baba737f459037,5208ae7d3deaf74a6cc65da4,5209e5c31c238861a1ab91cc,5285df6db5baba735f459061,51f7778f3ba3770a514a5431,527be1fc227d210d2bcdeac5,532fcfe3796f6832713b5c3a,527be203227d210dd5cdeaac,52d3ef2806ea960dde85cb97,527bc781227d210d8acdea47,527bc793227d210d4fcdea48, 5208ad653deaf74a4bc65d41,5208acdd1c238846f9ab9028,5182fc1273c67621e507591b,5311ae6c796f68283f8f86c3, 52de2bf4ab980a2d00da786a,5208a9c53deaf74a75c65c6b,52eca45a717951350382e4be,52d3ef73bb5aa51ccf866c01, 533d6fae5aefb0427771f346,5285df6db5baba734d45901b,51c27d8d5ffe8f0b0b9b0359,52d0e002a30fb227725b6e06, 52f71bd89f5ef741d8f34698,52d3ef71bb5aa53135866d76, 5308bc467360464265101ed9,52d3ef27bb5aa5024d866c09, 52c399d60599170e49fd866e,5209be541c23886177ab91db,5208b15e1c2388615fab91b7', '_country_code': u'CA', '_lang': u'en', '_fb_uid': 500406911, '_device_id': None, '_uid': '4eb346049b120f09f60007c0', '_tid': 2, '_host': 'adam.corp.contextlogic.com', '_last_id': u'cc3aa96b2b3c45bca11009edc049f2f6', '_experiment_tags': ['mobile_commerce_home_v4_female_ignore', 'mobile_large_cart_cell_ignore', 'hannibal_cohort_firsttime_buyer_ignore', 'localize_product_names__fr_ignore', 'mobile_cart_guarantee_view_ignore', 'mobile_related_tags_v2_ignore', 'shipping_price_us_ignore', 'stripe_settle_on_ship_control', 'related_super_feed_iphone_show-v4', 'mobile_commerce_home_v3_male_i18n_show', 'braintree_settle_on_ship_control', 'mobile_show_tabbed_billing_page_i18n_ignore', 'mobile_new_guarantee_text_ios_ignore', 'mobile_use_category_signup_flow_i18n_ignore', 'male_curated_first_ipad_ignore', 'mobile_commerce_home_v4_female_i18n_ignore', 'commerce_product_page_show', 'mobile_use_category_signup_flow_v3_ignore', 'mobile_save_for_price_us_female_relaunch_2_ignore', 'web_stripe_checkout_ignore', 'mobile_show_tabbed_billing_page_us_ignore', 'stripe_checkout_show', 
'shipping_price_i18n_fixed-price-promo', 'chukou1_pilot_experiment_ignore', 'mobile_implicit_ratings_v1_show', 'feed_commerce_2_control', 'mobile_commerce_home_v3_male_ignore', 'swap_out_male_feed_show-weight-deep', 'related_super_feed_ipad_ignore', 'female_curated_first_iphone_ignore', 'mobile_psuedo_localized_currency_show', 'hannibal_cohort_repeat_buyer_ignore', 'web_boleto_checkout_ignore', 'exploration_v2_control', 'female_curated_first_android_ignore', 'male_curated_first_android_ignore', 'related_super_feed_android_show-v4', 'curated_feed_female_shopping_ignore', 'mobile_localized_currency_control', 'male_curated_first_iphone_ignore', 'mobile_show_required_shipping_fields_ignore', 'mobile_ct2_variable_shipping_price_showcountry', 'mobile_c2c_ignore', 'localize_product_names__es_ignore', 'related_products_v2_control', 'female_curated_first_ipad_ignore', 'mobile_categories_v1_ignore', 'related_super_feed_show', 'mobile_baby_category_signup_flow_ignore', 'mobile_checkout_offer_v2_control', 'mobile_minimum_notification_interval_ignore', 'mobile_show_tabbed_feed_existing_user_ignore', 'mobile_cart_fake_only_x_left_show', 'late_shipment_apology_v2_ignore', 'mobile_show_tabbed_feed_new_user_ignore'], '_app_type': 0, 'impression_feed_category': None, '_client': 'web', '_refer_url': None, 'sort': 'recommended', '_user_agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36', '_arguments': {}, '_currency': 'CAD', '_protocol': 'http', 'offset': 0, '_method': 'GET', 'count': 40, '_locale': 'en', '_timestamp': 1401996333, '_bsid': '979b5fbcad4f4fdbb1477ae7ba8ed123', '_is_cached': False, '_version': None, '_response_status': 200, 'filter': 'all', '_response_time': 0.2887430191040039, '_uri': '/', '_remote_ip': None, '_is_user_pending': False, '_id': '1e6135e3d2eb4214afdbd99456d71183'}

A feed request…

Page 22: Real Time Data Analytics with MongoDB and Fluentd at Wish

{'products_shown': '...','feed_category': null,'sort': 'recommended','filter': 'all','offset': 0,'count': 40,

Page 23: Real Time Data Analytics with MongoDB and Fluentd at Wish

'_uid': '4eb34609ff60007c0', '_client': 'web','_country_code': 'CA',

'_id': '1e6135e3d9456d7183','_last_id': 'cc39edc49f2f6','_experiment_tags': [...],

Page 24: Real Time Data Analytics with MongoDB and Fluentd at Wish

'_uri': '/','_refer_url': null,'_arguments': {},'_method': 'GET','_locale': 'en','_response_status': 200}
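The experiment tags in the sample feed request look like an experiment name followed by a bucket (for example `stripe_settle_on_ship_control`). Assuming the bucket is whatever follows the last underscore, which is an inference from the sample and not something the slides state, a request record can be turned into an experiment-to-bucket map roughly like this. The helper names are hypothetical.

```python
def split_experiment_tag(tag):
    """Split 'name_bucket' on the last underscore into (name, bucket)."""
    name, _, bucket = tag.rpartition("_")
    return name, bucket


def buckets_for(record):
    """Map experiment name -> bucket for one request log record."""
    return dict(split_experiment_tag(t) for t in record.get("_experiment_tags", []))


# A trimmed-down record using tags that appear in the feed request above.
record = {
    "_uid": "4eb346049b120f09f60007c0",
    "_client": "web",
    "_country_code": "CA",
    "_experiment_tags": [
        "stripe_settle_on_ship_control",
        "related_super_feed_iphone_show-v4",
        "shipping_price_i18n_fixed-price-promo",
        "mobile_large_cart_cell_ignore",
    ],
}

buckets = buckets_for(record)
print(buckets["stripe_settle_on_ship"])
```

Because every request carries its experiment buckets, any logged action can later be grouped by bucket without a join.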

Page 25: Real Time Data Analytics with MongoDB and Fluentd at Wish

One problem

Searching all requests ever is slow

Page 26: Real Time Data Analytics with MongoDB and Fluentd at Wish

Transaction Log

{'txn_id': '5390c295e9b9bbe68b2', 'user_id': '4eb346049b9f60007c0',

'total': 18.0, 'shipping': 2.0,

'items': [{ 'product_id': '537b42379b9e3f55f', 'qty': 1, 'price': 16.0 }] }

Page 27: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 28: Real Time Data Analytics with MongoDB and Fluentd at Wish

Centralize Logs

• Synchronously?

• Fire & forget?

• fluentd!
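The trade-off on this slide, sketched in code: a synchronous write puts the log store on the request path, pure fire and forget can lose events silently, and a local agent sits in between by buffering events and shipping them in batches. The class below is a hypothetical in-process stand-in for that idea, not Fluentd's implementation or client API; the tag `wish.request` is also made up.

```python
import json
import time
from collections import deque


class BufferedLogger:
    """Queue events in memory and ship them in batches (hypothetical helper)."""

    def __init__(self, max_buffered=1000):
        # Oldest events are dropped first if the buffer overflows.
        self._buffer = deque(maxlen=max_buffered)

    def emit(self, tag, record):
        """Called on the request path: O(1), never touches the network."""
        self._buffer.append((int(time.time()), tag, record))

    def flush(self, send):
        """Called off the request path; `send` ships one serialized batch."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        send(json.dumps(batch))
        # Clear only after `send` returned without raising.
        self._buffer.clear()
        return len(batch)


shipped = []  # in-memory stand-in for the aggregation server
logger = BufferedLogger()
logger.emit("wish.request", {"_uri": "/", "_method": "GET", "_response_status": 200})
logger.emit("wish.request", {"_uri": "/", "_method": "GET", "_response_status": 200})
flushed = logger.flush(shipped.append)
print(flushed, len(shipped))
```

If `send` raises, the buffer is left intact, so the same batch can be retried on the next flush.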

Page 29: Real Time Data Analytics with MongoDB and Fluentd at Wish

Architecture

App server: Wish, fluentd

Aggregation server: fluentd

Aggregation server: fluentd

Hadoop/Hive

Page 30: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 31: Real Time Data Analytics with MongoDB and Fluentd at Wish

Hadoop & Hive

• Great for log analysis

• Arbitrary queries

• No schema design constraints

Page 32: Real Time Data Analytics with MongoDB and Fluentd at Wish

Hadoop & Hive

• Running a Hadoop cluster sucks

– TreasureData’s managed Hive solution rocks!

Page 33: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“solution”: [“logging”, “aggregation”, “analysis”, “serving”]}

Page 34: Real Time Data Analytics with MongoDB and Fluentd at Wish

MongoDB!

• Analysis results → MongoDB

• Store all combinations

– Unsexy, but fast

– 2 TB total

Page 35: Real Time Data Analytics with MongoDB and Fluentd at Wish

Schema

{"_id": ObjectId(…), "click_id": 2, "source_page_id": 1000, "count": 20171, "timestamp": 20140601,

Page 36: Real Time Data Analytics with MongoDB and Fluentd at Wish

Schema

"gender": "Male", "client": "Android", "country": "CA", "experiment_tag": "zip_help_text-show"}
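One way to read "store all combinations" is to precompute a count for every combination of dimension values, including rollups, so that serving a chart is a single exact-match lookup. The sketch below assumes that reading. The `ALL` wildcard and the function names are hypothetical, and `source_page_id` is left out to keep it short.

```python
from collections import Counter
from itertools import product

DIMENSIONS = ("gender", "client", "country", "experiment_tag")
ALL = "ALL"  # hypothetical wildcard value, not from the talk


def rollup(events):
    """Count events under every combination of concrete-or-ALL dimensions."""
    counts = Counter()
    for event in events:
        # For each dimension, choose either the real value or the wildcard.
        choices = [(event[dim], ALL) for dim in DIMENSIONS]
        for combo in product(*choices):
            key = (event["click_id"], event["timestamp"]) + combo
            counts[key] += 1
    return counts


def to_documents(counts):
    """Turn counters into flat documents shaped like the schema above."""
    docs = []
    for (click_id, timestamp, *dims), count in counts.items():
        doc = {"click_id": click_id, "timestamp": timestamp, "count": count}
        doc.update(zip(DIMENSIONS, dims))
        docs.append(doc)
    return docs


events = [
    {"click_id": 2, "timestamp": 20140601, "gender": "Male",
     "client": "Android", "country": "CA", "experiment_tag": "zip_help_text-show"},
    {"click_id": 2, "timestamp": 20140601, "gender": "Female",
     "client": "Android", "country": "CA", "experiment_tag": "zip_help_text-show"},
]
counts = rollup(events)
docs = to_documents(counts)
print(len(docs))
```

Each event fans out to 2^4 keys here, which is why this approach trades storage for read speed.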

Page 37: Real Time Data Analytics with MongoDB and Fluentd at Wish

Let’s Review

MongoDB

Logs (app servers)

Fluentd

Hadoop/Hive

Page 38: Real Time Data Analytics with MongoDB and Fluentd at Wish

Tools

Who doesn’t love nifty graphs?

Page 39: Real Time Data Analytics with MongoDB and Fluentd at Wish

Dashy

• Graphing dashboard

Page 40: Real Time Data Analytics with MongoDB and Fluentd at Wish

Perimeter

• A/B test reports

– Summary tables, detailed CSVs

– See trade-offs

Page 41: Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics = faster iteration

More growth, more revenue

Page 42: Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics = faster iteration

Powered by Fluentd & MongoDB

Page 43: Real Time Data Analytics with MongoDB and Fluentd at Wish

Happy Analyzing!

[email protected]

Page 44: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 45: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“subtitle”: “Why Fluentd?”}

Page 46: Real Time Data Analytics with MongoDB and Fluentd at Wish

http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext

Page 47: Real Time Data Analytics with MongoDB and Fluentd at Wish

Acquire Data (or so you think)

WUT!? Invalid UTF8?

Fix the encoding issue…

Yell at the engineers

Some columns are missing!?

Run the script… DIVISION BY ZERO!!!

Page 48: Real Time Data Analytics with MongoDB and Fluentd at Wish

Hmm…

Page 49: Real Time Data Analytics with MongoDB and Fluentd at Wish

Logging.priority=> :not_super_high

Page 50: Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics.priority=> :very_high

Page 51: Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics.needs? :logs=> true

Page 52: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 53: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 54: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“subtitle”: “Overview”, “has_code”: true, “has_example”: true}

Page 55: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 56: Real Time Data Analytics with MongoDB and Fluentd at Wish

127.0.0.1 - - [05/Feb/2012:17:11:55 +0000] "GET / HTTP/1.1" 200 140 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.5 Safari/535.19"

Page 57: Real Time Data Analytics with MongoDB and Fluentd at Wish

{ "host": "127.0.0.1", "user": "-", "method": "GET", "path": "/", "code": "200", "size": "140", "referer": "-", "agent": "Mozilla/5.0 (Windows…"}
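The step from the raw Apache line to the structured record can be approximated with a regular expression. This is a sketch only: the pattern below is hand-written for the combined log format and is not the pattern Fluentd's `apache2` parser uses; `parse_access_line` is a hypothetical name.

```python
import re

# Approximation of an Apache combined log line; Fluentd's own apache2
# parser uses its own pattern, which may differ in details.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<code>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_access_line(line):
    """Return (time_string, record) for one log line, or None if it does not match."""
    match = COMBINED.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    return record.pop("time"), record


line = (
    '127.0.0.1 - - [05/Feb/2012:17:11:55 +0000] "GET / HTTP/1.1" 200 140 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 '
    '(KHTML, like Gecko) Chrome/18.0.1025.5 Safari/535.19"'
)
timestamp, record = parse_access_line(line)
print(timestamp, record["method"], record["path"], record["code"])
```

All values stay strings, as in the record on the slide; converting `code` and `size` to integers would be a separate step.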

Page 58: Real Time Data Analytics with MongoDB and Fluentd at Wish
Page 59: Real Time Data Analytics with MongoDB and Fluentd at Wish

Parse as JSON!

Page 60: Real Time Data Analytics with MongoDB and Fluentd at Wish

?

Page 61: Real Time Data Analytics with MongoDB and Fluentd at Wish

["05/Feb/2012:17:11:55", "web.access", { "host": "127.0.0.1", "user": "-", "method": "GET", "path": "/", "code": "200", "size": "140", "referer": "-", "agent": "Mozilla/5.0 (Windows…"}]

Page 62: Real Time Data Analytics with MongoDB and Fluentd at Wish

?

web.mongodb

web.file

web.hdfs

web.s3

web.mysql

Page 63: Real Time Data Analytics with MongoDB and Fluentd at Wish

<source>
  type tail
  path /var/log/apache/access.log
  tag web.access
  format apache2
</source>

Diagram: Apache log → Fluentd

Page 64: Real Time Data Analytics with MongoDB and Fluentd at Wish

<source>
  type tail
  path /var/log/apache/access.log
  tag web.access
  format apache2
</source>

<match web.access>
  type mongo
  user kiyoto
  password heartbleed
  database web
  collection access
  … # host, port, etc.
</match>

Diagram: Apache log → Fluentd → MongoDB

Page 65: Real Time Data Analytics with MongoDB and Fluentd at Wish

<match web.access>
  type copy
  <store>
    type mongo
    user kiyoto
    password heartbleed
    database web
    collection access
    … # host, port, etc.
  </store>
  <store>
    type s3
    … # aws secret, bucket, etc.
  </store>
</match>

Diagram: Apache log → Fluentd → MongoDB, S3
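The routing idea in these configs, sketched in Python: an event's tag is checked against match patterns in order, the first matching rule handles it, and a copy-style rule fans the same record out to several stores. The `Router` class and the in-memory stores are hypothetical stand-ins; the matcher only models `*` as a single tag part (as in `mongostat.*.*`) and ignores Fluentd's other pattern forms.

```python
def tag_matches(pattern, tag):
    """True if `tag` matches `pattern`, where '*' stands for one tag part."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))


class Router:
    """First matching rule wins; a rule may fan out to several stores."""

    def __init__(self):
        self._rules = []  # list of (pattern, [store callables])

    def match(self, pattern, *stores):
        self._rules.append((pattern, list(stores)))

    def emit(self, tag, record):
        for pattern, stores in self._rules:
            if tag_matches(pattern, tag):
                for store in stores:  # the "copy" behaviour
                    store(record)
                return True
        return False  # no rule matched; the event is not delivered


mongo_docs, s3_objects = [], []  # in-memory stand-ins for real outputs
router = Router()
router.match("web.access", mongo_docs.append, s3_objects.append)

delivered = router.emit("web.access", {"path": "/", "code": "200"})
ignored = router.emit("web.error", {"path": "/", "code": "500"})
print(delivered, ignored, len(mongo_docs), len(s3_objects))
```

Adding a destination is then a config change on the routing side; the application that emits `web.access` events does not change.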

Page 66: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“subtitle”: “scalability”}

Page 67: Real Time Data Analytics with MongoDB and Fluentd at Wish

• Automate monitoring!

• App and System metrics

• JSON everywhere

Page 68: Real Time Data Analytics with MongoDB and Fluentd at Wish

• 2000+ nodes

• ~1B events/day

• Forwarder-Aggregator

Page 69: Real Time Data Analytics with MongoDB and Fluentd at Wish

{“subtitle”: “Demo”, “need”: “Demo Karma”}

Page 70: Real Time Data Analytics with MongoDB and Fluentd at Wish

<source>
  type mongostat
  uri "172.17.0.2"
</source>

<match mongostat.*.*>
  type mongo
  user kiyoto
  password heartbleed
  database web
  collection access
  … # host, port, etc.
</match>

Diagram: MongoDB → Fluentd → MongoDB

Page 71: Real Time Data Analytics with MongoDB and Fluentd at Wish

Build your own *MS!

Page 72: Real Time Data Analytics with MongoDB and Fluentd at Wish

{ “install”: “gem install fluentd”, “website”: “www.fluentd.org”, “github” : “fluent/fluentd”, “twitter”: “@fluentd”}