StumbleUpon UK Hadoop Users Group 2011

A Sneak Peek into StumbleUpon’s Infrastructure

Quick SU Intro

Our Traffic

Our Stack: 100% Open-Source

• MySQL (legacy source of truth)
• Memcache (lots)
• HBase (most new apps / features)
• Hadoop (DWH, MapReduce, Hive, ...)
• elasticsearch (“you know, for search”)
• OpenTSDB (distributed monitoring)
• Varnish (HTTP load-balancing)
• Gearman (processing off the fast path)
• ... etc

In prod since ’09

The Infrastructure

[Network diagram: two Arista 7050 core switches (1U, 52 x 10GbE SFP+) feed Arista 7048T top-of-rack switches (1U, 48 x 1GbE copper down, 4 x 10GbE SFP+ up), which connect the 2U chassis holding the thin and thick nodes; L3 ECMP between switches, MTU=9000 throughout.]

The Infrastructure
• SuperMicro half-width motherboards
• 2 x Intel L5630 (40W TDP) (16 hardware threads total)
• 48GB RAM
• Commodity disks (consumer-grade SATA, 7200rpm)
• 1 x 2TB per “thin node” (4-in-2U) (web/app servers, gearman, etc.)
• 6 x 2TB per “thick node” (2-in-2U) (Hadoop/HBase, elasticsearch, etc.)

(86 nodes = 1PB)
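That figure is consistent with raw disk, assuming all 86 are thick nodes: 86 nodes × 6 disks × 2 TB = 1,032 TB, i.e. roughly 1 PB before HDFS replication.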

The Infrastructure
• No virtualization
• No oversubscription
• Rack locality doesn’t matter much (sub-100µs RTT across racks)
• cgroups / Linux containers to keep MapReduce under control (see the sketch below)
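A minimal sketch of that last point, assuming a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory; the group name, the 24 GB cap, and the PID plumbing are hypothetical, not SU's actual setup:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: fence MapReduce child JVMs into a memory cgroup so HBase on the
// same 48GB box keeps the remainder. Paths assume cgroup v1 (illustrative).
public class MapReduceCgroup {
    public static void main(String[] args) throws Exception {
        Path cg = Paths.get("/sys/fs/cgroup/memory/mapreduce");
        Files.createDirectories(cg);
        // Hypothetical 24GB hard cap for everything in the group
        long limit = 24L * 1024 * 1024 * 1024;
        Files.write(cg.resolve("memory.limit_in_bytes"),
                    String.valueOf(limit).getBytes());
        // Move a task JVM (PID passed on the command line) into the group
        Files.write(cg.resolve("tasks"), args[0].getBytes());
    }
}
```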

Two production HBase clusters per colo
• Low-latency (user-facing services)
• Batch (analytics, scheduled jobs, ...)

Low-Latency Cluster
• Workload mostly driven by HBase
• Very few scheduled MR jobs
• HBase replication to batch cluster (sketched below)
• Most queries from PHP over Thrift
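Replication is switched on per column family via its replication scope; a hedged sketch using the 0.90-era Java admin API, where the table and family names are made up and the batch cluster would still need to be registered as a peer (e.g. with the shell's add_peer):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ReplicatedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Hypothetical user-facing table; edits to family "d" get shipped
        // to whatever peer cluster is registered (here: the batch cluster).
        HTableDescriptor table = new HTableDescriptor("ratings");
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setScope(1); // REPLICATION_SCOPE=1: replicate this family
        table.addFamily(family);
        admin.createTable(table);
    }
}
```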

Challenges:
• Tuning Hadoop for low latency
• Taming the long latency tail
• Quickly recovering from failures (a client-side sketch follows)
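On the failure-recovery side, much of the tuning comes down to timeouts and retries; a sketch of the client-side knobs, using standard HBase/ZooKeeper property names but illustrative values rather than SU's production numbers:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LowLatencyClientConf {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Shorter ZooKeeper session: a dead RegionServer is noticed sooner
        conf.setInt("zookeeper.session.timeout", 30000);
        // Fail RPCs fast instead of letting a page load stall on one region
        conf.setInt("hbase.rpc.timeout", 2000);
        // Few, quick retries; surface the error to the app instead of hanging
        conf.setInt("hbase.client.retries.number", 3);
        conf.setInt("hbase.client.pause", 50);
        return conf;
    }
}
```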

Batch Cluster
• 2x more capacity
• Wildly changing workload (e.g. 40K to 14M QPS)
• Lots of scheduled MR jobs
• Frequent ad-hoc jobs (MR/Hive)
• OpenTSDB’s data: >800M data points added per day, 133B data points total (ingestion sketched below)

Challenges:
• Resource isolation
• Tuning for larger scale
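>800M points per day averages out to roughly 9K writes per second. OpenTSDB ingests these over a simple line-oriented protocol on its TSDs (default port 4242); a sketch, with a made-up metric, tag, and TSD hostname:

```java
import java.io.PrintWriter;
import java.net.Socket;

public class TsdbPut {
    public static void main(String[] args) throws Exception {
        // One data point per line: put <metric> <unix-ts> <value> <tag>=<val>
        try (Socket sock = new Socket("tsd.example.com", 4242);
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true)) {
            long now = System.currentTimeMillis() / 1000;
            out.printf("put proc.loadavg.1min %d %.2f host=web42%n", now, 0.36);
        }
    }
}
```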