HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data

You’ve got HBase

How AOL Mail handles Big Data

Presented at HBaseCon 2012

May 22, 2012

The AOL Mail System

Over 15 years old

Constantly evolving

10,000+ hosts

70+ Million mailboxes

50+ Billion emails

A technology stack that runs the gamut

What that means…

Lots of data

Lots of moving parts

Tight SLAs

Mature system + Young software = Tough marriage

We don’t buy “commodity” hardware

Ingrained Dev/QA/Prod product lifecycle

Somewhat “version locked” to tried-and-true platforms

Expect service outages to be quickly mitigated by our NOC without waiting for an on-call engineer

So where does HBase fit?

It’s a component, not the foundation

Currently used in two places

Being evaluated for more

It will remain a tool in our diverse Big Data arsenal

An Activity Profiler

An “Activity Profiler”

Watches for particular behaviors

Designed and built in 6/2010

Originally “vanilla” Hadoop 0.20.2 + HBase 0.90.2

Currently CDH3

1.4+ Million Events/min

60x 24TB (raw) DataNodes w/ local RegionServers

15x application hosts

Is an internal-only tool

Used by automated anti-abuse systems

Leveraged by data analysts for ad hoc queries/MapRed

An “Activity Profiler”

Why the “Event Catcher” layer?

Has to “speak the language” of our existing systems

Easy to plug an HBase translator into existing data feeds

Hard to modify the infrastructure to speak HBase

Flume was too young at the time
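A translator of the kind described above could look roughly like the following minimal sketch. The pipe-delimited feed format, the `e` column family, and the row-key layout are all assumptions made for illustration; the slides do not describe the actual formats, and `Put` here is a simplified stand-in rather than the HBase client class.

```python
from typing import NamedTuple


class Put(NamedTuple):
    """Simplified stand-in for an HBase put: one cell in one row."""
    row_key: bytes
    column: bytes   # "family:qualifier"
    value: bytes


def translate(feed_line: str) -> Put:
    """Convert one line of a hypothetical legacy feed into a Put.

    Assumed feed format: account|epoch_seconds|event_type|detail
    """
    account, ts, event_type, detail = feed_line.rstrip("\n").split("|", 3)
    # Row key is the account, so all activity for an account sorts together.
    row_key = account.encode("utf-8")
    # Qualifier carries timestamp and event type so events do not collide.
    column = f"e:{int(ts):010d}:{event_type}".encode("utf-8")
    return Put(row_key, column, detail.encode("utf-8"))


put = translate("user123|1337700000|login|ip=10.0.0.1")
```

Keeping the translation in one small function is what makes it easy to plug into an existing feed: the upstream systems keep emitting their own format and only this layer knows about HBase.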

Why batch load via MapRed?

Real time is not currently a requirement

Allows filtering at different points

Allows us to “trigger” events

Designed before coprocessors

Early data integrity issues necessitated “replaying”

Missing append support early on

Holes in the Meta table

Long splits and GC pauses caused client timeouts

Can sample data into a “sandbox” for job development

Makes Pig, Hive, and other MapRed easy and stable

We keep the raw data around as well
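Sampling raw data into a sandbox, as mentioned above, can be made repeatable by hashing the account key instead of drawing random numbers. The sketch below is one way to do that; the CRC32 bucketing, the percentage parameter, and the `(account, payload)` event shape are assumptions, not details from the talk.

```python
import zlib


def in_sample(account: str, percent: int = 1) -> bool:
    """Return True if this account falls into the sample.

    Hashing the key keeps the sample stable across runs, so a replayed
    batch selects exactly the same accounts as the original load.
    """
    bucket = zlib.crc32(account.encode("utf-8")) % 100
    return bucket < percent


def sample(events, percent=1):
    """Yield only the (account, payload) events whose account is sampled."""
    for account, payload in events:
        if in_sample(account, percent):
            yield account, payload
```

Because selection depends only on the key, every event for a sampled account ends up in the sandbox, which keeps per-account jobs meaningful on the smaller data set.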

HBase and MapRed can live in harmony

Bigger than “average” hardware

36+GB RAM

8+ cores

Proper system tuning is essential

Good information on tuning Hadoop is plentiful, but…

XFS > EXT

JBOD > RAID

As far as HBase is concerned…

Just go buy Lars’ book

Careful job development, optimization is key!

Contact History API

Contact History API

Services a member-facing API

Designed and built in 10/2010

Modeled after the previous application

Built by a different Engineering team

Used to solve a very different problem

250K+ Inserts/min

3+ Million Inserts/min during MapRed

20x 24TB (raw) DataNodes w/ local RegionServers

14x application hosts

Leverages Memcached to reduce query load on HBase
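Using Memcached to reduce query load on HBase is commonly done with a cache-aside read path, sketched below. To stay self-contained, the example uses a small in-process `TTLCache` in place of a real Memcached client, and `fetch_from_hbase` is a hypothetical callable standing in for the HBase lookup; the key prefix and the TTL are also assumptions.

```python
import time


class TTLCache:
    """In-process stand-in for Memcached: get/set with expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_contact_history(account, cache, fetch_from_hbase, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to HBase on a miss."""
    key = f"contacts:{account}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = fetch_from_hbase(account)   # hypothetical HBase lookup
    cache.set(key, result, ttl_seconds)
    return result
```

With this shape, repeated reads for the same account within the TTL never reach HBase, at the cost of serving data that may be up to `ttl_seconds` old.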

Contact History API

Where we go from here

Amusing mistakes to learn from

Exploding regions

Batch inserts via MapRed result in fast, symmetrical key space growth

Attempting to split every region at the same time is a bad idea

Turning off region splitting and using a custom “rolling region splitter” is a good idea

Take time and load into consideration when selecting regions to split
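The selection step of a “rolling region splitter” along these lines could be sketched as follows. This is not the splitter described in the talk: the `RegionStats` fields, the thresholds, and the quiet-hours window are all assumptions, and the code only chooses a region. Actually disabling automatic splits and requesting the split through the HBase admin interface is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionStats:
    name: str
    size_mb: int            # on-disk size of the region
    requests_per_sec: int   # current load on the region


def pick_region_to_split(regions: List[RegionStats], hour: int,
                         min_size_mb: int = 4096,
                         max_requests_per_sec: int = 500,
                         quiet_hours: range = range(1, 5)
                         ) -> Optional[RegionStats]:
    """Pick at most one region to split in this pass, or None.

    Only split during quiet hours, skip regions that are busy right now,
    and take the largest remaining candidate first.
    """
    if hour not in quiet_hours:
        return None
    candidates = [r for r in regions
                  if r.size_mb >= min_size_mb
                  and r.requests_per_sec <= max_requests_per_sec]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.size_mb)
```

Returning a single region per pass is what makes the splitter “rolling”: a scheduler can call it repeatedly, so regions that grew in lockstep during a batch load are split one after another rather than all at once.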

Backups, backups, backups!

You can never have too many

Large, non-splittable regions tell you things

Our key space maps to accounts

Excessively large keys equal excessively “active” accounts
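Because HBase splits regions on row boundaries, a single oversized row cannot be divided, so a region that refuses to split points at an unusually large account. A minimal sketch of turning that observation into a report is shown below; the `row_sizes` mapping and the “100 times the median” rule are assumptions chosen for illustration.

```python
from statistics import median


def flag_active_accounts(row_sizes, factor=100):
    """Return accounts whose row size exceeds `factor` times the median.

    row_sizes maps account key -> stored bytes for that account's row.
    """
    if not row_sizes:
        return []
    typical = median(row_sizes.values())
    return sorted(account for account, size in row_sizes.items()
                  if size > factor * typical)
```

For example, with rows of 10, 11, and 12 bytes and one of 50,000 bytes, only the last account is flagged.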

Next-generation model

Thanks!
