HBase Sizing Guide


This talk was given during the HBase Meetup on the 15th of October, 2014 at the Google Offices in Chelsea.

Transcript of HBase Sizing Guide

Page 1: HBase Sizing Guide

Sizing Your HBase Cluster

Lars George | @larsgeorge

EMEA Chief Architect @ Cloudera

Page 2: HBase Sizing Guide

Agenda

•  Introduction

•  Technical Background/Primer

•  Best Practices

•  Summary

©2014 Cloudera, Inc. All rights reserved.

Page 3: HBase Sizing Guide

Who I am…

Lars George [EMEA Chief Architect]

•  Clouderan since October 2010

•  Hadooper since mid 2007

•  HBase/Whirr Committer (of Hearts)

•  github.com/larsgeorge

©2014 Cloudera, Inc. All rights reserved.

Page 4: HBase Sizing Guide

Bruce Lee: “As you think, so shall you become.”

Page 5: HBase Sizing Guide

Introduction

Page 6: HBase Sizing Guide

HBase Sizing Is...

•  Making the most out of the cluster you have by...
–  Understanding how HBase uses low-level resources
–  Helping HBase understand your use-case by configuring it appropriately - and/or -
–  Designing the use-case to help HBase along

•  Being able to gauge how many servers are needed for a given use-case

Page 7: HBase Sizing Guide

Technical Background

“To understand your fear is the beginning of really seeing…”

— Bruce Lee

Page 8: HBase Sizing Guide

HBase Dilemma

Although HBase can host many applications, they may require completely opposite features

[Slide graphic: use-case quadrants - Events, Entities, Time Series, Message Store]

Page 9: HBase Sizing Guide

Competing Resources

•  Reads and Writes compete for the same low-level resources
–  Disk (HDFS) and Network I/O
–  RPC Handlers and Threads
–  Memory (Java Heap)

•  Otherwise they exercise completely separate code paths

Page 10: HBase Sizing Guide

Memory Sharing

•  By default every region server divides its memory (i.e. the given maximum heap) into
–  40% for in-memory stores (write ops)
–  20% (40%) for block caching (read ops)
–  Remaining space (here 40% or 20%) goes towards usual Java heap usage
•  Objects etc.
•  Region information (HFile metadata)

•  Share of memory needs to be tweaked
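The default split above can be expressed as a small calculation. A minimal sketch - the function name and percentage parameters are illustrative, not HBase configuration keys:

```python
# Sketch of the default region server heap split described on this slide.
# Percentages are the slide's defaults; names are made up for illustration.
def heap_shares(heap_gb, memstore_pct=0.40, block_cache_pct=0.20):
    """Return (memstore, block cache, remaining) shares of the heap in GB."""
    memstore = heap_gb * memstore_pct          # in-memory stores (write ops)
    cache = heap_gb * block_cache_pct          # block cache (read ops)
    remaining = heap_gb - memstore - cache     # usual Java heap usage
    return memstore, cache, remaining

# A 10GB heap with the defaults: 4GB memstores, 2GB cache, 4GB general heap.
print(heap_shares(10))  # (4.0, 2.0, 4.0)
```

Shifting the cache share to 40% for a read-heavy use-case simply squeezes the remaining general-purpose heap.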

Page 11: HBase Sizing Guide

Writes

•  The cluster size is often determined by the write performance
–  Simple schema design implies writing to all (entities) or only one region (events)

•  HBase follows a log-structured merge tree like design
–  Store mutations in an in-memory store and the write-ahead log
–  Flush out aggregated, sorted maps at a specified threshold - or - when under pressure
–  Discard logs with no pending edits
–  Perform regular compactions of store files
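The flush-and-merge cycle above can be illustrated with a toy log-structured store. This is purely illustrative (class and method names are made up), not HBase code:

```python
# Toy LSM-style store: mutations land in an in-memory map, which is flushed
# as a sorted "file" at a threshold; flushed files are merged by compaction.
class TinyLSM:
    def __init__(self, flush_threshold=3):
        self.memstore = {}             # in-memory store (write ops)
        self.flushed = []              # flushed, sorted "store files"
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write out an aggregated, sorted map and clear the memstore.
        self.flushed.append(sorted(self.memstore.items()))
        self.memstore = {}

    def compact(self):
        # Merge all flushed files into one sorted file; newer values win.
        merged = {}
        for f in self.flushed:
            merged.update(dict(f))
        self.flushed = [sorted(merged.items())]

    def get(self, key):
        if key in self.memstore:       # newest data first
            return self.memstore[key]
        for f in reversed(self.flushed):
            d = dict(f)
            if key in d:
                return d[key]
        return None

store = TinyLSM(flush_threshold=2)
store.put("row1", "v1")
store.put("row2", "v2")            # second put triggers a flush
print(store.get("row1"))           # "v1", served from the flushed file
```

Real HBase adds the write-ahead log for durability and keeps many more moving parts, but the read path (memstore first, then store files newest to oldest) follows the same shape.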

Page 12: HBase Sizing Guide

Writes: Flushes and Compactions

[Chart: store file sizes from 0 to 1000MB over time, older to newer, illustrating flushes and compactions]

Page 13: HBase Sizing Guide

Flushes

•  Every mutation call (put, delete etc.) causes a check for a flush

•  If the threshold is met, flush the file to disk and schedule a compaction
–  Try to compact newly flushed files quickly

•  The compaction returns, if necessary, where a region should be split

Page 14: HBase Sizing Guide

Compaction Storms

•  Premature flushing because of # of logs or memory pressure
–  Files will be smaller than the configured flush size

•  The background compactions are hard at work merging small flush files into the existing, larger store files
–  Rewrite hundreds of MB over and over

Page 15: HBase Sizing Guide

Dependencies

•  Flushes happen across all stores/column families, even if just one triggers it

•  The flush size is compared to the size of all stores combined
–  Many column families dilute the size
–  Example: 55MB + 5MB + 4MB - the two small families are flushed along with the large one
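The dilution effect can be shown in two lines. The 64MB threshold here is an assumed flush size chosen to match the slide's example numbers:

```python
# The flush check compares the configured flush size against the combined
# size of all stores in the region, so small families flush along with a
# big one. FLUSH_SIZE_MB is an assumed setting matching the slide's example.
FLUSH_SIZE_MB = 64

def flush_triggered(store_sizes_mb):
    return sum(store_sizes_mb) >= FLUSH_SIZE_MB

stores = [55, 5, 4]                # three column families in one region
print(flush_triggered(stores))     # True: 64MB combined, so all three flush,
                                   # producing tiny 5MB and 4MB files
```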

Page 16: HBase Sizing Guide

Write-Ahead Log

•  Currently only one per region server
–  Shared across all stores (i.e. column families)
–  Synchronized on file append calls

•  Work is being done on mitigating this
–  WAL compression
–  Multithreaded WAL with ring buffer
–  Multiple WALs per region server ➜ Start more than one region server per node?

Page 17: HBase Sizing Guide

Write-Ahead Log (cont.)

•  Size is set to 95% of the default block size
–  64MB or 128MB, but check the config!

•  Keep the number low to reduce recovery time
–  The limit is set to 32, but can be increased

•  Increase size of logs - and/or - increase the number of logs before blocking

•  Compute number based on fill distribution and flush frequencies
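A back-of-the-envelope version of that computation: enough WAL capacity should be kept to cover the memstore capacity, or flushes get forced early. Defaults below are the slide's figures; the function name is illustrative:

```python
# How many write-ahead logs are needed to cover the memstore capacity?
# Defaults are the slide's figures (40% memstore share, 128MB block size,
# WAL rolled at 95% of block size); the function name is made up.
def wal_count(heap_gb, memstore_share=0.40, block_size_mb=128, wal_fill=0.95):
    memstore_mb = heap_gb * 1024 * memstore_share
    wal_size_mb = block_size_mb * wal_fill
    return memstore_mb / wal_size_mb

# 10GB heap: ~33.7 logs to cover the 4GB of memstores - already above the
# default cap of 32, so the limit would need raising.
print(round(wal_count(10), 1))
```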

Page 18: HBase Sizing Guide

Write-Ahead Log (cont.)

•  Writes are synchronized across all stores
–  A large cell in one family can stop all writes of another
–  In this case the RPC handlers go binary, i.e. they either all work or all block

•  Can be bypassed on writes, but that means no real durability and no replication
–  Maybe use a coprocessor to restore dependent data sets (preWALRestore)

Page 19: HBase Sizing Guide

Some Numbers

•  Typical write performance of HDFS is 35-50MB/s

Cell Size   OPS
0.5MB       70-100
100KB       350-500
10KB        3500-5000 ??
1KB         35000-50000 ????

This is way too high in practice - Contention!
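The figures in the table are simply throughput divided by cell size. A quick sketch of the naive math, assuming decimal megabytes:

```python
# Naive ops/s from raw HDFS write throughput divided by cell size.
# These are the theoretical numbers - contention makes them far lower.
def ops_per_sec(throughput_mb_s, cell_size_mb):
    return throughput_mb_s / cell_size_mb

for cell in (0.5, 0.1, 0.01, 0.001):            # 0.5MB, 100KB, 10KB, 1KB
    low, high = ops_per_sec(35, cell), ops_per_sec(50, cell)
    print(f"{cell}MB cells: {low:.0f}-{high:.0f} ops/s")
```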

Page 20: HBase Sizing Guide

Some More Numbers

•  Under real-world conditions the rate is lower, more like 15MB/s or less
–  Thread contention and serialization overhead cause a massive slowdown

Cell Size   OPS
0.5MB       10
100KB       100
10KB        800
1KB         6000

Page 21: HBase Sizing Guide

Write Performance

•  There are many factors to the overall write performance of a cluster
–  Key distribution ➜ Avoid region hotspots
–  Handlers ➜ Do not pile up too early
–  Write-ahead log ➜ Bottleneck #1
–  Compactions ➜ Badly tuned, they can cause ever-increasing background noise

Page 22: HBase Sizing Guide

Cheat Sheet

•  Ensure you have enough or large enough write-ahead logs

•  Ensure you do not oversubscribe available memstore space

•  Set the flush size large enough, but not too large

•  Check write-ahead log usage carefully

•  Enable compression to store more data per node

•  Tweak compaction algorithm to peg background I/O at some level

•  Consider putting uneven column families in separate tables

•  Check metrics carefully for block cache, memstore, and all queues

Page 23: HBase Sizing Guide

Example: Write to All Regions

•  Java Xmx heap at 10GB

•  Memstore share at 40% (default)
–  10GB heap x 0.4 = 4GB

•  Desired flush size at 128MB
–  4GB / 128MB = 32 regions max!

•  For a WAL size of 128MB x 0.95
–  4GB / (128MB x 0.95) = ~33 partially uncommitted logs to keep around

•  Region size at 20GB
–  20GB x 32 regions = 640GB raw storage used
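The whole worked example fits in one short script, with all inputs taken from the slide:

```python
# The sizing example above as one calculation; all inputs from the slide.
heap_gb = 10
memstore_share = 0.40
flush_size_mb = 128
wal_fill = 0.95
region_size_gb = 20

memstore_gb = heap_gb * memstore_share                      # 4GB of memstores
max_regions = int(memstore_gb * 1024 // flush_size_mb)      # 32 regions max
wal_logs = memstore_gb * 1024 / (flush_size_mb * wal_fill)  # ~33.7 logs
raw_storage_gb = max_regions * region_size_gb               # 640GB raw

print(max_regions, round(wal_logs, 1), raw_storage_gb)      # 32 33.7 640
```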

Page 24: HBase Sizing Guide

Notes

•  Compute memstore sizes based on number of written-to regions x flush size

•  Compute number of logs to keep based on fill and flush rate

•  Ultimately the capacity is driven by
–  Java Heap
–  Region Count and Size
–  Key Distribution

Page 25: HBase Sizing Guide

Reads

•  Locate and route the request to the appropriate region server
–  The client caches this information for faster lookups

•  Eliminate store files if possible using time ranges or Bloom filter

•  Try the block cache; if the block is missing, load it from disk

Page 26: HBase Sizing Guide

Seeking with Bloom Filters
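To make the slide's point concrete: a Bloom filter lets a read skip store files that definitely do not contain a row key (false positives are possible, false negatives are not). A toy version - this is not HBase's implementation, just the idea:

```python
# Toy Bloom filter: one per store file; only files whose filter matches the
# requested key need to be opened and seeked. Illustrative only.
import hashlib

class TinyBloom:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = 0
        self.num_bits = num_bits
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Derive k bit positions from salted hashes of the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely not in this file" - the file can be skipped.
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bloom = TinyBloom()
bloom.add("row-0001")
print(bloom.might_contain("row-0001"))  # True - never a false negative
```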

Page 27: HBase Sizing Guide

Writes: Where’s the Data at?

[Chart: store file sizes from 0 to 1000MB over time, older to newer, distinguishing existing row mutations from unique row inserts]

Page 28: HBase Sizing Guide

Block Cache

•  Use exported metrics to see the effectiveness of the block cache
–  Check fill and eviction rates, as well as hit ratios ➜ random reads are not ideal

•  Tweak up or down as needed, but watch overall heap usage

•  You absolutely need the block cache
–  Set it to at least 10% for short-term benefits

Page 29: HBase Sizing Guide

Testing: Scans

HBase scan performance

•  Use available tools to test
•  Determine raw and KeyValue read performance
–  Raw is just bytes, while KeyValue means block parsing

•  Insert data using YCSB, then compact the table
–  Single region enforced

•  Two test cases
–  Small data: 1 column with a 1 byte value
–  Large(r) data: 1 column with a 1KB value

•  About the same total size for both: 15GB

Page 30: HBase Sizing Guide

Testing: Scans

Page 31: HBase Sizing Guide

Scan Row Range

•  Set start and end key to limit scan size

Page 32: HBase Sizing Guide

Best Practices

“If you spend too much time thinking about a thing, you'll never get it done.”

— Bruce Lee

Page 33: HBase Sizing Guide

How to Plan

Advice on

•  Number of nodes

•  Number of disk and total disk capacity

•  RAM capacity

•  Region sizes and count

•  Compaction tuning

Page 34: HBase Sizing Guide

Advice on Nodes

•  Use the previous example to compute effective storage based on heap size, region count and size
–  10GB heap x 0.4 / 128MB x 20GB = 640GB, if all regions are active
–  Address more storage with read-from-only regions

•  Typical advice is to use more nodes with fewer, smaller disks (6 x 1TB SATA or 600GB SAS, or SSDs)

•  CPU is not an issue, I/O is (even with compression)

Page 35: HBase Sizing Guide

Advice on Nodes

•  Memory is not an issue; heap sizes stay small because of Java garbage collection limitations
–  Up to 20GB has been used
–  Newer versions of Java should help
–  Use off-heap cache

•  Current servers typically have 48GB+ memory

Page 36: HBase Sizing Guide

Advice on Tuning

•  Trade off throughput against the size of single data points
–  This might cause a schema redesign

•  Trade off read performance against write amplification
–  Advise users to understand read/write performance and background write amplification

➜ This drives the number of nodes needed!

Page 37: HBase Sizing Guide

Advice on Cluster Sizing

•  Compute the number of nodes needed based on
–  Total storage needed
–  Throughput required for reads and writes

•  Assume ≈15MB/s minimum for each read and write
–  Increasing the KeyValue sizes improves this
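Combining the two constraints gives a simple node-count estimate. The per-node figures below are the talk's rules of thumb (640GB addressable per node from the earlier example, ~15MB/s per node); the function itself is an illustrative sketch:

```python
# Node-count estimate: the larger of the storage-driven and the
# throughput-driven requirement wins. Per-node defaults are the talk's
# rules of thumb; the function name and shape are illustrative.
import math

def nodes_needed(total_storage_gb, write_mb_per_s,
                 storage_per_node_gb=640, throughput_per_node_mb=15):
    by_storage = math.ceil(total_storage_gb / storage_per_node_gb)
    by_throughput = math.ceil(write_mb_per_s / throughput_per_node_mb)
    return max(by_storage, by_throughput)

# 10TB of data at a sustained 100MB/s ingest: storage-bound at 16 nodes.
print(nodes_needed(10 * 1024, 100))  # 16
```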

Page 38: HBase Sizing Guide

Example: Twitter Firehose

Page 39: HBase Sizing Guide

Example: Consume Data

Page 40: HBase Sizing Guide

HBase Heap Usage

•  Overall addressable amount of data is driven by heap size
–  Read-from-only regions need space for indexes and filters
–  Written-to regions also need MemStore space

•  Java heap space is still limited, as garbage collections will cause pauses
–  Typically up to 20GB heap
–  Or invest in pause-less GC

Page 41: HBase Sizing Guide

Summary

“All fixed set patterns are incapable of adaptability or pliability. The truth is outside of all fixed patterns.”

— Bruce Lee

Page 42: HBase Sizing Guide

WHAT, BRUCE? IT DEPENDS?

Page 43: HBase Sizing Guide

Checklist

To plan for the size of an HBase cluster you have to:

•  Know the use-case
–  Read/write mix
–  Expected throughput
–  Retention policy

•  Optimize the schema and compaction strategy
–  Devise a schema that allows for only some regions being written to

•  Take “known” numbers to compute cluster size

Page 44: HBase Sizing Guide

Thank you @larsgeorge