006 performance tuningandclusteradmin

PERFORMANCE TUNING & CLUSTER ADMINISTRATION 2012/8/2 Scott Miao


AGENDA

Course Credit

Performance Tuning More…

Cluster Administration More…


COURSE CREDIT

Show up: 30 points
Ask a question: each question earns 5 points
Hands-on: 40 points
70 points are needed to pass this course

Each course credit will be calculated once for each course finished

The course credit will be sent to you and your supervisor by mail


PERFORMANCE TUNING

Garbage Collection Tuning
MSLAB
Compression
Optimizing Splits and Compactions
Load Balancing
Merging Regions
Client API: Best Practices
Configuration
Load Tests


GARBAGE COLLECTION TUNING

Garbage collection (GC) is the process that rewrites the heap generation in question

GC parameters only need to be added to the region servers

The JRE comes with basic assumptions
  Regarding what your programs are doing, how they create objects, how they allocate the heap to handle data, and so on
  These assumptions work well in a lot of cases

But they do NOT work well for HBase
  Especially for write-heavy workloads
  HBase cannot safely rely on the JRE assumptions alone

https://service.ithome.com.tw/20120720Java/index3.html#3


GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (1/2)

The memstore flushes its data once it reaches the configured minimum flush size, hbase.hregion.memstore.flush.size
  Each flush leaves holes of different sizes in the heap

Data resides in different locations in the generational architecture of the Java heap, depending on how long it has been in memory
  Young generation (new generation)
    The space can be reclaimed quickly and no harm is done
  Old generation (tenured generation)
    Data is promoted to this location if it stays in memory for a longer period of time


GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (2/2)

The JRE tries to reuse the holes created by data that has been written to disk

When a request asks for a piece of heap that does not fit into one of those holes, the fragmented heap needs to be compacted
  Young to old
    The promotion of longer-living objects from the young to the old generation
  Old to stop-the-world
    There is no longer enough space for a young allocation, because of the fragmentation
    The JRE falls back to the stop-the-world garbage collector
    It rewrites the entire heap space and compacts it to the remaining active objects

If this fails, you will see a promotion failure in your garbage collection logs

What does the heap look like?


GARBAGE COLLECTION TUNING – SPECIFY THE YOUNG GENERATION SIZE

The young generation is between 128 MB and 512 MB

The old generation holds the remaining available heap, which is usually many gigabytes of memory

Using 128 MB is a good starting point
  Further observation of the JVM metrics should be conducted

Specify the young generation size like so
  -XX:MaxNewSize=128m -XX:NewSize=128m

One convenient option
  -Xmn128m


GARBAGE COLLECTION TUNING – GC OPTIONS SETTING

GC options setting for HBase
  Add them to the hbase-env.sh configuration file
  HBASE_OPTS variable for all HBase processes
  HBASE_REGIONSERVER_OPTS variable for all region servers

Enable the JRE's log output for garbage collection details
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log
  Monitor it for occurrences of "concurrent mode failure" or "promotion failed" messages


GARBAGE COLLECTION TUNING – GC STRATEGY FOR YOUNG GENERATION

Recommended value for the young generation
  -XX:+UseParNewGC

Use the Parallel New Collector
  It stops the entire Java process to clean up the young generation heap
  Since the young generation is small in comparison, this pause is short
    Usually less than a few hundred milliseconds


GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION

Recommended value for the old generation
  -XX:+UseConcMarkSweepGC

Use the Concurrent Mark-Sweep Collector (CMS)
  It tries to do as much work concurrently as possible, without stopping the Java process
  It takes extra effort and an increased CPU load
  It avoids the stops required to rewrite a fragmented old generation heap
  If you hit a promotion failure, it falls back to stop-the-world again


GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION

A switch for CMS
  -XX:CMSInitiatingOccupancyFraction=70
  A percentage of heap occupancy that specifies when the background process starts

Avoids the concurrent mode failure
  The background process that marks and sweeps the heap for collection is still running when the heap runs out of usable space
  The JRE then falls back to stop-the-world again

Why set the initiating occupancy fraction to 70%?
  20% block cache + 40% memstore limits = 60%, by default
  This starts the background process at an appropriate time
    Early enough, and not too early
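
The reasoning behind the 70% figure can be checked with a few lines of Python. This is only a sketch of the arithmetic on this slide: the function name is made up for illustration, and the 8 GB heap is an assumed example value.

```python
def cms_headroom(block_cache_pct, memstore_pct, initiating_pct):
    """Return (steady-state occupancy %, headroom % before CMS kicks in)."""
    steady = block_cache_pct + memstore_pct
    return steady, initiating_pct - steady


steady, headroom = cms_headroom(20, 40, 70)

heap_gb = 8  # assumed heap size, for illustration only
print(f"steady-state occupancy: {steady}% ({heap_gb * steady / 100:.1f} GB)")
print(f"headroom before CMS starts: {headroom}% ({heap_gb * headroom / 100:.1f} GB)")
```

If the block cache or memstore limits are raised, the same calculation shows whether the initiating fraction still sits above the steady-state occupancy.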


GARBAGE COLLECTION TUNING – SUMMARY

Recommended GC options
  export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC \
    -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc \
    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

Alex Su's GC options
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:<%= hbase_log_path %>/hbase-regionserver-gc-`date +%F-%H-%M-%S`.log \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \
    -XX:CMSInitiatingOccupancyFraction=70 -XX:PrintFLSStatistics=1 \
    -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=<%= hbase_log_path %>/hbase-regionserver.hprof

GC options reference
  http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html


MSLAB - QUESTION

How can the stop-the-world issue be solved?
  The key to reducing these compacting collections is to reduce fragmentation
  Only objects of exactly the same size should be allocated from the heap
    Subsequent allocations of new objects of the exact same size will always reuse these holes
    No promotion failure occurs, and therefore no stop-the-world compacting collection is required


MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (1/3)

MSLABs are buffers of fixed size containing KeyValue instances of varying sizes
  1. When a buffer cannot completely fit a newly added KeyValue, it is considered full
  2. A new buffer is then created, once again of the given fixed size

Enabled by default in version 0.92
  Disabled in version 0.90 of HBase
  hbase.hregion.memstore.mslab.enabled property

It is recommended that you test your setup with this feature


MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (2/3)

The size of each allocated, fixed-size buffer
  hbase.hregion.memstore.mslab.chunksize property
  Default is 2 MB
  Based on your KeyValue instances, you may have to adjust this value
    E.g., if your cells are 100 KB in size, you need to increase the MSLAB size to fit more than just a few cells

An upper boundary of what is stored in the buffers
  hbase.hregion.memstore.mslab.max.allocation property
  Default is 256 KB
  Any cell (KeyValue) that is larger will be directly allocated in the Java heap


MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (3/3)

MSLABs do not come without a cost
  More wasteful in regard to heap usage
    You will most likely not fill every buffer to the last byte

A tradeoff
  Use MSLABs and benefit from better garbage collection, but incur the extra space that is required
  Do NOT use MSLABs and benefit from better memory efficiency, but deal with the problem caused by garbage collection pauses
    You could plan to restart the servers every few days, or weeks, before the pause happens

The buffers require an additional byte array copy operation, and are therefore slightly slower
  Measure the impact on your workload
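
The MSLAB allocation rules and their space cost can be illustrated with a small simulation. This is a simplified sketch, not HBase's implementation: it only tracks sizes, the function name is invented for this example, and the defaults are the 2 MB chunk size and 256 KB maximum allocation quoted in this deck.

```python
CHUNK_SIZE = 2 * 1024 * 1024   # default chunk size from the slides (2 MB)
MAX_ALLOCATION = 256 * 1024    # default max allocation from the slides (256 KB)


def allocate(cell_sizes, chunk_size=CHUNK_SIZE, max_alloc=MAX_ALLOCATION):
    """Simulate MSLAB placement.

    Returns (chunks_used, wasted_bytes, heap_cells), where wasted_bytes
    counts the unused tails of chunks that were closed as "full".
    The still-open last chunk is not counted as waste.
    """
    chunks, free, wasted, heap_cells = 0, 0, 0, 0
    for size in cell_sizes:
        if size > max_alloc:
            heap_cells += 1      # too large: goes directly to the Java heap
            continue
        if size > free:          # cell does not fit: the chunk is considered full
            wasted += free       # its unused tail is lost
            chunks += 1          # a new fixed-size chunk is created
            free = chunk_size
        free -= size
    return chunks, wasted, heap_cells


# Fifty 100 KB cells plus one 300 KB cell (hypothetical workload)
print(allocate([100 * 1024] * 50 + [300 * 1024]))
```

With 100 KB cells only 20 fit into a 2 MB chunk and 48 KB is left unused each time a chunk fills up, which is the kind of waste the tradeoff above refers to.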


COMPRESSION

A number of compression algorithms can be enabled at the column family level

It is recommended
  Enable compression unless you have a reason not to do so
    For example, when using already compressed content, such as JPEG images

Compression will usually yield better overall performance
  The overhead of the CPU performing the compression/decompression is less than what is required to read more data from disk


COMPRESSION – AVAILABLE CODECS

Recommended: Snappy (called Zippy in Bigtable)
  Released by Google under the BSD License
  HBase 0.92 ships with the required JNI libraries to be able to use it
  You must install the native binary library on all region servers

LZO (Lempel-Ziv-Oberhumer)
  A lossless data compression algorithm that is focused on decompression speed, and written in ANSI C
  HBase cannot ship with LZO because of licensing issues
    LZO uses the incompatible GNU General Public License (GPL)
  LZO installation needs to be performed separately, after HBase has been installed

http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437


COMPRESSION – COMPRESSION TEST TOOL

Use command
  hbase org.apache.hadoop.hbase.util.CompressionTest <path> <none|gz|lzo|snappy>

Example
  ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest \
    /user/larsgeorge/test.gz gz

It returns a result based on the test
  If it succeeds
    …SUCCESS
  If it fails
    Exception in thread "main" java.lang.RuntimeException: \
      java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec…


COMPRESSION – STARTUP CHECK

A fast-failing setup notices the missing libraries at startup
  Instead of running into issues later

For example, check the Snappy and LZO compression libraries
  <property>
    <name>hbase.regionserver.codecs</name>
    <value>snappy,lzo</value>
  </property>

If a codec is missing, the server will abort at startup with an IOException stating "Compression codec <codec-name> not supported, aborting RS construction"

Copy the changed configuration file to all region servers and restart them afterward


COMPRESSION – ENABLING COMPRESSION

Install the JNI libraries

Install the native compression libraries

Specify the chosen algorithm in the column family schema
  In HBase shell
    create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
  In API
    HColumnDescriptor.setCompressionType(…)
    Refer to ppt#003, p#11


OPTIMIZING SPLITS AND COMPACTIONS – SPLIT/COMPACTION STORMS

When your regions grow at roughly the same rate
  Eventually they all need to be split at about the same time
  This causes a large spike in disk I/O because of the compactions required to rewrite the split regions
  Refer to ppt#004, p#13


OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (1/2)

You can turn automatic splitting off and manually invoke the split and major_compact commands

Setting the region maximum file size
  hbase.hregion.max.filesize property for the entire cluster
  At the table level by API
    HTableDescriptor.setMaxFileSize(…)
    Refer to ppt#003, p#7
  Set it to a very high number
    Better to set this value to a reasonable upper boundary, such as 100 GB
    Long.MAX_VALUE is not recommended, in case the manual splits fail to run

Then you can time-control the splits and compactions
  Run them staggered across all regions
    This spreads the I/O load as much as possible, avoiding any split/compaction storm
  Use HBase shell + cron
  Or write your own code with the HBase Admin API
    Refer to ppt#003, p#21
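
Staggering can be as simple as spreading one operation per region evenly over a maintenance window. The following Python sketch only computes such a timetable; the region names, the start time, and the four-hour window are all hypothetical, and issuing the actual major_compact calls (for example from cron) is left out.

```python
from datetime import datetime, timedelta


def stagger(regions, start, window):
    """Spread one operation per region evenly across the given window."""
    if not regions:
        return []
    gap = window / len(regions)
    return [(start + i * gap, region) for i, region in enumerate(regions)]


# Hypothetical regions and maintenance window
schedule = stagger(
    ["region-a", "region-b", "region-c", "region-d"],
    datetime(2012, 8, 2, 1, 0),
    timedelta(hours=4),
)
for when, region in schedule:
    print(when.strftime("%H:%M"), "major_compact", region)
```

Each region gets its own slot, so no two compactions are started at the same moment.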


OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (2/2)

RegionSplitter class (added in version 0.90.2)
  Another way to split existing regions
  Rolling split feature
    Splits the existing regions while waiting long enough for the involved compactions to complete
  API docs

An additional advantage
  You have better control over which regions are available at any time
  In the rare case that you need to do very low-level debugging
    With automated splits, it is hard to debug!
    Because the region in question may have been split into two daughter regions


OPTIMIZING SPLITS AND COMPACTIONS – REGION HOTSPOTTING

You may be dealing with a write pattern that is causing a specific region to run hot
  Use region server metrics to observe it
    Refer to ppt#005, p#12

Key design approaches
  Salted keys, random keys, etc.
  Refer to ppt#004, p#52

The only other way to alleviate this situation
  Manually split a hot region into one or more new regions, at exact boundaries
  You can specify any row key within the specific region
    This lets you generate halves that are completely different in size
    Refer to ppt#003, p#21

This cannot deal with completely sequential key ranges
  Those are always going to hit one region for a considerable amount of time


OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (1/3)

Managing splits manually is useful
  Therefore, start with a larger number of regions right from table creation
  This means creating the table with the required number of regions

Three ways…
  HBase shell
    create, refer to ppt#003, p#37
  API
    HBaseAdmin.createTable(…), refer to ppt#003, p#16
  RegionSplitter class
    ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
    usage: RegionSplitter <TABLE>
    By default, it uses the MD5StringSplit class to partition the row keys into ranges
    Use -D split.algorithm=<your-algorithm-class> for another implementation


OPTIMIZING SPLITS AND COMPACTIONS –PRESPLITTING REGIONS (2/3)

RegionSplitter with MD5StringSplit sample

testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093.
testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a.
testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92.
testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258.
testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9.
testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
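
The start keys in this listing are evenly spaced hexadecimal strings. The Python sketch below reproduces them under two assumptions inferred from the listing itself rather than taken from the RegionSplitter source: the key space runs up to 0x7FFFFFFF, and the table was split into 10 regions. The function name is invented for this example.

```python
def hex_split_points(num_regions, max_key=0x7FFFFFFF):
    """Evenly spaced 8-digit hex boundaries between num_regions regions."""
    step = max_key // num_regions
    return [format(i * step, "08x") for i in range(1, num_regions)]


points = hex_split_points(10)  # 10 regions is an assumption, see above
print(points[:5])
```

The first five boundaries match the start keys of the second to sixth regions in the sample.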


OPTIMIZING SPLITS AND COMPACTIONS –PRESPLITTING REGIONS (3/3)

How many presplit regions?
  Start low with 10 presplit regions per server and watch as data grows over time
  It is better to err on the side of too few regions and use a rolling split later

If the presplit regions are too thin
  Increase the hbase.hregion.majorcompaction property
  Refer to ppt#004, p#19

If the data size grows too large
  Use the RegionSplitter utility to perform a rolling split of all regions

The main objective is to avoid split/compaction storms


LOAD BALANCING – BALANCER (1/3)

The master has a built-in feature
  Called the balancer

By default, it runs every five minutes
  hbase.balancer.period property

It attempts to even out the number of assigned regions per region server
  Within one region of the average number per server

It determines a new assignment plan
  The plan describes which regions should be moved where
  It starts the process of moving the regions by calling the unassign() method
  Refer to ppt#003, p#22
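
The goal of the plan, every server within one region of the average, can be illustrated with a greedy stand-in. This Python sketch is not the algorithm HBase's balancer uses: it simply keeps moving one region from the busiest to the idlest server, and the server and region names are hypothetical.

```python
def plan_moves(assignments):
    """Return a list of (region, source, target) moves.

    assignments maps a server name to the list of regions it hosts.
    Moves are planned until no two servers differ by more than one region.
    """
    loads = {server: list(regions) for server, regions in assignments.items()}
    moves = []
    while True:
        servers = sorted(loads)  # sorted for deterministic tie-breaking
        busiest = max(servers, key=lambda s: len(loads[s]))
        idlest = min(servers, key=lambda s: len(loads[s]))
        if len(loads[busiest]) - len(loads[idlest]) <= 1:
            return moves
        region = loads[busiest].pop()
        loads[idlest].append(region)
        moves.append((region, busiest, idlest))


cluster = {
    "rs1": ["a", "b", "c", "d", "e", "f"],
    "rs2": ["g"],
    "rs3": ["h", "i"],
}
for region, src, dst in plan_moves(cluster):
    print(f"move {region}: {src} -> {dst}")
```

Nine regions on three servers average three each, so the plan moves three regions off rs1 and leaves every server with exactly three.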


LOAD BALANCING - BALANCER (2/3)

The balancer has an upper limit on how long it is allowed to run
  hbase.balancer.max.balancing property
  Defaults to half of the balancer period value
    2.5 minutes

The balancer switch
  Toggles the balancer status between enabled and disabled
  HBase shell
    balance_switch command, refer to ppt#003, p#39
  balanceSwitch() API method, refer to ppt#003, p#22


LOAD BALANCING - BALANCER (3/3)

The balancer can be explicitly started
  HBase shell
    balancer command, refer to ppt#003, p#39
  balancer() API method, refer to ppt#003, p#22
    Returns true
      Some work has been done
    Returns false
      The balancer was switched off
      There was no work to be done
      The balancer was not able to run
        If a region is currently in transition, the balancer is skipped


LOAD BALANCING - MOVE

You can also use the move command
  To assign regions to other servers
  HBase shell
    move command, refer to ppt#003, p#39
  move() API method, refer to ppt#003, p#22


MERGING REGIONS

Sometimes you may need to merge regions
  For example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server

HBase allows you to merge two adjacent regions
  The HBase cluster must be offline, but HDFS must be running

./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>


CLIENT API: BEST PRACTICES (1/3)

Disable auto-flush
  When performing a lot of put operations
  Refer to ppt#002, p#9

Use scanner caching
  Set Scan.setCaching() to something greater than the default of 1 if needed
  Refer to ppt#002, p#26

Limit scan scope
  If only a small number of the available columns are to be processed, only those should be specified in the input scan
  For example, use the Scan.addFamily() method
  Refer to ppt#002, p#24


CLIENT API: BEST PRACTICES (2/3)

Close ResultScanners
  Closing them avoids performance problems
  Scanners left open may cause problems on the region servers
  Refer to ppt#002, p#25

Block cache usage
  Scan instances can be set to use the block cache in the region server via the setCacheBlocks() method
    true by default, in which case the default settings of the table and family are used
  API docs
  Server-side block cache settings
    Refer to ppt#003, p#12


CLIENT API: BEST PRACTICES (3/3)

Optimal loading of row keys
  When performing a table scan where only the row keys are needed
  Use a FilterList with a MUST_PASS_ALL operator + FirstKeyOnlyFilter + KeyOnlyFilter
  Refer to ppt#002, p#43 & 46

Turn off WAL on Puts
  One way to increase throughput on Puts is to call writeToWAL(false), but there might be data loss
  Consider using the bulk loading techniques instead


CONFIGURATION (1/6)

Advanced options you can consider adjusting based on your use case
  Most properties are configured in hbase-site.xml
  Others are in hbase-env.sh

Decrease ZooKeeper timeout
  The default timeout between a region server and the ZooKeeper quorum is three minutes
  Tune the timeout down to a minute, or even less, so the master notices failures sooner
  zookeeper.session.timeout property
  Be careful of the “Juliet Pause”


CONFIGURATION (2/6)

Increase handlers
  The number of threads that are kept open to answer incoming requests to user tables
  The default is 10
  hbase.regionserver.handler.count property
  Keep this number low when the payload per request approaches megabytes
  And high when the payload is small

Increase heap settings
  HBASE_HEAPSIZE setting in the hbase-env.sh file
  Consider using HBASE_REGIONSERVER_OPTS instead of changing the global HBASE_HEAPSIZE
    Region servers may need more memory than the master


CONFIGURATION (3/6)

Enable data compression
  You should enable compression for the storage files
  In most cases, it boosts performance

Increase region size
  Consider going to larger regions to cut down on the total number of regions on your cluster
  Fewer regions to manage makes for a smoother-running cluster


CONFIGURATION (4/6)

Adjust block cache size
  The amount of heap used for the block cache is specified as a percentage
    Defaults to 20%
    hfile.block.cache.size property
  Increasing it is good if you have mainly reading workloads

Adjust memstore limits
  Memstore heap usage
    hbase.regionserver.global.memstore.upperLimit property
      Defaults to 40%
    hbase.regionserver.global.memstore.lowerLimit property
      Defaults to 35%
    These control the amount of flushing that will take place once the server is required to free heap space
  Mainly read-oriented workloads
    Consider reducing both limits to make more room for the block cache
  Handling many writes
    Increase the memstore limits to reduce the excessive amount of I/O this causes


CONFIGURATION (5/6)

Increase blocking store files
  When a store has more than this number of files, the region servers block further updates from clients to give compactions time to reduce the number of files
  Default is seven files
  hbase.hstore.blockingStoreFiles property

Increase block multiplier
  A safety latch that blocks any further updates from clients when the memstores exceed the multiplier * flush size limit
  hbase.hregion.memstore.block.multiplier property
  Default is 2
  If you have enough memory, you can increase this value to handle spikes more gracefully
  Refer to ppt#003, p#8
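
The multiplier * flush size limit is plain arithmetic, sketched here in Python. The 128 MB flush size is an assumed example value, since this deck does not state the default of hbase.hregion.memstore.flush.size, and the function name is made up for illustration.

```python
MB = 1024 * 1024


def update_block_threshold(flush_size_bytes, multiplier=2):
    """Memstore size at which further client updates are blocked."""
    return multiplier * flush_size_bytes


flush_size = 128 * MB  # assumed flush size, for illustration only
for multiplier in (2, 4):
    limit = update_block_threshold(flush_size, multiplier)
    print(f"multiplier {multiplier}: updates blocked at {limit // MB} MB")
```

Doubling the multiplier doubles the room a region has to absorb a write spike before clients are blocked, at the cost of more heap.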


CONFIGURATION (6/6)

Decrease maximum logfiles
  Controls how often flushes occur, based on the number of WAL files on disk
  Default is 32
  hbase.regionserver.maxlogs property
  This can be high in a write-heavy use case
  Lower it to force the servers to flush data to disk more often


LOAD TESTS

It is advisable to run performance tests to verify the functionality of your cluster

These tests give you a baseline which you can refer to
  After making changes to the configuration of the cluster
  Or to the schemas of your tables

Doing a burn-in of your cluster
  Shows you how much you can gain from it
  But this does not replace a test with the load expected from your use case


LOAD TESTS – PERFORMANCE EVALUATION (1/2)

HBase ships with its own tool to execute a performance evaluation
  Performance Evaluation (PE)
  Wiki
    http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation

./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  [--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>


LOAD TESTS – PERFORMANCE EVALUATION (2/2)

Example

./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
11/07/03 13:18:34 INFO hbase.PerformanceEvaluation: Start class \
  org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at \
  offset 0 for 1048576 rows
...
11/07/03 13:18:41 INFO hbase.PerformanceEvaluation: 0/104857/1048576
...
11/07/03 13:18:45 INFO hbase.PerformanceEvaluation: 0/209714/1048576
...
11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class \
  org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest \
  in 89062ms at offset 0 for 1048576 rows
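
To turn such a run into a baseline number, divide the row count by the elapsed time from the final log line. A minimal Python sketch, using the figures from the sample output above (the function name is invented for this example):

```python
def rows_per_second(rows, elapsed_ms):
    """Average write throughput for a finished PerformanceEvaluation run."""
    return rows / (elapsed_ms / 1000.0)


# Figures from the "Finished class ..." line of the sample run
rate = rows_per_second(1048576, 89062)
print(f"{rate:.0f} rows/s")
```

Recording this figure before and after a configuration change gives a simple way to compare runs.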


LOAD TESTS – YCSB (1/2)

Yahoo! Cloud Serving Benchmark (YCSB)
  A suite of tools that can be used to run comparable workloads against different storage systems
  Also a reasonable tool for performing an HBase cluster burn-in or performance test

Using YCSB is preferred over the HBase-supplied Performance Evaluation
  It offers more options
  It can combine read and write workloads

Home page
  http://research.yahoo.com/Web_Information_Management/YCSB


LOAD TESTS – YCSB (2/2)

Use HBase shell
  create 'usertable', 'family'

git pull
cd ${GIT_HOME}/hbase-training/006/ycsb

Run command
  java -cp "${HBASE_CONF_DIR}:core-0.1.4.jar:hbase-binding-0.1.4.jar" com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=1000 -s > ycsb-load.log

Then you can see the performance metrics in the ycsb-load.log file


CLUSTER ADMINISTRATION

Operational Tasks
  Node Decommission
  Rolling Restarts
  Adding Backup Master
  Adding a Region Server

Data Task
  Export
  Import
  CopyTable Tool
  Bulk Import

Troubleshooting
  HBase Fsck
  Analyzing the Logs


OPERATIONAL TASKS – NODE DECOMMISSION (1/2)

Use the following script
  In normal HBase distribution
    ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
  In tm distribution
    ${TM_PUPPET_HOME}/bin/services/shutdown-regionservers.sh [<host> ...]

Disable the load balancer before decommissioning a node
  In HBase shell
    balance_switch false

Regions could be offline for a good period of time
  If there are many regions on the server, a region may stay offline until all regions have closed and the master notices the region server's ZooKeeper znode being removed


OPERATIONAL TASKS – NODE DECOMMISSION (2/2)

Stop a region server gradually
  Lets a node gradually shed its load and then shut itself down
  Available from HBase 0.90.2
  ${HBASE_HOME}/bin/graceful_stop.sh

Example
  ${HBASE_HOME}/bin/graceful_stop.sh HOSTNAME
  Check the HOSTNAME on your HBase master UI
    Refer to ppt#003, p#41
  An IP address is NOT supported at present


OPERATIONAL TASKS – ROLLING RESTARTS

Also use graceful_stop.sh

Steps as follows
  1. Ensure the cluster is consistent
       hbase hbck
     Fix it if inconsistent
       hbase hbck -fix
  2. Restart the master
       ${HBASE_HOME}/bin/hbase-daemon.sh stop master; \
         ${HBASE_HOME}/bin/hbase-daemon.sh start master
  3. Disable the region balancer
       echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell
  4. Run the graceful_stop.sh script per region server
       for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh \
         --restart --reload --debug $i; done &> /tmp/log.txt &
  5. Restart the master again
     This clears out the dead servers list and reenables the balancer
  6. Run hbck to ensure the cluster is consistent


OPERATIONAL TASKS – ADDING BACKUP MASTER (1/2)

To prevent a single point of failure
  If the machine currently hosting the active master fails, the system can fall back to a backup master

Underlying operations
  1. A dedicated ZooKeeper znode, /hbase/master, is used
  2. All master processes race to create it, and the first one to create it wins (becomes the current master)
     This happens at startup
  3. All other master processes simply loop around the znode check and wait for it to disappear
     Its disappearance triggers the race again
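
The znode race can be mimicked with an in-memory stand-in for ZooKeeper. This Python sketch is purely illustrative: FakeZooKeeper is a made-up class and not the real ZooKeeper client API, the master names are hypothetical, and the "race" is simulated by letting candidates try one after another.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper; NOT the real client API."""

    def __init__(self):
        self.znodes = {}

    def create(self, path, owner):
        """Create the znode if absent; return True only for the creator."""
        if path in self.znodes:
            return False
        self.znodes[path] = owner
        return True

    def delete(self, path):
        self.znodes.pop(path, None)


def elect(zk, candidates, path="/hbase/master"):
    """Every candidate tries to create the znode; the first one wins."""
    for name in candidates:
        zk.create(path, name)
    return zk.znodes[path]


zk = FakeZooKeeper()
print(elect(zk, ["master-1", "master-2", "master-3"]))  # startup race

zk.delete("/hbase/master")  # the active master fails and its znode disappears
print(elect(zk, ["master-2", "master-3"]))  # the backups race again
```

The losers of the first round do nothing until the znode vanishes, which is the waiting loop described in step 3.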


OPERATIONAL TASKS – ADDING BACKUP MASTER (2/2)

How to start multiple backup master processes
  Use the original way to start a master process
    ${HBASE_HOME}/bin/hbase-daemon.sh start master
  In tm distribution
    ${TM_PUPPET_HOME}/bin/services/startup-hmaster.sh [<host> ...]
  Specifically start a backup master process
    ${HBASE_HOME}/bin/hbase-daemon.sh start master --backup


OPERATIONAL TASKS – ADDING A REGION SERVER

In normal HBase distribution
  Edit ${HBASE_HOME}/conf/regionservers
    Add the new region server's host name
  Two scripts can be used…
    ${HBASE_HOME}/bin/start-hbase.sh
      It skips the region servers that are already running and starts the newly added region server listed in the regionservers file
    ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver
      Must be executed on the newly added region server

In tm distribution
  A new feature, not covered here


DATA TASK

You may be required to move the data as a whole or in parts
  To archive data for backup purposes
  To bootstrap another cluster

hadoop jar ${HBASE_HOME}/hbase-0.91.0-SNAPSHOT.jar
An example program must be given as the first argument.
Valid program names are:
  …
  completebulkload: Complete a bulk data load.
  copytable: Export a table from local cluster to peer cluster
  export: Write table data to HDFS.
  import: Import data written by Export.
  importtsv: Import data in TSV format.
  …

http://hbase.apache.org/book/ops_mgt.html


DATA TASK – EXPORT (1/3)

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
Usage: Export [-D <property=value>]* <tablename> <outputdir> \
  [<versions> [<starttime> [<endtime>]]]


DATA TASK - EXPORT (2/3)

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export \
  testtable /user/larsgeorge/backup-testtable
11/06/25 15:58:29 INFO mapred.JobClient: Running job: job_201106251558_0001
11/06/25 15:58:30 INFO mapred.JobClient: map 0% reduce 0%
…
11/06/25 15:59:40 INFO mapred.JobClient: map 100% reduce 0%
11/06/25 15:59:42 INFO mapred.JobClient: Job complete: job_201106251558_0001
11/06/25 15:59:42 INFO mapred.JobClient: Counters: 6
11/06/25 15:59:42 INFO mapred.JobClient: Job Counters
11/06/25 15:59:42 INFO mapred.JobClient: Rack-local map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: Launched map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: FileSystemCounters
11/06/25 15:59:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3648
11/06/25 15:59:42 INFO mapred.JobClient: Map-Reduce Framework
11/06/25 15:59:42 INFO mapred.JobClient: Map input records=0
11/06/25 15:59:42 INFO mapred.JobClient: Spilled Records=0
11/06/25 15:59:42 INFO mapred.JobClient: Map output records=0


DATA TASK - EXPORT (3/3)

Each part-m-nnnnn file contains a piece of the exported data
  Together they form the full backup of the table

Use the hadoop distcp command to move the directory from one cluster to another, and perform the import there

hadoop dfs -lsr /user/larsgeorge/backup-testtable
drwxr-xr-x - ... 0 2011-06-25 15:58 _logs
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00000
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00001
…
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00030
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00031


DATA TASK – IMPORT (1/2)

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
Usage: Import <tablename> <inputdir>

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import \
  testtable /user/larsgeorge/backup-testtable
11/06/25 17:09:48 INFO mapreduce.TableOutputFormat: Created table instance \
  for testtable
11/06/25 17:09:48 INFO input.FileInputFormat: Total input paths to process : 32
11/06/25 17:09:49 INFO mapred.JobClient: Running job: job_201106251558_0003
11/06/25 17:09:50 INFO mapred.JobClient: map 0% reduce 0%
11/06/25 17:10:04 INFO mapred.JobClient: map 6% reduce 0%
…
11/06/25 17:10:51 INFO mapred.JobClient: Job Counters
11/06/25 17:10:51 INFO mapred.JobClient: Launched map tasks=32
11/06/25 17:10:51 INFO mapred.JobClient: Data-local map tasks=32
11/06/25 17:10:51 INFO mapred.JobClient: FileSystemCounters
11/06/25 17:10:51 INFO mapred.JobClient: HDFS_BYTES_READ=3648
11/06/25 17:10:51 INFO mapred.JobClient: Map-Reduce Framework
11/06/25 17:10:51 INFO mapred.JobClient: Map input records=0
11/06/25 17:10:51 INFO mapred.JobClient: Spilled Records=0
11/06/25 17:10:51 INFO mapred.JobClient: Map output records=0


DATA TASK - IMPORT (2/2)

Use the Import job to store the data in a different table
  With the same schema

Both the export and import commands are per-table only

Use the hadoop distcp command to copy the entire /hbase directory in HDFS
  Not recommended
  It may copy store files that are halfway through a memstore flush operation


DATA TASK – COPYTABLE TOOL (1/2)

Designed to bootstrap cluster replication
  Make a copy of an existing table from the master cluster to the slave cluster

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
Usage: CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
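
When the copy goes to another cluster, `--peer.adr` identifies the target. A small sketch of composing that value, assuming the address takes the form `hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent` as described in the tool's full help text (the slide shows only the abbreviated usage line). The host name is hypothetical, and 2181 and /hbase are the stock defaults, which your cluster may override.

```shell
#!/bin/sh
# Compose a --peer.adr value: <zookeeper quorum>:<client port>:<znode parent>.
# Assumption: 2181 and /hbase are the unmodified HBase defaults.
peer_address() (
  quorum="$1"
  port="${2:-2181}"
  znode="${3:-/hbase}"
  echo "$quorum:$port:$znode"
)

# Dry run: print the CopyTable command instead of executing it.
# zk1.slave.example.com is a hypothetical ZooKeeper host of the slave cluster.
echo "hadoop jar \$HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable" \
     "--peer.adr=$(peer_address zk1.slave.example.com) testtable"
```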

Page 66: 006 performance tuningandclusteradmin

66

DATA TASK – COPYTABLE TOOL (2/2)

The copy of the table is stored on the same cluster

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable \
  --new.name=testtable3 testtable
11/06/26 15:20:07 INFO mapreduce.TableOutputFormat: \
  Created table instance for testtable3
11/06/26 15:20:07 INFO mapred.JobClient: Running job: job_201106261454_0003
11/06/26 15:20:08 INFO mapred.JobClient: map 0% reduce 0%
11/06/26 15:20:19 INFO mapred.JobClient: map 6% reduce 0%
…
11/06/26 15:21:04 INFO mapred.JobClient: map 100% reduce 0%
11/06/26 15:21:06 INFO mapred.JobClient: Job complete: job_201106261454_0003
11/06/26 15:21:06 INFO mapred.JobClient: Counters: 5
11/06/26 15:21:06 INFO mapred.JobClient: Job Counters
11/06/26 15:21:06 INFO mapred.JobClient: Launched map tasks=32
11/06/26 15:21:06 INFO mapred.JobClient: Data-local map tasks=32
11/06/26 15:21:06 INFO mapred.JobClient: Map-Reduce Framework
11/06/26 15:21:06 INFO mapred.JobClient: Map input records=0
11/06/26 15:21:06 INFO mapred.JobClient: Spilled Records=0
11/06/26 15:21:06 INFO mapred.JobClient: Map output records=0

Page 67: 006 performance tuningandclusteradmin

67

DATA TASK – BULK IMPORT (1/2)

Importtsv tool
  Given files containing data in tab-separated value (TSV) format
  By default, it uses the HBase put() API to insert data into HBase one row at a time
  By setting the importtsv.bulk.output option, it generates files using HFileOutputFormat instead
    These can subsequently be bulk-loaded into HBase by the completebulkload tool

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
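
To try importtsv, an input file with one row per line and tab-separated fields is needed. The sketch below writes such a file locally; the file name, row keys, values, column names and HDFS paths are all made up for illustration. The commented commands assume that the special column name `HBASE_ROW_KEY` in `importtsv.columns` marks which field holds the row key.

```shell
#!/bin/sh
# Write a tiny sample input file: row key, then one value per column,
# separated by tab characters. All values are hypothetical sample data.
make_sample_tsv() (
  out="$1"
  printf 'row1\tvalue-a\tvalue-b\n' >  "$out"
  printf 'row2\tvalue-c\tvalue-d\n' >> "$out"
)

make_sample_tsv "${TMPDIR:-/tmp}/sample.tsv"

# The file would then be copied to HDFS and loaded, for example:
#   hadoop fs -put "${TMPDIR:-/tmp}/sample.tsv" /user/larsgeorge/tsv-input/
#   hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv \
#     -Dimporttsv.columns=HBASE_ROW_KEY,family:a,family:b \
#     mytable /user/larsgeorge/tsv-input
# Adding -Dimporttsv.bulk.output=<hdfs-dir> would write HFiles for a later
# completebulkload run instead of issuing puts.
```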

Page 68: 006 performance tuningandclusteradmin

68

DATA TASK – BULK IMPORT (2/2)

completebulkload tool
  Is used to import the data into the running cluster
  After a data import has been prepared
    By using the importtsv tool with the importtsv.bulk.output option
    By some other MapReduce job using the HFileOutputFormat

hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar completebulkload \
  -conf ~/my-hbase-site.xml /user/larsgeorge/myoutput mytable

Page 69: 006 performance tuningandclusteradmin

69

TROUBLESHOOTING – HBASE FSCK (1/4)

Shell command
  ${HBASE_HOME}/bin/hbase hbck
Once started
  Scans the .META. table to gather all the pertinent information it holds
  Scans the HDFS root directory HBase is configured to use
  Compares the collected details to report on inconsistencies and integrity issues
Consistency check
  Whether the region is listed in .META. and exists in HDFS
  Whether it is also assigned to exactly one region server
Integrity check
  Compares the regions with the table details to find missing regions, or those that have holes or overlaps in their row key ranges

Page 70: 006 performance tuningandclusteradmin

70

TROUBLESHOOTING – HBASE FSCK (2/4)

${HBASE_HOME}/bin/hbase hbck -h
Usage: fsck [opts]
 where [opts] are:
   -details Display full report of all regions.
   -timelag {timeInSeconds} Process only regions that have not experienced any metadata updates in the last {{timeInSeconds} seconds.
   -fix Try to fix some of the errors.
   -sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking if the fix worked if run with -fix
   -summary Print only summary of the tables and status.

Page 71: 006 performance tuningandclusteradmin

71

TROUBLESHOOTING – HBASE FSCK (3/4)

No option at all invokes the normal output detail

${HBASE_HOME}/bin/hbase hbck
Number of Tables: 40
Number of live region servers: 19
Number of dead region servers: 0
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
Summary:
...
  testtable2 is okay.
    Number of regions: 1
    Deployed on: host11.foo.com:60020
0 inconsistencies detected.
Status: OK

Page 72: 006 performance tuningandclusteradmin

72

TROUBLESHOOTING – HBASE FSCK (4/4)

${HBASE_HOME}/bin/hbase hbck -fix
Repairs the following issues
  Assigns .META. to a single new server if it is unassigned
  Reassigns .META. to a single new server if it is assigned to multiple servers
  Assigns a user table region to a new server if it is unassigned
  Reassigns a user table region to a single new server if it is assigned to multiple servers
  Reassigns a user table region to a new server if the current server does not match what the .META. table refers to
hbck may report inconsistencies that are temporary or transitional only
  Rerun the tool a few times to confirm a permanent problem
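
The advice to rerun hbck before trusting a reported inconsistency can be scripted. This is a rough sketch, assuming the report contains a line of the form `N inconsistencies detected.` as in the sample output on the previous slide. The function names, the `HBCK_CMD` override and the `/opt/hbase` fallback are made up for illustration.

```shell
#!/bin/sh
# Read an hbck report on stdin and print N from "N inconsistencies detected.".
# Prints nothing if no such line is present.
count_inconsistencies() {
  sed -n 's/^ *\([0-9][0-9]*\) inconsistencies detected\..*$/\1/p' | head -n 1
}

# Command used to produce the report; overridable so the loop can be tried
# without a cluster. /opt/hbase is a hypothetical fallback location.
HBCK_CMD="${HBCK_CMD:-${HBASE_HOME:-/opt/hbase}/bin/hbase hbck}"

# $1 = maximum number of runs, $2 = seconds to wait between runs.
# Succeeds as soon as one run reports 0 inconsistencies; fails if every run
# reports a problem (or no count at all), which suggests a permanent issue.
confirm_hbck() (
  attempts="$1"
  pause="$2"
  i=1
  while [ "$i" -le "$attempts" ]; do
    n=$($HBCK_CMD 2>/dev/null | count_inconsistencies)
    if [ "$n" = "0" ]; then exit 0; fi
    if [ "$i" -lt "$attempts" ]; then sleep "$pause"; fi
    i=$((i + 1))
  done
  exit 1
)

# Example (not run here): three runs, one minute apart.
#   confirm_hbck 3 60 || echo "inconsistencies persist; inspect hbck -details"
```

Only after the check keeps failing would it make sense to look at `-details` and consider `-fix`.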

Page 73: 006 performance tuningandclusteradmin

73

TROUBLESHOOTING – ANALYZING THE LOGS (1/2)

Server type: default logfile / tm settings

HBase Master
  Default logfile: $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log
  tm settings: /var/log/hbase/hbase-<user>-master-<hostname>.log

HBase RegionServer
  Default logfile: $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log
  tm settings: /var/log/hbase/hbase-<user>-regionserver-<hostname>.log

ZooKeeper
  Default logfile: Console log output only
  tm settings: /var/log/hbase/hbase-<user>-zookeeper-<hostname>.log

NameNode
  Default logfile: $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log
  tm settings: /var/log/hadoop/hadoop-<user>-namenode-<hostname>.log

DataNode
  Default logfile: $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log
  tm settings: /var/log/hadoop/hadoop-<user>-datanode-<hostname>.log

JobTracker
  Default logfile: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
  tm settings: /var/log/hadoop/hadoop-<user>-jobtracker-<hostname>.log

TaskTracker
  Default logfile: $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log
  tm settings: /var/log/hadoop/hadoop-<user>-tasktracker-<hostname>.log

Page 74: 006 performance tuningandclusteradmin

74

TROUBLESHOOTING – ANALYZING THE LOGS (2/2)

It is useful to begin with the master logfile
  It acts as the coordinator service of the entire cluster
Find where the processes began logging ERROR-level messages
  This helps identify the root cause
  A lot of subsequent messages are often side effects of the original problem
Using the error log event metric under the System Event Metrics group is recommended
  It gives you a graph showing where the server(s) started logging an increasing number of error messages in the logfiles
If you find an error message
  Google it!
  Use the online resources to search for the message in the public mailing lists
  Search Hadoop
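
Finding where each process first logged an ERROR can be done with standard tools. The following is a minimal sketch, assuming the logfiles carry the level as a space-delimited `ERROR` token on each line, as log4j-style output usually does. The function name is made up, and the paths in the usage comment follow the default locations from the logfile table.

```shell
#!/bin/sh
# For each readable logfile given, print "<file>:<line number>:<line>" for
# the first line containing " ERROR ". Files without such a line are skipped.
first_errors() (
  for log in "$@"; do
    [ -r "$log" ] || continue
    hit=$(grep -n ' ERROR ' "$log" | head -n 1)
    if [ -n "$hit" ]; then
      echo "$log:$hit"
    fi
  done
)

# Example (not run here): start with the master, then the region servers.
#   first_errors "$HBASE_HOME"/logs/hbase-*-master-*.log \
#                "$HBASE_HOME"/logs/hbase-*-regionserver-*.log
```

Comparing the timestamps of the reported lines across servers shows which process complained first, which is usually the place to start reading.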

Page 75: 006 performance tuningandclusteradmin

75

HANDS-ON – USE YCSB

New VM list
  VMs are not affordable at present :p
YOUR_HOME=${GIT_HOME}/hbase-training/006/hands-on/${YOUR_NAME}
mkdir ${YOUR_HOME}
cd ${YOUR_HOME}; cp -rf ../../ycsb/* .
Use HBase shell
  create '<YOUR_NAMED_TABLE>', 'family'
Run YCSB with a record count of 5000
  And output a ycsb-load.log file
Hands-on result
  Put the ycsb-load.log file under ${YOUR_HOME}