Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase Users)
Alex Baranau, Sematext International, 2012
Monday, July 9, 12
About Me
Software Engineer at Sematext International
http://blog.sematext.com/author/abaranau
@abaranau
http://github.com/sematext (abaranau)
Agenda
Logical view
Physical view
Schema design
Other/Advanced topics
Why?
Why should I (an HBase user) care about HBase internals?
HBase will not automatically tune cluster settings to fit usage patterns
Schema design, table settings (defined upon creation), etc. depend on HBase implementation aspects
Logical View
Logical View: Regions
HBase cluster serves multiple tables, distinguished by name
Each table consists of rows
Each row contains cells: (row key, column family, column, timestamp) -> value
Table is split into Regions (table shards, each containing full rows), defined by start and end row keys
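A minimal sketch of how a row key maps to a Region, assuming each Region is described by its start row key and covers everything up to the next Region's start key. The boundaries and the name find_region are hypothetical; this illustrates the idea and is not HBase client code.

```python
from bisect import bisect_right

# Hypothetical region boundaries: region i covers [start_keys[i], start_keys[i + 1]).
# The first region starts at the empty key; the last one is open-ended.
start_keys = [b"", b"g", b"n", b"t"]

def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(start_keys, row_key) - 1

for key in (b"alex", b"g", b"otis", b"zed"):
    print(key, "-> region", find_region(key))
```

Because rows are kept in sorted order, a binary search over the start keys is enough to find the single Region (and therefore the single RS) responsible for a row.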
Logical View: Regions are Shards
Regions are “atoms of distribution”
Each region is assigned to a single RegionServer (RS, an HBase cluster slave)
All rows of a particular Region are served by that single RS
Regions are distributed evenly across RSs
Region has configurable max size
When region reaches max size (or on request) it is split into two smaller regions, which can be assigned to different RSs
Logical View: Regions on Cluster
[Diagram: a client, ZooKeeper nodes, HMasters, and RegionServers, with each RegionServer serving several Regions]
Logical View: Regions Load
It is essential for Regions under load to be evenly distributed across the cluster
It is HBase user’s job to make sure the above is true. Note: even distribution of Regions over cluster doesn’t imply that the load is evenly distributed
Logical View: Regions Load
Take into account that rows are stored in sorted order by row key
Avoid writing rows with sequential keys, to prevent RS hotspotting*
When writing data with monotonically increasing/decreasing keys, data is written to one RS at a time
Use pre-splitting of the table upon creation
Starting with single region means using one RS for some time
In general, splitting can be expensive
Increase max region size
* see https://github.com/sematext/HBaseWD
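One common way to spread sequential keys is to prefix each key with a bucket byte and pre-split the table on those prefixes. The sketch below is in the spirit of the HBaseWD project linked above but is not its API; the names salted_key and split_points, the bucket count, and the CRC32 hash are all assumptions made for illustration.

```python
import zlib

NUM_BUCKETS = 8  # assumption: one bucket per pre-split region

def salted_key(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a one-byte bucket derived from a stable hash of the key."""
    bucket = zlib.crc32(row_key) % buckets
    return bytes([bucket]) + row_key

def split_points(buckets: int = NUM_BUCKETS) -> list:
    """Start keys of regions 1..buckets-1 for a table pre-split by bucket prefix."""
    return [bytes([b]) for b in range(1, buckets)]

# Sixty sequential keys, one per minute, as in a time-ordered log.
keys = [b"login_2012-03-01.00:%02d:00" % m for m in range(60)]
used = {salted_key(k)[0] for k in keys}
print(len(used), "buckets used by", len(keys), "sequential keys")
```

The trade-off is on the read side: a range scan over the original keys has to be run once per bucket and the results merged.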
Logical View: Slow RSs
When load is distributed evenly, watch for slowest RSs (HBase slaves)
Since every region is served by a single RS, one slow RS can slow down the whole cluster, e.g. when:
data is written into multiple RSs at even pace (random value-based row keys)
data is being read from many RSs when doing scan
Physical View
Physical View: Write/Read Flow
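[Diagram: on write, the client's HTable (with its buffer) sends data to the RegionServer, where it goes to the Write Ahead Log on HDFS and to the MemStore of the Region's Store (one Store per CF); MemStores are flushed into HFiles. On read, the client's HTable reads from the MemStore and the HFiles of the Store.]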
Physical: Speed up Writing
Enabling and enlarging the client-side write buffer reduces the number of RPC calls
warn: buffered data can be lost
in case of client failure; design for failover
in case of write failure (networking/server-side issues); can be handled on client
Disabling WAL increases write speed
warn: possible data loss in case of RS failure
Use bulk import functionality (writes HFiles directly, which can be later added to HBase)
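To illustrate why a client-side buffer reduces RPC calls, here is a toy model that counts simulated round trips. BufferedWriter and count_rpcs are hypothetical names and the sizes are arbitrary; this is a sketch of the batching idea, not the HBase client API.

```python
class BufferedWriter:
    """Toy stand-in for a client-side write buffer; not the HBase client API."""

    def __init__(self, buffer_limit_bytes: int):
        self.buffer_limit_bytes = buffer_limit_bytes
        self.buffer = []          # puts not yet sent to the server
        self.buffered_bytes = 0
        self.rpc_calls = 0        # number of simulated round trips

    def put(self, row_key: bytes, value: bytes) -> None:
        self.buffer.append((row_key, value))
        self.buffered_bytes += len(row_key) + len(value)
        if self.buffered_bytes >= self.buffer_limit_bytes:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.rpc_calls += 1   # one batched call for everything buffered
            self.buffer.clear()
            self.buffered_bytes = 0

def count_rpcs(buffer_limit_bytes: int, puts: int = 1000) -> int:
    writer = BufferedWriter(buffer_limit_bytes)
    for i in range(puts):
        # 10-byte key + 90-byte value = 100 bytes per put
        writer.put(b"row-%06d" % i, b"x" * 90)
    # Anything still buffered here would be lost if the client died first.
    writer.flush()
    return writer.rpc_calls

print(count_rpcs(1))        # buffer effectively disabled: one call per put
print(count_rpcs(10_000))   # 10 kB buffer: one call per 100 puts
```

The final flush is the point of the warning above: whatever sits in the buffer has not reached the server yet.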
Physical: Memstore Flushes
When memstore is flushed, N HFiles are created (one per CF)
Memstore size which causes flushing is configured on two levels:
per RS: % of heap occupied by memstores
per table: size in MB of single memstore (per CF) of Region
When a Region’s memstore is flushed, memstores of all CFs are flushed
Uneven data amounts between CFs cause too many flushes & creation of too many HFiles (one per CF every time)
In most cases having one CF is the best design
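The effect of uneven column families can be sketched with a small simulation. It assumes, as described above, that a flush is triggered when one CF's memstore reaches a threshold and that all CFs of the Region are then flushed together. The function name, CF names, and sizes are hypothetical.

```python
def simulate_flushes(writes, flush_threshold):
    """writes: iterable of (cf_name, size_in_bytes).

    Returns (flush_count, hfile_sizes_by_cf).
    """
    memstores = {}   # bytes currently held in memory per CF
    hfiles = {}      # sizes of the HFiles written so far per CF
    flushes = 0
    for cf, size in writes:
        memstores[cf] = memstores.get(cf, 0) + size
        if memstores[cf] >= flush_threshold:
            flushes += 1
            # Flushing the Region flushes every CF, however little it holds.
            for name in list(memstores):
                if memstores[name]:
                    hfiles.setdefault(name, []).append(memstores[name])
                memstores[name] = 0
    return flushes, hfiles

# Each logical row writes 10 bytes to a small CF "m" and 1000 bytes to a big CF "d".
writes = []
for _ in range(500):
    writes.append(("m", 10))
    writes.append(("d", 1000))

flushes, hfiles = simulate_flushes(writes, flush_threshold=100_000)
print(flushes, hfiles)
```

In this run the small CF ends up with as many HFiles as the big one, each only 1% of its size, which is the extra file count and compaction work the slide warns about.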
Physical: Memstore Flushes
Important: there are Memstore size thresholds which cause writes to be blocked, so slow memstore flushes and overuse of memory by memstore can cause write perf degradation
Hint: watch for flush queue size metric on RSs
At the same time the more memory memstore uses the better for writing/reading perf (unless it reaches those “write blocking” thresholds)
Physical: Memstore Flushes
Example of a good situation* [chart not preserved in this transcript]
* http://sematext.com/spm/index.html
Physical: HFiles Compaction
HFiles are periodically compacted into bigger HFiles containing the same data
Reading from fewer HFiles is faster
Important: there’s a configured max number of files in a Store which, when reached, causes writes to block
Hint: watch for compaction queue size metric on RSs
[Diagram: a read against a Store (per CF) consults the MemStore and the HFiles]
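Why fewer HFiles means faster reads can be sketched as follows, modelling each HFile as a sorted list of (row key, value) pairs. The names compact and read are hypothetical, and real HBase reads involve more than this (block indexes, timestamps, deletes); this only shows the merge and the file count.

```python
import heapq
from bisect import bisect_left

def compact(hfiles):
    """Merge several sorted HFiles into one sorted HFile holding the same data."""
    return list(heapq.merge(*hfiles))

def read(hfiles, row_key):
    """Return (values, files_searched); any file may hold the key, so all are searched."""
    found = []
    for hfile in hfiles:
        i = bisect_left(hfile, (row_key,))   # first entry whose key >= row_key
        while i < len(hfile) and hfile[i][0] == row_key:
            found.append(hfile[i][1])
            i += 1
    return found, len(hfiles)

hfiles = [
    [("a", "1"), ("c", "3")],
    [("b", "2"), ("c", "4")],
    [("d", "5")],
]
print(read(hfiles, "c"))              # three files searched
print(read([compact(hfiles)], "c"))   # same values, one file searched
```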
Physical: Data Locality
RSs are usually collocated with HDFS DataNodes
[Diagram: each slave node runs a DataNode (HDFS), a TaskTracker (MapReduce), and a RegionServer (HBase)]
Physical: Data Locality
HBase tries to assign Regions to RSs so that Region data is stored physically on the same node, but this sometimes fails:
after Region splits there’s no guarantee that there’s a node that has all blocks (HDFS level) of new Region and
no guarantee that HBase will not re-assign this Region to different RS in future (even distribution of Regions takes precedence over data locality)
There is ongoing work towards better preserving data locality
Physical: Data Locality
Also, data locality can break when:
Adding new slaves to cluster
Removing slaves from cluster
Incl. node failures
Hint: look at networking IO between slaves when writing/reading data, it should be minimal
Important:
make sure HDFS is well balanced (use balancer tool)
try to rebalance Regions in HBase cluster if possible (HBase Master restart will do that) to regain data locality
Pre-split table on creation to limit (ideally avoid) splits and region movement; managing splits manually sometimes helps
Schema Design (very briefly)
Schema: row keys
Using row key (or key range) is the most efficient way to retrieve data from HBase
Row key design is a major part of schema design
Note: no secondary indices available out of the box
Row Key                        Data
‘login_2012-03-01.00:09:17’    d:{‘user’:‘alex’}
...                            ...
‘login_2012-03-01.23:59:35’    d:{‘user’:‘otis’}
‘login_2012-03-02.00:00:21’    d:{‘user’:‘david’}
Schema: row keys
Redundancy is OK!
warn: changing two rows in HBase is not an atomic operation
Row Key                             Data
‘login_2010-01-01.00:09:17’         d:{‘user’:‘alex’}
...                                 ...
‘login_2012-03-01.23:59:35’         d:{‘user’:‘otis’}
‘alex_2010-01-01.00:09:17’          d:{‘action’:‘login’}
...                                 ...
‘otis_2012-03-01.23:59:35’          d:{‘action’:‘login’}
‘alex_login_2010-01-01.00:09:17’    d:{‘device’:‘pc’}
...                                 ...
‘otis_login_2012-03-01.23:59:35’    d:{‘device’:‘mobile’}
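A minimal sketch of writing one event under several row keys so that each access pattern becomes a key-prefix scan. The table names (by_action, by_user, by_user_action) and helper names are hypothetical, and plain dicts and sorted lists stand in for HBase tables.

```python
from bisect import bisect_left

def puts_for_login(user, timestamp, device):
    """Build the redundant writes for one login event, one per access pattern."""
    return {
        "by_action": (f"login_{timestamp}", {"user": user}),
        "by_user": (f"{user}_{timestamp}", {"action": "login"}),
        "by_user_action": (f"{user}_login_{timestamp}", {"device": device}),
    }

def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, relying on sorted order."""
    result = []
    for key in sorted_keys[bisect_left(sorted_keys, prefix):]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result

events = [
    ("alex", "2010-01-01.00:09:17", "pc"),
    ("otis", "2012-03-01.23:59:35", "mobile"),
]
tables = {}
for user, ts, device in events:
    for table, (row_key, data) in puts_for_login(user, ts, device).items():
        # Three separate writes per event: they are not applied atomically.
        tables.setdefault(table, {})[row_key] = data

by_user_keys = sorted(tables["by_user"])
print(prefix_scan(by_user_keys, "alex_"))
```

Since the writes are independent, a failure between them can leave the copies out of sync, which is the non-atomicity the warning above refers to.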
Schema: Relations
Not relational
No joins
Denormalization is OK! Use ‘nested entities’
Row Key               Data
‘student_abaranau’    d:{student_firstname:Alex, student_lastname:Baranau,
                         professor_math_firstname:David, professor_math_lastname:Smart,
                         professor_cs_firstname:Jack, professor_cs_lastname:Weird}
‘prof_dsmart’         d:{...}
[Diagram: relation between student and professor entities, marked * to *]
Schema: row key/CF/qual size
HBase stores cells individually
great for “sparse” data
row key, CF name and column name are stored with each cell, which can inflate the amount of data to be stored and managed
keep them short
serialize and store many values into single cell
Row Key         Data
‘s_abaranau’    d:{s:Alex#Baranau#cs#2009, p_math:David#Smart, p_cs:Jack#Weird}
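A rough estimate of what long names cost per cell, using a simplified model: every cell stores its row key, CF name, and column name next to the value, plus a fixed overhead. The 20-byte figure (length fields, an 8-byte timestamp, a type byte) is an assumption made for this sketch, and on-disk sizes also depend on compression and encoding.

```python
FIXED_OVERHEAD = 20  # assumption: length fields + 8-byte timestamp + 1-byte type

def cell_size(row_key, family, qualifier, value):
    """Approximate stored size of one cell in bytes (ASCII strings assumed)."""
    return FIXED_OVERHEAD + len(row_key) + len(family) + len(qualifier) + len(value)

# Two cells with descriptive names...
verbose = [
    ("student_abaranau", "d", "student_firstname", "Alex"),
    ("student_abaranau", "d", "student_lastname", "Baranau"),
]
# ...versus one cell with short names and both values serialized together.
compact = [
    ("s_abaranau", "d", "s", "Alex#Baranau"),
]

verbose_total = sum(cell_size(*cell) for cell in verbose)
compact_total = sum(cell_size(*cell) for cell in compact)
print(verbose_total, compact_total)
```

Under this model the payload is 11 to 12 bytes in both layouts; the difference comes entirely from repeated keys, names, and per-cell overhead.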
Other/Advanced Topics
Advanced: Co-Processors
CoProcessors API (HBase 0.92.0+) allows you to:
execute (querying/aggregation/etc.) logic on server side (you may think of it as stored procedures in RDBMS)
perform auditing of actions performed on server side (you may think of it as triggers in RDBMS)
apply security rules for data access
and much more cool stuff
Other: Use Compression
Using compression:
reduces data amount to be stored on disks
reduces data amount to be transferred when an RS reads data from a non-local replica
increases amount of CPU used, but CPU isn’t usually a bottleneck
Favor compression speed over compression ratio
SNAPPY is good
Use wisely:
e.g. avoid wasting CPU cycles on compressing images
compression can be configured on per CF basis, so storing non-compressible data in separate CF sometimes helps
data blocks are uncompressed in memory; make sure this does not cause OOME (OutOfMemoryError)
note: when scanning (seeking data to return for scan) many data blocks can be uncompressed even if none of the data will be returned from those blocks
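The point about non-compressible data can be checked quickly with a general-purpose compressor. This sketch uses zlib from the Python standard library as a stand-in, not Snappy, and random bytes as a stand-in for already-compressed content such as images; the sample text is made up.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

# Repetitive, log-like records compress very well.
text = b"login_2012-03-01 user=alex action=login device=pc\n" * 1000
# Random bytes behave like already-compressed data: nothing left to squeeze out.
random_like = os.urandom(len(text))

print("text:  ", round(ratio(text), 3))
print("random:", round(ratio(random_like), 3))
```

For the second input the CPU time is spent without any space saving, which is the case for keeping such data in a separate, uncompressed CF.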
Other: Use Monitoring
TBD
Ganglia, Cacti, other*. Just use it!
* http://sematext.com/spm/index.html
Qs?
Sematext is hiring!