HBase Lightning Talk
Transcript of HBase Lightning Talk
Scott Leberknight
APACHE HBASE
BACKGROUND
Bigtable
"Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance."
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
"A Bigtable is a sparse, distributed, persistent
multidimensional sorted map"
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
wtf?
distributed
sparse
column-oriented
versioned
(row key, column key, timestamp) => value
The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
- Bigtable: A Distributed Storage System for Structured Data
http://labs.google.com/papers/bigtable.html
row key => 20120407152657
column family => "personal:"
column key => "personal:givenName", "personal:surname"
timestamp => 1239124584398
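The quotes above describe a Bigtable as one big sorted, multi-level map: (row key, column key, timestamp) => value. As a rough mental model only (this is not the HBase API; the class and method names here are illustrative), it can be sketched with nested sorted maps in Java:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory sketch of the Bigtable data model:
// row key => column key => timestamp (newest first) => value bytes.
public class BigtableModelSketch {

    // Rows sort lexicographically by key; versions sort newest-first.
    static NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
            new TreeMap<>();

    static void put(String row, String column, long ts, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    static byte[] get(String row, String column) {
        // The first entry is the newest version, since timestamps sort descending.
        return table.get(row).get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("20120407152657", "personal:givenName", 1239124584398L, "John".getBytes());
        put("20120407152657", "personal:givenName", 1239124600000L, "Johnny".getBytes());
        // A plain get returns the newest version
        System.out.println(new String(get("20120407152657", "personal:givenName")));
    }
}
```

Note how "sparse" falls out of the model for free: a column that was never written simply has no entry in the inner map, costing nothing.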
Key Concepts:

Row Key         Timestamp  Column Family "info:"              Column Family "content:"
20120407145045  t7         "info:summary" "An intro to..."
20120407145045  t6         "info:author" "John Doe"
20120407145045  t5                                            "Google's Bigtable is..."
20120407145045  t4                                            "Google Bigtable is..."
20120407145045  t3         "info:category" "Persistence"
20120407145045  t2         "info:author" "John"
20120407145045  t1         "info:title" "Intro to Bigtable"
20120320162535  t4         "info:category" "Persistence"
20120320162535  t3                                            "CouchDB is..."
20120320162535  t2         "info:author" "Bob Smith"
20120320162535  t1         "info:title" "Doc-oriented..."
Get row 20120407145045...
Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.
- http://hbase.apache.org/
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN         CELL
 content:       timestamp=1239135042862, value=CouchDB is a doc...
 info:author    timestamp=1239135042755, value=Bob Smith
 info:category  timestamp=1239135042982, value=Persistence
 info:title     timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
hbase(main):015:0> get 'blog', '20120407145045', { COLUMN => 'info:author', VERSIONS => 3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW             COLUMN+CELL
 20120320162535  column=content:, timestamp=1239135042862, value=CouchDB is...
 20120320162535  column=info:author, timestamp=1239135042755, value=Bob Smith
 20120320162535  column=info:category, timestamp=1239135042982, value=Persistence
 20120320162535  column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
Got byte[]?
// Create a new table
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);

System.out.printf("%s is available? %b\n", tableName, admin.isTableAvailable(tableName));
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people"); // table was never created in the original slide
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
table.put(put);
table.flushCommits();
table.close();
Finding data:
get (by row key)
scan (by row key ranges, filtering)
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Scan scan = new Scan(toBytes("smith-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // process result...
}
Data Modeling
Row key design
MATCH TO DATA ACCESS PATTERNS
WIDE VS. NARROW ROWS
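The slides above stress matching the row key to the access pattern. As an illustrative sketch (my example, not from the talk): since HBase stores rows sorted lexicographically by key, a common design is a composite key with a reversed timestamp, so a scan over one entity returns its newest entries first:

```java
// Hypothetical row key builder: "<userId>-<reversedTimestamp>".
// Subtracting from Long.MAX_VALUE makes newer timestamps produce
// lexicographically smaller keys, so scans see newest rows first.
public class RowKeyDesignSketch {

    static String rowKey(String userId, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        // Zero-pad to a fixed width so string ordering matches numeric ordering.
        return String.format("%s-%019d", userId, reversed);
    }

    public static void main(String[] args) {
        String older = rowKey("bob", 1_000L);
        String newer = rowKey("bob", 2_000L);
        // The newer entry sorts before the older one.
        System.out.println(newer.compareTo(older) < 0);
    }
}
```

The same trade-off drives wide vs. narrow rows: many columns in one row keeps related data in a single lookup, while spreading data across rows lets scans page through it.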
References
shop.oreilly.com/product/0636920014348.do
http://shop.oreilly.com/product/0636920021773.do
(3rd edition pub date is May 29, 2012)
hbase.apache.org
scott.leberknight at nearinfinity.com
www.nearinfinity.com/blogs/
twitter: sleberknight
(my info)