HBase Client APIs (for webapps?)

HBase Client API (for webapps?), Nick Dimiduk, Seattle Scalability Meetup, 2013-03-27

description

This talk examines the HBase client options available to application developers. The focus is on, but not limited to, building webapps.

Transcript of HBase Client APIs (for webapps?)

Page 1: HBase Client APIs (for webapps?)

HBase Client API (for webapps?)

Nick Dimiduk

Seattle Scalability Meetup

2013-03-27

Page 2: HBase Client APIs (for webapps?)

Page 3: HBase Client APIs (for webapps?)

Page 4: HBase Client APIs (for webapps?)

What are my choices?

switch (technology) {
  case ‘ ’: ...
  case ‘ ’: ...
  case ‘ ’: ...
}

Page 5: HBase Client APIs (for webapps?)

Apache HBase

Page 6: HBase Client APIs (for webapps?)

Java client Interfaces

• Configuration holds details on where to find the cluster, plus tunable settings. Roughly equivalent to a JDBC connection string.

• HConnection represents connections to the cluster.

• HBaseAdmin handles DDL operations (create, list, drop, alter, &c.)

• HTablePool is a connection pool for table handles.

• HTable (HTableInterface) is a handle on a single HBase table. Send "commands" to the table (Put, Get, Scan, Delete, Increment).

Page 7: HBase Client APIs (for webapps?)

Java client Example

public static final byte[] TABLE_NAME = Bytes.toBytes("twits");
public static final byte[] TWITS_FAM = Bytes.toBytes("twits");

public static final byte[] USER_COL = Bytes.toBytes("user");
public static final byte[] TWIT_COL = Bytes.toBytes("twit");

private HTablePool pool = new HTablePool();

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L23-L30

Page 8: HBase Client APIs (for webapps?)

Java client Example

private static class Twit {

  private Twit(Result r) {
    this(
      r.getColumnLatest(TWITS_FAM, USER_COL).getValue(),
      Arrays.copyOfRange(r.getRow(), Md5Utils.MD5_LENGTH, Md5Utils.MD5_LENGTH + longLength),
      r.getColumnLatest(TWITS_FAM, TWIT_COL).getValue());
  }

  private Twit(byte[] user, byte[] dt, byte[] text) {
    this(
      Bytes.toString(user),
      new DateTime(-1 * Bytes.toLong(dt)),
      Bytes.toString(text));
  }

  // ...
}

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L129-L143

Page 9: HBase Client APIs (for webapps?)

Java client Example

private static Get mkGet(String user, DateTime dt) {
  Get g = new Get(mkRowKey(user, dt));
  g.addColumn(TWITS_FAM, USER_COL);
  g.addColumn(TWITS_FAM, TWIT_COL);
  return g;
}

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L60-L65
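The Java excerpts suggest a row key built from a fixed-length MD5 prefix followed by an 8-byte long that holds the negated timestamp. The Python sketch below illustrates that layout only. It assumes the prefix is the MD5 of the user name and that the timestamp is in epoch milliseconds; mk_row_key and split_row_key are hypothetical names, not part of TwitBase.

```python
import hashlib
import struct

MD5_LENGTH = 16   # bytes in an MD5 digest
LONG_LENGTH = 8   # bytes in a Java long


def mk_row_key(user, millis):
    """Hypothetical helper: MD5(user) followed by the negated timestamp."""
    prefix = hashlib.md5(user.encode("utf-8")).digest()
    # ">q" is a big-endian signed 64-bit integer, the byte order Bytes.toLong reads.
    return prefix + struct.pack(">q", -1 * millis)


def split_row_key(row_key):
    """Recover (md5_prefix, millis) by slicing the key as the Twit(Result) constructor does."""
    prefix = row_key[:MD5_LENGTH]
    (negated,) = struct.unpack(">q", row_key[MD5_LENGTH:MD5_LENGTH + LONG_LENGTH])
    return prefix, -1 * negated


key = mk_row_key("TheRealMT", 1338701491422)
print(len(key), split_row_key(key)[1])
```

Under these assumptions, a later positive timestamp produces a byte-wise smaller key, so within one user's prefix the newest rows sort first.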

Page 10: HBase Client APIs (for webapps?)

Ruby, Python client Interface

Page 11: HBase Client APIs (for webapps?)

Ruby, Python client Interface

JRuby, Jython

:'(

Page 12: HBase Client APIs (for webapps?)

Thrift client Interface

1. Generate bindings

2. Run a “Gateway” between clients and cluster

3. ... profit? write code!

Page 13: HBase Client APIs (for webapps?)

Sidebar: Architecture Recap

HBase Cluster

HBase Clients

Page 14: HBase Client APIs (for webapps?)

Thrift Architecture

HBase Cluster

Thrift Clients

Thrift Gateway

Page 15: HBase Client APIs (for webapps?)

Thrift client Interface

• Thrift gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

Page 16: HBase Client APIs (for webapps?)

Thrift client Example

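from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated bindings; module path depends on where they were generated

transport = TSocket.TSocket(host, port)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()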

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L17-L21

Page 17: HBase Client APIs (for webapps?)

Thrift client Example

columns = ['info:user','info:name','info:email']
scanner = client.scannerOpen('users', '', columns)
row = client.scannerGet(scanner)
while row:
    yield user_from_row(row[0])
    row = client.scannerGet(scanner)
client.scannerClose(scanner)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L33-L39

Page 18: HBase Client APIs (for webapps?)

Thrift client Example

def user_from_row(row):
    user = {}
    for col,cell in row.columns.items():
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L26-L30
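To see what user_from_row does with column names, the sketch below runs it against stand-in objects. Cell and Row here are hypothetical namedtuples that only mimic the .columns and .value attributes the function reads; the real objects come from the generated Thrift bindings, and the sample values are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated row and cell types.
Cell = namedtuple("Cell", ["value"])
Row = namedtuple("Row", ["row", "columns"])


def user_from_row(row):
    user = {}
    for col, cell in row.columns.items():
        # Column names arrive as 'family:qualifier'; col[5:] drops the 'info:' prefix.
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)


sample = Row(
    row="TheRealMT",
    columns={
        "info:user": Cell("TheRealMT"),
        "info:name": Cell("Mark Twain"),
        "info:email": Cell("samuel@clemens.org"),
    },
)
print(user_from_row(sample))
```

The fixed slice of five characters only works because every requested column lives in the info family; a column from a family with a different name length would need a split on ':' instead.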

Page 19: HBase Client APIs (for webapps?)

REST client Interface

1. Stand up a "REST Gateway" between your application and the cluster

2. HTTP verbs translate (roughly) into table commands

3. decent support for basic DDL, HTable operations

Page 20: HBase Client APIs (for webapps?)

REST Architecture

HBase Cluster

REST Gateway

REST Clients

Page 21: HBase Client APIs (for webapps?)

REST client Interface

• REST gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

Page 22: HBase Client APIs (for webapps?)

REST client Example

$ curl -H "Accept: application/json" http://host:port/
{
  "table": [
    { "name": "followers" },
    { "name": "twits" },
    { "name": "users" }
  ]
}

Page 23: HBase Client APIs (for webapps?)

REST client Example

$ curl -H ... http://host:port/table/row [/family:qualifier]
{
  "Row": [
    {
      "key": "VGhlUmVhbE1U",
      "Cell": [
        {
          "$": "c2FtdWVsQGNsZW1lbnMub3Jn",
          "column": "aW5mbzplbWFpbA==",
          "timestamp": 1338701491422
        },
        {
          "$": "TWFyayBUd2Fpbg==",
          "column": "aW5mbzpuYW1l",
          "timestamp": 1338701491422
        }
      ]
    }
  ]
}
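The row key, column names, and cell values in this response are base64-encoded, so a client has to decode each field before using it. A minimal Python sketch of that step follows, using only the standard library. The function names are hypothetical, and it assumes the decoded bytes are UTF-8 text, which holds for this sample but not for arbitrary HBase cells.

```python
import base64
import json

# Response body from the slide, written as strict JSON.
BODY = """
{ "Row": [ { "key": "VGhlUmVhbE1U", "Cell": [
  { "$": "c2FtdWVsQGNsZW1lbnMub3Jn", "column": "aW5mbzplbWFpbA==", "timestamp": 1338701491422 },
  { "$": "TWFyayBUd2Fpbg==", "column": "aW5mbzpuYW1l", "timestamp": 1338701491422 }
] } ] }
"""


def b64_text(value):
    """Decode one base64 field to text (assumes the payload is UTF-8)."""
    return base64.b64decode(value).decode("utf-8")


def decode_rows(body):
    """Return {row_key: {column: value}} with every base64 field decoded."""
    rows = {}
    for row in json.loads(body)["Row"]:
        cells = {b64_text(c["column"]): b64_text(c["$"]) for c in row["Cell"]}
        rows[b64_text(row["key"])] = cells
    return rows


print(decode_rows(BODY))
```

Decoded, the sample is the row TheRealMT with info:email and info:name cells.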

Page 24: HBase Client APIs (for webapps?)

REST client Example

<Rows>
  <Row key="VGhlUmVhbE1U">
    <Cells>
      <Cell column="aW5mbzplbWFpbA==" timestamp="1338701491422">
        c2FtdWVsQGNsZW1lbnMub3Jn
      </Cell>
      <Cell ...> ...
    </Cells>
  </Row>
</Rows>

Page 25: HBase Client APIs (for webapps?)

Beyond Apache

Page 26: HBase Client APIs (for webapps?)

asynchbase

• Asynchronous non-blocking interface.

• Inspired by Twisted Python.

• Partial implementation of HTableInterface.

• HBaseClient provides entry-point to data.

https://github.com/OpenTSDB/asynchbase

http://tsunanet.net/~tsuna/asynchbase/api/org/hbase/async/HBaseClient.html

Page 27: HBase Client APIs (for webapps?)

asynchbase

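[Diagram: the callback chain as a state machine. Legend: input => [this state] => output to [next state], with Exception => [error state] on failure. Example: the Put response (a Boolean) is the input to "Interpret response", which outputs an UpdateResult object or raises UpdateFailedException.]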

Page 28: HBase Client APIs (for webapps?)

asynchbase Example

final Scanner scanner = client.newScanner(TABLE_NAME);
scanner.setFamily(INFO_FAM);
scanner.setQualifier(PASSWORD_COL);

ArrayList<ArrayList<KeyValue>> rows = null;
ArrayList<Deferred<Boolean>> workers = new ArrayList<Deferred<Boolean>>();
while ((rows = scanner.nextRows(1).joinUninterruptibly()) != null) {
  for (ArrayList<KeyValue> row : rows) {
    KeyValue kv = row.get(0);
    byte[] expected = kv.value();
    String userId = new String(kv.key());
    PutRequest put = new PutRequest(
        TABLE_NAME, kv.key(), kv.family(),
        kv.qualifier(), mkNewPassword(expected));
    Deferred<Boolean> d = client.compareAndSet(put, expected)
      .addCallback(new InterpretResponse(userId))
      .addCallbacks(new ResultToMessage(), new FailureToMessage())
      .addCallback(new SendMessage());
    workers.add(d);
  }
}

https://github.com/hbaseinaction/twitbase-async/blob/master/src/main/java/HBaseIA/TwitBase/AsyncUsersTool.java#L151-L173
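The callback chain in this example can be hard to picture, so here is a toy, synchronous Python model of it. ToyDeferred is not asynchbase's Deferred and nothing here touches HBase. It only mimics the routing rule: a normal result goes to the next callback, an exception goes to the next errback. The message strings and function bodies are assumptions made for illustration.

```python
class UpdateFailedException(Exception):
    pass


class ToyDeferred:
    """Toy stand-in for a Deferred: an ordered list of (callback, errback) pairs."""

    def __init__(self):
        self._chain = []

    def add_callbacks(self, callback, errback):
        self._chain.append((callback, errback))
        return self

    def add_callback(self, callback):
        # No errback: an exception skips this step and travels on unchanged.
        return self.add_callbacks(callback, None)

    def fire(self, result):
        for callback, errback in self._chain:
            handler = errback if isinstance(result, Exception) else callback
            if handler is None:
                continue
            try:
                result = handler(result)
            except Exception as exc:
                result = exc
        return result


def interpret_response(user_id):
    def interpret(compare_and_set_ok):
        # A False compareAndSet result becomes an exception, i.e. the error state.
        if not compare_and_set_ok:
            raise UpdateFailedException(user_id)
        return user_id
    return interpret


def result_to_message(user_id):
    return "password updated for " + user_id


def failure_to_message(exc):
    return "update failed for " + str(exc)


sent = []


def send_message(message):
    sent.append(message)
    return message


def build_chain(user_id):
    return (ToyDeferred()
            .add_callback(interpret_response(user_id))
            .add_callbacks(result_to_message, failure_to_message)
            .add_callback(send_message))


print(build_chain("TheRealMT").fire(True))
print(build_chain("TheRealMT").fire(False))
```

Because the errback returns a plain string, the chain is back on the success path by the time it reaches send_message, so both outcomes end with a message being sent.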

Page 29: HBase Client APIs (for webapps?)

Others

Full-blown schema management

Reduce day-to-day developer pain

• Spring-Data Hadoop

• [Orderly]

• Phoenix

• Kiji.org

https://github.com/ndimiduk/orderly

http://www.springsource.org/spring-data/

https://github.com/forcedotcom/phoenix

http://www.kiji.org/

Page 30: HBase Client APIs (for webapps?)

Apache Futures

• Protobuf wire messages (0.96)

• C client (TBD, HBASE-1015)

• HBase Types (TBD, HBASE-8089)

Page 31: HBase Client APIs (for webapps?)

So, Webapps?

http://www.amazon.com/Back-Point-Rapiers/dp/B0000271GC

Page 32: HBase Client APIs (for webapps?)

Software Architecture

• Isolate DAO from app logic, separation of concerns, &c.

• Separate environment configs from code.

• Watch out for resource contention.

Page 33: HBase Client APIs (for webapps?)

Deployment Architecture

• Cache everywhere.

• Know your component layers.

Page 34: HBase Client APIs (for webapps?)

HBase Warts

• Know thy (HBase) version: 0.{92,94,96}!

• Long-running client bug (HBASE-4805).

• Gateway APIs are only as up to date as the people before you needed them to be.

• REST API particularly unpleasant for “Web2.0” folk.

Page 35: HBase Client APIs (for webapps?)

Thanks!

Nick Dimiduk github.com/ndimiduk @xefyr n10k.com

Manning

Nick Dimiduk Amandeep Khurana

FOREWORD BY Michael Stack

hbaseinaction.com
