HBase Client APIs (for webapps?)

Post on 09-May-2015


description

This talk examines the HBase client options available to application developers. The focus is on building webapps, but is not limited to them.


HBase Client API (for webapps?)

Nick Dimiduk

Seattle Scalability Meetup

2013-03-27

1

2

3

What are my choices?

switch (technology) {
  case ‘ ’: ...
  case ‘ ’: ...
  case ‘ ’: ...
}

4

Apache HBase

5

Java client Interfaces

• Configuration holds the details of where to find the cluster, plus tunable settings. Roughly equivalent to a JDBC connection string.

• HConnection represents connections to the cluster.

• HBaseAdmin handles DDL operations (create, list, drop, alter, &c.)

• HTablePool is a connection pool for table handles.

• HTable (HTableInterface) is a handle on a single HBase table. Send "commands" to the table (Put, Get, Scan, Delete, Increment).

6

Java client Example

public static final byte[] TABLE_NAME = Bytes.toBytes("twits");
public static final byte[] TWITS_FAM = Bytes.toBytes("twits");

public static final byte[] USER_COL = Bytes.toBytes("user");
public static final byte[] TWIT_COL = Bytes.toBytes("twit");

private HTablePool pool = new HTablePool();

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L23-L30

7

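Java client Example

private static class Twit {

  // Build a Twit from a Result: the bytes after the MD5 prefix of the
  // row key are passed on as the timestamp (dt).
  private Twit(Result r) {
    this(
      r.getColumnLatest(TWITS_FAM, USER_COL).getValue(),
      Arrays.copyOfRange(r.getRow(), Md5Utils.MD5_LENGTH, Md5Utils.MD5_LENGTH + longLength),
      r.getColumnLatest(TWITS_FAM, TWIT_COL).getValue());
  }

  // The timestamp is stored negated, so it is multiplied by -1 on the way out.
  private Twit(byte[] user, byte[] dt, byte[] text) {
    this(
      Bytes.toString(user),
      new DateTime(-1 * Bytes.toLong(dt)),
      Bytes.toString(text));
  }

  // ... (remainder of the class is not shown on the slide)
}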

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L129-L143
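
The Twit constructors above imply a row key made of an MD5 prefix followed by a negated timestamp. A minimal Python sketch of that layout follows. It is an assumption inferred from the constructor code, not a copy of the project's mkRowKey; the names mk_row_key and parse_row_key are hypothetical, and the 8-byte big-endian encoding mirrors what Bytes.toBytes(long) produces.

```python
import hashlib
import struct

MD5_LENGTH = 16   # bytes in an MD5 digest
LONG_LENGTH = 8   # bytes in a Java long


def mk_row_key(user: str, millis: int) -> bytes:
    """Hypothetical helper: MD5(user) followed by the negated timestamp."""
    user_hash = hashlib.md5(user.encode("utf-8")).digest()
    # ">q" is a signed 64-bit big-endian integer, like Bytes.toBytes(long).
    return user_hash + struct.pack(">q", -1 * millis)


def parse_row_key(row_key: bytes) -> int:
    """Recover the original timestamp, as the Twit constructor does."""
    dt = row_key[MD5_LENGTH:MD5_LENGTH + LONG_LENGTH]
    return -1 * struct.unpack(">q", dt)[0]
```

Because the timestamp is negated, a newer twit from the same user gets a byte-wise smaller key than an older one, so a scan over that user's prefix returns the newest twits first.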

8

Java client Example

private static Get mkGet(String user, DateTime dt) {
  Get g = new Get(mkRowKey(user, dt));
  g.addColumn(TWITS_FAM, USER_COL);
  g.addColumn(TWITS_FAM, TWIT_COL);
  return g;
}

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L60-L65

9

Ruby, Python client Interface

10

Ruby, Python client Interface

JRuby, Jython

: '(

11

Thrift client Interface

1. Generate bindings

2. Run a “Gateway” between clients and cluster

3. ... profit? write code!

12

Sidebar: Architecture Recap

HBase Cluster

HBase Clients

13

Thrift Architecture

HBase Cluster

Thrift Clients

Thrift Gateway

14

Thrift client Interface

• Thrift gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

15

Thrift client Example

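# Imports assume the Thrift Python library plus the bindings generated in
# step 1 (the generated package is assumed to be named "hbase").
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TSocket.TSocket(host, port)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()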

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L17-L21

16

Thrift client Example

columns = ['info:user', 'info:name', 'info:email']
scanner = client.scannerOpen('users', '', columns)
row = client.scannerGet(scanner)
while row:
    yield user_from_row(row[0])
    row = client.scannerGet(scanner)
client.scannerClose(scanner)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L33-L39

17

Thrift client Example

def user_from_row(row):
    user = {}
    for col, cell in row.columns.items():
        # col[5:] drops the 5-character 'info:' family prefix
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L26-L30

18

REST client Interface

1. Stand up a "REST Gateway" between your application and the cluster

2. HTTP verbs translate (roughly) into table commands

3. Decent support for basic DDL and HTable operations
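
As a rough illustration of point 2, the sketch below builds the resource URL for a row using the path shape table/row[/family:qualifier] from the REST examples in this deck. The helper name row_url, the base URL, and the verb table are assumptions made for illustration only; the mapping is the "rough" correspondence the slide mentions, not a full description of the gateway.

```python
from urllib.parse import quote

# Assumed, simplified correspondence between HTTP verbs and table commands.
VERB_TO_COMMAND = {"GET": "Get", "PUT": "Put", "DELETE": "Delete"}


def row_url(base, table, row, column=None):
    """Hypothetical helper: build base/table/row[/family:qualifier]."""
    parts = [quote(table, safe=""), quote(row, safe="")]
    if column is not None:
        # Keep the ':' between family and qualifier readable.
        parts.append(quote(column, safe=":"))
    return base.rstrip("/") + "/" + "/".join(parts)


# Example: the URL a GET for one cell would target
# (localhost:8080 is a placeholder, not a documented default).
url = row_url("http://localhost:8080", "users", "TheRealMT", "info:email")
```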

19

REST Architecture

HBase Cluster

REST Gateway

REST Clients

20

REST client Interface

• REST gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

21

REST client Example

$ curl -H "Accept: application/json" http://host:port/
{
  "table": [
    { "name": "followers" },
    { "name": "twits" },
    { "name": "users" }
  ]
}

22

REST client Example

$ curl -H ... http://host:port/table/row [/family:qualifier]
{
  "Row": [
    {
      "key": "VGhlUmVhbE1U",
      "Cell": [
        {
          "$": "c2FtdWVsQGNsZW1lbnMub3Jn",
          "column": "aW5mbzplbWFpbA==",
          "timestamp": 1338701491422
        },
        {
          "$": "TWFyayBUd2Fpbg==",
          "column": "aW5mbzpuYW1l",
          "timestamp": 1338701491422
        }
      ]
    }
  ]
}
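
The row key, column names, and values in this response are Base64-encoded, which is part of why the output is hard to read at a glance. The following is a minimal sketch of decoding it with the Python standard library; the function name decode_rows is hypothetical, and the body is adapted from the slide. It assumes every value is UTF-8 text, which holds for this example but not for arbitrary HBase cells.

```python
import base64
import json

# Response body adapted from the slide.
BODY = """
{ "Row": [ { "key": "VGhlUmVhbE1U", "Cell": [
    { "$": "c2FtdWVsQGNsZW1lbnMub3Jn", "column": "aW5mbzplbWFpbA==",
      "timestamp": 1338701491422 },
    { "$": "TWFyayBUd2Fpbg==", "column": "aW5mbzpuYW1l",
      "timestamp": 1338701491422 } ] } ] }
"""


def b64text(value):
    """Decode a Base64 string into UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


def decode_rows(body):
    """Hypothetical helper: map row key -> {column: value}, all decoded."""
    rows = {}
    for row in json.loads(body)["Row"]:
        cells = {b64text(c["column"]): b64text(c["$"]) for c in row["Cell"]}
        rows[b64text(row["key"])] = cells
    return rows


decoded = decode_rows(BODY)
# decoded == {"TheRealMT": {"info:email": "samuel@clemens.org",
#                           "info:name": "Mark Twain"}}
```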

23

REST client Example

<Rows>
  <Row key="VGhlUmVhbE1U">
    <Cells>
      <Cell column="aW5mbzplbWFpbA==" timestamp="1338701491422">
        c2FtdWVsQGNsZW1lbnMub3Jn
      </Cell>
      <Cell ...> ...
    </Cells>
  </Row>
</Rows>

24

Beyond Apache

25

asynchbase

• Asynchronous non-blocking interface.

• Inspired by Twisted Python.

• Partial implementation of HTableInterface.

• HBaseClient provides entry-point to data.

https://github.com/OpenTSDB/asynchbase

http://tsunanet.net/~tsuna/asynchbase/api/org/hbase/async/HBaseClient.html

26

asynchbase

[Diagram: a callback chain drawn as states. Legend: input => [this state] => output to [next state], or Exception => [error state]. Example: Boolean (Put response) => [Interpret response] => UpdateResult object, or UpdateFailedException.]
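
The state diagram on this slide can be approximated in a few lines of Python. This is a toy, synchronous stand-in written only to show how a result flows down the callback side of a chain and how an exception switches it to the errback side. The Deferred class here is hypothetical and is not the asynchbase API; the state names are borrowed from the slide.

```python
class UpdateFailedException(Exception):
    pass


def _reraise(exc):
    raise exc


class Deferred:
    """Toy synchronous Deferred (illustration only, not asynchbase)."""

    def __init__(self):
        self._chain = []

    def add_callbacks(self, callback, errback):
        self._chain.append((callback, errback))
        return self

    def add_callback(self, callback):
        # With no errback given, an incoming exception passes straight through.
        return self.add_callbacks(callback, _reraise)

    def fire(self, result):
        # Each state receives the previous output; raising moves to the error path.
        for callback, errback in self._chain:
            handler = errback if isinstance(result, Exception) else callback
            try:
                result = handler(result)
            except Exception as exc:
                result = exc
        return result


def interpret_response(put_succeeded):
    if not put_succeeded:
        raise UpdateFailedException("compare-and-set failed")
    return "updated"


def result_to_message(result):
    return "password " + result


def failure_to_message(exc):
    return "update failed: " + str(exc)


def run_chain(put_succeeded):
    d = Deferred()
    d.add_callback(interpret_response)
    d.add_callbacks(result_to_message, failure_to_message)
    return d.fire(put_succeeded)
```

Calling run_chain(True) follows the callback path through both states, while run_chain(False) raises in the first state and is handled by the errback of the second.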

27

asynchbase Example

final Scanner scanner = client.newScanner(TABLE_NAME);
scanner.setFamily(INFO_FAM);
scanner.setQualifier(PASSWORD_COL);

ArrayList<ArrayList<KeyValue>> rows = null;
ArrayList<Deferred<Boolean>> workers = new ArrayList<Deferred<Boolean>>();
while ((rows = scanner.nextRows(1).joinUninterruptibly()) != null) {
  for (ArrayList<KeyValue> row : rows) {
    KeyValue kv = row.get(0);
    byte[] expected = kv.value();
    String userId = new String(kv.key());
    PutRequest put = new PutRequest(
        TABLE_NAME, kv.key(), kv.family(), kv.qualifier(), mkNewPassword(expected));
    Deferred<Boolean> d = client.compareAndSet(put, expected)
        .addCallback(new InterpretResponse(userId))
        .addCallbacks(new ResultToMessage(), new FailureToMessage())
        .addCallback(new SendMessage());
    workers.add(d);
  }
}

https://github.com/hbaseinaction/twitbase-async/blob/master/src/main/java/HBaseIA/TwitBase/AsyncUsersTool.java#L151-L173

28

Others

[Diagram axis labels: "Reduce day-to-day developer pain" and "Full-blown schema management"]

Spring-Data Hadoop

[Orderly]

Phoenix

Kiji.org

https://github.com/ndimiduk/orderly

http://www.springsource.org/spring-data/

https://github.com/forcedotcom/phoenix

http://www.kiji.org/

29

Apache Futures

• Protobuf wire messages (0.96)

• C client (TBD, HBASE-1015)

• HBase Types (TBD, HBASE-8089)

30

So, Webapps?

http://www.amazon.com/Back-Point-Rapiers/dp/B0000271GC

31

Software Architecture

• Isolate DAO from app logic, separation of concerns, &c.

• Separate environment configs from code.

• Watch out for resource contention.
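
A minimal sketch of the first two points, under stated assumptions: the environment variable names, the default host and port, and the classes GatewayConfig, UsersDAO, and InMemoryClient are all hypothetical. Connection details come from the environment rather than from code, and the application talks only to the DAO, so the backing client (Thrift, REST, or an in-memory fake for tests) can be swapped without touching app logic.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayConfig:
    host: str
    port: int


def load_config(env=None):
    """Read gateway settings from the environment (variable names are made up)."""
    env = os.environ if env is None else env
    return GatewayConfig(
        host=env.get("HBASE_GATEWAY_HOST", "localhost"),
        port=int(env.get("HBASE_GATEWAY_PORT", "9090")),
    )


class UsersDAO:
    """App code depends on this class, not on any particular HBase client."""

    def __init__(self, client):
        self._client = client

    def get_email(self, user):
        return self._client.get("users", user, "info:email")


class InMemoryClient:
    """Stand-in backend for tests; a real one would wrap a gateway client."""

    def __init__(self, cells):
        self._cells = cells

    def get(self, table, row, column):
        return self._cells.get((table, row, column))
```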

32

Deployment Architecture

• Cache everywhere.

• Know your component layers.

33

HBase Warts

• Know thy (HBase) version: 0.{92,94,96}!

• Long-running client bug (HBASE-4805).

• Gateway APIs are only as up to date as the people before you required.

• REST API particularly unpleasant for “Web2.0” folk.

34

Thanks!

Nick Dimiduk github.com/ndimiduk @xefyr n10k.com

HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack

hbaseinaction.com

35