Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords - June 2012

CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3

description

With the HBase 0.92 release, it is now possible to embed code directly into HBase server processes. Coprocessors offer a very flexible extension model for moving functionality closer to the data. This presentation explains the key concepts and shows how coprocessors are used in our social media monitoring system to implement the alert feature with prospective search. Users formulate a query and are then notified by email, immediately, about every newly detected document that matches its terms. This session aims to help people interested in coprocessors understand how to adapt them to their own use cases and how to develop coprocessor extensions.


Agenda

•  Why Hadoop and HBase?
•  Social Media Monitoring
•  Prospective Search and Coprocessors
•  Challenges & Lessons Learned
•  Resources to get started

5 June 2012


About me

Software Architect @ sentric

Co-founder and organizer of the Swiss HUG

Contact: [email protected] http://www.sentric.ch @chrisgugi


About sentric

•  Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland

•  Big Data expert, focused on Hadoop, HBase and Solr

•  Objective: Transforming data into insights


CC 2.0 by Pete Reed | http://flic.kr/p/KS9kf


Social Media Monitoring Process

Why Hadoop and HBase?


Information Gathering

Information Processing

Analysis & Interpretation

Insight Presentation


Requirements


Social Media Monitoring (SMM) requirements:

•  Cost-effective
•  Highly scalable
•  Real-time (RT) alerting
•  Analytical capabilities
•  Reliable


Technology Stack


•  Storage: HBase / HDFS
•  Analytics: Hadoop, Mahout
•  Search: Solr
•  Event mechanism (MQ): HBase RowLog
•  Real-time alerting: Prospective search

CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf


Overview

Social Media Monitoring


Search Agents + Downloaded Articles → match? → Output: RT Alerts, Reports, Web-UI

Icons by http://dryicons.com


Solution Architecture


REST

n Crawler

MySQL Solr

Web-UI

RT Alerts

RowLog Coprocessor

HBase


Overview

•  Inspired by Google Bigtable coprocessors
•  HBase version 0.92
•  Embed code directly into server processes
•  High-level call interface for clients
•  Automatic scaling, load balancing, request routing

Short Primer on Coprocessors


Observer Classes

•  Like a database trigger
•  Provides event-based hooks
•  Concrete implementations:
   •  RegionObserver: CRUD or DML type operations
   •  MasterObserver: DDL or metadata operations and cluster administration
   •  WALObserver: write-ahead-log appending and restoration


Observer Execution


Client:Get()
→ RegionServer: CP1:preGet() → CP2:preGet() → CP3:preGet()
→ HRegion:Get()
→ CP1:postGet() → CP2:postGet() → CP3:postGet()
→ client response
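
The hook ordering on the Observer Execution slide can be illustrated with a small, library-free sketch. It is a simplification under stated assumptions: `GetObserver`, `named` and `runGet` are hypothetical names invented for this example and are not the HBase `RegionObserver` API. The sketch only shows that every pre hook runs before the region read and every post hook after it, in the order the coprocessors were registered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ObserverChainSketch {

    /** Hypothetical stand-in for an observer; not the HBase RegionObserver interface. */
    interface GetObserver {
        void preGet(String rowKey, List<String> trace);
        void postGet(String rowKey, List<String> trace);
    }

    /** Builds an observer that only records when its hooks are invoked. */
    static GetObserver named(final String name) {
        return new GetObserver() {
            public void preGet(String rowKey, List<String> trace) {
                trace.add(name + ":preGet");
            }
            public void postGet(String rowKey, List<String> trace) {
                trace.add(name + ":postGet");
            }
        };
    }

    /** Runs all pre hooks in order, then the region read, then all post hooks in order. */
    static List<String> runGet(String rowKey, List<GetObserver> chain) {
        List<String> trace = new ArrayList<String>();
        for (GetObserver cp : chain) {
            cp.preGet(rowKey, trace);
        }
        trace.add("HRegion:get"); // placeholder for the actual read inside the region
        for (GetObserver cp : chain) {
            cp.postGet(rowKey, trace);
        }
        return trace;
    }

    public static void main(String[] args) {
        List<GetObserver> chain = Arrays.asList(named("CP1"), named("CP2"), named("CP3"));
        System.out.println(runGet("row-1", chain));
    }
}
```

Because every hook sits on the request path, anything slow inside a hook delays the client response.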


Endpoint Classes

•  Comparable to stored procedures
•  Custom RPC protocol, used between client and region server
•  Loaded in region server
•  Client calls APIs over a single row or a row range
•  Framework translates row keys to region location
•  Parallel execution


Endpoint Call Routine


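Client code

// CountProtocol is the example endpoint interface from the slide.
// table (an HTable), startRow and endRow are assumed to be defined by the caller.
Map<byte[], Integer> countsByRegion = table.coprocessorExec(
    CountProtocol.class, startRow, endRow,
    new Batch.Call<CountProtocol, Integer>() {
      // Invoked once per region in the row range; the result is keyed by region name.
      public Integer call(CountProtocol p) throws IOException {
        return p.getRowCount();
      }
    });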

Region Server 1
•  table,,12345678 → CountProtocol
•  table,bbb,12345678 → CountProtocol

Region Server 2
•  table,ccc,12345678 → CountProtocol
•  table,ddd,12345678 → CountProtocol
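
The call routine above follows a scatter-gather pattern: the endpoint runs once per region, and the client merges the partial results. The following is a minimal sketch of that pattern without any HBase dependency. The `Region` class and the `countsByRegion` and `total` helpers are hypothetical names for illustration, and the regions are visited sequentially here, whereas the framework executes the calls in parallel.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EndpointCallSketch {

    /** Hypothetical region: a name plus the row keys it hosts. */
    static final class Region {
        final String name;
        final List<String> rowKeys;

        Region(String name, List<String> rowKeys) {
            this.name = name;
            this.rowKeys = rowKeys;
        }

        /** Plays the role of the endpoint method that runs next to the data. */
        int getRowCount() {
            return rowKeys.size();
        }
    }

    /** Invokes the endpoint on every region and collects one partial result per region. */
    static Map<String, Integer> countsByRegion(List<Region> regions) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Region r : regions) {
            counts.put(r.name, r.getRowCount());
        }
        return counts;
    }

    /** Client-side merge of the per-region partial results. */
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("table,,12345678", Arrays.asList("aaa", "abc")),
            new Region("table,bbb,12345678", Arrays.asList("bbb", "bcd", "bzz")));
        Map<String, Integer> counts = countsByRegion(regions);
        System.out.println(counts + " total=" + total(counts));
    }
}
```

Only the small per-region counts travel back to the client; the rows themselves stay on the region servers.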


Use Cases

•  HBase Security (Version 0.94)
•  Aggregate operations avg(), sum()
   •  AggregateProtocol
•  HBASE-3529: Embedded search


Prospective Search with Coprocessors


Processing → Put operations → HRegionServer / HRegion → Prospective Search → RT Alerts
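
Prospective search inverts the usual search flow: the queries are stored, and each newly written document is tested against them. The sketch below illustrates that idea only and is not the matching logic of the system described in this talk. The class, the `registerAgent` and `onPut` methods, and the rule that all of an agent's terms must occur in the document are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ProspectiveSearchSketch {

    /** Stored queries: agent id mapped to the terms that must all appear in a document. */
    private final Map<String, Set<String>> agents = new LinkedHashMap<String, Set<String>>();

    /** Registers a search agent (assumed semantics: every term must match). */
    void registerAgent(String agentId, String... terms) {
        Set<String> lowered = new HashSet<String>();
        for (String term : terms) {
            lowered.add(term.toLowerCase(Locale.ROOT));
        }
        agents.put(agentId, lowered);
    }

    /**
     * Called for each newly written document, as a hook after a put might do.
     * Returns the ids of all agents whose terms are all contained in the text.
     */
    List<String> onPut(String documentText) {
        String[] words = documentText.toLowerCase(Locale.ROOT).split("\\W+");
        Set<String> tokens = new HashSet<String>(Arrays.asList(words));
        List<String> matches = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> agent : agents.entrySet()) {
            if (tokens.containsAll(agent.getValue())) {
                matches.add(agent.getKey()); // an alert would be triggered here
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ProspectiveSearchSketch search = new ProspectiveSearchSketch();
        search.registerAgent("agent-1", "hbase", "coprocessors");
        search.registerAgent("agent-2", "solr");
        System.out.println(search.onPut("HBase 0.92 adds coprocessors"));
    }
}
```

Every write is checked against every stored agent in this sketch, so the cost per write grows with the number of agents.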


Testing Setup

•  Standard, virtualized test cluster: 4 RegionServers/DataNodes (RS/DN), 1 HMaster, 1 NameNode, 3 ZooKeeper nodes

•  Test dataset created from 2 h of live index (1 GB)

•  Drive load on RS/DN


Test Results


[Chart: Writes/sec (0 to 1800) by # of agents (0, 10, 50, 100, 200, 400, 800)]

CC 2.0 by Sean Maurik | http://flic.kr/p/JUduu


Challenges

•  Everyone is still learning
•  Some issues only appear at scale
•  Production cluster configuration
•  Hardware issues
•  Tuning cluster configuration to our workloads
•  HBase stability
•  Monitoring health of HBase

Challenges & Lessons Learned


Lessons

•  Be careful with expensive operations in coprocessors
•  At scale, nothing works as advertised
•  Monitoring/operational tooling is most important
•  Play with all the configurations and benchmark for tuning


Resources to get started

•  https://blogs.apache.org/hbase/entry/coprocessor_introduction

•  http://hbase.apache.org/apidocs/index.html

•  http://www.lilyproject.org/lily/about/playground/hbaserowlog.html

•  http://www.github.com/sentric/HBasePS


Thank you!

Questions? Christian Gügi

[email protected]

Berlin Buzzwords 2012
