HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget


Description

YapMap is a new kind of search platform that does multi-quanta search to better understand threaded discussions. This talk will cover how HBase made it possible for two self-funded guys to build a new kind of search platform. We will discuss our data model and how we use row-based atomicity to manage parallel data-integration problems. We'll also talk about where we don't use HBase and instead rely on a traditional SQL-based infrastructure. We'll cover the benefits of using MapReduce and HBase for index generation. Then we'll cover our migration of some tasks from a message-based queue to the Coprocessor framework, as well as our future Coprocessor use cases. Finally, we'll talk briefly about our operational experience with HBase, our hardware choices, and the challenges we've had.

Transcript of HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Page 1: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget


Building a Large Search Platform on a Shoestring Budget

May 22, 2012

Jacques Nadeau, CTO | [email protected] | @intjesus

Page 2: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap? (this section)
• Interfacing with Data
• Using HBase as a data processing pipeline
• NoSQL Schemas: Adjusting and Migrating
• Index Construction
• HBase Operations

Page 3: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

What is YapMap?

• A visual search technology
• Focused on threaded conversations
• Built to provide better context and ranking
• Built on Hadoop & HBase for massive scale
• Two self-funded guys
• Motoyap.com is the largest implementation, at 650 million automotive docs

www.motoyap.com

Page 4: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Why do this?

• Discussion forums and mailing lists are the primary home for many hobbies
• Threaded search sucks: no context in the middle of the conversation

Page 5: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

How does it work?

[Diagram: a threaded conversation of Posts 1-6]

Page 6: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

A YapMap Search Result Page

Page 7: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Conceptual data model

• Threads are broken up among many web pages and don't necessarily arrive in order
• Longer threads are broken up; for short threads, MainDocGroup == MainDoc
• The entire thread is a MainDocGroup; for long threads, a single group may have multiple MainDocs
• Each individual post is a DetailDoc

[Diagram: Posts 1-6 grouped into MainDocs within a MainDocGroup]

Page 8: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

General architecture

Targeted Crawlers → Processing Pipeline → Indexing Engine → Results Presentation

Supporting infrastructure: HBase, HDFS/MapR-FS, ZooKeeper, Riak, RabbitMQ, MapReduce, MySQL

Page 9: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

We match the tool to the use case

We also evaluated Voldemort and Cassandra

MySQL
• Primary use: business management information
• Key features that drove selection: transactions, SQL, JPA
• Average object size: small | Object count: <1 million | System count: 2 | Memory footprint: <1 GB | Dataset size: 10 MB

HBase
• Primary use: storage of crawl data, processing pipeline
• Key features that drove selection: consistency, redundancy, memory-to-persistence ratio
• Average object size: 20 KB | Object count: 500 million | System count: 10 | Memory footprint: 120 GB | Dataset size: 10 TB

Riak
• Primary use: storage of components directly related to presentation
• Key features that drove selection: predictable low latency, full uptime, max one IOP per object
• Average object size: 2 KB | Object count: 1 billion | System count: 8 | Memory footprint: 240 GB | Dataset size: 2 TB

Page 10: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap?
• Interfacing with Data (this section)
• Using HBase as a data processing pipeline
• NoSQL Schemas: Adjusting and Migrating
• Index Construction
• HBase Operations

Page 11: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

HBase client is a power user interface

• The HBase client interface is low-level, similar to JDBC/SQL
• Most people start by using Bytes.to(String|Short|Long), which yields a spaghetti data layer
• New developers have to learn a bunch of new concepts
• Mistakes are easy to make
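To illustrate that raw-client style, here is a minimal fragment in the 0.92-era Java API; the table and column names are hypothetical, not YapMap's actual schema:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Every field is a hand-rolled byte[] conversion; nothing stops you from
// writing a long and reading it back as a string.
HTable table = new HTable(HBaseConfiguration.create(), "crawlJob");

Put put = new Put(Bytes.toBytes("job-123"));
put.add(Bytes.toBytes("main"), Bytes.toBytes("entryCheckpointCount"), Bytes.toBytes(42L));
table.put(put);

Result result = table.get(new Get(Bytes.toBytes("job-123")));
long count = Bytes.toLong(result.getValue(Bytes.toBytes("main"),
                                          Bytes.toBytes("entryCheckpointCount")));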

Page 12: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

We built a DTO layer to simplify dev

• Data Transfer Objects (DTOs) and a data access layer provide a single point for code changes and data migration
• First-class row key objects
• Centralized type serialization
  – Standard data types
  – Complex object serialization layer via protobuf
• Provide optimistic locking
• Enable asynchronous operation
• Minimize mistakes:
  – QuerySet abstraction (columns & column families)
  – Field state management (not queried versus null)
• Newer tools have arrived to ease this burden: Kundera and Gora

Page 13: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Examples from our DTO abstraction

Model definition:

<table name="crawlJob" row-id-class="example.CrawlJobId">
  <column-family name="main" compression="LZO" blockCacheEnabled="false" versions="1">
    <column name="firstCheckpoint" type="example.proto.JobProtos$CrawlCheckpoint" />
    <column name="firstCheckpointTime" type="Long" />
    <column name="entryCheckpointCount" type="Long" />
    ...

Generated model:

public class CrawlJobModel extends SparseModel<CrawlJobId> {
  public CrawlJobId getId() {…}
  public boolean hasFirstCheckpoint() {…}
  public CrawlCheckpoint getFirstCheckpoint() {…}
  public void setFirstCrawlCheckpoint(CrawlCheckpoint checkpoint) {…}
  …

HBase interface:

public interface HBaseReadWriteService<T> {
  public void putUnsafe(T model);
  public void putVersioned(T model);
  public T get(RowId<T> rowId, QuerySet<T> querySet);
  public void increment(RowId<T> rowId, IncrementPair<T>... pairs);
  public StructuredScanner<T> scanByPrefix(byte[] bytePrefix, QuerySet<T> querySet);
  …
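A hypothetical usage sketch of this layer; the constructor arguments and QuerySet construction are illustrative, not our actual API:

// First-class row key, explicit query set, field-state checks, and a
// versioned (optimistically locked) write, per the interface above.
CrawlJobId id = new CrawlJobId(sourceId, jobNumber);   // hypothetical key fields
QuerySet<CrawlJobModel> querySet = mainFamilyQuerySet; // hypothetical: columns to fetch
CrawlJobModel job = service.get(id, querySet);
if (job.hasFirstCheckpoint()) {                        // queried-vs-null is explicit
    resume(job.getFirstCheckpoint());
}
job.setFirstCrawlCheckpoint(newCheckpoint);
service.putVersioned(job);                             // fails if the row moved underneath us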

Page 14: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Example Primary Keys

UrlId, e.g. org.apache.hbase:80:x:/book/architecture.html
• Reverse domain
• Optional port
• Client protocol (e.g. user name + http)
• Path + query string

MainDocId (byte layout: xxxx x xxxxxxx xx)
• GroupId (row):
  – 4-byte source id
  – 1-byte identifier-type enum (int, long, sha2, or generic 32)
  – additional identifier (4, 8, or 32 bytes depending on type)
• 2-byte bucket number (part)
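As a concrete illustration, a minimal sketch of packing such a composite key with java.nio.ByteBuffer; the method name and exact field order are assumptions based on the layout above:

import java.nio.ByteBuffer;

// Hypothetical packing of the MainDocId layout described above:
// 4-byte source id, 1-byte type enum, variable identifier, 2-byte bucket.
static byte[] mainDocId(int sourceId, byte idType, byte[] identifier, short bucket) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 1 + identifier.length + 2);
    buf.putInt(sourceId);   // 4-byte source id
    buf.put(idType);        // identifier-type enum (int, long, sha2, generic 32)
    buf.put(identifier);    // 4, 8, or 32 bytes depending on type
    buf.putShort(bucket);   // 2-byte bucket number (part)
    return buf.array();
}

Because HBase sorts rows lexicographically and ByteBuffer writes big-endian by default, fixed-width fields like these keep all docs from one source, and all parts of one group, contiguous on disk.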

Page 15: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap?
• Interfacing with Data
• Using HBase as a data processing pipeline (this section)
• NoSQL Schemas: Adjusting and Migrating
• Index Construction
• HBase Operations

Page 16: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Processing pipeline is built on HBase

• Multiple steps with checkpoints to manage failures
• Out-of-order input is assumed
• Idempotent operations at each stage of the process
• Utilize optimistic locking to do coordinated merges (see the sketch after the diagram below)
• Use regular cleanup scans to pick up lost tasks
• Control the batch size of messages to trade throughput against latency

[Diagram: Crawlers → Build Main Docs → Merge + Split Main Doc Groups → Pre-index Main Docs, with a message hop between stages; state is checkpointed in HBase (t1:cf1, t2:cf1, t2:cf2) over DFS and cache, feeding Batch Indexing and RT Indexing]
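A minimal sketch of the optimistic-locking merge pattern using HBase's checkAndPut; the table, the MAIN/DOCS/VERSION constants, and the merge() helper are hypothetical stand-ins, not our actual code:

// Re-read, merge, and retry until nobody else has bumped the version cell.
HTable table = new HTable(conf, "mainDocGroup");
boolean committed = false;
while (!committed) {
    Result current = table.get(new Get(row));
    byte[] oldVersion = current.getValue(MAIN, VERSION); // null on first write
    long version = oldVersion == null ? 0 : Bytes.toLong(oldVersion);

    Put put = new Put(row);
    put.add(MAIN, DOCS, merge(current, newDetailDoc));   // idempotent merge step
    put.add(MAIN, VERSION, Bytes.toBytes(version + 1));

    // Commits only if the version cell still holds the value we read.
    committed = table.checkAndPut(row, MAIN, VERSION, oldVersion, put);
}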

Page 17: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Migrating from messaging to coprocessors

• Big challenges:
  – Mixing system code and application code
  – Memory impact: we have a GC-stable state
• Exploring HBASE-4047 to solve (a hook sketch follows the diagram below)

[Diagram: the same pipeline as before, with coprocessors (CP) replacing the message hops between stages]
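For flavor, a minimal sketch of the kind of RegionObserver hook (0.92-era coprocessor API) that could replace a message hop; the class and the hand-off method are hypothetical:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Hypothetical observer: instead of publishing a RabbitMQ message after each
// write, react to the Put inside the region server itself.
public class PreIndexObserver extends BaseRegionObserver {
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, boolean writeToWAL) {
        // Hand the row to the next pipeline stage. Keeping this cheap matters:
        // it runs inside the region server process, which is exactly the
        // memory/GC concern listed above.
        enqueueForPreIndexing(put.getRow());
    }

    private void enqueueForPreIndexing(byte[] row) {
        // placeholder for the hand-off to the next stage
    }
}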

Page 18: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap?
• Interfacing with Data
• Using HBase as a data processing pipeline
• NoSQL Schemas: Adjusting and Migrating (this section)
• Index Construction
• HBase Operations

Page 19: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Learn to leverage NoSQL strengths

• Original structure was similar to a traditional RDBMS:
  – static column names
  – fully realized MainDoc
• One new DetailDoc could cause a cascading regeneration of all MainDocs
• New structure utilizes a cell for each DetailDoc
• Split metadata maps MainDoc > DetailDocId
• HBase handles cascading changes
• MainDoc realized on app read
• Use column prefixing (see the sketch below)

[Row layout: MainDoc cells under columns 0, 1, 2; a metadata column holding the Splits; Detail cells under columns detailid1, detailid2]
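A minimal sketch of the cell-per-DetailDoc layout; the column family, prefixes, and variable names are illustrative assumptions:

// One cell per DetailDoc plus split metadata, all in a single row: HBase's
// per-row atomicity replaces the old cascading MainDoc regeneration.
byte[] cf = Bytes.toBytes("d");
Put put = new Put(groupRowKey);
put.add(cf, Bytes.toBytes("detail:" + detailDocId), detailDocBytes); // prefixed detail cell
put.add(cf, Bytes.toBytes("meta:splits"), splitMetadataBytes);       // MainDoc > DetailDocId map
table.put(put); // atomic per row; the MainDoc itself is realized on read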

Page 20: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Schema migration steps

1. Disable application writes on OldTable
2. Extract OldSplits from OldTable
3. Create NewTable with appropriate column families and properties
4. Split NewTable based on OldSplits
5. Run a MapReduce job that converts old objects into new objects (see the sketch below)
   – Use HTableInputFormat as the input format on OldTable
   – Use HFileOutputFormat as the output format pointing at NewTable
6. Bulk load the output into NewTable
7. Redeploy the application to read from NewTable
8. Enable writes in the application layer
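A sketch of step 5 with the stock 0.92-era MapReduce wiring (TableMapReduceUtil sets up the table input format the slide refers to); ConvertMapper and the output path are hypothetical:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Read OldTable, emit Puts in the new schema, write HFiles for NewTable.
Job job = new Job(conf, "oldtable-to-newtable");
TableMapReduceUtil.initTableMapperJob("OldTable", new Scan(),
    ConvertMapper.class, ImmutableBytesWritable.class, Put.class, job);
// Wires up the reducer, total-order partitioner, and HFileOutputFormat so
// the output HFiles line up with NewTable's regions (the OldSplits).
HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "NewTable"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/newtable-hfiles"));
job.waitForCompletion(true);
// Step 6 then loads the HFiles with the completebulkload tool
// (LoadIncrementalHFiles), which moves them into NewTable's regions.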

Page 21: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap?
• Interfacing with Data
• Using HBase as a data processing pipeline
• NoSQL Schemas: Adjusting and Migrating
• Index Construction (this section)
• HBase Operations

Page 22: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Index Shards loosely based on HBase regions

• Indexing is split between major indices (batch) and minor indices (real time)
• Primary key order is the same as index order
• Shards are based on snapshots of splits (see the sketch below)
• IndexedTableSplit allows cross-region shard splits to be integrated at index load time

[Diagram: regions R1-R3 of the Tokenized Main Docs table mapped onto Shards 1-3]
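A minimal sketch of snapshotting region boundaries to seed shard splits, using the 0.92-era client API; the table name and how the snapshot is consumed are assumptions:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Pair;

// Capture the current region boundaries of the tokenized MainDoc table.
// Shards are cut from this snapshot, so a shard can later span a region
// split (cf. IndexedTableSplit above).
HTable table = new HTable(conf, "tokenizedMainDocs");
Pair<byte[][], byte[][]> regionKeys = table.getStartEndKeys();
byte[][] shardStartKeys = regionKeys.getFirst(); // one shard seed per region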

Page 23: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Batch indices are memory based, stored on DFS

• Total of all shards about 1tb– With ECC memory <$7/gb, systems easily achieving 128-256gb

each=> no problem• Each shard ~5gb in size to improve parallelism on search

– Variable depending on needs and use case• Each shard is composed of multiple map and reduce parts

along with MapReduce statistics from HBase– Integration of components are done in memory– Partitioner utilizes observed term distributions– New MR committer: FileAndPutOutputCommitter

• Allows low volume secondary outputs from Map phase to be used during reduce phase

Page 24: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Agenda

• What is YapMap?
• Interfacing with Data
• Using HBase as a data processing pipeline
• NoSQL Schemas: Adjusting and Migrating
• Index Construction
• HBase Operations (this section)

Page 25: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

HBase Operations

• Getting GC right took 6 months
  – Machines have 32 GB, with 12 GB for HBase; more was a problem
• Pick the right region size: with HFile v2, just start bigger
• Be cautious about using multiple column families
• Consider the asynchbase client
  – Benoit did some nice work at StumbleUpon
  – Ultimately we just leveraged EJB 3.1 @Async capabilities to make our HBase service async (see the sketch below)
• Upgrades: typically on the first or second point release
  – Testing/research cluster first
• Hardware: 8-core low-power chips, low-power DDR3, 6x WD Black 2 TB drives per machine, InfiniBand
• MapR's M3 distribution of Hadoop
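A minimal sketch of the EJB 3.1 approach: wrap the synchronous DTO-layer service in a container-managed async method rather than switching clients. The class name is hypothetical, and it assumes CrawlJobId implements RowId<CrawlJobModel>:

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.EJB;
import javax.ejb.Stateless;

// The container runs getAsync on its own thread and hands the caller a
// Future, so the blocking HBase call stays off the request hot path.
@Stateless
public class AsyncCrawlJobService {

    @EJB
    private HBaseReadWriteService<CrawlJobModel> hbase; // the DTO-layer service

    @Asynchronous
    public Future<CrawlJobModel> getAsync(CrawlJobId id, QuerySet<CrawlJobModel> querySet) {
        return new AsyncResult<CrawlJobModel>(hbase.get(id, querySet)); // blocking get
    }
}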

Page 26: HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget

Questions

• Why not Lucene/Solr/ElasticSearch/etc.?
  – Data locality between main and detail documents to do document-at-once scoring
  – Not built to work well with Hadoop and HBase (Blur.io is the first to tackle this head on)
• Why not store indices directly in HBase?
  – Single-cell storage would be the only way to do it efficiently
  – No such thing as a single-cell no-read append (HBASE-5993)
  – No single-cell partial read
• Why use Riak for the presentation side?
  – Hadoop SPOF
  – Even with newer Hadoop versions, HBase does not do sub-second row-level HA on node failure (HBASE-2357)
  – Riak has more predictable latency
• Why did you switch to MapR?
  – Index load performance was substantially faster
  – Less impact on HBase performance
  – Snapshots in the trial copy were nice for those 30 days