What you get by replicating Lucene indexes on the Infinispan Data Grid (Berlin Buzzwords 2012)

What you get by replicating Lucene indexes on the Infinispan Data Grid 4 June 2012 Sanne Grinovero, Red Hat

description

At its core, Infinispan is an in-memory data grid with options for multi-node replication or distribution. It can index and search stored data using Apache Lucene or, inverting the roles, let Lucene store any index in the grid. This talk illustrates how to get started and what benefits (or drawbacks!) you get by storing your Lucene indexes in Infinispan.

http://berlinbuzzwords.de/sessions/what-you-get-replicating-lucene-indexes-infinispan-data-grid

Transcript of What you get by replicating Lucene indexes on the Infinispan Data Grid (Berlin Buzzwords 2012)

Page 1: What you get by replicating Lucene indexes on the Infinispan Data Grid (Berlin Buzzwords 2012)

What you get by replicating Lucene indexes on the Infinispan Data Grid

4 June 2012

Sanne Grinovero, Red Hat

Page 2

Who is that guy?

• Sanne Grinovero

• From this planet

• Team Hibernate

• Hibernate Search

• Hibernate OGM

• Team Infinispan

• Infinispan Core

• Infinispan Query

• Apache Lucene, Netty, HotSpot, ANTLR, JGroups, Byteman, The Jokre

Page 3

What are we talking about?

• Apache Lucene

• Infinispan

• Integrations with Lucene

● Infinispan Lucene Directory

Page 4

Apache Lucene?

Page 5

• An in-memory datagrid

• Memory of multiple nodes

• Cluster modes

• CacheLoaders

• Integrations with Lucene

• Lucene Directory

Page 6

Infinispan API?

• Map-like key/value store

• JSR 107 javax.cache.Cache interface

• JSR 347 ??

• Asynchronous API

Page 7

In practice:

cache.put("user-34", userInstance);

cache.get("user-34");

cache.remove("user-34");

cache.putIfAbsent("user-38", other);
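
The four calls above follow the java.util.concurrent.ConcurrentMap contract, which Infinispan's Cache interface extends. A minimal sketch of their behaviour, assuming a plain ConcurrentHashMap as a stand-in for a grid-backed cache so that it runs without a cluster; the class name and the String values are made up for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in sketch: a ConcurrentHashMap plays the role of the cache so the
// example runs without Infinispan on the classpath (assumption).
class MapLikeApiSketch {

    // Hypothetical factory; a real application would obtain the cache
    // from its cache manager instead.
    static ConcurrentMap<String, String> newCache() {
        return new ConcurrentHashMap<>();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = newCache();

        cache.put("user-34", "Alice");             // store a value under a key
        System.out.println(cache.get("user-34"));  // prints "Alice"

        cache.remove("user-34");                   // delete the entry
        System.out.println(cache.get("user-34"));  // prints "null"

        // putIfAbsent only writes when the key has no value yet and
        // returns the previous value (null if there was none).
        System.out.println(cache.putIfAbsent("user-38", "Bob"));   // prints "null"
        System.out.println(cache.putIfAbsent("user-38", "Carol")); // prints "Bob"
    }
}
```

Because the contract is the same, code written against ConcurrentMap should need few changes to run against a clustered cache; what differs is where the entries live and how long a call may take.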

Page 8

Distributed Data

Page 9

Connected via JGroups

A Toolkit for Reliable Multicast Communication

http://jgroups.org

Page 10

Or remote clients via:

• Memcached

• REST

• Hot Rod (Ruby, Python, C, C#, ...)

• Netty

Page 11

Consistent Hashing: DIST
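
In distribution mode each entry lives on a fixed number of owner nodes, chosen by hashing the key. A rough sketch of the general consistent hashing technique follows; it is not Infinispan's actual implementation, and the class name, the CRC32 hash and the virtual-node count are assumptions picked for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative hash ring (hypothetical class, not Infinispan code).
class HashRingSketch {
    // Ring position -> node name, sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only because it ships with the JDK.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each node is placed on the ring several times to smooth the spread.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    // Walk clockwise from the key's position (wrapping around once) and
    // collect the first numOwners distinct nodes.
    List<String> owners(String key, int numOwners) {
        Set<String> result = new LinkedHashSet<>();
        long position = hash(key);
        collect(ring.tailMap(position, true).values(), result, numOwners);
        collect(ring.headMap(position, false).values(), result, numOwners);
        return new ArrayList<>(result);
    }

    private static void collect(Iterable<String> nodes, Set<String> out, int limit) {
        for (String node : nodes) {
            if (out.size() >= limit) {
                return;
            }
            out.add(node);
        }
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch(16);
        for (String node : new String[] {"node-A", "node-B", "node-C", "node-D"}) {
            ring.addNode(node);
        }
        System.out.println(ring.owners("user-34", 2));
    }
}
```

The useful property is that removing a node which does not own a given key leaves that key's owners unchanged, so a topology change only moves the entries that the departed node held.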

Page 12

Transactions!

Page 13

JBoss AS7 core component

• Cluster nodes autodiscovery

• Session replication / failover

• Hibernate second level cache

• mod_cluster integration

Page 14

In-memory volatile?

Cache Stores: durability, warm caches, more capacity...

• Cassandra

• HBase

• JDBC

• Clouds (S3, ...)

• Plain Old Files

• Many more + custom

Page 15

Back on Lucene:

Single Writer lock
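
Lucene allows only one writer to hold an index open at a time; a second writer fails to obtain the write lock until the first one is closed. A toy sketch of that rule, assuming an in-process AtomicBoolean as the lock; the class and method names are hypothetical and this is not Lucene's locking code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a single-writer index (hypothetical, not Lucene code).
class SingleWriterSketch {
    private final AtomicBoolean writeLock = new AtomicBoolean(false);

    // Handle returned to the one writer that currently holds the lock.
    final class WriterHandle implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                writeLock.set(false); // release so another writer may open
            }
        }
    }

    // Succeeds for the first caller, fails for any other caller until the
    // current handle is closed.
    WriterHandle openWriter() {
        if (!writeLock.compareAndSet(false, true)) {
            throw new IllegalStateException("index is locked by another writer");
        }
        return new WriterHandle();
    }

    public static void main(String[] args) {
        SingleWriterSketch index = new SingleWriterSketch();
        try (WriterHandle first = index.openWriter()) {
            try {
                index.openWriter();
            } catch (IllegalStateException e) {
                System.out.println("second writer rejected");
            }
        }
        index.openWriter().close(); // works again once the first is closed
        System.out.println("writer reopened");
    }
}
```

In a cluster the same rule has to hold across machines, which is why index writes need some form of coordination between nodes.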

Page 16

Queue-based clustering (filesystem index)

Page 17

Lucene index storage

Page 18

Page 19

Index stored in Infinispan

Page 20

Example architecture : JIRA / Scarlet

Page 21

Hints

• Some tuning options might have different effects than what you're used to

• Network is orders of magnitude faster than disk (YMMV)

• But data locality helps

• Balance resources

• Get mergers to avoid segment chunking, or readlocks will engage
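
The "segment chunking" in the last hint refers to index files being cut into fixed-size chunks that are stored as separate grid entries. A rough sketch of the idea, assuming a plain Map as the grid and a made-up "fileName|chunkIndex" key format; none of these names come from the Infinispan Lucene Directory itself:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of fixed-size chunking (hypothetical names and key format).
class ChunkingSketch {

    // Cut the file content into chunks of at most chunkSize bytes,
    // keyed by file name and chunk index.
    static Map<String, byte[]> split(String fileName, byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        int index = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(content.length, offset + chunkSize);
            chunks.put(fileName + "|" + index++, Arrays.copyOfRange(content, offset, end));
        }
        return chunks;
    }

    // Reading the file back needs every chunk, fetched in index order.
    static byte[] join(String fileName, Map<String, byte[]> chunks, int length) {
        byte[] out = new byte[length];
        int offset = 0;
        for (int index = 0; offset < length; index++) {
            byte[] chunk = chunks.get(fileName + "|" + index);
            System.arraycopy(chunk, 0, out, offset, chunk.length);
            offset += chunk.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] segment = new byte[10];
        System.out.println(split("_0.cfs", segment, 4).size());  // prints 3
        System.out.println(split("_0.cfs", segment, 16).size()); // prints 1
    }
}
```

A file that fits in one chunk is a single entry and can be read in one fetch, which is presumably why the hint recommends keeping merged segments from being chunked.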

Page 22

“benchmarks”, stats and more lies

[Charts: queries per second (axis 0 to 25000) and write ops/sec (axis 0 to 400) for Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory; bar values are not recoverable from the transcript]

Page 23

[Charts: queries per second (axis 0 to 25000) and write ops/sec (axis 0 to 400) for Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory; bar values are not recoverable from the transcript]

It's not about the figures

Page 24

What's next?

• Infinispan (core) 5.2 and 6

• Lucene 4.x

• Dynamic chunk sizes

• Ad-hoc “Lucene native” CacheStore

• NIO byte buffers?

Page 25

Conclusions

• Quick index replication

• Transactions

• Not a replacement for shards

• Cloud-friendly

• Delegates to any storage

Page 26

Q&A

@Infinispan
@Hibernate
@SanneGrinovero

http://infinispan.org
http://in.relation.to
http://jboss.org