Page 1: What you get by replicating Lucene indexes on the Infinispan Data Grid (Berlin Buzzwords 2012)

What you get by replicating Lucene indexes on the Infinispan Data Grid

4 June 2012

Sanne Grinovero, Red Hat

Page 2

Who is that guy?

• Sanne Grinovero

• From this planet

• Team Hibernate

• Hibernate Search

• Hibernate OGM

• Team Infinispan

• Infinispan Core

• Infinispan Query

• Apache Lucene, Netty, HotSpot, ANTLR, JGroups, Byteman, The Jokre

Page 3

What are we talking about?

• Apache Lucene

• Infinispan

• Integrations with Lucene

● Infinispan Lucene Directory

Page 4

Apache Lucene?

Page 5

• An in-memory datagrid

• Memory of multiple nodes

• Cluster modes

• CacheLoaders

• Integrations with Lucene

• Lucene Directory

Page 6

Infinispan API?

• Map-like key/value store

• JSR 107 javax.cache.Cache interface

• JSR 347 ??

• Asynchronous API

Page 7

In practice:

cache.put( "user-34", userInstance );

cache.get( "user-34" );

cache.remove( "user-34" );

cache.putIfAbsent( "user-38", other );
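
These four calls follow the java.util.concurrent.ConcurrentMap contract, which the Infinispan Cache interface extends. As a minimal sketch that runs without a cluster, the example below uses a plain ConcurrentHashMap as a stand-in for the cache. The class name MapLikeCacheSketch and the string values are illustrative assumptions, not taken from the talk.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MapLikeCacheSketch {

    // Runs the four operations from the slide and reports what each one returned.
    static String demo() {
        // Stand-in for an Infinispan cache: a local ConcurrentMap (assumption for illustration).
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

        cache.put("user-34", "userInstance");
        String found = cache.get("user-34");
        cache.remove("user-34");
        String afterRemove = cache.get("user-34");                 // null: the entry is gone

        String first = cache.putIfAbsent("user-38", "other");     // null: the key was absent
        String second = cache.putIfAbsent("user-38", "ignored");  // existing value is kept

        return found + "," + afterRemove + "," + first + "," + second + "," + cache.get("user-38");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "userInstance,null,null,other,other"
    }
}
```

putIfAbsent is the operation worth noticing: it is atomic, so two callers racing on the same key cannot both believe they inserted the value.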

Page 8

Distributed Data

Page 9

Connected via JGroups

A Toolkit for Reliable Multicast Communication

http://jgroups.org

Page 10

Or remote clients via:

• Memcached

• REST

• Hot Rod (Ruby, Python, C, C#, ...)

• Netty

Page 11

Consistent Hashing: DIST
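
The general idea behind consistent hashing can be sketched in a few lines: nodes and keys are hashed onto the same ring, and a key is owned by the first nodes found walking clockwise from its position. This is a generic illustration, not Infinispan's actual implementation. The class name, the hash function, and the numOwners parameter as used here are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ConsistentHashSketch {

    // Hash ring: position on the ring -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        ring.put(position(node), node);
    }

    void removeNode(String node) {
        ring.remove(position(node));
    }

    // Returns up to numOwners distinct nodes, walking clockwise from the key's position.
    List<String> owners(String key, int numOwners) {
        List<String> result = new ArrayList<>();
        int wanted = Math.min(numOwners, ring.size());
        // First the nodes at or after the key's position...
        for (String node : ring.tailMap(position(key), true).values()) {
            if (result.size() < wanted) result.add(node);
        }
        // ...then wrap around to the start of the ring.
        for (String node : ring.values()) {
            if (result.size() < wanted && !result.contains(node)) result.add(node);
        }
        return result;
    }

    // Illustrative hash spreading only; not the function Infinispan uses.
    private static int position(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        return h & 0x7fffffff;
    }

    public static void main(String[] args) {
        ConsistentHashSketch hash = new ConsistentHashSketch();
        hash.addNode("node-A");
        hash.addNode("node-B");
        hash.addNode("node-C");
        System.out.println(hash.owners("user-34", 2));
    }
}
```

The property that matters for a data grid is that removing a node which does not own a key leaves that key's owners unchanged, so only a fraction of the data moves when the cluster topology changes.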

Page 12

Transactions!

Page 13

JBoss AS7 core component

• Cluster nodes autodiscovery

• Session replication / failover

• Hibernate second level cache

• mod_cluster integration

Page 14

In-memory volatile?

Cache Stores: durability, warm caches, more capacity...

• Cassandra

• HBase

• JDBC

• Clouds (S3, ...)

• Plain Old Files

• Many more + custom

Page 15

Back on Lucene: Single Writer lock
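
Lucene allows only one IndexWriter per index at a time. On a shared key/value store, one way to enforce that rule is an atomic putIfAbsent on a well-known key. The sketch below illustrates the idea with a local ConcurrentMap standing in for the shared cache. The class name, the owner strings, and the use of "write.lock" as the key are assumptions for the example, not the directory's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SingleWriterLockSketch {

    private static final String LOCK_KEY = "write.lock"; // illustrative key name

    private final ConcurrentMap<String, String> cache;   // stand-in for the shared cache
    private final String owner;

    SingleWriterLockSketch(ConcurrentMap<String, String> cache, String owner) {
        this.cache = cache;
        this.owner = owner;
    }

    // Atomic: only the caller that finds the key absent becomes the writer.
    boolean tryObtain() {
        return cache.putIfAbsent(LOCK_KEY, owner) == null;
    }

    // Removes the entry only if this owner is the one holding it.
    void release() {
        cache.remove(LOCK_KEY, owner);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> shared = new ConcurrentHashMap<>();
        SingleWriterLockSketch a = new SingleWriterLockSketch(shared, "node-A");
        SingleWriterLockSketch b = new SingleWriterLockSketch(shared, "node-B");
        System.out.println(a.tryObtain()); // prints "true"
        System.out.println(b.tryObtain()); // prints "false"
    }
}
```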

Page 16

Queue-based clustering (filesystem index)

Page 17

Lucene index storage

Page 18

Page 19

Index stored in Infinispan
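
To make the idea concrete: an index is a set of files, and each file can be stored in a key/value grid as a sequence of fixed-size chunks plus a small metadata entry. The sketch below shows that layout with local maps standing in for the grid. The key format, the class name, and the chunk size are assumptions for illustration and do not reproduce the real directory's data model.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ChunkedIndexStoreSketch {

    private final Map<String, byte[]> chunks = new ConcurrentHashMap<>();   // file + chunk number -> bytes
    private final Map<String, Integer> lengths = new ConcurrentHashMap<>(); // file -> total length
    private final int chunkSize;

    ChunkedIndexStoreSketch(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    private static String chunkKey(String fileName, int chunkNumber) {
        return fileName + "|" + chunkNumber; // hypothetical key format
    }

    // Splits the content into chunkSize pieces, stores each under its own key,
    // and returns the number of chunks written.
    int writeFile(String fileName, byte[] content) {
        int count = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            chunks.put(chunkKey(fileName, count++), Arrays.copyOfRange(content, offset, end));
        }
        lengths.put(fileName, content.length);
        return count;
    }

    // Reassembles the file by reading its chunks in order; returns null for unknown files.
    byte[] readFile(String fileName) {
        Integer length = lengths.get(fileName);
        if (length == null) return null;
        ByteArrayOutputStream out = new ByteArrayOutputStream(length);
        for (int i = 0; out.size() < length; i++) {
            byte[] chunk = chunks.get(chunkKey(fileName, i));
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ChunkedIndexStoreSketch store = new ChunkedIndexStoreSketch(4);
        int written = store.writeFile("segments_1", new byte[10]);
        System.out.println(written); // prints "3"
    }
}
```

Because every chunk is an ordinary cache entry, replication, distribution and cache stores apply to index data exactly as they do to any other value in the grid.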

Page 20

Example architecture: JIRA / Scarlet

Page 21

Hints

• Some tuning options might have different effects than what you're used to

• Network is orders of magnitude faster than disk (YMMV)

• But data locality helps

• Balance resources

• Get mergers to avoid segment chunking, or readlocks will engage

Page 22

"benchmarks", stats and more lies

[Two bar charts: Queries/sec (axis 0 to 25000) and Write ops/sec (axis 0 to 400), each comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory. Bar values are not recoverable from the transcript.]

Page 23

[Two bar charts: Queries/sec (axis 0 to 25000) and Write ops/sec (axis 0 to 400), each comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory. Bar values are not recoverable from the transcript.]

It's not about the figures

Page 24

What's next?

• Infinispan (core) 5.2 and 6

• Lucene 4.x

• Dynamic chunk sizes

• Ad-hoc "Lucene native" CacheStore

• NIO byte buffers?

Page 25

Conclusions

• Quick index replication

• Transactions

• Not a replacement for shards

• Cloud-friendly

• Delegates to any storage

Page 26

Q&A

@Infinispan @Hibernate @SanneGrinovero

http://infinispan.org

http://in.relation.to

http://jboss.org