SolrCloud Cluster management via APIs

SolrCloud cluster management APIs - Anshum Gupta, Apache Lucene/Solr PMC member & committer, IBM Watson

About me

• Anshum Gupta, Apache Lucene/Solr committer and PMC member, IBM Watson Search team.

• Interested in search and related stuff.

• Working with Apache Lucene since 2006 and Solr since 2010.

Apache Solr is the most widely-used search solution on the planet.

• 8,000,000+ total downloads

• 250,000+ monthly downloads

• 2,500+ open Solr jobs and the largest community of developers.

• Solr is both established and growing.

• Solr has tens of thousands of applications in production. You use Solr every day.

Project Activity

• SolrCloud - Introduced in 4.0

• At the time, no APIs to manage distributed clusters

• We’ve come a long way since then

A little bit of history

SolrCloud - Physical Architecture

[Diagram: ZooKeeper, Node 1, Node 2, a load balancer, and multiple clients]

Lots of interaction

Coins by Creative Stall from the Noun Project

• Solr APIs

• Creating a collection

• Managing configuration

Starting up
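Creating a collection comes down to a single HTTP call to the Collections API. Below is a minimal sketch of building that request. It assumes a node at `localhost:8983`, hypothetical collection and configset names (`products`, `products_conf`), and a configset that has already been uploaded to ZooKeeper.

```python
from urllib.parse import urlencode

# Assumption: any live node accepts Collections API calls at this address.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, replication_factor, config_name):
    """Build the Collections API CREATE request for a new collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of a configset that already exists in ZooKeeper.
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_collection_url("products", 2, 2, "products_conf"))
```

The function only builds the URL; sending it with any HTTP client (or pasting it into a browser) performs the actual call.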

• Easy!

• Replica placement strategy

• Machine, rack, DC aware placement

• Add more complicated rules

• Rules are applied during all replica-related API calls

Scaling Up
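As a rough illustration of the replica placement strategy, the sketch below attaches placement rules to a CREATE call. It assumes the `rule` request parameter as documented for the Solr 5.x/6.x rule-based replica placement feature; the node address and the collection name are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def create_params(name, num_shards, replication_factor, rules):
    """Return CREATE parameters, with one `rule` entry per placement rule."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
    ]
    # Each rule is sent as its own `rule` parameter.
    params.extend(("rule", rule) for rule in rules)
    return params


def create_with_rules_url(name, num_shards, replication_factor, rules):
    """Build the CREATE URL; urlencode escapes ':', '*', '<' and ','."""
    query = urlencode(create_params(name, num_shards, replication_factor, rules))
    return f"{SOLR_BASE}/admin/collections?{query}"


if __name__ == "__main__":
    # "For any shard, keep fewer than 2 replicas on any one node."
    rules = ["shard:*,replica:<2,node:*"]
    print(create_with_rules_url("products", 2, 2, rules))
```

Rack or data-center aware rules additionally need a snitch that supplies those tags; that part is not shown here.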

• SPLITSHARD

• MIGRATE

Moving Data
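Both data-moving calls can be sketched as plain Collections API requests. The example assumes a node at `localhost:8983`; the collection names, the shard name, and the route key `a!` are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def _collections_url(params):
    """Join the Collections API endpoint with an encoded query string."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def splitshard_url(collection, shard, async_id=None):
    """SPLITSHARD: split one shard into two sub-shards."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    if async_id is not None:
        # Splits can run long; `async` makes the call return at once with
        # a request id that can be polled later.
        params["async"] = async_id
    return _collections_url(params)


def migrate_url(source, target, split_key, forward_timeout=60):
    """MIGRATE: move documents sharing a route key to another collection."""
    params = {
        "action": "MIGRATE",
        "collection": source,
        "target.collection": target,
        "split.key": split_key,
        "forward.timeout": forward_timeout,
    }
    return _collections_url(params)


if __name__ == "__main__":
    print(splitshard_url("products", "shard1", async_id="split-1"))
    print(migrate_url("products", "archive", "a!"))
```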

• LIST

• OVERSEERSTATUS

• CLUSTERSTATUS

• CoreAdmin STATUS

Get STATUS
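The four status calls differ only in endpoint and action. A small sketch that builds all of them is given below, assuming a node at `localhost:8983`; the collection and core names are placeholders.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def _admin_url(endpoint, params):
    """Build a URL for /admin/collections or /admin/cores."""
    return f"{SOLR_BASE}/admin/{endpoint}?{urlencode(params)}"


def status_urls(collection=None, core=None):
    """Return the status requests keyed by the API they exercise."""
    cluster = {"action": "CLUSTERSTATUS"}
    if collection is not None:
        cluster["collection"] = collection  # limit output to one collection
    core_status = {"action": "STATUS"}
    if core is not None:
        core_status["core"] = core  # limit output to one core
    return {
        "LIST": _admin_url("collections", {"action": "LIST"}),
        "OVERSEERSTATUS": _admin_url("collections", {"action": "OVERSEERSTATUS"}),
        "CLUSTERSTATUS": _admin_url("collections", cluster),
        # CoreAdmin STATUS is served per node, under /admin/cores.
        "CoreAdmin STATUS": _admin_url("cores", core_status),
    }


if __name__ == "__main__":
    for name, url in status_urls(collection="products").items():
        print(name, url)
```

Note that CoreAdmin STATUS reports only on the node that receives the request, so it has to be sent to each node of interest.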

• ADD/REMOVE ROLE

• Only Overseer

• ADD/DELETE REPLICAPROP

• Only PreferredLeader

• BALANCESHARDUNIQUE

• Evenly distribute an arbitrary property among nodes for a collection

• REBALANCELEADERS

• FORCELEADER

Election Time!
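The election-related calls above can be sketched as request builders. The node, collection, shard, and replica names used here are hypothetical, and the node address is an assumption.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def _collections_url(params):
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def add_overseer_role_url(node):
    """ADDROLE: mark a node as a preferred Overseer."""
    return _collections_url({"action": "ADDROLE", "role": "overseer", "node": node})


def set_preferred_leader_url(collection, shard, replica):
    """ADDREPLICAPROP: flag one replica as the preferred leader of its shard."""
    return _collections_url({
        "action": "ADDREPLICAPROP",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "property": "preferredLeader",
        "property.value": "true",
    })


def balance_preferred_leaders_url(collection):
    """BALANCESHARDUNIQUE: spread the property evenly, one replica per shard."""
    return _collections_url({
        "action": "BALANCESHARDUNIQUE",
        "collection": collection,
        "property": "preferredLeader",
    })


def rebalance_leaders_url(collection):
    """REBALANCELEADERS: make preferredLeader replicas the actual leaders."""
    return _collections_url({"action": "REBALANCELEADERS", "collection": collection})


def force_leader_url(collection, shard):
    """FORCELEADER: force an election for a shard that has lost its leader."""
    return _collections_url(
        {"action": "FORCELEADER", "collection": collection, "shard": shard}
    )


if __name__ == "__main__":
    print(add_overseer_role_url("192.168.1.2:8983_solr"))
    print(balance_preferred_leaders_url("products"))
    print(rebalance_leaders_url("products"))
```

A typical sequence would be to assign `preferredLeader` (by hand or with BALANCESHARDUNIQUE) and then call REBALANCELEADERS, keeping FORCELEADER for shards that are stuck without a leader.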

Recipes

• Monitor disk-size

• CLUSTERSTATUS & Core STATUS

• SPLITSHARD

• Make sure there’s enough spare disk space

• Add one more replica

• Force leader election

• Delete old INACTIVE shard

Shard Splitting
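Two of the steps in this recipe are simple checks that can be sketched in code: deciding whether there is enough spare disk for a split, and finding the old parent shards that are safe to delete afterwards. The sketch assumes the usual shape of a CLUSTERSTATUS JSON response; the `headroom` factor of 2 is an assumption rather than a documented constant, and the sample data is made up.

```python
def can_split(index_bytes, free_bytes, headroom=2.0):
    """Return True when free disk covers the index size times `headroom`.

    Sub-shards are built next to the parent, so the split needs roughly
    another copy of the index; `headroom` is an assumed safety factor.
    """
    return free_bytes >= index_bytes * headroom


def inactive_shards(cluster_status, collection):
    """List shards marked inactive, i.e. parents left over after a split."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return sorted(
        name for name, shard in shards.items() if shard.get("state") == "inactive"
    )


if __name__ == "__main__":
    # Made-up CLUSTERSTATUS excerpt: shard1 was split into shard1_0/shard1_1.
    status = {
        "cluster": {
            "collections": {
                "products": {
                    "shards": {
                        "shard1": {"state": "inactive", "replicas": {}},
                        "shard1_0": {"state": "active", "replicas": {}},
                        "shard1_1": {"state": "active", "replicas": {}},
                    }
                }
            }
        }
    }
    print(can_split(index_bytes=10 * 2**30, free_bytes=25 * 2**30))
    print(inactive_shards(status, "products"))
```

Each name returned by `inactive_shards` is a candidate for a DELETESHARD call once the sub-shards have replicas and leaders.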

• Auto-add replica

• Monitor cluster status - CLUSTERSTATUS API

• ADD Replica using replica placement strategy

• Remove replicas when there are too many

High availability
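The monitoring half of this recipe can be sketched as a function over a CLUSTERSTATUS response: count healthy replicas per shard and report how many to add or remove. The response shape is assumed from the usual CLUSTERSTATUS JSON output; the desired replica count and the sample data are hypothetical.

```python
def replica_adjustments(cluster_status, collection, desired):
    """Map shard name to replicas to add (positive) or remove (negative).

    A replica counts as healthy when its state is "active" and its node
    is listed in live_nodes. Shards already at `desired` are omitted.
    """
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    adjustments = {}
    for name, shard in cluster["collections"][collection]["shards"].items():
        if shard.get("state") != "active":
            continue  # skip parents left over from a split
        healthy = sum(
            1
            for replica in shard.get("replicas", {}).values()
            if replica.get("state") == "active" and replica.get("node_name") in live
        )
        if healthy != desired:
            adjustments[name] = desired - healthy
    return adjustments


if __name__ == "__main__":
    # Made-up cluster: node n3 is gone, so shard1 has lost a replica.
    status = {
        "cluster": {
            "live_nodes": ["n1", "n2"],
            "collections": {
                "products": {
                    "shards": {
                        "shard1": {
                            "state": "active",
                            "replicas": {
                                "r1": {"state": "active", "node_name": "n1"},
                                "r2": {"state": "down", "node_name": "n3"},
                            },
                        },
                    }
                }
            },
        }
    }
    print(replica_adjustments(status, "products", desired=2))
```

Positive values would translate into ADDREPLICA calls (placed according to the replica placement rules) and negative values into DELETEREPLICA calls.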

• CLUSTERSTATUS

• Overseer, Leaders, etc.

• Move the Overseer first

• Rebalance leaders

• REPLACENODE - Ver 6.2+ only

• Force leader elections if required

Migrating Cluster Infrastructure
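The migration steps can be laid out as an ordered list of API calls, following the order of the bullets above. This is only a sketch: the node names are hypothetical, and the `source`/`target` parameter names for REPLACENODE are taken from the 6.2-era API (later releases appear to use `sourceNode`/`targetNode`, so check the reference guide for the version in use).

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def _collections_url(params):
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def migration_plan(old_node, new_node, collections):
    """Return (action, url) pairs for moving everything off `old_node`."""
    steps = [
        # Inspect the cluster: where the Overseer and the leaders live.
        ("CLUSTERSTATUS", _collections_url({"action": "CLUSTERSTATUS"})),
        # Move the Overseer first by preferring the new node for the role.
        ("ADDROLE", _collections_url(
            {"action": "ADDROLE", "role": "overseer", "node": new_node})),
    ]
    for collection in collections:
        steps.append(("REBALANCELEADERS", _collections_url(
            {"action": "REBALANCELEADERS", "collection": collection})))
    # Assumed 6.2-style parameter names; see the note above.
    steps.append(("REPLACENODE", _collections_url(
        {"action": "REPLACENODE", "source": old_node, "target": new_node})))
    return steps


if __name__ == "__main__":
    for action, url in migration_plan("old:8983_solr", "new:8983_solr", ["products"]):
        print(action, url)
```

FORCELEADER is left out of the plan on purpose, since it is only needed when a shard ends up without a leader.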

• Reinventing the wheel - why?

• SOLR-9735: Umbrella JIRA for cluster management

• Basic APIs

• Commission a new node

• Decommission an existing node

• Collection and aggregation of metrics at collection/node/shard/replica level

• Use the metrics and trigger recipe(s)

• Contribution in any form would be great!

Imagine… rule based automation!
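To make the idea of "use the metrics and trigger recipe(s)" concrete, here is a purely hypothetical sketch. None of the names below (`Trigger`, the metric keys, the recipe names) come from Solr or from SOLR-9735; they only illustrate how a threshold on a collected metric could select a recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """Hypothetical rule: run `recipe` when `metric` exceeds `threshold`."""
    metric: str
    threshold: float
    recipe: str


def recipes_to_run(metrics, triggers):
    """Return the recipes whose metric is present and above its threshold."""
    return [
        trigger.recipe
        for trigger in triggers
        if trigger.metric in metrics and metrics[trigger.metric] > trigger.threshold
    ]


if __name__ == "__main__":
    # Made-up metric names and values, for illustration only.
    triggers = [
        Trigger("shard_index_gb", 50, "split-shard"),
        Trigger("missing_replicas", 0, "add-replica"),
    ]
    metrics = {"shard_index_gb": 72.5, "missing_replicas": 0}
    print(recipes_to_run(metrics, triggers))
```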

Connect @

http://www.twitter.com/anshumgupta

http://www.linkedin.com/in/anshumgupta/

[email protected]