Administering and Monitoring SolrCloud Clusters

Posted on 26-Jan-2015


37 slides about taking care of your SolrCloud cluster: the Collections API, the CoreAdmin API, dynamic schema modification, segment merging, hard vs. soft commits, caches, monitoring, performance, and JMX. It's all in here.

Transcript of Administering and Monitoring SolrCloud Clusters

Administering and Monitoring SolrCloud

Rafał Kuć – Sematext Group, Inc. @kucrafal @sematext sematext.com

Ta me…

Sematext consultant & engineer

Solr.pl co-founder

Father and husband

SolrCloud Concepts

[Diagram: an application sending requests to a SolrCloud cluster of four Solr servers hosting Shard1, Shard2, a Shard1 replica, and a Shard2 replica]

Local SolrCloud Cluster

java -Dbootstrap_confdir=./solr/revolution/conf -Dcollection.configName=revolution -DzkRun -DnumShards=1 -jar start.jar

Runs embedded ZooKeeper

Bootstraps the collection with 1 shard

Starts Solr

Starting Solr Cluster

[Diagram: a three-node ZooKeeper ensemble and four Solr servers, none of which hosts a collection yet]

Each Solr server is started with a -DzkHost list naming all three ZooKeeper nodes:

-DzkHost=192.168.1.2:2181,192.168.1.1:2181,192.168.1.3:2181

-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

-DzkHost=192.168.1.3:2181,192.168.1.1:2181,192.168.1.2:2181

Uploading Collection Configuration

./zkcli.sh -cmd upconfig -zkhost 192.168.1.1:2181 -confdir ./conf/ -confname revolution

[Diagram: zkcli.sh uploads the collection configuration to the ZooKeeper ensemble, and Solr reads it from there]

Collections API

Create

Delete

Reload

Split

Create Alias

Delete Alias

Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud

Collection Creation

curl 'http://solrhost:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=3&replicationFactor=4'

name

numShards

replicationFactor

maxShardsPerNode

createNodeSet

collection.configName
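A minimal Python sketch (standard library only) of assembling the CREATE request from the parameters listed above; the helper names are hypothetical. Each shard gets replicationFactor copies, so numShards x replicationFactor cores (3 x 4 = 12 in the example) must fit on the live nodes. In Solr 4.x maxShardsPerNode defaults to 1, so unless there are 12 nodes it has to be raised, and the second helper computes the lower bound for that value.

```python
import math
from urllib.parse import urlencode


def collection_create_url(base, name, num_shards, replication_factor,
                          max_shards_per_node=None, create_node_set=None,
                          config_name=None):
    """Build a Collections API CREATE URL; base is e.g. http://solrhost:8983/solr."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters are only added when the caller sets them.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    if create_node_set:
        # createNodeSet takes a comma-separated list of node names
        params["createNodeSet"] = ",".join(create_node_set)
    if config_name is not None:
        params["collection.configName"] = config_name
    return base.rstrip("/") + "/admin/collections?" + urlencode(params)


def min_max_shards_per_node(num_shards, replication_factor, live_nodes):
    """Lower bound for maxShardsPerNode so that all cores fit on live_nodes."""
    return math.ceil(num_shards * replication_factor / live_nodes)


# Same URL as the curl example above
print(collection_create_url("http://solrhost:8983/solr", "revolution", 3, 4))
print(min_max_shards_per_node(3, 4, live_nodes=4))  # 3
```

The printed URL can be passed to curl as shown above, or fetched with urllib.request.urlopen.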

Collection Split Example

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'


$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
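By default, SPLITSHARD divides the parent shard's hash range into two halves and creates two sub-shards (shard1_0 and shard1_1 here). The parent becomes inactive once the split completes. The Python sketch below reproduces only the range arithmetic on signed 32-bit hash values. The starting range 80000000-ffffffff is an assumed example of what one shard of a two-shard collection may own, so check clusterstate.json for the real value.

```python
def to_hex(value):
    """Render a signed 32-bit hash value as 8 hex digits, as Solr shows ranges."""
    return format(value & 0xFFFFFFFF, "08x")


def split_range(lo, hi):
    """Split the inclusive signed 32-bit range [lo, hi] into two halves."""
    if lo > hi:
        raise ValueError("empty range")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


# Hypothetical parent range 80000000-ffffffff, i.e. -2**31 .. -1 as signed ints
first, second = split_range(-2**31, -1)
print(to_hex(first[0]) + "-" + to_hex(first[1]))    # 80000000-bfffffff
print(to_hex(second[0]) + "-" + to_hex(second[1]))  # c0000000-ffffffff
```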

Getting Deeper – CoreAdmin API

curl 'http://solrhost:8983/solr/admin/cores?action=CREATE&name=newcore&collection=revolution&shard=shard2'

collection

shard

numShards

collection.configName

Schema – the API

Reading (Solr 4.2)

Fields

Dynamic fields

Types

Copy fields

Name (4.3)

Version (4.3)

Unique Key (4.3)

Similarity (4.3)

Writing (Solr 4.4)

Adding new fields

Adding copy fields

Reading Your Schema

curl -XGET 'http://solrhost:8983/solr/rev/schema/fields/name'

Full reference: http://wiki.apache.org/solr/SchemaRESTAPI

{ "responseHeader" : { "status" : 0, "QTime" : 5 }, "field" : { "name" : "name", "type" : "text_general", "indexed" : true, "stored" : true }}
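The same read from Python, as a minimal standard-library sketch. The hypothetical fetch_field helper needs a running Solr at the assumed solrhost address, so the parsing step is demonstrated on the sample response above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Sample body copied from the response shown above.
SAMPLE = ('{ "responseHeader" : { "status" : 0, "QTime" : 5 }, '
          '"field" : { "name" : "name", "type" : "text_general", '
          '"indexed" : true, "stored" : true }}')


def field_definition(body):
    """Return the "field" object of a Schema API response; fail on non-zero status."""
    data = json.loads(body)
    status = data["responseHeader"]["status"]
    if status != 0:
        raise RuntimeError("Solr returned status %d" % status)
    return data["field"]


def fetch_field(base, core, name):
    """Hypothetical helper: GET <base>/<core>/schema/fields/<name> from a live Solr."""
    url = "%s/%s/schema/fields/%s?wt=json" % (base.rstrip("/"), core, quote(name))
    with urlopen(url) as resp:
        return field_definition(resp.read().decode("utf-8"))


field = field_definition(SAMPLE)
print(field["type"], field["indexed"], field["stored"])  # text_general True True
```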

Dynamic Schema Modifications

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

curl -XPUT 'http://solrhost:8983/solr/rev/schema/fields/content' -H 'Content-type:application/json' -d '{ "type" : "text", "stored" : false, "copyFields" : ["catchAll"]}'

curl -XPOST 'http://solrhost:8983/solr/rev/schema/copyfields' -H 'Content-type:application/json' -d '[ { "source" : "name", "dest" : [ "text", "personal" ] }]'
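Both schema changes can also be prepared from Python with only the standard library. This sketch builds the requests without sending them. SCHEMA_URL and the helper names are assumptions, and the lowercase copyfields path follows the Solr 4.x Schema API reference. json.dumps writes the Python value False as a JSON boolean, so stored is sent as a real boolean rather than a string.

```python
import json
from urllib.request import Request

# Assumed base URL: host "solrhost" and core "rev" as in the curl examples.
SCHEMA_URL = "http://solrhost:8983/solr/rev/schema"
JSON_HEADERS = {"Content-Type": "application/json"}


def add_field_request(name, definition):
    """PUT /schema/fields/<name> with a JSON field definition."""
    body = json.dumps(definition).encode("utf-8")
    return Request(SCHEMA_URL + "/fields/" + name, data=body,
                   headers=JSON_HEADERS, method="PUT")


def add_copy_fields_request(rules):
    """POST a list of {"source": ..., "dest": [...]} copy field rules."""
    body = json.dumps(rules).encode("utf-8")
    return Request(SCHEMA_URL + "/copyfields", data=body,
                   headers=JSON_HEADERS, method="POST")


field_req = add_field_request(
    "content", {"type": "text", "stored": False, "copyFields": ["catchAll"]})
copy_req = add_copy_fields_request([{"source": "name", "dest": ["text", "personal"]}])
# Against a live, mutable managed schema these would be sent with
# urllib.request.urlopen(field_req) and urllib.request.urlopen(copy_req).
print(field_req.get_method(), field_req.full_url)
```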

The Right Directory

[Diagram: a Lucene Directory holding segment files such as _0.fdt, _0.fdx, _0.fnm, _0.nvd and _1.fdt, _1.fdx, _1.fnm, _1.nvd]

StandardDirectory

SimpleFSDirectory

NIOFSDirectory

MMapDirectory

NRTCachingDirectory

RAMDirectory

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

Segment Merging

[Diagram: small segments (a to e) at level 0 being merged into larger segments at level 1]

Segment Merge Under Control

Merge policy

Merge scheduler

Merge factor

Merge policy configuration

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
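The following toy Python model of log-style merging shows what the merge factor controls. Each flush writes a level-0 segment, and whenever merge_factor segments share a level they are merged into one segment of the next level. This is a simplification in the spirit of Lucene's LogMergePolicy. The TieredMergePolicy that Solr 4 uses by default picks merges by segment size rather than by level. The model also ignores the merge scheduler, which decides when and on how many threads merges actually run.

```python
def simulate_log_merges(flushes, merge_factor):
    """Toy log-style merging. Returns (levels of live segments, number of merges)."""
    levels = []  # one entry per live segment: its level
    merges = 0
    for _ in range(flushes):
        levels.append(0)  # every flush writes a new level-0 segment
        level = 0
        # A merge can cascade: merging level 0 may fill up level 1, and so on.
        while levels.count(level) >= merge_factor:
            for _ in range(merge_factor):
                levels.remove(level)
            levels.append(level + 1)
            merges += 1
            level += 1
    return sorted(levels, reverse=True), merges


print(simulate_log_merges(100, 10))  # ([2], 11)
print(simulate_log_merges(25, 10))   # ([1, 1, 0, 0, 0, 0, 0], 2)
print(simulate_log_merges(8, 2))     # ([3], 7)
```

A lower merge factor keeps fewer segments around, which helps searches, at the price of more merging and therefore more I/O while indexing.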

Autocommit or Not?

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>1000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Automatic data flush (hard commit)

Automatic index view refresh (soft commit)
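Here is a simplified Python model of when the two configurations above fire, using the same numbers. A hard commit is due once 1000 documents are pending or 15 seconds have passed since the first uncommitted document, whichever comes first. A soft commit is due one second after the first document that is not yet visible. Because openSearcher is false, the hard commit only makes data durable, and visibility comes from the soft commit. The function names are hypothetical, and the real commit tracker in Solr schedules these commits with timers rather than polling.

```python
def hard_commit_due(pending_docs, ms_since_first_pending,
                    max_docs=1000, max_time_ms=15000):
    """<autoCommit>: flush to disk once either limit is reached."""
    if pending_docs == 0:
        return False  # nothing uncommitted, nothing to flush
    return pending_docs >= max_docs or ms_since_first_pending >= max_time_ms


def soft_commit_due(invisible_docs, ms_since_first_invisible, max_time_ms=1000):
    """<autoSoftCommit>: reopen the searcher so recent documents become searchable."""
    return invisible_docs > 0 and ms_since_first_invisible >= max_time_ms


# 200 documents indexed, the first one 1.2 s ago:
print(hard_commit_due(200, 1200))  # False: neither 1000 docs nor 15 s reached
print(soft_commit_due(200, 1200))  # True: the 1 s soft commit window has passed
```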

Caches

[Diagram: the q=lucene+revolution and fq=city:Dublin parts of a request, each served from a Solr cache]

Refreshed with each new IndexSearcher

Configurable

Different purposes

Different implementations
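The benefit of putting a filter in a separate fq can be sketched with a toy LRU cache keyed by the filter string. It loosely stands in for Solr's filterCache; the real cache stores document sets per IndexSearcher and is configured in solrconfig.xml. In the sketch, three different q values share the same fq=city:Dublin, so only the first lookup computes the document set. The index contents and the class are hypothetical.

```python
from collections import OrderedDict


class LRUFilterCache:
    """Toy LRU cache: filter query string -> frozenset of matching doc ids."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def get_or_compute(self, fq, compute):
        self.lookups += 1
        if fq in self.entries:
            self.hits += 1
            self.entries.move_to_end(fq)  # mark as most recently used
            return self.entries[fq]
        doc_ids = frozenset(compute(fq))
        self.entries[fq] = doc_ids
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return doc_ids

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


CITY = {1: "Dublin", 2: "Berlin", 3: "Dublin"}  # hypothetical index: doc id -> city


def match(fq):
    _, value = fq.split(":", 1)  # only handles city:<value> filters
    return [doc for doc, city in CITY.items() if city == value]


cache = LRUFilterCache(size=2)
for q in ["lucene revolution", "solr", "monitoring"]:
    # q changes on every request, but the fq result is reused
    dublin = cache.get_or_compute("city:Dublin", match)
print(sorted(dublin), cache.hits, cache.lookups)  # [1, 3] 2 3
```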

Monitoring Importance

What to Pay Attention to?

Cluster State

Health

Shards and replica status

Shard placement

Failing nodes

Indexing-Related Metrics

Index throughput

Document distribution

I/O subsystem metrics

Merging

Search-Related Metrics

Count

Latency

Distribution among nodes

Anomalies and spikes
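Averages hide exactly the anomalies and spikes listed above, so latency is usually tracked as percentiles. The following is a minimal Python sketch that uses the nearest-rank method and a crude "more than three times the median" spike rule. The sample latencies and the threshold are made-up illustrations, not recommendations.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def spikes(samples, factor=3.0):
    """Indexes of samples above factor * median (a crude anomaly rule)."""
    median = statistics.median(samples)
    return [i for i, v in enumerate(samples) if v > factor * median]


latencies_ms = [12, 15, 11, 14, 250, 13, 12, 16, 15, 14]  # hypothetical samples
print(percentile(latencies_ms, 50), percentile(latencies_ms, 90),
      percentile(latencies_ms, 99))  # 14 16 250
print(spikes(latencies_ms))  # [4]
```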

Monitoring Memory and GC

Heap details

Pool size

Pool utilization

Garbage collection count

Garbage collection time
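The JVM exposes GC count and time over JMX as cumulative counters (CollectionCount and CollectionTime on each java.lang GarbageCollector MBean), so the useful thing to graph is the change between two samples. This small Python sketch assumes the samples have already been read, for example by a monitoring agent, and summed over all collectors. The sample values are hypothetical.

```python
def gc_overhead(prev, curr):
    """Compare two samples of (uptime_ms, total_gc_count, total_gc_time_ms).

    Returns (collections, gc_time_ms, share of wall-clock time in GC, average pause ms).
    """
    elapsed_ms = curr[0] - prev[0]
    if elapsed_ms <= 0:
        raise ValueError("samples must be in chronological order")
    collections = curr[1] - prev[1]
    gc_ms = curr[2] - prev[2]
    avg_pause = gc_ms / collections if collections else 0.0
    return collections, gc_ms, gc_ms / elapsed_ms, avg_pause


# Hypothetical samples taken one minute apart
print(gc_overhead((60000, 100, 2000), (120000, 130, 5000)))  # (30, 3000, 0.05, 100.0)
```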

Monitoring OS-Related Metrics

CPU details

Load

I/O activity

Network usage

Solr Administration Panel

Solr & JMX

<jmx />

java -Dcom.sun.management.jmxremote -jar start.jar


SPM

Index statistics

Request # and latency

Caches and warmup

CPU

JVM Memory and OS Memory

Garbage collector

OS related statistics

SPM Dashboard

Other Monitoring Tools

Ganglia http://ganglia.sourceforge.net/

New Relic http://www.newrelic.com/

Opsview http://www.opsview.com

Too much is too much

Too hot

Caches

We Are Hiring!

Dig search? Dig analytics? Dig big data? Dig performance? Dig working with and in open source? We're hiring worldwide!

http://sematext.com/about/jobs.html

Rafał Kuć @kucrafal rafal.kuc@sematext.com

Sematext @sematext http://sematext.com http://blog.sematext.com

SPM discount code: LR2013SPM20

Thank You!

@ Sematext booth ;)