SolrCloud Failover and Testing

Post on 27-Jan-2015


Mark Miller (Cloudera)

Mark Miller

Lucene Committer, Solr Committer.

Works for Cloudera.

A lot of work on SolrCloud.

At Cloudera…

We are building an Enterprise Data Hub

Search is a part of that.

Solr is our search engine.

Solr On HDFS

Performance is good.

It can be even better.

A shared filesystem has advantages.

SolrCloud Reminder

Limitation

We replicate via both Solr and HDFS.

Replicating with just one has huge tradeoffs.

We are working on better tradeoffs.

autoAddReplicas

A new per collection option.

When a replica goes down, it is replaced on a node that is still up.

With a shared filesystem as well, all replicas can go down and you can still fail over automatically.
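As a rough illustration of turning the option on, the sketch below builds a Collections API CREATE request with `autoAddReplicas=true`. The host, collection name, and shard counts are placeholders rather than values from the talk, and the helper name `create_collection_url` is hypothetical.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL with autoAddReplicas enabled.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # The per-collection option described on this slide.
        "autoAddReplicas": "true",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name, assumed for illustration only.
url = create_collection_url("http://localhost:8983/solr", "demo", 2, 2)
print(url)
```

The URL can then be requested with any HTTP client against a running SolrCloud cluster.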


How Does it Work?

SolrCloud elects a single node to act as the Overseer, in a fault-tolerant way.

The Overseer monitors the cluster state in ZooKeeper.

When necessary, it creates a new SolrCore on a machine that is still up to replace ‘downed’ replicas.
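The replacement step can be pictured with a toy model. This is not Solr's code: the function name, the data shapes, and the "least loaded live node" placement policy are all assumptions made for illustration, since the talk does not say how the target node is chosen.

```python
def plan_replacements(replicas, live_nodes):
    """Return {replica_name: target_node} for replicas whose node is down.

    replicas:   dict of replica name -> node currently hosting it
    live_nodes: set of node names that are still up
    """
    # Count how many replicas each live node already hosts.
    load = {node: 0 for node in live_nodes}
    for node in replicas.values():
        if node in load:
            load[node] += 1

    plan = {}
    for name, node in sorted(replicas.items()):
        if node in live_nodes:
            continue  # replica is healthy, nothing to do
        if not load:
            break  # no live node left to host a replacement
        # Assumed policy: least loaded live node, ties broken by name.
        target = min(load, key=lambda n: (load[n], n))
        plan[name] = target
        load[target] += 1
    return plan


replicas = {"shard1_r1": "nodeA", "shard1_r2": "nodeB", "shard2_r1": "nodeC"}
plan = plan_replacements(replicas, live_nodes={"nodeA", "nodeB"})
print(plan)  # {'shard2_r1': 'nodeA'}
```

In the real system, the equivalent of `replicas` and `live_nodes` would come from the cluster state that the Overseer monitors in ZooKeeper.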

Let’s Do A Demo!

SolrCloud Testing

Let’s talk about tests.

SolrCloud Tests

We did a straw man implementation of SolrCloud first.

We did the same for tests.

We favored integration tests over unit tests.

We did not make enough tests.

Distributed Tests

Are hard.

For a variety of reasons.

The Lucene / Solr testing framework hurts in order to help.

The Lucene / Solr Test Framework

Randomized Testing.

Rule Enforcement.

The Jenkins Cluster.
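The general idea behind randomized testing can be sketched as follows: each run draws its inputs from a seeded random generator and reports the seed on failure, so a failing run can be replayed exactly. This is a minimal Python illustration of the idea, not the Lucene / Solr test framework's API; the function name and the property being checked are assumptions.

```python
import random


def run_randomized_test(seed=None):
    """Run one randomized check; return the seed and inputs for replay."""
    if seed is None:
        # Fresh seed per run, so repeated runs explore different inputs.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    docs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    try:
        result = sorted(docs)
        # Properties that must hold for any input.
        assert len(result) == len(docs)
        assert all(a <= b for a, b in zip(result, result[1:]))
    except AssertionError:
        print(f"FAILED, reproduce with seed={seed}")
        raise
    return seed, docs


seed, docs = run_randomized_test()
# Replaying with the same seed regenerates exactly the same inputs.
replayed_seed, replayed_docs = run_randomized_test(seed)
```

The cost is that a test can pass many times and then fail on a rare input, which is one way a framework like this hurts in order to help.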

Mocks

We avoided doing them early - too much churn.

They can be dangerous to future contributors / refactoring.

Some of the early mocking that did get in is a little painful.

We need them for good unit tests.
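A small sketch of the trade-off, using Python's `unittest.mock`. The `state_reader` object and its `live_nodes` / `replica_nodes` methods are hypothetical stand-ins, not Solr classes.

```python
from unittest.mock import Mock


def count_down_replicas(state_reader):
    """Count replicas whose hosting node is not in the live node set."""
    live = state_reader.live_nodes()
    return sum(1 for node in state_reader.replica_nodes() if node not in live)


# The mock replaces a real cluster, so the test runs fast with no servers.
reader = Mock()
reader.live_nodes.return_value = {"nodeA"}
reader.replica_nodes.return_value = ["nodeA", "nodeB", "nodeB"]

down = count_down_replicas(reader)
print(down)  # 2

# The mock also pins down how the collaborator is called, which is
# the part that can break when someone later refactors the interface.
reader.live_nodes.assert_called_once()
```

The unit under test is exercised in isolation, but the test now encodes assumptions about the collaborator's interface, which is where the risk to future refactoring comes from.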

Testing Culture

Lucene has an A+ testing culture. In many cases, it’s easier for Lucene.

Solr has a C testing culture.

Solr needs to get better.

Prescription?

More focus on back filling tests when adding features or changing code.

More focus on fixing frequently failing tests.

More focus on unit tests.

The End

@heismark

Thank You.