An Introduction to Distributed Search with Cassandra and Solr

32
TOO BIG TO FAIL An Introduction to Distributed Search with Cassandra and Solr OpenSource Connections @PatriciaGorla [email protected]

description

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

Transcript of An Introduction to Distributed Search with Cassandra and Solr

Page 1: An Introduction to Distributed Search with Cassandra and Solr

TOO BIG TO FAILAn Introduction to Distributed Search with Cassandra and Solr

OpenSource Connections@PatriciaGorla

[email protected]

Page 2: An Introduction to Distributed Search with Cassandra and Solr

ABOUT MESystems AnalystProgramming

Information Retrieval

Page 3: An Introduction to Distributed Search with Cassandra and Solr

Created at Facebook to power inbox search

Distributed data store run on commodity servers

Highly available

No one single point of failure

CASSANDRA

Page 4: An Introduction to Distributed Search with Cassandra and Solr

WHO USES CASSANDRA?

Page 5: An Introduction to Distributed Search with Cassandra and Solr

SEARCH + CASSANDRA, 1

• First implementation: Solandra (originally Lucandra)

• Based on Solr

• Replaced Lucene index with Cassandra column families

Page 6: An Introduction to Distributed Search with Cassandra and Solr

SEARCH + CASSANDRA, 2

• DataStax Enterprise Search

• Uses native Lucene index

• All data is retrieved from Cassandra

Page 7: An Introduction to Distributed Search with Cassandra and Solr

Datastax Enterprise Search Cluster

DistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation

Page 8: An Introduction to Distributed Search with Cassandra and Solr

WRITING TO CLUSTER

• Write to either Cassandra clients or Solr API

• Write process is the same

• True atomic updates to Cassandra

Page 9: An Introduction to Distributed Search with Cassandra and Solr

Information is distributed among Cassandra nodes.

Page 10: An Introduction to Distributed Search with Cassandra and Solr

Data can be written directly to Cassandra

Page 11: An Introduction to Distributed Search with Cassandra and Solr

Data is distributed according to row key hash and replication factor

Page 12: An Introduction to Distributed Search with Cassandra and Solr

DSE first writes to Cassandra

Page 13: An Introduction to Distributed Search with Cassandra and Solr

And then updates the Solr index

Page 14: An Introduction to Distributed Search with Cassandra and Solr

The quorum responds with success / failure

Page 15: An Introduction to Distributed Search with Cassandra and Solr

Data is now distributed evenly

Page 16: An Introduction to Distributed Search with Cassandra and Solr

READING FROM CLUSTER

• Read either Cassandra-side or through Solr API

• Cassandra: fast reads*

• Solr : full-text search

• Read direction affects performance

• Data is stored in Cassandra

Page 17: An Introduction to Distributed Search with Cassandra and Solr

Query is sent to node

Page 18: An Introduction to Distributed Search with Cassandra and Solr

Node uses GOSSIP to find who has the information

Page 19: An Introduction to Distributed Search with Cassandra and Solr

QUERYING CASSANDRA

• Can query Solr or Cassandra directly

• Limited syntax with CQL, can use solr_query parameter

Page 20: An Introduction to Distributed Search with Cassandra and Solr

Querying Cassandra directly

Page 21: An Introduction to Distributed Search with Cassandra and Solr

Cassandra retrieves information from column

family

Page 22: An Introduction to Distributed Search with Cassandra and Solr

Querying Solr index

Page 23: An Introduction to Distributed Search with Cassandra and Solr

Solr checks its index, and queries Cassandra for

stored data

Page 24: An Introduction to Distributed Search with Cassandra and Solr

Cassandra node checks its column family

Page 25: An Introduction to Distributed Search with Cassandra and Solr

Data is always synced (as long as node is in sync)

Page 26: An Introduction to Distributed Search with Cassandra and Solr

Both nodes respond with information

Page 27: An Introduction to Distributed Search with Cassandra and Solr

Updates can be committed and searched over in real time

Page 28: An Introduction to Distributed Search with Cassandra and Solr

PRODUCTION USE

• Will want a mix of analytics, search nodes

Page 29: An Introduction to Distributed Search with Cassandra and Solr

An OLTP - OLAP integrated solution

Page 30: An Introduction to Distributed Search with Cassandra and Solr

TRADEOFFS

• Cassandra is no longer schema-optional: changing the Solr schema requires reindex (standard for Solr)

• No multi-valued fields or composite columns

• All fields must be strings: no date facets for Solr, no counters for Cassandra

Page 31: An Introduction to Distributed Search with Cassandra and Solr

CONCLUSION

• Useful for real-time text-only searches

• DSE is promising, but needs to address drawbacks