Cassandra Summit 2015: Intro to DSE Search
An Introduction to DSE Search
Caleb Rackliffe
Software Engineer
[email protected]
@calebrackliffe
What problem were we trying to solve?
[Diagram: an Application talks to Cassandra through the DataStax Driver.]
SELECT * FROM customers WHERE country LIKE '%land%';
What about secondary indexes?
Why not just create your own secondary index implementation that supports wildcard queries?
I need full-text search!
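To see why full-text search needs more than a plain secondary index, here is a toy inverted index in Python. It is an illustration only, not DSE or Lucene internals: a `%land%`-style query only has to scan the term dictionary, which is far smaller than the full table.

```python
from collections import defaultdict

# Toy inverted index (illustration only): map each indexed term to the
# set of row keys that contain it.
index = defaultdict(set)

def add(key, text):
    for term in text.lower().split():
        index[term].add(key)

def contains_search(fragment):
    # A '%land%'-style query scans the term dictionary, not every row.
    return {key for term, keys in index.items() if fragment in term
            for key in keys}

add('c1', 'Finland')
add('c2', 'Iceland')
add('c3', 'France')
print(sorted(contains_search('land')))  # ['c1', 'c2']
```

A real engine stores the term dictionary in sorted, compressed form on disk, but the lookup shape is the same.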
Why did we build something new?
[Diagram: the Application now talks to two systems, through the DataStax Driver and a separate Solr Client.]
Polyglot Persistence!
[Diagram: running Cassandra and a separate Solr cluster means the Application juggles two clients, raising three concerns: Consistency, Cost, and Complexity.]
[Feature cloud: partitioning, multi-DC replication, geospatial, wildcards, monitoring, C* field type support (UDT, tuple, collections), security, live indexing, sorting, faceting, fault-tolerant distributed search, caching, text analysis, grouping, automatic index updates, JVM, CQL, repair]
[The Application / DataStax Driver / Solr Client diagram appears again, weighing Consistency, Complexity, and Cost.]
How about some examples?
Creating a Solr Core
Start a node…
bash$ dse cassandra -s

Create a table…
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};
cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY, fullname text, address map<text, text>);

Create the core…
bash$ dsetool create_core test.user generateResources=true
The Schema
bash$ dsetool get_core_schema test.user

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="string"/>
  </types>
  <fields>
    <field indexed="true" name="username" stored="true" type="string"/>
    <field indexed="true" name="fullname" stored="true" type="text"/>
    <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
  </fields>
  <uniqueKey>username</uniqueKey>
</schema>
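The generated `text` field type runs each value through an analyzer chain before indexing. A rough Python stand-in for that chain, assuming only the two factories in the schema above: the StandardTokenizerFactory splits on word boundaries, then the LowerCaseFilterFactory lowercases each token.

```python
import re

def analyze(text):
    # Rough stand-in (not Solr code) for the generated analyzer chain:
    # split on word boundaries, then lowercase each token.
    return [token.lower() for token in re.findall(r"\w+", text)]

print(analyze("Sergio Bossa"))  # ['sergio', 'bossa']
```

The `string` type (StrField), by contrast, indexes the raw value verbatim, which matters for the wildcard example below on map values.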
Insert Rows (…and Index Documents)
cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('sbtourist', 'Sergio Bossa', {'address_home': 'UK', 'address_work': 'UK'});
cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('bereng', 'Berenguer Blasi', {'address_home': 'ES', 'address_work': 'ES'});
cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('thegrinch', 'Sven Delmas', {'address_home': 'US', 'address_work': 'HQ'});
…and that’s it. No ETL. No writing to a second datastore.
Wildcards
cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"address_home:U*"}';

 username  | address
-----------+----------------------------------------------
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(2 rows)
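Because `address_*` columns map to un-analyzed StrFields, the `U*` prefix query matches the stored terms case-sensitively. A small Python sketch of that term-dictionary match, using the values inserted above:

```python
from fnmatch import fnmatchcase

# Stored terms for address_home (StrField: indexed verbatim, no lowercasing).
terms = ['UK', 'US', 'ES', 'HQ']

# Prefix wildcard match, case-sensitive, like the address_home:U* query.
matches = [t for t in terms if fnmatchcase(t, 'U*')]
print(matches)  # ['UK', 'US']
```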
Sorting and Limits

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 bereng    | {'address_home': 'ES', 'address_work': 'ES'}

(3 rows)
cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(1 rows)
Faceting
cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field":"address_work"}}';

 facet_fields
-----------------------------------------------
 {"address_work": {"ES": 1, "HQ": 1, "UK": 1}}

(1 rows)
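Conceptually a field facet is just a count of distinct values across the matching documents. A minimal Python sketch over the three rows inserted earlier:

```python
from collections import Counter

# The matching documents (q = *:*), reduced to the faceted field.
rows = [
    {'username': 'sbtourist', 'address_work': 'UK'},
    {'username': 'bereng',    'address_work': 'ES'},
    {'username': 'thegrinch', 'address_work': 'HQ'},
]

# A field facet counts how many matching docs carry each value.
facets = dict(Counter(row['address_work'] for row in rows))
print(facets)  # {'UK': 1, 'ES': 1, 'HQ': 1}
```

In a distributed search each shard computes its partial counts and the coordinator merges them.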
Partition Restrictions
cqlsh:test> CREATE TABLE event(sensor_id bigint, recording_time timestamp, description text, PRIMARY KEY(sensor_id, recording_time));
…
cqlsh:test> SELECT recording_time, description FROM test.event WHERE sensor_id = 2314234432 AND solr_query='description:unremarkable';
What do the internals look like?
Indexing
[Diagrams: the indexing lifecycle. New documents land in an in-memory RAMBuffer (buffered, not yet searchable). A soft commit turns the buffer into an in-memory segment, making the documents searchable. A hard commit flushes segments to disk, making them durable.]
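The buffered/searchable/durable lifecycle can be sketched as a tiny state machine. This is a simplification for illustration, not Lucene code:

```python
class ToyIndex:
    """Sketch of the commit lifecycle above (simplified illustration)."""

    def __init__(self):
        self.ram_buffer = []   # buffered: in memory, not yet searchable
        self.segments = []     # searchable: in-memory segments
        self.on_disk = []      # durable: segments flushed to disk

    def add(self, doc):
        self.ram_buffer.append(doc)

    def soft_commit(self):
        # Make buffered docs searchable without any disk I/O.
        if self.ram_buffer:
            self.segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def hard_commit(self):
        # Flush everything searchable to disk for durability.
        self.soft_commit()
        self.on_disk.extend(self.segments)

    def search(self, predicate):
        return [doc for seg in self.segments for doc in seg if predicate(doc)]

idx = ToyIndex()
idx.add({'username': 'bereng'})
assert idx.search(lambda d: True) == []      # buffered, not yet visible
idx.soft_commit()
assert len(idx.search(lambda d: True)) == 1  # searchable after soft commit
idx.hard_commit()
assert len(idx.on_disk) == 1                 # durable after hard commit
```

Soft commits are cheap and frequent (this is what makes "live indexing" possible); hard commits are the expensive, durable ones.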
Querying
Replica Selection

[Diagram: a cluster with RF = 2 and shards A–E. The coordinator picks a set of healthy replicas that together cover every shard, skipping unhealthy nodes.]
[The diagram repeats with a replica marked unhealthy; the coordinator covers all shards using only the healthy nodes.]
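The replica-selection idea above can be sketched as a greedy set cover. This is a hypothetical illustration, not DSE's actual planner: pick healthy nodes until every shard is assigned to exactly one node.

```python
def select_replicas(owned_shards, healthy, shards):
    """Greedy sketch: assign every shard to one healthy node.

    owned_shards: node -> set of shards it replicates.
    Returns {shard: node}, or None if some shard has no healthy replica.
    """
    needed = set(shards)
    plan = {}
    while needed:
        # Pick the healthy node covering the most still-needed shards.
        best = max(healthy, key=lambda n: len(owned_shards[n] & needed),
                   default=None)
        if best is None or not (owned_shards[best] & needed):
            return None  # some shard has no healthy replica left
        for shard in owned_shards[best] & needed:
            plan[shard] = best
        needed -= owned_shards[best]
    return plan

# RF = 2, shards A-E spread around a 5-node ring; node 2 is unhealthy.
owned = {1: {'A', 'B'}, 2: {'B', 'C'}, 3: {'C', 'D'},
         4: {'D', 'E'}, 5: {'E', 'A'}}
plan = select_replicas(owned, healthy=[1, 3, 4, 5], shards='ABCDE')
```

Fewer nodes in the plan means fewer sub-queries to fan out and merge, which is why the coordinator prefers a small covering set.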
What happens if a shard query fails?
Failover: Phase 1
[Diagram: 4 nodes, RF = 2, shards A–D, no vnodes. The coordinator fans the query out across replicas covering shards A–D.]

Failover: Phase 2
[Diagram: one of the shard sub-queries fails.]

Failover: Phase 3
[Diagram: the coordinator retries the failed shard's sub-query on the other replica for that shard.]
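The three phases above amount to: try a replica for each shard, and fall back to the next one if the sub-query fails. A minimal Python sketch, where `respond` is a stand-in for the network call:

```python
def run_shard_query(shard, replicas, respond):
    """Sketch of failover: try each replica for a shard sub-query,
    falling back to the next candidate on failure."""
    for node in replicas[shard]:
        if respond(node):  # stand-in for a successful sub-query
            return node
    raise RuntimeError("all replicas failed for shard " + shard)

# 4 nodes, RF = 2: each shard lives on two nodes; node 1 is down.
replicas = {'A': [1, 2], 'B': [2, 3], 'C': [3, 4], 'D': [4, 1]}
alive = {2, 3, 4}
plan = {shard: run_shard_query(shard, replicas, lambda n: n in alive)
        for shard in 'ABCD'}
print(plan)  # {'A': 2, 'B': 2, 'C': 3, 'D': 4}
```

The query only fails outright when every replica for some shard is down, which is the same availability bar Cassandra itself sets at that replication factor.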
Platform Integrations
Search + Analytics: Explicit Predicate Pushdown
bash$ dse spark
scala> val table = sc.cassandraTable("wiki", "solr")
scala> val result = table.select("id", "title").where("solr_query='body:dog'").collect
http://docs.datastax.com