Riak Search - Berlin Buzzwords 2010

Click here to load reader

  • date post

  • Category


  • view

  • download


Embed Size (px)


Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality.

Transcript of Riak Search - Berlin Buzzwords 2010

  • 1. Riak Search A Full-Text Searchand Indexing Enginebased on Riak Berlin Buzzwords June 2010 Basho Technologies Rusty Klophaus - @rklophaus

2. Why did we build it? What are the major goals? How does it work? 2 3. Part One Why did we buildRiak Search?3 4. Riak is a scalable, highly-available, networked, open-source key/value store.4 5. Writing to a Key/Value StoreKey/ValueCLIENTRIAK 5 6. Writing to a Key/Value StoreObjectCLIENTRIAK 6 7. Querying a Key/Value StoreKeyObject CLIENT RIAK7 8. Querying Riak via LinkWalking Key + InstructionsWalk toRelated Keys Object(s) CLIENT RIAK8 9. Querying Riak via Map/ReduceKey(s) + JS Functions MapMap ReduceComputed Value(s) CLIENT RIAK 9 10. Key/Value Storeslike Key-Based Queries 10 11. Query by Secondary Index where Category == "Shoes"WTF!? I'm aKV store!CLIENTRIAK11 12. Full-Text Query "Converse AND Shoes" This is getting old. CLIENT RIAK12 13. These kinds of queriesneed an Index.*Market Opportunity!* 13 14. Part Two What are the major goals of Riak Search? 14 15. An application built on Riak. Your Riak Application15 16. Hrm... I need an index. YourRiak Application Index Object 16 17. Hrm... I need an index with more features.Your??? Riak Application 17 18. Lucene should do the trick...Your LuceneRiakApplication18 19. ...shard to add more storage capacity... YourApplicationLucene Lucene Lucene Riak19 20. ...replicate to add more throughput.Lucene Lucene Lucene YourApplicationLucene Lucene Lucene RiakLucene Lucene Lucene20 21. ...replicate to add more throughput.Lucene Lucene Lucene YourApplicationLucene Lucene Lucene RiakLucene Lucene Lucene Operations nightmare! 21 22. What do we really want?Your Riak-ifiedRiakApplicationLucene 22 23. What do we really want?YourRiak RiakApplication Search23 24. Functionality? Be like Lucene (and more). Lucene Syntax Leverages Java Lucene Analyzers Solr Endpoints Integration via Riak Post-Commit Hook (Index) Integration via Riak Map/Reduce (Query) Near-Realtime Schema-less 24 25. Operations? Be like Riak. No special nodes Add nodes, get more compute and storage Automatically load balance Replicas for durability and performance Index and query in parallel Swappable storage backends 25 26. Part Three How do we do it?26 27. A Gentle Introduction to Document Indexing27 28. The Inverted IndexDocument Inverted Index day, 1dog, 1 #1Every dog hashis day.every, 1has, 1his, 1 28 29. The Inverted Index DocumentsCombined Inverted Index and, 4#1 Every dog has his day. bag, 3 bark, 2 bite, 2 The dog's bark#2 cat, 3 is worse than his bite. cat, 4 day, 1 dog, 1#3 Let the cat out of the bag. dog, 2 dog, 4 every, 1 #4It's raining cats and dogs.has, 1 ... 29 30. At Query Time... "dog AND cat"AND dog cat30 31. At Query Time... ANDdogcatdog, 1cat, 3dog, 2cat, 4dog, 4 31 32. At Query Time... Result: 4 AND(Merge Intersection)1 32 44 32 33. At Query Time...Result: 1, 2, 3, 4OR (Merge Union)1 32 44 33 34. Complex Behavior from Simple Structures 34 35. Storage Approaches... 35 36. Riak Search uses Consistent Hashingto store data onPartitions 36 37. Introduction to Consistent Hashing and PartitionsPartitions = 10 Number of Nodes = 5 Partitions per Node = 2 Replicas (NVal) = 2 37 38. Introduction to Consistent Hashing and PartitionsObject 38 39. Document Partitioning vs. Term Partitioning 39 40. ...and the Resulting Tradeoffs 40 41. Document Partitioning @ Index Time#1 Every dog has his day.41 42. Document Partitioning @ Query Time"dog OR cat"42 43. Term Partitioning @ Index Timeday, 1dog, 1#1 Every dog has his day.every, 1has, 1his, 1 43 44. Term Partitioning @ Index Time dog, 1day, 1has, 1his, 1every, 144 45. Term Partitioning @ Query Time "dog OR cat" 45 46. Tradeoffs... Document Partitioning Term Partitioning + Lower Latency Queries - Higher Latency Queries- Lower Throughput+ Higher Throughput- Lots of Disk Seeks- Hotspots in Ring(the "Obama" problem) 46 47. Riak Search: Term PartitioningTerm-partitioning is the most viable approach for our beta clients needs: high throughput on Really Big Datasets. Optimizations: Term splitting to reduce hot spots Bloom lters & caching to save query-time bandwidth Batching to save query-time & index-time bandwidth Support for either approach eventually.47 48. Part FourReview 48 49. Riak Search turns this... "Converse AND Shoes" WTF!? I'm aKV store!CLIENT RIAK 49 50. ...into this... "Converse AND Shoes" Gladly! CLIENT RIAK50 51. ...into this... "Converse AND Shoes" Keys or ObjectsCLIENTRIAK 51 52. ...while keeping operations easy. YourRiak Riak Application Search52 53. Thanks! Questions?Search Team:John Muellerleile - @jrecursiveRusty Klophaus - @rklophausKevin Smith - @kevsmithCurrently working with a small set of Beta users. Open-source release planned for Q3. www.basho.com