Mastering ElasticSearch With Ruby (+Tire)
Transcript of Mastering ElasticSearch With Ruby (+Tire)
Mastering ElasticSearch w/ Ruby (+ Tire)
RubyConf 2013
Luca Bonmassar, Gild
Tuesday, 5 November 13
Who am I?
I’m Luca Bonmassar (@openmosix)
31
Italian living in San Francisco (and Stockholm)
I work at Gild
I love building products [for fun, profit and boredom]
Search - use case
You’re building a product
User generated content
Let (other) users find or discover this content
Search is NOT easy
It usually starts as
but then you want to support AND, OR, NOT, double quotes on multiple fields so
and then it goes like
And so it goes...
Agenda
Let’s define a “pet project”
Boilerplate (download, install, scaffold, config, bla bla bla, yadda, yadda, yadda)
Build a website w/ simple search
Build a more advanced search
What next (homework)
Project! (for fun and no profit)
Project: build a full text-search for RubyGems.org
RubyGems Search Architecture
RubyGems.org
MongoDB
Rails App (Tire)
Elastic Search
Web Spider
RubyGems crawler
github.com/openmosix/rubygems-crawler
1. Download all ruby gem names from https://rubygems.org/gems
2. Use gems API to download each gem info: read JSON - write JSON (MongoDB)
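Step 2 of the crawler could be sketched roughly as follows. This is not the code from rubygems-crawler: the helper names, the selection of fields, and the sample payload are made up for illustration, and the MongoDB write is left out. The only real piece is the public RubyGems.org endpoint /api/v1/gems/<name>.json.

```ruby
require 'json'
require 'uri'

# Public RubyGems.org endpoint returning one gem's metadata as JSON.
def gem_info_uri(name)
  URI("https://rubygems.org/api/v1/gems/#{URI.encode_www_form_component(name)}.json")
end

# Keep only the fields we plan to search on (hypothetical selection).
def to_document(json_body)
  info = JSON.parse(json_body)
  {
    'name'      => info['name'],
    'info'      => info['info'],
    'downloads' => info['downloads']
  }
end

# Invented sample standing in for an API response, so this runs offline.
sample = '{"name":"tire","info":"Ruby client for ElasticSearch","downloads":12345}'
doc = to_document(sample)
puts gem_info_uri(doc['name'])
puts JSON.generate(doc) # this hash is what would be stored in MongoDB
```

In a real crawler the body would come from an HTTP GET on gem_info_uri(name), and the resulting hash would be inserted into a MongoDB collection.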
RubyGems Crawler II
RubyGems Search v1
Code: github.com/openmosix/rubygems-search
Spoiler: http://rubyconf.bonmassar.it
ElasticSearch
ElasticSearch II
Open source search+analytics engine
Distributed
[near] Realtime search
Multi tenant
Built on Apache Lucene
REST APIs
JSON documents
ElasticSearch III
1. Download / set up an ES cluster
2. Define settings and data mapping (opt)
3. Index Data
4. Query Index
Elastic Search MongoDB
Curl
> curl -X GET 'http://localhost:9200/ruby_gems/_search?from=0&size=25&pretty'
{ "ok" : true, "status" : 200, "version" : { "number" : "0.90.5", "lucene_version" : "4.4" }, "tagline" : "You Know, for Search"}
ES download + setup
> wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.6.tar.gz
> tar zxvf elasticsearch-0.90.6.tar.gz
> sudo mv elasticsearch-0.90.6 /usr/local/
Hint #1: you need Java
Hint #2: you need Oracle Java
ES config
> ls elasticsearch-0.90.6/config/
logging.yml: where to log, how much to log
elasticsearch.yml: all server config. Defines:
Name of the cluster (change it!!!)
Node parameters (master/slave, store data/router)
Sharding and # replicas
Paths
Plugins
Memory (JVM, heap, memory locking)
Network config
“Gateway” (cluster backup)
Recovery
Discovery
Slow log + GC log
Default options are good enough for dev env
ES boot + test
Start: bin/elasticsearch
Test: curl http://localhost:9200/
{ "ok" : true, "status" : 200, "name" : "Iron Fist", "version" : { "number" : "0.90.6", "lucene_version" : "4.5.1" }, "tagline" : "You Know, for Search"}
Stop: curl -XPOST 'http://localhost:9200/_shutdown'
{"cluster_name":"elasticsearch","nodes":{"jm2Z3J4dSzebjJ7Px2fAJg":{"name":"Iron Fist"}}}
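The same smoke test can be scripted from Ruby with the standard library. A minimal sketch, assuming the node listens on localhost:9200; fetch_root and node_summary are names invented here, and the body below is the response shown on the slide so the example runs without a server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetch the root endpoint of a node (needs a running ElasticSearch).
def fetch_root(url = 'http://localhost:9200/')
  Net::HTTP.get(URI(url))
end

# Summarize the JSON returned by the root endpoint.
def node_summary(body)
  data = JSON.parse(body)
  "#{data['name']} runs ES #{data['version']['number']} (status #{data['status']})"
end

# Response copied from the slide; swap in fetch_root against a live node.
body = '{ "ok" : true, "status" : 200, "name" : "Iron Fist", ' \
       '"version" : { "number" : "0.90.6", "lucene_version" : "4.5.1" }, ' \
       '"tagline" : "You Know, for Search"}'
puts node_summary(body)
```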
Profit
(you are now an ElasticSearch expert - go and tell the world)
ElasticSearch operations
Create a “RubyGem” Index
Define a “RubyGem” Index data mapping
Index data (e.g. upload data from MongoDB to ES index = POST)
Query (= GET)
Tire now Re-Tire ;(
A ruby gem wrapping ElasticSearch REST APIs into a powerful ruby DSL
ActiveModel integration
Rake tasks and utilities to load and query ElasticSearch
Tire setup
> echo "gem 'tire'" >> Gemfile && bundle install
> cat config/initializers/elasticsearch.rb
...
Tire::Configuration.url('http://localhost:9200')
Tire.configure { logger "#{Rails.root}/log/elasticsearch-queries.log" } if ENV['ES_LOG']
Define an ES index (with Tire DSL)
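The slide's Tire code is not part of this transcript. As a stand-in, here is a plain-Ruby sketch of the JSON body that an index definition comes down to in ES 0.90.x. The type name ruby_gem, the field names, and the analyzer choices are assumptions picked for illustration, not the talk's actual mapping.

```ruby
require 'json'

# Settings and mapping for a hypothetical "ruby_gem" document type.
# Field names are assumptions loosely based on the RubyGems API payload.
def index_definition
  {
    'settings' => { 'number_of_shards' => 1, 'number_of_replicas' => 0 },
    'mappings' => {
      'ruby_gem' => {
        'properties' => {
          'name'      => { 'type' => 'string', 'analyzer' => 'keyword', 'boost' => 10 },
          'info'      => { 'type' => 'string', 'analyzer' => 'snowball' },
          'licenses'  => { 'type' => 'string', 'index' => 'not_analyzed' },
          'downloads' => { 'type' => 'integer' }
        }
      }
    }
  }
end

# The body a client would send with: PUT /ruby_gems
puts JSON.pretty_generate(index_definition)
```

Keeping licenses not_analyzed stores each value as a single term, which is what exact-match filters and facet counts need.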
Indexing
Get a record
Convert it to JSON format (to_indexed_json)
Push it to Elastic Search (.update_index)
...under the hood...
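Roughly, indexing one record means serializing it and sending it to the index over HTTP. The sketch below uses a plain Struct as a stand-in for a model; GemRecord and index_request are hypothetical names, and only to_indexed_json mirrors the method named on the slide.

```ruby
require 'json'

# Plain-Ruby stand-in for a model; with Tire the model supplies to_indexed_json.
GemRecord = Struct.new(:id, :name, :info) do
  def to_indexed_json
    JSON.generate('name' => name, 'info' => info)
  end
end

# The HTTP request that indexing one record boils down to.
def index_request(record, index: 'ruby_gems', type: 'ruby_gem')
  { method: 'POST', path: "/#{index}/#{type}/#{record.id}", body: record.to_indexed_json }
end

record = GemRecord.new(1, 'tire', 'Ruby client for ElasticSearch')
req = index_request(record)
puts "#{req[:method]} #{req[:path]}"
puts req[:body]
```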
Index (all data)
Naive (POST on index for each record):
Use bulk updates:
...under the hood...
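A bulk update replaces one request per record with a single POST to /_bulk whose body is newline-delimited JSON: an action line followed by a source line for each document. A minimal sketch of building that body; bulk_body, the index and type names, and the sample documents are assumptions.

```ruby
require 'json'

# Build a _bulk body: one action line plus one source line per document,
# newline-delimited, with the trailing newline the API expects.
def bulk_body(docs, index: 'ruby_gems', type: 'ruby_gem')
  lines = docs.flat_map do |doc|
    action = { 'index' => { '_index' => index, '_type' => type, '_id' => doc['id'] } }
    source = doc.reject { |key, _| key == 'id' }
    [JSON.generate(action), JSON.generate(source)]
  end
  lines.join("\n") + "\n"
end

# Invented sample documents.
docs = [
  { 'id' => 1, 'name' => 'tire', 'info' => 'Ruby client for ElasticSearch' },
  { 'id' => 2, 'name' => 'rake', 'info' => 'Make-like build tool' }
]
# Body for: POST /_bulk
puts bulk_body(docs)
```

Batching a few hundred or thousand documents per request is usually far faster than the naive loop, because the HTTP overhead is paid once per batch.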
Search I
Search II
Simple Search
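The slide's code is not in the transcript. One plausible shape for a simple search is a query_string query, which already understands AND, OR, NOT and double-quoted phrases across several fields. The field names, the boost, and the helper name are assumptions.

```ruby
require 'json'

# query_string parses AND, OR, NOT and "quoted phrases" in the user's input.
def simple_search_body(text, page: 0, per_page: 25)
  {
    'query' => {
      'query_string' => {
        'query'            => text,
        'fields'           => ['name^10', 'info'], # name matches weigh more
        'default_operator' => 'AND'
      }
    },
    'from' => page * per_page,
    'size' => per_page
  }
end

# Body for: GET /ruby_gems/_search
puts JSON.generate(simple_search_body('rails OR "sinatra plugin"'))
```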
Highlight matches
Text
...add some CSS...
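Highlighting is requested by adding a highlight section to the search body; by default ElasticSearch wraps matches in <em> tags, which is where the CSS comes in. A sketch with invented helper names and a hand-written hit in the shape the server returns:

```ruby
require 'json'

# Add a highlight section to any search body (field name is an assumption).
def with_highlight(body, field = 'info')
  body.merge('highlight' => { 'fields' => { field => {} } })
end

# Pick the highlighted fragments of a hit, falling back to the stored text.
def display_text(hit, field = 'info')
  fragments = hit.fetch('highlight', {}).fetch(field, [])
  fragments.empty? ? hit['_source'][field] : fragments.join(' ... ')
end

puts JSON.generate(with_highlight('query' => { 'match' => { 'info' => 'elasticsearch' } }))

# Hand-written hit; a real one comes from response['hits']['hits'].
hit = {
  '_source'   => { 'info' => 'Ruby client for ElasticSearch' },
  'highlight' => { 'info' => ['Ruby client for <em>ElasticSearch</em>'] }
}
puts display_text(hit)
```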
Advanced Search
Advanced Search II
Advanced Search III
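The advanced search slides are code-only and did not survive in the transcript. As a guess at the general idea, an explicit bool query combines required and excluded terms, and a filtered query adds a non-scoring range filter (ES 0.90.x syntax). The helper name, field names, and parameters below are all hypothetical.

```ruby
require 'json'

# Required terms go in must, excluded terms in must_not; an optional
# minimum download count becomes a range filter around the bool query.
def advanced_search_body(all_of: [], none_of: [], min_downloads: nil)
  query = {
    'bool' => {
      'must'     => all_of.map  { |term| { 'match' => { 'info' => term } } },
      'must_not' => none_of.map { |term| { 'match' => { 'info' => term } } }
    }
  }
  if min_downloads
    range = { 'range' => { 'downloads' => { 'gte' => min_downloads } } }
    query = { 'filtered' => { 'query' => query, 'filter' => range } }
  end
  { 'query' => query }
end

puts JSON.pretty_generate(
  advanced_search_body(all_of: ['search'], none_of: ['java'], min_downloads: 1000)
)
```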
Facets
ES facets
ES facets (running)
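In ES 0.90.x a terms facet is requested in the search body and comes back as a list of term/count pairs. A sketch, assuming a licenses field indexed as single terms; the helper names and the sample response are invented.

```ruby
require 'json'

# Ask for the most frequent values of a field alongside the search hits.
def with_terms_facet(body, field, size: 10)
  body.merge('facets' => { field => { 'terms' => { 'field' => field, 'size' => size } } })
end

# Turn the facet section of a response into [term, count] pairs for the UI.
def facet_counts(response, name)
  response['facets'][name]['terms'].map { |entry| [entry['term'], entry['count']] }
end

puts JSON.generate(with_terms_facet({ 'query' => { 'match_all' => {} } }, 'licenses'))

# Hand-written response fragment in the shape the server returns.
response = {
  'facets' => {
    'licenses' => {
      '_type' => 'terms',
      'terms' => [{ 'term' => 'MIT', 'count' => 3 }, { 'term' => 'Apache 2.0', 'count' => 1 }]
    }
  }
}
facet_counts(response, 'licenses').each { |term, count| puts "#{term} (#{count})" }
```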
Facets - UI
Bonsai Cool I
Search Suggesters (Did you mean... ?)
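A "did you mean" feature can be built on the term suggester: the search body carries a suggest section, and the response lists candidate corrections per input token. A sketch with invented names (did_you_mean, best_suggestion) and a hand-written response:

```ruby
require 'json'

# Term suggester section for a search body; the suggestion name is arbitrary.
def suggest_body(text, field = 'name')
  { 'suggest' => { 'did_you_mean' => { 'text' => text, 'term' => { 'field' => field } } } }
end

# Rebuild the query using the top option for each token, when there is one.
def best_suggestion(response)
  response['suggest']['did_you_mean'].map do |entry|
    best = entry['options'].first
    best ? best['text'] : entry['text']
  end.join(' ')
end

puts JSON.generate(suggest_body('rials'))

# Hand-written response fragment in the shape the server returns.
response = {
  'suggest' => {
    'did_you_mean' => [
      { 'text' => 'rials', 'offset' => 0, 'length' => 5,
        'options' => [{ 'text' => 'rails', 'score' => 0.8, 'freq' => 10 }] }
    ]
  }
}
puts "Did you mean: #{best_suggestion(response)}?"
```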
Bonsai Cool II
“Similar to this” (aka “More Like This” API)
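In ES 0.90.x the More Like This API is a GET on an already indexed document. A small sketch that builds the request path; the helper name, the index and type names, and the chosen fields are assumptions.

```ruby
require 'uri'

# Path for the More Like This API on one indexed document (ES 0.90.x).
def more_like_this_path(id, fields, index: 'ruby_gems', type: 'ruby_gem')
  query = URI.encode_www_form(
    'mlt_fields'    => fields.join(','),
    'min_term_freq' => 1, # lowered so short documents still produce terms
    'min_doc_freq'  => 1
  )
  "/#{index}/#{type}/#{id}/_mlt?#{query}"
end

# Path for: GET http://localhost:9200<path>
puts more_like_this_path(1, %w[name info])
```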
Bonsai Cool III
Percolate API
Deployment I
Run your own cluster
Some learnings:
at least 3 nodes
memory profiling / GC
install very good monitoring (github.com/karmi/elasticsearch-paramedic)
more RAM is (always) better
Check IOPS (if on AWS)
Pros:
Total control
Cheaper (lot cheaper)
Cons:
Can be a nightmare / Require dedicated devop
Deployment II
ElasticSearch as a service
http://found.no
http://searchly.com
http://bonsai.io
Pros:
Get cluster up & running in a minute
Focus on dev, not troubleshooting
Professional support
Cons:
Expensive
Can be in the wrong region / hosting provider
Expensive
Did I say expensive?
Thanks!
Code:
github.com/openmosix/rubygems-search
github.com/openmosix/rubygems-crawler
Demo (will be down by the end of rubyconf): http://rubyconf.bonmassar.it
Say “hi”:
Luca Bonmassar - [email protected]
twitter.com/openmosix
github.com/openmosix
linkedin.com/in/lucabonmassar