elasticsearch · 2017. 7. 12. · An index powers search into all documents within a collection of...
elasticsearch
sundog-education.com
elasticsearch: getting set up
Install Ubuntu
Install Virtualbox
Install Elasticsearch
elasticsearch: system requirements
enable virtualization
Virtualization must be enabled in your BIOS settings. If you have “Hyper-V” virtualization as an option, turn it off.
beware avast
Avast anti-virus is known to conflict with Virtualbox.
let’s do this.
elasticsearch basics.
logical concepts of elasticsearch
documents
Documents are the things you’re searching for. They can be more than text: any structured JSON data works. Every document has a unique ID, and a type.
indices
An index powers search into all documents within a collection of types. They contain inverted indices that let you search across everything within them at once.
types
A type defines the schema and mapping shared by documents that represent the same sort of thing. (A log entry, an encyclopedia article, etc.)
what is an inverted index
Document 1: Space: The final frontier. These are the voyages…
Document 2: He’s bad, he’s number one. He’s the space cowboy with the laser gun!
Inverted index:
space: 1, 2
the: 1, 2
final: 1
frontier: 1
he: 2
bad: 2
…
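As a rough illustration of the slide above, here is a minimal Python sketch that builds an inverted index from the two example documents. The tokenizer (lowercase, split on non-letters) and the function names are assumptions made for this example; Elasticsearch's real analysis chain is more involved.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter (a deliberately crude analyzer).
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Space: The final frontier. These are the voyages",
    2: "He's bad, he's number one. He's the space cowboy with the laser gun!",
}
inverted = build_inverted_index(docs)
print(inverted["space"])     # [1, 2]
print(inverted["frontier"])  # [1]
```

Looking up a search term is then a single dictionary access, which is the point of the structure.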
of course it’s not quite that simple.
TF-IDF means Term Frequency * Inverse Document Frequency
Term Frequency is how often a term appears in a given document
Document Frequency is how often a term appears in all documents
Term Frequency / Document Frequency measures the relevance of a term in a document
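The ratio described above can be sketched in a few lines of Python. This is only an illustration of the slide's definition: it assumes "document frequency" means the number of documents containing the term, reuses a crude letters-only tokenizer, and does not reproduce the scoring formula Lucene actually applies.

```python
import re

def tokenize(text):
    # Lowercase and split on anything that is not a letter.
    return re.findall(r"[a-z]+", text.lower())

def term_frequency(term, text):
    # How often the term appears in one document.
    return tokenize(text).count(term)

def document_frequency(term, docs):
    # Number of documents that contain the term at least once (an assumption, see above).
    return sum(1 for text in docs if term in tokenize(text))

def relevance(term, text, docs):
    # Term Frequency / Document Frequency, as stated on the slide.
    df = document_frequency(term, docs)
    return term_frequency(term, text) / df if df else 0.0

docs = [
    "Space: The final frontier. These are the voyages",
    "He's bad, he's number one. He's the space cowboy with the laser gun!",
]
print(relevance("space", docs[0], docs))     # 0.5
print(relevance("frontier", docs[0], docs))  # 1.0
```

"frontier" scores higher than "space" for Document 1 because it appears in only one document, so it says more about that document.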
using indices
RESTful API
Elasticsearch fundamentally works via HTTP requests and JSON data. Any language or tool that can handle HTTP can use Elasticsearch.
client APIs
Most languages have specialized Elasticsearch libraries to make it even easier.
analytic tools
Web-based graphical UIs such as Kibana let you interact with your indices and explore them without writing code.
how elasticsearch scales
an index is split into shards.
“Shakespeare” index: shards 1, 2, 3, …
Documents are hashed to a particular shard.
Each shard may be on a different node in a cluster.
Every shard is a self-contained Lucene index of its own.
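The idea of hashing a document to a shard can be sketched as follows. This is a sketch under stated assumptions: `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash chosen because it is deterministic; Elasticsearch applies its own hash function to the routing value.

```python
import zlib

def pick_shard(doc_id, number_of_primary_shards):
    # Stand-in hash for illustration only (assumption): hash the ID,
    # then take the remainder modulo the number of primary shards.
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_primary_shards

for doc_id in ["109487", "135569", "58559"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the shard is a remainder modulo the primary shard count, changing that count would send existing IDs to different shards.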
primary and replica shards
Node 1: Primary 1, Replica 0
Node 2: Replica 0, Replica 1
Node 3: Primary 0, Replica 1
This index has two primary shards and two replicas.
Your application should round-robin requests amongst nodes.
Write requests are routed to the primary shard, then replicated.
Read requests are routed to the primary or any replica.
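The shard count in the layout above follows from simple arithmetic: every primary shard exists once, plus one copy per configured replica. A minimal sketch (the function name is hypothetical):

```python
def total_shards(primaries, replicas_per_primary):
    # Each primary shard plus its replica copies.
    return primaries * (1 + replicas_per_primary)

# Two primaries with two replicas each: Primary 0, Primary 1,
# two copies of Replica 0 and two copies of Replica 1.
print(total_shards(2, 2))  # 6
```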
The number of primary shards cannot be changed later.
The number of shards can be set up front with a PUT request over REST / HTTP:
PUT /testindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Not as bad as it sounds: you can add more replica shards for more read throughput.
Worst case you can re-index your data.
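The same index-creation request can be built from Python's standard library. This is a sketch, not a definitive client: the helper name is hypothetical, the host `127.0.0.1:9200` is taken from the curl examples later in the deck, and the `Content-Type` header is an assumption (newer Elasticsearch versions require it). The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_create_index_request(index, shards, replicas, host="http://127.0.0.1:9200"):
    # Settings body matching the PUT /testindex example.
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="PUT",
    )

request = build_create_index_request("testindex", shards=3, replicas=1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```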
quiz time
01
The schema for your documents is defined by…
• The index
• The type
• The document itself
02
What purpose do inverted indices serve?
• They allow you to search phrases in reverse order
• They quickly map search terms to documents
• They load balance search requests across your cluster
03
An index configured for 5 primary shards and 3 replicas would have how many shards in total?
• 8
• 15
• 20
04
Elasticsearch is built only for full-text search of documents.
• true
• false
connecting to your cluster
elasticsearch: more setup
Install PuTTY (Windows)
Install openssh-server
Connect to your “cluster”
examining movielens
movielens
movielens is a free dataset of movie ratings gathered from movielens.org.
It contains user ratings, movie metadata, and user metadata.
Let’s download and examine the data files from movielens.org
creating mappings
what is a mapping?
curl -XPUT 127.0.0.1:9200/movies -d '
{
  "mappings": {
    "movie": {
      "_all": {"enabled": false},
      "properties": {
        "year": {"type": "date"}
      }
    }
  }
}'
a mapping is a schema definition.
elasticsearch has reasonable defaults, but sometimes you need to customize them.
common mappings
field types
string, byte, short, integer, long, float, double, boolean, date
"properties": {
  "user_id": {
    "type": "long"
  }
}
field index
do you want this field indexed for full-text search? analyzed / not_analyzed / no
"properties": {
  "genre": {
    "index": "not_analyzed"
  }
}
field analyzer
define your tokenizer and token filter. standard / whitespace / simple / english etc.
"properties": {
  "description": {
    "analyzer": "english"
  }
}
more about analyzers
character filters: remove HTML encoding, convert & to and
tokenizer: split strings on whitespace / punctuation / non-letters
token filter: lowercasing, stemming, synonyms, stopwords
choices for analyzers
standard: splits on word boundaries, removes punctuation, lowercases. good choice if language is unknown
simple: splits on anything that isn’t a letter, and lowercases
whitespace: splits on whitespace but doesn’t lowercase
language (e.g. english): accounts for language-specific stopwords and stemming
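To make the differences between these analyzers concrete, here is a rough Python approximation of three of them. These functions are illustrative stand-ins written for this example, not Elasticsearch's implementations; in particular the "standard-like" regex only loosely mimics word-boundary splitting.

```python
import re

def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are left alone.
    return text.split()

def simple_analyzer(text):
    # Split on anything that is not a letter, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def standard_like_analyzer(text):
    # Rough approximation: keep runs of letters, digits and apostrophes, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9']+", text)]

text = "He's the Space cowboy, born 1977!"
print(whitespace_analyzer(text))     # ["He's", 'the', 'Space', 'cowboy,', 'born', '1977!']
print(simple_analyzer(text))         # ['he', 's', 'the', 'space', 'cowboy', 'born']
print(standard_like_analyzer(text))  # ["he's", 'the', 'space', 'cowboy', 'born', '1977']
```

Note how the simple analyzer drops the number entirely, while the whitespace analyzer keeps punctuation attached to the tokens.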
import one document
insert
curl -XPUT 127.0.0.1:9200/movies/movie/109487 -d '
{
  "genre": ["IMAX", "Sci-Fi"],
  "title": "Interstellar",
  "year": 2014
}'
import many documents
json bulk import
curl -XPUT 127.0.0.1:9200/_bulk -d '
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "122886" } }
{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year":2015 , "genre":["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "109487" } }
{ "id": "109487", "title" : "Interstellar", "year":2014 , "genre":["Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "58559" } }
{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "1924" } }
{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }
'
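The bulk format alternates one action line with one document line. As a sketch, a body of that shape can be generated from a list of dictionaries; `build_bulk_body` is a hypothetical helper written for this example, and the index and type names are taken from the slide.

```python
import json

def build_bulk_body(movies, index="movies", doc_type="movie"):
    lines = []
    for movie in movies:
        # Action line: which index, type and ID the next line should be created under.
        action = {"create": {"_index": index, "_type": doc_type, "_id": movie["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(movie))
    # The body is newline-delimited and ends with a newline.
    return "\n".join(lines) + "\n"

movies = [
    {"id": "135569", "title": "Star Trek Beyond", "year": 2016, "genre": ["Action", "Adventure", "Sci-Fi"]},
    {"id": "109487", "title": "Interstellar", "year": 2014, "genre": ["Sci-Fi", "IMAX"]},
]
body = build_bulk_body(movies)
print(body, end="")
```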
updating documents
versions
Every document has a _version field.
Elasticsearch documents are immutable.
When you update an existing document:
• a new document is created with an incremented _version
• the old document is marked for deletion
partial update api
curl -XPOST 127.0.0.1:9200/movies/movie/109487/_update -d '
{
  "doc": {
    "title": "Interstellar"
  }
}'
deleting documents
it couldn’t be easier.
Just use the DELETE method:
curl -XDELETE 127.0.0.1:9200/movies/movie/58559
elasticsearch
exercise
insert, update, and then delete a movie of your choice in the movies index!
dealing with concurrency
the problem
Client A: get view count for page → 10
Client B: get view count for page → 10
Client A: increment view count for page → 11
Client B: increment view count for page → 11
But it should be 12!
optimistic concurrency control
Client A: get view count for page → 10, _version: 9
Client B: get view count for page → 10, _version: 9
Client A: increment for _version=9 → 11
Client B: increment for _version=9 → Error! Try again.
Use retry_on_conflict=N to automatically retry.
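The version check and retry loop can be simulated in memory. The sketch below is not Elasticsearch code: `TinyStore`, `VersionConflict`, and the `before_write` hook are hypothetical names invented to show the mechanism, with the starting values (count 10, version 9) borrowed from the slide.

```python
class VersionConflict(Exception):
    pass

class TinyStore:
    # In-memory stand-in for one document with a _version counter.
    def __init__(self, view_count, version):
        self.view_count = view_count
        self.version = version

    def get(self):
        return self.view_count, self.version

    def put(self, view_count, expected_version):
        # Reject the write if someone else updated the document first.
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.view_count = view_count
        self.version += 1

def increment(store, retry_on_conflict=3, before_write=None):
    for _ in range(retry_on_conflict + 1):
        count, version = store.get()
        if before_write:
            before_write()       # hook to simulate a competing writer
            before_write = None  # only interfere once
        try:
            store.put(count + 1, expected_version=version)
            return
        except VersionConflict:
            continue             # re-read and try again
    raise VersionConflict("gave up after retries")

store = TinyStore(view_count=10, version=9)
# A second client sneaks in its own increment between our read and our write.
increment(store, before_write=lambda: increment(store))
print(store.get())  # (12, 11)
```

The first write attempt fails because the version moved from 9 to 10; the retry re-reads the count and produces the correct total of 12.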
controlling full-text search
using analyzers
sometimes text fields should be exact-match
• use not_analyzed mapping
search on analyzed fields will return anything remotely relevant
• depending on the analyzer, results will be case-insensitive, stemmed, stopwords removed, synonyms applied, etc.
• searches with multiple terms need not match them all
data modeling
strategies for relational data
normalized data
RATING: userID, movieID, rating
MOVIE: movieID, title, genres
Look up rating → look up title
Minimizes storage space, makes it easy to change titles.
But requires two queries, and storage is cheap!
strategies for relational data
denormalized data
RATING: userID, rating, title
Look up rating
titles are duplicated, but only one query
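The trade-off between the two layouts can be sketched with plain Python dictionaries standing in for the RATING and MOVIE records. The user ID and rating value below are made-up sample data, and the function names are hypothetical.

```python
# Normalized: ratings reference movies by ID, so getting a title needs a second lookup.
movies = {58559: {"title": "Dark Knight, The", "genres": ["Action", "Crime", "Drama", "IMAX"]}}
ratings_normalized = [{"userID": 1, "movieID": 58559, "rating": 5.0}]  # sample values

def title_for_rating_normalized(rating):
    return movies[rating["movieID"]]["title"]  # second lookup

# Denormalized: the title is copied into every rating, so one lookup is enough.
ratings_denormalized = [{"userID": 1, "rating": 5.0, "title": "Dark Knight, The"}]

def title_for_rating_denormalized(rating):
    return rating["title"]

print(title_for_rating_normalized(ratings_normalized[0]))
print(title_for_rating_denormalized(ratings_denormalized[0]))
```

Renaming a movie touches one record in the normalized layout but every matching rating in the denormalized one.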
strategies for relational data
Parent / Child Relationship
Parent: Star Wars
Children: A New Hope, Empire Strikes Back, Return of the Jedi, The Force Awakens
query-line search
“query lite”
/movies/movie/_search?q=title:star
/movies/movie/_search?q=+year:>2010 +title:trek
it’s not always simpler.
spaces etc. need to be URL encoded.
/movies/movie/_search?q=+year:>2010 +title:trek
becomes
/movies/movie/_search?q=%2Byear%3A%3E2010+%2Btitle%3Atrek
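If you do build such a URL by hand, Python's standard library can do the encoding. A small sketch using `urllib.parse.quote_plus`, which encodes a space as `+` and the literal `+`, `:` and `>` characters as percent escapes:

```python
from urllib.parse import quote_plus

query = "+year:>2010 +title:trek"
encoded = quote_plus(query)
print("/movies/movie/_search?q=" + encoded)
# /movies/movie/_search?q=%2Byear%3A%3E2010+%2Btitle%3Atrek
```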
and it can be dangerous.
• cryptic and tough to debug
• can be a security issue if exposed to end users
• fragile: one wrong character and you’re hosed.
But it’s handy for quick experimenting.
learn more.
this is formally called “URI Search”. Search for that in the Elasticsearch documentation.
it’s really quite powerful, but again is only appropriate for quick “curl tests”.
request body search
how you’re supposed to do it
query DSL is in the request body as JSON
(yes, a GET request can have a body!)
curl -XGET '127.0.0.1:9200/movies/movie/_search?pretty' -d '
{
"query": {
"match": {
"title": "star"
}
}
}'
queries and filters
filters ask a yes/no question of your data
queries return data in terms of relevance
use filters when you can – they are faster and cacheable.
example: boolean query with a filter
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d'
{
"query":{
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"range": {"year": {"gte": 2010}}}
}
}
}'
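The same request body can also be assembled in code rather than typed by hand. The following is a minimal Python sketch using only the standard library; the helper name `bool_query_with_filter` is hypothetical, and the printed JSON is simply what you would pass to curl with -d.

```python
import json


def bool_query_with_filter(must, filter_clause):
    """Build a bool query body.

    `must` contributes to the relevance score; `filter_clause` only
    includes or excludes documents (a yes/no question).
    """
    return {"query": {"bool": {"must": must, "filter": filter_clause}}}


body = bool_query_with_filter(
    must={"term": {"title": "trek"}},
    filter_clause={"range": {"year": {"gte": 2010}}},
)

# Print the JSON that would go in the request body.
print(json.dumps(body, indent=2))
```

Keeping the scored part and the filter part as separate arguments makes it harder to put a yes/no condition into the scored clause by accident.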
some types of filters
term: filter by exact values
{"term": {"year": 2014}}
terms: match if any exact values in a list match
{"terms": {"genre": ["Sci-Fi", "Adventure"]}}
range: find numbers or dates in a given range (gt, gte, lt, lte)
{"range": {"year": {"gte": 2010}}}
exists: find documents where a field exists
{"exists": {"field": "tags"}}
missing: find documents where a field is missing (removed in Elasticsearch 5.0; on newer versions wrap an exists filter in a bool must_not instead)
{"missing": {"field": "tags"}}
bool: Combine filters with Boolean logic (must, must_not, should)
some types of queries
match_all: returns all documents and is the default. Normally used with a filter.
{"match_all": {}}
match: searches analyzed results, such as full text search.
{"match": {"title": "star"}}
multi_match: run the same query on multiple fields.
{"multi_match": {"query": "star", "fields": ["title", "synopsis"]}}
bool: Works like a bool filter, but results are scored by relevance.
syntax reminder
queries are wrapped in a "query": { } block,
filters are wrapped in a "filter": { } block.
you can combine filters inside queries, or queries inside filters too.
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d'
{
"query":{
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"range": {"year": {"gte": 2010}}}
}
}
}'
phrase search
phrase matching
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d '
{
"query": {
"match_phrase": {
"title": "star wars"
}
}
}'
must find all terms, in the right order.
slop
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d '
{
"query": {
"match_phrase": {
"title": {"query": "star beyond", "slop": 1}
}
}
}'
order matters, but you’re OK with some words being in between the terms:
the slop represents how far you’re willing to let a term move to satisfy a phrase (in either direction!)
another example: the query “quick fox” with a slop of 1 would match a document containing “quick brown fox”.
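To make the idea of slop concrete, here is a simplified Python sketch. It is an assumption-laden illustration, not how Elasticsearch or Lucene compute slop: it only counts extra words sitting between query terms that appear in query order, and it ignores the fact that real slop also lets terms swap places at a higher cost. The function names are hypothetical.

```python
def min_gap_in_order(doc_tokens, query_terms):
    """Smallest number of extra words between the query terms when they
    appear in the document in query order; None if they never do."""
    best = None
    for start, token in enumerate(doc_tokens):
        if token != query_terms[0]:
            continue
        pos = start
        for term in query_terms[1:]:
            try:
                # Earliest occurrence of the next term after the previous one.
                pos = doc_tokens.index(term, pos + 1)
            except ValueError:
                pos = None
                break
        if pos is None:
            continue
        gap = (pos - start) - (len(query_terms) - 1)
        if best is None or gap < best:
            best = gap
    return best


def matches_with_slop(text, phrase, slop):
    """True if the phrase terms occur in order with at most `slop` extra words."""
    gap = min_gap_in_order(text.lower().split(), phrase.lower().split())
    return gap is not None and gap <= slop


print(matches_with_slop("quick brown fox", "quick fox", slop=0))   # False
print(matches_with_slop("quick brown fox", "quick fox", slop=1))   # True
print(matches_with_slop("Star Trek Beyond", "star beyond", slop=1))  # True
```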
proximity queries
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d '
{
"query": {
"match_phrase": {
"title": {"query": "star beyond", "slop": 100}
}
}
}'
remember this is a query – results are sorted by relevance.
just use a really high slop if you want to get any documents that contain the words in your phrase, but want documents that have the words closer together scored higher.
exercise
search for “Star Wars” movies released after 1980, using both a URI search and a request body search.
pagination
specify “from” and “size”
results: result 1, result 2, result 3, result 4, result 5, result 6, result 7, result 8
from = 0, size = 3 returns results 1 to 3
from = 3, size = 3 returns results 4 to 6
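The arithmetic behind "from" and "size" can be sketched in a few lines of Python. The helper name `page_window` is hypothetical, and the list of strings stands in for real search hits.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into "from"/"size" values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


results = [f"result {n}" for n in range(1, 9)]

for page in (1, 2):
    window = page_window(page, 3)
    # Slice the stand-in result list the same way the server would.
    chunk = results[window["from"]:window["from"] + window["size"]]
    print(window, chunk)
```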
pagination syntax
curl -XGET '127.0.0.1:9200/movies/movie/_search?size=2&from=2&pretty'
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d'
{
"from": 2,
"size": 2,
"query": {"match": {"genre": "Sci-Fi"}}
}'
beware
deep pagination can kill performance.
every result must be retrieved, collected, and sorted.
enforce an upper bound on how many results you’ll return to users.
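One way to enforce such an upper bound is to clamp the requested window in your application before the query is sent. This is a minimal sketch; `MAX_WINDOW` and `bounded_window` are hypothetical names, and the limit of 1000 is an arbitrary example value. Elasticsearch also has its own server-side cap (the index.max_result_window setting), which this sketch does not read.

```python
MAX_WINDOW = 1000  # hypothetical application limit, chosen for illustration


def bounded_window(from_, size, max_window=MAX_WINDOW):
    """Return a "from"/"size" pair that never reaches past max_window,
    or None when the request starts beyond the limit."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must not be negative")
    if from_ >= max_window:
        return None
    return {"from": from_, "size": min(size, max_window - from_)}


print(bounded_window(0, 10))     # within the limit, unchanged
print(bounded_window(995, 10))   # shrunk so it stops at the limit
print(bounded_window(5000, 10))  # too deep, rejected
```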
sorting
sorting your results is usually quite simple.
curl -XGET '127.0.0.1:9200/movies/movie/_search?sort=year&pretty'
unless you’re dealing with strings.
A string field that is analyzed for full-text search can’t be used to sort documents.
This is because it exists in the inverted index as individual terms, not as the entire string.
If you need to sort on an analyzed field, map a not_analyzed copy.
curl -XPUT 127.0.0.1:9200/movies/ -d '
{
"mappings": {
"movie": {
"_all": {"enabled": false},
"properties" : {
"title": {
"type" : "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
Now you can sort on the not_analyzed “raw” field.
curl -XGET '127.0.0.1:9200/movies/movie/_search?sort=title.raw&pretty'
sadly, you cannot change the mapping of an existing field once the index holds data.
you’d have to delete the index, set up a new mapping, and re-index it.
like the number of shards, this is something you should think about before importing data into your index.
more with filters
another filtered query
curl -XGET 127.0.0.1:9200/movies/_search?pretty -d'
{
"query":{
"bool": {
"must": {"match": {"genre": "Sci-Fi"}},
"must_not": {"match": {"title": "trek"}},
"filter": {"range": {"year": {"gte": 2010, "lt": 2015}}}
}
}
}'
exercise
search for science fiction movies before 1960, sorted by title.
fuzziness
fuzzy matches
a way to account for typos and misspellings
the levenshtein edit distance accounts for:
• substitutions of characters (interstellar -> intersteller)
• insertions of characters (interstellar -> insterstellar)
• deletions of characters (interstellar -> interstelar)
all of the above have an edit distance of 1.
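The three examples above can be checked with a small Levenshtein implementation. This is a plain textbook dynamic-programming sketch in Python, not Elasticsearch's own code; note that Elasticsearch can also count a swap of two adjacent characters as a single edit, which this version does not.

```python
def levenshtein(a, b):
    """Number of single-character substitutions, insertions, and
    deletions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


for typo in ("intersteller", "insterstellar", "interstelar", "intrsteller"):
    print(typo, levenshtein("interstellar", typo))
```

The last word, "intrsteller", needs two edits, which is why a fuzziness of 2 is required to reach "interstellar" from it.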
the fuzziness parameter
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d '
{
"query": {
"fuzzy": {
"title": {"value": "intrsteller", "fuzziness": 2}}
}
}'
AUTO fuzziness
fuzziness: AUTO
• 0 for 1-2 character strings
• 1 for 3-5 character strings
• 2 for anything else
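The AUTO rule is just a lookup on the length of the term. A minimal Python sketch of the thresholds listed above, with a hypothetical function name:

```python
def auto_fuzziness(term):
    """Allowed edit distance for a term under the AUTO rule above."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


for term in ("to", "star", "interstellar"):
    print(term, auto_fuzziness(term))
```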
partial matching
prefix queries on strings
curl -XGET '127.0.0.1:9200/movies/movie/_search?pretty' -d '
{
"query": {
"prefix": {
"year": "201"
}
}
}'
If we remapped year to be a string…
wildcard queries
curl -XGET '127.0.0.1:9200/movies/movie/_search?pretty' -d '
{
"query": {
"wildcard": {
"year": "1*"
}
}
}'
“regexp” queries also exist.
search as you type
query-time search-as-you-type
curl -XGET '127.0.0.1:9200/movies/movie/_search?pretty' -d '
{
"query": {
"match_phrase_prefix": {
"title": {
"query": "star trek",
"slop": 10
}
}
}
}'
abusing sloppiness…
index-time with N-grams
“star”:
unigram: [ s, t, a, r ]
bigram: [ st, ta, ar ]
trigram: [ sta, tar ]
4-gram: [ star ]
edge n-grams are built only on the beginning of each term.
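Both kinds of n-gram are easy to generate by hand, which helps when reasoning about what ends up in the index. A minimal Python sketch; the function names are hypothetical, and the `min_gram`/`max_gram` defaults simply mirror the values used in the analyzer settings in this deck.

```python
def ngrams(term, n):
    """All contiguous substrings of length n."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def edge_ngrams(term, min_gram=1, max_gram=20):
    """Only the n-grams anchored at the beginning of the term."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


for n in range(1, 5):
    print(n, ngrams("star", n))

print(edge_ngrams("star"))
```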
indexing n-grams
curl -XPUT '127.0.0.1:9200/movies?pretty' -d '
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}'
1. Create an “autocomplete” analyzer
now map your field with it
curl -XPUT '127.0.0.1:9200/movies/_mapping/movie?pretty' -d '
{
"movie": {
"properties" : {
"title": {
"type" : "string",
"analyzer": "autocomplete"
}
}
}
}'
but only use n-grams on the index side!
curl -XGET 127.0.0.1:9200/movies/movie/_search?pretty -d '
{
"query": {
"match": {
"title": {
"query": "sta",
"analyzer": "standard"
}
}
}
}'
otherwise our query will also get split into n-grams, and we’ll get results for everything that matches ‘s’, ‘t’, ‘a’, ‘st’, etc.
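The effect is easy to reproduce with a toy in-memory index. This Python sketch is only a rough stand-in under stated assumptions: `standard_tokens` approximates the standard analyzer with lowercasing and whitespace splitting, `autocomplete_tokens` approximates the edge n-gram analyzer, the four titles are made-up sample data, and matching any single token counts as a hit.

```python
def standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on spaces."""
    return text.lower().split()


def autocomplete_tokens(text, min_gram=1, max_gram=20):
    """Rough stand-in for the autocomplete analyzer: edge n-grams per word."""
    tokens = []
    for word in standard_tokens(text):
        longest = min(max_gram, len(word))
        tokens.extend(word[:n] for n in range(min_gram, longest + 1))
    return tokens


titles = ["Star Wars", "Star Trek", "Spectre", "Toy Story"]

# Index side: every title is stored as edge n-grams.
index = {title: set(autocomplete_tokens(title)) for title in titles}


def search(query, analyze):
    """Titles whose indexed tokens share at least one token with the query."""
    query_tokens = set(analyze(query))
    return [title for title in titles if index[title] & query_tokens]


print(search("sta", standard_tokens))      # query left whole
print(search("sta", autocomplete_tokens))  # query split into s, st, sta
```

With the query left whole, only the two "Star" titles match; once the query is also split into n-grams, every title containing a word that starts with "s" comes back.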
completion suggesters
You can also upload a list of all possible completions ahead of time using completion suggesters.
importing data
you can import from just about anything
stand-alone scripts can submit bulk documents via REST API
logstash and beats can stream data from logs, S3, databases, and more
AWS systems can stream in data via lambda or kinesis firehose
kafka, spark, and more have Elasticsearch integration add-ons
importing via script / json
hack together a script
• read in data from some distributed filesystem
• transform it into JSON bulk inserts
• submit via HTTP / REST to your elasticsearch cluster
for completeness:
import csv
import re

csvfile = open('ml-latest-small/movies.csv', 'r')

reader = csv.DictReader( csvfile )
for movie in reader:
    # action line for the bulk API
    print ("{ \"create\" : { \"_index\": \"movies\", \"_type\": \"movie\", \"_id\" : \"" , movie['movieId'], "\" } }", sep='')
    # strip double quotes and the trailing " (year)" from the title
    title = re.sub(r" \(.*\)$", "", re.sub('"','', movie['title']))
    year = movie['title'][-5:-1]
    if (not year.isdigit()):
        year = "2016"
    genres = movie['genres'].split('|')
    # document line
    print ("{ \"id\": \"", movie['movieId'], "\", \"title\": \"", title, "\", \"year\":", year, ", \"genre\":[", end='', sep='')
    for genre in genres[:-1]:
        print("\"", genre, "\",", end='', sep='')
    print("\"", genres[-1], "\"", end = '', sep='')
    print ("] }")
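Hand-assembling JSON with print is easy to get wrong. As an alternative sketch (not the script from the slides), the same bulk lines can be produced with Python's json module, which takes care of quoting and escaping. The function name `bulk_lines` and the one-row sample CSV are assumptions for illustration; the field names and the "2016" fallback follow the script above.

```python
import csv
import io
import json
import re


def bulk_lines(rows):
    """Yield bulk-format lines: an action line, then the document, per movie."""
    for movie in rows:
        raw_title = movie["title"]
        year = raw_title[-5:-1]
        if not year.isdigit():
            year = "2016"  # same fallback as the hand-written script
        # Drop the trailing " (year)"; json.dumps escapes any quotes for us.
        title = re.sub(r" \(.*\)$", "", raw_title)
        action = {"create": {"_index": "movies", "_type": "movie",
                             "_id": movie["movieId"]}}
        document = {"id": movie["movieId"], "title": title,
                    "year": int(year), "genre": movie["genres"].split("|")}
        yield json.dumps(action)
        yield json.dumps(document)


# Hypothetical one-row sample in the same column layout as movies.csv.
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation\n"
)

for line in bulk_lines(csv.DictReader(sample)):
    print(line)
```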
importing via client APIs
a less hacky script.
es = elasticsearch.Elasticsearch()
es.indices.delete(index="ratings",ignore=404)
deque(helpers.parallel_bulk(es,readRatings(),index="ratings",doc_type="rating"), maxlen=0)
es.indices.refresh()
free elasticsearch client libraries are available for pretty much any language.
• java has a client maintained by elastic.co
• python has an elasticsearch package
• elasticsearch-ruby
• several choices for scala
• elasticsearch.pm module for perl
You don’t have to wrangle JSON.
for completeness:
import csv
from collections import deque
import elasticsearch
from elasticsearch import helpers

def readMovies():
    csvfile = open('ml-latest-small/movies.csv', 'r')

    reader = csv.DictReader( csvfile )

    titleLookup = {}

    for movie in reader:
        titleLookup[movie['movieId']] = movie['title']

    return titleLookup

def readRatings():
    csvfile = open('ml-latest-small/ratings.csv', 'r')

    titleLookup = readMovies()

    reader = csv.DictReader( csvfile )
    for line in reader:
        rating = {}
        rating['user_id'] = int(line['userId'])
        rating['movie_id'] = int(line['movieId'])
        rating['title'] = titleLookup[line['movieId']]
        rating['rating'] = float(line['rating'])
        rating['timestamp'] = int(line['timestamp'])
        yield rating

es = elasticsearch.Elasticsearch()

es.indices.delete(index="ratings",ignore=404)
deque(helpers.parallel_bulk(es,readRatings(),index="ratings",doc_type="rating"), maxlen=0)
es.indices.refresh()
exercise
write a script to import the tags.csv data from ml-latest-small into a new “tags” index.
my solution
import csv
from collections import deque
import elasticsearch
from elasticsearch import helpers

def readMovies():
    csvfile = open('ml-latest-small/movies.csv', 'r')

    reader = csv.DictReader( csvfile )

    titleLookup = {}

    for movie in reader:
        titleLookup[movie['movieId']] = movie['title']

    return titleLookup

def readTags():
    csvfile = open('ml-latest-small/tags.csv', 'r')

    titleLookup = readMovies()

    reader = csv.DictReader( csvfile )
    for line in reader:
        tag = {}
        tag['user_id'] = int(line['userId'])
        tag['movie_id'] = int(line['movieId'])
        tag['title'] = titleLookup[line['movieId']]
        tag['tag'] = line['tag']
        tag['timestamp'] = int(line['timestamp'])
        yield tag

es = elasticsearch.Elasticsearch()

es.indices.delete(index="tags",ignore=404)
deque(helpers.parallel_bulk(es,readTags(),index="tags",doc_type="tag"), maxlen=0)
es.indices.refresh()
introducing logstash
what logstash is for
inputs: files, s3, beats, kafka, …
-> logstash ->
outputs: elasticsearch, mongodb, hadoop, aws, …
it’s more than plumbing
• logstash parses, transforms, and filters data as it passes through.
• it can derive structure from unstructured data
• it can anonymize personal data or exclude it entirely
• it can do geo-location lookups
• it can scale across many nodes
• it guarantees at-least-once delivery
• it absorbs throughput from load spikes
See https://www.elastic.co/guide/en/logstash/current/filter-plugins.html for the huge list of filter plugins.
huge variety of input source events
elastic beats – cloudwatch – couchdb – drupal – elasticsearch – windows event log – shell output – local files – ganglia – gelf – gemfire – random generator – github – google pubsub – graphite – heartbeats – heroku – http – imap – irc – jdbc – jmx – kafka – lumberjack – meetup – command pipes – puppet – rabbitmq – rackspace cloud queue – redis – relp – rss – s3 – salesforce – snmp – sqlite – sqs – stdin – stomp – syslog – tcp – twitter – udp – unix sockets – varnish log – websocket – wmi – xmpp – zenoss – zeromq
huge variety of output “stash” destinations
boundary – circonus – cloudwatch – csv – datadoghq – elasticsearch – email – exec – local file – ganglia – gelf – bigquery – google cloud storage – graphite – graphtastic – hipchat – http – influxdb – irc – jira – juggernaut – kafka – librato – loggly – lumberjack – metriccatcher – mongodb – nagios – new relic insights – opentsdb – pagerduty – pipe to stdin – rabbitmq – rackspace cloud queue – redis – redmine – riak – riemann – s3 – sns – solr – sqs – statsd – stdout – stomp – syslog – tcp – udp – webhdfs – websocket – xmpp – zabbix – zeromq
typical usage
(diagram: web logs → beats or files → logstash, which parses into structured fields and geolocates → elasticsearch)
installing logstash
sudo apt-get update
sudo apt-get install logstash
configuring logstash
sudo vi /etc/logstash/conf.d/logstash.conf
input {
  file {
    path => "/home/fkane/access_log"
    start_position => "beginning"
    ignore_older => 0
  }
}
filter {
  # parse each line as an apache combined-format access log entry
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # use the timestamp from the log line as the event time
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
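To show roughly what the grok and date filters do to one log line, here is a minimal Python sketch. It is an approximation, not the actual COMBINEDAPACHELOG pattern: the regular expression is hand-written for illustration, the field names follow the ones that pattern is commonly seen to produce, and the sample line is made up.

```python
import re
from datetime import datetime

# Hand-written approximation of the Apache combined log format (an assumption,
# not the real grok pattern definition).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Turn one access log line into a dict of named fields, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Python equivalent of the date filter format "dd/MMM/yyyy:HH:mm:ss Z"
    event["@timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return event

# made-up sample line
sample = ('203.0.113.7 - - [04/Dec/2015:10:15:32 +0000] "GET /index.html HTTP/1.1" '
          '200 5120 "-" "Mozilla/5.0"')
parsed = parse_line(sample)
```

The point is the shape of the result: an unstructured line becomes named fields plus a real timestamp, which is what makes the later aggregations by response code or by hour possible.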
running logstash
cd /usr/share/logstash/
sudo bin/logstash -f /etc/logstash/conf.d/logstash.conf
logstash with mysql
install a jdbc driver
get a mysql connector from https://dev.mysql.com/downloads/connector/j/
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.42.zip
unzip mysql-connector-java-5.1.42.zip
configure logstash
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/movielens"
    jdbc_user => "root"
    jdbc_password => "password"
    jdbc_driver_library => "/home/fkane/mysql-connector-java-5.1.42/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM movies"
  }
}
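Conceptually, the jdbc input runs the statement and emits one event per result row, keyed by column name. A minimal Python sketch of that idea follows; it uses an in-memory SQLite table as a stand-in for the MySQL movielens database, and the table contents are hypothetical.

```python
import sqlite3

def rows_as_events(connection, statement):
    """Run a SQL statement and yield one dict (event) per result row."""
    cursor = connection.execute(statement)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))

# In-memory stand-in for the movielens database (hypothetical rows).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE movies (movieID INTEGER, title TEXT)")
connection.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1, "Toy Story (1995)"), (2, "Jumanji (1995)")],
)
events = list(rows_as_events(connection, "SELECT * FROM movies"))
```

Each dict in `events` has the shape of a document that could then be sent on to an output such as elasticsearch.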
logstash with s3
what is s3
amazon web services’ simple storage service
cloud-based distributed storage system
integration is easy-peasy.
input {
  s3 {
    bucket => "sundog-es"
    access_key_id => "AKIAIS****C26Y***Q"
    secret_access_key => "d*****FENOXcCuNC4iTbSLbibA*****eyn****"
  }
}
logstash with kafka
what is kafka
• apache kafka
• open-source stream processing platform
• high throughput, low latency
• publish/subscribe
• process streams
• store streams
has a lot in common with logstash, really.
integration is easy-peasy.
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["kafka-logs"]
  }
}
elasticsearch with spark
what is apache spark
• “a fast and general engine for large-scale data processing”
• a faster alternative to mapreduce
• spark applications are written in java, scala, python, or r
• supports sql, streaming, machine learning, and graph processing
flink is nipping at spark’s heels, and can also integrate with elasticsearch.
integration with elasticsearch-spark
./spark-2.1.1-bin-hadoop2.7/bin/spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.4.3
import org.elasticsearch.spark.sql._
case class Person(ID:Int, name:String, age:Int, numFriends:Int)
// parse one comma-separated line into a Person
def mapper(line:String): Person = {
  val fields = line.split(',')
  val person:Person = Person(fields(0).toInt, fields(1), fields(2).toInt, fields(3).toInt)
  return person
}
import spark.implicits._
val lines = spark.sparkContext.textFile("fakefriends.csv")
val people = lines.map(mapper).toDF()
// write the DataFrame into the "spark" index with a "people" type
people.saveToEs("spark/people")
exercise
write spark code that imports movie ratings from ml-latest-small into a “spark” index with a “ratings” type.
dealing with the header line
val header = lines.first()
val data = lines.filter(row => row != header)
my solution
import org.elasticsearch.spark.sql._
case class Rating(userID:Int, movieID:Int, rating:Float, timestamp:Int)
def mapper(line:String): Rating = {
  val fields = line.split(',')
  val rating:Rating = Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toInt)
  return rating
}
import spark.implicits._
val lines = spark.sparkContext.textFile("ml-latest-small/ratings.csv")
val header = lines.first()
val data = lines.filter(row => row != header)
val ratings = data.map(mapper).toDF()
ratings.saveToEs("spark/ratings")
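For readers without a Spark shell at hand, the header-skipping and field-parsing steps can be sketched in plain Python. This is only an illustration of the same logic on a tiny in-memory sample; the sample rows are made up, and nothing here talks to Spark or Elasticsearch.

```python
def parse_ratings(lines):
    """Skip the header row, then turn each CSV line into a rating document."""
    header = lines[0]
    documents = []
    for line in lines:
        if line == header:
            continue  # same effect as filtering out rows equal to the header
        user_id, movie_id, rating, timestamp = line.split(",")
        documents.append({
            "userID": int(user_id),
            "movieID": int(movie_id),
            "rating": float(rating),
            "timestamp": int(timestamp),
        })
    return documents

# made-up sample in the same column order as ratings.csv
sample_lines = [
    "userId,movieId,rating,timestamp",
    "1,31,2.5,1260759144",
    "1,1029,3.0,1260759179",
]
ratings = parse_ratings(sample_lines)
```

Without the header filter, the first `int(...)` conversion would fail on the text `userId`, which is why the Spark version removes that row before mapping.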
aggregations
it’s not just for search anymore
(bar chart: example values 4.3, 2.5, 3.5, 4.5 across q1, q2, q3, q4)
metrics: average, stats, min/max, percentiles, etc.
buckets: histograms, ranges, distances, significant terms, etc.
pipelines: moving average, average bucket, cumulative sum, etc.
matrix: matrix stats
aggregations are amazing
elasticsearch aggregations can sometimes take the place of hadoop / spark / etc. and return results almost instantly!
it gets better
you can even nest aggregations together!
let’s learn by example
bucket by rating value:
curl -XGET '127.0.0.1:9200/ratings/rating/_search?size=0&pretty' -d '
{
  "aggs": {
    "ratings": {
      "terms": {
        "field": "rating"
      }
    }
  }
}'
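What a terms aggregation computes can be sketched in a few lines of Python: count documents per distinct field value and list the most frequent values first. This is a simulation on made-up data, not a call to Elasticsearch; the `key` / `doc_count` names mirror the bucket fields in the aggregation response.

```python
from collections import Counter

def terms_buckets(docs, field, size=10):
    """Count documents per distinct value of a field, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]

# made-up rating documents
ratings = [
    {"rating": 4.0}, {"rating": 5.0}, {"rating": 4.0},
    {"rating": 3.0}, {"rating": 4.0}, {"rating": 5.0},
]
buckets = terms_buckets(ratings, "rating")
```

The `size` parameter here stands in for the aggregation's limit on how many buckets come back; only the top values by document count are returned.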
count only 5-star ratings:
curl -XGET '127.0.0.1:9200/ratings/rating/_search?size=0&pretty' -d '
{
  "query": {
    "match": {
      "rating": 5.0
    }
  },
  "aggs": {
    "ratings": {
      "terms": {
        "field": "rating"
      }
    }
  }
}'
average rating for Star Wars:
curl -XGET '127.0.0.1:9200/ratings/rating/_search?size=0&pretty' -d '
{
  "query": {
    "match_phrase": {
      "title": "Star Wars Episode IV"
    }
  },
  "aggs": {
    "avg_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}'
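The combination of a query with a metric aggregation amounts to "filter first, then average". A minimal Python sketch of that two-step logic follows. The phrase matching here is a crude stand-in for `match_phrase` (lowercased word tokens that must appear consecutively), and the documents are made up.

```python
import re

def tokens(text):
    """Lowercase word tokens; a rough approximation of an analyzed text field."""
    return re.findall(r"\w+", text.lower())

def matches_phrase(text, phrase):
    """True if the phrase's tokens appear consecutively in the text's tokens."""
    words, wanted = tokens(text), tokens(phrase)
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))

def average_rating(docs, phrase):
    """Average the rating field over documents whose title matches the phrase."""
    values = [doc["rating"] for doc in docs if matches_phrase(doc["title"], phrase)]
    return sum(values) / len(values) if values else None

# made-up documents with a title and a rating
docs = [
    {"title": "Star Wars: Episode IV - A New Hope (1977)", "rating": 5.0},
    {"title": "Star Wars: Episode IV - A New Hope (1977)", "rating": 4.0},
    {"title": "Star Wars: Episode V - The Empire Strikes Back (1980)", "rating": 4.5},
    {"title": "Star Trek (2009)", "rating": 3.0},
]
star_wars_avg = average_rating(docs, "Star Wars Episode IV")
```

The aggregation only ever sees the documents the query lets through, which is why the other titles do not affect the average.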
histograms
what is a histogram
display totals of documents bucketed by some interval range
display ratings by 1.0-rating intervals
curl -XGET '127.0.0.1:9200/ratings/rating/_search?size=0&pretty' -d '
{
  "aggs": {
    "whole_ratings": {
      "histogram": {
        "field": "rating",
        "interval": 1.0
      }
    }
  }
}'
count up movies from each decade
curl -XGET '127.0.0.1:9200/movies/movie/_search?size=0&pretty' -d '
{
  "aggs": {
    "release": {
      "histogram": {
        "field": "year",
        "interval": 10
      }
    }
  }
}'
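The bucketing rule behind both histogram examples is simple arithmetic: each value is rounded down to the nearest multiple of the interval, and that multiple becomes the bucket key. Here is a minimal Python sketch of that rule on made-up values, assuming no offset. Unlike Elasticsearch, which by default also reports the empty buckets in between, this sketch only lists buckets that contain at least one value.

```python
import math
from collections import Counter

def histogram_buckets(values, interval):
    """Group values into fixed-width buckets keyed by floor(value / interval) * interval."""
    counts = Counter(math.floor(value / interval) * interval for value in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]

# made-up ratings, bucketed into whole-star intervals
whole_ratings = histogram_buckets([0.5, 1.0, 1.5, 3.5, 4.0, 4.5, 5.0], 1.0)

# made-up release years, bucketed by decade
decades = histogram_buckets([1977, 1980, 1983, 1995, 1999], 10)
```

With an interval of 10, the year 1977 lands in the bucket keyed 1970, so the bucket keys read directly as decades.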
time series
dealing with time
Elasticsearch can bucket and aggregate fields that contain time and dates properly. You can aggregate by “year” or “month” and it knows about calendar rules.
break down website hits by hour:
curl -XGET '127.0.0.1:9200/logstash-2015.12.04/logs/_search?size=0&pretty' -d '
{
  "aggs": {
    "timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "hour"
      }
    }
  }
}'
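An hourly date histogram can be pictured as truncating every timestamp to the start of its hour and counting how many land on each. A minimal Python sketch on made-up timestamps follows; it ignores time zones and empty buckets, both of which the real aggregation handles.

```python
from collections import Counter
from datetime import datetime

def hourly_buckets(timestamps):
    """Count timestamps per hour, keyed by the start of that hour."""
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return [{"key": hour.isoformat(), "doc_count": counts[hour]} for hour in sorted(counts)]

# made-up hit times on december 4, 2015
hits = [
    datetime(2015, 12, 4, 9, 5),
    datetime(2015, 12, 4, 9, 47),
    datetime(2015, 12, 4, 10, 1),
]
by_hour = hourly_buckets(hits)
```

Swapping the truncation for "start of the minute" or "start of the month" gives the other intervals; months are where calendar awareness matters, since they are not all the same length.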
when does google scrape me?
curl -XGET '127.0.0.1:9200/logstash-2015.12.04/logs/_search?size=0&pretty' -d '
{
  "query": {
    "match": {
      "agent": "Googlebot"
    }
  },
  "aggs": {
    "timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "hour"
      }
    }
  }
}'
exercise
when did my site go down on december 4, 2015? (bucket 500 status codes by the minute in logstash-2015.12.04/logs)
my solution
GET /logstash-2015.12.04/logs/_search?size=0&pretty
{
  "query": {
    "match": {
      "response": "500"
    }
  },
  "aggs": {
    "timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "minute"
      }
    }
  }
}
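The hourly, Googlebot, and 500-error requests all share one shape: an optional match query plus a date histogram. One way to avoid hand-editing JSON (and the quoting mistakes that come with it) is to build the body as a data structure and serialize it. The helper below is a hypothetical convenience, not part of any Elasticsearch client; it only produces the request body.

```python
import json

def date_histogram_request(field, interval, match=None):
    """Build a search body: optional match query plus a date_histogram aggregation."""
    body = {
        "aggs": {
            "timestamp": {
                "date_histogram": {"field": field, "interval": interval}
            }
        }
    }
    if match is not None:
        body["query"] = {"match": match}
    return body

# the body for "500 status codes by the minute"
request_body = date_histogram_request("@timestamp", "minute", match={"response": "500"})
payload = json.dumps(request_body, indent=2)
```

The resulting `payload` string could be pasted after `-d` in a curl command or into the kibana console.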
nested aggregations
Aggregations can be nested for more powerful queries.
For example, what’s the average rating for each Star Wars movie?
Let’s undertake this as an activity – and show you what can go wrong along the way.
for reference, here’s the final query
curl -XGET '127.0.0.1:9200/ratings/rating/_search?size=0&pretty' -d '
{
  "query": {
    "match_phrase": {
      "title": "Star Wars"
    }
  },
  "aggs": {
    "titles": {
      "terms": {
        "field": "title.raw"
      },
      "aggs": {
        "avg_rating": {
          "avg": {
            "field": "rating"
          }
        }
      }
    }
  }
}'
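Nesting an avg inside a terms aggregation means "group by exact title, then average within each group". A minimal Python sketch of that logic follows, on made-up documents that are assumed to have already passed the Star Wars query. The bucket layout imitates the response shape, with the sub-aggregation result stored under its name.

```python
from collections import defaultdict

def avg_rating_by_title(docs):
    """Bucket documents by exact title, then compute the average rating per bucket."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["title"]].append(doc["rating"])
    buckets = [
        {
            "key": title,
            "doc_count": len(ratings),
            "avg_rating": {"value": sum(ratings) / len(ratings)},
        }
        for title, ratings in grouped.items()
    ]
    # most-rated titles first, like the default ordering of a terms aggregation
    buckets.sort(key=lambda bucket: bucket["doc_count"], reverse=True)
    return buckets

# made-up documents that already matched the query
docs = [
    {"title": "Star Wars: Episode IV - A New Hope (1977)", "rating": 5.0},
    {"title": "Star Wars: Episode IV - A New Hope (1977)", "rating": 4.0},
    {"title": "Star Wars: Episode V - The Empire Strikes Back (1980)", "rating": 4.5},
]
titles = avg_rating_by_title(docs)
```

Grouping works here because the whole title string is the key. That is the role `title.raw` plays in the query: bucketing on an analyzed text field would split titles into individual words instead.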
using kibana
what is kibana
installing kibana
sudo apt-get install kibana
sudo vi /etc/kibana/kibana.yml
change server.host to 0.0.0.0
add xpack.security.enabled: false
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable kibana.service
sudo /bin/systemctl start kibana.service
kibana is now available on port 5601
playing with kibana
let’s analyze the works of william shakespeare…
because we can.
exercise
find the longest shakespeare plays: create a vertical bar chart that aggregates the count of documents by play name in descending order.
using filebeat
filebeat is a lightweight shipper for logs
(diagram: logs → filebeat → logstash → elasticsearch → kibana)
• filebeat maintains a read pointer on the logs. every log line acts like a queue.
• logs can be from apache, nginx, auditd, or mysql
• logstash and filebeat can communicate to maintain “backpressure” when things back up
• filebeat can optionally talk directly to elasticsearch. when using logstash, elasticsearch is just one of many possible destinations!
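The "read pointer" idea can be sketched in Python: remember the byte offset already shipped, and on each pass read only what was appended after it. This is an illustration of the concept, not how filebeat is implemented; the function name and the temporary log file are hypothetical.

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # consume only complete lines; a trailing partial line waits for the next pass
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end

with tempfile.TemporaryDirectory() as directory:
    log_path = os.path.join(directory, "access_log")
    with open(log_path, "wb") as log:
        log.write(b"first\nsecond\n")
    first_batch, pointer = read_new_lines(log_path, 0)

    # the application keeps appending; only the new line is picked up
    with open(log_path, "ab") as log:
        log.write(b"third\n")
    second_batch, pointer = read_new_lines(log_path, pointer)
```

Because the pointer only advances past lines that were actually read, a shipper that persists it can stop and restart without resending or skipping lines.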
this is called the elastic stack
prior to beats, you’d hear about the “ELK stack”: elasticsearch, logstash, kibana.
why use filebeat and logstash and not just one or the other?
• it won’t let you overload your pipeline.
• you get more flexibility on scaling your cluster.
installing filebeat
installing and testing filebeat
sudo apt-get update && sudo apt-get install filebeat
cd /usr/share/elasticsearch/
sudo bin/elasticsearch-plugin install ingest-geoip
sudo bin/elasticsearch-plugin install ingest-user-agent
sudo /bin/systemctl stop elasticsearch.service
sudo /bin/systemctl start elasticsearch.service
sudo vi /etc/filebeat/filebeat.yml
Comment out existing log section, add at the bottom:
filebeat.modules:
- module: apache2
  access:
    var.paths: ["/home/fkane/logs/access*"]
  error:
    var.paths: ["/home/fkane/logs/error*"]
cd /usr/share/filebeat
sudo scripts/import_dashboards
sudo /bin/systemctl stop kibana.service
sudo /bin/systemctl start kibana.service
Make /home/<username>/logs
cd into it
wget http://media.sundog-soft.com/es/access_log
sudo /bin/systemctl start filebeat.service
analyzing logs with kibana
elasticsearch exercise
between 9:30 and 10:00 AM on May 4, 2017, which cities were generating 404 errors?

elasticsearch operations
choosing your shards
an index is split into shards.
[diagram: the Shakespeare index split into shards 1, 2, 3, …]
Documents are hashed to a particular shard.
Each shard may be on a different node in a cluster.
Every shard is a self-contained Lucene index of its own.

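The hashing step can be sketched as follows. This is only an illustration, not Elasticsearch's actual code: Elasticsearch hashes a routing value (the document ID by default) and takes it modulo the number of primary shards. Here `zlib.crc32` stands in for the real hash function, and `pick_shard` and `moved_fraction` are hypothetical names.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is an assumption for illustration; Elasticsearch uses its own hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents that would land on a different shard
    if the number of primary shards changed."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        return 0.0
    moved = sum(
        pick_shard(d, old_shards) != pick_shard(d, new_shards) for d in doc_ids
    )
    return moved / len(doc_ids)
```

Because the shard number depends on the shard count, changing the count would send most existing documents to a different shard. That is the reason primary shards cannot be added later without re-indexing.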
primary and replica shards
Node 1: Primary 1, Replica 0
Node 2: Replica 0, Replica 1
Node 3: Primary 0, Replica 1
This index has two primary shards and two replicas of each.
Your application should round-robin requests amongst nodes.
Write requests are routed to the primary shard, then replicated.
Read requests are routed to the primary or any replica.

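Round-robin on the application side can be as simple as cycling through a list of node addresses. A minimal sketch, assuming three made-up node URLs and a hypothetical helper named `round_robin`; the official client libraries generally do this for you when given several hosts.

```python
from itertools import cycle


def round_robin(nodes):
    """Return a function that hands out the next node URL on every call."""
    if not nodes:
        raise ValueError("need at least one node")
    ring = cycle(nodes)
    return lambda: next(ring)


# Hypothetical node addresses; substitute your own.
next_node = round_robin(
    ["http://node1:9200", "http://node2:9200", "http://node3:9200"]
)
```

Each call to `next_node()` returns the next address in turn and wraps around after the last one.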
how many shards do i need?
• you can’t add more shards later without re-indexing
• but shards aren’t free: you can’t just make 1,000 of them and stick them on one node at first.
• you want to overallocate, but not too much
• consider scaling out in phases, so you have time to re-index before you hit the next phase

really? that’s kind of hand-wavy.
• the “right” number of shards depends on your data and your application. there’s no secret formula.
• start with a single server using the same hardware you use in production, with one shard and no replication.
• fill it with real documents and hit it with real queries.
• push it until it breaks; now you know the capacity of a single shard.

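Once the single-shard capacity is measured, the arithmetic for a first estimate might look like this. It is a rough sketch, not a formula from the course: measuring capacity in gigabytes, the 1.5x headroom factor, and the name `estimate_primary_shards` are all assumptions.

```python
import math


def estimate_primary_shards(
    expected_gb: float, shard_capacity_gb: float, headroom: float = 1.5
) -> int:
    """Estimate primary shards from projected data size and the measured
    capacity of one shard, with some room to grow (overallocation)."""
    if expected_gb < 0 or shard_capacity_gb <= 0 or headroom < 1:
        raise ValueError("sizes must be positive and headroom at least 1")
    return max(1, math.ceil(expected_gb * headroom / shard_capacity_gb))
```

For example, 300 GB of projected data on shards that each held up about 50 GB in testing gives `estimate_primary_shards(300, 50)`, which is 9 with the default headroom.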
remember replica shards can be added
• read-heavy applications can add more replica shards without re-indexing.
• note this only helps if you put the new replicas on extra hardware!

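Changing the replica count is a settings update on a live index. The sketch below only builds the request (method, path, JSON body) and leaves sending it to your HTTP client or Kibana console; `replica_update_request` is a hypothetical helper and `shakespeare` is just an example index name.

```python
import json


def replica_update_request(index: str, replicas: int):
    """Build the update-settings request that changes the replica count."""
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    path = f"/{index}/_settings"
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", path, json.dumps(body)
```

`replica_update_request("shakespeare", 2)` yields a `PUT /shakespeare/_settings` request asking for two replicas of each primary.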
adding an index
creating a new index
PUT /new_index
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}
You can use index templates to automatically apply mappings, analyzers, aliases, etc.

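As an illustration of what an index template body might contain, here is a sketch that builds one for indices named `logs_*`. It assumes the Elasticsearch 5.x-era template format, where the pattern lives in a `template` field and the body is sent with `PUT _template/<name>`; later releases use `index_patterns` instead. The function name and the `logs_all` alias are hypothetical.

```python
def logs_template_body(
    pattern: str = "logs_*",
    shards: int = 1,
    replicas: int = 1,
    alias: str = "logs_all",
) -> dict:
    """Build a template applied to every new index matching `pattern`."""
    return {
        # 5.x-style field name (an assumption about your version).
        "template": pattern,
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Every matching index is added to this alias on creation.
        "aliases": {alias: {}},
    }
```

With a template like this in place, creating `logs_2017_06` would pick up the shard settings and the alias without any per-index configuration.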
multiple indices as a scaling strategy
• make a new index to hold new data
• search both indices
• use index aliases to make this easy to do

multiple indices as a scaling strategy
• with time-based data, you can have one index per time frame
• common strategy for log data where you usually just want current data, but don’t want to delete old data either
• again you can use index aliases, e.g. “logs_current”, “last_3_months”, to point to specific indices as they rotate

alias rotation example
POST /_aliases
{
  "actions": [
    { "add": { "alias": "logs_current", "index": "logs_2017_06" }},
    { "remove": { "alias": "logs_current", "index": "logs_2017_05" }},
    { "add": { "alias": "logs_last_3_months", "index": "logs_2017_06" }},
    { "remove": { "alias": "logs_last_3_months", "index": "logs_2017_03" }}
  ]
}
optionally…
DELETE /logs_2017_03

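The monthly rotation can be generated instead of typed by hand. This sketch builds the same `_aliases` body for any month, assuming the `logs_YYYY_MM` naming used in the example; the helper names are hypothetical.

```python
def month_index(year: int, month: int, prefix: str = "logs") -> str:
    """Name of the index holding one month of data, e.g. logs_2017_06."""
    return f"{prefix}_{year:04d}_{month:02d}"


def months_back(year: int, month: int, n: int):
    """Return (year, month) for n months before the given month."""
    total = year * 12 + (month - 1) - n
    return total // 12, total % 12 + 1


def rotation_actions(year: int, month: int) -> dict:
    """Body for POST /_aliases when the index for (year, month) goes live."""
    new = month_index(year, month)
    previous = month_index(*months_back(year, month, 1))
    expired = month_index(*months_back(year, month, 3))
    return {
        "actions": [
            {"add": {"alias": "logs_current", "index": new}},
            {"remove": {"alias": "logs_current", "index": previous}},
            {"add": {"alias": "logs_last_3_months", "index": new}},
            {"remove": {"alias": "logs_last_3_months", "index": expired}},
        ]
    }
```

`rotation_actions(2017, 6)` reproduces the four actions in the example, and the month arithmetic handles year boundaries such as January.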
choosing your hardware
RAM is likely your bottleneck
64GB per machine is the sweet spot (32GB to elasticsearch, 32GB to the OS / disk cache for lucene)
under 8GB not recommended
other hardware considerations
• fast disks are better; SSDs if possible (with deadline or noop i/o scheduler)
• use RAID0; your cluster is already redundant
• cpu not that important
• need a fast network
• don’t use NAS
• use medium to large configurations; too big is bad, and too many small boxes is bad too.

heap sizing
your heap size is wrong
the default heap size is only 1GB!
half or less of your physical memory should be allocated to elasticsearch
• the other half can be used by lucene for caching
• if you’re not aggregating on analyzed string fields, consider using less than half for elasticsearch
• smaller heaps result in faster garbage collection and more memory for caching
export ES_HEAP_SIZE=10g
or
ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch
don’t cross 32GB! above that the JVM can no longer use compressed object pointers, so every pointer gets bigger.

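The sizing rule can be written down as a small calculation: half of physical RAM, capped below the 32GB line. This is a sketch with hypothetical helper names; the 31GB default cap is a conservative assumption chosen to stay safely under the limit, not a number from the slides.

```python
def recommended_heap_gb(physical_ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical memory, but never more than the cap."""
    if physical_ram_gb <= 0:
        raise ValueError("physical RAM must be positive")
    return min(physical_ram_gb / 2, cap_gb)


def java_opts(heap_gb: float) -> str:
    """JVM flags that set minimum and maximum heap to the same size."""
    megabytes = int(heap_gb * 1024)
    return f"-Xms{megabytes}m -Xmx{megabytes}m"
```

A 20GB machine gets a 10GB heap, which `java_opts` renders in megabytes as `-Xms10240m -Xmx10240m`. A 128GB machine is held at the cap, leaving the rest for the OS disk cache.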
monitoring with x-pack
what is x-pack?
• an elastic stack extension
• security, monitoring, alerting, reporting, graph, and machine learning
• formerly shield / watcher / marvel
• only parts can be had for free; requires a paid license or trial otherwise

let’s install x-pack and mess around with it.
cd /usr/share/elasticsearch
sudo bin/elasticsearch-plugin install x-pack
sudo vi /etc/elasticsearch/elasticsearch.yml
(Add xpack.security.enabled: false)
sudo /bin/systemctl stop elasticsearch.service
sudo /bin/systemctl start elasticsearch.service
cd /usr/share/kibana/
sudo -u kibana bin/kibana-plugin install x-pack
sudo /bin/systemctl stop kibana.service
sudo /bin/systemctl start kibana.service

failover in action
in this activity, we’ll…
• Set up a second elasticsearch node on our virtual machine
• Observe how elasticsearch automatically expands across this new node
• Stop our original node, and observe everything move to the new one
• Restart our original node, and observe everything going back to normal… automatically!

using snapshots
snapshots let you back up your indices
store backups to NAS, Amazon S3, HDFS, Azure
smart enough to only store changes since last snapshot
create a repository
PUT _snapshot/backup-repo
{
  "type": "fs",
  "settings": {
    "location": "/home/<user>/backups/backup-repo"
  }
}
add it into elasticsearch.yml:
path.repo: ["/home/<user>/backups"]

using snapshots
snapshot all open indices:
PUT _snapshot/backup-repo/snapshot-1
get information about a snapshot:
GET _snapshot/backup-repo/snapshot-1
monitor snapshot progress:
GET _snapshot/backup-repo/snapshot-1/_status
restore a snapshot of all indices:
POST /_all/_close
POST _snapshot/backup-repo/snapshot-1/_restore

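For regular backups it may help to derive the snapshot name from the date rather than numbering by hand. The sketch below only builds the four request lines shown above for a dated snapshot; the `snapshot-YYYY.MM.DD` naming scheme and the function name are assumptions, not something the course prescribes.

```python
from datetime import date


def snapshot_requests(repo: str, day: date) -> dict:
    """Build (method, path) pairs for one dated snapshot in a repository."""
    name = f"snapshot-{day:%Y.%m.%d}"
    base = f"_snapshot/{repo}/{name}"
    return {
        "create": ("PUT", base),
        "info": ("GET", base),
        "status": ("GET", base + "/_status"),
        "restore": ("POST", base + "/_restore"),
    }
```

A nightly job could send the `create` request, poll `status` until the snapshot finishes, and keep the other two for inspection and recovery.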
rolling restarts
restarting your cluster
sometimes you have to… OS updates, elasticsearch version updates, etc.
to make this go quickly and smoothly, you want to disable shard reallocation while doing this.

rolling restart procedure
1. stop indexing new data if possible
2. disable shard allocation
3. shut down one node
4. perform your maintenance on it and restart, confirm it joins the cluster.
5. re-enable shard allocation
6. wait for the cluster to return to green status
7. repeat steps 2-6 for all other nodes
8. resume indexing new data

cheat sheet
Disable shard allocation:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
Stop elasticsearch safely:
sudo /bin/systemctl stop elasticsearch.service
Enable shard allocation:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

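The per-node loop of a rolling restart (disable allocation, restart, re-enable, wait for green) can be sketched as a small driver. The three callables are placeholders you would supply yourself (an HTTP call to `_cluster/settings`, your own restart mechanism, a health check); none of them are real library functions, and only the `all` and `none` values from the cheat sheet are handled.

```python
def allocation_body(mode: str) -> dict:
    """Body for PUT _cluster/settings that turns shard allocation on or off."""
    if mode not in ("all", "none"):
        raise ValueError("this sketch only handles 'all' and 'none'")
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def rolling_restart(nodes, put_cluster_settings, restart_node, wait_for_green):
    """Restart nodes one at a time, pausing shard allocation around each."""
    for node in nodes:
        put_cluster_settings(allocation_body("none"))  # disable shard allocation
        restart_node(node)  # shut down, do maintenance, start, rejoin
        put_cluster_settings(allocation_body("all"))  # re-enable shard allocation
        wait_for_green()  # do not touch the next node until green
```

Passing the side effects in as functions keeps the ordering logic easy to test with fakes before pointing it at a real cluster.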
let’s practice
amazon elasticsearch service
let’s walk through setting this up
amazon es lets you quickly rent and configure an elasticsearch cluster
this costs real money! Just watch if that bothers you
the main thing that’s different with amazon es is security
amazon es + logstash
let’s do something a little more complicated
• set up secure access to your cluster from kibana and from logstash
• need to create an IAM user and its credentials
• simultaneously allow access to the IP you’re connecting to kibana from and this user
• configure logstash with that user’s credentials for secure communication to the ES cluster

our access policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::159XXXXXXX66:user/estest",
          "arn:aws:iam::159XXXXXXX66:root"
        ]
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:159XXXXXXX66:domain/frank-test/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:ESHttpHead"
      ],
      "Resource": "arn:aws:es:us-east-1:159XXXXXXX66:domain/frank-test/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "192.168.1.1",
            "127.0.0.1",
            "68.204.31.192"
          ]
        }
      }
    }
  ]
}
substitute your own aws account ID, IAM user, cluster name, and IP address

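Since four values have to be substituted in several places, a small builder can reduce copy-paste mistakes. This sketch assumes the usual ARN shapes (`arn:aws:iam::<account>:user/<name>` for the user and `arn:aws:es:<region>:<account>:domain/<domain>/*` for the cluster) and lists only the IAM user as principal; the function name is hypothetical.

```python
import json


def access_policy(account_id, iam_user, domain, allowed_ips, region="us-east-1"):
    """Build an access policy: full access for one IAM user, plus
    HTTP access from a list of source IP addresses."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account_id}:user/{iam_user}"]
                },
                "Action": "es:*",
                "Resource": resource,
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": [
                    "es:ESHttpGet",
                    "es:ESHttpPut",
                    "es:ESHttpPost",
                    "es:ESHttpHead",
                ],
                "Resource": resource,
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_ips)}},
            },
        ],
    }


# Example values only; 123456789012 is a placeholder account ID.
policy_json = json.dumps(
    access_policy("123456789012", "estest", "frank-test", ["192.168.1.1"]),
    indent=2,
)
```

The resulting `policy_json` string can be pasted into the access policy editor for the domain.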
our logstash configuration
input {
  file {
    path => "/home/fkane/access_log-2"
  }
}
output {
  amazon_es {
    hosts => ["search-test-logstash-tdjkXXXXXXdtp3o3hcy.us-east-1.es.amazonaws.com"]
    region => "us-east-1"
    aws_access_key_id => 'AKIXXXXXXK7XYQQ'
    aws_secret_access_key => '7rvZyxmUudcXXXXXXXXXgTunpuSyw2HGuF'
    index => "production-logs-%{+YYYY.MM.dd}"
  }
}
Substitute your own log path, elasticsearch endpoint, region, and credentials

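The `index` setting produces one index per day, named from each event's timestamp. The sketch below mimics that naming so you can predict which index an event lands in. It assumes the date is taken in UTC, which is how logstash stores the event timestamp, and `daily_index` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def daily_index(event_time: datetime, prefix: str = "production-logs") -> str:
    """Index name for an event, following the prefix-YYYY.MM.dd pattern."""
    # Pass timezone-aware datetimes; the date is evaluated in UTC.
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


# An event late in the evening, US Eastern daylight time (UTC-4).
eastern = timezone(timedelta(hours=-4))
late_event = datetime(2017, 5, 4, 22, 0, tzinfo=eastern)
```

`daily_index(late_event)` gives `production-logs-2017.05.05`, because 10 PM on May 4 at UTC-4 is already May 5 in UTC. This matters when you search a single day's index for events near midnight.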
elastic cloud
what is elastic cloud?
elastic’s hosted solution
built on top of aws
includes x-pack (unlike amazon es)
simpler setup ui
x-pack security simplifies things
this costs extra!

let’s set up a trial cluster.
wrapping up
you made it!
you learned a lot:
• installing elasticsearch
• mapping and indexing data
• searching data
• importing data
• aggregating data
• using kibana
• using logstash, beats, and the elastic stack
• elasticsearch operations and deployment
• using hosted elasticsearch clusters

learning more
• https://www.elastic.co/learn
• elasticsearch: the definitive guide
• documentation
• live training and videos
• keep experimenting!

THANK YOU