Terms of endearment - the ElasticSearch Query DSL explained
Terms of Endearment
The ElasticSearch query language explained
Clinton Gormley, YAPC::EU 2011
DRTECH @clintongormley
We can search for: DELETE QUERY
and find: deleteByQuery
but you can only find what is stored in the database
Normalise values
deleteByQuery
'delete', 'by', 'query', 'deletebyquery'
Normalise values and search terms
deleteByQuery
DELETE QUERY
'delete', 'by', 'query', 'deletebyquery'
Analyse values and search terms
deleteByQuery
DELETE QUERY
'delete', 'by', 'query', 'deletebyquery'
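The idea on the slides above can be sketched in plain Perl. The `analyse` function below is hypothetical: its camelCase-splitting rule is an assumption chosen to reproduce the four terms shown on the slide, not the behaviour of any particular ElasticSearch analyzer.

```perl
use strict;
use warnings;

# Hypothetical analyser: split on whitespace, split camelCase words into
# their parts, lowercase everything, and also keep the joined word as a term.
sub analyse {
    my ($text) = @_;
    my @terms;
    for my $word ( split ' ', $text ) {
        # "deleteByQuery" -> "delete", "By", "Query"; "DELETE" stays whole
        my @parts = $word =~ /[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+/g;
        push @terms, map { lc } @parts;
        # Index the whole word too when it was split into several parts
        push @terms, lc $word if @parts > 1;
    }
    return @terms;
}

# The stored value and the search terms go through the same analysis
my %index   = map { $_ => 1 } analyse('deleteByQuery');
my @query   = analyse('DELETE QUERY');
my $missing = grep { !$index{$_} } @query;
my $found   = $missing == 0;

print "indexed terms: @{[ sort keys %index ]}\n";
print "search terms:  @query\n";
print "match\n" if $found;
```

Because both sides are reduced to the same lowercase terms, the search for DELETE QUERY finds the value deleteByQuery even though the raw strings differ.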
What is stored in ElasticSearch?
Document:
{
    tweet  => "Perl is GREAT!",
    posted => "2011-08-15",
    user   => {
        name  => "Clinton Gormley",
        email => "[email protected]",
    },
    tags   => ["perl","opinion"],
    posts  => 2,
}
Fields: the keys of the document (tweet, posted, user, name, email, tags, posts)
Values: the data stored under each key
Field types:
{                                        # object
    tweet  => "Perl is GREAT!",          # string
    posted => "2011-08-15",              # date
    user   => {                          # nested object
        name  => "Clinton Gormley",      # string
        email => "[email protected]",     # string
    },
    tags   => ["perl","opinion"],        # array of enums
    posts  => 2,                         # integer
}
Nested objects flattened:
{
    tweet        => "Perl is GREAT!",
    posted       => "2011-08-15",
    'user.name'  => "Clinton Gormley",
    'user.email' => "[email protected]",
    tags         => ["perl","opinion"],
    posts        => 2,
}
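The flattening step above can be imitated with a small recursive function. This is only a sketch of the idea: `flatten` is a hypothetical helper, not part of the ElasticSearch client, and the email field is left out of the sample document.

```perl
use strict;
use warnings;

# Recursively flatten nested hashes into dotted keys.
# Scalars and arrays are kept as values unchanged.
sub flatten {
    my ( $doc, $prefix ) = @_;
    my %flat;
    for my $key ( keys %$doc ) {
        my $path  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $doc->{$key};
        if ( ref($value) eq 'HASH' ) {
            %flat = ( %flat, flatten( $value, $path ) );
        }
        else {
            $flat{$path} = $value;
        }
    }
    return %flat;
}

my %flat = flatten(
    {
        tweet  => "Perl is GREAT!",
        posted => "2011-08-15",
        user   => { name => "Clinton Gormley" },
        tags   => [ "perl", "opinion" ],
        posts  => 2,
    }
);
print "$_\n" for sort keys %flat;
```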
Values analyzed into terms:
{
    tweet        => ['perl','great'],
    posted       => [Date(2011-08-15)],
    'user.name'  => ['clinton','gormley'],
    'user.email' => ['drtech','cpan.org'],
    tags         => ['perl','opinion'],
    posts        => [2],
}
In MySQL
database: many tables
table: many rows, one schema
row: many columns
In ElasticSearch
index: many types
type: many documents, one mapping
document: many fields
Create index with mappings
$es->create_index(
    index    => 'twitter',
    mappings => {
        tweet => {
            properties => {
                title   => { type => 'string' },
                created => { type => 'date' },
            }
        }
    }
);
Add a mapping
$es->put_mapping(
    index   => 'twitter',
    type    => 'user',
    mapping => {
        properties => {
            name    => { type => 'string' },
            created => { type => 'date' },
        }
    }
);
Can add to existing mapping
Cannot change mapping for field
Core field types
{
    type => 'string',
    # byte|short|integer|long|double|float
    # date, ip addr, geolocation
    # boolean
    # binary (as base 64)
}

{
    type  => 'string',
    index => 'analyzed',        # 'Foo Bar' -> [ 'foo', 'bar' ]
}

{
    type  => 'string',
    index => 'not_analyzed',    # 'Foo Bar' -> [ 'Foo Bar' ]
}

{
    type  => 'string',
    index => 'no',              # 'Foo Bar' -> [ ]
}

{
    type     => 'string',
    index    => 'analyzed',
    analyzer => 'default',
}

{
    type            => 'string',
    index           => 'analyzed',
    index_analyzer  => 'default',
    search_analyzer => 'default',
}

{
    type           => 'string',
    index          => 'analyzed',
    analyzer       => 'default',
    boost          => 2,
    include_in_all => 1,        # or 0
}
Built in analyzers
Standard
Simple
Whitespace
Stop
Keyword
Pattern
Language
Snowball
Custom
The Brown-Cow's Part_No. #A.BC123-456 [email protected]
keyword: The Brown-Cow's Part_No. #A.BC123-456 [email protected]
whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, [email protected]
simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
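To make the difference between the simpler analyzers concrete, here is a rough plain-Perl imitation of three of them. These functions are hypothetical approximations written for this example (the simple analyzer is reduced to ASCII letters), and the sample text leaves out the email address.

```perl
use strict;
use warnings;

my $text = "The Brown-Cow's Part_No. #A.BC123-456";

# keyword: the whole value becomes a single term
sub keyword_analyzer { return ( $_[0] ) }

# whitespace: split on whitespace only, keep case and punctuation
sub whitespace_analyzer { return split ' ', $_[0] }

# simple: split on anything that is not a letter, then lowercase
sub simple_analyzer {
    return map { lc } grep { length } split /[^A-Za-z]+/, $_[0];
}

print "keyword:    ", join( ', ', keyword_analyzer($text) ),    "\n";
print "whitespace: ", join( ', ', whitespace_analyzer($text) ), "\n";
print "simple:     ", join( ', ', simple_analyzer($text) ),     "\n";
```

The standard and snowball analyzers involve stop words and stemming, so they are not reproduced here.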
Token filters
Standard
ASCII Folding
Length
Lowercase
NGram
Edge NGram
Porter Stem
Shingle
Stop
Word Delimiter
Stemmer
KStem
Snowball
Phonetic
Synonym
Compound Word
Reverse
Elision
Truncate
Unique
Custom Analyzer
$es->create_index(
    index    => 'twitter',
    settings => {
        analysis => {
            analyzer => {
                ascii_html => {
                    type        => 'custom',
                    tokenizer   => 'standard',
                    filter      => [ qw( standard lowercase asciifolding stop ) ],
                    char_filter => ['html_strip'],
                }
            }
        }
    }
);
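What a chain like the ascii_html analyzer above does to a value can be approximated in plain Perl. This is a loose sketch, not the real implementation: `ascii_html_sketch` is a hypothetical name, the tokenizer is a simple word match, and the stop list is a tiny illustrative one.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Tiny illustrative stop list (an assumption, not the real default list)
my %stop = map { $_ => 1 } qw(a an and is of the to);

sub ascii_html_sketch {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;              # char_filter: strip HTML tags
    my @tokens = $text =~ /\w+/g;        # tokenizer: rough word split
    @tokens = map { lc } @tokens;        # filter: lowercase
    @tokens = map {                      # filter: fold accents to ASCII
        my $t = NFD($_);
        $t =~ s/\p{Mn}//g;
        $t;
    } @tokens;
    return grep { !$stop{$_} } @tokens;  # filter: drop stop words
}

my @terms = ascii_html_sketch('<p>The <b>Café</b> is GREAT</p>');
print join( ', ', @terms ), "\n";
```

The order matters: character filters run on the raw text before tokenizing, and token filters then run one after another on the resulting tokens.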
Searching
$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
);

$result = $es->search(
    index => ['twitter','facebook'],
    type  => ['tweet','post'],
);

$result = $es->search(
    # all indices
    # all types
);

$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' }},
);

$result = $es->search(
    index  => 'twitter',
    type   => 'tweet',
    queryb => 'foo',    # b == ElasticSearch::SearchBuilder
);

$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' }},
    sort  => [{ '_score' => 'desc' }],
    from  => 0,
    size  => 10,
);
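The from and size parameters are what you would use for paging. A minimal sketch, assuming a 1-based page number; `page_params` is a hypothetical helper, not part of the client:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a 1-based page number into from/size parameters
sub page_params {
    my ( $page, $per_page ) = @_;
    $per_page ||= 10;
    $page = 1 if !$page || $page < 1;
    return ( from => ( $page - 1 ) * $per_page, size => $per_page );
}

my %params = (
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' } },
    sort  => [ { '_score' => 'desc' } ],
    page_params( 3, 10 ),    # from => 20, size => 10
);

# With a client object as on the slides: $result = $es->search(%params);
print "from=$params{from} size=$params{size}\n";
```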
Query DSL
Queries vs Filters
Queries              Filters
full text & terms    terms only
relevance scoring    no scoring
slower               faster
no caching           cacheable
Use filters for anything that doesn't affect the relevance score!
Query only
Query DSL:
    $es->search(
        query => { text => { title => 'perl' } }
    );
SearchBuilder:
    $es->search(
        queryb => { title => 'perl' }
    );
Filter only
Query DSL:
    $es->search(
        query => {
            constant_score => {
                filter => { term => { tag => 'perl' }}
            }
        }
    );
SearchBuilder:
    $es->search(
        queryb => { -filter => { tag => 'perl' } }
    );
Query and filter
Query DSL:
    $es->search(
        query => {
            filtered => {
                query  => { text => { title => 'perl' }},
                filter => { term => { tag => 'perl' }}
            }
        }
    );
SearchBuilder:
    $es->search(
        queryb => {
            title   => 'perl',
            -filter => { tag => 'perl' }
        }
    );
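The Query DSL hashes above are plain Perl data structures. As a sketch, assuming the structure is serialised to JSON unchanged for the request body, the query-and-filter example can be inspected with the core JSON::PP module:

```perl
use strict;
use warnings;
use JSON::PP;

# The "query and filter" structure from the slide above
my $query = {
    filtered => {
        query  => { text => { title => 'perl' } },
        filter => { term => { tag   => 'perl' } },
    }
};

# canonical() sorts the keys so the output is stable between runs
my $json = JSON::PP->new->canonical->pretty;
print $json->encode( { query => $query } );
```

Seeing the JSON form makes it easier to compare a Perl query with examples written for other clients.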
Filters
Filters : equality
Query DSL:
    { term  => { tags => 'perl' }}
    { terms => { tags => ['perl','ruby'] }}
SearchBuilder:
    { tags => 'perl' }
    { tags => ['perl','ruby'] }
Filters : range
Query DSL:
    { range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}
SearchBuilder:
    { date => { gte => '2010-11-01', lt => '2011-12-01' }}
Filters : range (many values)
Query DSL:
    { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}
SearchBuilder:
    { date => { '>=' => '2010-11-01', '=' => '2011-07-01', '