Terms of endearment - the ElasticSearch Query DSL explained


Terms of Endearment

The ElasticSearch query language explained

Clinton Gormley, YAPC::EU 2011
DRTECH @clintongormley

We can search for: DELETE QUERY

and find: deleteByQuery

but we can only find what is stored in the database

Normalise values

deleteByQuery → 'delete', 'by', 'query', 'deletebyquery'

Normalise values and search terms

deleteByQuery → 'delete', 'by', 'query', 'deletebyquery'

DELETE QUERY → 'delete', 'query'

Analyse values and search terms (ElasticSearch calls this normalisation "analysis")
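The normalisation above can be sketched with a toy analyzer. This is a minimal sketch under stated assumptions: the `analyse` function and its camelCase-splitting rule are hypothetical, chosen only to reproduce the terms on the slide, and it is not one of ElasticSearch's built-in analyzers.

```perl
use strict;
use warnings;

# Hypothetical toy analyzer, NOT ElasticSearch's own: split on whitespace,
# split each word where a lowercase letter is followed by an uppercase one,
# lowercase every piece, and also keep the whole word when it was split.
sub analyse {
    my ($text) = @_;
    my @terms;
    for my $word ( split ' ', $text ) {
        my @parts = map { lc } split /(?<=[a-z])(?=[A-Z])/, $word;
        push @terms, @parts;
        push @terms, lc $word if @parts > 1;
    }
    return @terms;
}

print join( ',', analyse('deleteByQuery') ), "\n";   # delete,by,query,deletebyquery
print join( ',', analyse('DELETE QUERY') ),  "\n";   # delete,query
```

Because the stored value and the search string go through the same analyzer, both sides end up with the terms 'delete' and 'query', so they match.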

What is stored in ElasticSearch?

Document: the whole hash. Fields: its keys. Values: the data in each field. Field types are shown in the comments:

{                                     # object
    tweet  => "Perl is GREAT!",       # string
    posted => "2011-08-15",           # date
    user   => {                       # nested object
        name  => "Clinton Gormley",   # string
        email => 'drtech@cpan.org',   # string
    },
    tags   => [ "perl", "opinion" ],  # array of enums
    posts  => 2,                      # integer
}

Nested objects flattened:

{
    tweet        => "Perl is GREAT!",
    posted       => "2011-08-15",
    'user.name'  => "Clinton Gormley",
    'user.email' => 'drtech@cpan.org',
    tags         => [ "perl", "opinion" ],
    posts        => 2,
}

Values analyzed into terms:

{
    tweet        => [ 'perl', 'great' ],
    posted       => [ Date(2011-08-15) ],
    'user.name'  => [ 'clinton', 'gormley' ],
    'user.email' => [ 'drtech', 'cpan.org' ],
    tags         => [ 'perl', 'opinion' ],
    posts        => [ 2 ],
}
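Why "you can only find what is stored" follows from storing terms can be sketched with a toy inverted index. This is a minimal sketch, assuming a hypothetical `analyse` (split on non-alphanumerics, lowercase) and a hypothetical `search_terms` lookup; real analyzers do more, e.g. the standard analyzer drops stopwords such as 'is'.

```perl
use strict;
use warnings;

# Hypothetical analyzer: split on anything that is not a letter or digit,
# then lowercase. (ElasticSearch's analyzers do more, e.g. drop stopwords.)
sub analyse {
    my ($text) = @_;
    return map { lc } grep { length } split /[^A-Za-z0-9]+/, $text;
}

# Toy inverted index: term => [ ids of documents containing that term ].
my %docs = ( 1 => 'Perl is GREAT!', 2 => 'Ruby is great too' );
my %index;
for my $id ( sort keys %docs ) {
    my %seen;
    push @{ $index{$_} }, $id for grep { !$seen{$_}++ } analyse( $docs{$id} );
}

# The search string goes through the same analyzer; only stored terms can match.
sub search_terms {
    my ($query) = @_;
    my %hits;
    for my $term ( analyse($query) ) {
        $hits{$_}++ for @{ $index{$term} || [] };
    }
    return sort keys %hits;
}

print join( ',', search_terms('great') ), "\n";   # 1,2
print join( ',', search_terms('PERL') ),  "\n";   # 1
print join( ',', search_terms('perls') ), "\n";   # empty: 'perls' was never stored
```

'PERL' matches because it is lowercased on the way in, while 'perls' finds nothing: no stemming was applied, so that term simply does not exist in the index.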

In MySQL: database → table → row

many tables, many rows, one schema, many columns

In ElasticSearch: index → type → document

many types, many documents, one mapping, many fields

Create index with mappings

$es->create_index(
    index    => 'twitter',
    mappings => {
        tweet => {
            properties => {
                title   => { type => 'string' },
                created => { type => 'date' },
            }
        }
    }
);

Add a mapping

$es->put_mapping(
    index   => 'twitter',
    type    => 'user',
    mapping => {
        properties => {
            name    => { type => 'string' },
            created => { type => 'date' },
        }
    }
);

Can add new fields to an existing mapping

Cannot change the mapping of an existing field
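The add-but-don't-change rule can be modelled in a few lines. This is a hypothetical helper, `merge_mapping`, not part of any ElasticSearch client, and it simplifies by comparing only each field's `type`; ElasticSearch checks more than that.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any ElasticSearch client) modelling the
# rule above: new fields may be added, existing fields may not change.
# Simplification: only the 'type' of a field is compared.
sub merge_mapping {
    my ( $existing, $update ) = @_;
    my %merged = %$existing;
    for my $field ( sort keys %$update ) {
        if ( !exists $merged{$field} ) {
            $merged{$field} = $update->{$field};    # new field: allowed
        }
        elsif ( $merged{$field}{type} ne $update->{$field}{type} ) {
            die "cannot change the mapping of existing field '$field'\n";
        }
    }
    return \%merged;
}

my $user = { name => { type => 'string' } };
$user = merge_mapping( $user, { created => { type => 'date' } } );
print join( ',', sort keys %$user ), "\n";    # created,name

eval { merge_mapping( $user, { name => { type => 'integer' } } ); 1 }
    or print "rejected: $@";
```

In practice the way round a conflicting change is to create a new index with the new mapping and reindex the data into it.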

Core field types

{ type => 'string' }    # also: byte|short|integer|long|double|float,
                        # date, ip addr, geolocation, boolean, binary (as base 64)

{ type => 'string', index => 'analyzed' }        # 'Foo Bar' → [ 'foo', 'bar' ]
{ type => 'string', index => 'not_analyzed' }    # 'Foo Bar' → [ 'Foo Bar' ]
{ type => 'string', index => 'no' }              # 'Foo Bar' → [ ]

{ type => 'string', index => 'analyzed', analyzer => 'default' }

{ type            => 'string',
  index           => 'analyzed',
  index_analyzer  => 'default',    # used when indexing values
  search_analyzer => 'default',    # used when analyzing search terms
}

{ type           => 'string',
  index          => 'analyzed',
  analyzer       => 'default',
  boost          => 2,
  include_in_all => 1,             # or 0: also index this field in the _all field?
}
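What each `index` setting stores for the value 'Foo Bar' can be sketched as follows. The `index_terms` function is hypothetical, and 'analyzed' is approximated by lowercasing and splitting on whitespace; a real analyzer does more, as the next slides show.

```perl
use strict;
use warnings;

# Hypothetical illustration of what each 'index' setting stores for a value.
# 'analyzed' is approximated by lowercase + split on whitespace.
sub index_terms {
    my ( $mode, $value ) = @_;
    return map { lc } split( ' ', $value ) if $mode eq 'analyzed';
    return ($value)                        if $mode eq 'not_analyzed';
    return ()                              if $mode eq 'no';
    die "unknown index mode '$mode'\n";
}

for my $mode (qw(analyzed not_analyzed no)) {
    printf "%-12s [ %s ]\n", $mode, join ', ', index_terms( $mode, 'Foo Bar' );
}
```

With 'not_analyzed' only the exact string 'Foo Bar' can match, and with 'no' the field cannot be searched at all.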

Built in analyzers

Standard, Simple, Whitespace, Stop, Keyword, Pattern, Language, Snowball, Custom

The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com

keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com

whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com

simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com

standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com

snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
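The two simplest rows above can be reproduced in plain Perl. This is a rough, ASCII-only imitation under stated assumptions: `whitespace_analyzer` and `simple_analyzer` are hypothetical names, modelled on the whitespace analyzer (split on whitespace only) and the simple analyzer (split on non-letters, then lowercase).

```perl
use strict;
use warnings;

# Rough, ASCII-only imitations of two built-in analyzers:
#   whitespace: split on whitespace, no other changes
#   simple:     split on anything that is not a letter, then lowercase
sub whitespace_analyzer { my ($text) = @_; return split ' ', $text }

sub simple_analyzer {
    my ($text) = @_;
    return map { lc } grep { length } split /[^A-Za-z]+/, $text;
}

# q{} avoids interpolating '@bloggs' and escaping the apostrophe
my $text = q{The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com};
print 'whitespace: ', join( ', ', whitespace_analyzer($text) ), "\n";
print 'simple:     ', join( ', ', simple_analyzer($text) ),     "\n";
```

Both lines print the same terms as the slide. The standard and snowball rows depend on Unicode word-boundary rules and stemming, which this sketch does not attempt.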

Token filters

Standard, ASCII Folding, Length, Lowercase, NGram, Edge NGram, Porter Stem, Shingle, Stop, Word Delimiter, Stemmer, KStem, Snowball, Phonetic, Synonym, Compound Word, Reverse, Elision, Truncate, Unique

Custom Analyzer

$es->create_index(
    index    => 'twitter',
    settings => {
        analysis => {
            analyzer => {
                ascii_html => {
                    type        => 'custom',
                    tokenizer   => 'standard',
                    filter      => [qw( standard lowercase asciifolding stop )],
                    char_filter => ['html_strip'],
                }
            }
        }
    }
);
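The order in which a custom analyzer runs (char filters, then the tokenizer, then token filters in the listed order) can be sketched as a small pipeline. This is a hypothetical, simplified stand-in for `ascii_html`: each stage only approximates the real component, the `standard` token filter is skipped, and the stopword list is assumed to resemble Lucene's default English list.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);

# Assumed English stopword list (similar to Lucene's default).
my %STOP = map { $_ => 1 } qw(
    a an and are as at be but by for if in into is it no not of on or
    such that the their then there these they this to was will with
);

# Hypothetical, simplified stand-in for the 'ascii_html' analyzer above.
sub ascii_html {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;                                      # html_strip (tags only)
    my @tokens = grep { length } split /[^\p{L}\p{N}]+/, $text;  # tokenizer (roughly)
    @tokens = map { lc } @tokens;                                # lowercase
    @tokens = map { my $t = NFD($_); $t =~ s/\p{Mn}//g; $t } @tokens;   # asciifolding (accents only)
    return grep { !$STOP{$_} } @tokens;                          # stop
}

print join( ',', ascii_html('<p>The Café is OPEN</p>') ), "\n";   # cafe,open
```

Running `lowercase` before `stop` matters: the stopword list is lowercase, so 'The' is only removed because it has already been lowercased.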

Searching

$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
);

Searching

$result = $es->search(
    index => [ 'twitter', 'facebook' ],
    type  => [ 'tweet', 'post' ],
);

Searching

$result = $es->search();    # all indices, all types

Searching

$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' } },
);

Searching

$result = $es->search(
    index  => 'twitter',
    type   => 'tweet',
    queryb => 'foo',    # b == ElasticSearch::SearchBuilder
);

Searching

$result = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' } },
    sort  => [ { _score => 'desc' } ],
    from  => 0,
    size  => 10,
);
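The Perl hash maps directly onto the JSON body of a `_search` request, while index and type go in the URL path (e.g. /twitter/tweet/_search). This sketch only builds and prints that body with the core JSON::PP module; it assumes the client serialises `query`, `sort`, `from` and `size` into the body, and it does not contact a server.

```perl
use strict;
use warnings;
use JSON::PP;

# The search parameters above, as the JSON body of a _search request.
# This only builds and prints the body; no ElasticSearch server is involved.
my %body = (
    query => { text => { _all => 'foo' } },
    sort  => [ { _score => 'desc' } ],
    from  => 0,
    size  => 10,
);

my $encoded = JSON::PP->new->canonical(1)->encode( \%body );   # sorted keys
print "$encoded\n";
# {"from":0,"query":{"text":{"_all":"foo"}},"size":10,"sort":[{"_score":"desc"}]}
```

`canonical(1)` sorts the keys so the output is stable; ElasticSearch itself does not care about key order.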

Query DSL

Queries vs Filters

Queries              Filters
full text & terms    terms only
relevance scoring    no scoring
slower               faster
no caching           cacheable

Use filters for anything that doesn't affect the relevance score!
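The table can be illustrated with a toy model. This is a sketch only: `term_filter` and `text_query` are hypothetical, the "score" is just a term count (not Lucene's TF/IDF), and the cache is a plain hash of document ids rather than ElasticSearch's filter cache.

```perl
use strict;
use warnings;

# Toy model only: real scoring (Lucene TF/IDF) and filter caching
# are far more sophisticated.
my %docs = (
    1 => { tags => ['perl'],           text => 'perl perl rocks' },
    2 => { tags => [ 'perl', 'ruby' ], text => 'ruby and perl' },
    3 => { tags => ['ruby'],           text => 'ruby rocks' },
);

# Filter: a yes/no answer per document that ignores relevance,
# so the matching ids can be cached and reused by later searches.
my %filter_cache;
sub term_filter {
    my ($tag) = @_;
    return $filter_cache{$tag} //= [
        grep { my $id = $_; grep { $_ eq $tag } @{ $docs{$id}{tags} } }
        sort keys %docs
    ];
}

# Query: each matching document gets a score (here: how often the term
# occurs), which depends on the search text, so it is not cached.
sub text_query {
    my ($term) = @_;
    my %score;
    for my $id ( keys %docs ) {
        my $n = () = $docs{$id}{text} =~ /\b\Q$term\E\b/g;
        $score{$id} = $n if $n;
    }
    return \%score;
}

# Query + filter: score only the documents that pass the filter.
my %allowed = map { $_ => 1 } @{ term_filter('perl') };
my $scores  = text_query('perl');
for my $id ( sort { $scores->{$b} <=> $scores->{$a} || $a <=> $b }
             grep { $allowed{$_} } keys %$scores ) {
    print "doc $id: score $scores->{$id}\n";
}
# doc 1: score 2
# doc 2: score 1
```

A second search filtered on 'perl' would reuse the cached id list, while the scores have to be recomputed for every new search text.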

Query only

Query DSL: $es->search( query => { text => { title => 'perl' } } );

SearchBuilder: $es->search( queryb => { title => 'perl' } );

Filter only

Query DSL: $es->search( query => { constant_score => { filter => { term => { tag => 'perl' }} } });

SearchBuilder: $es->search( queryb => { -filter => { tag => 'perl' } });

Query and filter

Query DSL: $es->search( query => { filtered => { query => { text => { title => 'perl' }}, filter =>{ term => { tag => 'perl' }} } });

SearchBuilder: $es->search( queryb => { title => 'perl', -filter => { tag => 'perl' } });
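The three Query DSL shapes above follow a simple rule, sketched here as a hypothetical helper `build_query`. It is not part of ElasticSearch.pm or SearchBuilder, and it does not claim to be how SearchBuilder works internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch.pm or SearchBuilder):
# wrap a Query DSL query and/or filter in the structures shown above.
sub build_query {
    my ( $query, $filter ) = @_;
    die "need a query or a filter\n" unless $query || $filter;
    return $query unless $filter;                                        # query only
    return { constant_score => { filter => $filter } } unless $query;    # filter only
    return { filtered => { query => $query, filter => $filter } };       # both
}

my $query  = { text => { title => 'perl' } };
my $filter = { term => { tag   => 'perl' } };

# e.g. $es->search( query => build_query( $query, $filter ) );
my $both = build_query( $query, $filter );
print join( ',', keys %$both ), "\n";   # filtered
```

Wrapping a bare filter in `constant_score` is what lets a filter-only search still be sent as a `query`: every matching document simply gets the same score.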

Filters

Filters : equality

Query DSL:
{ term  => { tags => 'perl' }}
{ terms => { tags => ['perl','ruby'] }}

SearchBuilder:
{ tags => 'perl' }
{ tags => ['perl','ruby'] }
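On a multi-valued field such as `tags`, both filters ask whether any stored term equals any of the given values. The hypothetical `matches_terms` below models that; since filters are not analyzed, the values must equal the stored terms exactly.

```perl
use strict;
use warnings;

# Toy model of the equality filters on a multi-valued field such as 'tags'.
# Filters are not analyzed: values must match the stored terms exactly.
#   term  => { tags => 'perl' }           ~ matches_terms( $tags, 'perl' )
#   terms => { tags => ['perl','ruby'] }  ~ matches_terms( $tags, 'perl', 'ruby' )
sub matches_terms {
    my ( $doc_tags, @wanted ) = @_;
    my %want = map { $_ => 1 } @wanted;
    return ( grep { $want{$_} } @$doc_tags ) ? 1 : 0;   # any stored term in the list?
}

my $tags = [ 'perl', 'opinion' ];    # terms stored for the example tweet
print matches_terms( $tags, 'perl' ),         "\n";   # 1
print matches_terms( $tags, 'ruby', 'perl' ), "\n";   # 1
print matches_terms( $tags, 'Perl' ),         "\n";   # 0: no analysis, so no lowercasing
```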

Filters : range

Query DSL: { range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}

SearchBuilder: { date => { gte => '2010-11-01', lt => '2010-12-01' }}
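The `gte`/`gt`/`lte`/`lt` bounds can be modelled with a hypothetical `in_range` function. This sketch relies on one assumption: dates are zero-padded 'YYYY-MM-DD' strings, which sort lexically in date order, so string comparison works. ElasticSearch itself parses dates rather than comparing strings.

```perl
use strict;
use warnings;

# Toy model of the range filter. Zero-padded 'YYYY-MM-DD' strings sort
# lexically in date order, so plain string comparison is enough here.
sub in_range {
    my ( $value, %r ) = @_;
    return 0 if defined $r{gte} && $value lt $r{gte};
    return 0 if defined $r{gt}  && $value le $r{gt};
    return 0 if defined $r{lte} && $value gt $r{lte};
    return 0 if defined $r{lt}  && $value ge $r{lt};
    return 1;
}

my @dates = qw(2010-10-31 2010-11-01 2010-11-30 2010-12-01);
print join( ', ', grep { in_range( $_, gte => '2010-11-01', lt => '2010-12-01' ) } @dates ), "\n";
# 2010-11-01, 2010-11-30
```

Using `gte` for the start and `lt` for the first day of the next month selects exactly one calendar month, without needing to know how many days it has.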

Filters : range (many values)

Query DSL: { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}}

SearchBuilder: { date => { '>=' => '2010-11-01', '=' => '2011-07-01', '