Elasticsearch in Zalando

Posted on 13-Jan-2017



Elasticsearch Meetup

Alaa Elhadba

alaa.elhadba@gmail.com

[Figure: CAP triangle with vertices C, A and P]

REST

Distributed

Scalable

Queryable

Search Engine

[Figure: the user query "Red basketball tshirt..." going into a SearchEngine]

Feature Extraction

Extracting features

It’s all about… TOKENS

This parka is crafted from sturdy cotton in classic army green and comes with a removable wool gilet that we've printed in leopard and can be worn inside or as an outer. The parka features all the essentials: a quilted hood, a drawstring waist, a fishtail, and utilitarian pockets. Detailed with silky padded sleeves for extra warmth and superb comfort.

?

Analyzer: Char Filters → Tokenizer → Token Filters

Char filter: Mapping char filter

& => and, ph => f
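Below is a minimal sketch of the mapping char filter in plain Python, using the rules from the slide. At each position it tries the longest key first, which roughly follows how Elasticsearch's mapping char filter chooses between overlapping keys. The filter name `my_mapping` is hypothetical.

```python
# Toy model of a mapping char filter: rewrite character sequences before tokenization.
MAPPINGS = {"&": "and", "ph": "f"}

def mapping_char_filter(text: str, mappings: dict = MAPPINGS) -> str:
    keys = sorted(mappings, key=len, reverse=True)  # longest keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])  # replace the matched key
                i += len(key)
                break
        else:
            out.append(text[i])  # no rule matched: copy the character as-is
            i += 1
    return "".join(out)

# Roughly equivalent char_filter definition for index settings ("my_mapping" is hypothetical)
ES_CHAR_FILTER = {"my_mapping": {"type": "mapping", "mappings": ["& => and", "ph => f"]}}

print(mapping_char_filter("rock & roll phone"))  # rock and roll fone
```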

Char filter: HTML strip filter

<b> Elasticsearch </b> -> Elasticsearch
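A simplified Python stand-in for HTML stripping follows: it drops tags and decodes entities. It is only an illustration. Elasticsearch's `html_strip` char filter is more complete, while this regex version ignores edge cases such as comments or a `>` inside attribute values.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")  # anything between < and >

def html_strip(text: str) -> str:
    # remove tags first, then turn entities like &amp; back into characters
    return html.unescape(TAG_RE.sub("", text))

print(repr(html_strip("<b> Elasticsearch </b>")))  # ' Elasticsearch ' (surrounding spaces kept)
```

The leftover spaces do no harm, because the tokenizer that runs next splits them away.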

Tokenizer: Whitespace tokenizer

Elasticsearch is an awesome technology -> "Elasticsearch", "is", "an", "awesome", "technology"
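The whitespace tokenizer splits on runs of whitespace and nothing else, as this short Python equivalent shows. Punctuation stays attached to the tokens and case is preserved. That is why later token filters are usually needed.

```python
def whitespace_tokenize(text: str) -> list:
    # str.split() with no argument splits on any run of whitespace
    return text.split()

print(whitespace_tokenize("Elasticsearch is an awesome technology"))
# ['Elasticsearch', 'is', 'an', 'awesome', 'technology']
print(whitespace_tokenize("president. that's"))
# ['president.', "that's"]
```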

Tokenizer: Pattern tokenizer (pattern: [^\\w]+)

foo,bar baz -> "foo", "bar", "baz"
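In the pattern tokenizer, the regular expression matches the separators, and the text between matches becomes the tokens. The sketch below reproduces that in Python. `[^\\w]+` is the JSON-escaped form of the regex `[^\w]+` (one or more non-word characters). The tokenizer name `my_pattern` is hypothetical.

```python
import re

SEPARATOR = re.compile(r"[^\w]+")  # one or more non-word characters

def pattern_tokenize(text: str) -> list:
    # re.split can produce empty strings at the edges (e.g. a leading comma); drop them
    return [tok for tok in SEPARATOR.split(text) if tok]

# Corresponding tokenizer definition for index settings ("my_pattern" is hypothetical)
ES_TOKENIZER = {"my_pattern": {"type": "pattern", "pattern": r"[^\w]+"}}

print(pattern_tokenize("foo,bar baz"))  # ['foo', 'bar', 'baz']
```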

Token filter: Stemmer token filter

Playing, Played, Player => play
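To show the idea of reducing inflected forms to a shared stem, here is a toy suffix-stripping stemmer. It is explicitly not the Porter/Snowball family of algorithms behind Elasticsearch's `stemmer` token filter. Those apply many more rules and may keep a form such as "player" as its own stem.

```python
SUFFIXES = ("ing", "ed", "er", "s")  # toy rule set, checked in this order

def toy_stem(token: str) -> str:
    token = token.lower()
    for suffix in SUFFIXES:
        # require at least 3 characters of stem so short words are left alone
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print([toy_stem(w) for w in ["Playing", "Played", "Player"]])  # ['play', 'play', 'play']
```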

Token filter: Shingle token filter

Please divide this sentence into shingles => "please divide", "divide this", "this sentence", "sentence into", "into shingles"
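The shingle filter joins neighbouring tokens into word n-grams. The sketch below builds two-word shingles the way the slide shows them. By default Elasticsearch's `shingle` filter also emits the single words, so matching the slide requires `output_unigrams: false`. The filter name `my_shingles` is hypothetical.

```python
def shingles(tokens: list, size: int = 2) -> list:
    # slide a window of `size` tokens over the list and join each window
    return [" ".join(tokens[i : i + size]) for i in range(len(tokens) - size + 1)]

ES_FILTER = {"my_shingles": {"type": "shingle", "min_shingle_size": 2,
                             "max_shingle_size": 2, "output_unigrams": False}}

tokens = "Please divide this sentence into shingles".lower().split()
print(shingles(tokens))
# ['please divide', 'divide this', 'this sentence', 'sentence into', 'into shingles']
```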

Token filter: Stop token filter

a, about, above, after, again, against, all, am, an, and, any, are, aren't, as, at, be
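A stop filter drops every token that appears in a stop word list. The sketch below uses exactly the words on the slide. Elasticsearch's built-in `_english_` list is shorter and differs from it. The comparison here ignores case, similar to setting `ignore_case: true` on the `stop` filter.

```python
STOP_WORDS = {"a", "about", "above", "after", "again", "against", "all", "am", "an",
              "and", "any", "are", "aren't", "as", "at", "be"}

def stop_filter(tokens: list) -> list:
    # keep only tokens whose lowercase form is not a stop word
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(stop_filter(["All", "about", "elasticsearch", "and", "search"]))
# ['elasticsearch', 'search']
```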

Token filter: Synonyms token filter

- america, usa
- british, english
- blue, duke blue, jade blue
- cuisine, food
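The sketch below expands synonyms using the slide's rules, read as comma-separated groups of equivalent terms. Each term expands to its whole group. Only single-token lookups are handled. Multi-word entries such as "duke blue" need phrase-aware matching, which Elasticsearch's synonym filters provide and this sketch does not.

```python
RULES = ["america, usa", "british, english", "blue, duke blue, jade blue", "cuisine, food"]

def build_synonym_map(rules: list) -> dict:
    syn = {}
    for rule in rules:
        terms = [t.strip() for t in rule.split(",")]
        for term in terms:
            syn[term] = terms  # every term maps to the full equivalence group
    return syn

SYNONYMS = build_synonym_map(RULES)

def synonym_filter(tokens: list) -> list:
    out = []
    for tok in tokens:
        out.extend(SYNONYMS.get(tok, [tok]))  # unknown tokens pass through unchanged
    return out

print(synonym_filter(["usa", "food"]))  # ['america', 'usa', 'cuisine', 'food']
```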

Input: <p> in the U.S.A. anyone can become president. that’s the problem </p>

Char filter: HTML strip filter

Tokenizer: Whitespace

Token filters: Stop, Stemmer, Synonyms

Output: { “america” “anyone” “become” “president” “problem” }
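The following sketch shows how the chain above might be declared as one custom analyzer in index settings, built as a Python dict and printed as JSON. The analyzer and filter names are hypothetical. The `lowercase` filter is an assumption that is not on the slide; it lets capitalised words match the lowercase stop and synonym lists.

```python
import json

settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": ["america, usa", "british, english",
                                 "blue, duke blue, jade blue", "cuisine, food"],
                },
                "my_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "product_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # char filters run first
                    "tokenizer": "whitespace",      # then exactly one tokenizer
                    "filter": ["lowercase", "stop", "my_synonyms", "my_stemmer"],  # in order
                }
            },
        }
    }
}

print(json.dumps(settings, indent=2))  # body for: PUT /<index-name>
```

The whitespace tokenizer leaves the period attached in "president.", so the clean output on the slide implies some extra normalisation. Treat these settings as a starting point.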

Extracting features

It’s all about… TOKENS

fishtail, utilitarian pockets, parka, military, army green, army, green, wool, hoodie, silky sleeves, 100% cotton, winter, leopard, jacket, coat, winter, coat, tiger, warm, casual, hiking, …


Design for user expectations

Acronyms: “I.B.M”, “Wi-Fi”, “U.S.A”, “IT”, “AFAIK”, “LOL”

Telephone numbers: (+49) 152-02434977, (0049)15202434977, 015202434977, 1(800)867-5209

Names: “John Smith”, “John A. Smith”, “John Adam Smith”, “John S.”

Tailored analysis per field
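As an example of field-specific normalisation, the sketch below maps the phone-number variants from the previous slide onto one canonical token. It assumes German numbers (`+49`, `0049`, or a leading trunk `0`) and otherwise falls back to plain digits. The function name is hypothetical. In Elasticsearch, logic like this would typically sit in a dedicated analyzer for the phone field (for example, one built from `pattern_replace` char filters), while free-text fields keep their own analyzer.

```python
import re

def normalize_phone(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)          # keep digits only
    if raw.lstrip("( ").startswith("+"):
        return digits                        # "+49 ..." already carries the country code
    if digits.startswith("00"):
        return digits[2:]                    # international "00" prefix: drop it
    if digits.startswith("0"):
        return "49" + digits[1:]             # national trunk "0": assume Germany (+49)
    return digits                            # anything else: plain digits

for number in ["(+49) 152-02434977", "(0049)15202434977", "015202434977", "1(800)867-5209"]:
    print(number, "->", normalize_phone(number))
```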

The Art of Ranking

Ranking

● Filtering

● Boosting

● Scoring

Ranking

User Query: white sneakers

● Color: Boosting

● Category: Boosting

● Recency: Score Func.

● Availability: Filtering

● Location: Score Func.

● Business Value: Filtering
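Here is a hypothetical `function_score` request body for "white sneakers" that combines the three techniques for a subset of the signals. Every field name (`color`, `category`, `in_stock`, `release_date`, `store_location`) and every number is an assumption for illustration:

1. A filter removes unavailable articles without affecting scores.
2. Boosts reward matches on color and category.
3. Gaussian decay functions favour recent articles and nearby locations.

```python
import json

query = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    # Filtering: unavailable articles are excluded, with no scoring impact
                    "filter": [{"term": {"in_stock": True}}],
                    # Boosting: reward matches on the attributes parsed from the query
                    "should": [
                        {"match": {"color": {"query": "white", "boost": 2}}},
                        {"match": {"category": {"query": "sneakers", "boost": 3}}},
                    ],
                    "minimum_should_match": 1,
                }
            },
            # Score functions: prefer recent articles and nearby locations
            "functions": [
                {"gauss": {"release_date": {"origin": "now", "scale": "30d", "decay": 0.5}}},
                {"gauss": {"store_location": {"origin": {"lat": 52.52, "lon": 13.40},
                                              "scale": "50km"}}},
            ],
            "score_mode": "multiply",  # how the function scores combine with each other
            "boost_mode": "multiply",  # how the combined result meets the query score
        }
    }
}

print(json.dumps(query, indent=2))  # body for: GET /<index-name>/_search
```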

Boosting

Adding scores: Total score = Base score + Additive score

Multiplying scores: Total score = Base score × Multiplicative score
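The worked example below uses made-up numbers to show why the choice matters. An additive boost of 1.0 triples a document with base score 0.5 but barely moves one with base score 10. A multiplicative factor of 1.5 changes every document by the same relative amount, so boosted documents keep their order among themselves. In `function_score` terms this corresponds roughly to `boost_mode: sum` versus `boost_mode: multiply`.

```python
def additive(base: float, extra: float) -> float:
    return base + extra    # Total score = Base score + Additive score

def multiplicative(base: float, factor: float) -> float:
    return base * factor   # Total score = Base score x Multiplicative score

for base in (0.5, 10.0):
    print(base, additive(base, 1.0), multiplicative(base, 1.5))
# 0.5 1.5 0.75
# 10.0 11.0 15.0
```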

Scoring in Elasticsearch

Function Score Query

● weight

● field_value_factor

● random_score

● Decay functions

● script_score
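Below are plain-Python versions of two of these building blocks, following the formulas in the Elasticsearch `function_score` documentation. They are useful for checking what a given `scale`/`decay` or modifier does before putting it in a query:

* The Gaussian decay chooses σ² so that the score equals `decay` at a distance of `offset + scale` from the origin.
* The `log1p` modifier of `field_value_factor` is the base-10 logarithm of 1 + factor × value.

```python
import math

def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    sigma_sq = -scale**2 / (2 * math.log(decay))           # score == decay at offset + scale
    distance = max(0.0, abs(value - origin) - offset)      # no penalty inside the offset
    return math.exp(-distance**2 / (2 * sigma_sq))

def field_value_factor_log1p(value: float, factor: float = 1.0) -> float:
    return math.log10(1 + factor * value)                  # modifier "log1p"

print(gauss_decay(0, origin=0, scale=10))    # 1.0 at the origin
print(gauss_decay(10, origin=0, scale=10))   # approximately 0.5 at origin + scale
print(field_value_factor_log1p(99))          # 2.0
```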

SearchEngine

Data ingestion & enrichment

Data retrieval & ranking

Elasticsearch in Zalando

Shop The Look


Product Service

● Fetch articles by sku

● Fetch articles by urlkey

● Fetch articles by family_sku

Key-Value Store ?
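The sketch below shows hypothetical request bodies for the three Product Service lookups. The field names come from the slide, while the index layout and the helper name `fetch_by` are assumptions. A `terms` query in filter context skips relevance scoring, which suits exact-key fetches. If `sku` were also the document `_id`, the multi-get API (`_mget`) would be the closest analogue to a key-value lookup.

```python
import json

def fetch_by(field: str, values: list, size: int = 100) -> dict:
    # exact-match lookup in filter context: no scoring, results are cacheable
    return {"query": {"bool": {"filter": {"terms": {field: values}}}}, "size": size}

by_sku = fetch_by("sku", ["SKU-1", "SKU-2"])          # hypothetical SKUs
by_urlkey = fetch_by("urlkey", ["some-url-key"])      # hypothetical url key
by_family = fetch_by("family_sku", ["FAMILY-1"])      # all articles of one family

print(json.dumps(by_sku))  # body for: GET /<index-name>/_search
```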

Catalog use-case

Example: 100 Articles × 5 Colors × 1000 RPS = 500,000 RPS

Key-Value Store
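The slide's fan-out arithmetic is spelled out below. It assumes that one catalog page lists 100 articles, each with 5 colour variants, that the page is requested 1000 times per second, and that every variant is fetched with its own lookup.

```python
articles_per_page = 100
colors_per_article = 5
page_rps = 1000  # assumed: catalog page requests per second

# one backend lookup per colour variant per page view
lookups_per_second = articles_per_page * colors_per_article * page_rps
print(f"{lookups_per_second:,} RPS")  # 500,000 RPS
```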

Product Service

Auto Scaling

shards_per_node: 3

Auto Scaling

shards_per_node: 1
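This sketch assumes that "shards_per_node" on the slides refers to the index setting `index.routing.allocation.total_shards_per_node`, which caps how many shards of one index a single node may hold. The shard counts are made up. Lowering the cap from 3 to 1 forces the same shards onto more nodes, so capacity grows as nodes are added. If the cluster has too few nodes, the excess shards stay unassigned.

```python
import math

def min_nodes(primaries: int, replicas: int, shards_per_node: int) -> int:
    total_shards = primaries * (1 + replicas)  # primaries plus every replica copy
    return math.ceil(total_shards / shards_per_node)

index_settings = {"index": {"routing": {"allocation": {"total_shards_per_node": 1}}}}

print(min_nodes(primaries=3, replicas=2, shards_per_node=3))  # 3
print(min_nodes(primaries=3, replicas=2, shards_per_node=1))  # 9
```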

The New PDP

Reviews, Shop The Look, Products

Elasticsearch Express

● Easy deployment across multiple AZs

● Start serving data in less than 10 minutes

● Full data availability guarantee on each AZ

● Role separation of nodes

● Stable master election

● No manual configuration on AWS

● Automatic data backups in S3 bucket

● ES Monitoring dashboard template
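This sketch assumes that "Stable master election" refers to the split-brain protection used before Elasticsearch 7.0. In those versions, `discovery.zen.minimum_master_nodes` was set to a quorum of the master-eligible nodes: floor(n / 2) + 1.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    # strict majority of master-eligible nodes
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
# 1 1
# 2 2
# 3 2
# 5 3
```

One common layout is three dedicated master-eligible nodes, one per AZ. With a quorum of 2, losing a single AZ still leaves enough nodes to elect a master.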

www.search-relevancy-workshop.com

A hands-on workshop for building killer search applications with Elasticsearch.

?