Open Source Search Evolution

[Open Source] Search Evolution Otis Gospodnetić @otisg

description

From Gopher, WAIS, and Harvest to Lucene, Solr, SolrCloud, and Elasticsearch.

Transcript of Open Source Search Evolution

Page 1: Open Source Search Evolution

[Open Source] Search Evolution

Otis Gospodnetić @otisg

Page 2: Open Source Search Evolution

Today

Page 3: Open Source Search Evolution

The Early Days

Page 4: Open Source Search Evolution

Even Earlier Days

Page 5: Open Source Search Evolution

Foci

Timeline: 1974, 1995, now()

SEARCH

Page 6: Open Source Search Evolution

Otis Who?

SEARCH

Page 7: Open Source Search Evolution

Then & Now

1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS

2014: elasticsearch.

Page 8: Open Source Search Evolution

Still New?

elasticsearch.

2000

2004

2010

Page 9: Open Source Search Evolution

Dominance

[Open Source] Search Evolution

Page 10: Open Source Search Evolution

Big Cake

Big Data

Beyond Text

Memory Footprint

Distributed Model

Language Support

Indexing Speed, NRT

Relevance Algorithms

Page 11: Open Source Search Evolution

Language Support: Stemming

Page 12: Open Source Search Evolution

Language Support: Lemmatization

Page 13: Open Source Search Evolution

Language Support: Morphology

Page 14: Open Source Search Evolution

Language Support

Lucene 2004: ~20 languages

Lucene 2014: ~40 languages

most are stemmers
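The difference between the stemming and lemmatization slides can be sketched with a toy example. Everything below is hypothetical and deliberately tiny: the suffix list and the lemma dictionary are made up for illustration and are not taken from Lucene or any real analyzer.

```python
# Toy illustration only: real stemmers use far richer, language-specific rules.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, tiny suffix list


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary; a real lemmatizer relies on a full lexicon.
LEMMAS = {"ran": "run", "mice": "mouse", "better": "good"}


def toy_lemmatize(word):
    """Look the word up in the dictionary, falling back to the word itself."""
    return LEMMAS.get(word, word)


for w in ("searching", "indexes", "mice", "ran"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer handles regular suffixes cheaply but leaves irregular forms such as "mice" untouched, while the dictionary lookup resolves them at the cost of needing a lexicon per language.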

Page 15: Open Source Search Evolution

Relevance Models: VSM

TF-IDF

For term i in document j:

$w_{i,j} = tf_{i,j} \times \log(N / df_i)$

$tf_{i,j}$ = number of occurrences of i in j

$df_i$ = number of documents containing i

N = total number of documents
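A small worked example of the weight formula above, as a sketch rather than Lucene's actual scoring code. The three-document corpus is invented, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, already tokenized.
docs = {
    "d1": "open source search".split(),
    "d2": "open source software".split(),
    "d3": "search engine".split(),
}


def tf_idf(term, doc_id, corpus):
    """w = tf * log(N / df), using the natural log (an assumption)."""
    tf = Counter(corpus[doc_id])[term]                        # occurrences of term in doc
    df = sum(1 for tokens in corpus.values() if term in tokens)  # docs containing term
    if df == 0:
        return 0.0  # term appears nowhere; avoid dividing by zero
    return tf * math.log(len(corpus) / df)


print(tf_idf("engine", "d3", docs))  # rare term: log(3/1)
print(tf_idf("search", "d1", docs))  # more common term: log(3/2)
```

The rarer term "engine" gets the larger weight, which is the point of the inverse document frequency factor.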

Page 16: Open Source Search Evolution

Relevance Models: Pluggable

Lucene until 2011: 1 relevance model

Lucene 2014: 6 relevance models

got more?

Page 17: Open Source Search Evolution

Distributed Architecture

1 Master - N Slaves: good for scaling queries, not good for scaling data

Sharded index with replication: good for scaling queries, good for scaling data
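A minimal sketch of why a sharded index with replication scales both data and queries: each document is routed to one shard by hashing its id, and each shard is copied to more than one node. The function names, the CRC32 hash, and the consecutive-node placement are all assumptions for illustration; real systems use their own hash functions and placement logic.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Route a document to a shard by hashing its id (CRC32 is an arbitrary choice)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def nodes_for(shard, num_nodes, replicas):
    """Hypothetical placement: primary plus replicas on consecutive nodes.

    Copies land on distinct nodes only when replicas + 1 <= num_nodes.
    """
    return [(shard + i) % num_nodes for i in range(replicas + 1)]


shard = shard_for("id123", num_shards=4)
print(shard, nodes_for(shard, num_nodes=4, replicas=1))
```

Adding shards spreads the data across more machines, while adding replicas gives each query more copies to choose from.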

Page 18: Open Source Search Evolution

Indexing Speed & NRT Search

Page 19: Open Source Search Evolution

Memory Footprint

Page 20: Open Source Search Evolution

Beyond Text

Geospatial Search

Classifier

Recommendation Engine

Key Value Store

NoSQL DB

Analytical DB

Page 21: Open Source Search Evolution

Geospatial Search

Page 22: Open Source Search Evolution

Classifier

Page 23: Open Source Search Evolution

Recommender

Content Similarity

Collaborative Filtering
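Content similarity, the first of the two approaches above, can be sketched as cosine similarity between term-count vectors. The item catalog and function names are hypothetical, and this is not how any particular search engine implements its "more like this" feature.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def most_similar(target_id, catalog):
    """Return the id of the other item whose text is closest to the target's."""
    scored = [
        (cosine(catalog[target_id], tokens), item_id)
        for item_id, tokens in catalog.items()
        if item_id != target_id
    ]
    return max(scored)[1]


# Hypothetical catalog of item descriptions, already tokenized.
items = {
    "a": ["apple", "laptop", "silver"],
    "b": ["apple", "laptop", "gold"],
    "c": ["sony", "headphones"],
}
print(most_similar("a", items))
```

Collaborative filtering would instead compare items by which users interacted with them, which needs behavior data rather than item text.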

Page 24: Open Source Search Evolution

Key Value Store

id123 ⇒ manu:Apple desc:foo bar price:$111

id234 ⇒ manu:Sony desc:baz bam price:$222
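The two records above can be used to sketch how a search index doubles as a key value store: documents are fetched by id, and the same data stays searchable through an inverted index. The `TinyIndex` class and its methods are hypothetical names for illustration, not an API of Lucene, Solr, or Elasticsearch.

```python
from collections import defaultdict


class TinyIndex:
    """Toy store: lookup by id plus per-field term search."""

    def __init__(self):
        self.docs = {}                    # id -> stored fields
        self.postings = defaultdict(set)  # (field, token) -> set of ids

    def put(self, doc_id, fields):
        # Simplification: overwriting an existing id does not clean up old postings.
        self.docs[doc_id] = fields
        for field, value in fields.items():
            for token in str(value).lower().split():
                self.postings[(field, token)].add(doc_id)

    def get(self, doc_id):
        """Key value access: return the stored fields, or None if absent."""
        return self.docs.get(doc_id)

    def search(self, field, token):
        """Search access: ids of documents whose field contains the token."""
        return sorted(self.postings.get((field, token.lower()), set()))


index = TinyIndex()
index.put("id123", {"manu": "Apple", "desc": "foo bar", "price": "$111"})
index.put("id234", {"manu": "Sony", "desc": "baz bam", "price": "$222"})
print(index.get("id123"))
print(index.search("manu", "sony"))
```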

Page 25: Open Source Search Evolution

NoSQL DB

Distributed

Replicated

Horizontally Scalable

Fast Retrieval

Searchable?

Page 26: Open Source Search Evolution

Slicing & Dicing
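One common reading of slicing and dicing in a search engine is faceting: counting the matching documents per field value. The slide does not spell this out, so the sketch below is an assumption about what is meant, with invented records and a hypothetical `facet_counts` helper.

```python
from collections import Counter

# Hypothetical result set of product documents.
results = [
    {"manu": "Apple", "category": "laptop"},
    {"manu": "Apple", "category": "phone"},
    {"manu": "Sony", "category": "phone"},
]


def facet_counts(documents, field):
    """Count documents per distinct value of the given field."""
    return Counter(doc[field] for doc in documents if field in doc)


print(facet_counts(results, "manu"))

# "Dicing": filter by one facet value, then count another field.
phones = [doc for doc in results if doc["category"] == "phone"]
print(facet_counts(phones, "manu"))
```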

Page 27: Open Source Search Evolution

Analytical Queries

Page 28: Open Source Search Evolution

Gobble Gobble

If software is eating the world, then [open source] search is gobbling it.

And has been for years.