Full Text search in Django with Postgres

Full Text Search: Django + Postgres

Search is everywhere

Search expectations
● FAST
● Full Text search
● Linguistic support ("craziness | crazy")
● Ranking
● Fuzzy Searching
● More like this

Django

● SLOW
● `icontains` is a dumbed-down version of search
● Searching across tables is a pain
● No relevancy, ranking or similar words unless done manually
● No easy way to do fuzzy searching

Other Alternatives

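● Solr
● ElasticSearch
● AWS CloudSearch
● Sphinx
● etc.

If you're using one of these, consider Haystack, which gives Django a common search API over several such backends (e.g. Solr and Elasticsearch)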

Postgres Search

● FAST
● Simple to implement
● Supports search features like full text, ranking, boosting, fuzzy matching, etc.

Django

Live Example
● Search students by name or by course
● Use a South migration to create the tsvector column
● Store the title in a Search table
● Update the Search table via Celery when Student data is saved

https://github.com/Syerram/postgres_search

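GIN, GiST

● GiST is signature (hash) based and lossy, so matches must be rechecked against the table row; GIN is an inverted index stored as a B-tree of lexemes
● GIN lookups are about 3 times faster than GiST
● GIN indexes take about 3 times longer to build than GiST
● GIN indexes are moderately slower to update than GiST
● GIN indexes are about 2 to 3 times larger than GiST
A GIN index:
CREATE INDEX student_index ON students USING gin(to_tsvector('english', name));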

Source http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html
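To make the GIN description concrete: an inverted index maps each lexeme to the set of rows containing it, so an AND query only has to intersect a few posting sets instead of scanning every row. The following is a conceptual pure-Python sketch; the row ids and lexemes are made up, and a real GIN index keeps its posting lists in B-tree pages, not Python sets.

```python
from collections import defaultdict


def build_inverted_index(rows: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each lexeme to the set of row ids whose document contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, lexemes in rows.items():
        for lexeme in lexemes:
            index[lexeme].add(row_id)
    return dict(index)


def lookup_all(index: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Rows containing every term (an AND query): intersect the posting sets."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical rows, already reduced to lexemes.
rows = {
    1: ["kirk", "python"],
    2: ["spock", "python", "django"],
    3: ["kirk", "django"],
}
index = build_inverted_index(rows)
print(sorted(lookup_all(index, ["kirk", "django"])))  # [3]
```

Only the posting sets for "kirk" and "django" are touched; rows that contain neither term are never read.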

Full Text Search
● All text should be converted to a tsvector (to_tsvector) and queried with a tsquery (to_tsquery)

● Both reduce the text to lexemes
SELECT to_tsvector('How much wood would a woodchuck chuck If a woodchuck could chuck wood?')
-- 'chuck':7,12 'could':11 'much':2 'wood':3,13 'woodchuck':6,10 'would':4

● Both are required for searching to work on normal text

SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ 'chucks' -- False: the literal is cast to tsquery without stemming, but the vector holds 'chuck'

SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ to_tsquery('chucks') -- True: to_tsquery stems 'chucks' to 'chuck'
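The normalization step is what makes the second query match. Below is a toy Python model of it, not Postgres's parser: `toy_tsvector`, `toy_stem` and `toy_match` are hypothetical helpers, the stop-word list is a tiny assumed subset, and dropping a trailing "s" stands in for the real Snowball stemmer. For the first woodchuck sentence it happens to reproduce the positions listed in the to_tsvector output earlier.

```python
import re

# Assumed tiny stop-word list; the real 'english' configuration has many more.
STOP_WORDS = {"a", "how", "if"}


def toy_stem(word: str) -> str:
    """Naive stand-in for a stemmer: drop a single trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def toy_tsvector(text: str) -> dict[str, list[int]]:
    """Map each normalized lexeme to its 1-based word positions."""
    vector: dict[str, list[int]] = {}
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words, start=1):
        if word in STOP_WORDS:
            continue  # stop words are dropped but still use up a position
        vector.setdefault(toy_stem(word), []).append(position)
    return vector


def toy_match(vector: dict[str, list[int]], lexeme: str) -> bool:
    """True if the query lexeme occurs in the vector exactly as given."""
    return lexeme in vector


plural = toy_tsvector("How much wood would a woodchucks chucks If a "
                      "woodchucks could chucks woods?")
print(toy_match(plural, "chucks"))            # False: raw query, not stemmed
print(toy_match(plural, toy_stem("chucks")))  # True: query normalized like the text
```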

Full Text Search (Contd.)

● Technically you don't need an index, but without one, searches on large tables will be slow

SELECT * FROM students WHERE to_tsvector('english', name) @@ to_tsquery('english', 'Kirk');

● GIN or GiST index on a tsvector column
CREATE INDEX <index_name> ON <table_name> USING gin(<col_name>);

● Expression based (index expressions must use the two-argument form of to_tsvector, which names the configuration)
CREATE INDEX <index_name> ON <table_name> USING gin(to_tsvector('english', COALESCE(<col_name1>,'') || ' ' || COALESCE(<col_name2>,'')));

Boosting

● Boost certain results over others
● Results must still match; boosting only changes their order
● Use ts_rank to boost results, e.g.
…ORDER BY ts_rank(document, to_tsquery('python')) DESC
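A small Python sketch of the "still matching" point: filtering decides which documents come back, and the score only decides their order. Raw term counts stand in for ts_rank here, which is an assumption; ts_rank's real formula has its own weighting and optional length normalization. The `boosted` helper and the sample documents are hypothetical.

```python
def boosted(docs: dict[str, list[str]], term: str) -> list[str]:
    """Ids of documents containing term, most occurrences first.

    Documents without the term are never added: boosting only reorders matches.
    """
    counts = {doc_id: lexemes.count(term)
              for doc_id, lexemes in docs.items() if term in lexemes}
    # sorted() stays stable even with reverse=True, so ties keep their order.
    return sorted(counts, key=counts.get, reverse=True)


docs = {
    "intro": ["python", "basics"],
    "advanced": ["python", "django", "python"],
    "java": ["java"],
}
print(boosted(docs, "python"))  # ['advanced', 'intro']
```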

Ranking
● Importance of the search term within the document, e.g. a match in the title > description > tags

● Use setweight to assign importance to each field when preparing Document

e.g.
setweight(to_tsvector('english', post.title), 'A') ||
setweight(to_tsvector('english', post.description), 'B') ||
setweight(to_tsvector('english', post.tags), 'C')...
-- In the search query, use ts_rank to order by ranking
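To show why labels affect ordering, here is a simplified Python model: each lexeme carries the label of the field it came from, and a match scores the weight of its label. The weights are ts_rank's documented defaults for D, C, B and A; summing them per occurrence is a simplification of ts_rank, and `tag_weight`, `weighted_score` and the sample posts are hypothetical.

```python
# ts_rank's default weights for the labels D, C, B and A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def tag_weight(lexemes: list[str], label: str) -> list[tuple[str, str]]:
    """Hypothetical helper: label every lexeme of one field, like setweight()."""
    return [(lexeme, label) for lexeme in lexemes]


def weighted_score(document: list[tuple[str, str]], term: str) -> float:
    """Sum the label weight of each occurrence of term (simplified ranking)."""
    return sum(DEFAULT_WEIGHTS[label] for lexeme, label in document if lexeme == term)


# Concatenating labelled fields mirrors setweight(...) || setweight(...).
title_hit = tag_weight(["python", "tips"], "A") + tag_weight(["django"], "B")
body_hits = (tag_weight(["django", "tips"], "A")
             + tag_weight(["python"], "B")
             + tag_weight(["python"], "C"))

posts = {"title_hit": title_hit, "body_hits": body_hits}
ranking = sorted(posts, key=lambda name: weighted_score(posts[name], "python"),
                 reverse=True)
print(ranking)  # ['title_hit', 'body_hits']
```

A single title match (1.0) outranks two matches in lower-weighted fields (0.4 + 0.2).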

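Trigram

● A trigram is a group of 3 consecutive characters from a string
● Similarity between strings is measured by the number of trigrams they share, e.g.
"hello": "h", "he", "hel", "ell", "llo", "lo", and "o"
"hallo": "h", "ha", "hal", "all", "llo", "lo", and "o"
Number of matches: 4
(pg_trgm itself pads each word with two leading spaces and one trailing space, so it produces six trigrams for "hello", three of which are shared with "hallo".)

● Use similarity (from the pg_trgm extension) to find related terms. It returns a value from 0 to 1, where 0 means no shared trigrams and 1 means identical trigram sets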
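A pure-Python sketch of trigram similarity for a single word, assuming pg_trgm's padding (two spaces before, one after) and its shared-over-distinct ratio. It is not pg_trgm itself: the real extension also splits multi-word strings and ignores non-alphanumeric characters, which this sketch does not do.

```python
def trigrams(word: str) -> set[str]:
    """Trigrams of one lowercased word, padded with two spaces before and one after."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the distinct trigrams of both words (0 to 1)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("hello") & trigrams("hallo")))  # ['  h', 'llo', 'lo ']
print(round(similarity("hello", "hallo"), 3))         # 0.333
```

Dividing by the union rather than by one word's trigram count keeps the score symmetric, so similarity("hello", "hallo") equals similarity("hallo", "hello").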

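Soundex/Metaphone

● Oldest approach, and only good for English names (both functions come from the fuzzystrmatch extension)
● Soundex converts a name to a 4-character code, e.g. "Anthony == Anthoney" => "A535 == A535"

● Create the index on the soundex or metaphone expression itself; a plain B-tree index suits the equality lookup (GIN has no default operator class for text)

e.g. CREATE INDEX idx_name ON tb_name (soundex(col_name));

SELECT ... FROM tb_name WHERE soundex(col_name) = soundex('...')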
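A pure-Python sketch of American Soundex, to show how differently spelled names collapse to the same 4-character code. It follows the common textbook rules, including the one where H and W do not separate letters with the same digit; fuzzystrmatch's C implementation is not reproduced here and may differ on edge cases.

```python
# Letter groups that share a Soundex digit; vowels, H, W and Y get no digit.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return the first letter plus up to three digits, padded with zeros."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:  # skip letters repeating the last digit
            code += digit
            if len(code) == 4:
                break
        if letter not in "HW":  # H and W do not separate equal digits
            previous = digit
    return code.ljust(4, "0")


print(soundex("Anthony"), soundex("Anthoney"))  # A535 A535
```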

Pro & Con

Pros
● Quick implementation
● Much easier to change the document format and refresh the index
● Speed comparable to other search engines
● Cost effective

Cons
● Not as flexible as dedicated search engines like Solr
● Not as fast as Solr, though fast enough for humans
● Tied to Postgres
● Indexes can get pretty large, but so can search engine indexes

Django ORM

● Implements full text search:

from django.db import models
from djorm_pgfulltext.fields import VectorField
from djorm_pgfulltext.models import SearchManager

class StudentCourse(models.Model):
    ...
    search_index = VectorField()

    objects = SearchManager(
        fields=('student__user__name', 'course__name'),
        config='pg_catalog.english',  # this is the default
        search_field='search_index',  # this is the default
        auto_update_search_field=True,
    )

● StudentCourse.objects.search("David")

https://github.com/djangonauts/djorm-ext-pgfulltext

Next Steps

● Add Ranking, Boosting, Fuzzy Search to djorm pgfulltext

e.g.
StudentCourse.objects.search("David & Python").rank("Python")
StudentCourse.objects.fuzzy_search("Jython").rank("Python")
StudentCourse.objects.soundex("Davad").rank("Java")
and more

● Continue to add examples to postgres_search

Tips
● Use a separate DB if necessary, or use materialized views
● Don't index everything; limit your searchable data
● Analyze using EXPLAIN and ts_stat
● Create indexes on the fly using CREATE INDEX CONCURRENTLY
● Don't pull foreign key objects into search queries

• https://github.com/Syerram/postgres_search

• Stack: AngularJS, Django, Celery, Postgres

• Feel free to fork and send pull requests

@agileseeker, github/syerram, syerram.silvrback.com/
