Full Text search in Django with Postgres
Uploaded by syerram
![Page 1: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/1.jpg)
Full Text Search: Django + Postgres
![Page 2: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/2.jpg)
Search is everywhere
Search expectations
● FAST
● Full-text search
● Linguistic support (“craziness | crazy”)
● Ranking
● Fuzzy searching
● More like this
![Page 3: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/3.jpg)
Django
● SLOW
● `icontains` is a dumbed-down version of search
● Searching across tables is a pain
● No relevancy, ranking, or similar-word matching unless done manually
● No easy way to do fuzzy searching
![Page 4: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/4.jpg)
Other Alternatives
● Solr
● ElasticSearch
● AWS CloudSearch
● Sphinx
● etc.*
* If you’re using any of the above, use Haystack
![Page 5: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/5.jpg)
Postgres Search
● FAST
● Simple to implement
● Supports search features such as full-text search, ranking, boosting, and fuzzy matching
![Page 6: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/6.jpg)
Django
Live Example
● Search students by name or by course
● Use a South migration to create the tsvector column
● Store the title in the Search table
● Update the Search table via Celery when Student data is saved
https://github.com/Syerram/postgres_search
![Page 7: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/7.jpg)
GIN, GIST
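● GiST stores a lossy, hash-based signature of each document; GIN is an inverted index whose keys are kept in a B-tree
● GIN lookups are about 3× faster than GiST
● GIN indexes take about 3× longer to build than GiST and are somewhat slower to update
● GIN indexes are about 2–3× larger than GiST
A GIN index:
    CREATE INDEX student_index ON students USING gin(to_tsvector('english', name));
Source: http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html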
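To make the inverted-index idea behind GIN concrete, here is a toy version in plain Python. It is only a sketch of the concept (word → set of row ids), not how Postgres lays out a GIN index; the class name, method names, and sample rows are invented for illustration.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Maps each word to the set of row ids containing it (hypothetical helper)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_id, text):
        # Naive tokenizer: lowercase and split on whitespace. Postgres would
        # also stem words and drop stop words; this sketch does neither.
        for word in text.lower().split():
            self._postings[word].add(row_id)

    def search(self, *words):
        # A row matches only if it contains every query word (AND semantics).
        sets = [self._postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()


index = ToyInvertedIndex()
index.add(1, "James Kirk")
index.add(2, "Kirk Douglas")
index.add(3, "Leonard McCoy")
print(sorted(index.search("kirk")))           # [1, 2]
print(sorted(index.search("kirk", "james")))  # [1]
```

A lookup touches only the posting sets of the query words, which is why this kind of index is quick to read but costs more to build and update.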
![Page 8: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/8.jpg)
Full Text Search
● All text should be preprocessed into a tsvector and queried with a tsquery
● Both reduce the text to lexemes
    SELECT to_tsvector('How much wood would a woodchuck chuck If a woodchuck could chuck wood?');
    "'chuck':7,12 'could':11 'much':2 'wood':3,13 'woodchuck':6,10 'would':4"
● Both are required for searching to work on normal text
    -- A bare string is cast to a tsquery without being normalized, so 'chucks' never meets the lexeme 'chuck'
    SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ 'chucks'; -- False
    SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ to_tsquery('chucks'); -- True
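The False/True pair above comes down to whether the query is normalized the same way as the document. The following plain-Python sketch mimics that with a deliberately crude normalizer; the stop-word list and the plural-stripping rule are assumptions made for illustration and are far simpler than what the Postgres `english` configuration does.

```python
import re

STOP_WORDS = {"a", "how", "if"}  # tiny illustrative subset, not Postgres's list


def toy_lexemes(text):
    """Crude stand-in for to_tsvector: lowercase, drop stop words, strip a plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOP_WORDS}


doc = toy_lexemes("How much wood would a woodchucks chucks "
                  "If a woodchucks could chucks woods?")

# Comparing against the raw word fails: the document holds 'chuck', not 'chucks'.
print("chucks" in doc)               # False
# Normalizing the query the same way (the role of to_tsquery) makes it match.
print(toy_lexemes("chucks") <= doc)  # True
```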
![Page 9: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/9.jpg)
Full Text Search (Contd.)
● Technically you don’t need an index, but without one searches on large tables will be slow
    SELECT * FROM students WHERE to_tsvector('english', name) @@ to_tsquery('english', 'Kirk');
● GIN or GiST index on a tsvector column
    CREATE INDEX <index_name> ON <table_name> USING gin(<col_name>);
● Expression-based index (use the two-argument to_tsvector here, because index expressions must be immutable)
    CREATE INDEX <index_name> ON <table_name> USING gin(to_tsvector('english', COALESCE(<col_1>, '') || ' ' || COALESCE(<col_2>, '')));
![Page 10: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/10.jpg)
Boosting
● Boost certain results over others
● Boosted results must still match the query
● Use ts_rank to boost results, e.g.
    … ORDER BY ts_rank(document, to_tsquery('python')) DESC
![Page 11: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/11.jpg)
Ranking
● Importance of the search term within the document, e.g. a term found in the title > description > tag
● Use setweight to assign importance to each field when preparing the document, e.g.
    setweight(to_tsvector('english', post.title), 'A') ||
    setweight(to_tsvector('english', post.description), 'B') ||
    setweight(to_tsvector('english', post.tags), 'C')
    ...
    -- In the search query, use ts_rank to order by ranking
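The effect of the A/B/C labels can be sketched in plain Python. The weights below are the defaults ts_rank applies to labels D, C, B, and A (0.1, 0.2, 0.4, 1.0); the scoring function itself is a simplification invented for illustration and is not ts_rank's actual formula, and the sample posts are made up.

```python
# Default ts_rank weights per label; the scoring below is a toy, not ts_rank.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def toy_rank(document, term):
    """Sum the label weight of every occurrence of `term` across the fields."""
    term = term.lower()
    score = 0.0
    for label, text in document.items():
        score += WEIGHTS[label] * text.lower().split().count(term)
    return score


# Each post maps a weight label to a field: A = title, B = description, C = tags.
posts = {
    "title hit": {"A": "Python tips", "B": "Short notes", "C": "misc"},
    "tag hit": {"A": "Weekly notes", "B": "Short notes", "C": "python"},
}
ranked = sorted(posts, key=lambda name: toy_rank(posts[name], "python"), reverse=True)
print(ranked)  # ['title hit', 'tag hit']
```

Both posts match the term, but the one that matches in the title sorts first, which is the behavior the title > description > tag ordering asks for.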
![Page 12: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/12.jpg)
Trigram
● A trigram is a group of 3 consecutive characters from a string
● Similarity between strings is measured by the number of trigrams they share, e.g.
    "hello": "h", "he", "hel", "ell", "llo", "lo", and "o"
    "hallo": "h", "ha", "hal", "all", "llo", "lo", and "o"
    Number of matches: 4
● Use similarity to find related terms. It returns a value between 0 and 1, where 0 means no match and 1 means an exact match
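The hello/hallo count can be reproduced with a few lines of Python. This sketch follows the slide's scheme, which amounts to padding the word with two spaces on each side, and scores similarity as shared trigrams over distinct trigrams. The function names are made up, and pg_trgm pads words slightly differently (two spaces before, one after), so its `similarity()` values need not equal these.

```python
def trigrams(word):
    """Set of 3-character windows over the word padded with two spaces per side."""
    padded = "  " + word.lower() + "  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def toy_similarity(a, b):
    """Shared trigrams divided by distinct trigrams overall (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


shared = trigrams("hello") & trigrams("hallo")
print(len(shared))                       # 4
print(toy_similarity("hello", "hallo"))  # 0.4
print(toy_similarity("hello", "hello"))  # 1.0
```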
![Page 13: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/13.jpg)
Soundex/Metaphone
● Soundex is the oldest of these algorithms and only works well for English names
● Converts a name to a string of length 4, e.g. “Anthony == Anthoney” => “A535 == A535”
● Create the index itself on the Soundex or Metaphone expression (a plain B-tree index; GIN has no default operator class for text), e.g.
    CREATE INDEX idx_name ON tb_name (soundex(col_name));
    SELECT ... FROM tb_name WHERE soundex(col_name) = soundex('...');
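For readers who want to see where "A535" comes from, here is a small Python sketch of the standard American Soundex rules. It is written for illustration only; in Postgres you would call the `soundex()` function from the fuzzystrmatch extension instead of reimplementing it.

```python
# Letter groups that share a Soundex digit (1 through 6).
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """American Soundex: the first letter followed by three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            result += code       # new consonant sound: record its digit
        if c not in "hw":        # h and w do not separate repeated codes
            previous = code      # vowels reset `previous` to ""
    return (result + "000")[:4]  # pad or truncate to length 4


print(soundex("Anthony"), soundex("Anthoney"))  # A535 A535
```

Because both spellings collapse to the same code, an equality test on the Soundex value finds either one.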
![Page 14: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/14.jpg)
Pro & Con
Pros
● Quick implementation
● Much easier to change the document format and refresh the index
● Speed comparable to other search engines
● Cost effective
Cons
● Not as flexible as pure search engines such as Solr
● Not as fast as Solr, though fast enough for human users
● Tied to Postgres
● Indexes can get pretty large, but so can search engine indexes
![Page 15: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/15.jpg)
Django ORM
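● djorm-ext-pgfulltext implements full-text search for the Django ORM
    from django.db import models
    from djorm_pgfulltext.fields import VectorField
    from djorm_pgfulltext.models import SearchManager

    class StudentCourse(models.Model):
        ...
        search_index = VectorField()  # tsvector column holding the document
        objects = SearchManager(
            fields=('student__user__name', 'course__name'),
            config='pg_catalog.english',    # this is default
            search_field='search_index',    # this is default
            auto_update_search_field=True,
        )
● StudentCourse.objects.search("David")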
https://github.com/djangonauts/djorm-ext-pgfulltext
![Page 16: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/16.jpg)
Next Steps
● Add ranking, boosting, and fuzzy search to djorm-ext-pgfulltext, e.g. (proposed API)
    StudentCourse.objects.search("David & Python").rank("Python")
    StudentCourse.objects.fuzzy_search("Jython").rank("Python")
    StudentCourse.objects.soundex("Davad").rank("Java")
    & more
● Continue to add examples to postgres_search
![Page 17: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/17.jpg)
Tips
● Use a separate DB if necessary, or use materialized views
● Don’t index everything; limit your searchable data
● Analyze using EXPLAIN and ts_stat
● Create indexes on the fly using CREATE INDEX CONCURRENTLY
● Don’t pull foreign-key objects in search
![Page 18: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/18.jpg)
Code
• https://github.com/Syerram/postgres_search
• Stack: AngularJS, Django, Celery, Postgres
• Feel free to Fork, Pull Request
![Page 19: Full Text search in Django with Postgres](https://reader035.fdocuments.us/reader035/viewer/2022073017/545d3051af7959c8098b4ab3/html5/thumbnails/19.jpg)
@agileseeker, github/syerram, syerram.silvrback.com/
Sai