Steam Learn: Full text search with PostgreSQL
7th of August 2014
Full Text Search With PostgreSQL
by Vincent Desmares
Summary
1) What is a Full Text Search
2) A basic PostgreSQL example
3) FTS advanced features
4) FTS advanced configuration
What is a Full Text Search?
● Searching for documents
● Use the whole document
● Be able to set the precision
Why use a Full Text Search?
● Basic methods are = and ILIKE
○ No linguistic support in these text operators
■ “countries” should match “country”
○ Can't compare the relevance of matches
○ Basic search is too slow for complex queries
● Why PostgreSQL?
○ Native and sufficiently performant
FTS basic usage
Search for:
● A document
● A business object
● A row
Diagram: a library / database divided into relevant documents and other documents
What is a search?
● A query to run
● On a parsed text
SELECT *
FROM document
WHERE to_tsvector(title || ' ' || content) @@ to_tsquery('Car');
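As a self-contained sketch of this query: the table layout and sample rows below are assumptions made for illustration (only the table and column names come from the slides), and the explicit 'english' configuration is also an assumption, added so the result does not depend on the server's default_text_search_config.

```sql
-- Hypothetical minimal schema; only the names document, title and content
-- come from the slides.
CREATE TABLE document (
    id      serial PRIMARY KEY,
    title   text NOT NULL,
    content text NOT NULL
);

-- Made-up sample rows.
INSERT INTO document (title, content) VALUES
    ('Cars',  'Two cars were parked outside.'),
    ('Boats', 'A boat was moored in the harbour.');

-- The ' ' separator keeps the last word of the title from being glued
-- to the first word of the content.
SELECT id, title
FROM document
WHERE to_tsvector('english', title || ' ' || content)
      @@ to_tsquery('english', 'Car');
```

Only the first row should match, because "cars" and "Car" both reduce to the lexeme 'car'.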
How does to_tsvector work?
● Text is split into tokens, which are normalized into lexemes stored with their positions; stop words are dropped
# select * from to_tsvector('Hello my name is vincent. I am very happy to be vincent.')
"'happi':9 'hello':1 'name':3 'vincent':5,12"
How does to_tsquery work?
● Parses a formatted query
○ Words must be joined by operators such as &, | or !
# select to_tsquery('Vincent is Happy')
ERROR: syntax error in tsquery: "Vincent is Happy"
# select to_tsquery('Vincent & is & Happy')
"'vincent' & 'happi'"
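For free-form user input, which rarely contains operators, PostgreSQL also provides plainto_tsquery. The sketch below is not from the slides; it assumes the 'english' configuration.

```sql
-- plainto_tsquery treats its input as plain text: it normalizes the words,
-- drops stop words and joins what remains with &, so unformatted input
-- does not raise a syntax error.
SELECT plainto_tsquery('english', 'Vincent is Happy');
```

This should yield the same 'vincent' & 'happi' query as the hand-formatted version.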
Result
● The @@ operator is to FTS what = is to plain comparison: it returns true when a tsvector matches a tsquery
# select content from document where to_tsquery('Vincent & is & Happy') @@ to_tsvector(content) limit 1
'Hello my name is vincent. I am very happy to be vincent.'
And it’s faaaaaaaaaaast
# select count(*) FROM document;
count | 11909475
# select count(*) FROM document where content_vector @@ to_tsquery('countries');
count | 424813
Time: 454.709 ms
# select count(*) FROM document where content ILIKE '%countries%';
count | 116734
Time: 11672.649 ms
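Why is it faaaaaaaaaaaast?
● Indexed
● GIN (Generalized Inverted Index)
○ Slower to build, faster to search
● GiST (Generalized Search Tree)
○ Quicker to update, slower to search
● An expression index needs the two-argument to_tsvector with an explicit configuration ('english' is assumed here)
CREATE INDEX document_tsvector_idx ON document USING gin (to_tsvector('english', title || ' ' || content));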
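The timing slide queries a content_vector column rather than an expression. One possible way to maintain such a column is sketched below; how the original database populated it is not stated in the slides, so the trigger, the index name and the 'english' configuration are all assumptions.

```sql
-- Store the precomputed tsvector next to the text (column name taken from
-- the timing slide; everything else here is illustrative).
ALTER TABLE document ADD COLUMN content_vector tsvector;

-- Backfill existing rows; coalesce guards against NULL title or content.
UPDATE document
SET content_vector = to_tsvector('english',
        coalesce(title, '') || ' ' || coalesce(content, ''));

-- Index the stored column directly.
CREATE INDEX document_content_vector_idx
    ON document USING gin (content_vector);

-- Keep the column current on every insert or update with the built-in
-- tsvector_update_trigger function.
CREATE TRIGGER document_content_vector_update
BEFORE INSERT OR UPDATE ON document
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(content_vector, 'pg_catalog.english', title, content);
```

Queries then filter on content_vector @@ to_tsquery(...) and never re-parse the text at search time.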
Advanced Features
Ranked results
# select content, ts_rank_cd(to_tsvector(content), to_tsquery('Happy'), 1|8) as rank
from document
where to_tsquery('Happy') @@ to_tsvector(content)
ORDER BY rank DESC
Google style results
# SELECT id, ts_headline(content, to_tsquery('Happy'),
'StartSel=<b>, StopSel=</b>, MaxWords=5, MinWords=4, ShortWord=3, HighlightAll=FALSE, MaxFragments=0, FragmentDelimiter=" ... "'
) FROM document WHERE to_tsquery('Happy') @@ to_tsvector(content)
"<b>Vincent</b> is <b>happy</b> because … very <b>happy</b> to be with ..."
Advanced Configuration
Simplest workflow
Original content: Vincent is very very Happy
ts_vector: 'happy':5 'is':2 'very':3,4 'vincent':1
Useless words? Stop Words!
● Just a file with a list of words
● Must be in the postgres tsearch directory, with a .stop extension
CREATE TEXT SEARCH DICTIONARY documentIspell ( TEMPLATE = ispell, stopwords = 'my_file');
/usr/share/postgresql/9.3/tsearch_data/my_file.stop
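To try stop words without writing any file, a dictionary based on the simple template can reuse the english stop list that ships with PostgreSQL. This swaps the slide's ispell template for the simple one, and the dictionary name is hypothetical.

```sql
-- Hypothetical dictionary name; STOPWORDS = english refers to the
-- english.stop file installed with PostgreSQL.
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE  = pg_catalog.simple,
    STOPWORDS = english
);

-- ts_lexize runs a single word through a single dictionary.
SELECT ts_lexize('public.simple_dict', 'The');  -- stop word: empty array
SELECT ts_lexize('public.simple_dict', 'YeS');  -- kept, lower-cased: {yes}
```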
With stop words
Original content: Vincent is very very Happy
ts_vector: 'happy':5 'very':3,4 'vincent':1
● Remove useless words
With only “is” in the .stop file
Custom dictfile
● Just a file with a list of words
● Contains suffix/affix metadata (can be custom)
● dictfile takes the base name; PostgreSQL appends the .dict extension
CREATE TEXT SEARCH DICTIONARY documentIspell ( [...]
dictfile = 'my'
);
# cat my.dict | grep fry
fry/NGDS
# cat en_us.affix | grep S
SFX S Y 4
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxzh]
SFX S 0 s [^sxzhy]
With linguistic dictionaries
Original content: Vincent is very very Happy
ts_vector: 'happi':5 'very':3,4 'vincent':1
● Remove useless words
● Reduce words to their roots
With a custom .affix and .dict
The thesaurus
● Link business terms
# cat /var/postgresql/9.3/tsearch_data/inovia_learn.ths
McDo : *McDonalds
Mc do : *McDonalds
very happy : *blessed
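A thesaurus file only takes effect once a dictionary and a text search configuration refer to it. The sketch below shows one way to wire it up; the dictionary and configuration names are hypothetical, and the choice of pg_catalog.simple as subdictionary is an assumption. With a stemming subdictionary such as english_stem, "very" is a stop word, and stop words inside a sample phrase have to be written as ?.

```sql
-- Assumes inovia_learn.ths is present in the server's tsearch_data directory.
CREATE TEXT SEARCH DICTIONARY inovia_thesaurus (
    TEMPLATE   = thesaurus,
    DictFile   = inovia_learn,        -- base name; .ths is appended
    Dictionary = pg_catalog.simple    -- normalizes the sample phrases
);

-- Hypothetical configuration, created as a copy of the built-in english one.
CREATE TEXT SEARCH CONFIGURATION inovia_learn (COPY = pg_catalog.english);

-- Dictionaries are consulted in order: thesaurus first, then the stemmer.
ALTER TEXT SEARCH CONFIGURATION inovia_learn
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH inovia_thesaurus, english_stem;

-- Use the configuration explicitly when building vectors.
SELECT to_tsvector('inovia_learn', 'Vincent is very very Happy');
```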
The final chain
Original content: Vincent is very very Happy
ts_vector: 'blessed':4 'very':3 'vincent':1
● Remove useless words
● Reduce words to their roots
● Reduce words to their synonyms
With a custom .ths
How to debug?
# SELECT * FROM ts_debug('Vincent is very very happy')
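ts_debug returns several columns per token. A narrower projection, together with ts_lexize for checking one dictionary in isolation, is often easier to read. This is a sketch assuming the built-in english configuration and its english_stem dictionary.

```sql
-- For each token: its type, the dictionary that recognized it,
-- and the lexemes that dictionary produced.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'Vincent is very very happy');

-- Test a single dictionary on a single word.
SELECT ts_lexize('english_stem', 'countries');  -- should give {countri}
SELECT ts_lexize('english_stem', 'is');         -- stop word: empty array
```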
The drawbacks (yes, last slide)
● Transformed words (lexemes) are indexed
○ Only full or prefix matches are available
○ Solution: autocomplete
● Business terms have custom meanings
○ Ex: fry (third-person singular simple present: fries)
○ Solution: custom dictionary
● Indexes take a long time to build
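The matching limitation can be observed directly. In this sketch, which assumes the 'english' configuration, the :* suffix in a tsquery requests prefix matching; there is no equivalent for matching in the middle or at the end of a lexeme.

```sql
SELECT
    -- 'vinc' is a prefix of the lexeme 'vincent': expected to match.
    to_tsvector('english', 'Hello my name is vincent')
        @@ to_tsquery('english', 'vinc:*') AS prefix_match,
    -- 'cent' only appears at the end of 'vincent': expected not to match.
    to_tsvector('english', 'Hello my name is vincent')
        @@ to_tsquery('english', 'cent')   AS ending_match;
```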
Thank you!
Sources:
http://www.postgresql.org/docs/9.3/static/textsearch.html
http://en.wikipedia.org/wiki/Full_text_search
http://en.wikipedia.org/wiki/Precision_and_recall
For online questions, please leave a comment on the article.
Questions?
Join the community! (in Paris)
Social networks:
● Follow us on Twitter: https://twitter.com/steamlearn
● Like us on Facebook: https://www.facebook.com/steamlearn
SteamLearn is an Inovia initiative: inovia.fr
Would you like to be in the audience? Contact us at [email protected]