PostgreSQL developer meeting: GIN generalization 1
GIN generalization
Alexander Korotkov, Oleg Bartunov
Work supported by Federal Unitary Enterprise Scientific-Research Institute of Economics, Informatics and Control Systems
Presentation plan
● New storage (additional information + compression)
● Fast scan (skip parts of posting trees)
● Ordering in index
● Planner optimization
● Applications
Additional information storage
Posting list
Additional information interface
Datum *extractValue(
    Datum itemValue,
    int32 *nkeys,
    bool **nullFlags,
    Datum *addInfo,
    bool *addInfoIsNull
)

bool consistent(
    bool check[],
    StrategyNumber n,
    Datum query,
    int32 nkeys,
    Pointer extra_data[],
    bool *recheck,
    Datum queryKeys[],
    bool nullFlags[],
    Datum addInfo[],
    bool addInfoIsNull[]
)

void config(
    GinConfig *config
)
ItemPointer
typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;
6 bytes!
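As a minimal sketch of where the 6 bytes come from, the layout above can be mirrored with standard fixed-width types. The types here are local stand-ins rather than the PostgreSQL headers, and `block_number` is a hypothetical helper added only for illustration.

```c
#include <stdint.h>

/* Local mirror of the slide's layout; not the PostgreSQL headers. */
typedef uint16_t OffsetNumber;

typedef struct BlockIdData
{
    uint16_t bi_hi;             /* high 16 bits of the block number */
    uint16_t bi_lo;             /* low 16 bits of the block number */
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData  ip_blkid;      /* 4 bytes */
    OffsetNumber ip_posid;      /* 2 bytes */
} ItemPointerData;

/* Hypothetical helper: reassemble the 32-bit block number. */
static uint32_t
block_number(const ItemPointerData *ip)
{
    return ((uint32_t) ip->ip_blkid.bi_hi << 16) | ip->ip_blkid.bi_lo;
}
```

Because every member is a 16-bit integer, the struct needs only 2-byte alignment, so on common platforms no padding is added and every uncompressed item pointer costs 6 bytes.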
BlockIdData compression
BlockNumber delta is a small number: use varbyte encoding
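A minimal sketch of varbyte encoding, assuming the common scheme of 7 data bits per byte with the high bit marking "more bytes follow". The function names are illustrative, and the exact byte layout used by the patch is not given on the slide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode value as 7-bit groups, lowest group first. The high bit of each
 * byte is set when another byte follows. Returns the number of bytes
 * written (at most 5 for a 32-bit value).
 */
static size_t
varbyte_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;

    while (value >= 0x80)
    {
        out[n++] = (uint8_t) (value | 0x80);    /* low 7 bits + continuation */
        value >>= 7;
    }
    out[n++] = (uint8_t) value;                 /* final byte, high bit clear */
    return n;
}

/* Decode one value; returns the number of bytes consumed. */
static size_t
varbyte_decode(const uint8_t *in, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t   n = 0;
    uint8_t  byte;

    do
    {
        byte = in[n++];
        result |= (uint32_t) (byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);

    *value = result;
    return n;
}
```

With this scheme a delta below 128 takes a single byte instead of the 4 bytes of a full BlockIdData.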
OffsetNumber compression
O0-O15 – OffsetNumber bits
N – Additional information NULL bit
Offset is typically low: use varbyte encoding
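One way to combine the offset bits and the N bit in a single varbyte value is sketched below. The bit placement is an assumption (N as the lowest bit, the offset bits above it), since the slide's figure is not reproduced here, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout, not taken from the slide's figure: the NULL flag is the
 * lowest bit and the offset occupies the bits above it. The combined value
 * is then varbyte-encoded (7 data bits per byte, high bit = continuation).
 */
static size_t
offset_encode(uint16_t offset, bool add_info_is_null, uint8_t *out)
{
    uint32_t value = ((uint32_t) offset << 1) | (add_info_is_null ? 1u : 0u);
    size_t   n = 0;

    while (value >= 0x80)
    {
        out[n++] = (uint8_t) (value | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t) value;
    return n;
}

/* Inverse of offset_encode; returns the number of bytes consumed. */
static size_t
offset_decode(const uint8_t *in, uint16_t *offset, bool *add_info_is_null)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t   n = 0;
    uint8_t  byte;

    do
    {
        byte = in[n++];
        value |= (uint32_t) (byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);

    *add_info_is_null = (value & 1u) != 0;
    *offset = (uint16_t) (value >> 1);
    return n;
}
```

Under this assumed layout, offsets up to 63 fit in one byte together with the NULL bit, and the largest 16-bit offset needs three bytes.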
New storage example
GIN partial match
Without additional information, partial match behaves as follows:
● Save ItemPointers of all matching entries into bitmap
● Use bitmap instead of partial match entry
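The two steps above can be sketched as follows, under simplifying assumptions: item pointers are flattened to small integers below 64 so that a single `uint64_t` can serve as the bitmap, and a partial match is a plain string prefix test. `Entry` and `partial_match_bitmap` are hypothetical names, not GIN internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ITEMS 64            /* capacity of the toy bitmap */

/* Hypothetical index entry: a key plus its posting list of item ids. */
typedef struct Entry
{
    const char     *key;
    const uint32_t *items;
    size_t          nitems;
} Entry;

/* OR the posting lists of all entries whose key starts with prefix. */
static uint64_t
partial_match_bitmap(const Entry *entries, size_t nentries, const char *prefix)
{
    uint64_t bitmap = 0;
    size_t   plen = strlen(prefix);

    for (size_t i = 0; i < nentries; i++)
    {
        if (strncmp(entries[i].key, prefix, plen) != 0)
            continue;           /* entry does not match the partial key */

        for (size_t j = 0; j < entries[i].nitems; j++)
        {
            if (entries[i].items[j] < MAX_ITEMS)
                bitmap |= UINT64_C(1) << entries[i].items[j];
        }
    }
    return bitmap;
}
```

The bitmap records only which items matched, not which entry each item came from, so per-entry data has nowhere to go in this scheme.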
GIN partial match with additional information
Datum joinAddInfo(
    Datum addInfos[]
)
Join additional information for all entries of each ItemPointer together?
Where to store it? It might be much larger than the bitmap.
Fast scan
entry1 && entry2
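A minimal sketch of the skipping idea for `entry1 && entry2`, assuming each posting list is a sorted array of item ids. Real posting trees are multi-level page structures; the binary search in the hypothetical `skip_to` only stands in for descending the tree to the first item not less than the target.

```c
#include <stddef.h>
#include <stdint.h>

/* First index in [lo, n) whose item is >= target (binary search). */
static size_t
skip_to(const uint32_t *list, size_t lo, size_t n, uint32_t target)
{
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (list[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Intersect two sorted posting lists. Instead of stepping one item at a
 * time, the list that is behind jumps directly to the other list's
 * current item. Returns the number of items written to out.
 */
static size_t
intersect(const uint32_t *a, size_t na,
          const uint32_t *b, size_t nb, uint32_t *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb)
    {
        if (a[i] == b[j])
        {
            out[n++] = a[i];
            i++;
            j++;
        }
        else if (a[i] < b[j])
            i = skip_to(a, i + 1, na, b[j]);
        else
            j = skip_to(b, j + 1, nb, a[i]);
    }
    return n;
}
```

When one entry is rare and the other frequent, most of the frequent list is never visited, which is the point of skipping parts of posting trees.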
Fast scan: interface
bool preConsistent(
    bool check[],
    StrategyNumber n,
    Datum query,
    int32 nkeys,
    Pointer extra_data[]
)
preConsistent: a variant of consistent that is allowed to return false positives
Sorting in index
Uses KNN infrastructure. Works as follows:
1. Scan + calc rank
2. Sort
3. Return using gingettuple one by one
Is this the right design at all? Should we do sorting in a separate node?
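The three steps above can be sketched in miniature. This assumes the scan has already produced an array of items with their ranks (step 1), and `scan_begin` and `scan_gettuple` are hypothetical stand-ins for the index access method calls, not the real gingettuple interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* An item found by the scan together with its computed rank. */
typedef struct RankedItem
{
    uint32_t item;
    double   rank;
} RankedItem;

typedef struct SortedScan
{
    RankedItem *items;
    size_t      nitems;
    size_t      next;           /* position of the next item to return */
} SortedScan;

/* Ascending by rank, as with a KNN distance; ties broken by item id. */
static int
cmp_rank(const void *a, const void *b)
{
    const RankedItem *x = a;
    const RankedItem *y = b;

    if (x->rank != y->rank)
        return (x->rank < y->rank) ? -1 : 1;
    if (x->item != y->item)
        return (x->item < y->item) ? -1 : 1;
    return 0;
}

/* Step 2: sort everything the scan collected. */
static void
scan_begin(SortedScan *scan, RankedItem *items, size_t nitems)
{
    qsort(items, nitems, sizeof(RankedItem), cmp_rank);
    scan->items = items;
    scan->nitems = nitems;
    scan->next = 0;
}

/* Step 3: hand out one item per call; returns 0 when exhausted. */
static int
scan_gettuple(SortedScan *scan, uint32_t *item)
{
    if (scan->next >= scan->nitems)
        return 0;
    *item = scan->items[scan->next++].item;
    return 1;
}
```

In this shape the whole result set is ranked and sorted before the first item is returned, even under a small LIMIT, which is one reason to ask whether sorting belongs in a separate node.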
Sorting in index: interface
float8 calcRank(
    bool check[],
    StrategyNumber n,
    Datum query,
    int32 nkeys,
    Pointer extra_data[],
    bool *recheck,
    Datum queryKeys[],
    bool nullFlags[],
    Datum addInfo[],
    bool addInfoIsNull[]
)
Planner optimization: before
test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y) LIMIT 10;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443 rows=10 loops=1)
   Output: x, y, (slow_func(x, y))
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25 rows=1000 width=16) (actual time=11.341..103.422 rows=10 loops=1)
         Output: x, y, slow_func(x, y)
 Total runtime: 103.524 ms
(5 rows)
Planner optimization: after
test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y) LIMIT 10;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093 rows=10 loops=1)
   Output: x, y
   ->  Index Scan using test_idx on public.test  (cost=0.00..309.25 rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1)
         Output: x, y
 Total runtime: 0.164 ms
(5 rows)
Applications
● Better FTS
● Array similarity search (better smlr)
● Positioned n-grams (better pg_trgm)
● Index on regexes (inverted task)
Better FTS
● tsvector >< tsquery = 1/ts_rank(tsvector, tsquery)
● gin_tsvector_config — set bytea additional information type
● gin_extract_tsquery — extract compressed word positions as additional information
● gin_tsquery_pre_consistent — evaluate tsquery ignoring NOTs
● gin_tsquery_relevance — calculate ><
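To illustrate what "evaluate tsquery ignoring NOTs" can mean, here is a minimal sketch over a hypothetical query tree. `QueryNode` and `pre_consistent` are illustrative only and do not reflect the real tsquery representation or the signature of gin_tsquery_pre_consistent. A NOT node is treated as possibly true, so the result can contain false positives but never rejects an item that the exact check would accept.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { Q_VAL, Q_AND, Q_OR, Q_NOT } QueryOp;

/* Hypothetical query tree node. */
typedef struct QueryNode
{
    QueryOp                 op;
    int                     entry;  /* index into check[] for Q_VAL */
    const struct QueryNode *left;   /* operand of Q_NOT, left of AND/OR */
    const struct QueryNode *right;  /* right of AND/OR */
} QueryNode;

/*
 * Evaluate the query against the entries known to be present, treating
 * every NOT subtree as "may be true".
 */
static bool
pre_consistent(const QueryNode *node, const bool check[])
{
    switch (node->op)
    {
        case Q_VAL:
            return check[node->entry];
        case Q_AND:
            return pre_consistent(node->left, check) &&
                   pre_consistent(node->right, check);
        case Q_OR:
            return pre_consistent(node->left, check) ||
                   pre_consistent(node->right, check);
        case Q_NOT:
            return true;        /* NOT is ignored: assume it may hold */
    }
    return true;
}
```

For the query `a & !b`, an item containing both `a` and `b` passes this check even though the exact evaluation would reject it; an item without `a` is rejected immediately.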
Better FTS: query
SELECT itemid, title
FROM items
WHERE fts @@ plainto_tsquery('russian', 'квартира арбат')
ORDER BY fts >< plainto_tsquery('russian', 'квартира арбат')
LIMIT 10;
Better FTS: plan
 Limit  (cost=40.00..80.22 rows=10 width=400) (actual time=1.559..1.5
   Buffers: shared hit=1236
   ->  Index Scan using fts_idx on items  (cost=40.00..7425.31 rows=1
         Index Cond: (fts @@ '''квартир'' & ''арбат'''::tsquery)
         Order By: (fts >< '''квартир'' & ''арбат'''::tsquery)
         Buffers: shared hit=1236
 Total runtime: 1.579 ms
Avito.ru: testing
                     Without patch   With patch   With patch, without tsvector   Sphinx
Table size           6.0 GB          6.0 GB       2.87 GB                        -
Index size           1.29 GB         1.27 GB      1.27 GB                        1.12 GB
Index build time     216 sec         303 sec      718 sec                        180 sec*
Queries in 8 hours   3.0 M           42.7 M       42.7 M                         32.0 M
It flies!
Thank you for your attention!
Questions?