Benchmark of KVP vs. Hstore - Slides


  • Key/Value Pair versus hstore - Benchmarking Entity-Attribute-Value Structures in PostgreSQL.

    Michel Ott

    June 17, 2011, University of Applied Science Rapperswil

  • What is KVP?

    A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data.

    (Source: Techtarget, http://searchenterprisedesktop.techtarget.com/definition/key-value-pair)

    A collection of key-value pairs is an open-ended data structure that allows for future extension without modifying existing code or data.

    (Source: Wikipedia, http://en.wikipedia.org/wiki/Attribute-value_pair)

    Example

        { data : [
            { amenity : restaurant, name : Godman },
            { amenity : university, name : Harvard University }
        ] }
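    The "open-ended" property can be illustrated with a minimal Python sketch, assuming records are modelled as plain dicts. The helper find() and the third record (with its "wifi" key) are hypothetical and only serve the illustration.

```python
# Illustrative only: each record is a dict of key/value pairs.
data = [
    {"amenity": "restaurant", "name": "Godman"},
    {"amenity": "university", "name": "Harvard University"},
]


def find(records, key, value):
    """Return all records whose entry for `key` equals `value`."""
    return [record for record in records if record.get(key) == value]


# A previously unforeseen key ("wifi", hypothetical) is added without
# changing find() or any of the existing records.
data.append({"amenity": "cafe", "name": "Example Cafe", "wifi": "yes"})

restaurants = find(data, "amenity", "restaurant")
with_wifi = find(data, "wifi", "yes")
print(restaurants)
print(with_wifi)
```

    Records that lack a key are simply skipped by the lookup, which is what makes the structure extensible without a schema change.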

  • Agenda

    Introduction

        What is hstore?

        KVP Schema in PostgreSQL

    Setup / Environment

        Performance Benchmark Design

        Test Environment

    Benchmark (May 2011)

        Results

        Findings

        Conclusion

  • INTRODUCTION

    What is hstore?

    KVP Schema in PostgreSQL

  • What is hstore?

    Hstore in PostgreSQL

    Storage for semi-structured data (à la Perl hash)

    Stores associative arrays in a single attribute (column) of a table

    Is an abstract data type in PostgreSQL

    Provides a set of PostgreSQL functions for querying, transforming, and manipulating hstore values

    Usage of hstore

        CREATE TABLE bench_hstore (
            id BIGINT PRIMARY KEY,
            -- (...) further columns elided on the slide
            kvp_hstore HSTORE
        );

  • What is hstore?

    Usage of hstore

        -- Insert one row; the hstore literal holds both key/value pairs.
        INSERT INTO bench_hstore (id, kvp_hstore) VALUES (
            1, 'amenity=>restaurant, name=>Godman'::hstore
        );

    Stored row

        id : BIGINT    kvp_hstore : HSTORE
        1              amenity=>restaurant, name=>Godman

        -- Fetch the name of every restaurant.
        SELECT kvp_hstore -> 'name' AS name
        FROM bench_hstore
        WHERE kvp_hstore -> 'amenity' = 'restaurant';
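    What the hstore column holds and what a key lookup does can be approximated in Python. This is a sketch under simplifying assumptions: parse_hstore() is a hypothetical helper that handles only the unquoted "key=>value, key=>value" form shown on the slide, not the quoting and escaping rules of real hstore input.

```python
def parse_hstore(text):
    """Parse a simple 'k=>v, k=>v' literal into a dict.

    Simplification: quoting and escaping are not handled.
    """
    result = {}
    for pair in text.split(","):
        if not pair.strip():
            continue  # ignore empty fragments, e.g. for an empty literal
        key, _, value = pair.partition("=>")
        result[key.strip()] = value.strip()
    return result


# One row per entity: (id, hstore-like dict).
bench_hstore = [
    (1, parse_hstore("amenity=>restaurant, name=>Godman")),
]

# Rough equivalent of selecting the 'name' entry of all rows whose
# 'amenity' entry equals 'restaurant'.
names = [
    kvp.get("name")
    for _row_id, kvp in bench_hstore
    if kvp.get("amenity") == "restaurant"
]
print(names)  # ['Godman']
```

    All pairs of one entity live in a single row, so no join or subquery is needed to collect them.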

  • KVP Schema in PostgreSQL

    Schema

    Two tables are needed: one for the unforeseen, arbitrary data (the KVP table) and one for the additional data

    Usage

        CREATE TABLE bench_kvp_info (
            id BIGINT PRIMARY KEY
            -- (...) further columns elided on the slide
        );

        CREATE TABLE bench_kvp (
            id BIGINT REFERENCES bench_kvp_info(id),
            key TEXT NOT NULL,
            value TEXT
        );

  • KVP Schema in PostgreSQL

    Usage

        INSERT INTO bench_kvp_info (id)
        VALUES (1);

        INSERT INTO bench_kvp (id, key, value)
        VALUES (1, 'amenity', 'restaurant');

        INSERT INTO bench_kvp (id, key, value)
        VALUES (1, 'name', 'Godman');

        -- Find the entity by one key/value pair, then return all of its pairs.
        SELECT * FROM bench_kvp WHERE id = (
            SELECT id FROM bench_kvp
            WHERE key = 'amenity' AND value = 'restaurant');

    Result

        id : BIGINT    key : TEXT    value : TEXT
        1              amenity       restaurant
        1              name          Godman
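    The two-table KVP layout and its lookup query can be tried out without a PostgreSQL server. The sketch below uses SQLite from the Python standard library purely as a stand-in; column types are adapted to SQLite, and the subquery uses IN instead of = so that it also works when several entities match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bench_kvp_info (id INTEGER PRIMARY KEY);
    CREATE TABLE bench_kvp (
        id    INTEGER REFERENCES bench_kvp_info(id),
        key   TEXT NOT NULL,
        value TEXT
    );
    """
)
conn.execute("INSERT INTO bench_kvp_info (id) VALUES (1)")
conn.executemany(
    "INSERT INTO bench_kvp (id, key, value) VALUES (?, ?, ?)",
    [(1, "amenity", "restaurant"), (1, "name", "Godman")],
)

# Step 1 (subquery): find the entity id via one key/value pair.
# Step 2 (outer query): fetch every pair stored for that entity.
rows = conn.execute(
    """
    SELECT id, key, value FROM bench_kvp
    WHERE id IN (SELECT id FROM bench_kvp
                 WHERE key = ? AND value = ?)
    ORDER BY key
    """,
    ("amenity", "restaurant"),
).fetchall()
print(rows)  # [(1, 'amenity', 'restaurant'), (1, 'name', 'Godman')]
```

    Each entity is spread over one row per pair, which is why the table has to be accessed twice to reassemble it.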

  • SETUP / ENVIRONMENT

    Performance Benchmark Design

        Table Schemas

        Test Data

    Test Environment

  • Performance Benchmark Design

    Table Schema

        CREATE TABLE bench_kvp_info (
            id BIGINT PRIMARY KEY
            -- (...) further columns elided on the slide
        );

        CREATE TABLE bench_kvp (
            id BIGINT REFERENCES bench_kvp_info(id),
            key TEXT NOT NULL,
            value TEXT
        );

        CREATE TABLE bench_hstore (
            id BIGINT PRIMARY KEY,
            -- (...) further columns elided on the slide
            kvp_hstore HSTORE
        );

  • Performance Benchmark Design

    Test Data

    12 different data sets with the following numbers of records:

        10       100      500
        1000     2500     5000
        10000    20000    35000
        50000    100000   250000

    Each data set is tested twice (once with an index and once without)

    GiST (Generalized Search Tree) is used as the index for hstore (basis for B-Tree and R-Tree)

    Hence

        12 [# of lengths] × 2 [# of types] × 2 [# of indices] × 3 [warm start] = 144 [cycles]
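    The count of 144 cycles follows from enumerating every combination of the four factors. A short Python sketch, assuming the two "types" are KVP and hstore and that "warm start 3" means three runs per configuration (both are readings of the slide, not statements from it):

```python
from itertools import product

record_counts = [10, 100, 500, 1000, 2500, 5000,
                 10000, 20000, 35000, 50000, 100000, 250000]
storage_types = ["kvp", "hstore"]            # assumed meaning of "types"
index_modes = ["without index", "with index"]
warm_start_runs = [1, 2, 3]                  # assumed meaning of "warm start"

# One benchmark cycle per combination of all four factors.
cycles = list(product(record_counts, storage_types, index_modes, warm_start_runs))
print(len(cycles))  # 144
```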

  • Test Data

    Test Data Schema

        Column                    Description
        id : Integer, sequence    Mandatory. A unique sequence identifier.
        surname : Text            Mandatory. A fancy name.
        forename : Text           Optional. A fancy name. Can be empty to have a variable KVP length.
        zip : Integer             Optional. A number between 1000 and 9000.
        comment : Text            Optional. A dummy text.

    Example

        1,cucyp,ecnalehad,6593,lorem ipsum dolor sit amet
        2,kasarzyc,,6593,
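    Rows of this shape could be generated along the following lines. This is a hypothetical generator, not the script used for the benchmark: the name lengths, the 50 % chance of leaving an optional field empty, and the fixed seed are all assumptions, and for simplicity the zip column is always filled.

```python
import csv
import io
import random
import string


def fancy_name(rng, min_len=5, max_len=9):
    """Return a random lowercase string used as a made-up name."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_rows(count, seed=0):
    """Build `count` rows: id, surname, forename, zip, comment."""
    rng = random.Random(seed)
    rows = []
    for row_id in range(1, count + 1):
        surname = fancy_name(rng)
        # Optional columns are left empty about half of the time, which
        # yields a variable number of key/value pairs per record.
        forename = fancy_name(rng) if rng.random() < 0.5 else ""
        zip_code = rng.randint(1000, 9000)
        comment = "lorem ipsum dolor sit amet" if rng.random() < 0.5 else ""
        rows.append([row_id, surname, forename, zip_code, comment])
    return rows


buffer = io.StringIO()
csv.writer(buffer).writerows(make_rows(5))
print(buffer.getvalue())
```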


  • Test Environment


    Technical Specification

    Intel(R) Xeon(R) CPU E5520 @ 2.27GHz 64-bit

    3 CPUs, 4 cores and 8 threads

    24 GB RAM

    Software

    Ubuntu 10.04.2 LTS

    PostgreSQL 9.0.4

    Python 2.6.5, Numpy, Scipy, Matplotlib

    No tuning of software

  • BENCHMARK MAY 2011

    Results

    Findings

        Analyze statements

        Functionality of hstore

    Conclusion

  • Results

    [Slides 16 to 21: result figures, not reproduced in this transcript]

  • Findings

    Table size

    hstore: [...]

    KVP: [...], whereas [...]

    [The table size formulas were lost in extraction; the surviving fragments are "tuples", "entries", "array", "array in the value", "tuples" and "null value".]

    Explain Analyze for KVP and hstore

                    KVP                                                        hstore
                    Without index   Index on key             Combined index    Without index   GiST index
        Cost        0..437.03       0..406.88                0..215.91         0..213.72       0..11.33
        Runtime     3.607 ms        2.770 ms                 2.028 ms          1.883 ms        0.721 ms
        Scans       Seq scan        1 heap & 1 index scan    1 index scan      Seq scan        1 heap & 1 index scan
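    To put the five runtimes listed above into proportion, they can be expressed relative to the fastest variant. The sketch below only restates the figures from the slide; each is a single measurement, so the ratios are indicative at best.

```python
# Runtimes in milliseconds, copied from the slide.
runtimes_ms = {
    "KVP, without index": 3.607,
    "KVP, index on key": 2.770,
    "KVP, combined index": 2.028,
    "hstore, without index": 1.883,
    "hstore, GiST index": 0.721,
}

fastest = min(runtimes_ms.values())

# Print the variants from fastest to slowest with their slowdown factor.
for name, ms in sorted(runtimes_ms.items(), key=lambda item: item[1]):
    print(f"{name:24s} {ms:6.3f} ms  {ms / fastest:4.1f}x")
```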

  • Functionality of hstore

    Buffers the whole hstore

    Each hstore key/value pair knows:

        its position in the string

        its length

        value and its length

    Hstore data type

        CREATE TYPE hstore (
            INTERNALLENGTH = -1,
            INPUT = hstore_in,
            OUTPUT = hstore_out,
            RECEIVE = hstore_recv,
            SEND = hstore_send,
            STORAGE = extended
        );

  • Functionality of hstore

    -> as an example of the available operators

    The operator's procedure is linked to a PostgreSQL function

        CREATE OPERATOR -> (
            LEFTARG = hstore,
            RIGHTARG = text,
            PROCEDURE = fetchval
        );

        CREATE OR REPLACE FUNCTION fetchval(hstore, text)
        RETURNS text
        AS 'MODULE_PATHNAME', 'hstore_fetchval'
        LANGUAGE C STRICT IMMUTABLE;

  • Functionality of hstore

    PostgreSQL function is linked to a C function

    hstore_fetchval returns the value by calling get_val, which loops over the buffer and returns the position

    Example

        id : BIGINT    kvp_hstore : HSTORE
        1              zip=>8000, surname=>ebsaveq
        2              zip=>6489, surname=>epofod
        3              zip=>8000, surname=>kjuefs

        SELECT kvp_hstore -> 'surname'
        FROM bench_hstore
        WHERE kvp_hstore -> 'zip' = '8000';
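    The lookup described on this slide (one buffer, per-pair position and length, a loop to find the key) can be modelled as a toy in Python. The names fetchval and get_val are borrowed from the slide, but the layout and code below are an illustrative assumption and not the actual C implementation of hstore.

```python
def pack(pairs):
    """Pack pairs into one string buffer plus an entry table.

    Each entry records (position of the key, key length, value length).
    """
    entries = []
    buffer = ""
    for key, value in pairs.items():
        entries.append((len(buffer), len(key), len(value)))
        buffer += key + value
    return entries, buffer


def get_val(entries, buffer, wanted):
    """Loop over the entries; return the index of the matching key or None."""
    for index, (pos, key_len, _val_len) in enumerate(entries):
        if buffer[pos:pos + key_len] == wanted:
            return index
    return None


def fetchval(entries, buffer, wanted):
    """Return the value stored for `wanted`, or None if the key is absent."""
    index = get_val(entries, buffer, wanted)
    if index is None:
        return None
    pos, key_len, val_len = entries[index]
    start = pos + key_len
    return buffer[start:start + val_len]


entries, buffer = pack({"zip": "8000", "surname": "ebsaveq"})
print(fetchval(entries, buffer, "surname"))  # ebsaveq
print(fetchval(entries, buffer, "city"))     # None
```

    Because positions and lengths are stored alongside the data, a value is read by slicing the buffer rather than by re-parsing the text.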

  • Conclusion

    For small data sets (< 500 records) KVP is preferable

    However

        500 records is easily exceeded

        Changing the schema later involves huge effort

            Transposing data

            Changing database table schema

            Possibly refactoring software to the new schema

    If unsure about the size, use hstore as the data type

    KVP is only 0.45 ms faster at 500 records
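    The "transposing data" step of such a migration, turning one row per key/value pair into one value per entity, might look like the following sketch. The sample rows and both helper functions are hypothetical; the literal is rendered in the "key"=>"value" form and assumes the data contains no quotes or backslashes.

```python
from collections import defaultdict

# Hypothetical KVP rows: (entity id, key, value).
kvp_rows = [
    (1, "amenity", "restaurant"),
    (1, "name", "Godman"),
    (2, "amenity", "university"),
    (2, "name", "Harvard University"),
]


def transpose(rows):
    """Group (id, key, value) rows into one dict of pairs per id."""
    grouped = defaultdict(dict)
    for entity_id, key, value in rows:
        grouped[entity_id][key] = value
    return dict(grouped)


def to_hstore_literal(pairs):
    """Render pairs as '"key"=>"value", ...' (no escaping is performed)."""
    return ", ".join(f'"{key}"=>"{value}"' for key, value in pairs.items())


for entity_id, pairs in transpose(kvp_rows).items():
    print(entity_id, to_hstore_literal(pairs))
```

    The data conversion itself is mechanical; the table schema change and the rewrite of every query in the application are the costly parts.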

  • Thank You

    Merci

    Grazie

    Gracias

    Obrigado

    Danke

  • BACKUP


  • Findings

    Explain Analyze for KVP

        Seq Scan on bench_kvp  (cost=229.38..437.03 rows=3 width=60) (actual time=3.125..3.579 rows=2 loops=1)
          Filter: (bench_id = $0)
          InitPlan 1 (returns $0)
            ->  Seq Scan on bench_kvp  (cost=0.00..229.38 rows=1 width=8) (actual time=1.406..2.162 rows=1 loops=1)
                  Filter: ((key = 'id'::text) AND (value = '1735'::text))
        Total runtime: 3.607 ms

  • Findings

    Explain Analyze for KVP with index on attribute key

        Seq Scan on bench_kvp  (cost=199.48..406.88 rows=3 width=60) (actual time=2.268..2.730 rows=2 loops=1)
          Filter: (bench_id = $0)
          InitPlan 1 (returns $0)
            ->  Bitmap Heap Scan on bench_kvp  (cost=62.99..199.48 rows=1 width=8) (actual time=0.925..1.227 rows=1 loops=1)
                  Recheck Cond: (key = 'id'::text)
                  Filter: (value = '1735'::text)
                  ->  Bitmap Index Scan on kvpidx  (cost=0.00..62.99 rows=2499 width=0) (actual time=0.373..0.373 rows=2500 loops=1)
                        Index Cond: (key = 'id'::text)
        Total runtime: 2.770 ms

  • Findings

    Explain Analyze for KVP with combined index

        Seq Scan on bench_kvp  (cost=8.27..215.91 rows=3 width=60) (actual time=1.376..1.954 rows=5 loops=1)
          Filter: (bench_id = $0)
          InitPlan 1 (returns $0)
            ->  Index Scan using kvpidx2 on bench_kvp  (cost=0.00..8.27 rows=1 width=8) (actual time=0.048..0.049 rows=1 loops=1)
                  Index Cond: ((key = 'id'::text) AND (value = '1735'::text))
        Total runtime: 2.028 ms
        (6 rows)

  • Findings

    Explain Analyze for hstore

        Seq Scan on bench_hstore  (cost=0.00..213.72 rows=45 width=40) (actual time=1.318..1.778 rows=1 loops=1)
          Filter: ((bench_hstore -> 'id'::text) = '1735'::text)
        Total runtime: 1.883 ms

    Explain Analyze for hstore with index

        Bitmap Heap Scan on bench_hstore  (cost=4.27..11.33 rows=2 width=218) (actual time=0.481..0.534 rows=1 loops=1)
          Recheck Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
          ->  Bitmap Index Scan on hidx_2_5k  (cost=0.00..4.27 rows=2 width=0) (actual time=0.308..0.308 rows=70 loops=1)
                Index Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
        Total runtime: 0.721 ms
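    Figures such as cost and total runtime can be pulled out of EXPLAIN ANALYZE text automatically when many runs have to be compared. A minimal sketch, assuming the text layout shown on these slides (a "Total runtime: … ms" line and "(cost=a..b" on the top plan node); the two helper functions are hypothetical and not part of any PostgreSQL tooling.

```python
import re

PLAN = """\
Bitmap Heap Scan on bench_hstore  (cost=4.27..11.33 rows=2 width=218) (actual time=0.481..0.534 rows=1 loops=1)
  Recheck Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
  ->  Bitmap Index Scan on hidx_2_5k  (cost=0.00..4.27 rows=2 width=0) (actual time=0.308..0.308 rows=70 loops=1)
        Index Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
Total runtime: 0.721 ms
"""


def total_runtime_ms(plan_text):
    """Return the reported total runtime in ms, or None if absent."""
    match = re.search(r"Total runtime:\s*(\d+\.\d+)\s*ms", plan_text)
    return float(match.group(1)) if match else None


def top_level_cost(plan_text):
    """Return (startup cost, total cost) of the first plan node, or None."""
    match = re.search(r"\(cost=(\d+\.\d+)\.\.(\d+\.\d+)", plan_text)
    return (float(match.group(1)), float(match.group(2))) if match else None


print(total_runtime_ms(PLAN))  # 0.721
print(top_level_cost(PLAN))    # (4.27, 11.33)
```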