Page 1: Streaming PostgreSQL with PipelineDBinfo.citusdata.com/rs/235-CNE-301/images/PipelineDB_the_Streamin… · Continuous SQL on streams (continuous views) High-throughput, incremental

Streaming PostgreSQL with PipelineDB

Derek Nelson

What is PipelineDB?

● Continuous SQL on streams (continuous views)

● High-throughput, incremental materialized views

● Based on PostgreSQL 9.4.4

● No special client libraries

● Eventually factored out as extension

100,000 feet

Produce → Process (SQL) → Consume

● Aggregation

● Filtering

● Sliding windows

= Reduction

Why did we build it?

Produce → Process → Consume

Produce → PipelineDB (continuous view) → Consume

Simplicity is nice, but what else?

Benefits of continuous SQL on streams

● Aggregate before writing to disk

[Chart: total data ingested vs. database size]

CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream

(winning)

● Sliding window queries

● Any information outside of the window is excluded from results and deleted from disk

● Essentially automatic TTL

CREATE CONTINUOUS VIEW v WITH (max_age = '1 hour') AS SELECT COUNT(*) FROM stream
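A minimal Python sketch of the sliding-window behaviour described above, assuming each event carries an arrival time in seconds. `SlidingCount` and its methods are hypothetical names for illustration, not PipelineDB APIs. It only shows how rows older than `max_age` drop out of both the result and storage.

```python
from collections import deque


class SlidingCount:
    """Count of events seen within the last `max_age` seconds (illustrative only)."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.arrivals = deque()  # arrival timestamps, oldest first

    def _expire(self, now):
        # Anything outside the window is dropped from storage: the "automatic TTL".
        while self.arrivals and self.arrivals[0] <= now - self.max_age:
            self.arrivals.popleft()

    def insert(self, now):
        self._expire(now)
        self.arrivals.append(now)

    def count(self, now):
        self._expire(now)
        return len(self.arrivals)


window = SlidingCount(max_age=3600)  # comparable to max_age = '1 hour'
for t in (0, 1800, 3000):
    window.insert(t)
```

Keeping one timestamp per event is the simplest possible model; a real system would be expected to store something more compact.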

CREATE CONTINUOUS VIEW v AS SELECT COUNT(DISTINCT x) FROM never_ending_stream

● Probabilistic computations on infinite inputs

● Streaming Top-K, Percentiles, distincts, large set cardinalities

● Constant space

● No sorting

● Small margin of error
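The slides do not say which algorithm backs COUNT(DISTINCT ...) on a stream, so the sketch below uses a K-minimum-values estimator purely as a stand-in. It illustrates the properties listed above: bounded space, no sorting of the input, and a small margin of error. `KMVDistinct` is a hypothetical name.

```python
import hashlib
import heapq


class KMVDistinct:
    """K-minimum-values estimator: keeps only the k smallest hash values seen."""

    def __init__(self, k=1024):
        self.k = k
        self.heap = []        # max-heap (negated values) of the k smallest hashes
        self.members = set()  # hashes currently in the heap, at most k entries

    @staticmethod
    def _hash(value):
        digest = hashlib.sha1(repr(value).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2.0**64  # roughly uniform in [0, 1)

    def add(self, value):
        h = self._hash(value)
        if h in self.members:
            return  # already tracked, duplicates cost nothing
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.members.add(h)
        elif h < -self.heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self.heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def estimate(self):
        if len(self.heap) < self.k:
            return len(self.heap)  # fewer than k distinct values: exact
        return int((self.k - 1) / -self.heap[0])
```

Memory stays at k hash values no matter how long the stream runs, at the cost of a relative error that shrinks roughly as 1/sqrt(k).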

(Demo #1)

Internals (part 1/2)

Streams and Workers

Streams

CREATE STREAM stream (x int, y int, z int);
INSERT INTO stream (x, y, z) VALUES (0, 1, 2);

● Internally, a stream is a Foreign Table

● System-wide Foreign Server called pipeline_streams

● stream_fdw reads from/writes to the Stream Buffer

● No permanent storage

● Stream rows only exist until they’ve been fully read

stream buffer → query on microbatch → incremental table update

[Diagram: HeapTuple slots in the stream buffer]

● INSERT INTO ...

● Concurrent circular buffer

● Preallocated block of shared memory

HeapTuple {0,1,0,1,1}

HeapTuple {1,1,1,1,1} ✗
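One way to read the bitmaps above, stated as an assumption: each bit records whether a particular reader has consumed the tuple, and a tuple whose bits are all set can be discarded. The single-threaded Python sketch below models that idea with a preallocated ring of slots. `StreamBuffer` and its methods are hypothetical, and the real buffer is concurrent and lives in shared memory.

```python
class StreamBuffer:
    """Fixed-capacity ring of slots; a slot is freed once every reader has read it."""

    def __init__(self, capacity, num_readers):
        self.slots = [None] * capacity  # preallocated up front
        self.num_readers = num_readers
        self.head = 0                   # next slot to write

    def insert(self, row):
        if self.slots[self.head] is not None:
            raise BufferError("buffer full: oldest tuple not fully read yet")
        self.slots[self.head] = {"row": row, "read_by": [0] * self.num_readers}
        self.head = (self.head + 1) % len(self.slots)

    def read(self, reader):
        """Return all rows this reader has not seen yet, marking them as read."""
        rows = []
        n = len(self.slots)
        # Walk from the oldest slot to the newest so rows come back in insert order.
        for offset in range(n):
            i = (self.head + offset) % n
            slot = self.slots[i]
            if slot is None or slot["read_by"][reader]:
                continue
            slot["read_by"][reader] = 1
            rows.append(slot["row"])
            if all(slot["read_by"]):
                self.slots[i] = None  # fully read: the row ceases to exist
        return rows


buf = StreamBuffer(capacity=4, num_readers=2)
buf.insert((0, 1, 2))
buf.insert((3, 4, 5))
```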

/* At Postmaster startup time ... */
worker.bgw_main = any_function;
worker.bgw_main_arg = (Datum) arg;

RegisterDynamicBackgroundWorker(&worker, &handle);

[Diagram: HeapTuples in the stream buffer are scanned by the microbatch query]

SELECT count(*), avg(x) FROM stream

AGG

stream_fdw#GetStreamScanPlan

while (!BatchDone(node))
{
    tup = PinNext(buf)
    yield MarkAsRead(tup)
}

microbatch_result

count: 1000
avg: {1000, 4000}
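A small Python sketch of how a worker might reduce one microbatch to the result shown above. It assumes the avg value {1000, 4000} is a transition state of the form {number of values, running sum}, which the slides do not state explicitly. `aggregate_microbatch` is a hypothetical name.

```python
def aggregate_microbatch(xs):
    """Reduce one microbatch of x values to transition states for count(*) and avg(x)."""
    count_state = 0
    avg_state = [0, 0]  # [number of values, running sum], finalized later as sum / n
    for x in xs:
        count_state += 1
        avg_state[0] += 1
        avg_state[1] += x
    return {"count": count_state, "avg": tuple(avg_state)}


# 1000 rows whose x values sum to 4000, matching the numbers on the slide.
microbatch_result = aggregate_microbatch([4] * 1000)
```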

Worker process parallelism

Stream buffer → Worker proc 0, Worker proc 1, Worker proc ..., Worker proc n

Tuples round-robin'd across n worker procs

Internals (part 2/2)

Incremental Updates

● transition_state = combine(microbatch_tstate, existing_tstate)

● pipeline_combine catalog table maps combine functions to aggregates

● No changes to pg_aggregate catalog table or existing aggregate functions

● User-defined aggregates just need a combinefunc to be combinable

CREATE AGGREGATE combinable_agg(x) (
    sfunc = sfunc,
    finalfunc = finalfunc,
    combinefunc = combinefunc
);

microbatch_result

count: 1000
avg: {1000, 4000}

existing on-disk row

count: 5000
avg: {5000, 10000}

combine()

updated_result

count: 6000
avg: {6000, 14000}
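The combine step above can be sketched in Python as follows, assuming the avg state is a (number of values, sum) pair so that combining is element-wise addition. The function names are hypothetical and only mirror the slide's combine() label.

```python
def combine_count(a, b):
    # Two partial counts combine by addition.
    return a + b


def combine_avg(a, b):
    # Each state is (n, sum); combining two states is element-wise addition.
    return (a[0] + b[0], a[1] + b[1])


def finalize_avg(state):
    # The user-visible average is only computed from the state at read time.
    n, total = state
    return total / n if n else None


microbatch = {"count": 1000, "avg": (1000, 4000)}
existing = {"count": 5000, "avg": (5000, 10000)}  # existing on-disk row

updated = {
    "count": combine_count(microbatch["count"], existing["count"]),
    "avg": combine_avg(microbatch["avg"], existing["avg"]),
}
```

Storing the transition state rather than the finalized average is what makes the update incremental: two averages cannot be merged correctly, but two (n, sum) pairs can.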

lookup_plan = get_plan(SELECT * FROM matrel WHERE hash_group(x, y, z) IN (...))

/* dynamically generate a VALUES node */
foreach(row, microbatch)
    values = lappend(values, hash_group(row));

set_values(lookup_plan, values)

existing = PortalRun(lookup_plan, ...)

/* now we're ready to combine these on-disk tuples with the incoming batch result */

● This query needs to be as fast as possible

● Continuous views indexed on a 32-bit hash of grouping

● Pro: maximize cardinality of the index keyspace, great for random perf

● Con: must deal with collisions programmatically

SELECT * FROM matrel WHERE hash_group(x, y, z) IN (hash(microbatch group), ...)
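A toy Python model of the lookup above, showing why collisions have to be handled programmatically when the index key is only a 32-bit hash. The `Matrel` class is hypothetical, and CRC32 is used only as a convenient stand-in for whatever hash function PipelineDB actually uses.

```python
import zlib


def hash_group(group):
    """32-bit hash of a grouping tuple (CRC32 is only a stand-in hash function)."""
    return zlib.crc32(repr(group).encode()) & 0xFFFFFFFF


class Matrel:
    """Toy materialization table indexed on the 32-bit group hash."""

    def __init__(self):
        self.index = {}  # hash -> list of (group, row); a list because hashes can collide

    def put(self, group, row):
        bucket = self.index.setdefault(hash_group(group), [])
        for i, (g, _) in enumerate(bucket):
            if g == group:
                bucket[i] = (group, row)
                return
        bucket.append((group, row))

    def lookup(self, groups):
        """Fetch existing rows for the groups present in a microbatch."""
        found = {}
        for group in groups:
            for g, row in self.index.get(hash_group(group), []):
                if g == group:  # resolve collisions by comparing the actual group
                    found[group] = row
        return found


matrel = Matrel()
matrel.put((1, 2, 3), {"count": 5000})
```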

● If the grouping contains a time-based column, we can do better

CREATE ... AS SELECT day(timestamp), count(*) FROM stream GROUP BY day

● These continuous views are indexed with a 64-bit hash of the grouping:

[ Timestamp from group (32 bits) | Regular 32-bit grouping hash ]

● Pro: most incoming groups will have a similar timestamp, so better index caching

● Con: larger index footprint
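A short Python sketch of the 64-bit key layout described above. Placing the timestamp in the high 32 bits is an assumption made here because it keeps keys with similar timestamps adjacent in an ordered index; the slides only show the two 32-bit halves. `index_key` is a hypothetical helper and CRC32 again stands in for the real hash.

```python
import zlib


def index_key(group_ts, group):
    """Pack a 32-bit timestamp and a 32-bit grouping hash into one 64-bit key."""
    ts32 = int(group_ts) & 0xFFFFFFFF
    hash32 = zlib.crc32(repr(group).encode()) & 0xFFFFFFFF
    return (ts32 << 32) | hash32  # timestamp in the high bits (assumed layout)


day = 86400  # seconds per day
k1 = index_key(16800 * day, ("a",))
k2 = index_key(16800 * day, ("b",))
k3 = index_key(16801 * day, ("a",))
```

Groups from the same day share their upper 32 bits, so their index entries cluster together, while groups from a later day sort after all of them.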

✔ microbatch result generated from stream by worker

✔ existing result retrieved from disk

combine_plan = get_plan(SELECT group, combine(count), combine(avg) FROM microbatch_result UNION existing GROUP BY group);

combined = PortalRun(combine_plan, ...)

foreach(row, combined)
{
    if (new_tuple(row))
        heap_insert(row, …);
    else
        heap_update(row, …);
}
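The insert-or-update loop above can be modelled in a few lines of Python. Here a plain dict stands in for the on-disk table, and `apply_microbatch` and its `combine` argument are hypothetical names, not PipelineDB functions.

```python
def apply_microbatch(matrel, microbatch_result, combine):
    """Merge per-group microbatch states into `matrel` (a dict standing in for the table)."""
    inserted, updated = 0, 0
    for group, state in microbatch_result.items():
        if group not in matrel:
            matrel[group] = state  # new group: analogous to heap_insert
            inserted += 1
        else:
            matrel[group] = combine(state, matrel[group])  # analogous to heap_update
            updated += 1
    return inserted, updated


matrel = {"a": 5000}
stats = apply_microbatch(matrel, {"a": 1000, "b": 7}, combine=lambda x, y: x + y)
```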

Combiner process parallelism

Continuous view: grouping (a, b, c), grouping (d, e, f), grouping (g, h, i), grouping (j, k, l)

● On-disk groupings are sharded over combiners by group

● Each row is guaranteed to only ever be updated by one combiner process
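A minimal Python sketch of sharding groups over combiners. Hash-modulo assignment is an assumption chosen for illustration; the slides only say that groupings are sharded by group. `combiner_for` is a hypothetical helper.

```python
import zlib


def combiner_for(group, num_combiners):
    """Pick the single combiner that owns `group` (hash-modulo is an assumed scheme)."""
    return zlib.crc32(repr(group).encode()) % num_combiners


groups = [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i"), ("j", "k", "l")]
assignment = {g: combiner_for(g, num_combiners=2) for g in groups}
```

Because the mapping is a pure function of the group, the same group always lands on the same combiner, so no two combiner processes ever update the same row.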

(Demo #2)

Demo #2

● Wikipedia data summarized by hour and url

● <hour | project | page title | view count | bytes served>

CREATE STREAM wiki_stream (
    hour timestamp,
    project text,
    title text,
    view_count bigint,
    size bigint
);

CREATE CONTINUOUS VIEW wiki_stats AS
SELECT project::text, sum(view_count::bigint) AS total_views,
    fss_agg(title, 3) AS titles
FROM wiki_stream
GROUP BY hour, project;
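To give a feel for what a streaming top-K aggregate such as fss_agg(title, 3) has to do, here is a Python sketch of the basic Space-Saving algorithm. This is not PipelineDB's implementation and omits any filtering stage; the `SpaceSaving` class is hypothetical and only shows how a fixed number of counters can track the most frequent titles.

```python
class SpaceSaving:
    """Basic Space-Saving summary with a fixed number of counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (never an underestimate)

    def add(self, item, weight=1):
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.capacity:
            self.counts[item] = weight
        else:
            # Evict the smallest counter and let the new item inherit its count.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + weight

    def top(self, k):
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [item for item, _ in ranked[:k]]
```

Space stays constant at `capacity` counters, and frequent items are retained while rare ones keep replacing each other in the smallest slots.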

Tuning Tips

● continuous_query_num_workers
○ n workers per server
○ n workers reading microbatches for the same query

● continuous_query_num_combiners
○ n combiners per server
○ n combiners per continuous query

● continuous_query_batch_size
○ Larger batch size will give better performance for aggregations

● step_size (string)
○ CREATE CONTINUOUS VIEW v WITH (step_size = '1 second', max_age = '1 minute')...
○ Larger step size will yield less granular sliding but better performance

● step_factor (integer in (0, 100])
○ CREATE CONTINUOUS VIEW v WITH (step_factor = 10, max_age = '1 minute')...
○ Larger step factor will yield less granular sliding but better performance
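The trade-off between step size and sliding granularity can be sketched in Python as below. Two assumptions are made that the slides do not confirm: that step_factor is a percentage of max_age, and that a window is stored as one partial aggregate per step. `step_seconds` and `SteppedCount` are hypothetical names.

```python
def step_seconds(max_age, step_factor):
    """Step size as a percentage of the window (assumed meaning of step_factor)."""
    if not 0 < step_factor <= 100:
        raise ValueError("step_factor must be in (0, 100]")
    return max_age * step_factor / 100.0


class SteppedCount:
    """Sliding count kept as one partial count per step instead of one entry per event."""

    def __init__(self, max_age, step):
        self.max_age = max_age
        self.step = step
        self.buckets = {}  # step start time -> partial count

    def insert(self, now):
        start = (now // self.step) * self.step
        self.buckets[start] = self.buckets.get(start, 0) + 1

    def count(self, now):
        cutoff = now - self.max_age
        # Whole steps slide out of the window at once; larger steps mean coarser sliding.
        self.buckets = {s: c for s, c in self.buckets.items() if s + self.step > cutoff}
        return sum(self.buckets.values())


step = step_seconds(max_age=60, step_factor=10)
window = SteppedCount(max_age=60, step=step)
for t in (0, 1, 2, 30):
    window.insert(t)
```

With a larger step there are fewer buckets to store and combine, but events leave the window in bigger chunks, which is the "less granular sliding but better performance" trade-off noted above.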

?

github.com/pipelinedb

[email protected]

Thanks!