SFPUG Lightning Talk
Retroactive Analytics With Distributed PostgreSQL
Dan Robinson, Heap
● Heap: web/iOS analytics that captures everything
● Making this interactive is hard!
app_id  user_id  properties HSTORE                               events HSTORE[]
12345   102756   email=>'[email protected]', ab_test_grp=>'A'    ...
12345   300732   ab_test_grp=>'B'                                ...
67890   628537   ...
49964   368868   utm_campaign=>'social'                          ...
app_id  user_id  properties HSTORE  events HSTORE[]
75632   257186   ...                ...
75632   120554   ...                ...
……
Shards of the users table: users_001, users_002, ……
users:
  SELECT COUNT(*)
  FROM users
  WHERE app_id = 12345
  GROUP BY events[1]->'path'

users_001:
  SELECT COUNT(*)
  FROM users_001
  WHERE app_id = 12345
  GROUP BY events[1]->'path'
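The rewrite from users to users_001 can be pictured with a small Python sketch. It is only an illustration: the slides do not say how rows are assigned to shards, so hashing app_id with CRC32 is an assumption, and shard_for and rewrite_for_shard are hypothetical helpers, not CitusDB functions.

```python
import re
import zlib


def shard_for(app_id: int, shard_count: int) -> str:
    """Pick a shard name for an app_id (assumed scheme: CRC32 modulo shard count)."""
    index = zlib.crc32(str(app_id).encode("ascii")) % shard_count
    return f"users_{index + 1:03d}"


def rewrite_for_shard(sql: str, shard: str) -> str:
    """Replace the logical table name `users` with a concrete shard name."""
    # \b keeps an already rewritten name such as users_001 untouched,
    # because "_" counts as a word character.
    return re.sub(r"\busers\b", shard, sql)


query = (
    "SELECT COUNT(*) FROM users "
    "WHERE app_id = 12345 GROUP BY events[1]->'path'"
)
shard_query = rewrite_for_shard(query, shard_for(12345, shard_count=2))
```

Because the query filters on a single app_id, it only needs to run against the one shard that holds that app's rows.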
● Denormalized → fast, no joins.
● Subqueries are just Postgres.
● Add UDFs for more expressiveness.
funnel_events(events hstore[], pattern_array text[]) RETURNS int[]
-- Returns an array with 1s corresponding to steps completed
-- in the funnel, 0s in the other positions
SELECT sum(
  funnel_events(
    events,
    ARRAY['"path"=>"/","object"=>"pageview"',
          '"type"=>"submit","hierarchy"=>like "%@form;#signup;%"']
  )
) AS "funnel_results"
FROM users
WHERE app_id = 12345
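What this query computes can be modelled in a few lines of Python. This is a sketch of the described behaviour, not Heap's implementation: events are plain dicts instead of hstore values, patterns are dicts instead of text, the Like wrapper and sum_funnels helper are hypothetical, and the rule that steps must be completed in order is an assumption the slides do not state.

```python
import re
from typing import Dict, List

Event = Dict[str, str]


class Like(str):
    """Marks a pattern value as a LIKE-style pattern ('%' matches any run of characters)."""


def _value_matches(value: str, pattern: str) -> bool:
    if isinstance(pattern, Like):
        # Escape each literal piece and join the pieces with ".*" for each "%".
        regex = ".*".join(re.escape(part) for part in pattern.split("%"))
        return re.fullmatch(regex, value, re.DOTALL) is not None
    return value == pattern


def _event_matches(event: Event, step: Event) -> bool:
    """An event matches a step when every key in the step matches the event."""
    return all(
        key in event and _value_matches(event[key], pattern)
        for key, pattern in step.items()
    )


def funnel_events(events: List[Event], steps: List[Event]) -> List[int]:
    """Return 1 for each funnel step completed (in order, by assumption), else 0."""
    result = [0] * len(steps)
    next_step = 0
    for event in events:
        if next_step == len(steps):
            break
        if _event_matches(event, steps[next_step]):
            result[next_step] = 1
            next_step += 1
    return result


def sum_funnels(rows: List[List[int]]) -> List[int]:
    """Element-wise sum over users, standing in for sum() over int[] results."""
    return [sum(column) for column in zip(*rows)]


steps = [
    {"path": "/", "object": "pageview"},
    {"type": "submit", "hierarchy": Like("%@form;#signup;%")},
]
users = [
    [{"path": "/", "object": "pageview"},
     {"type": "submit", "hierarchy": "@div;@form;#signup;@button;"}],
    [{"path": "/", "object": "pageview"}],
    [{"type": "submit", "hierarchy": "@form;#signup;"}],
]
funnel_results = sum_funnels([funnel_events(events, steps) for events in users])
```

Each position of funnel_results is the number of users who reached that funnel step. The third user submits the form without a prior pageview, so under the in-order assumption that user contributes nothing.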
● Denormalized schema. (No joins.)
● CitusDB to distribute queries.
● Express any analysis with UDFs.
Questions?