TOWARD PRACTICAL QUERY PRICING WITH QUERYMARKET
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington
SIGMOD 2013
MOTIVATION
• Data is increasingly sold and bought on the web
• Websites that sell data:
  – Xignite (financial)
  – Gnip (social)
• Data marketplace services:
  – Windows Azure Marketplace
  – Infochimps
  – Factual
  – DataMarket
A PRICING SCENARIO (1)
English-German dictionary T

| english | german |
| --- | --- |
| thanks | Danke |
| car | Auto |
| day | Tag |
| road | Strasse |
| road | Weg |
| … | … |

PRICING SCHEMES
Sell the whole table T for a fixed price
• Q: translate only the word “thanks”
• The user pays for redundant information
Price per output tuple
• Q: Does the word “thanks” translate to “Auto”?
• An empty result still carries information
A PRICING SCENARIO (2)
English-German dictionary T

| english | german |
| --- | --- |
| thanks | Danke |
| car | Auto |
| day | Tag |
| road | Strasse |
| road | Weg |
| … | … |

Word Frequency Stats WF

| word | frequency | genre | rank |
| --- | --- | --- | --- |
| rock | 0.025 | music | 20 |
| pop | 0.030 | music | 10 |
| database | 0.001 | science | 1453 |
| … | … | … | … |

• Current systems do not sell queries that combine datasets
• Queries issued by a user may have overlapping content
Q1: Return all German translations of the top 10 words in the genre “music”
Q2: Return all German translations of the top 20 words in the genre “music”
HOW TO PRICE DATA
English-German dictionary T

| english | german |
| --- | --- |
| thanks | Danke |
| car | Auto |
| day | Tag |
| road | Strasse |
| road | Weg |
| … | … |

Price points
• selection queries on a single table
• exhaust the possible values (ColA) of some attribute A
• may select on values not in the active domain (e.g. ‘cat’, which does not appear in T)

p(σ[T.english=‘thanks’]) = $0.1
p(σ[T.english=‘car’]) = $0.1
p(σ[T.english=‘day’]) = $0.1
p(σ[T.english=‘road’]) = $0.15
p(σ[T.english=‘cat’]) = $0.05
p(σ[T.german=‘Auto’]) = $0.5
…
QUERYMARKET: CONTRIBUTIONS
• A formal pricing framework where:
  – sellers specify a set of price points as selection queries
  – buyers can purchase any query on the database
  – the system automatically computes the price of the query
• Support efficient computation of prices for a large class of SQL queries
• Support the necessary functionality for a marketplace:
  – Pricing queries with overlapping information content
  – Database updates
  – Revenue sharing among different sellers
OUTLINE
1. The Pricing Framework
2. Computing the Price
3. Query History
4. Revenue Sharing
THE PRICING FRAMEWORK
• The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … }
• A buyer can buy any query Q
• The system will compute price^D_S(Q)
[Diagram: the Seller supplies price points to the Pricing System, which holds Database D; the Buyer asks for Q(D) and the system returns price^D_S(Q)]
[Koutris et al., PODS 2012]
PROPERTIES OF PRICES
We say that the views V1, …, Vk determine Q if one can compute Q(D) from V1(D), …, Vk(D) without access to D.

Arbitrage-free: Given D, price^D(Q) is arbitrage-free if for all views V1, …, Vk that determine Q:
    price^D(Q) ≤ price^D(V1) + … + price^D(Vk)

Discount-free: price^D(Q) must not offer additional discounts except for the explicit price points defined by the seller
THE PRICING FORMULA
Arbitrage-Price:
• The price of the cheapest set of views from price points S that determine the query Q
• unique + arbitrage-free + discount-free + agrees with price points

Example: Q(y) = R(x), S(x,y), with ColA = { a1, a2, a3 } and ColB = { b }

Table R

| A |
| --- |
| a1 |

Table S

| A | B |
| --- | --- |
| a1 | b |
| a2 | b |

Price per selection: $1 on R.A, $2 on S.A, $3 on S.B

• {σ[R.A=a1], σ[S.B=b]} determines Q
  – cost = 1 + 3 = 4
• {σ[R.A=a1], σ[S.A=a1]} also determines Q
  – cost = 1 + 2 = 3 (cheapest possible)
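The example above is small enough to check by exhaustive search. The sketch below is only an illustration, not the QueryMarket implementation: it assumes that "determines" means every database over ColA and ColB that agrees with D on the purchased views also agrees with D on Q. The names `PRICE_POINTS`, `determines` and `arbitrage_price` are made up for this sketch.

```python
from itertools import chain, combinations, product

COL_A = ["a1", "a2", "a3"]
COL_B = ["b"]
R = frozenset({"a1"})
S = frozenset({("a1", "b"), ("a2", "b")})

# Price points (relation, attribute, value) -> price, as in the example:
# $1 per selection on R.A, $2 on S.A, $3 on S.B.
PRICE_POINTS = {}
for a in COL_A:
    PRICE_POINTS[("R", "A", a)] = 1
    PRICE_POINTS[("S", "A", a)] = 2
for b in COL_B:
    PRICE_POINTS[("S", "B", b)] = 3


def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def answer(view, r, s):
    """Result of one selection view on the database (r, s)."""
    rel, attr, val = view
    if rel == "R":
        return frozenset(x for x in r if x == val)
    idx = 0 if attr == "A" else 1
    return frozenset(t for t in s if t[idx] == val)


def query(r, s):
    """Q(y) = R(x), S(x, y)."""
    return frozenset(y for (x, y) in s if x in r)


# Every database whose values come from ColA and ColB (8 * 8 = 64 of them).
ALL_DBS = [
    (frozenset(r), frozenset(s))
    for r in powerset(COL_A)
    for s in powerset(product(COL_A, COL_B))
]


def determines(views, r, s):
    """Assumed test: no database with the same view answers gives a different Q."""
    target = query(r, s)
    for r2, s2 in ALL_DBS:
        same_views = all(answer(v, r2, s2) == answer(v, r, s) for v in views)
        if same_views and query(r2, s2) != target:
            return False
    return True


def arbitrage_price(r, s):
    """Cheapest set of price points that determines Q, by brute force."""
    best = None
    for views in powerset(PRICE_POINTS):
        cost = sum(PRICE_POINTS[v] for v in views)
        if (best is None or cost < best[0]) and determines(views, r, s):
            best = (cost, set(views))
    return best


print(arbitrage_price(R, S))
```

With 7 price points and 64 candidate databases the search is instant, but it grows exponentially in both, which is why an exhaustive check like this is only useful for toy instances.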
COMPUTING THE PRICE
• Computing the arbitrage-price is coNP-complete even for SELECT-PROJECT-JOIN queries
• For some queries, the price can be computed fast:
  – Selections, joins w/o projection
• We describe pricing as an Integer Linear Program (ILP) and then use fast ILP solvers (e.g. GLPK, CPLEX)
• Classes of queries supported:
  – Selections/Projections/Joins
  – Unions
  – User-Defined Functions (UDF)
  – Bundles of queries
ILP CONSTRUCTION (1)
• Price the query Q(x,y) = R(x), S(x,y)
• Introduce a 0/1 variable x[attribute,value] for each price point: x[R.A, a2], x[S.A, a1], x[S.B, b], …

ColA = { a1, a2, a3 }, ColB = { b }

Table R

| A |
| --- |
| a1 |

Table S

| A | B |
| --- | --- |
| a1 | b |
| a2 | b |

Price per selection: $1 on R.A, $2 on S.A, $3 on S.B
ILP CONSTRUCTION (2)
• Query: Q(x,y) = R(x), S(x,y)
• Minimize (independent of the query):
  price = x[R.A,a1] + x[R.A,a2] + x[R.A,a3] + 2x[S.A,a1] + 2x[S.A,a2] + 2x[S.A,a3] + 3x[S.B,b]
• Constraints:
  – (a1,b) in Q: x[R.A,a1] ≥ 1 and x[S.A,a1] + x[S.B,b] ≥ 1
  – (a2,b) not in Q: x[R.A,a2] ≥ 1
  – (a3,b) not in Q: x[R.A,a3] + x[S.A,a3] + x[S.B,b] ≥ 1

ColA = { a1, a2, a3 }, ColB = { b }

Table R

| A |
| --- |
| a1 |

Table S

| A | B |
| --- | --- |
| a1 | b |
| a2 | b |
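The ILP above has only seven 0/1 variables, so it can be checked without a solver. The sketch below enumerates all assignments in plain Python in place of GLPK or CPLEX. The names `COST`, `CONSTRAINTS` and `solve_by_enumeration` are made up for this illustration, and each constraint is written as the list of variables whose sum must be at least 1.

```python
from itertools import product

# Objective coefficients: the price of each price point.
COST = {
    ("R.A", "a1"): 1, ("R.A", "a2"): 1, ("R.A", "a3"): 1,
    ("S.A", "a1"): 2, ("S.A", "a2"): 2, ("S.A", "a3"): 2,
    ("S.B", "b"): 3,
}

# Each inner list is one constraint: sum of the listed variables >= 1.
CONSTRAINTS = [
    [("R.A", "a1")],                               # (a1,b) in Q: R part
    [("S.A", "a1"), ("S.B", "b")],                 # (a1,b) in Q: S part
    [("R.A", "a2")],                               # (a2,b) not in Q
    [("R.A", "a3"), ("S.A", "a3"), ("S.B", "b")],  # (a3,b) not in Q
]


def solve_by_enumeration(cost, constraints):
    """Try every 0/1 assignment and keep the cheapest feasible one."""
    names = list(cost)
    best_cost, best_x = None, None
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        if all(sum(x[v] for v in c) >= 1 for c in constraints):
            total = sum(cost[v] * x[v] for v in names)
            if best_cost is None or total < best_cost:
                best_cost, best_x = total, x
    return best_cost, best_x


price, chosen = solve_by_enumeration(COST, CONSTRAINTS)
print(price, sorted(v for v, bit in chosen.items() if bit))
```

On this instance the minimum is $5, and it is reached by more than one set of price points: for example the two forced R selections plus σ[S.B=b], or plus σ[S.A=a1] and σ[R.A=a3]. Enumeration is exponential in the number of price points, so a real ILP solver is needed beyond toy sizes.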
ILP CONSTRUCTION (3)
• Projection: Q(y) = R(x), S(x,y)
• New variable for each tuple in Qfull, where Qfull(x,y) = R(x), S(x,y): z1 for (a1,b), z2 for (a2,b)
• Constraints:
  – (a1,b) in Qfull: x[R.A,a1] ≥ z1 and x[S.A,a1] + x[S.B,b] ≥ z1
  – (a2,b) in Qfull: x[R.A,a2] ≥ z2 and x[S.A,a2] + x[S.B,b] ≥ z2
  – (b) in Q: z1 + z2 ≥ 1

ColA = { a1, a2, a3 }, ColB = { b }

Table R

| A |
| --- |
| a1 |
| a2 |

Table S

| A | B |
| --- | --- |
| a1 | b |
| a2 | b |
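The projection constraints can be checked the same way. The sketch below enumerates the seven x variables together with z1 and z2. It assumes the prices from the earlier slides still apply ($1 on R.A, $2 on S.A, $3 on S.B), since this slide does not restate them; the names `X_COST`, `feasible` and `price_projection` are made up for this illustration.

```python
from itertools import product

# Assumed prices, carried over from the earlier example.
X_COST = {
    "R.A=a1": 1, "R.A=a2": 1, "R.A=a3": 1,
    "S.A=a1": 2, "S.A=a2": 2, "S.A=a3": 2,
    "S.B=b": 3,
}
Z_VARS = ["z1", "z2"]  # one per tuple of Qfull: (a1,b) and (a2,b)


def feasible(v):
    """The five constraints listed for the projection Q(y) = R(x), S(x,y)."""
    return (
        v["R.A=a1"] >= v["z1"]
        and v["S.A=a1"] + v["S.B=b"] >= v["z1"]
        and v["R.A=a2"] >= v["z2"]
        and v["S.A=a2"] + v["S.B=b"] >= v["z2"]
        and v["z1"] + v["z2"] >= 1
    )


def price_projection():
    """Cheapest feasible 0/1 assignment; z variables carry no cost."""
    names = list(X_COST) + Z_VARS
    best = None
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        if feasible(v):
            total = sum(X_COST[n] * v[n] for n in X_COST)
            if best is None or total < best[0]:
                best = (total, v)
    return best


price, assignment = price_projection()
print(price)
```

The z variables let the program pay for just one witness of the output tuple (b): setting z1 = 1 forces the views that reveal (a1,b), while z2 can stay 0. Under the assumed prices the minimum is $3.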
QUERYMARKET SYSTEM
• Runs on top of any SQL database
• Information stored in the database:
  – Price points are stored in price tables
  – An index table keeps track of the price tables
• The dataset:
  – English-German translation: T_{en,gr}(w, w’)
  – English-French translation: T_{en,fr}(w, w’)
  – UDF to find hashtags: IsHashtag(w)
  – Word frequency stats: WF(w, genre, frequency, rank)
PRICE COMPUTATION (1)
• Small dataset where columns have size ~10^2
[Figure: time in seconds (axis 0 to 10) for queries 1 to 15, broken down into ILP construction time and ILP solving time. Query groups: selections; 2-way joins w/o projections; 2-way joins with projections; 3-way join.]
PRICE COMPUTATION (2)
• Larger dataset where columns have size ~10^3
[Figure: time in seconds (axis 0 to 80) for queries 1 to 15, broken down into ILP construction time and ILP solving time. Query groups: selections; 2-way joins w/o projections; 2-way joins with projections; 3-way join.]
QUERY HISTORY
• A user asks a sequence of queries Q = Q1, Q2, …, Qk over time, with varying information overlap
• Experiment with 30 selection/join queries
[Figure: High Overlap. Price in dollars (axis 0 to 18) of each of the 30 queries.]
Oblivious pricing: each query priced independently
Bundle pricing: each query Qi is priced at p(Q1, …, Qi) − p(Q1, …, Qi−1)
View pricing: when a query is purchased, the purchased views are free for later queries
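The three strategies can be compared on a toy model. The sketch below is a simplification, not the QueryMarket implementation: it assumes each query comes with an explicit list of alternative view sets, any one of which determines it, whereas the real system would derive these through the ILP. All view names, prices and queries here are hypothetical.

```python
from itertools import product

# Hypothetical view prices.
VIEW_PRICE = {"a": 2, "b": 3}

# Hypothetical history: each query lists the view sets that determine it.
QUERIES = [
    [{"a"}, {"b"}],  # Q1: either view alone is enough
    [{"b"}],         # Q2: needs view b
    [{"b"}],         # Q3: needs view b again
]


def cost(views):
    return sum(VIEW_PRICE[v] for v in views)


def bundle_price(queries):
    """p(Q1, ..., Qi): cheapest union of one determining set per query."""
    return min(cost(set().union(*choice)) for choice in product(*queries))


def oblivious_prices(queries):
    """Each query priced independently of the history."""
    return [bundle_price([q]) for q in queries]


def bundle_prices(queries):
    """Qi is priced at p(Q1, ..., Qi) - p(Q1, ..., Qi-1)."""
    return [
        bundle_price(queries[: i + 1]) - bundle_price(queries[:i])
        for i in range(len(queries))
    ]


def view_prices(queries):
    """Views bought for earlier queries are free for later ones."""
    owned, prices = set(), []
    for alternatives in queries:
        best = min(alternatives, key=lambda views: cost(views - owned))
        prices.append(cost(best - owned))
        owned |= best
    return prices


print("oblivious:", oblivious_prices(QUERIES))
print("bundle:   ", bundle_prices(QUERIES))
print("view:     ", view_prices(QUERIES))
```

In this toy history the totals are $8 (oblivious), $5 (view) and $3 (bundle). View pricing commits to view a for Q1 before it knows that b would also have covered Q2 and Q3, which is why it can end up above the bundle price.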
QUERY HISTORY (2)
[Figure: Moderate Overlap. Price in dollars (axis 0 to 25) of each of the 30 queries under oblivious, view, and bundle pricing.]
[Figure: Weak Overlap. Price in dollars (axis 0 to 16) of each of the 30 queries under oblivious, bundle, and view pricing.]
VIEW PRICING
• View Pricing is our proposed strategy:
  – Computationally efficient
  – Low storage overhead
  – Close to optimal (bundle) price
• View Pricing can be used for dynamic databases: if view V is purchased at some point and then updated, the user pays only an update price
REVENUE SHARING
• How is the revenue shared between sellers if several datasets contribute to the answer?
• What if the cheapest set of views to determine a query is not unique?
• Example:
  – Q(‘sigmod13’) = isHashtag(‘sigmod13’), isNoun(‘sigmod13’)
  – Seller 1 charges $1 per entry of isHashtag; seller 2 charges $1 per entry of isNoun
  – If both isHashtag and isNoun are false and each entry costs $1, purchasing either entry answers Q
REVENUE SHARING: SOLUTION
• For a seller s, share(s, Q) is the maximum revenue of s over all minimum-cost sets of price points that determine Q
• share(s, Q) can be computed in our framework
• Solution: split price(Q) among sellers proportionally to their shares
• Example:
  – Both shares are $1
  – The revenue of each seller will be $0.5 (half of price(Q) = $1), since their shares are equal
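The split can be written down directly once the minimum-cost sets are known. The sketch below assumes those sets are given as input, each as a list of (seller, price) pairs; enumerating them is the part the framework would handle. The function names `shares` and `split_revenue` and the seller labels are made up for this illustration.

```python
from collections import defaultdict


def shares(min_cost_sets):
    """share(s, Q): the most seller s earns in any minimum-cost set."""
    best = defaultdict(float)
    for price_points in min_cost_sets:
        revenue = defaultdict(float)
        for seller, price in price_points:
            revenue[seller] += price
        for seller, amount in revenue.items():
            best[seller] = max(best[seller], amount)
    return dict(best)


def split_revenue(price, min_cost_sets):
    """Split price(Q) among sellers proportionally to their shares."""
    share = shares(min_cost_sets)
    total = sum(share.values())
    if total == 0:
        return {seller: 0.0 for seller in share}
    return {seller: price * value / total for seller, value in share.items()}


# The two minimum-cost sets of the example: buy either entry for $1.
MIN_COST_SETS = [
    [("seller1", 1.0)],  # the isHashtag('sigmod13') entry
    [("seller2", 1.0)],  # the isNoun('sigmod13') entry
]

print(shares(MIN_COST_SETS))
print(split_revenue(1.0, MIN_COST_SETS))
```

Each seller gets $0.5 here because both shares are $1 and price(Q) is $1. A seller who appears in none of the minimum-cost sets gets no entry and therefore no revenue.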
CONCLUSIONS
• QueryMarket: the first system that supports pricing a large class of SQL queries within a formal framework
• We presented solutions to address the requirements of a real-world marketplace
• Future work includes:
  – Scaling the price computation (bucketization)
  – Full SQL support (aggregates, negation)
  – Query answering under limited budget
Thank you!