Load-Balanced Query Dissemination in Privacy-Aware Online Communities
![Page 1: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/1.jpg)
Emiran Curtmola @ UC San Diego
Alin Deutsch @ UC San Diego
K.K. Ramakrishnan @ AT&T
Divesh Srivastava @ AT&T
![Page 2: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/2.jpg)
SIGMOD, June 2010
DATA ONLINE COMMUNITIES
Typical such applications are centralized:
▪ Hosted online communities
▪ Search engines
Limitations:
▪ Disintermediation of publishers from queriers
▪ Publishers need to give up their data
▪ Central site controls visibility of publishers to queriers
▪ Publishers lose their right to privacy
![Page 3: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/3.jpg)
▪ Free data exchange within the community
▪ Some users want to remain autonomous
▪ User privacy (i.e., not all users may want to reveal their true identity)
  ▪ Publishers express their opinions anonymously to avoid association with sensitive or controversial issues (e.g., politics, race, religion)
User autonomy + privacy suggest a decentralized infrastructure
![Page 4: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/4.jpg)
▪ Make it safer for publishers to join and post data
▪ Prevent association of sensitive topics with the publishers that contribute to them, even if nodes are compromised
Publisher k-anonymity: for every publisher p and data item d, hide p in a k-protected crowd of publishers, i.e., there are at least k-1 other potential publishers of the same d
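The k-anonymity condition above can be sketched as a simple check. This is a minimal illustration, not the paper's implementation: it assumes an observer's view is given as a mapping from each data item d to the set of publishers the observer cannot rule out as the source of d. The names `is_k_anonymous` and `view` are hypothetical.

```python
def is_k_anonymous(view, k):
    """Return True if every data item has at least k potential publishers.

    view: dict mapping a data item to the set of publishers an observer
          cannot rule out as its source (an assumed representation).
    """
    return all(len(candidates) >= k for candidates in view.values())


# Assumed scenario: the observer only learns that *someone* in the crowd
# {P1, P2, P3} advertised each term, so the whole crowd is the candidate set.
crowd = {"P1", "P2", "P3"}
view = {term: set(crowd) for term in ["poverty", "Tibet"]}

ok_for_3 = is_k_anonymous(view, 3)   # crowd of 3 hides each publisher among 2 others
ok_for_4 = is_k_anonymous(view, 4)   # crowd is too small for k = 4
```

Under this reading, a publisher is protected exactly when each of its data items could equally have come from at least k-1 other members of its crowd.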
![Page 5: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/5.jpg)
News & Blogs
Advertised data items about the publisher’s articles
P1 Beijing, Tibet, stocks, poverty, money
P2 Beijing, yak tea, Hong Kong, poverty
P3 Beijing, Tibet, yak tea, Hong Kong, money
P4 Beijing, Olympics, yak tea, stocks, money
P5 Beijing, Olympics, yak tea, stocks, money
P6 Olympics, Tibet, stocks, money
P7 Olympics, yak tea, stocks, money
P8 Olympics, yak tea, stocks, money
Query Q1: find the articles mentioning the Olympics in Beijing
Query Q2: find the articles about Tibet
Query Q3: find the articles mentioning poverty
Query Q4: find the articles mentioning money in Hong Kong
[Figure: the community data collection; each publisher P1 to P8 holds its own local XML data]
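To make the running example concrete, the advertised term sets and the four queries can be written down directly. The sketch below assumes a query is a conjunction of terms and that a publisher matches when it advertises all of them; `matching_publishers` is a hypothetical helper, and it computes the answer from a global view that no node in the decentralized setting would actually have.

```python
# Advertised term sets, copied from the example above.
ADVERTISED = {
    "P1": {"Beijing", "Tibet", "stocks", "poverty", "money"},
    "P2": {"Beijing", "yak tea", "Hong Kong", "poverty"},
    "P3": {"Beijing", "Tibet", "yak tea", "Hong Kong", "money"},
    "P4": {"Beijing", "Olympics", "yak tea", "stocks", "money"},
    "P5": {"Beijing", "Olympics", "yak tea", "stocks", "money"},
    "P6": {"Olympics", "Tibet", "stocks", "money"},
    "P7": {"Olympics", "yak tea", "stocks", "money"},
    "P8": {"Olympics", "yak tea", "stocks", "money"},
}


def matching_publishers(query_terms, advertised=ADVERTISED):
    """Publishers whose advertised set contains every query term."""
    wanted = set(query_terms)
    return sorted(p for p, terms in advertised.items() if wanted <= terms)


q1 = matching_publishers(["Olympics", "Beijing"])
q2 = matching_publishers(["Tibet"])
q3 = matching_publishers(["poverty"])
q4 = matching_publishers(["Hong Kong", "money"])
```

These sets are the targets that query dissemination should reach without any single site holding the table above.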
How to query ad-hoc distributed data sources while preserving user privacy?
▪ Allow publishers to keep complete control over their data
▪ Disseminate queries in the network, not data
▪ Publishers answer queries at their own discretion
▪ Published data is not traceable back to publishers, even if nodes are compromised
![Page 6: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/6.jpg)
Infrastructure setup such that:
▪ Distribution of data
▪ Large nr. of decentralized publishers and consumers
▪ User privacy
▪ Efficient query routing (to avoid flooding the network)
![Page 7: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/7.jpg)
Build an overlay network to act as a distributed index
Peers are organized into logical query dissemination trees (QDTs)
Use QDTs to disseminate queries using node summaries
P1’s advertised set of terms: Beijing, Tibet, stocks, poverty, money
P2’s advertised set of terms: Beijing, yak tea, Hong Kong, poverty
Node 3’s summary (set of terms): Beijing, Tibet, stocks, poverty, money, yak tea, Hong Kong
A node’s summary is the union of its subtrees’ summaries
[Figure: a QDT with numbered router nodes and publishers P1 to P8 at the leaves]
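The "union of its subtrees’ summaries" rule can be sketched with plain Python sets. The `Node` class and the two-publisher subtree are illustrative assumptions; the slide only states that Node 3's summary equals the union of P1's and P2's advertised terms, which is what the example reproduces.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    terms: set = field(default_factory=set)       # non-empty only for publishers
    children: list = field(default_factory=list)  # empty for publishers

    def summary(self):
        """Own terms plus the union of all subtree summaries."""
        result = set(self.terms)
        for child in self.children:
            result |= child.summary()
        return result


p1 = Node("P1", {"Beijing", "Tibet", "stocks", "poverty", "money"})
p2 = Node("P2", {"Beijing", "yak tea", "Hong Kong", "poverty"})
node3 = Node("3", children=[p1, p2])  # assumed placement of P1 and P2 below node 3

node3_summary = node3.summary()
```

A real deployment would cache summaries and update them incrementally rather than recomputing them recursively; the recursion here only spells out the definition.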
![Page 8: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/8.jpg)
Q3=“poverty”
Only P1 and P2 publish articles about poverty
Pruning: check set inclusion of the query in the node’s summary (Bloom filter)
[Figure: Q3 disseminated down the QDT, with non-matching subtrees pruned]
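The pruning step can be sketched as a recursive descent that forwards a query only into subtrees whose Bloom-filter summary may contain every query term. Everything named here (`BloomFilter`, `publisher`, `router`, `disseminate`, the filter size, the SHA-256 based hashing, and the small tree) is an assumption made for illustration, not the paper's design.

```python
import hashlib


class BloomFilter:
    """Plain bit-vector Bloom filter; m and k are illustrative defaults."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # Python int used as a bit vector

    def _positions(self, term):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(term))

    def merge(self, other):
        # The union of two summaries is the bitwise OR of their bit vectors.
        self.bits |= other.bits


def publisher(name, terms):
    bf = BloomFilter()
    for term in terms:
        bf.add(term)
    return {"name": name, "summary": bf, "children": []}


def router(name, children):
    bf = BloomFilter()
    for child in children:
        bf.merge(child["summary"])
    return {"name": name, "summary": bf, "children": children}


def disseminate(node, query_terms, reached):
    """Forward the query only into subtrees whose summary may match."""
    if not all(node["summary"].might_contain(t) for t in query_terms):
        return  # pruning: nothing below this node can match
    if not node["children"]:
        reached.append(node["name"])
    for child in node["children"]:
        disseminate(child, query_terms, reached)


# Hypothetical two-level tree over three of the example publishers.
p1 = publisher("P1", ["Beijing", "Tibet", "stocks", "poverty", "money"])
p2 = publisher("P2", ["Beijing", "yak tea", "Hong Kong", "poverty"])
p6 = publisher("P6", ["Olympics", "Tibet", "stocks", "money"])
root = router("r1", [router("r2", [p1, p2]), router("r3", [p6])])

reached = []
disseminate(root, ["poverty"], reached)
```

Because a Bloom filter has no false negatives, P1 and P2 are always reached; a false positive can at worst forward the query to a publisher that then finds no match.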
![Page 9: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/9.jpg)
▪ Minimum information at each node
  ▪ No node has global information
  ▪ Node summaries are vectors of counters (Bloom filters) representing hash values of advertised data items
▪ Queries reach publishers in such a way that users cannot tell whether a publisher chose not to respond or has no matching documents
[Figure: the QDT routing Q3=“poverty” toward the publishers of “…poverty…” articles]
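A summary kept as a vector of counters, rather than bits, is commonly called a counting Bloom filter. The sketch below is one possible reading of the slide: counters let a term be withdrawn again, and two summaries can be combined by element-wise addition. The class name `CountingSummary`, the vector size, and the hashing scheme are assumptions.

```python
import hashlib


class CountingSummary:
    """Counting Bloom filter: a vector of counters indexed by term hashes."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, term):
        # k positions derived from salted SHA-256 digests (illustrative choice).
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{term}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, term):
        for pos in self._positions(term):
            self.counters[pos] += 1

    def remove(self, term):
        # Only valid for a term that was previously added.
        for pos in self._positions(term):
            self.counters[pos] -= 1

    def might_contain(self, term):
        return all(self.counters[pos] > 0 for pos in self._positions(term))

    def merged_with(self, other):
        # Combining two summaries: element-wise sum of the counter vectors.
        out = CountingSummary(self.m, self.k)
        out.counters = [a + b for a, b in zip(self.counters, other.counters)]
        return out


summary = CountingSummary()
summary.add("poverty")      # advertised by one publisher
summary.add("poverty")      # advertised by a second publisher
summary.remove("poverty")   # one publisher withdraws the term
still_there = summary.might_contain("poverty")
```

Since updates are additive, a set of publishers can contribute to one summary by summing their individual update vectors.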
![Page 10: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/10.jpg)
▪ If an edge node is compromised
  ▪ Risk: individual updates of node summaries (from publishers to edge routers) may expose the publishers
  ▪ Solution: publisher k-anonymity. Hide users in protected crowds of at least k publishers and...
[Figure: the QDT with a protected crowd of publishers marked]
![Page 11: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/11.jpg)
▪ Solution: publisher k-anonymity. Hide users in protected crowds of at least k publishers and use secure multi-party (SMP) computation inside crowds to advertise updates of published terms to the edge routers
[Figure: edge router 4 and the publisher 3-anonymous protected crowd P1, P2, P3; labels +Upd1, +Upd2, +Upd3, +R, -R; the edge router receives Upd1 + Upd2 + Upd3]
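One classic way to compute such a sum without revealing individual terms is a masked running total: a random vector R is added first, each crowd member adds its own update, and R is subtracted at the end. The sketch below follows that pattern as suggested by the +R / -R labels; it is an assumption about the protocol's shape, not the paper's exact SMP construction, and `secure_sum` and `MOD` are hypothetical names.

```python
import secrets

MOD = 2**32  # arithmetic is done modulo MOD so masked values look random


def secure_sum(updates):
    """Sum equal-length counter vectors while masking every partial sum.

    updates: list of update vectors, one per crowd member (Upd1, Upd2, ...).
    """
    length = len(updates[0])
    mask = [secrets.randbelow(MOD) for _ in range(length)]       # +R
    running = list(mask)
    for update in updates:                                       # +Upd_i
        # Each member only ever sees a masked running total.
        running = [(r + u) % MOD for r, u in zip(running, update)]
    return [(r - m) % MOD for r, m in zip(running, mask)]        # -R


# Three publishers in a 3-anonymous crowd, each with a small update vector.
upd1 = [1, 0, 2]
upd2 = [0, 1, 0]
upd3 = [3, 0, 0]
aggregate = secure_sum([upd1, upd2, upd3])
```

The edge router ends up with only the aggregate vector, so it cannot attribute any single counter increment to a specific member of the crowd. Negative updates (term withdrawals) would need to be encoded modulo `MOD`.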
![Page 12: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/12.jpg)
▪ If an internal node is compromised
  ▪ Risk: the node summary of advertised terms is exposed → the downstream subtree may contain sensitive content, but the crowd of publishers is even bigger now
[Figure: the QDT with the protected crowd marked]
![Page 13: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/13.jpg)
The tree topology introduces congestion at the upper QDT levels during query dissemination
How to relieve the congestion?
![Page 14: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/14.jpg)
▪ Overlay multiple logical QDTs over the same underlay network
▪ A physical node belongs to multiple logical QDTs, but at different levels
▪ Goal: organize the nodes into QDTs such that the distribution of tree levels for a node is uniform across the QDTs
![Page 15: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/15.jpg)
[Figure: QDT1, QDT2, QDT3, QDT4, with node 1 highlighted in each tree]
![Page 16: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/16.jpg)
▪ Partition the community data collection into disjoint blocks
▪ Build one QDT per block: QDTi groups all publishers with terms in Bi
▪ Routing a query
  ▪ Terms in the query determine the relevant blocks
  ▪ Send the query to the corresponding QDT
  ▪ Check the full query with publishers
Block  Terms
B1     Beijing, Olympics
B2     Tibet, yak tea
B3     Hong Kong, stocks
B4     poverty, money
Example: Q3=“poverty” falls in B4, so use QDT4
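The block lookup can be sketched as a term-to-QDT table built from the four example blocks. The helper name `relevant_qdts` is hypothetical, and ignoring terms that belong to no block is an assumption of this sketch.

```python
# Disjoint blocks of terms, copied from the example table.
BLOCKS = {
    "B1": {"Beijing", "Olympics"},
    "B2": {"Tibet", "yak tea"},
    "B3": {"Hong Kong", "stocks"},
    "B4": {"poverty", "money"},
}

# Block Bi is served by QDTi.
TERM_TO_QDT = {
    term: "QDT" + block[1:]
    for block, terms in BLOCKS.items()
    for term in terms
}


def relevant_qdts(query_terms):
    """QDTs whose block contains at least one of the query terms."""
    return sorted({TERM_TO_QDT[t] for t in query_terms if t in TERM_TO_QDT})


q3_route = relevant_qdts(["poverty"])
q1_route = relevant_qdts(["Olympics", "Beijing"])
q4_route = relevant_qdts(["Hong Kong", "money"])
```

A query whose terms all fall in one block has a single candidate tree; a query spanning several blocks has several, and the full query is still checked at the publishers it reaches.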
![Page 17: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/17.jpg)
[Figure: QDT1 to QDT4, with Q3=“poverty” and Q1=“Olympics”, “Beijing” each routed on one QDT]
![Page 18: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/18.jpg)
Q4=“Hong Kong”, “money”
Q4’s terms fall in B3 (Hong Kong, stocks) and B4 (poverty, money): route Q4 on both QDT3 and QDT4?
Query selectivity optimization techniques: choose the selective QDT to route on by maintaining only 1-3% of popular data items (see paper)
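The slide defers the actual selectivity technique to the paper, so the following is only one plausible reading, stated under explicit assumptions: a small table records how many publishers advertise each popular term, any term missing from the table is treated as rare, and the query is routed on the QDT of its rarest term. `POPULAR_COUNTS` and `choose_qdt` are hypothetical; the count for "money" is taken from the eight-publisher example, where seven publishers advertise it.

```python
TERM_TO_QDT = {
    "Beijing": "QDT1", "Olympics": "QDT1",
    "Tibet": "QDT2", "yak tea": "QDT2",
    "Hong Kong": "QDT3", "stocks": "QDT3",
    "poverty": "QDT4", "money": "QDT4",
}

# Assumed popularity table: only the most popular terms are tracked.
POPULAR_COUNTS = {"money": 7}


def choose_qdt(query_terms):
    """Route on the QDT of the term estimated to match the fewest publishers."""
    rarest = min(query_terms, key=lambda t: POPULAR_COUNTS.get(t, 0))
    return TERM_TO_QDT[rarest]


q4_choice = choose_qdt(["Hong Kong", "money"])
```

With these assumptions Q4 is routed on QDT3 only, since "Hong Kong" is not in the popular table, and the remaining term "money" is checked at the publishers that QDT3 reaches.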
![Page 19: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/19.jpg)
Our solution
![Page 20: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/20.jpg)
▪ Empirical fact: the upper two levels in a QDT are the most congested
▪ Model: cyclical permutation of nodes on the tree levels
▪ Nr. of QDTs for load balance = nr. of legal permutations (i.e., without breaking the fairness property)
▪ Fairness property: all routers appear precisely once in the top two levels of any QDT
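The cyclic-permutation model can be sketched as follows, under two assumptions that go beyond the slide: each QDT is described by a level-order listing of routers whose first `top_size` entries occupy the top two levels, and the fairness property is read as "across the set of QDTs, every router appears exactly once in the top two levels". The function names are hypothetical.

```python
from collections import Counter


def cyclic_qdt_orders(routers, top_size):
    """One level-order listing per QDT, each shifted by top_size positions."""
    n = len(routers)
    if top_size <= 0 or n % top_size:
        raise ValueError("top_size must be positive and divide the number of routers")
    return [routers[s:] + routers[:s] for s in range(0, n, top_size)]


def is_fair(orders, top_size):
    """Check that each router is in the top two levels of exactly one QDT."""
    counts = Counter(r for order in orders for r in order[:top_size])
    return all(counts[r] == 1 for r in orders[0])


# Assumed example: 12 routers, 3 of them in the top two levels of each tree.
routers = list(range(1, 13))
orders = cyclic_qdt_orders(routers, top_size=3)
nr_qdts = len(orders)
fair = is_fair(orders, top_size=3)
```

In this reading the number of QDTs needed for balance is the number of routers divided by the number of positions in the top two levels, since each shift moves a fresh group of routers to the top.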
![Page 21: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/21.jpg)
Overall throughput depends heavily on the most congested node
Look at node stress in terms of nr. of messages:
▪ going into a node: Processing Load at a node (PLoad)
▪ going out of a node: Forwarding Load at a node (FLoad)
Throughput indicator: compare how far apart the following are:
peak load (k-QDTs) ↔ ideal load (avg. load for 1-QDT = $\frac{\text{nr. msgs}}{\text{nr. nodes}}$)
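The two load measures and the comparison against the ideal load can be sketched from a message log. The log format (sender, receiver pairs) and the relative-excess formula `(peak - ideal) / ideal` are assumptions; the slide only says that peak load is compared with the ideal load nr. msgs / nr. nodes.

```python
from collections import Counter


def loads(messages):
    """Return (PLoad, FLoad) counters from (sender, receiver) pairs."""
    pload, fload = Counter(), Counter()
    for sender, receiver in messages:
        fload[sender] += 1    # message going out of a node
        pload[receiver] += 1  # message going into a node
    return pload, fload


def distance_from_ideal(load, nr_nodes, nr_msgs):
    """Relative excess of the peak load over the ideal (average) load."""
    ideal = nr_msgs / nr_nodes
    peak = max(load.values(), default=0)
    return (peak - ideal) / ideal


# Assumed toy run: node 1 forwards to 2 and 3, node 2 forwards to 4 and 5.
messages = [(1, 2), (1, 3), (2, 4), (2, 5)]
pload, fload = loads(messages)
p_gap = distance_from_ideal(pload, nr_nodes=5, nr_msgs=len(messages))
f_gap = distance_from_ideal(fload, nr_nodes=5, nr_msgs=len(messages))
```

In this toy run forwarding is concentrated on two nodes while processing is spread over four, so the FLoad gap is larger than the PLoad gap.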
![Page 22: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/22.jpg)
Experiment 1: PLoad for Scribe QDT topology
▪ Result: the nr. of QDTs for load balance found experimentally coincides with that given by our analytical model
▪ Load balance with
  ▪ How close: 32% closest to ideal PLoad
  ▪ How close: 923% closest to ideal FLoad
▪ To balance FLoad, node fanouts need to be the same
Experiment 2: FLoad for fanout-balanced QDT topologies
▪ How close: 18% closest to ideal PLoad
▪ How close: 130% closest to ideal FLoad
![Page 23: Load-Balanced Query Dissemination in Privacy-Aware Online Communities](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813f57550346895daa1fd0/html5/thumbnails/23.jpg)
Propose a novel publishing infrastructure
Empowers publishers to join and post without being associated with (sensitive) content
Generic solution: it extracts the maximum load balance supported by the QDT topology