PIRS: Query Verification on Data Streams
![Page 1: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/1.jpg)
Ke Yi, Hong Kong University of Science and Technology
Feifei Li, Florida State University
Marios Hadjieleftheriou, AT&T Labs
George Kollios, Boston University
Divesh Srivastava, AT&T Labs
Work done while the first and second authors were working at AT&T Labs.
![Page 2: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/2.jpg)
Publishing Data and Outsourcing Query Service
Figure: an IP traffic stream (0 1 1 0 0 1 … 1 1 0 …) coming from a network is analyzed by Gigascope, an analysis tool, which outputs statistics as results.
![Page 3: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/3.jpg)
Revisiting the CISCO – AT&T Example
Figure: the IP traffic stream (0 1 1 0 0 1 … 1 1 0 …) from the network is analyzed by Gigascope to produce statistics.
Lawyers: sign the trust agreement.
Could we help? (computer scientists)
![Page 4: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/4.jpg)
Concrete Example
Continuous Query:
SELECT SUM(packet_size) FROM IP_trace
GROUP BY srcIP, destIP
IP Stream: p_m … p_3 p_2 p_1, where each p = (srcIP, destIP, packet_size)
Answer (one row per time instant, one column per group):
Time    Group 1   Group 2   Group 3   …   Group n
5       10KB      2KB       150KB     …   5KB
10      11KB      130KB     1MB       …   20KB
13      …
![Page 5: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/5.jpg)
Continuous Query Verification (CQV) on Data Streams
1. Client registers the query.
2. Server reports the answer upon request.
Server maintains the exact answer.
Client maintains a synopsis X.
Both client and server monitor the same stream, coming from the source of streams.
Query: SELECT SUM(packet_size) FROM IP_Trace GROUP BY src_ip, dest_ip
Answer: one running sum per group (Group 1, Group 2, Group 3, …).
![Page 6: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/6.jpg)
The Model for the Stream
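Stream S of tuples written as agg_attribute | group_id: 9|1, 7|i, 1|1, …
Vector $V^T = (v_1, v_2, \ldots, v_n)$ holds the aggregate of every group at time T, with $\sum_{i=1}^{n} v_i \le m$.
Initially all entries of V are 0.
T=1: tuple 9|1 sets $v_1 = 9$.
T=2: tuple 7|i sets $v_i = 7$.
T=3: tuple 1|1 sets $v_1 = 10$.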
![Page 7: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/7.jpg)
Continuous Query Verification: CQV
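Figure: as the stream S (9|1, 7|i, 1|1, … at T=1, T=2, T=3) arrives, the server updates the vector $V^T$ ("Update V") and the client updates the synopsis $X^T$ ("Update X"). When the server returns an answer, the client checks it against $X^T$.
Example at T=3, where $V^T$ has $v_1 = 10$, $v_i = 7$ and all other entries 0:
A returned answer with $v_1 = 10$, $v_3 = 2$, $v_i = 5$ raises an alarm.
A returned answer with $v_1 = 10$, $v_i = 7$ raises no alarm.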
![Page 8: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/8.jpg)
PIRS: Polynomial Identity Random Synopsis
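Choose a prime p: $\max\{m/\delta, n\} < p \le 2\max\{m/\delta, n\}$
Choose a random number $\alpha \in \mathbb{Z}_p$
Synopsis: $X(V^T) = (\alpha-1)^{v_1}(\alpha-2)^{v_2}\cdots(\alpha-n)^{v_n} \bmod p$
Check a returned answer A: $X(A) \stackrel{?}{=} X(V^T)$; raise an alarm if not equal, otherwise no alarm.
Decomposability: $X(V_a + V_b) = X(V_a)\cdot X(V_b)$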
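The construction on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`is_prime`, `choose_prime`, `pirs`) and the demo parameters are invented here, and trial division is only adequate for small demo values.

```python
import math
import random


def is_prime(x: int) -> bool:
    """Trial division; adequate only for the small demo values used here."""
    if x < 2:
        return False
    return all(x % d for d in range(2, math.isqrt(x) + 1))


def choose_prime(m: int, n: int, delta: float) -> int:
    """Pick a prime p with max(m/delta, n) < p <= 2*max(m/delta, n), up to rounding."""
    bound = max(math.ceil(m / delta), n)
    for p in range(bound + 1, 2 * bound + 1):
        if is_prime(p):
            return p
    raise ValueError("no prime in range")  # unreachable by Bertrand's postulate


def pirs(v, alpha: int, p: int) -> int:
    """X(V) = prod_i (alpha - i)^(v_i) mod p, with groups numbered from 1."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = x * pow(alpha - i, v_i, p) % p
    return x


# Hypothetical demo parameters: n = 4 groups, entries summing to at most m = 50.
p = choose_prime(m=50, n=4, delta=0.01)
alpha = random.randrange(p)  # the client's secret random number
v_a, v_b = [9, 0, 7, 0], [1, 2, 0, 0]
v_sum = [a + b for a, b in zip(v_a, v_b)]

# Decomposability: X(V_a + V_b) = X(V_a) * X(V_b) mod p
assert pirs(v_sum, alpha, p) == pirs(v_a, alpha, p) * pirs(v_b, alpha, p) % p
```

Decomposability holds for every choice of `alpha`, since the exponents of each factor simply add.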
![Page 9: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/9.jpg)
Incremental Update to PIRS
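Stream S: 9|1, 7|i, 1|1, … (all arithmetic is mod p)
T=1, update to $v_1$: $X_1 = (\alpha-1)^9$
T=2, update to $v_i$: $X_2 = X_1\cdot(\alpha-i)^7$
T=3, update to $v_1$: $X_3 = X_2\cdot(\alpha-1)^1$
An update to group i with value u can be done in O(log u) time (exponentiation by squaring): $X \leftarrow X\cdot(\alpha-i)^u$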
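The incremental update can be sketched as a small streaming class. This is an illustrative sketch, not the authors' library: the class name `PirsSynopsis`, the `matches` helper, the fixed demo values of `p` and `alpha`, and the choice of group 5 for the slide's group "i" are all assumptions.

```python
class PirsSynopsis:
    """Streaming PIRS value; all arithmetic is modulo the prime p."""

    def __init__(self, p: int, alpha: int):
        self.p = p
        self.alpha = alpha
        self.x = 1  # synopsis of the all-zero vector

    def update(self, group: int, u: int) -> None:
        """Process tuple u|group: X <- X * (alpha - group)^u mod p."""
        # Three-argument pow does modular exponentiation by squaring,
        # so the cost is O(log u) multiplications.
        self.x = self.x * pow(self.alpha - group, u, self.p) % self.p

    def matches(self, answer: dict) -> bool:
        """Recompute the synopsis of a reported answer {group: value} and compare."""
        expected = 1
        for group, value in answer.items():
            expected = expected * pow(self.alpha - group, value, self.p) % self.p
        return expected == self.x


# Demo values (assumptions): p is the Mersenne prime 2^31 - 1 and alpha is fixed
# for reproducibility; a real client would draw alpha at random and keep it secret.
synopsis = PirsSynopsis(p=2**31 - 1, alpha=123456789)
for u, group in [(9, 1), (7, 5), (1, 1)]:  # stream 9|1, 7|i, 1|1 with i = 5
    synopsis.update(group, u)

honest = {1: 10, 5: 7}
tampered = {1: 10, 3: 2, 5: 5}
print(synopsis.matches(honest))    # True
print(synopsis.matches(tampered))  # False for this alpha
```

The client stores only `p`, `alpha` and `x`, regardless of how many groups the stream touches.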
![Page 10: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/10.jpg)
It Solves CQV problem!
Theorem: Given any $W^T \ne V^T$, PIRS raises an alarm with probability at least 1-δ.
1. If W = V, PIRS obviously raises no alarm.
2. If W ≠ V, define
$f_v(x) = (x-1)^{v_1}(x-2)^{v_2}\cdots(x-n)^{v_n}$, $f_w(x) = (x-1)^{w_1}(x-2)^{w_2}\cdots(x-n)^{w_n}$
$f_v(x) \equiv f_w(x)$ iff V = W: a polynomial with 1 as the leading coefficient is completely determined by its zeroes, due to the fundamental theorem of algebra.
If V ≠ W, $f_v(x) = f_w(x)$ happens at no more than m values of x.
Since we have p > m/δ choices for α, the probability that X(V) = X(W) is at most δ.
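The counting argument can be checked exhaustively on a toy instance. The sketch below is an illustration with made-up values: it uses a deliberately tiny prime (p = 101, far too small for real use) so that every possible α can be enumerated, and counts how many of them fail to separate two different vectors V and W.

```python
def pirs(v, alpha, p):
    """X(V) = prod_i (alpha - i)^(v_i) mod p, with groups numbered from 1."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = x * pow(alpha - i, v_i, p) % p
    return x


p = 101          # toy prime, chosen only so that all alphas can be enumerated
v = [3, 0, 2]    # true vector
w = [1, 2, 2]    # a different vector with the same total
m = sum(v)       # degree of the polynomials f_v and f_w

# Values of alpha for which the two synopses collide, i.e. W goes undetected.
collisions = [a for a in range(p) if pirs(v, a, p) == pirs(w, a, p)]
print(len(collisions), "of", p, "choices of alpha fail to detect W")

# The proof bounds the number of bad alphas by m, hence failure probability <= m/p.
assert len(collisions) <= m
```

With p chosen larger than m/δ, the same bound gives a failure probability below δ.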
![Page 11: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/11.jpg)
Optimality of PIRS
Theorem: PIRS occupies O(log(m/δ) + log n) bits of space (at most 3 words: p, α, X(V)), spends O(1) time to process a tuple for a count query, or O(log u) time to process a tuple for a sum query.
Theorem: Any synopsis for solving the CQV problem with error probability at most δ has to keep Ω(log(min{n, m}/δ)) bits.
![Page 12: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/12.jpg)
Multiple Queries
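Figure: instead of keeping separate synopses X1 and X2 for queries Q1 and Q2, keep a single synopsis X for both.
Stream S: 9|1,8 … (one tuple triggers an update to v1 and an update to v8)
The vectors $V_{1..n_1}$ and $V_{1..n_2}$ are combined into one vector $V_{1..(n_1+n_2)}$.
Theorem: our synopses use constant space for multiple queries.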
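One way to read this slide is that the group vectors of all queries are laid side by side and a single PIRS value is kept over the combined vector. The sketch below illustrates that reading; the function name `update_merged`, the offset scheme and the demo constants are assumptions, not taken from the paper.

```python
P = 2**31 - 1        # demo prime (assumption); must exceed the total number of groups
ALPHA = 123456789    # fixed for reproducibility; a real client draws it at random


def update_merged(x, value, group_ids, offsets):
    """Apply one tuple to a single synopsis shared by several queries.

    group_ids[q] is the tuple's group under query q (numbered from 1), and
    offsets[q] shifts that group into the combined vector.
    """
    for q, group in enumerate(group_ids):
        x = x * pow(ALPHA - (offsets[q] + group), value, P) % P
    return x


n1 = 5                 # hypothetical number of groups in Q1
offsets = [0, n1]      # Q2's groups occupy positions n1+1 .. n1+n2
# A tuple with value 9 that falls in group 1 of Q1 and group 3 of Q2
# updates positions 1 and 8 of the combined vector.
x = update_merged(1, 9, group_ids=[1, 3], offsets=offsets)
```

Under this reading the prime and the bound m have to be chosen for the combined vector, since one tuple now contributes to several entries.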
![Page 13: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/13.jpg)
Handle the Load Shedding
Semantic Load Shedding: drop tuples from certain groups. A small number of groups have errors.
Random Load Shedding: all groups have a small amount of error.
![Page 14: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/14.jpg)
CQV with Semantic Load Shedding
Randomly drop certain tuples according to groups.
Stream: 9|1 7|i 2|j 1|1 4|k … 5|1
The server claims that at most γ groups have errors.
Goal: detect whether more than γ groups have errors.
We have designed synopses that use O(γ log(1/δ) log n) bits of space and achieve an error probability of at most δ.
![Page 15: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/15.jpg)
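PIRSγ: An Exact Solution
$k = c_1\gamma^2$ buckets per layer (constant quoted on the slide: 4.819).
$b: \{1,\ldots,n\} \to \{1,\ldots,k\}$ is a pair-wise independent hash function that maps uniformly, e.g., $b(i) = ((x\,i + y) \bmod p) \bmod k$.
Each layer has k buckets, each holding a PIRS synopsis; for example, an update to $v_8$ with b(8)=2 goes to bucket 2. A layer raises an alarm if at least γ of its buckets raise alarms.
There are log 1/δ layers; PIRSγ raises an alarm if at least one layer raises an alarm.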
![Page 16: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/16.jpg)
PIRSγ: An Exact Solution
Theorem: PIRSγ requires O(γ² log(1/δ) log n) bits, spends O(log(1/δ)) time to process a tuple, and solves CQV with semantic load shedding.
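The layered bucket structure can be sketched as follows. This is a rough illustration under several assumptions rather than the authors' construction: the class name `PirsGamma`, the choice k = γ² (that is, c_1 = 1), the reuse of the prime p as the hash modulus, a separate random α per bucket, and buckets numbered from 0 are all choices made here for brevity.

```python
import random


class PirsGamma:
    """Layers of k buckets; every bucket keeps its own PIRS value."""

    def __init__(self, gamma, layers, p, rng):
        self.gamma, self.layers, self.p = gamma, layers, p
        self.k = gamma * gamma  # assumed k = c1 * gamma^2 with c1 = 1
        # One hash (x, y) per layer and one random alpha per bucket.
        self.hashes = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(layers)]
        self.alphas = [[rng.randrange(p) for _ in range(self.k)] for _ in range(layers)]
        self.state = [[1] * self.k for _ in range(layers)]

    def _bucket(self, layer, group):
        x, y = self.hashes[layer]
        return ((x * group + y) % self.p) % self.k

    def _apply(self, table, group, u):
        # Each tuple touches exactly one bucket in every layer.
        for layer in range(self.layers):
            b = self._bucket(layer, group)
            factor = pow(self.alphas[layer][b] - group, u, self.p)
            table[layer][b] = table[layer][b] * factor % self.p

    def update(self, group, u):
        self._apply(self.state, group, u)

    def alarm(self, answer):
        """True if some layer has at least gamma buckets disagreeing with the answer."""
        expected = [[1] * self.k for _ in range(self.layers)]
        for group, value in answer.items():
            self._apply(expected, group, value)
        for layer in range(self.layers):
            bad = sum(e != s for e, s in zip(expected[layer], self.state[layer]))
            if bad >= self.gamma:
                return True
        return False


rng = random.Random(7)
syn = PirsGamma(gamma=3, layers=4, p=2**31 - 1, rng=rng)
truth = {g: g for g in range(1, 21)}  # made-up stream: group g receives value g
for g, u in truth.items():
    syn.update(g, u)
```

An answer with fewer than γ wrong groups can disturb fewer than γ buckets in any layer, so in this sketch it never triggers an alarm; answers with many wrong groups trigger one only with high probability, depending on how the errors spread over the buckets.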
![Page 17: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/17.jpg)
Intuition on Approximation
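Figure: probability to raise an alarm (vertical axis) versus number of errors (horizontal axis). Curves: the ideal synopsis, with its threshold at γ, and the approximation, with thresholds γ- and γ+.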
![Page 18: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/18.jpg)
PIRS±γ: An Approximate Solution
Theorem: PIRS±γ requires O(γ log(1/δ) log n) bits and spends O(γ log(1/δ)) time to process a tuple.
![Page 19: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/19.jpg)
CQV with Random Load Shedding
Randomly drop tuples.
All groups have small errors.
Goal: detect whether any group has an error greater than a claimed threshold.
Theorem: Any synopsis that solves this problem with error probability at most δ requires Ω(n) bits (by a reduction from the problem of estimating the infinite frequency moment, i.e., the number of occurrences of the most frequent item).
![Page 20: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/20.jpg)
Sliding Window and Other Queries
It is easy to extend PIRS to work with the sliding window model since it is decomposable, i.e., X(v1+v2) = X(v1)·X(v2).
Other queries can be supported if they can be transformed into group-by aggregation queries.
Details in the paper.
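One way decomposability could be used for windows is to keep one PIRS value per pane (sub-window) and multiply the values of the panes currently inside the window. The sketch below illustrates that idea only; the pane-based design, the class name `PanedWindow` and the demo constants are assumptions and not necessarily the construction described in the paper.

```python
from collections import deque

P = 2**31 - 1        # demo prime (assumption)
ALPHA = 123456789    # fixed for reproducibility; a real client draws it at random


def pirs_of(tuples):
    """PIRS value of a batch of (group, value) tuples."""
    x = 1
    for group, u in tuples:
        x = x * pow(ALPHA - group, u, P) % P
    return x


class PanedWindow:
    """Sliding window made of the most recent `panes` sub-windows."""

    def __init__(self, panes):
        self.panes = deque(maxlen=panes)  # the oldest pane is evicted automatically

    def close_pane(self, tuples):
        self.panes.append(pirs_of(tuples))

    def value(self):
        # X(v1 + v2 + ...) = X(v1) * X(v2) * ... mod p
        x = 1
        for pane in self.panes:
            x = x * pane % P
        return x


window = PanedWindow(panes=2)
window.close_pane([(1, 9), (5, 7)])
window.close_pane([(1, 1)])
window.close_pane([(3, 2)])  # the first pane slides out of the window
```

The memory cost in this sketch is one word per pane, so the window granularity trades space for precision.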
20
![Page 21: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/21.jpg)
Some Experiments
We use real streams:
World Cup Data (WC)
IP traces from the AT&T network (IP)
We perform the following queries:
WC: aggregate on response size and group by client id/object id (50M groups)
IP: aggregate on packet size and group by source IP/destination IP (7M groups)
Hardware for the client: 2.8GHz Intel Pentium 4 CPU, 512 MB memory, Linux machine
![Page 22: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/22.jpg)
Detection Accuracy
n is determined by the number of potential groups, not the actual number of groups.
$n = 2^{64} \approx 2\times10^{19}$, $m \approx 10^{10}$; hence $\delta = m/p \approx 0.5\times10^{-9}$.
Over 100,000 random attacks, PIRS identifies all of them.
![Page 23: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/23.jpg)
Memory Usage of Exact
PIRS uses only a constant 3 words (27 bytes) at all times.
Exact's memory usage is linear and expensive.
![Page 24: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/24.jpg)
Update Time (per tuple) of Exact
1. Exact is fast when memory usage is small.
2. It becomes extremely slow due to cache misses and memory swap operations.
![Page 25: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/25.jpg)
Running Time Analysis
Average Update Time
         WC        IPs
Count    0.98 μs   0.98 μs
Sum      8.01 μs   6.69 μs
IPs exhibits a smaller update cost for the sum query because its average value of u is smaller than that of WC.
![Page 26: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/26.jpg)
Multiple Queries: Exact Memory Usage
PIRS always uses only a constant 3 words (27 bytes).
Exact's memory usage is linear w.r.t. the number of queries and increases over time.
![Page 27: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/27.jpg)
Multiple Queries: Exact Update Time Per Tuple
27
![Page 28: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/28.jpg)
Multiple Queries: PIRS Update Time Per Tuple
28
![Page 29: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/29.jpg)
The Library
29
Download PIRS and other synopses at:
http://www.cs.fsu.edu/~lifeifei/pirs/
![Page 30: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/30.jpg)
Conclusion
A space- and update-efficient synopsis for verifying continuous group-by aggregation queries on streaming data.
It can be generalized to handle selection queries and sliding-window semantics.
How about more complicated queries?
![Page 31: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/31.jpg)
Thanks!
31
Questions
![Page 32: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/32.jpg)
Problem and Goals
Assumption: Client and DSMS observe the same stream
Problem: Client needs to verify the results
Goals:
Be memory and update efficient
Tolerance for a limited number of errors
Tolerance for small errors
Support multiple queries
![Page 33: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/33.jpg)
Related Techniques to PIRS
Incremental Cryptography: block operations (insert, delete); cannot support arithmetic operations
Program Verification: the server may pass the program execution but simply return random outputs
Fingerprinting Technique: PIRS is a fingerprinting technique
![Page 34: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/34.jpg)
CQV with Semantic Load Shedding
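$E(V, V') = |\{i \mid v_i \ne v'_i\}|$
$V \ne_\gamma V'$ iff $E(V, V') \ge \gamma$
$V =_\gamma V'$ iff $E(V, V') < \gamma$
Design a synopsis s.t. it raises an alarm with probability at least 1-δ if $V \ne_\gamma V'$, and raises no alarm if $V =_\gamma V'$.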
![Page 35: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/35.jpg)
PIRS±γ: An Approximate Solution
Theorem: PIRS±γ
1. raises no alarm with probability at least 1-δ on any $V' =_{\gamma^-} V$, where $\gamma^- = (1 - \frac{c}{\ln\gamma})\gamma$;
2. raises an alarm with probability at least 1-δ on any $V' \ne_{\gamma^+} V$, where $\gamma^+ = (1 + \frac{c}{\ln\gamma})\gamma$,
for any c > -ln ln 2 ≈ 0.367.
Proof idea: the intuition of the coupon collector problem, and the Chernoff bound.
![Page 36: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/36.jpg)
PIRS±γ: An Approximate Solution
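Choose k s.t. $\gamma = k \ln k$.
$b_1, \ldots, b_n$: n-wise independent random numbers, uniformly distributed in {1, …, k}.
Each layer has k buckets, each holding a PIRS synopsis; for example, an update to $v_i$ with $b_i = 2$ goes to bucket 2. A layer raises an alarm if all k buckets raise alarms.
There are log 1/δ layers; PIRS±γ raises an alarm if the majority of layers raise alarms.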
![Page 37: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/37.jpg)
Information Disclosure on Multiple Attacks
R: space of random seeds used by PIRS
Witness: $W(V, V') = \{r \in R \mid \text{PIRS raises an alarm on } r\}$
Non-witness: $\overline{W}(V, V') = R - W(V, V')$
$|\overline{W}(V, V')| \le \delta |R|$ if $V \ne V'$
The client runs PIRS, computing X(V) on seed r; the server returns V'.
If V' = V, the server learns nothing about r.
If V' ≠ V and an alarm is received, the server learns $r \in W(V, V')$.
Insight: the server could potentially rule out a δ portion of the seeds with each notified failed attack!
![Page 38: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/38.jpg)
Information Disclosure on Multiple Attacks
38
Bob
Theorem: For the total of k attacks made by Bob to PIRS, the probability that none of them succeeds is at least 1-kδ.
![Page 39: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/39.jpg)
Proof of the Optimality
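$F = \{f_1, f_2, \ldots\}$, $p(f_i) = \Pr[f = f_i]$, assuming $p(f_1) \ge p(f_2) \ge \cdots$
X: $f: U \to M$
X needs at least log|M| bits for the output and log|F| bits to describe the function from F.
For X: for any $V \ne V'$, let $F_{V,V'} = \{f \in F \mid f(V) = f(V')\}$; then $\sum_{f \in F_{V,V'}} p(f) \le \delta$.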
![Page 40: PIRS: Query Verification on Data Streams](https://reader035.fdocuments.us/reader035/viewer/2022062410/568159c7550346895dc71948/html5/thumbnails/40.jpg)
Proof of the Optimality
)(1
k
i ifp
n2U
)log(log :else nF
)n(MlogF
40
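Let $k \le \delta|F| + 1$ and consider $f_1, \ldots, f_k$: there is a total of $|M|^k$ possible combinations for the outputs of these k functions.
By the pigeonhole principle, $|U| \le |M|^k$, hence $\log|U| \le (\delta|F| + 1)\log|M|$.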
))((log)log()1(Flog nMn