Authenticated Join Processing in Outsourced Databases
Yin Yang, Dimitris Papadias, Stavros Papadopoulos (HKUST, Hong Kong)
Panos Kalnis (KAUST, Saudi Arabia)
Providence, USA, 2009
Database Outsourcing
Advantages
  The data owner does not need the hardware / software / personnel to run a DBMS
  The service provider achieves economy of scale
  The client enjoys better quality of service
A main challenge
  The service provider is not trusted, and may return incorrect query results
[Figure: the data owner sends initial data and data updates to the service provider; the client sends a query to the service provider and receives the query results.]
Query Authentication
[Figure: the data owner sends initial data & signatures and data updates & signature updates to the service provider; the client sends a query to the service provider and receives the query results & VO.]
The owner signs its data with a digital signature scheme
Given a query, the service provider attaches a VO (Verification Object) to the results
The client verifies query results with the VO and the owner’s signature
  soundness
  completeness
Example Queries
Purchase
  pid  cid  quantity
  p1   c1   20
  p2   c3   50
  p3   c2   80
  p4   c1   200
  p5   c2   500
Customer
  cid  name   city
  c1   Tom    New York
  c2   Brian  London
  c3   Susan  Tokyo
  c4   Jane   New York
  c5   Carl   London
Range: σ quantity>100 (Purchase)
Join: Purchase ⋈cid Customer
Range & Join: (σ quantity>100 Purchase) ⋈cid (σ city=“New York” Customer)
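The three example queries can be evaluated directly on the two small tables. The sketch below is only an illustration of what the queries return, with no authentication involved; the helper names `select` and `join` are hypothetical and not part of the talk.

```python
# Example tables from the slide, as lists of dicts.
purchase = [
    {"pid": "p1", "cid": "c1", "quantity": 20},
    {"pid": "p2", "cid": "c3", "quantity": 50},
    {"pid": "p3", "cid": "c2", "quantity": 80},
    {"pid": "p4", "cid": "c1", "quantity": 200},
    {"pid": "p5", "cid": "c2", "quantity": 500},
]
customer = [
    {"cid": "c1", "name": "Tom", "city": "New York"},
    {"cid": "c2", "name": "Brian", "city": "London"},
    {"cid": "c3", "name": "Susan", "city": "Tokyo"},
    {"cid": "c4", "name": "Jane", "city": "New York"},
    {"cid": "c5", "name": "Carl", "city": "London"},
]

def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]

def join(left, right, attr):
    """Equi-join on attr, written as a plain nested loop."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

range_result = select(purchase, lambda row: row["quantity"] > 100)
join_result = join(purchase, customer, "cid")
range_join_result = join(
    select(purchase, lambda row: row["quantity"] > 100),
    select(customer, lambda row: row["city"] == "New York"),
    "cid",
)

print([row["pid"] for row in range_result])       # ['p4', 'p5']
print(len(join_result))                           # 5
print([row["pid"] for row in range_join_result])  # ['p4']
```

Only p4 survives the combined query: p5 also has quantity > 100, but its customer c2 is in London.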
State of the Art
Range authentication: many solutions
Join authentication: few proposals
  Materializing join results into views
  AINL (presented in detail later)
Joins are inherently more complex than ranges
  A join combines information from multiple tables
  Only individual tables are signed
Previous Work
Multi-dimensional range authentication
  Y. Yang, S. Papadopoulos, D. Papadias, G. Kollios (BU)
  ICDE’08, VLDB J.
Continuous range authentication
  S. Papadopoulos, Y. Yang, D. Papadias
  VLDB’07, VLDB J.
Novel authentication framework
  S. Papadopoulos, D. Saccharidis, D. Papadias
  ICDE’09
Background
Concepts in Cryptography
Authenticated Data Structure (ADS)
  Merkle Hash Tree
  MB-Tree
AINL
Concepts in Cryptography
One-way, collision-resistant hash functions
  h = H(m)
  Computationally infeasible to infer m from h, or to find two m1, m2 with the same hash value h
  Example: SHA1, SHA2, …
Public-key encryption
  Two keys: private key sk, public key pk
  Public key to encrypt, private key to decrypt
  Example: RSA
Digital Signature
  Hard to forge without the secret key
  Signing: s = encrypt(H(m), sk)
  Verifying: check if H(m) = decrypt(s, pk)
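The signing and verifying rules above can be tried out with a toy "hash, then RSA" scheme. This is a minimal sketch under loud assumptions: the key is built from two well-known Mersenne primes, there is no padding, and the names `sign` and `verify` are hypothetical. It is insecure and only mirrors the slide's notation.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Insecure by design:
# the factors are public knowledge and no padding scheme is used.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                            # public exponent (part of pk)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (sk), needs Python 3.8+

def H(m: bytes) -> int:
    """Hash the message and map the digest into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m: bytes) -> int:
    """s = encrypt(H(m), sk) in the slide's notation."""
    return pow(H(m), D, N)

def verify(m: bytes, s: int) -> bool:
    """Check whether H(m) equals decrypt(s, pk)."""
    return pow(s, E, N) == H(m)

signature = sign(b"root hash")
print(verify(b"root hash", signature))      # True
print(verify(b"tampered hash", signature))  # False
```

In the outsourcing setting, the owner would run `sign` and the client would run `verify` with only the public values `N` and `E`.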
Merkle Hash Tree (Merkle, Crypto’89)
A binary tree with hash values satisfying hN = H(hN.lc | hN.rc), where N.lc and N.rc are the left and right children of node N
Authenticates 1D range queries
Example: a query Q retrieves d4, d5
  VO(Q) = {sroot, h1-2, d3, d4, d5, d6, h7-8}
  The client re-constructs hRoot bottom-up, and verifies the signature
[Figure: Merkle Hash Tree over records d1-d8 with leaf hashes h1-h8, internal hashes h1-2, h3-4, h5-6, h7-8, h1-4, h5-8, and hRoot signed by the owner. The query Q covers d4 and d5; the VO elements are marked as sent to the client.]
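The eight-record example can be replayed in a few lines. The sketch below assumes SHA-256 as H, plain byte concatenation for "|", and a leaf count that is a power of two; the signature check on the root is left out, and the function names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes):
    """Combine hashes pairwise, level by level (leaf count assumed a power of two)."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Owner side: eight records d1..d8; the owner would sign `root`.
records = [f"d{i}".encode() for i in range(1, 9)]
leaves = [H(d) for d in records]
root = merkle_root(leaves)

# Server side: VO for Q = {d4, d5}, following the slide:
# h1-2, the records d3..d6 (results plus boundary records d3 and d6), h7-8.
vo = {
    "h12": H(leaves[0] + leaves[1]),
    "records": records[2:6],
    "h78": H(leaves[6] + leaves[7]),
}

def rebuild_root(vo):
    """Client side: recompute hRoot bottom-up from the VO alone."""
    h3, h4, h5, h6 = (H(d) for d in vo["records"])
    h14 = H(vo["h12"] + H(h3 + h4))
    h58 = H(H(h5 + h6) + vo["h78"])
    return H(h14 + h58)

print(rebuild_root(vo) == root)       # True
forged = dict(vo, records=[b"d3", b"dX", b"d5", b"d6"])
print(rebuild_root(forged) == root)   # False
```

The boundary records d3 and d6 are what let the client check that nothing adjacent to the result was dropped.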
Merkle B-Tree (Li et al. SIGMOD’06)
Merkle Hash Tree + B-Tree
Conceptually, a Merkle Hash Tree with a large fanout (>2)
[Figure: MB-tree range query over records si … sj with boundary records si-1 and sj+1, nodes N1-N4 below the root, hash values collected by left traversal and by right traversal, and a pointer from N3 to its right sibling N4.]
AINL
For binary joins
Requires ADS on the join attribute of the inner relation
Reduces a join query into multiple ranges
Algorithm
  For every tuple in the outer relation
    Perform an authenticated range on the inner relation
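The reduction of a join to one range query per outer tuple can be sketched on plain sorted keys. This is a simplified illustration, not the authors' implementation: a sorted list stands in for the ADS on the inner relation, hash values are omitted, and the function name and the key values are hypothetical.

```python
from bisect import bisect_left, bisect_right

def ainl_probe(outer_keys, inner_keys):
    """For each outer join key, return (key, left boundary, matches, right boundary).

    None stands for a missing boundary at either end of the inner relation.
    """
    inner = sorted(inner_keys)
    probes = []
    for key in outer_keys:
        lo = bisect_left(inner, key)     # first inner position with value >= key
        hi = bisect_right(inner, key)    # first inner position with value > key
        left = inner[lo - 1] if lo > 0 else None
        right = inner[hi] if hi < len(inner) else None
        probes.append((key, left, inner[lo:hi], right))
    return probes

# Hypothetical join keys: five outer tuples probing five inner tuples.
outer = [1, 3, 2, 1, 2]
inner = [1, 2, 3, 4, 5]
probes = ainl_probe(outer, inner)

# Two boundary slots per probe plus the matching records.
inner_records_sent = sum(2 + len(matches) for _, _, matches, _ in probes)
print(probes[1])            # (3, 2, [3], 4)
print(inner_records_sent)   # 15
```

Here the probes touch 15 inner-record slots although the inner relation holds only 5 tuples, because every probe carries its own pair of boundaries.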
Example of AINL
[Figure: MB-tree on S.a with leaf records s1-s15 in leaf nodes A-E, internal nodes F and G, and root RootS; the outer tuples r1 and r2 of R are probed against it on R.a (S.a).]
VO:
1. r1, hF, h10, s11, s12, hE
2. r2, h1, s2, s3, s4, h5, h6, hC, hG
3. …
Drawbacks of AINL
Large VO size
  |R| records from R (outer relation)
  2|R| + |R ⋈ S| records from S (inner relation)
  Numerous hash values
  Often larger than the combined size of R and S
High computation overhead at the server and the client
NAI: A Naïve solution
The server transmits all the data to the client
The client performs the join locally
NAI often outperforms AINL
Proposed Methods
Binary join authentication
  AISM: requires ADS on one relation
  AIM: requires ADSs on both relations
  ASM: requires no ADS
Complex join query authentication
  Multi-way join
  Select-project-join
AISM: Query Processing
Sort the outer relation R on the join attribute
Transmit all tuples in R to the client in their verifiable order
Transmit the sort order of the R tuples on the join attribute
Incrementally traverse the ADS on S once with the R records
Example of AISM
[Figure: MB-tree on S.a with leaf records s1-s15 in leaf nodes A-E, internal nodes F and G, and root RootS; the tuples r1-r6 of R are placed along the join attribute R.a (S.a).]
Sort order: R[1]=2, R[2]=4, R[3]=6, R[4]=1, R[5]=3, R[6]=5
VO: signature of R, root signature of TS, r1-r6 in their verifiable order
1. R[1], h1, s2, s3, s4
2. R[2], h5, h6, hC, s10, s11, s12
3. R[3]
4. R[4]
5. R[5], h13, h14, s15
6. R[6]
AISM: Verification
The client checks
  R records
  correctness of the sort order of R
  boundary records
  whether the re-constructed root hash of TS matches its signature
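One of the client's checks, the correctness of the sort order, can be sketched as follows. The convention used here is an assumption rather than something the slides spell out: `order[i]` is the 1-based position, in the verifiable (signed) order, of the tuple that comes i-th when sorted on the join attribute. The function name is hypothetical, and the checks on the R signature and on the root hash of TS are not shown.

```python
def check_sort_order(keys, order):
    """Return True if `order` is a valid ascending sort order for `keys`.

    keys  : join-attribute values of R in the verifiable order
    order : claimed sort order, as 1-based positions into `keys`
    """
    n = len(keys)
    # The claimed order must mention every tuple exactly once.
    if sorted(order) != list(range(1, n + 1)):
        return False
    # Following the claimed order must yield non-decreasing keys.
    in_order = [keys[pos - 1] for pos in order]
    return all(a <= b for a, b in zip(in_order, in_order[1:]))

keys = [20, 10, 30]                       # hypothetical join keys of r1, r2, r3
print(check_sort_order(keys, [2, 1, 3]))  # True
print(check_sort_order(keys, [1, 2, 3]))  # False
print(check_sort_order(keys, [2, 2, 3]))  # False
```

The permutation test matters as much as the ordering test: without it a server could silently drop or repeat a tuple.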
AIM Query processing
Require ADSs on both relations
Start with one relation R, traverse its ADS TR down to the first tuple r1
Traverse TS until reaching the right boundary record s of r1
Traverse TR until reaching the right boundary record r of s
Alternately traverse TS and TR in the same way
Verification: similar to AISM
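The alternating traversal resembles a merge join that skips ahead in whichever relation is behind. The sketch below is a simplified, unauthenticated stand-in: sorted lists replace the two MB-trees, `bisect_left` replaces the descent to the next boundary record, join keys are assumed distinct within each relation, and the function name and key values are hypothetical.

```python
from bisect import bisect_left

def alternating_join(r_keys, s_keys):
    """Join two ascending lists of distinct keys by alternately skipping ahead."""
    i = j = 0
    matches = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            # Jump in R to the first key that is >= the current S key.
            i = bisect_left(r_keys, s_keys[j], i)
        elif r_keys[i] > s_keys[j]:
            # Jump in S to the first key that is >= the current R key.
            j = bisect_left(s_keys, r_keys[i], j)
        else:
            matches.append(r_keys[i])
            i += 1
            j += 1
    return matches

print(alternating_join([1, 4, 7, 9], [2, 4, 5, 9, 12]))  # [4, 9]
```

Runs of tuples without join partners are skipped in one step, which is why having an ADS on both relations can keep the VO small.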
Example of AIM
[Figure: MB-tree of S on S.a (RootS, records s1-s15) and MB-tree of R on R.a (RootR, records r1-r6), aligned along the join attribute R.a (S.a).]
VO: root signature of TS, root signature of TR, r1
1. hs1, s2, s3, s4
2. r2
3. hs5, hs6, hC, s10, s11, s12
4. r3, r4
5. r5
6. hs13, hs14, s15
7. hr6
ASM
Idea
  Sort-Merge-Join, sort at the server, merge at the client
Query processing
  Require no ADS
  Transmit both R and S in their verifiable order
  Sort R and S respectively on the join attribute
  Transmit the sort orders of R and S to the client
  Transmit bitmaps BR and BS to the client, indicating the tuples with join partners
Verification
  correctness of the base relations / sort-orders / the bitmaps
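The client-side merge can also produce the two bitmaps, which the client could then compare with the BR and BS it received. This is a minimal sketch that assumes the sort orders have already been applied and verified; how exactly the paper uses the bitmaps is not stated on the slide, and the function name and key values are hypothetical.

```python
def merge_with_bitmaps(r_sorted, s_sorted):
    """Merge two ascending key lists.

    Returns (pairs, bitmap_r, bitmap_s), where pairs holds index pairs of
    joining tuples and each bitmap marks the tuples that have a join partner.
    """
    bitmap_r = [0] * len(r_sorted)
    bitmap_s = [0] * len(s_sorted)
    pairs = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] < s_sorted[j]:
            i += 1
        elif r_sorted[i] > s_sorted[j]:
            j += 1
        else:
            key = r_sorted[i]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end] == key:
                j_end += 1
            for a in range(i, i_end):
                bitmap_r[a] = 1
                for b in range(j, j_end):
                    pairs.append((a, b))
            for b in range(j, j_end):
                bitmap_s[b] = 1
            i, j = i_end, j_end
    return pairs, bitmap_r, bitmap_s

pairs, b_r, b_s = merge_with_bitmaps([1, 1, 2, 2, 3], [1, 2, 3, 4, 5])
print(pairs)  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
print(b_r)    # [1, 1, 1, 1, 1]
print(b_s)    # [1, 1, 1, 0, 0]
```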
Complex Query Authentication
Multi-way joins
Selection-Projection-Join queries
Multi-Way Join
Build a tree of binary join operators
  m-ASM / m-AISM / m-AIM optimized for multi-way joins
  Example: [Figure: join tree for R ⋈ S ⋈ T. Op 1 (m-AIM) joins R and S and produces VO(R ⋈ S); Op 2 (m-AISM) joins the result with T and produces VO(R ⋈ S ⋈ T).]
A specialized algorithm AST applies when all relations are joined on the same attribute
  One single VO
Example of m-AIM and m-AISM
[Figure: MB-trees of R (RootR, records r1-r9), S (RootS, records s1-s5) and T (RootT, records t1-t3). Op 1 joins R and S on R.a/S.a; Op 2 joins the intermediate result (tuples [1]-[4]) with T on S.b/T.b.]
VO(R ⋈ S): {root signature of TR and TS, s1, s2; hA, r4, r5, r6; s3; s4; s5; hC}
VO(R ⋈ S ⋈ T): {root signature of TT, [1], t1, t2; [2]; [3]; [4]; ht3}
Example of AST
[Figure: MB-trees of R (RootR, leaf nodes A-C, records r1-r9) and T (RootT, records t1-t3), and the sorted tuples S[1]-S[4] of S, aligned along the common join attribute R.a/S.a/T.a.]
VO: {root signature of TR and TT, signature of relation S, bitmap BS = “1000”, s1-s4 in a verifiable order, S[1], hr1, r2, r3, r4; t1, t2; S[2]; S[3]; S[4]; hr5, hr6, hC; ht3}
Selection-Projection-Join Query
[Figure: three query plans combining the join of Purchase and Customer on cid with the selections σ1: quantity>100 and σ2: city=“New York”.]
Selection
  Use the m- algorithms for joins
Projection
  Build a Merkle Hash Tree for each record
Query optimization
Experiments
Three synthetic relations
  R(a1, a2)
  S(a1, a2, b1, b2)
  T(b1, b2)
Queries
  R ⋈a1 S
  R ⋈a2 S
  (R ⋈a1 S) ⋈b1 T
  (R ⋈a2 S) ⋈b2 T
Foreign keys
  S.a1 references R.a1
  S.b1 references T.b1
Parameters
  Tuple size
  Cardinality of |S|
Repeatability and Workability
We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf. http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/).
The reviewers were able to repeat all the experiments presented in our paper, yielding results that match the published ones, except for insignificant and expected variation due to randomness and/or hardware/software differences.
The detailed reports will shortly be made publicly available by ACM SIGMOD.
Evaluations of AINL
  Tuple size (bytes)   32    64    128   256    512
  CVO (Gbytes)         8.9   9.0   9.2   9.6    10.3
  CClient (seconds)    205   207   210   214    219
  CDSP (seconds)       262   271   429   1728   4603

  |R| / |S|            0.1   0.5   1     2      5
  CVO (Gbytes)         7.8   8.9   9.2   9.5    9.7
  CClient (seconds)    196   205   210   218    223
  CDSP (seconds)       296   311   429   540    647
Binary Join: Effect of Tuple Size
[Figure: VO size (Mbytes), total running time for the client (seconds) and total running time for the DSP (seconds) versus tuple size (32, 64, 128, 256, 512 bytes), for NAI, AISM, ASM, AIM and optimal; two panels per metric.]
Binary Join: Effect of |R| / |S|
[Figure: VO size (Mbytes), total running time for the client (seconds) and total running time for the DSP (seconds) versus |R| / |S| (0.1, 0.5, 1, 2, 5), for NAI, AISM, ASM, AIM and optimal; two panels per metric.]
Multi-way Join: Effect of Tuple Size
[Figure: VO size (Mbytes), total running time for the client (seconds) and total running time for the DSP (seconds) versus tuple size (32, 64, 128, 256, 512 bytes), for NAI, m-AISM+m-AISM, m-ASM+m-ASM, m-AIM+m-AISM and optimal; two panels per metric.]
Multi-way Join: Effect of |S| / |R|
[Figure: VO size (Mbytes), total running time for the client (seconds) and total running time for the DSP (seconds) versus |S| / |R| (0.1, 0.5, 1, 2, 5), for NAI, m-AISM+m-AISM, m-ASM+m-ASM, m-AIM+m-AISM and optimal; two panels per metric.]
Conclusion
Binary join authentication
  AISM: authenticated structure on one relation
  AIM: authenticated structures on both relations
  ASM: no authenticated structure
Complex query authentication
  Multi-way join: eliminate unnecessary intermediate VO elements
  Selection-projection-join query
Future Work
  Authenticated Structures specialized to joins
  Hash join instead of SMJ
Thank you!
Questions?