Authenticated Join Processing in Outsourced Databases

Yin Yang, Dimitris Papadias, Stavros Papadopoulos (HKUST, Hong Kong); Panos Kalnis (KAUST, Saudi Arabia)
Providence, USA, 2009

Database Outsourcing

Advantages
- The data owner does not need the hardware, software, or personnel to run a DBMS
- The service provider achieves economies of scale
- The client enjoys better quality of service

A main challenge
- The service provider is not trusted, and may return incorrect query results

[Figure: the Data Owner sends the initial data and data updates to the Service Provider; the Client sends queries to the Service Provider and receives the query results.]

Query Authentication

[Figure: the Data Owner sends the initial data & signatures and the data updates & signature updates to the Service Provider; the Client sends a query and receives the query results & VO.]

- The owner signs its data with a digital signature scheme
- Given a query, the service provider attaches a VO (Verification Object) to the results
- The client verifies the query results with the VO and the owner's signature, checking soundness (every returned result is authentic and satisfies the query) and completeness (no qualifying result is missing)

Example Queries

Purchase
pid   cid   quantity
p1    c1    20
p2    c3    50
p3    c2    80
p4    c1    200
p5    c2    500

Customer
cid   name    city
c1    Tom     New York
c2    Brian   London
c3    Susan   Tokyo
c4    Jane    New York
c5    Carl    London

- Range: $\sigma_{quantity>100}(\mathit{Purchase})$
- Join: $\mathit{Purchase} \bowtie_{cid} \mathit{Customer}$
- Range & Join: $(\sigma_{quantity>100}\,\mathit{Purchase}) \bowtie_{cid} (\sigma_{city=\text{"New York"}}\,\mathit{Customer})$

State of the Art

- Range authentication: many solutions
- Join authentication: few proposals
  - Materializing join results into views
  - AINL (presented in detail later)
- Joins are inherently more complex than ranges
  - A join combines information from multiple tables
  - Only individual tables are signed

Previous Work

- Multi-dimensional range authentication: Y. Yang, S. Papadopoulos, D. Papadias, G. Kollios (BU), ICDE'08, VLDB J.
- Continuous range authentication: S. Papadopoulos, Y. Yang, D. Papadias, VLDB'07, VLDB J.
- Novel authentication framework: S. Papadopoulos, D. Saccharidis, D. Papadias, ICDE'09

Background

- Concepts in cryptography
- Authenticated Data Structures (ADS)
  - Merkle Hash Tree
  - MB-Tree
- AINL

Concepts in Cryptography

- One-way, collision-resistant hash functions h = H(m)
  - Computationally infeasible to infer m from h, or to find two messages m1 ≠ m2 with the same hash value h
  - Examples: SHA-1, SHA-2, …
- Public-key encryption
  - Two keys: a private key sk and a public key pk
  - The public key encrypts, the private key decrypts
  - Example: RSA
- Digital signature
  - Hard to forge without the secret key
  - Signing: s = encrypt(H(m), sk)
  - Verifying: check whether H(m) = decrypt(s, pk)
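
The signing and verifying rules above can be tried with textbook RSA. This is a toy sketch under loudly stated assumptions: the primes 61 and 53, the public exponent 17, and the helper names `digest`, `sign`, and `verify` are hypothetical choices for illustration. Reducing SHA-256 modulo a 12-bit modulus throws away collision resistance, and real deployments use large keys with a padded scheme such as RSA-PSS rather than raw exponentiation.

```python
import hashlib

# Toy textbook-RSA key (INSECURE, illustration only; all values are assumptions).
P, Q = 61, 53
N = P * Q                  # modulus, 3233
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent: pk = (E, N)
D = pow(E, -1, PHI)        # private exponent: sk = (D, N); modular inverse, Python 3.8+

def digest(message: bytes) -> int:
    """H(m): SHA-256 reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """s = encrypt(H(m), sk): raise the digest to the private exponent."""
    return pow(digest(message), D, N)

def verify(message: bytes, s: int) -> bool:
    """Check whether H(m) = decrypt(s, pk)."""
    return pow(s, E, N) == digest(message)

s = sign(b"signed root hash")
print(verify(b"signed root hash", s))  # True
```

In the outsourcing setting, the owner would sign only a small value such as the root hash of an ADS, so one signature covers a whole relation.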

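Merkle Hash Tree (Merkle, Crypto'89)

- A binary tree whose hash values satisfy $h_n = H(h_{n.lc} \,|\, h_{n.rc})$, where $n.lc$ and $n.rc$ are the left and right children of node $n$ and $|$ denotes concatenation; each leaf holds the hash of one record
- Authenticates 1D range queries
- Example: a query Q retrieves d4 and d5
  - VO(Q) = {s_root, h1-2, d3, d4, d5, d6, h7-8}, where s_root is the owner's signature on hRoot and d3, d6 are the boundary records
  - The client reconstructs hRoot bottom-up and verifies the signature

[Figure: a Merkle hash tree over records d1 to d8 with leaf hashes h1 to h8, internal hashes h1-2, h3-4, h5-6, h7-8, h1-4, h5-8, and the root hRoot signed by the owner; the elements sent to the client for Q are highlighted.]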
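
The construction and the VO of this example can be reproduced with a short Python sketch. Assumptions: SHA-256 stands in for H, records are byte strings, the number of records is a power of two, and the client compares the rebuilt root with a trusted root hash instead of checking the owner's signature; all helper names are hypothetical. The VO is kept as a nested tuple so the client can rebuild the root without knowing the tree shape in advance.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(records):
    """levels[0] holds leaf hashes H(d_i); levels[-1] == [hRoot].
    Assumes len(records) is a power of two."""
    level = [H(r) for r in records]
    levels = [level]
    while len(level) > 1:
        # h_n = H(h_n.lc | h_n.rc), with | as byte concatenation
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_vo(records, levels, lo, hi, lvl=None, idx=0):
    """Server: VO for leaf positions lo..hi (inclusive, boundary records included).
    Every subtree disjoint from [lo, hi] collapses to its hash value."""
    if lvl is None:
        lvl = len(levels) - 1
    span = 1 << lvl
    first, last = idx * span, (idx + 1) * span - 1
    if last < lo or first > hi:
        return ("h", levels[lvl][idx])
    if lvl == 0:
        return ("d", records[idx])
    return ("n",
            make_vo(records, levels, lo, hi, lvl - 1, 2 * idx),
            make_vo(records, levels, lo, hi, lvl - 1, 2 * idx + 1))

def root_from_vo(vo):
    """Client: rebuild the root hash bottom-up from the VO."""
    if vo[0] == "h":
        return vo[1]
    if vo[0] == "d":
        return H(vo[1])
    return H(root_from_vo(vo[1]) + root_from_vo(vo[2]))

def records_in(vo):
    """The records carried by the VO, in key order."""
    if vo[0] == "d":
        return [vo[1]]
    if vo[0] == "h":
        return []
    return records_in(vo[1]) + records_in(vo[2])

records = [f"d{i}".encode() for i in range(1, 9)]
levels = build_levels(records)
vo = make_vo(records, levels, lo=2, hi=5)   # d4, d5 plus boundaries d3, d6
assert root_from_vo(vo) == levels[-1][0]
print(records_in(vo))  # [b'd3', b'd4', b'd5', b'd6']
```

The VO contains exactly the hash values h1-2 and h7-8 besides the four records, matching VO(Q) above minus the signature.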

Merkle B-Tree (Li et al. SIGMOD'06)

- Merkle Hash Tree + B-Tree
- Conceptually, a Merkle Hash Tree with a large fanout (> 2)

[Figure: an MB-Tree answering a range query over records s_i to s_j; the boundary records s_{i-1} and s_{j+1} are returned, hash values are collected by a left traversal and a right traversal from the root through nodes N1 to N4, and leaf nodes are linked by pointers to their right siblings.]

AINL

- For binary joins
- Requires an ADS on the join attribute of the inner relation
- Reduces a join query into multiple range queries
- Algorithm: for every tuple in the outer relation, perform an authenticated range query on the inner relation

Example of AINL

[Figure: an ADS T_S on the join attribute S.a, with leaf records s1 to s15 under nodes A to G and root Root_S, and the outer relation R with tuples r1 and r2, joined on R.a = S.a.]

VO:
1. r1, hF, h10, s11, s12, hE
2. r2, h1, s2, s3, s4, h5, h6, hC, hG
3. …

Drawbacks of AINL

- Large VO size
  - |R| records from R (the outer relation)
  - $2|R| + |R \bowtie S|$ records from S (the inner relation), since each range query returns its matches plus two boundary records
  - Numerous hash values
  - Often larger than the combined size of R and S
- High computation overhead at both the server and the client
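
The count of S records in the AINL VO follows from issuing one range query per outer tuple. The sketch below models only which S records get shipped, not the hash values; `ainl_s_records` is a hypothetical helper, and S is represented by its join-attribute values.

```python
from bisect import bisect_left, bisect_right

def ainl_s_records(r_keys, s_keys):
    """Hypothetical helper: for each outer tuple, the positions (in sorted S)
    of the S records its authenticated range query ships: the matches plus
    the left and right boundary records, when they exist."""
    s_sorted = sorted(s_keys)
    per_tuple = []
    for key in r_keys:                       # one range query per R tuple
        lo = bisect_left(s_sorted, key)      # first S record with S.a >= key
        hi = bisect_right(s_sorted, key)     # first S record with S.a > key
        start = max(lo - 1, 0)               # left boundary record
        end = min(hi + 1, len(s_sorted))     # right boundary record (exclusive end)
        per_tuple.append(list(range(start, end)))
    return per_tuple

r_keys = [6, 2, 4]
s_keys = [1, 2, 3, 4, 5, 6, 7, 8]
shipped = ainl_s_records(r_keys, s_keys)
total = sum(len(p) for p in shipped)
print(total)  # 9 == 2*|R| + |R join S| = 2*3 + 3
```

Positions 2 and 4 appear in two different range results: neighbouring outer tuples share boundary records, and AINL ships them once per query.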

NAI: A Naïve Solution

- The server transmits all the data to the client
- The client performs the join locally
- NAI often outperforms AINL

Proposed Methods

- Binary join authentication
  - AISM: requires an ADS on one relation
  - AIM: requires ADSs on both relations
  - ASM: requires no ADS
- Complex join query authentication
  - Multi-way join
  - Select-project-join

AISM: Query Processing

- Sort the outer relation R on the join attribute
- Transmit all tuples of R to the client in their verifiable order
- Transmit the sort order of the R tuples on the join attribute (R[i] denotes the R tuple that comes i-th in this order)
- Traverse the ADS on S incrementally, only once, with the R records in sort order
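
A single pass over S in join-attribute order is what separates AISM from per-tuple range queries. In the sketch below (hypothetical helper `aism`; hash values and signatures are not modelled), R is given in its verifiable order, the server derives the sort order, and one cursor walks the sorted S values, so every S record enters the VO at most once even when neighbouring R tuples share a boundary record.

```python
def aism(r_keys, s_keys):
    """Hypothetical sketch. r_keys: join values of r1..rn in R's verifiable order.
    Returns the 1-based sort order (order[i-1] = index of the i-th smallest R tuple),
    the join pairs (R index, position in sorted S), and the set of S positions
    shipped as records."""
    order = sorted(range(1, len(r_keys) + 1), key=lambda i: r_keys[i - 1])
    s_sorted = sorted(s_keys)
    pairs, shipped = [], set()
    j = 0                                   # single cursor over S; never moves back
    for i in order:
        key = r_keys[i - 1]
        while j < len(s_sorted) and s_sorted[j] < key:
            j += 1
        k = j
        while k < len(s_sorted) and s_sorted[k] == key:
            pairs.append((i, k))
            k += 1
        # Left/right boundary records around the matches (possibly shipped already).
        shipped.update(range(max(j - 1, 0), min(k + 1, len(s_sorted))))
    return order, pairs, shipped

order, pairs, shipped = aism([6, 2, 9, 4], [1, 2, 3, 4, 5, 6, 7, 8])
print(order)         # [2, 4, 1, 3]
print(pairs)         # [(2, 1), (4, 3), (1, 5)]
print(len(shipped))  # 8
```

Because `shipped` is a set, a boundary record shared by two consecutive R tuples is counted once, which mirrors why the AISM VO avoids the duplicated boundaries of AINL.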

Example of AISM

[Figure: the ADS T_S on S.a (leaf records s1 to s15, nodes A to G, root Root_S) and relation R with tuples r1 to r6, joined on R.a = S.a.]

Sort order of R: R[1]=2, R[2]=4, R[3]=6, R[4]=1, R[5]=3, R[6]=5

VO: signature of R, root signature of T_S, r1 to r6 in their verifiable order, then
1. R[1], h1, s2, s3, s4;
2. R[2], h5, h6, hC, s10, s11, s12;
3. R[3];
4. R[4];
5. R[5], h13, h14, s15;
6. R[6];

AISM: Verification

The client checks:
- the R records
- the correctness of the sort order of R
- the boundary records
- whether the reconstructed root hash of T_S matches its signature

AIM: Query Processing

- Requires ADSs on both relations
- Start with one relation R and traverse its ADS T_R down to the first tuple r1
- Traverse T_S until reaching the right boundary record s of r1
- Traverse T_R until reaching the right boundary record r of s
- Keep alternating between T_S and T_R in the same way
- Verification: similar to AISM
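
The alternating traversal behaves like a leapfrog merge join. In the sketch below (hypothetical helper `aim_pairs`), two sorted lists of join values stand in for the leaf levels of T_R and T_S, and `bisect_left` plays the role of descending an ADS: each jump lands on the first value not smaller than the other side's current value, i.e. a match or else the right boundary record. In AIM proper the records skipped by a jump would be represented in the VO by hash values; this sketch does not model that.

```python
from bisect import bisect_left

def aim_pairs(r, s):
    """Hypothetical sketch: r, s are sorted join values of R and S.
    Returns (position in r, position in s) for every joining pair."""
    pairs = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:
            i = bisect_left(r, s[j], i)   # traverse T_R to the boundary of s[j]
        elif r[i] > s[j]:
            j = bisect_left(s, r[i], j)   # traverse T_S to the boundary of r[i]
        else:
            k = j
            while k < len(s) and s[k] == r[i]:
                pairs.append((i, k))
                k += 1
            i += 1                        # next R tuple; equal S values are revisited
    return pairs

print(aim_pairs([1, 3, 5, 7], [2, 3, 4, 7, 9]))  # [(1, 1), (3, 3)]
```

Each jump strictly advances one cursor, so the loop terminates after at most len(r) + len(s) jumps and match steps combined.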

Example of AIM

[Figure: the ADS T_S on S.a (leaf records s1 to s15, nodes A to G, root Root_S) and the ADS T_R on R.a (tuples r1 to r6, nodes H and I, root Root_R), joined on R.a = S.a.]

VO: root signature of T_S, root signature of T_R, r1;
1. hs1, s2, s3, s4;
2. r2;
3. hs5, hs6, hC, s10, s11, s12;
4. r3, r4;
5. r5;
6. hs13, hs14, s15;
7. hr6;

ASM Idea

Sort-merge join: sort at the server, merge at the client

- Query processing (requires no ADS)
  - Transmit both R and S in their verifiable order
  - Sort R and S on the join attribute
  - Transmit the sort orders of R and S to the client
  - Transmit bitmaps B_R and B_S to the client, indicating the tuples that have join partners
- Verification
  - Correctness of the base relations, the sort orders, and the bitmaps
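
The client side of ASM amounts to checking the shipped metadata and then merging. The sketch below shows one way to do those checks, under assumptions: relations are reduced to lists of join values whose authenticity (verifiable order plus signature) has already been established, sort orders are 0-based index lists, and the bitmaps are validated by comparing them with the partners found during the merge. The function names are hypothetical; this is not the paper's exact verification procedure.

```python
def server_asm(r_keys, s_keys):
    """Server: sort orders (0-based index lists) and bitmaps B_R, B_S."""
    order_r = sorted(range(len(r_keys)), key=lambda i: r_keys[i])
    order_s = sorted(range(len(s_keys)), key=lambda j: s_keys[j])
    r_vals, s_vals = set(r_keys), set(s_keys)
    b_r = [1 if k in s_vals else 0 for k in r_keys]
    b_s = [1 if k in r_vals else 0 for k in s_keys]
    return order_r, order_s, b_r, b_s

def check_order(keys, order):
    """A sort order must be a permutation that yields non-decreasing join values."""
    if sorted(order) != list(range(len(keys))):
        raise ValueError("sort order is not a permutation")
    if any(keys[order[k]] > keys[order[k + 1]] for k in range(len(order) - 1)):
        raise ValueError("sort order is not sorted on the join attribute")

def client_asm(r_keys, s_keys, order_r, order_s, b_r, b_s):
    """Client: verify the sort orders, merge, then cross-check the bitmaps."""
    check_order(r_keys, order_r)
    check_order(s_keys, order_s)
    result = []
    found_r, found_s = [0] * len(r_keys), [0] * len(s_keys)
    i = j = 0
    while i < len(order_r) and j < len(order_s):
        a, b = r_keys[order_r[i]], s_keys[order_s[j]]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Collect the run of equal join values on each side.
            i_end, j_end = i, j
            while i_end < len(order_r) and r_keys[order_r[i_end]] == a:
                i_end += 1
            while j_end < len(order_s) and s_keys[order_s[j_end]] == a:
                j_end += 1
            for x in order_r[i:i_end]:
                found_r[x] = 1
                for y in order_s[j:j_end]:
                    found_s[y] = 1
                    result.append((x, y))
            i, j = i_end, j_end
    # A tuple flagged 1 must have a partner; a tuple flagged 0 must not.
    if found_r != b_r or found_s != b_s:
        raise ValueError("bitmap does not match the join")
    return result

r_keys, s_keys = [5, 1, 3], [3, 3, 4, 1]
meta = server_asm(r_keys, s_keys)
print(client_asm(r_keys, s_keys, *meta))  # [(1, 3), (2, 0), (2, 1)]
```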

Complex Query Authentication

- Multi-way joins
- Selection-projection-join queries

Multi-Way Join

- Build a tree of binary join operators
- m-ASM / m-AISM / m-AIM are optimized for multi-way joins
- Example: for $R \bowtie S \bowtie T$, Op 1 joins R and S with m-AIM and produces $VO(R \bowtie S)$; Op 2 joins that result with T using m-AISM and produces $VO(R \bowtie S \bowtie T)$
- A specialized algorithm, AST, applies when all relations are joined on the same attribute; it produces one single VO

Example of m-AIM and m-AISM

[Figure: the ADS T_R on R.a (tuples r1 to r9, nodes A, B, C), the ADS T_S on S (records s1 to s5, nodes D, E), and the ADS T_T on T.b (tuples t1 to t3); Op 1 joins R and S on R.a = S.a, and Op 2 joins the result with T on S.b = T.b, with the intermediate results labelled [1] to [4].]

- $VO(R \bowtie S)$: {root signatures of T_R and T_S, s1, s2; hA, r4, r5, r6; s3; s4; s5; hC}
- $VO(R \bowtie S \bowtie T)$: {root signature of T_T, [1], t1, t2; [2]; [3]; [4]; ht3}

Example of AST

[Figure: the ADS T_R on R.a (tuples r1 to r9, nodes A, B, C), the ADS T_T on T.a (tuples t1 to t3), and relation S without an ADS, in sort order S[1] to S[4]; all three relations are joined on R.a = S.a = T.a.]

VO: {root signatures of T_R and T_T, signature of relation S, bitmap B_S = "1000", s1-s4 in a verifiable order, S[1], hr1, r2, r3, r4; t1, t2; S[2]; S[3]; S[4]; hr5, hr6, hC; ht3}

Selection-Projection-Join Query

[Figure: three alternative query plans for $(\sigma_1\,\mathit{Purchase}) \bowtie_{cid} (\sigma_2\,\mathit{Customer})$, with $\sigma_1$: quantity > 100 and $\sigma_2$: city = "New York".]

- Selection: use the m- algorithms for joins
- Projection: build a Merkle Hash Tree for each record
- Query optimization
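
Building a Merkle hash tree per record lets the server ship only the projected attributes, plus the hashes of the pruned subtrees, while the client can still rebuild the record's digest. A minimal sketch, assuming SHA-256, attributes stored as byte strings, a split at the midpoint when the attribute count is odd, and that the owner authenticates each record's root digest (for example through the relation's ADS); the helper names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def record_digest(attrs, lo=0, hi=None):
    """Root of the per-record Merkle tree over attrs[lo:hi], split at the midpoint."""
    if hi is None:
        hi = len(attrs)
    if hi - lo == 1:
        return H(attrs[lo])
    mid = (lo + hi) // 2
    return H(record_digest(attrs, lo, mid) + record_digest(attrs, mid, hi))

def projection_vo(attrs, keep, lo=0, hi=None):
    """Server: ship the kept attribute values; summarize every other subtree by its hash."""
    if hi is None:
        hi = len(attrs)
    if not any(lo <= k < hi for k in keep):
        return ("h", record_digest(attrs, lo, hi))
    if hi - lo == 1:
        return ("v", attrs[lo])
    mid = (lo + hi) // 2
    return ("n", projection_vo(attrs, keep, lo, mid), projection_vo(attrs, keep, mid, hi))

def digest_from_vo(vo):
    """Client: rebuild the record digest from the projected values and hashes."""
    if vo[0] == "h":
        return vo[1]
    if vo[0] == "v":
        return H(vo[1])
    return H(digest_from_vo(vo[1]) + digest_from_vo(vo[2]))

purchase = [b"p4", b"c1", b"200"]        # (pid, cid, quantity) from the example
vo = projection_vo(purchase, keep={1})   # project on cid only
assert digest_from_vo(vo) == record_digest(purchase)
print(vo[2][1])                          # ('v', b'c1')
```

Only the cid value travels in clear; pid and quantity reach the client as single hash values.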

Experiments

- Three synthetic relations: R(a1, a2), S(a1, a2, b1, b2), T(b1, b2)
- Queries: $R \bowtie_{a1} S$, $R \bowtie_{a2} S$, $(R \bowtie_{a1} S) \bowtie_{b1} T$, $(R \bowtie_{a2} S) \bowtie_{b2} T$
- Foreign keys: S.a1 references R.a1; S.b1 references T.b1
- Parameters: tuple size, cardinality |S|

Repeatability and Workability

We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf. http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/).

The reviewers were able to repeat all the experiments presented in our paper, yielding results that match the published ones, except for insignificant and expected variation due to randomness and/or hardware/software differences.

The detailed reports will shortly be made publicly available by ACM SIGMOD.

Evaluations of AINL

C_VO: VO size; C_Client, C_DSP: total running time at the client and at the DSP.

Effect of tuple size:

Tuple size (bytes)    32     64     128    256    512
C_VO (Gbytes)         8.9    9.0    9.2    9.6    10.3
C_Client (seconds)    205    207    210    214    219
C_DSP (seconds)       262    271    429    1728   4603

Effect of |R| / |S|:

|R| / |S|             0.1    0.5    1      2      5
C_VO (Gbytes)         7.8    8.9    9.2    9.5    9.7
C_Client (seconds)    196    205    210    218    223
C_DSP (seconds)       296    311    429    540    647

Binary Join: Effect of Tuple Size

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) versus tuple size (32 to 512 bytes) for NAI, AISM, ASM, AIM, and optimal; two panels per metric.]

Binary Join: Effect of |R| / |S|

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) versus |R| / |S| (0.1 to 5) for NAI, AISM, ASM, AIM, and optimal; two panels per metric.]

Multi-way Join: Effect of Tuple Size

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) versus tuple size (32 to 512 bytes) for NAI, combinations of AISM, ASM, AIM and their m- variants, and optimal; two panels per metric.]

Multi-way Join: Effect of |S| / |R|

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) versus |S| / |R| (0.1 to 5) for NAI, combinations of AISM, ASM, AIM and their m- variants, and optimal; two panels per metric.]

Conclusion

- Binary join authentication
  - AISM: authenticated structure on one relation
  - AIM: authenticated structures on both relations
  - ASM: no authenticated structure
- Complex query authentication
  - Multi-way join: eliminate unnecessary intermediate VO elements
  - Selection-projection-join queries

Future Work
- Authenticated structures specialized to joins
- Hash join instead of SMJ (sort-merge join)

Thank you!

Questions?
