Research Case in Cloud Computing IST 501 Fall 2014 Dongwon Lee, Ph.D.

27
Research Case in Cloud Computing IST 501 Fall 2014 Dongwon Lee, Ph.D.

Transcript of Research Case in Cloud Computing IST 501 Fall 2014 Dongwon Lee, Ph.D.

Research Case in Cloud Computing

IST 501Fall 2014

Dongwon Lee, Ph.D.

2

Learning Objectives

Understand an important research case in cloud computing “Database as a Service”

Identify research methods used in the research case

cloudtimes.org

Research Case: The DAS Project

Database as a Service Project @ UC Irvine http://www-db.ics.uci.edu/pages/research/das/

Pioneering work of SaaS type research in data management Recipient of SIGMOD 10-year best paper

Two representative works: “Providing Database as a Service,” ICDE 2002 “Executing SQL over Encrypted Data in Database

Service Provider Model,” SIGMOD 2002

3

HERE

4

Objective

Want to store the data on “a server”

User Encrypted User DatabaseServer

User Data

But the problem is we do not trust “the server” for sensitive information!

encrypt the data and store it but still be able to run queries over the encrypted data do most of the work at the server

If the server is trusted, ICDE 2002, else SIGMOD 2002

Distrusted

5

Database as a Service

SaaS model for Database DB management transferred to service provider for

o backup, administration, restoration, space management, upgrades etc.

use the database “as a service” provided by Cloud computingo use SW, HW, human resources of CC, instead of your own

User Encrypted User Database

(Distrusted) Application Service Provider

User Data

Distrusted Server

6

Service Provider Architecture

Encrypted User Database

Query Translator

Server Site

Temporary Results

Query Executer

MetadataOriginal Query

Server Side Query

Encrypted Results

Actual Results

Service Provider

User

Client Site

Client Side Query ?

? ?

7

Relational Encryption

NAME SALARY

PID

John 50000 2

Marry 110000 2

James 95000 3

Lisa 105000 4

etuple N_ID S_ID P_ID

fErf!$Q!!vddf>></|

50 1 10

F%%3w&%gfErf!$ 65 2 10

&%gfsdf$%343v<l

50 2 20

%%33w&%gfs##! 65 2 20

Server Site

Store an encrypted string – etuple – for each tuple in the original table

This is called “row level encryption” Any kind of encryption technique can be used

Create an index for each (or selected) attribute(s) in the original table

8

Building the Index Partition function divides domain values into partitions (= buckets)

Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }

partitioning function has an impact on performance as well as privacy

2000 400 600 800 1000

2 7 5 1 4

Domain Values

Partition (Bucket) ids

Identification function assigns a partition id to each partition of attribute A

e.g. identR.A( (200,400] ) = 7 Any function can be use as identification function, e.g., hash

functions

9

Mapping Functions Mapping function maps a value v in the domain of attribute A to the

id of the partition which value v belongs to

e.g. MapR.A( 250 ) = 7, MapR.A( 620 ) = 1

2000 400 600 800 1000

2 7 5 1 4

Domain Values

Partition (Bucket) ids

10

Storing Encrypted DataR = < A, B, C > RS = < etuple, A_id, B_id, C_id >

etuple = encrypt ( A | B | C )

A_id = MapR.A( A ), B_id = MapR.B( B ), C_id = MapR.C( C )

NAME SALARY

PID

John 50000 2

Marry 110000 2

James 95000 3

Lisa 105000 4

Etuple N_ID S_ID P_ID

fErf!$Q!!vddf>></|

50 1 10

F%%3w&%gfErf!$ 65 2 10

&%gfsdf$%343v<l

50 2 20

%%33w&%gfs##! 65 2 20

Table: EMPLOYEE

Table: EMPLOYEES

11

Mapping Conditions

Q: SELECT name, pname FROM emp, proj

WHERE emp.pid=proj.pid AND salary > 100k

Server stores attribute indices determined by mapping functions

Client stores metadata and utilizes that to translate the query

Conditions: Condition Attribute op Value Condition Attribute op Attribute

Condition (Condition or Condition) | (Condition and Condition) | (not Condition)

12

Mapping Conditions (2)

Example:

Attribute = Value Mapcond( A = v ) AS = MapA( v )

Mapcond( A = 250 ) AS = 7

2000 400 600 800 1000

2 7 5 1 4

Domain Values

Partition Ids

13

Mapping Conditions (3) Attribute1 = Attribute2

Mapcond( A = B ) N (AS = identA( pk ) BS = identB( pl ))

where N is pk partition (A), pl partition (B), pk pl

Partitions

A_id

[0,100] 2

(100,200] 4

(200,300] 3

Partitions

B_id

[0,200] 9

(200,400] 8

C : A = B C’ : (AS = 2 and BS = 9) or (AS = 4 and BS = 9) or (AS = 3 and BS = 8)

14

Relational Operators over Encrypted Relations

Partition the computation of the operators across client and server Compute (possibly) superset of answers at the server Filter the answers at the client Objective : minimize the work at the client and process the answers

as soon as they arrive without requiring storage at the client

Operators studied: Selection Join Grouping and Aggregation Sorting Duplicate Elimination Set Difference Union Projection

15

Selection Operator

A=250

TABLE

2000 400 600 800 1000

2 7 5 1 4

Example:A=250

D

E_TABLE

A_id = 7

Client Query

Server Query

c( R ) = c( D (S

Mapcond(c)( RS

) )

16

Join Operator

C

EMP PROJ

C : A = B C’ :(A_id = 2 and B_id = 9)Or (A_id = 4 and B_id =

9)Or (A_id = 3 and B_id =

8)

Partitions A_id

[0,100] 2

(100,200] 4

(200,300] 3

Partitions B_id

[0,200] 9

(200,400] 8

R c T = c( D ( RS S

Mapcond(c) TS

)

Example:

C’

E_EMP E_PROJ

A=B

D

Client Query

Server Query

17

Query Decomposition

Client Query

Q: SELECT name, pname FROM emp, proj

WHERE emp.pid=proj.pid AND salary > 100k

Server Query

Encrypted(EMP)

Encrypted(PROJ)

salary >100k

name,pname

D

D

e.pid = p.pid

EMP

PROJsalary >100k

name,pname

e.pid = p.pid

18

Query Decomposition (2)

E_EMP

E_PROJ

salary >100k

D

D

E_EMP

E_PROJ

salary >100k

D

D

s_id = 1 v s_id =

2

e.pid = p.pid

e.pid = p.pid

name,pname

name,pname

Client Query

Server Query

Client Query

Server Query

19

Query Decomposition (3)

e.p_id = p.p_id

E_EMP

E_PROJ

salary >100k and e.pid = p.pid

D

s_id = 1 v s_id =

2

e.pid = p.pid

E_EMP

E_PROJ

salary >100k

D

D

s_id = 1 v s_id =

2

name,pname name,pname

Client QueryClient Query

Server Query Server Query

20

Query Decomposition (4)

Q: SELECT name, pname FROM emp, proj

WHERE emp.pid=proj.pid

AND salary > 100k

QS: SELECT e_emp.etuple, e_proj.etuple FROM e_emp, e_proj

WHERE e.p_id=p.p_id AND

s_id = 1 OR s_id = 2

QC: SELECT name, pname FROM temp

WHERE emp.pid=proj.pid

AND salary > 100k

e.p_id = p.p_id

E_EMP

E_PROJ

salary >100k and e.pid = p.pid

D

s_id = 1 v s_id =

2

name,pname

Client Query

Server Query

21

Experimental Evaluation

Data TPC-H database

Queries TPC-H Queries, versions of Q#6 and Q#3

Partitioning Strategy Equi-depth histograms for the first set of

experiments Equi-width histograms for the second set of

experiments

22

Effect of # of Buckets in Non-Join Query

Client and communications costs decreases with increasing number of buckets due to better filtering at the server

2 80

5

10

15

20

25

30

35

40

Cost Factors for Query Response Time

Client Side

Network

Server Side

Number of Buckets

Qu

ery

Re

spo

nse

Tim

e

23

Effect of # of Buckets in Non-Join Query

Single Server: Server is trusted and performs all operations including decryption on site

Shows that proposed query execution protocol doesn’t introduce significant overhead

2 80

5

10

15

20

25

30

35

Client/Server v.s. Single Server

Single Server Server Side

Client Side

Number of Buckets

Que

ry R

espo

nse

Tim

e

24

Effect of # of Buckets in Join Query

Single Server: Server is trusted and performs all operations including decryption on site

Consistent with the previous results showing proposed communication protocol doesn’t introduce significant overhead

1 75 100 150 250 300 500 750 1500

Client/Server v.s. Single Server

C/S

Single Server

Number of Buckets

Que

ry R

espo

nse

Tim

e

25

Findings Database-as-a-Service model is a promising

solution Proposed algebraic solution to provide data

privacy when cloud computing host is not trusted encrypts data, creates “coarse indexes” and stores

the data at the server allows only data owner to decrypt the data

With query decomposition most of query execution performed at the server client only performs filtering

CI Research Methods Used

26

Method SIGMOD 02Lit-Survey

FormalSolution

ExperimentalBuild

ProcessModel

Reference

“Executing SQL over Encrypted Data in Database Service Provider Model,” SIGMOD 2002

27