Secure Database System
Introduction
• Database-as-a-Service is gaining popularity
  – Amazon Relational Database Service (RDS)
  – Microsoft SQL Azure
[Figure: the user sends a query to the service provider (SP), which hosts the DB and returns the answer.]
Security concerns
• Security now relies solely on the SP
  – Will the SP observe the user's sensitive data and use it without authorization?
  – Will the SP protect the user's data as strictly as its own data?
• The user can also use encryption to protect its own data, but then
  – How can queries be computed on an encrypted database?
Encrypt-Decrypt-Query (EDQ) model
• Baseline solution (but can handle all queries)
[Figure: EDQ model. The SP stores the encrypted DB; the user retrieves it, decrypts it, and runs the query locally to obtain the answer.]
Weakness of EDQ model
• All query computation is done by the user
  – High processing cost (due to decryption of a large portion of the database)
  – High communication cost
• The SP has nothing to compute; it just acts as remote storage without processing power
Encrypt-Query-Decrypt (EQD) model
• More suitable to cloud environment
[Figure: EQD model. The user sends the query to the SP, which evaluates it on the encrypted DB and returns an encrypted answer that the user decrypts.]
Strengths of EQD model
• The answer is expected to be much smaller than the entire database
  – Lower communication cost
  – Lower processing cost at user
• Challenge:
  – How to compute a query on an encrypted database?
Single EQD method approach
• A standalone encryption system is developed to address one particular query pattern
• Examples:
  – Order-preserving encryption scheme (OPES) supports comparison (E(x) > E(y) iff x > y)
  – RSA supports multiplication (E(x)E(y) = E(xy))
• Problem
  – We need to research and design a specific encryption scheme for each application!
    • Need a new encryption system for supporting WHERE X+Y > q
    • Need a new encryption system for supporting WHERE XY > q
    • …
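The multiplicative property E(x)E(y) = E(xy) quoted for RSA can be checked numerically. The following is a minimal sketch of textbook (unpadded) RSA; the primes, the exponent and the helper name `rsa_encrypt` are assumptions chosen for illustration and are far too small for real use.

```python
# Textbook (unpadded) RSA with toy parameters, for illustration only.
p, q = 61, 53            # assumed small primes
n = p * q                # public modulus
e = 17                   # assumed public exponent, coprime to (p - 1) * (q - 1)


def rsa_encrypt(x: int) -> int:
    """Return E(x) = x^e mod n."""
    return pow(x, e, n)


x, y = 12, 7

# Multiplying two ciphertexts gives the ciphertext of the product (mod n).
product_of_ciphertexts = (rsa_encrypt(x) * rsa_encrypt(y)) % n
ciphertext_of_product = rsa_encrypt((x * y) % n)

print(product_of_ciphertexts == ciphertext_of_product)
```

The same check fails for addition, which is why a scheme built on this property alone cannot evaluate a predicate such as WHERE X+Y > q.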
Building database system based on single EQD method approach
• Example systems:
  – ODB model (NetDB2 with encryption) [SIGMOD 02, ICDE 02]
  – CryptDB [Commun. ACM 12]
• Limitations
  – Cannot support complex queries
    • Need to develop a new encryption method to support each query pattern
Extensibility of the encryption system
• Each method (e.g. OPES, RSA) has its own encryption mechanism. Values encrypted by different methods are not interoperable
  – The following query cannot be supported:
    • SELECT * WHERE price * quantity > 1000
  – Attempt: first compute price*quantity (can be done by RSA)
    • The output is encrypted by RSA, so it cannot be compared using OPES (not the same encryption)
How to achieve extensibility?
• Relational algebra
  – A few primitives are enough to build any query
• Observation
  – Data interoperability (aka data interchangeability): the result of one primitive operator can be used as input by other primitive operators
  – Complex functions can be computed by composing primitive operators
To enforce data interoperability
• There is only one encrypted data format
• All operations operate on this format
• A similar secure mechanism with data interoperability
  – Using secure multiparty computation (SMC) with secret sharing (Example: ShareMind)
    • Each data value is split into shares that are distributed to multiple parties. A distributed algorithm is executed among all parties and gives the result in shared form.
Illustration of SMC + secret sharing
Plain values: x = 10, y = 5, z = (x - y)^2 = 25
Note: 10 = 3 + 2 + 5, 5 = 8 + 4 + (-7), 25 = 13 + 6 + 6

Secret sharing (each party holds one share of x and one share of y):
Party    x    y
1        3    8
2        2    4
3        5   -7

SMC algorithms (after some communications, each party holds one share of z):
Party    z
1       13
2        6
3        6
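The share values in this illustration can be checked with a short sketch of additive secret sharing. The helper names `make_shares` and `reconstruct` and the bounded integer range are assumptions made for readability; a real SMC system draws shares uniformly from a finite ring, and the squaring step needs an interactive protocol that is not modelled here.

```python
import random
from typing import List


def make_shares(value: int, parties: int = 3, bound: int = 100) -> List[int]:
    """Split value into additive shares that sum to value (simplified: bounded integers)."""
    shares = [random.randint(-bound, bound) for _ in range(parties - 1)]
    shares.append(value - sum(shares))
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recover the plain value by adding all shares."""
    return sum(shares)


# Shares from the illustration: x = 10, y = 5.
x_shares = [3, 2, 5]
y_shares = [8, 4, -7]

# Subtraction is linear, so each party subtracts its own shares locally.
d_shares = [xs - ys for xs, ys in zip(x_shares, y_shares)]

# Shares of z after the interactive squaring step, as given in the illustration.
z_shares = [13, 6, 6]

print(reconstruct(d_shares))  # x - y
print(reconstruct(z_shares))  # (x - y)^2
```

No single party learns x, y or z from its own shares; only the sum over all parties reveals a plain value.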
Generic operations in SMC
• Basic:
  – Addition
  – Multiplication
• Any operation that can be expressed as a circuit can be computed
  – Addition on binary data can be regarded as an XOR gate
  – Multiplication on binary data can be regarded as an AND gate
  – The two gates together are universal and can express any circuit
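The correspondence between arithmetic on bits and logic gates can be confirmed by enumerating all inputs. This is a small sketch; the function names are assumptions introduced for the example.

```python
from itertools import product


def add_mod2(a: int, b: int) -> int:
    """Addition of two bits, reduced modulo 2."""
    return (a + b) % 2


def mul_mod2(a: int, b: int) -> int:
    """Multiplication of two bits, reduced modulo 2."""
    return (a * b) % 2


# Compare against Python's bitwise XOR (^) and AND (&) on every input pair.
for a, b in product((0, 1), repeat=2):
    print(a, b, add_mod2(a, b) == (a ^ b), mul_mod2(a, b) == (a & b))
```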
Weakness of SMC
• Requires multiple non-colluding service providers
  – Higher cost to user due to more SPs
  – The assumption of non-colluding parties is hard to realize in practice
Using the idea of SMC + secret sharing on encrypted database?
• Multiple parties vs client-server
• Same storage size (= original database size) for all parties
  – Secure share generation reduces the storage cost at user
[Figure: the data owner/user and the cloud server.]
Development of new operators
• Why?

SMC | Secure database system
Operations are done between multiple parties | Operations are done between user and service provider (SP)
No privileged party | User is privileged: it can observe any plain data and should always have a low cost in any computation
Shares in secret sharing are materialized in each party | Shares at user are not materialized but can be generated

• Our goal:
  – To develop (i) a secure share generator with (ii) its corresponding operators
Attack model
• Security is defined w.r.t. an attack model
• Chosen-plaintext attack (CPA)
  – Formally: an attacker can observe the ciphertext of any chosen plaintext, but it is still computationally hard to recover the key
• Some remarks on CPA
  – CPA is also the attack model considered for RSA
  – OPES cannot guard against CPA
DESCRIPTION OF ENCRYPTION MECHANISM
Encryption procedure
• Secret sharing
  – Multiplicative secret sharing
  – Given a plain value $v$, the share at user is $v_k$ and the share at SP is $v'$
    • $v = v_k v' \bmod n$ ($n$ is a parameter in the share generating function)
• The share at user $v_k$ is called the item key of the value $v$
  – The item key of each cell in the table is different
  – Each item key can be identified by the row and column
• The encrypted value $v'$ is stored at SP
Encryption illustration
n = 35

Plain data
Row ID  A   B
1       2   3
2       4   1

Item keys at user
Row ID  A   B
1       8   9
2       16  11

Encrypted values at SP
Row ID  A   B
1       9   12
2       9   16

Number of item keys = number of values in the table
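The three tables are related by v = (item key × encrypted value) mod n, which can be reproduced with a few lines. This is a sketch using the slide's numbers; the function names `encrypt` and `decrypt` are assumptions, and computing the SP share through a modular inverse is one straightforward way to satisfy the sharing equation.

```python
n = 35

plain = {"A": [2, 4], "B": [3, 1]}
item_keys = {"A": [8, 16], "B": [9, 11]}


def encrypt(v: int, item_key: int) -> int:
    """Return v' such that v = item_key * v' mod n (item_key must be coprime to n)."""
    return (v * pow(item_key, -1, n)) % n  # modular inverse via pow needs Python 3.8+


def decrypt(v_enc: int, item_key: int) -> int:
    """Recombine the two multiplicative shares."""
    return (item_key * v_enc) % n


encrypted = {
    col: [encrypt(v, k) for v, k in zip(plain[col], item_keys[col])]
    for col in plain
}
print(encrypted)

decrypted = {
    col: [decrypt(c, k) for c, k in zip(encrypted[col], item_keys[col])]
    for col in encrypted
}
print(decrypted == plain)
```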
Secure item key generator
• We extend RSA as our generator
• Each column has a column key <m, x> (private values)
• Each row has a row ID r (public value)
• Item key: $m g^{xr} \bmod n$
  – g: system parameter, chosen by user, can be public
  – n: the system parameter generated in RSA; n is a composite number with two big prime factors
    • n is public
  – m, x, r are non-zero random values < n
    • Note: n is at least a 1024-bit value
Secure item key generator
• Item key: $m g^{xr} \bmod n$
  – Essentially a single parameter $y = g^x$; kept in this form so as to support update
• Example: Column key <1, 1>, g = 2, n = 35

User's storage: column key <1, 1>

Item keys (generated by the user, not stored)
Row ID  Item key
1       2
2       4
3       8

SP's storage
Row ID  Encrypted value
1       9
2       5
3       12

Plain data
Row ID  Value
1       18
2       20
3       26
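The generator can be sketched as a one-line function of the column key and the row ID. The function name `item_key` is an assumption; the parameters and expected values are the ones in this example.

```python
def item_key(column_key, row_id, g, n):
    """Item key m * g^(x * r) mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n


g, n = 2, 35
column_key = (1, 1)

# The user regenerates item keys on demand from the column key and the public row IDs.
keys = [item_key(column_key, r, g, n) for r in (1, 2, 3)]
print(keys)

# Combining each item key with the value stored at the SP recovers the plain data.
encrypted = [9, 5, 12]
plain = [(k * c) % n for k, c in zip(keys, encrypted)]
print(plain)
```

Only the pair <m, x> is stored per column, so the user's storage does not grow with the number of rows.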
Security of our item key generator
• Our generating function extends the RSA function
  – Ours: $m g^{xr} \bmod n$ (r, n are public; $g^x$ is private)
  – RSA: $y^e \bmod n$ (e, n are public; y is private)
• With $m = 1$, $y = g^x$, $e = r$, the two functions are equivalent
• Conclusion:
  – Our encryption is secure w.r.t. CPA
PRIMITIVE OPERATORS
General Procedure
Table schema and column keys at user: A<4, 2>, B<1, 9>

Encrypted values at SP
Row ID  A   B
1       9   12
2       9   16

User interacts with SP to compute the answer. Cost at user must be low.

Result: a new column key C<m, x> at user and a new encrypted column C at SP
Row ID  C
1       y
2       z

The result is always a new column.
Note: The result C is interoperable with encrypted columns A, B, e.g., B+C can be computed by a further addition.
Overview of primitive operators
• Operations between columns
  – Multiplication (SELECT A * B)
  – Addition (SELECT A + B)
  – We will show that the above two are enough to support generic function evaluation
    • Note: the above operations assume both inputs are encrypted
  – We are also interested in operations between plain and encrypted columns
    • Non-sensitive columns should not be encrypted
  – Special case: one of the operands is a constant
    • Encrypt-constant operation (SELECT 10 * A)
List of basic primitive operators
• Encrypt-encrypt multiplication
• Encrypt-encrypt addition
• Encrypt-constant multiplication
• Encrypt-constant addition
• Encrypt-plain column multiplication
• Encrypt-plain column addition
• Necessary operations (to support some of the above operators)
  – Power
  – Key shuffling
Illustration 1: Encrypt-encrypt multiplication
• C = AB (SELECT A*B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = ab = (a_k b_k)(a' b') \bmod n$
  – $a' b'$ can be computed by SP
  – $a_k b_k$: item keys are not materialized at user; the user operates at the column key level

g = 2, n = 35

Plain data
Row ID  A   B
1       2   3
2       4   1

Table schema and column keys at user: A<4, 1>, B<1, 3>

Encrypted values at SP (column C is computed by SP)
Row ID  A   B   C
1       9   31  34
2       9   29  16
Encrypt-encrypt multiplication
Table schema and column keys at user: A<4, 1>, B<1, 3>

Item key table (row r)
  A: $4 \cdot g^{r} \bmod 35$
  B: $1 \cdot g^{3r} \bmod 35$
  C: $(4 \cdot 1) \cdot (g^{1+3})^{r} \bmod 35$

Resulting column key: C<4, 4>
Encrypt-encrypt multiplication - Result
g = 2, n = 35

Plain data
Row ID  A   B
1       2   3
2       4   1

User: column keys A<4, 1>, B<1, 3> and the result key C<4, 4>

SP: encrypted values
Row ID  A   B   C
1       9   31  34
2       9   29  16

Result: item keys of C, generated by the user from C<4, 4>
Row ID  C
1       29
2       9

Answer: C = AB (item key × encrypted value mod n)
Row ID  C
1       6
2       4

Security: No information about item keys of A and B is sent to SP
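The whole multiplication can be replayed in a few lines. This is a sketch under the slide's parameters; the function name `item_key` and the dictionaries standing in for the user and SP storage are assumptions made for the example.

```python
g, n = 2, 35


def item_key(column_key, row_id):
    """Item key m * g^(x * r) mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n


# User side: only column keys are stored.
key_a, key_b = (4, 1), (1, 3)

# SP side: encrypted columns, indexed by row ID.
enc_a = {1: 9, 2: 9}
enc_b = {1: 31, 2: 29}

# SP multiplies the encrypted values row by row; it never sees a key.
enc_c = {r: (enc_a[r] * enc_b[r]) % n for r in enc_a}

# User derives C's column key: multiply the m parts, add the x parts.
key_c = ((key_a[0] * key_b[0]) % n, key_a[1] + key_b[1])

# User decrypts by regenerating C's item keys from its column key.
plain_c = {r: (item_key(key_c, r) * enc_c[r]) % n for r in enc_c}

print(enc_c, key_c, plain_c)
```

The user's work is constant per column (one key computation), while the per-row work is done by the SP.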
Illustration 2: Encrypt-encrypt addition
• C = A+B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = a_k a' + b_k b' \bmod n$
• We must combine $a_k$ and $a'$ to compute the addition, but $a_k$ is not materialized (it is generated by A's key)
  – Send A's key to SP in a protected way

Plain data
Row ID  A   B
1       2   3
2       4   1

Table schema and column keys at user: A<4, 1>, B<1, 3>

Encrypted values at SP
Row ID  A   B
1       9   31
2       9   29
Encrypt-encrypt addition
• C = A+B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a'$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b'$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = a_k a' + b_k b' \bmod n$
• In the end, c should also be encrypted like other values, i.e., $c = c_k c' \bmod n$
  – $c_k c' = a_k a' + b_k b' \bmod n$
  – $c' = (c_k^{-1} a_k)\, a' + (c_k^{-1} b_k)\, b' \bmod n$
• $c_k$ can be abstracted by C's column key. User generates C's key randomly
• Remaining problem is to help SP compute $c'$
  – User prepares the two parts $c_k^{-1} a_k$ and $c_k^{-1} b_k$. Item keys are not there yet, but can be abstracted at column key level
• C <$m_c$, $x_c$>; A <$m_a$, $x_a$>. At row r,
  – $c_k = m_c\, g^{x_c r} \bmod n$
  – $c_k^{-1} = m_c^{-1} \left((g^{x_c})^{-1}\right)^{r} \bmod n$
  – $a_k = m_a\, g^{x_a r} \bmod n$
  – $c_k^{-1} a_k = m_c^{-1} m_a \left(g^{x_a - x_c}\right)^{r} \bmod n$
  => hint [$m_c^{-1} m_a$, $x_a - x_c$]
Example
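g = 2, n = 35

Plain data
Row ID  A   B
1       2   3
2       4   1

Table schema and column keys at user: A<4, 1>, B<1, 3>

Encrypted values at SP
Row ID  A   B
1       9   31
2       9   29

1. First, generate C's key: C<3, 21>
2. C's inverse: C^-1 <12, 3>
3. Hints for A, B (second component shown as the base $g^{x_a - x_c} \bmod n$): Hint A [13, 16], Hint B [12, 29]
4. SP materializes the hints for every row
Row ID  Hint for A  Hint for B
1       33          33
2       3           12
5. SP obtains encrypted values of C
Row ID  C
1       25
2       25

Item keys of C, generated by the user from C<3, 21>
Row ID  C
1       31
2       17

Plain values of C
Row ID  C
1       5
2       5

We obtain the correct answers if we look at the plain values.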
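The five steps of this example can be replayed as follows. This is a sketch only: the helper names `make_hint` and `materialize` are assumptions, a hint is represented as a pair [m, y] with y the base raised to the row ID, C's key is fixed to <3, 21> to match the example rather than drawn at random, and the "protected way" of sending hints is not modelled.

```python
g, n = 2, 35


def item_key(column_key, row_id):
    """Item key m * g^(x * r) mod n for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(g, x * row_id, n)) % n


def make_hint(key_src, key_dst):
    """User side: pair (m, y) with m * y^r = dst_item_key^-1 * src_item_key mod n."""
    (m_s, x_s), (m_d, x_d) = key_src, key_dst
    m = (pow(m_d, -1, n) * m_s) % n                       # m_d^-1 * m_s
    y = (pow(g, x_s, n) * pow(pow(g, x_d, n), -1, n)) % n  # g^(x_s - x_d)
    return m, y


def materialize(hint, row_id):
    """SP side: expand a hint for one row."""
    m, y = hint
    return (m * pow(y, row_id, n)) % n


key_a, key_b = (4, 1), (1, 3)
key_c = (3, 21)                    # step 1 (random in the scheme, fixed here)
enc_a = {1: 9, 2: 9}
enc_b = {1: 31, 2: 29}

hint_a = make_hint(key_a, key_c)   # steps 2-3, done by the user
hint_b = make_hint(key_b, key_c)

# Steps 4-5, done by the SP: c' = hintA_r * a' + hintB_r * b' mod n.
enc_c = {
    r: (materialize(hint_a, r) * enc_a[r] + materialize(hint_b, r) * enc_b[r]) % n
    for r in enc_a
}

# User decrypts with item keys regenerated from C's column key.
plain_c = {r: (item_key(key_c, r) * enc_c[r]) % n for r in enc_c}
print(hint_a, hint_b, enc_c, plain_c)
```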
Generic Encrypt-encrypt operations
• With addition and multiplication, we can compute any function that can be expressed as a circuit
• All data is in binary form
• It is sufficient to show that we can build a universal gate (e.g., NAND gate) on top of binary data
Building NAND gate
• NAND(X, Y) = 1 – XY (multiplication and addition)
  – EE multiplication / EP multiplication (Z = XY)
  – EC addition (1 – Z)
• Any circuit can be expressed

X   Y   Result
0   0   1
0   1   1
1   0   1
1   1   0
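The truth table follows directly from the arithmetic form 1 - XY, and other gates can be composed from NAND alone. The sketch below works on plain bits; in the system the same multiplication and addition would run on encrypted columns. The derived gate names are assumptions introduced for the example.

```python
def nand(x: int, y: int) -> int:
    """NAND on bits, using one multiplication and one addition of a constant."""
    return 1 - x * y


# Other gates composed from NAND only.
def not_(x: int) -> int:
    return nand(x, x)


def and_(x: int, y: int) -> int:
    return not_(nand(x, y))


def or_(x: int, y: int) -> int:
    return nand(not_(x), not_(y))


def xor_(x: int, y: int) -> int:
    return and_(or_(x, y), nand(x, y))


for x in (0, 1):
    for y in (0, 1):
        print(x, y, nand(x, y))
```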
Extension operators
• We also develop operators to support the following operations efficiently.
• Comparison
  – Example: Quantity * Price + One-TimeCost > 1M
• COUNT/SUM
  – Example: SELECT SUM(A+B) WHERE C > 30
• Join
  – Example: SELECT t1.A, t2.B WHERE t1.C = t2.C
• DELETE/UPDATE with predicates
  – Example: UPDATE T1 SET A = A*1.1 + B WHERE C = 3000
Indexing
• Processing each tuple by linear scan is feasible but slow
• Indexing is needed
• Note: an index is itself a compromise of security
  – If certain tuples are filtered without any processing, the attacker can obtain certain information about the data, e.g., a range for the data
Domain partitioning index [VLDB 04]
• The domain is partitioned into regions

Row ID  Values  Partition ID
1       101     1
2       235     2
3       467     4

Query: SELECT … WHERE Values < 450
Rewritten on the index: SELECT … WHERE Values->PartitionID <= 4
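The rewrite of the range predicate onto partition IDs can be sketched as below. The partition width of 100 is an assumption that happens to reproduce the partition IDs in the table; the slides do not state how the domain is actually split. Plain values stand in for the exact check, which the system would perform with its cryptographic operators.

```python
PARTITION_WIDTH = 100  # assumed: partition ID = value // 100


def partition_id(value: int) -> int:
    """Map a value to the ID of the region that contains it."""
    return value // PARTITION_WIDTH


rows = {1: 101, 2: 235, 3: 467}
index = {row_id: partition_id(v) for row_id, v in rows.items()}


def candidates(threshold: int):
    """Row IDs that may satisfy Values < threshold, judged by partition ID only."""
    bound = partition_id(threshold)
    return [row_id for row_id, pid in index.items() if pid <= bound]


cands = candidates(450)                          # coarse filter on the index
answers = [r for r in cands if rows[r] < 450]    # exact check on candidates
print(index, cands, answers)
```

Row 3 (value 467) passes the index filter but fails the exact check, which shows why the index only yields candidate tuples.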
Integration with existing DBMS
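[Figure: architecture. At the user, applications send SQL queries to the SDB Client Layer, which produces a query execution plan using secure operators. At the SP, the SDB Server Layer applies its secure operators on top of the DBMS (via SQL) and returns the result.]
• SDB Client Layer: a layer at user; applications simply use the database service using SQL
• SDB Server Layer: wraps the DBMS layer and provides our operators
• DBMS: lets us enjoy existing features of a DBMS, e.g., failure recovery
• The partition index is stored on the DBMS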
Index processing on SDB
• First process the index and filter out all disqualified tuples
  – Index processing is done by the underlying DBMS
• Then, use cryptographic operations to compute the actual answer
  – Operators are executed by the SDB layers
• (Encrypted) answers are sent to user
Note on SQL in applications
• The only difference
  – The application has to mention which columns require encryption
    • In CREATE TABLE
• Example:
  – CREATE TABLE Stud(ID, Name, Marks ENC)

Plain database
Row ID  ID  Name  Marks
1       …   …     …
2       …   …     …

Schema in underlying DBMS
Row ID  ID  Name  Marks  Marks_ind
1       …   …     …      …
2       …   …     …      …

Marks_ind holds the partition IDs; we will create an index on it in the DBMS
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…> (n = 35)
Condition rewritten as: A*B + D – 20 > 0

Index columns at SP (filter by index)
Row ID  A_ind  B_ind  C_ind  D_ind
1       …      …      …      …
2       …      …      …      …

Encrypted candidate tuples at SP
Row ID  A   B   C   D
105     …   …   …   …
278     …   …   …   …
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…> (n = 35)

Encrypted values at SP
Row ID  A   B   C   D
105     …   …   …   …
278     …   …   …   …

Query execution plan for A*B + D – 20 > 0:
1. Column-column multiplication: E = AB (new column key E<…>)
2. Column-column addition: F = E + D – 20 (new column key F<…>)
3. Comparison

Query execution plan done (with corresponding parameters)
Note: E, F can be thrown away by user, since they are not needed in the result
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…> (n = 35)

Encrypted values at SP
Row ID  A   B   C   D
105     …   …   …   …
278     …   …   …   …

SP receives the query plan, executes it and finds the answers
Row ID  Answers?
105     No
278     Yes
337     No
129     No
…       …

Projection on C only
Row ID  C
278     3
776     12
…       …

Encrypted answer sent back to user; row IDs must be included
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…> (n = 35)

Encrypted answers
Row ID  C
278     3
776     12
…       …

User computes own item keys
Row ID  C
278     9
776     9
…       …

Decrypt (item key × encrypted value mod n)
Row ID  C
278     27
776     3
…       …
Experiment performance
• Comparison:
  – EDQ model
    • SP filters the database by the index and sends all candidate tuples to user; user decrypts the tuples and computes the query itself
    • This method has to be used when the query is outside the supported query range of existing approaches, e.g., ODB, CryptDB
  – Our system: SDB
Insertion performance
• Data encryption speed
• Table schema
  – (A, B, C)
    • Encrypted: A, B

Throughput (number of tuples per second)
DB size  100K   200K   300K   400K   500K
SDB      166.0  162.6  160.4  155.6  153.3
EDQ      335.5  343.0  328.9  333.7  342.8
Query performance
• SELECT A, B, C FROM test WHERE A + B < q.

[Figure: two plots of time (s) against database size (100K to 500K), labeled cost at user and cost at SP, selectivity 1%. One plot compares SDB with EDQ (time axis 0 to 400 s); the other shows SDB only (time axis 0 to 2.5 s).]
Query performance
• SELECT SUM(A) FROM test WHERE A + B < q.

[Figure: two plots of time (s) against database size (100K to 500K), labeled cost at user and cost at SP, selectivity 1%. One plot compares SDB with EDQ (time axis 0 to 200 s); the other shows SDB only (time axis 0 to 20 s).]
Query performance
• SELECT * FROM test AS t1, test AS t2 WHERE t1.A = t2.B AND t1.B < q1 AND t2.A < q2.

[Figure: two plots of time (s) against database size (100K to 500K), labeled cost at user and cost at SP. One plot compares SDB with EDQ (time axis 0 to 6000 s); the other shows SDB only (time axis 0 to 900 s).]
Conclusion
• We developed a new secure database system
  – Theoretically supports generic operations
  – Supports common database operations
    • Range query, aggregation function, delete/update, join
  – Our system is more efficient than the naïve EDQ approach
• Future work
  – Query plan optimization
  – Development of more operators, especially on text data
  – Index improvement
    • How to prepare/make use of the index for more complex queries?