PRIVACY-PRESERVING QUERY EXECUTION USING A DECENTRALIZED ARCHITECTURE AND TAMPER RESISTANT HARDWARE
Quoc-Cuong To, Benjamin Nguyen, Philippe Pucheral
SMIS Team
EDBT 2014, Athens, March 24-28
University of Versailles St-Quentin, INRIA Rocquencourt, CNRS
MASS-GENERATION OF (PERSONAL) DATA
Data sources have mostly turned digital
Analog processes
• e.g., photography, films
Paper-based interactions
• e.g., banking, e-administration
Communications
• e.g., email, SMS, MMS, Skype
Where is your personal data? … In data centers
• 112 new emails per day → Mail servers
• 65 SMS sent per day → Telcos
• 800 pages of social data → Social networks
• Web searches, list of purchases → Google, Amazon
DATA PRODUCED BY SECURE HARDWARE
Secure hardware is “everywhere”
Where is your personal data stored? … In data centers
CENTRALIZED VS DE-CENTRALIZED
Centralized solutions
• Privacy violation
• Internal & external attacks on server
• Single point of attack
De-centralized solution
• Get rid of the assumption of a trusted central server
• Distributed secure devices
ASYMMETRIC ARCHITECTURE: SECURE DEVICE
How to compute global queries on a nation-wide dataset over decentralized personal data stores while respecting users’ privacy?
Authorized Querier
Average energy consumption of France
Secure Device (Trusted Data Server - TDS)
Characteristics:
• High security:
   • High cost/benefit ratio of an attack;
   • Secure against its owner;
• Modest computing resources (~10KB of RAM, 120MHz CPU);
• Low availability: physically controlled by its owner; connects and disconnects at will
OUTLINE
• Generic protocol & variations
• Information exposure analysis
• Experiment
THE GENERIC PROTOCOL
Querier
Supporting Server Infrastructure (SSI)
…
SELECT <attribute(s) and/or aggregate function(s)>
FROM <Table(s)>
[WHERE <condition(s)>]
[GROUP BY <grouping attribute(s)>]
[HAVING <grouping condition(s)>]
[SIZE <size condition(s)>];
Collection phase
Aggregation phase
Stop condition: min #tuples or max time
John, 35K Mary, 43K Paul, 100K
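The stop condition of the collection phase (min #tuples or max time) can be illustrated with a minimal sketch. The function name `collect`, its parameters, and the injectable `clock` are hypothetical and only serve the illustration; the slides do not specify how the SSI implements the SIZE clause.

```python
import time
from typing import Callable, Iterable, List, Optional


def collect(stream: Iterable[bytes],
            min_tuples: Optional[int] = None,
            max_seconds: Optional[float] = None,
            clock: Callable[[], float] = time.monotonic) -> List[bytes]:
    """Gather encrypted tuples until a SIZE-style stop condition is met.

    Stops as soon as `min_tuples` tuples have been received, or once
    `max_seconds` have elapsed, whichever happens first. If the stream
    ends earlier, whatever was received is returned.
    """
    start = clock()
    collected: List[bytes] = []
    for encrypted_tuple in stream:
        # Time limit reached: stop before accepting another tuple.
        if max_seconds is not None and clock() - start >= max_seconds:
            break
        collected.append(encrypted_tuple)
        # Enough tuples received: the collection phase is over.
        if min_tuples is not None and len(collected) >= min_tuples:
            break
    return collected
```

Passing the clock as a parameter is only a convenience so that the time-based condition can be exercised without waiting.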
HYPOTHESIS ABOUT QUERIER & SSI
Querier:
• Shares the secret key with TDSs (to encrypt the query & decrypt the result).
• Access control policy:
   • Cannot get the raw data stored in TDSs (gets only the final result)
   • Can obtain only authorized views of the dataset
Supporting Server Infrastructure:
• Prior knowledge about data distribution.
• Honest-but-curious attacker: frequency-based attack
   • SSI matches the plaintext and ciphertext of the same frequency.
   • SSI looks at remarkable (very high/low) frequencies in the dataset distribution (e.g., Mr. X with high salary = 1 M€/month and there is only one distinct encrypted salary → Mr. X participates in the dataset).
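The frequency-based attack can be illustrated with a small sketch. It assumes the attacker knows the plaintext frequency of each value and observes deterministically encrypted values; `det_enc` is a hypothetical stand-in built on a keyed hash (it is not a real cipher and not the scheme used by the authors), and `frequency_attack` is an illustrative name.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List


def det_enc(key: bytes, value: str) -> str:
    """Placeholder for deterministic encryption: equal inputs give equal tokens.

    Assumption: a keyed HMAC is used only to mimic the determinism;
    it cannot be decrypted and is not the authors' scheme.
    """
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:8]


def frequency_attack(ciphertexts: List[str],
                     known_distribution: Dict[str, int]) -> Dict[str, str]:
    """Map each ciphertext to a plaintext when its frequency is unambiguous."""
    cipher_freq = Counter(ciphertexts)
    # How many distinct values share a given frequency, on each side.
    plain_by_count = Counter(known_distribution.values())
    cipher_by_count = Counter(cipher_freq.values())
    guesses: Dict[str, str] = {}
    for token, count in cipher_freq.items():
        # A frequency seen exactly once on both sides identifies the value.
        if cipher_by_count[count] == 1 and plain_by_count[count] == 1:
            guesses[token] = next(
                p for p, n in known_distribution.items() if n == count)
    return guesses
```

With salaries where "1M" occurs once, the single ciphertext of frequency one is matched to "1M" without the key, which is the Mr. X case on the slide.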
RELATED WORKS
• Outsourced database services: simple queries or high computing cost
• Statistical Database & Differential privacy: trusting the server, produce approximate results
• Secure Multi-party Computation: not scalable
• Secure Data Aggregation in wireless sensor network: communicate with each other in order to form a network topology
First proposal achieving a fully distributed and secure solution to compute general SQL queries over a large set of participants
CLASSIFICATION OF SOLUTIONS
Which encryption is used, how the SSI constructs the partitions, and what information is revealed to the SSI
• Secure aggregation solution: nDet_Enc
• Noise-based solutions: Det_Enc + fake data
   • random (white) noise
   • noise controlled by the complementary domain
• Histogram-based solution: equi-depth histogram
Performance & Security
SECURE AGGREGATION
Supporting Server Infrastructure (SSI)
Querier
Q: SELECT City, SUM(Energy) GROUP BY City HAVING SUM(Energy) > 50B
[Figure: secure aggregation data flow between Querier, SSI and TDSs]
• Qi = <EK1(Q), Credential, Size>
• Each TDS: Decrypt Qi, Check AC rules
• Each TDS encrypts its data using non-deterministic encryption, e.g., (Paris, 35K) → (#x3Z, aW4r)
• SSI: Form partitions, e.g., (#x3Z, aW4r)($f2&, bG?3)($&1z, kHa3)
• Hold partial aggregation (Gij, AGGk), e.g., (Paris, 35K)(Lyon, 24K)(Lyon, 43K) → (Paris, 35K)(Lyon, 67K)
• Final Agg, e.g., (Paris, 912300M)(Lyon, 56000M)
• Evaluate HAVING clause
• Final Result
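The aggregation phase of the secure aggregation protocol can be sketched as repeated partial aggregation over partitions, followed by a final aggregation and the HAVING filter. This is a simplified illustration under stated assumptions: encryption and decryption are omitted entirely (in the protocol the SSI only sees nDet_Enc ciphertexts and each partition is processed inside a TDS), and the names `make_partitions`, `partial_sum` and `secure_aggregate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (group, value), e.g. (City, Energy)


def make_partitions(rows: List[Row], size: int) -> List[List[Row]]:
    """SSI side: cut the collected tuples into fixed-size partitions."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(partition: Iterable[Row]) -> List[Row]:
    """TDS side: aggregate one partition, one output row per group."""
    acc: Dict[str, float] = defaultdict(float)
    for group, value in partition:
        acc[group] += value
    return list(acc.items())


def secure_aggregate(rows: List[Row], partition_size: int,
                     having_min: float) -> Dict[str, float]:
    """Iterate partial aggregation, then apply HAVING SUM(...) > having_min."""
    current = list(rows)
    while len(current) > partition_size:
        reduced = [row
                   for part in make_partitions(current, partition_size)
                   for row in partial_sum(part)]
        if len(reduced) == len(current):
            break  # no partition merged anything; go to the final step
        current = reduced
    final = dict(partial_sum(current))  # final aggregation
    return {g: s for g, s in final.items() if s > having_min}
```

For example, `secure_aggregate([("Paris", 35), ("Lyon", 24), ("Lyon", 43)], 2, 50)` keeps only Lyon, since its sum of 67 exceeds the threshold while Paris stays at 35.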
NOISE-BASED PROTOCOLS
• nDet_Enc on AG: SSI cannot gather tuples belonging to the same group into the same partition.
• Det_Enc on AG: frequency-based attack.
• Add noise (fake tuples) to hide the distribution of AG.
• How many fake tuples (nf) are needed? Depends on the disparity in frequencies among AG
   • small nf: random noise
   • big nf: white noise
   • nf = n-1: controlled noise (n: AG domain cardinality)
• Efficiency: each TDS handles tuples belonging to one group (instead of large partial aggregation as in SAgg)
• However, high cost of generating and processing the very large number of fake tuples
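The controlled-noise case (nf = n-1) can be illustrated with a short sketch: each TDS sends its true tuple plus one fake tuple for every other value of the AG domain, so every group appears equally often to the SSI. Encryption is omitted and the explicit `is_fake` flag is an assumption made for readability (in the protocol this marker would have to be hidden from the SSI); all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Sent = Tuple[str, float, bool]  # (group, value, is_fake)


def tds_output(true_group: str, value: float,
               domain: Iterable[str]) -> List[Sent]:
    """One true tuple plus nf = n-1 fake tuples, one per other AG value."""
    out: List[Sent] = [(true_group, value, False)]
    out += [(g, 0.0, True) for g in domain if g != true_group]
    return out


def visible_group_counts(sent: Iterable[Sent]) -> Dict[str, int]:
    """What a frequency analysis on the grouping attribute would observe."""
    return dict(Counter(group for group, _, _ in sent))


def aggregate(sent: Iterable[Sent]) -> Dict[str, float]:
    """Trusted side: drop fake tuples, then sum the values per group."""
    totals: Dict[str, float] = {}
    for group, value, is_fake in sent:
        if not is_fake:
            totals[group] = totals.get(group, 0.0) + value
    return totals
```

With m participating TDSs and a domain of n values, m × n tuples are sent in total, which makes the cost of generating and processing fake tuples visible.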
NEARLY EQUI-DEPTH HISTOGRAM
• Distribution of AG is discovered and distributed to all TDSs.
• TDS allocates its tuple to the corresponding bucket.
• Send to SSI: {h(AG), nDet_Enc(tuple)}
• h(AG) = bucketID
• Does not generate & process too many fake tuples
• Does not handle too large partial aggregation
[Figure: True Distribution vs. Nearly equi-depth histogram]
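One way to build a mapping h(AG) = bucketID with nearly equal bucket depths is sketched below. The slides do not say how the histogram is constructed; the greedy strategy used here (assign the most frequent remaining value to the currently lightest bucket) is an assumption chosen for brevity, and `build_histogram` and `bucket_depths` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple


def build_histogram(freq: Dict[str, int], num_buckets: int) -> Dict[str, int]:
    """Return h: AG value -> bucketID with roughly balanced bucket depths.

    Greedy heuristic (an assumption, not the authors' algorithm): take the
    values by decreasing frequency and put each into the lightest bucket.
    """
    heap: List[Tuple[int, int]] = [(0, b) for b in range(num_buckets)]
    heapq.heapify(heap)  # entries are (current depth, bucketID)
    h: Dict[str, int] = {}
    for value, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        depth, bucket = heapq.heappop(heap)
        h[value] = bucket
        heapq.heappush(heap, (depth + count, bucket))
    return h


def bucket_depths(freq: Dict[str, int], h: Dict[str, int]) -> Dict[int, int]:
    """Number of tuples falling into each bucket under mapping h."""
    depths: Dict[int, int] = {}
    for value, count in freq.items():
        depths[h[value]] = depths.get(h[value], 0) + count
    return depths
```

The SSI then only sees bucket identifiers of similar frequency, while each bucket still mixes several AG values that a TDS separates after decryption.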
INFORMATION EXPOSURE (DAMIANI ET AL. CCS 2003)
INFORMATION EXPOSURE
SAgg: $IC_{i,j} = 1/N_j$ for all i,j
$$\epsilon_{S\_Agg} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k}\frac{1}{N_j} = \prod_{j=1}^{k}\frac{1}{N_j}$$
• n: the number of tuples,
• k: the number of attributes,
• $IC_{i,j}$: the value in row i and column j in the IC table
• $N_j$: the number of distinct plaintext values in the global distribution of attribute in column j (i.e., $N_j \le n$)
EDHist: requires finding all possible partitions of the plaintext values such that the sum of their occurrences is the cardinality of the mapped value: NP-Hard multiple subset sum problem
$$\epsilon_{ED\_Hist} = \prod_{j=1}^{k}\frac{1}{\min(N_j)}$$
Noise_based & ED_Hist have a uniform distribution of the AG: $\epsilon_{ED\_Hist} = \epsilon_{Noise\_based}$
Plaintext:
$$\epsilon_{P\_Text} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k} 1 = 1$$
$\epsilon_{S\_Agg} \le \epsilon_{ED\_Hist} = \epsilon_{Noise\_based} < 1$
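A small worked computation may help read the exposure coefficients. It assumes the coefficient is the average over the n rows of the product of the IC table entries of that row, which is how the SAgg and plaintext cases above are read here; the function names are hypothetical.

```python
from fractions import Fraction
from math import prod
from typing import List, Sequence


def exposure(ic_table: Sequence[Sequence[Fraction]]) -> Fraction:
    """Average over the n rows of the product of the k IC values per row."""
    n = len(ic_table)
    return sum(prod(row, start=Fraction(1)) for row in ic_table) / n


def ic_table_s_agg(n: int, distinct_counts: List[int]) -> List[List[Fraction]]:
    """SAgg case: IC[i][j] = 1/N_j for every row i and column j."""
    return [[Fraction(1, n_j) for n_j in distinct_counts] for _ in range(n)]


def ic_table_plaintext(n: int, k: int) -> List[List[Fraction]]:
    """Plaintext case: every value is fully exposed, so IC[i][j] = 1."""
    return [[Fraction(1)] * k for _ in range(n)]
```

For instance, with two attributes having 4 and 5 distinct values, `exposure(ic_table_s_agg(3, [4, 5]))` gives 1/20, whereas the plaintext table always gives 1.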
UNIT TEST
Internal time consumption
• 32 bit RISC CPU: 120 MHz
• Crypto-coprocessor: AES, SHA
• 64KB RAM, 1GB NAND-Flash
• USB full speed: 12 Mbps
METRICS FOR THE EVALUATION: TRADE-OFF BETWEEN CRITERIA
[Figure: trade-off between criteria: Total Load, Average Time/Load, Query Response Time, Information Exposure, Resource Variation]
WHICH ONE?
S_Agg & ED_Hist: best solutions.
ED_Hist: e.g., medical folders; TDSs seldom connect and save resources for their own tasks.
S_Agg: e.g., smart meters; TDSs are connected all the time, are mostly idle, and do not need to save resources.
FUTURE WORK
• Support external joins (i.e., joins between several TDSs).
• Extend the threat model to (a small number of) compromised TDSs