PowerPoint presentation: eecs.ucf.edu/~jinyier/DASS2016/Koushanfar_SFE_DASS.pdf
DNA report: Alice wants to learn about her DNA report without revealing her DNA to Bob. Bob also doesn’t want to share his genetic database or model.
(Figure: Alice holds her DNA; Bob (hospital) holds a DNA genetic database; together they produce Alice's DNA report.)
How can the DNA report be computed without a third party while keeping both parties' information private?
Goal: Compute function F() on private inputs X and Y
(Figure: Alice holds X = (x1, x2, ..., xn) and Bob holds Y = (y1, y2, ..., yn); secure function evaluation (SFE) outputs Z = F(X, Y).)
Our focus: 2-party computation based on Yao’s Garbled Circuits (GC) protocol [Yao1986]
Circuit generation and garbling as a primitive (in line with JustGarble [])
Circuit-Based protocols
Operates on Boolean representation of function
Yao’s Garbled Circuit (GC) Protocol
Goldreich-Micali-Wigderson (GMW) Protocol
Yao, A. "How to generate and exchange secrets." Foundations of Computer Science (FOCS), IEEE, 1986.
Goldreich, O., S. Micali, and A. Wigderson. "How to play any mental game." STOC, ACM, 1987.
Ben-David, A., N. Nisan, and B. Pinkas. "FairplayMP: a system for secure multi-party computation." CCS, ACM, 2008.
Demmler, D., G. Dessouky, F. Koushanfar, A. Sadeghi, T. Schneider, and S. Zeitouni. "Automated Synthesis of Optimized Circuits for Secure Computation." CCS, ACM, 2015.
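Circuit-based protocols operate on a Boolean representation of the function. The sketch below is illustrative only and is not the exact gate netlist drawn on the slides: it expresses the millionaires' comparison X < Y purely with AND, XOR, and OR gates, scanning from the least significant bit, and checks every 3-bit input pair against integer comparison. The helper names (`less_than`, `to_bits`) are hypothetical.

```python
from itertools import product
from typing import List

# Single-bit gates: the only operations a circuit-based protocol evaluates.
def AND(a: int, b: int) -> int:
    return a & b

def XOR(a: int, b: int) -> int:
    return a ^ b

def OR(a: int, b: int) -> int:
    return a | b

def less_than(x_bits: List[int], y_bits: List[int]) -> int:
    """Return 1 if X < Y, else 0; bit lists are least significant bit first.

    After processing bit i, `lt` is the result of comparing bits 0..i only:
    if x_i != y_i this higher bit decides (X < Y exactly when y_i = 1),
    otherwise the result from the lower bits is kept.
    """
    lt = 0
    for x, y in zip(x_bits, y_bits):
        diff = XOR(x, y)                 # 1 when the bits differ
        decided = AND(diff, y)           # bits differ and y_i = 1
        carried = AND(XOR(diff, 1), lt)  # bits equal: keep lower-bit result (NOT via XOR 1)
        lt = OR(decided, carried)
    return lt

def to_bits(v: int, n: int) -> List[int]:
    return [(v >> i) & 1 for i in range(n)]

# Exhaustive check over all 3-bit inputs.
for x, y in product(range(8), repeat=2):
    assert less_than(to_bits(x, 3), to_bits(y, 3)) == int(x < y)

# The deck's example: X = (110)_2 = 6, Y = (100)_2 = 4, so F(X, Y) = (X < Y) = 0.
print(less_than(to_bits(6, 3), to_bits(4, 3)))  # prints 0
```

In a garbled-circuit setting the XOR gates here would be free, while each AND and OR gate would need a garbled table.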
Example: Yao’s Millionaires problem, 3-bit comparison circuit
(Figure: Alice’s private data X and Bob’s private data Y enter a comparison circuit built from AND (&), XOR (⊕), and OR (|) gates over the bits X1, X2, X3 and Y1, Y2, Y3; the circuit outputs Z = F(X, Y), e.g., F(X, Y) = (X < Y).)
Generate logic circuit C, with internal wires w1, ..., w9 and output wire Z.
Garbled inputs: each input wire carries one of two labels, selected by its bit:
$$x_i = \begin{cases} x_i^0, & X_i = 0 \\ x_i^1, & X_i = 1 \end{cases}$$
Garbled tables: each row encrypts an output-wire label under the pair of input labels that produces it. For the gate with inputs x1, y1 and output w5:
$$E(x_1^0, y_1^0;\, w_5^0), \quad E(x_1^0, y_1^1;\, w_5^0), \quad E(x_1^1, y_1^0;\, w_5^0), \quad E(x_1^1, y_1^1;\, w_5^1)$$
For the gate with inputs w7, w5 and output w9:
$$E(w_7^0, w_5^0;\, w_9^0), \quad E(w_7^0, w_5^1;\, w_9^0), \quad E(w_7^1, w_5^0;\, w_9^0), \quad E(w_7^1, w_5^1;\, w_9^1)$$
Generate garbled circuit GC: Alice (circuit generator) garbles the circuit, and Bob (circuit evaluator) evaluates it.
Example: $X = (110)_2$, i.e., X3 = 1, X2 = 1, X1 = 0, so Alice’s garbled inputs are $x_3^1, x_2^1, x_1^0$; $Y = (100)_2$, so Bob’s garbled inputs are $y_3^1, y_2^0, y_1^0$. Evaluation yields the wire labels $w_1^0, w_2^1, w_3^0, w_4^0, w_5^0, w_6^0, w_7^0, w_8^0, w_9^0$ and the output label $Z^0$, i.e., X < Y is false.
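The garbling and evaluation steps for one gate can be sketched as follows. This is a minimal, non-authoritative illustration under stated assumptions: SHA-256 stands in for the encryption E, the evaluator finds the correct row by trial decryption (a row is valid when its 16-byte tail decrypts to zeros), and labels are 128 bits. Bob's own input labels would normally be delivered by oblivious transfer, which is omitted here, as are optimizations such as point-and-permute and row reduction. All function names are hypothetical.

```python
import hashlib
import random
import secrets
from typing import Callable, List, Tuple

LABEL_BYTES = 16                 # 128-bit wire labels (assumed width)
Wire = Tuple[bytes, bytes]       # (label encoding 0, label encoding 1)

def new_wire() -> Wire:
    return secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES)

def _pad(ka: bytes, kb: bytes, gate_id: int) -> bytes:
    # SHA-256 of both input labels and the gate id stands in for E's keystream.
    return hashlib.sha256(ka + kb + gate_id.to_bytes(4, "big")).digest()

def E(ka: bytes, kb: bytes, gate_id: int, out_label: bytes) -> bytes:
    """E(ka, kb; out_label): 32-byte pad XOR (out_label || 16 zero bytes)."""
    plain = out_label + bytes(LABEL_BYTES)  # zero tail marks the correct row
    return bytes(p ^ q for p, q in zip(_pad(ka, kb, gate_id), plain))

def D(ka: bytes, kb: bytes, gate_id: int, ct: bytes):
    """Trial-decrypt one row; return the output label, or None on a wrong row."""
    plain = bytes(p ^ q for p, q in zip(_pad(ka, kb, gate_id), ct))
    label, tail = plain[:LABEL_BYTES], plain[LABEL_BYTES:]
    return label if tail == bytes(LABEL_BYTES) else None

def garble_gate(gate_id: int, a: Wire, b: Wire, out: Wire,
                op: Callable[[int, int], int]) -> List[bytes]:
    """Generator (Alice): one encrypted row per input pair, shuffled."""
    rows = [E(a[va], b[vb], gate_id, out[op(va, vb)]) for va in (0, 1) for vb in (0, 1)]
    random.SystemRandom().shuffle(rows)   # row order must not reveal the inputs
    return rows

def eval_gate(gate_id: int, table: List[bytes], ka: bytes, kb: bytes) -> bytes:
    """Evaluator (Bob): holds one label per input wire, recovers one output label."""
    for ct in table:
        label = D(ka, kb, gate_id, ct)
        if label is not None:
            return label
    raise ValueError("no row decrypted: invalid input labels")

# The x1, y1 -> w5 gate from the slide; its table encodes AND.
x1, y1, w5 = new_wire(), new_wire(), new_wire()
table = garble_gate(5, x1, y1, w5, lambda u, v: u & v)
# In the example X1 = 0 and Y1 = 0, so Bob holds x1^0 and y1^0 and learns w5^0,
# a random string that by itself reveals nothing about the bit it encodes.
assert eval_gate(5, table, x1[0], y1[0]) == w5[0]
```

Only the output wire Z needs a published label-to-bit mapping; all internal labels stay opaque to the evaluator.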
Adversary models: honest-but-curious (semi-honest) and malicious
Timeline of GC frameworks (1986 to 2015):
1986: Yao GC protocol [Yao, FOCS]
2004: FairPlay, compiler-based [MNPS, USENIX]: a custom high-level procedural language, SFDL (Secure Function Definition Language), is compiled into a circuit description language, SHDL (Secure Hardware Description Language)
2009: "2-Party SFE is practical", compiler-based [PSNW, ASIACRYPT]
2014: RAM-model crypto primitives [LHSHK, S&P]; mobile phone HW tokens [DSZ, USENIX]
2015: TinyGarble, sequential logic synthesis [S&P]
Other frameworks shown between 2010 and 2015:
TASTY, compiler-based [HKSSW, CCS]
GC for one-time programs, HW-based [CHES]
FastGC, library-based [HEKM, USENIX]
VMCrypt, library-based [Malka, CCS]
Billion-gate 2-party SFE, compiler-based [KSS, USENIX]
2-party SFE in ANSI C, compiler-based [HFAK, CCS]
Portable Circuit Format (PCF), compiler-based [KSMB, USENIX]
2-party SFE on GPU, HW-based [HMSG, ACSAC]
2-party SFE on GPU, HW-based [FN, ACNS]
2-party SFE applications on GPU, HW-based [PL, eprint]
AES-NI JustGarble, HW-based [BHKR, S&P]
Circuit structures, library-based [ZE, S&P]
Compiler-based
Compile a high-level description of the functionality to optimized circuits
e.g., FairPlay, TASTY, KSMB, etc.
Library-based
Custom libraries with special functions for emitting Boolean circuits; built-in Boolean circuits
e.g., FastGC, VMCrypt, etc.
Hardware-assisted
GPU-based, AES-NI
Manual circuit-level optimizations; combinational logic
• Prevents synthesis of large, control-intensive circuits (e.g., SHA3)
Poor scalability
• Memory exhaustion
• Circuit generation/evaluation time may exceed real-time constraints
• Loop unrolling and subroutine inlining
• High-level programming abstraction: circuits are neither compact nor optimized
Lack of practical utility
• Users cannot comprehend the final circuit organization and therefore cannot apply finer circuit optimizations
Only moderate-size circuits are handled
• Some circuit sizes are not feasible for embedded devices
Generating super-compact and scalable circuits by
• Sequential logic description of the functionality
• Introducing new transforms/libraries to enable adapting classic HW synthesis techniques
Improving best reported results by several orders of magnitude
Enabling implementation of circuits never reported before
[1] Songhori (Koushanfar) et al., IEEE S&P’15
Row reduction: reduces the size of the garbled truth table for non-XOR gates by 25% [1]
Free-XOR: no garbled truth table needed for XOR gates [2][3]
Garbling with a fixed-key block cipher: no additional keys for gate outputs (unique tweak T per gate) [4]
Execution optimization: fast table lookups, pipelining [5][6]
[1] Naor et al., ACM EC’99 [2] Kolesnikov et al., ICALP’08 [3] Kolesnikov et al., CRYPTO’14 [4] Bellare et al., S&P’13 [5] Järvinen et al., CHES’10 [6] Huang et al., USENIX’11
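Free-XOR is what lets XOR gates skip the garbled table. A minimal sketch, assuming 128-bit labels and hypothetical class and function names: the generator picks one secret global offset R, every wire's 1-label equals its 0-label XOR R, and an XOR gate's output 0-label is the XOR of its input 0-labels, so the evaluator only XORs the two labels it holds.

```python
import secrets
from typing import Tuple

LABEL_BYTES = 16                 # 128-bit labels (assumption)
Wire = Tuple[bytes, bytes]       # (label for 0, label for 1)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

class FreeXorGarbler:
    """Generator side: all wires share one secret offset R."""

    def __init__(self) -> None:
        self.R = secrets.token_bytes(LABEL_BYTES)

    def new_wire(self) -> Wire:
        w0 = secrets.token_bytes(LABEL_BYTES)
        return w0, xor_bytes(w0, self.R)      # 1-label = 0-label XOR R

    def xor_gate(self, a: Wire, b: Wire) -> Wire:
        """Output wire of an XOR gate; no garbled table is produced or sent."""
        c0 = xor_bytes(a[0], b[0])
        return c0, xor_bytes(c0, self.R)

def eval_xor(ka: bytes, kb: bytes) -> bytes:
    """Evaluator side: XOR the two held labels, no decryption needed."""
    return xor_bytes(ka, kb)

g = FreeXorGarbler()
a, b = g.new_wire(), g.new_wire()
c = g.xor_gate(a, b)
# (a0 ^ va*R) ^ (b0 ^ vb*R) = c0 ^ (va ^ vb)*R, i.e. the label for va XOR vb.
for va in (0, 1):
    for vb in (0, 1):
        assert eval_xor(a[va], b[vb]) == c[va ^ vb]
```

Non-XOR gates still need garbled tables, which is why the cost metrics in this deck count only non-XOR gates.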
Combinational (Boolean) circuit: outputs are functions of the inputs only
$$X, Y := F(A, B, C)$$
(Figure: a combinational logic circuit with inputs A, B, C and outputs X, Y.)
Sequential circuit: outputs are functions of both the inputs and the circuit states kept in memory elements, e.g., flip-flops (FF)
$$X, Y := F(A, B, \text{FF state})$$
(Figure: the same combinational logic circuit with inputs A, B and outputs X, Y, where a clocked (CLK) flip-flop stores state and feeds it back into the circuit as memory.)
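(Figure: combinational n-bit ripple-carry adder. A half adder (HA) adds x0 and y0, producing s0 and the carry c1; a chain of full adders (FA) adds xi, yi, and ci for i = 1, ..., n−1, producing si and ci+1; cn is the final carry-out.)
(Figure: sequential n-bit adder. A single FA adds xi, yi, and ci in clock cycle i, producing si and ci+1; a clocked flip-flop (FF) stores ci+1 and feeds it back as the carry-in of the next cycle.)
Combinational n-bit adder: 1 HA and (n−1) FAs, so its size depends on n
Sequential n-bit adder: 1 FA and 1 FF (the feedback re-uses the same FA)
The sequential circuit size is independent of the input size n, so it is more compact and scalable than the combinational design.
Full adder: inputs xi, yi, ci; outputs si, ci+1
Half adder: inputs xi, yi; outputs si, ci+1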
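A plaintext simulation makes the size argument concrete. In this sketch (helper names are hypothetical), the combinational version instantiates one HA and n−1 FAs, while the sequential version reuses a single FA for n clock cycles with one flip-flop holding the carry. The FA is written with a single AND gate, carry = c ⊕ ((x ⊕ c) ∧ (y ⊕ c)), a common choice when XOR gates are free; this formulation is an assumption, not taken from the slides.

```python
from typing import List, Tuple

def half_adder(x: int, y: int) -> Tuple[int, int]:
    """(sum, carry) for two bits."""
    return x ^ y, x & y

def full_adder(x: int, y: int, c: int) -> Tuple[int, int]:
    """(sum, carry-out) using one AND gate.

    carry = c XOR ((x XOR c) AND (y XOR c)): if x == y the AND yields x XOR c,
    so carry = x; if x != y one AND input is 0, so carry = c.
    """
    return x ^ y ^ c, c ^ ((x ^ c) & (y ^ c))

def combinational_add(xs: List[int], ys: List[int]) -> Tuple[List[int], int]:
    """Ripple-carry adder: 1 HA + (n - 1) FAs, all instantiated at once."""
    s, c = half_adder(xs[0], ys[0])
    sums = [s]
    for x, y in zip(xs[1:], ys[1:]):    # n - 1 full adders
        s, c = full_adder(x, y, c)
        sums.append(s)
    return sums, c

def sequential_add(xs: List[int], ys: List[int]) -> Tuple[List[int], int]:
    """Sequential adder: 1 FA reused for n clock cycles, 1 FF holding the carry."""
    ff = 0                              # FF reset value = carry-in of the first cycle
    sums = []
    for x, y in zip(xs, ys):            # cycle i consumes x_i, y_i (LSB first)
        s, ff = full_adder(x, y, ff)    # FF latches c_{i+1} for the next cycle
        sums.append(s)
    return sums, ff

def to_bits(v: int, n: int) -> List[int]:
    return [(v >> i) & 1 for i in range(n)]

def from_bits(bits: List[int], carry: int) -> int:
    return sum(b << i for i, b in enumerate(bits)) + (carry << len(bits))

xs, ys = to_bits(200, 8), to_bits(99, 8)
print(from_bits(*combinational_add(xs, ys)), from_bits(*sequential_add(xs, ys)))  # 299 299
```

Both versions compute the same sums; only the sequential one keeps a constant amount of logic as n grows.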
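Nearest Neighbor Search (NNS): maps perfectly to a sequential circuit
(Figure: in each clock cycle, the circuit computes ǁx − d[i]ǁ between the query x and database element d[i], feeds it to a min unit, and outputs o[i]; a clocked flip-flop (FF) holds the running minimum.)
The FF keeps the index and value of the element closest to x at every clock cycle.
Size = O(1), independent of the database size: a compact and scalable circuit representation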
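The FF-based search can be sketched as a fixed-size step function applied once per clock cycle. Assumptions for illustration: squared Euclidean distance stands in for ǁx − d[i]ǁ (it preserves the ordering and avoids a square root), and the function names are hypothetical. Only the number of cycles grows with the database; the per-cycle logic does not.

```python
import math
from typing import Sequence, Tuple

State = Tuple[float, int]  # FF contents: (best distance so far, its index)

def nns_step(x: Sequence[int], d_i: Sequence[int], i: int, state: State) -> State:
    """Combinational core, evaluated once per clock cycle; its size is fixed."""
    dist = sum((a - b) ** 2 for a, b in zip(x, d_i))  # squared distance to d[i]
    return (dist, i) if dist < state[0] else state    # min unit

def nns_sequential(x: Sequence[int], database) -> State:
    state: State = (math.inf, -1)         # FF reset value: nothing seen yet
    for i, d_i in enumerate(database):    # clock cycle i reads d[i]
        state = nns_step(x, d_i, i, state)
    return state

db = [(3, 4), (1, 1), (10, 0), (2, 2)]
print(nns_sequential((0, 0), db))  # (2, 1): d[1] = (1, 1) is closest
```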
(Figure: synthesis flow. A C/C++ description (*.c/*.cpp) goes through high-level synthesis to an HDL description (*.v/*.vhdl); HDL logic synthesis, using a customized synthesis library and user synthesis constraints, produces a netlist (*.v); superfolding and a scheduler turn the netlist into a Simple Circuit Description (*.scd), which is then used for garbling and evaluation.)
Logic synthesis tools: commercial Synopsys Design Compiler or the open-source Berkeley ABC tool
Offline synthesis exploits established logic optimization techniques, as opposed to online generation, garbling, and evaluation
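Private-function SFE (PF-SFE)
(Figure: Alice supplies a private function and Bob supplies private data, as input bits x1, x2, ... and y1, y2, ...; the garbled processor computes Processor(private functionA, dataB).)
Alice has a private function functionA(.)
Bob has dataB
Bob wants to learn functionA(dataB) without telling Alice what dataB is
Alice does not want Bob to learn functionA(.)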
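Garbling a processor works for PF-SFE because the function becomes just another input. The toy below is purely illustrative: its three-instruction ISA is hypothetical (it is not MIPS), and a real garbled processor would also pad the program to a public length and run for a public number of cycles, as `steps` does here. The interpreter is fixed and public; which program it runs is Alice's secret.

```python
from typing import List, Tuple

Program = List[Tuple[str, int]]  # hypothetical ISA: (opcode, operand)

def processor(program: Program, data: List[int], steps: int) -> int:
    """Public, fixed interpreter that runs any program for exactly `steps` cycles.

    As a circuit, every cycle evaluates the same gates whatever the program is,
    which is why garbling such an interpreter can hide the program itself.
    """
    acc, pc = 0, 0
    for _ in range(steps):
        if pc < len(program):
            op, arg = program[pc]
            if op == "LDI":      # acc = immediate
                acc = arg
            elif op == "ADD":    # acc += data[arg]
                acc += data[arg]
            elif op == "MUL":    # acc *= data[arg]
                acc *= data[arg]
            pc += 1
        # once pc passes the end of the program, remaining cycles are no-ops
    return acc

# Alice's private functionA(d) = d0 * d1 + d2; Bob's private dataB = [3, 4, 5].
function_a = [("LDI", 0), ("ADD", 0), ("MUL", 1), ("ADD", 2)]
print(processor(function_a, [3, 4, 5], steps=8))  # 17
```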
MIPS is a RISC microprocessor (μP)
Low overhead: a relatively small number of gates (~13K non-XOR gates, with support for integer multiplication and shift)
Open-source and available online*
Available cross-compiler: gcc-mips
*Plasma project on opencores.org
The first scalable implementation of PF-SFE on a real processor architecture
Efficient and optimized frameworks are too difficult for non-expert users to work with: they require logic circuit design knowledge.
High-level GC languages and compilers are extremely inefficient compared to circuits optimized by hand and by logic synthesis.
Bridge the gap between efficient-but-hard and inefficient-but-easy GC frameworks
Garbling a processor [TinyGarble, S&P’15]
• Solves private-function SFE (PF-SFE), where the function is private along with the input data
• Programmed with high-level languages and conventional compilers
How about using it for SFE? Challenges:
• Extremely costly, since it hides the function (PF-SFE)
Support for SFE without paying the PF-SFE cost
• Performance-privacy trade-off: relaxing privacy to improve performance
• Supports private, semi-private, or public functions
• While retaining the simplicity of programming a processor
Hardware implementation of the GC protocol
• Increases performance
[Songhori et al., GarbledCPU, DAC’16] Talk this Wednesday, 1:30-3pm session
Combinational circuit implementations
Sequential circuit implementations
Compared with equivalent combinational circuit
Compared with sequential circuit implementations of different circuit folding
Compared with reference circuit implementations in other works
Report functions not implementable earlier, e.g., the SHA3 function
Circuit Size Efficiency (CSE), where CS0 is the size of the reference circuit:
$$\mathrm{CSE} = \frac{\text{size of reference circuit}}{\text{size of TinyGarble circuit}} = \frac{CS_0}{CS}$$
Garbling time: number of permutation function calls for non-XOR gates (PFC):
$$\mathrm{PFC} = 4 \times (\#\text{non-XOR}) \times c$$
The permutation function π is called 4 times for each non-XOR gate; c is the number of sequential cycles (c = 1 for combinational circuits). PFC approximates the garbling (communication and computation) time.
$$\mathrm{PFD} = \frac{\mathrm{PFC} - \mathrm{PFC}_0}{\mathrm{PFC}_0} \times 100$$
where PFC0 is the PFC of the reference circuit; a negative PFD indicates reduced circuit garbling time in comparison to the reference circuit.
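The three metrics are simple to compute; the sketch below encodes them directly from the definitions of CSE, PFC, and PFD. The gate counts and circuit sizes in the usage lines are hypothetical round numbers chosen for illustration, not results from the talk.

```python
def pfc(non_xor_gates: int, cycles: int = 1) -> int:
    """PFC = 4 x (#non-XOR) x c; c = 1 for a combinational circuit."""
    return 4 * non_xor_gates * cycles

def pfd(pfc_new: int, pfc_ref: int) -> float:
    """PFD = (PFC - PFC0) / PFC0 x 100; negative means less garbling work."""
    return 100 * (pfc_new - pfc_ref) / pfc_ref

def cse(cs_ref: float, cs_new: float) -> float:
    """CSE = CS0 / CS: how many times smaller the new circuit is."""
    return cs_ref / cs_new

# Hypothetical: a reference circuit with 10,000 non-XOR gates versus a
# sequential circuit with 500 non-XOR gates evaluated for c = 8 cycles.
ref = pfc(10_000)              # 40,000 calls
new = pfc(500, cycles=8)       # 16,000 calls
print(pfd(new, ref))           # -60.0
print(cse(12_000, 1_500))      # 8.0 (hypothetical circuit sizes)
```

Note how a sequential circuit trades a smaller footprint (CSE) against more cycles c, which PFC multiplies back in.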
Function | CSE | PFD
16384-bit Compare | 1.49 | -49%
160-bit Hamming | 3.55 | -58%
128-bit Sum | 2.28 | -63%
256-bit Sum | 2.32 | -65%
1024-bit Sum | 2.35 | -66%
64-bit Mult | 9.26 | -84%
128-bit Mult | 8.88 | -84%
256-bit Mult | 7.30 | -60%
Our combinational circuit for 64-bit multiplication is 9.26 times smaller than the result reported in KSMB [1].
Garbling time for the same circuit is reduced by 84% compared with KSMB, due to the reduced number of non-XOR gates.
CSE: relative improvement in memory footprint
PFD: % reduction in # of non-XOR gates, i.e., in communicated labels (bandwidth)
[1] Kreuter et al., USENIX’13
(Figure: total number of gates, split into non-XOR and XOR gates, for 64-bit multiplication: sequential (this work, c = 16), combinational (this work, c = 1), and KSMB; figure annotations: CSE = 15.5, CSE_KSMB = 143.5.)
Performance compared to the reference circuits from [Kreuter et al., USENIX 2013]:
Evaluations of benchmark functions, e.g., sum, Hamming, RSA, and compare, show a similar improvement in memory footprint (several orders of magnitude) and in the number of communicated labels (3-4 times).
Metrics:
Relative improvement in memory footprint (CSE)
% Reduction in # of non-XORs: communicated labels (PFD)
(Figure: total number of gates, split into non-XOR and XOR gates, for 1600-bit SHA3 with c = 24, 12, 6, and 1 sequential cycles.)
(Figure: circuit size CS (KB, log scale) and garbling time (CPU cycles × 10^6) versus the number of sequential cycles c.)
Lite MIPS VI, supporting simple instructions plus multiplication and shift, with a 256 B instruction ROM and a 256 B data RAM: # of non-XOR gates = 3,536; # of XOR gates = 526
Hamming distance benchmark: the distance between two arrays of length l, A[l] and B[l]
Number of instructions: $m = 9 + 9l$
For example, when l = 32:
Total π function calls: $m \times \#\text{non-XOR} \approx 1\text{M}$
Total network communication: $m \times \#\text{non-XOR} \times \text{sizeof(garbled table)} \approx 50\,\text{MB}$
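The l = 32 figures can be reproduced with a few lines of arithmetic. The garbled-table size is not given on the slide; this sketch assumes 128-bit labels and row reduction (3 ciphertexts, 48 bytes per non-XOR gate), which reproduces the stated ~50 MB. Treat that table size as an assumption.

```python
NON_XOR_GATES = 3_536            # Lite MIPS non-XOR gate count (from the slide)
LABEL_BYTES = 16                 # assumption: 128-bit labels
ROWS_PER_TABLE = 3               # assumption: row reduction, 4 -> 3 rows
GARBLED_TABLE_BYTES = ROWS_PER_TABLE * LABEL_BYTES  # 48 bytes (assumed)

def instructions(l: int) -> int:
    """m = 9 + 9*l instructions for the Hamming distance of two length-l arrays."""
    return 9 + 9 * l

def pi_calls(l: int) -> int:
    """Total pi calls as counted on the slide: m * #non-XOR."""
    return instructions(l) * NON_XOR_GATES

def traffic_bytes(l: int) -> int:
    """Total communication: m * #non-XOR * sizeof(garbled table)."""
    return pi_calls(l) * GARBLED_TABLE_BYTES

print(instructions(32), pi_calls(32), traffic_bytes(32) / 1e6)
# 297 1050192 50.409216  (about 1M calls and about 50 MB)
```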
The first efficient, scalable, and practical privacy-preserving k-nearest neighbors (k-NN) search
None of the parties reveals its information, while they can still cooperatively find the nearest matches
The circuits are small enough to fit within an embedded processor
Garbled search for n = 128 and k = 8 within a few seconds
[1] Songhori (Koushanfar) et al., DAC 2015
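For k nearest neighbors, a sequential circuit can keep k sorted registers instead of one. The code below is an illustrative assumption, not the circuit from the DAC 2015 paper: each cycle pushes the new (distance, index) pair through the registers with a fixed chain of compare-and-swap steps, so the per-cycle logic has the same shape regardless of the data. Squared Euclidean distance and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[float, int]  # (distance, index) held in one register

def knn_step(regs: List[Entry], entry: Entry) -> List[Entry]:
    """One clock cycle: insert `entry` into k sorted registers.

    Each register performs the same compare-and-swap; the larger value moves
    on to the next register, and the largest of the k + 1 falls off the end.
    """
    out = []
    cand = entry
    for r in regs:
        lo, hi = (cand, r) if cand < r else (r, cand)
        out.append(lo)
        cand = hi
    return out

def knn_sequential(query: Sequence[int], database: Sequence[Sequence[int]], k: int) -> List[int]:
    regs: List[Entry] = [(math.inf, -1)] * k           # reset state
    for i, d in enumerate(database):                   # one element per cycle
        dist = sum((a - b) ** 2 for a, b in zip(query, d))
        regs = knn_step(regs, (dist, i))
    return [idx for _, idx in regs]

db = [(5,), (1,), (9,), (3,), (7,)]
print(knn_sequential((4,), db, k=2))  # [0, 3]
```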
The first efficient, reliable, and provably secure two-party privacy-preserving fingerprint matching
Adapted the NIST-standardized Bozorth algorithm so that it is amenable to GC optimizations
Devised a new protocol
Results show the ability to authenticate a garbled fingerprint within a fraction of a second
No loss of accuracy compared to the original Bozorth algorithm
[1] Zhang and Koushanfar, HOST’16
Human Leukocyte Antigen (HLA) analysis, a crucial test in organ transplantation
The patient holds her whole genome sequence
First scalable and efficient solution for secure organ transplantation compatibility testing
Sub-linear-size circuit design for HLA compatibility testing
Testing can be done within a few seconds on an embedded processor
[1] Riazi (Koushanfar) et al., HOST’16
A car Q is lost due to unavailability or malfunction of GPS, e.g., in military settings
It sends a request to three nearby cars A, B, and C for assistance in computing its location
The three assisting cars then engage in a privacy-preserving localization protocol
A new library with the new functions required to generate GC-optimized netlists
Q can be located in ~0.5 s on an embedded processor
[1] Hussain and Koushanfar, DAC’16. Talk on Tuesday, 1:30-3pm
1. Minimizing computing/storage/comm cost of a broad class of iterative big/dense data analytics
• To the limits of data structure and pertinent platform
• Enables HW acceleration and stream processing
• Benefits costly privacy-preserving computing
2. Novel scalable solutions for privacy preserving computing by Yao's Garbled Circuit (GC)
• Enables addressing classical challenges and new apps
Evaluations show great efficiency compared with prior art, often by orders of magnitude!
1. Kolesnikov, V., and T. Schneider, "Improved Garbled Circuit: Free XOR Gates and Applications", International Colloquium on Automata, Languages and Programming (ICALP), 2008
2. Songhori, E., S. U. Hussain, A.-R. Sadeghi, T. Schneider, and F. Koushanfar, "TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits", IEEE Symposium on Security and Privacy (S&P), 2015
3. Songhori, E., S. U. Hussain, A.-R. Sadeghi, and F. Koushanfar, "A Compact and Scalable Privacy-Preserving k-Nearest Neighbor Search", Design Automation Conference (DAC), 2015
4. Zahur, S., M. Rosulek, and D. Evans, "Two Halves Make a Whole: Reducing Data Transfer in Garbled Circuits using Half Gates", EUROCRYPT, 2015
5. Demmler, D., G. Dessouky, F. Koushanfar, A. Sadeghi, T. Schneider, and S. Zeitouni, "Automated Synthesis of Optimized Circuits for Secure Computation", Computer and Communications Security (CCS), 2015
6. Riazi, S. M., N. K. R. Dantu, V. L. N. Gattu, and F. Koushanfar, "GenMatch: Secure DNA Compatibility Testing", IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2016
7. Zhang, Y., and F. Koushanfar, "Robust Privacy-Preserving Fingerprint Authentication", IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2016
8. Songhori, E. M., S. Zeitouni, G. Dessouky, T. Schneider, A.-R. Sadeghi, and F. Koushanfar, "GarbledCPU: A MIPS Processor for Secure Computation in Hardware", Design Automation Conference (DAC), 2016
9. Hussain, S. U., and F. Koushanfar, "Privacy Preserving Localization for Smart Automotive Systems", Design Automation Conference (DAC), 2016