(Yao’s) Millionaires’ Problem


Transcript of (Yao’s) Millionaires’ Problem

Page 1: (Yao’s) Millionaires’ Problem

Sharing Knowledge without Sharing Data 2019‐10‐10

Azer Bestavros, Boston University 1

Sharing Knowledge without Sharing Data

Platforms for resolving the false dichotomy between privacy and utility of information

Computer Science Department

Hariri Institute for Computing

Boston University

Azer Bestavros

Massachusetts Juvenile Justice Policy and Data Board

October 10, 2019

The Rafik B. Hariri Institute for Computing and Computational Science & Engineering

(Yao’s) Millionaires’ Problem

Want to know who is wealthier

Can we reveal the answer without revealing the inputs – not even to an app?

Azer Bestavros: Sharing Knowledge without Sharing Data ‐‐ On the false choice between privacy and utility of information 2

Page 2: (Yao’s) Millionaires’ Problem


The Labor Department Question

Want to know if companies like Google/Oracle are paying white men more

“In a statement, Google said it balked at turning over the private information of employees.”

Can DOL prove (non)compliance without access to sensitive employee records?


The Right to Know Before You Go Question

Want to enable cost‐benefit analysis of higher education across colleges and majors


[Figure: two tables keyed by ID, one with columns ID, Income, … and one with columns ID, College, Degree, …]

Page 3: (Yao’s) Millionaires’ Problem


The Massachusetts Child Advocacy Question

Want to measure educational success for various juvenile cohorts in a DCF database


[Figure: two tables keyed by ID, one with columns ID, Success, … and one with columns ID, Status, …]


The answer to all these questions is YES

We can derive knowledge ($K$) from data ($x_1, x_2, x_3, \ldots$) without requiring owners of the data to share it or to trust anything other than mathematics, under some assumptions about threats.

$K = f(x_1, x_2, x_3, \ldots)$


Page 4: (Yao’s) Millionaires’ Problem


April 9, 2013

Published October 31, 2013


Meeting with Mayor Menino @ BU, July 31, 2014


GOAL 3: Evaluating Success

Employers agree to … contribute data to a report compiled by a third party on the Compact’s success to date. Employer‐level data would not be identified in the report.


Page 5: (Yao’s) Millionaires’ Problem



April 14, 2015


Page 6: (Yao’s) Millionaires’ Problem


January 5, 2017


January 31, 2018


Page 7: (Yao’s) Millionaires’ Problem


Meeting with Mayor Menino @ BU, July 31, 2014

The congresswoman, who had signed onto a bill addressing income disparity between men and women, was impressed by the relevance he outlined. “It’s linking it back for the members of Congress,” Clark said. “Nobody would think, oh, the Paycheck Fairness Act, how is that tied into NSF funding?”

2014  2018

2017

2015

“This [is] the first time actual wage data has been reported both anonymously and voluntarily. This is a groundbreaking moment in tackling the gender gap.”

Mayor Marty Walsh

“[MPC] has never been used for public good. Here, we’re beginning to show how to use this sophisticated computer science research for public programs.” 

BWWC co‐chair Evelyn Murphy 

2014


Multi‐Party Computation (MPC)

What is it?
– Given multiple parties p1, p2, …, pn, each with private data x1, x2, …, xn
– Parties engage in computing a function f(x1, x2, …, xn)
– Nothing is revealed about the inputs beyond what the output of f reveals
– What f leaks is an orthogonal question, e.g., the realm of “differential privacy”

State of the Art
– Theory known since 1979, with Shamir’s “How to share a secret”
– Frameworks and libraries increasingly available over the last few years …
– Experience with real use cases at scale is limited. We are changing that.
– Deployments are not easily portable. We are changing that.


Page 8: (Yao’s) Millionaires’ Problem


What is Multi‐Party Computation (MPC)?


How Does it Work?

Analyst: “I would love to know the difference between your salaries. Can you please share them with me so that I may figure that out?”

Data Owners (each): “Happy to help you, but there is no way I am going to tell anybody what my salary is.”

“I have a solution for you!”


Page 9: (Yao’s) Millionaires’ Problem


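How Does it Work?

Service Providers

[Figure: each data owner splits a secret into two shares, one per server: $7 into the shares 4 and 3, and $9 into the shares 10 and 11.]

Each one of the two servers has one share of each secret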


How Does it Work?

Service Providers

[Figure: each server subtracts the two shares it holds: one server computes 11 – 3 = 8 and the other computes 10 – 4 = 6.]

Compute on shares to get secret‐shared result…

Page 10: (Yao’s) Millionaires’ Problem


How Does it Work?

Service Providers

Each server has a share of the (secret) result…

[Figure: the result shares 8 and 6 are combined to reveal the answer, $2.]

Combine shares of the result to reveal the answer!


How does it work?

[Figure: $9 – $7 = $2]
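The walk-through above can be reproduced with additive secret sharing. The following is a minimal Python sketch, not the deployed system. It assumes arithmetic modulo 12, which is inferred from the numbers on the slides (10 + 11 = 21 and 8 + 6 = 14 only give 9 and 2 modulo 12), and the names `share` and `reveal` are hypothetical.

```python
import secrets

MODULUS = 12  # assumption: inferred from the slide's numbers, not stated in the talk


def share(secret, modulus=MODULUS):
    """Split a secret into two shares that sum to it modulo `modulus`."""
    first = secrets.randbelow(modulus)      # uniformly random share
    second = (secret - first) % modulus     # the share that completes the sum
    return first, second


def reveal(share_a, share_b, modulus=MODULUS):
    """Recombine two shares into the value they encode."""
    return (share_a + share_b) % modulus


# Shares as they appear on the slides (a fresh split would call share(7), share(9)).
salary_7_shares = (3, 4)     # 3 + 4 = 7
salary_9_shares = (11, 10)   # 11 + 10 = 21, which is 9 modulo 12

# Each server subtracts the two shares it holds; it never sees the other server's shares.
server_1_result = (salary_9_shares[0] - salary_7_shares[0]) % MODULUS  # 11 - 3 = 8
server_2_result = (salary_9_shares[1] - salary_7_shares[1]) % MODULUS  # 10 - 4 = 6

# Only the combination of both result shares reveals the answer.
difference = reveal(server_1_result, server_2_result)  # (8 + 6) % 12 = 2
print(difference)
```

A modulus this small is only for illustration: results wrap around at 12, so a practical system would pick a modulus far larger than any value it expects to compute.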


Page 11: (Yao’s) Millionaires’ Problem


How Does it Work?

In a nutshell, MPC is the collaborative analysis of multiple siloed data sets that are never communicated or entrusted to any central authority or database.

[Figure: $9 – $7 = $2]


Outsourced MPC Architecture

[Figure labels: Data Owners; Service Providers (Locally Compute, Communicate); Analyst]

Each server gets a share of each DB; both collaborate on joining and aggregating
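A rough Python sketch of the aggregation half of this architecture follows. It assumes additive secret sharing modulo 2^64 and a plain sum as the aggregate. The join by ID is not shown, and the function names and salary figures are hypothetical.

```python
import secrets

MODULUS = 2**64  # assumption: large enough that the true total never wraps around


def share_value(value, n_servers=2, modulus=MODULUS):
    """Data owner side: split one private value into one share per server."""
    shares = [secrets.randbelow(modulus) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % modulus)  # last share completes the sum
    return shares


def local_sum(shares_held, modulus=MODULUS):
    """Service provider side: add up the shares this server received."""
    return sum(shares_held) % modulus


def secure_total(private_values, n_servers=2, modulus=MODULUS):
    """Analyst side: combine the servers' partial sums into the total."""
    per_owner = [share_value(v, n_servers, modulus) for v in private_values]
    per_server = zip(*per_owner)  # server k gets the k-th share from every owner
    partial_sums = [local_sum(column, modulus) for column in per_server]
    return sum(partial_sums) % modulus


# Hypothetical payroll figures from three data owners.
print(secure_total([52000, 61000, 58000]))
```

Each server only ever sees uniformly random shares, and the analyst only sees the partial sums, so no single party holds an individual owner's value.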


Page 12: (Yao’s) Millionaires’ Problem


Is anybody else using MPC?

• Unbound / Protect cryptographic keys
• Google / Federated machine learning
• BU / Pay equity in Boston
• Cybernetica / VAT tax audits
• Partisia / Rate credit of farmers


Why use MPC as opposed to anonymization? 

• With de‐identification, privacy protections are applied too early in analysis pipeline

• With MPC, data is never shared prematurely

• MPC ⇒ detailed analysis that is privacy compatible

[Figure: two pipelines. De‐identification: De‐ID each data set, then Join by ID (???). MPC: Join by ID, then De‐ID.]


Page 13: (Yao’s) Millionaires’ Problem


Isn’t MPC the same as Blockchain?

MPC

• Strong confidentiality: data encoded and split in a privacy‐preserving way

• Strong integrity: distributed general purpose data analysis + incentives for long‐term accuracy and stability

• Good availability: tolerates adversarial behaviors to a point, and fails safely

Blockchain

• Poor confidentiality: data copied in the clear across the internet

• Strong integrity: distributed general purpose data analysis + incentives for long‐term accuracy and stability

• Strong availability: persists through attacks from the distributed servers


How about MPC’s performance?

Benchmark: Outsourced Encryption

How fast could two parties jointly encrypt a secret‐shared message via MPC compared to doing it in the clear (which means trusting the cloud with the private message and key)?

[Figure: throughput of AES encryption (bytes/sec), using MPC vs. in the clear]

Performance shouldn’t be an impediment to deploying and using MPC.


Page 14: (Yao’s) Millionaires’ Problem


How commoditized is the technology?

$f(x) = S + r_1 x^1 + r_2 x^2 + \cdots + r_i x^i + \cdots$

$s_1 = f(1),\quad s_2 = f(2),\quad s_3 = f(3),\quad \ldots,\quad s_i = f(i),\quad \ldots$


We’re getting there!   Secret Sharing and MPC as a Service
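Reading this slide as Shamir's scheme (the secret S is the constant term of a random polynomial f, and share s_i is f(i)), a minimal Python sketch could look like the following. The prime modulus, the function names, and the threshold parameter are assumptions made for illustration. This is not the code behind any of the services mentioned in the talk.

```python
import secrets

PRIME = 2**61 - 1  # assumption: a Mersenne prime used as the field modulus


def split_secret(secret, n_shares, threshold, prime=PRIME):
    """Return n_shares points (i, f(i)); any `threshold` of them recover the secret."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    if not 1 <= threshold <= n_shares:
        raise ValueError("need 1 <= threshold <= n_shares")
    # f(x) = S + r_1 x + ... + r_{t-1} x^{t-1}, with random coefficients r_k.
    coefficients = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]

    def f(x):
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % prime
        return result

    return [(i, f(i)) for i in range(1, n_shares + 1)]


def reconstruct_secret(shares, prime=PRIME):
    """Recover S = f(0) by Lagrange interpolation over the given shares."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -x_j) % prime
                denominator = (denominator * (x_i - x_j)) % prime
        # pow(d, -1, p) is the modular inverse (Python 3.8+).
        secret = (secret + y_i * numerator * pow(denominator, -1, prime)) % prime
    return secret


shares = split_secret(1234, n_shares=5, threshold=3)
print(reconstruct_secret(shares[:3]))
```

With a threshold of 3, any three of the five shares recover the secret, while fewer than three leave it undetermined.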


Is the technology proprietary? 

BU Open‐source MPC Libraries

JIFF: JavaScript Implementation of Federated Functionalities
Library for building web‐based applications using secure multi‐party computation
https://github.com/multiparty/jiff

Web‐MPC
JavaScript application for user‐friendly privacy‐preserving web‐based data aggregation
https://github.com/multiparty/web-mpc

Conclave Workflow Manager
Compiler that optimizes relational queries to be executed under MPC by factoring them into (1) scalable, local, cleartext processing workflows using backends such as Apache Spark, and (2) isolated MPC workflows that utilize existing MPC backend frameworks
https://github.com/multiparty/conclave


Page 15: (Yao’s) Millionaires’ Problem


How easy is it to use in practice?


Reviewable (quality control)

Transparent (open source)

Familiar (same workflow)

Accessible (comprehensible)


Page 16: (Yao’s) Millionaires’ Problem


Contribute

Page 17: (Yao’s) Millionaires’ Problem


What other apps have you considered?


Collective Intelligence in Competitive Settings
• Banking and Finance: Multi‐institutional systemic risk assessment
• Data Markets: Valuation of marginal utility of data products
• Plausible Deniability: Analytics over possibly “toxic” data
• Information Brokerage: Business and marketing intelligence
• E‐Commerce: Analytics over segmented proprietary data assets
• Sharing Economy: Personalization across multiple service providers

Social Good in Public Settings
• Privacy‐preserving census and surveys
• Basic and applied research in healthcare, education, sociology, …
• Multiagency analytics for evidence‐based policy making
• Transparency for corporate and government operations
• Compliance testing/reporting for trade associations
• Private/fair reporting of sexual harassment/abuse in the workplace


“We are excited to see legislation promoting the use of multi-party computation (MPC) in formulating sound public policy. Boston University's successful collaboration with the City of Boston and the Boston Women's Workforce Council brought this technology into practice to maintain data privacy while gaining insight into an important societal issue -- potential wage inequality in private industry. Such applications demonstrate that MPC can bring enormous value to policymakers at all levels of government.”

-- Azer Bestavros (on behalf of the team from BU)

November, 2017 ++


Page 18: (Yao’s) Millionaires’ Problem


October 8, 2019


How does MPC impact regulation/disclosure?

Status Quo: Disclose extent of use and/or regulate what data is made accessible to whom (e.g., HIPAA and FERPA). Otherwise trust NDAs!

Implications:
• Allows new uses that are consistent with multiple regulations
• Allows new uses beyond the artificiality of restricting access
• Forces lawmakers to think about the purpose of regulation
• Enables purposeful transparency with implications on auditing…


Collect Secure Access Use

Page 19: (Yao’s) Millionaires’ Problem


What are the legal aspects of using MPC?

• Disclosure: Is release of sensitive information permitted?
– FERPA prohibits “improper disclosure of personally identifiable information derived from education records.” HIPAA requires the “individual's written authorization for any use or disclosure of protected health information that is not for treatment, payment or health care operations or otherwise permitted or required by the Privacy Rule.”

• Use: Are subjects informed about how their data will be used?
– This is about adherence to the terms of “informed consent.” Entities (such as Facebook) are held accountable if they deviate from the “terms of use” that they develop for using/sharing of data.


MPC side‐steps disclosure issues because it allows entities not authorized to view the data to compute over it.

Disclaimer: I am not a lawyer, but I talk to BU School of Law colleagues


What are the liability considerations for MPC?

• Collective Trust:
– A key assumption is that not all parties will collude (and hence become a single party). Disclosure could be violated if service providers collude.
– MPC is about distributing computation over data across multiple parties without disclosing the data to any single party. If parties collude, then all bets are off.

• Privacy requires Security:
– A key assumption is that not all service providers will be compromised by a single adversary. MPC does not obviate the need for security; in fact, it strengthens existing security because it is harder for an adversary (hacker) to take over multiple (independent) service providers.


More service providers implies better security and better privacy!
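The collusion assumption can be checked exhaustively on a toy instance. The Python sketch below assumes additive secret sharing across three servers with a deliberately tiny modulus of 12. The setup and the names `all_sharings`, `view_distribution`, and `leaks` are hypothetical and chosen only to make enumeration feasible.

```python
from collections import Counter
from itertools import product

MODULUS = 12     # assumption: tiny modulus so every sharing can be enumerated
N_SERVERS = 3    # assumption: three independent service providers


def all_sharings(secret):
    """Yield every way to split `secret` into N_SERVERS additive shares."""
    for free_shares in product(range(MODULUS), repeat=N_SERVERS - 1):
        last_share = (secret - sum(free_shares)) % MODULUS
        yield (*free_shares, last_share)


def view_distribution(secret, colluders):
    """Count what the colluding servers see, over all sharings of `secret`."""
    return Counter(
        tuple(sharing[i] for i in colluders) for sharing in all_sharings(secret)
    )


def leaks(colluders):
    """True if the colluders' view depends on the secret in any way."""
    baseline = view_distribution(0, colluders)
    return any(
        view_distribution(secret, colluders) != baseline
        for secret in range(1, MODULUS)
    )


print(leaks((0, 1)))      # two colluding servers
print(leaks((0, 1, 2)))   # all three servers colluding
```

In this toy setting any two servers see exactly the same distribution of shares whatever the secret is, so they learn nothing; only when all three pool their shares does the view depend on the secret. With more providers, an adversary has to compromise more independent parties before that happens.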


Page 20: (Yao’s) Millionaires’ Problem


Takeaway: We can have it both ways

We can derive knowledge ($K$) from data ($x_1, x_2, x_3, \ldots$) without requiring owners of the data to share it or to trust anything other than mathematics, under some assumptions about threats.

$K = f(x_1, x_2, x_3, \ldots)$

When it comes to data and computation over data, we need to rethink our notions of ownership, custody, jurisdiction, sharing, disclosure, liability, and introduce new ones such as collusion.


The Path Forward: Our Answers

How can we balance the need for transparency and exploration with fairness and sensitivity to users?

Fix the narrative! Deconstruct the false dichotomy between utility and privacy of data, and between transparency and confidentiality.

How do we ensure that individuals and communities can trust these systems?

Empathize! Adapt technology to how people work rather than adapting people to how technology works. Do not change the workflow!


Page 21: (Yao’s) Millionaires’ Problem


Acknowledgments: It takes a village!

www.multiparty.org

Andrei Lapets, Kyle Holzinger, Eric Dunton, Frederick Jansen, Nikolaj Volgushev, Malte Schwarzkopf, Kinan Bab, Rawane Issa, Mayank Varia, Azer Bestavros


“Leveraging the Computational Perspective in a Data‐Driven World for a Better Society”
