Privacy-Preserving Analytics in the Cloud · jac22/talks/imperial-dd-7-11-17.pdf

Page 1:

Privacy-Preserving Analytics in the Cloud

Jon Crowcroft, http://www.cl.cam.ac.uk/~jac22

Page 2:

But first…

• A word from our sponsor…

Page 3:

VISUALISATION OF THE TURING RESEARCH STRATEGY

GOAL
• Augmenting human decisions with machine learning

CHALLENGES
• How to deliver every person’s health and wellbeing through data
• How to ensure security in a fast-changing world
• How to create a safe engineered infrastructure
• How to take the pulse of the economy and how to detect fraudulent financial activities
• How to advance AI with data science
• How to ensure machine augmented decisions are made ethically
• How can government innovate through data
• How to scale data science and AI

THEMES
• Machine learning and artificial intelligence
• Understanding human behaviour
• System architecture
• Security & robustness
• Complex structure in data
• Ethics in data science
• Mathematical modelling of complex systems

SCIENTIFIC PROBLEMS
• Scalability
• Missingness
• Causation
• Automating data wrangling
• Transparency and privacy
• Asymmetry of power and knowledge
• Building in good behaviour
• Machines for data science
• Robustness and verification of systems
• Identity and anonymity
• Heterogeneity
• Finding structure in data
• What data?
• Smart infrastructure
• Resilient networks
• Data-centric design
• Theoretical foundations for understanding new data science algorithms
• Software infrastructure for data science
• Learning without labels
• Design and development of data visualisations
• Fairness

Page 4:

Scientific problems vs themes

TABLE MAPPING SCIENTIFIC PROBLEMS AGAINST RESEARCH THEMES

Themes:
• System architecture for data science
• Security and robustness in data science
• Machine learning and artificial intelligence
• Complex structure in data
• Understanding humans in a connected world
• Ethics and Data Science
• Mathematical Modelling of Complex Data

Scientific problems:
• Scalability
• Missingness
• Causation
• Towards automated data wrangling
• Transparency and privacy
• Asymmetry of power and knowledge
• Building in good behaviour
• Machines for data science
• Robustness and verification of systems
• Identity and anonymity
• Heterogeneity
• Finding structure in data
• What data?
• Smart infrastructure
• Resilient networks
• Data-centric design
• Theoretical foundations for the understanding of new data science algorithms
• Software infrastructure for data science
• Learning without labels
• Design and development of engaging visualisations
• Fairness

Page 5:

Private Data Center -> Public Cloud

• ATI partners, e.g.
  • Farr / NHS Scotland
  • HSBC

• Motives for public cloud
  • Scale out / cost save
  • Higher throughput analytics
  • Share “access” with more researchers
  • <Yours goes here>

Page 6:

Infrastructure Location

• Keep friends & enemies near:
  • Legal/regulatory stuff (incl. GDPR)
  • Latency/availability etc.
  • Control (physical access etc.)

• Need to virtualise these (better)
  • Crypt data at rest
  • Crypt data during “processing”
  • Key management etc.
  • Enclave… SGX, TrustZone, AMD, CHERI

Page 7:

GDPR – 2018 – right to an explanation

Page 8:

SGX opportunity

• Not the only piece, of course
  • Static/dynamic analysis etc.
  • Unikernels & s/w verification

• Can use SGX on
  • Container (SCONE)
  • Platform basis: Hadoop, Flink, Spark (https://www.microsoft.com/en-us/research/publication/vc3-trustworthy-data-analytics-in-the-cloud)
  • Or application basis

Page 9:

MARU… @ turing.ac.uk

• ATI w/ Intel, Dstl, Docker, Microsoft
• Hiring: https://www.turing.ac.uk/jobs/research-associate-maru-project/

  • Compare what is in SGX
    • Enter/leave cost, crypt memory o/h etc.
    • Hypervisor?

  • Compare w/ container on TrustZone, CHERI, AMD etc.
    • Common APIs for keys etc.
    • Virtualize?

  • Pen test
    • Many side channel problems
    • What if weak homomorphic crypto & diff priv?

Page 10:

Public Cloud -> Databox (or HAT)

• Databox (and HAT) take opposite view
• Re-decentralize
• Keep analytics/ML as a service
  • Mix of distributed, privacy-preserving ML +
  • Hierarchy of 3rd party aggregators, MPC
  • http://www.databoxproject.uk/

• HAT reverses direction of value…
  • Audit (distributed ledger)
  • Get paid (money, real or virtual)
  • https://www.hatdex.org/
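To make the "hierarchy of 3rd party aggregators, MPC" bullet concrete, here is a minimal sketch of one common MPC building block, additive secret sharing, used to compute a sum without any single aggregator seeing a raw value. It is an illustration only, not Databox or HAT code; the modulus `PRIME`, the function names and the three example readings are all assumptions made for this sketch.

```python
import secrets

# Illustrative modulus for share arithmetic (an assumption, not from the talk).
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def aggregate(per_owner_shares):
    """Aggregator i sums the i-th share from every owner; partial sums are then combined."""
    n_parties = len(per_owner_shares[0])
    partial = [sum(owner[i] for owner in per_owner_shares) % PRIME
               for i in range(n_parties)]
    return partial, sum(partial) % PRIME

# Hypothetical private values held by three data owners.
readings = [12, 7, 30]
all_shares = [share(v, 3) for v in readings]
partials, total = aggregate(all_shares)
```

Each aggregator only ever holds uniformly random-looking shares, so the scheme protects inputs as long as the aggregators do not all collude; it does nothing against an aggregator that lies about its partial sum.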

Page 11:

Container – migration & replica

• Replicate (to cloud enclave)
  • for recovery (from fail, theft, loss)

• Migrate (to other personal cloud)
  • for low latency

• Most new data is append only – so use distributed ledger
  • (tamper-proof logs – see DataKit in Docker)

• Consistency of replicas –
  • e.g. use FPaxos
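The append-only, tamper-proof log idea above can be sketched with a simple hash chain, where each entry commits to the one before it. This is a generic illustration under stated assumptions, not how DataKit or any particular distributed ledger is implemented; `GENESIS`, `entry_hash`, `append` and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, payload):
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(log, payload):
    """Append a new entry that is chained to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})

def verify(log):
    """Recompute the chain from the start; any edited entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"event": "read", "by": "app-1"})
append(log, {"event": "write", "by": "owner"})
ok_before = verify(log)

log[0]["payload"]["by"] = "app-2"  # simulate tampering with an old entry
ok_after = verify(log)
```

A chain like this is strictly tamper-evident rather than tamper-proof: it detects edits to existing entries, but detecting truncation of the tail needs the latest hash to be held elsewhere, which is one reason to keep consistent replicas.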

Page 12:

Distributed Analytics

• Motives, e.g.
  • Move code to data
  • Keep data close to owner/primary user
  • Guarantee can audit trail access
  • Add yours here

• Challenges
  • Depends on ML technology of choice & goal
    • PCA/clustering, random forests
    • Curve fitting (regression etc.)
    • Model inferencing – e.g. Bayesian inference
  • Distributed differential privacy tricky
  • Hierarchical versus P2P?
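One reason distributed differential privacy is tricky can be shown with a small simulation: if every leaf adds its own Laplace noise before reporting, the noise accumulates across leaves, whereas a trusted aggregator adds noise only once. The sketch below is an assumption-laden toy (epsilon of 1, sensitivity of 1, 400 leaves with 0/1 values, hypothetical function names), not a method taken from the talk.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sum_local(values, epsilon, sensitivity, rng):
    """Every leaf perturbs its own value before sending it upstream."""
    scale = sensitivity / epsilon
    return sum(v + laplace_noise(scale, rng) for v in values)

def noisy_sum_central(values, epsilon, sensitivity, rng):
    """A trusted aggregator sums raw values and adds noise once."""
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(fn, values, trials, rng):
    """Average absolute error of a noisy sum over repeated trials."""
    true = sum(values)
    return sum(abs(fn(values, 1.0, 1.0, rng) - true) for _ in range(trials)) / trials

rng = random.Random(0)
values = [rng.randint(0, 1) for _ in range(400)]  # one private bit per leaf
err_local = mean_abs_error(noisy_sum_local, values, 200, rng)
err_central = mean_abs_error(noisy_sum_central, values, 200, rng)
```

With per-leaf noise the error grows roughly with the square root of the number of leaves, so `err_local` comes out far larger than `err_central`. Closing that gap without a fully trusted aggregator is where secure aggregation or enclaves come in.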

Page 13:

Distributed Analytics

• Hierarchy easiest
  • Aggregation points/servers broker “model learned so far”
  • Have to be trusted by subset of leaves
  • Leaf can choose to change aggregator

• P2P just extension of this to dynamic, faster choice
• Distributed/Parallel ML
  • From data centers
  • Clustering on tuples easy if independent
  • Graph data is hard, but not impossible
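The hierarchical pattern above, in which an aggregation point brokers the "model learned so far", might look like the following toy loop: the aggregator broadcasts the current model, each leaf refines it on data that never leaves the leaf, and the aggregator averages the results. This is a sketch in the spirit of federated averaging, assuming a one-parameter linear model y = w·x and made-up data; the names `local_update` and `aggregate` are hypothetical.

```python
def local_update(w, data, lr=0.05, steps=20):
    """A leaf refines the broadcast weight on its own (x, y) pairs by gradient descent."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def aggregate(updates):
    """The aggregator averages leaf models, weighted by local sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets, all generated from y = 3x.
leaves = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    [(2.5, 7.5)],
]

w = 0.0  # the "model learned so far", held by the aggregator
for _ in range(10):
    updates = [(local_update(w, data), len(data)) for data in leaves]
    w = aggregate(updates)
```

Only model parameters and sample counts cross the network here, which is the "move code to data" motive in miniature. The aggregator still sees every leaf's update, which is why it has to be trusted by its leaves and why a leaf may want to switch aggregator.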

Page 14:

Future Proof for GDPR

• Privacy by Design and by Default – HAT addresses all GDPR privacy requirements, from its design principles to its security solution.

• HAT ecosystem data exchange is based on fully specified privacy terms: time specific, recipient specific, minimum data points specific, with full intention disclosed. Violation of any such terms may result in a ban from the Ecosystem.

• Consent by design and by default
  • The PCST CoP mandates a “specific, informed and freely given and unambiguous” intention disclosure of data usage, for every single personal data access instance.
  • HAT technology ensures that an exchange is only authorised and kept valid by the individual’s case-specific consent.

• Rights for Individuals by design and by default – encapsulated personal data containers, isolated for each individual, put an individual in full control of its HAT, so the individual inherently owns all of the following:
  • Right to access | Right to be informed | Right to rectification | Right to restrict processing | Right to object to market
  • Right of data portability | Right to be forgotten | Right to object to automated decision making and profiling

• Accountability and governance – the PCST CoP holds every ecosystem member to a higher level of accountability and governance practice.

• Record keeping – the HAT ecosystem automatically tracks data exchange, even at a much more granular level than GDPR requires: it documents the exchange parties, time of access, detailed data points, intention and T&C for every single transaction.

• Data protection by design and by default – the HATDeX-serviced HAT is designed with multiple layers of protection, covering Data at Rest, Data in Transit and Data in Use. (http://www.hatdex.org/wp-content/uploads/2016/06/hatdex-briefing-Issue-2_FINAL.pdf)

• Mandatory breach notification – HAT’s API-driven ecosystem automatically records all exchanges for breach tracking and investigation.

The GDPR Roundtable discussion consulted a few HAT research team members on the design of the legislation. The HAT ecosystem can ensure GDPR compliance, and further mandates tighter terms than GDPR as entry requirements for all parties who wish to operate within this ecosystem, following its PCST (Privacy, Confidentiality, Security and Trust) Code of Practice (http://hatcommunity.org/other-resources/).

http://hatdex.org/ http://hatcommunity.org
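The combination of time-specific, recipient-specific and data-point-specific consent with per-transaction record keeping can be sketched as a single check that both decides and logs. This is not the HAT API; `ConsentTerms`, `authorise`, the field names and the example values are all hypothetical, chosen only to illustrate the idea described on the slide.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ConsentTerms:
    """Hypothetical consent record: who may read what, why, and for how long."""
    recipient: str
    purpose: str
    data_points: frozenset
    valid_from: datetime
    valid_until: datetime

def authorise(terms, recipient, purpose, requested, when, audit_log):
    """Allow an exchange only if it matches the consent terms; record every attempt."""
    allowed = (
        recipient == terms.recipient
        and purpose == terms.purpose
        and set(requested) <= terms.data_points
        and terms.valid_from <= when <= terms.valid_until
    )
    audit_log.append({
        "recipient": recipient,
        "purpose": purpose,
        "data_points": sorted(requested),
        "time": when.isoformat(),
        "allowed": allowed,
    })
    return allowed

terms = ConsentTerms(
    recipient="clinic-app",
    purpose="appointment reminders",
    data_points=frozenset({"email", "calendar"}),
    valid_from=datetime(2018, 5, 25),
    valid_until=datetime(2018, 6, 25),
)
audit_log = []
ok = authorise(terms, "clinic-app", "appointment reminders",
               ["email"], datetime(2018, 6, 1), audit_log)
too_much = authorise(terms, "clinic-app", "appointment reminders",
                     ["email", "location"], datetime(2018, 6, 1), audit_log)
expired = authorise(terms, "clinic-app", "appointment reminders",
                    ["email"], datetime(2018, 7, 1), audit_log)
```

Logging refused requests as well as granted ones is a deliberate choice in this sketch: a breach investigation usually needs to see attempted access, not just successful access.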

Page 15:

Things we’re not covering today

• Database (Farr/ATI work now)
  • Query planning w/ privacy
  • K-anonymity
  • Weak homomorphic crypto etc.

• Threat modeling
  • Assuming implicit :-)
  • Suffice it to say hypervisor vulnerabilities exist
  • So need trusted stuff on untrusted platform…
  • …on new trusted stuff…

• Data Slavery as a Service: No More!
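For readers unfamiliar with the k-anonymity item above: a table is k-anonymous with respect to a set of quasi-identifier columns if every combination of their values is shared by at least k rows. A minimal check, using made-up rows and hypothetical column names, could look like this:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Illustrative records only; the columns are assumptions for this sketch.
rows = [
    {"age_band": "30-39", "postcode": "CB3", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "CB3", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "CB3", "diagnosis": "A"},
    {"age_band": "40-49", "postcode": "CB3", "diagnosis": "C"},
    {"age_band": "40-49", "postcode": "CB1", "diagnosis": "B"},
]

k_full = k_anonymity(rows, ["age_band", "postcode"])
k_coarse = k_anonymity(rows, ["age_band"])
```

Here `k_full` is 1, because one person is unique on age band plus postcode, while dropping the postcode raises `k_coarse` to 2. That trade between released detail and group size is the core of the technique.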

Page 16:

Who Am I?