Privacy-Preserving Analycs in the Cloudjac22/talks/imperial-dd-7-11-17.pdf · Fairness Software...

Post on 23-Sep-2020

1 views 0 download

Transcript of Privacy-Preserving Analycs in the Cloudjac22/talks/imperial-dd-7-11-17.pdf · Fairness Software...

Privacy-Preserving Analy0cs in the Cloud

JonCrowcro),h,p://www.cl.cam.ac.uk/~jac22

But first…

• Awordfromoursponsor…

How to deliver every person’s health and wellbeing through data

How to ensure security in a fast-changing world

Machine learning and artificial intelligence

Understanding human behaviour System architecture Security & robustness

Complex structure in data

Ethics in data science

Scalability Missingness Causation

Automating data wrangling Transparency and privacy

Asymmetry of power and knowledge

Building in good behaviour Machines for data science

Robustness and verification of systems

Identity and anonymity Heterogeneity

Finding structure in data

What data?

Smart infrastructure Resilient networks

Data-centric design Theoretical foundations for understanding new data science algorithms

Software infrastructure for data science Learning without labels Design and development of data visualisations Fairness

GOAL

CHALLENGES

THEMES

SCIENTIFIC PROBLEMS

How to create a safe

engineered infrastructure

How to take the pulse of the economy and how to detect fraudulent financial activities

How to advance AI with data science

How to ensure machine augmented decisions are

made ethically

Augmenting human decisions with machine

learning

VISUALISATION OF THE TURING RESEARCH STRATEGY

Mathematical modelling of complex systems

How can government innovate through data

How to scale data science and AI

Scientific problems vs themes

System architecture for data science

Security and robustness in data science

Machine learning and artificial intelligence

Complex structure in data

Understanding humans in a connected world

Ethics and Data Science

MathemaAcalModellingofComplexData

Scalability Missingness Causation Towards automated data wrangling Transparency and privacy Asymmetry of power and knowledge Building in good behaviour Machines for data science Robustness and verification of systems Identity and anonymity Heterogeneity Finding structure in data What data? Smart infrastructure Resilient networks Data-centric design Theoretical foundations for the understanding of new data science algorithms Software infrastructure for data science Learning without labels Design and development of engaging visualisations Fairness

TABLE MAPPING SCIENTIFIC PROBLEMS AGAINST RESEARCH THEMES

Private Data Center->Public Cloud

• ATIpartnerse.g.•  Farr/NHSScotland•  HSBC

• MoAvesforpubliccloud•  Scaleout/costsave•  HigherThroughputanalyAcs•  Share“access”withmoreresearchers•  <Yoursgoeshere>

Infrastructure Loca0on

• Keepfriends&enemiesnear:•  Legal/RegulatoryStuff(inclGDPR)•  Latency/Availabilityetc•  Control(physicalaccessetc)

• Needtovirtualisethese(be,er)•  CryptDataatrest•  Cryptdataduring“processing”•  keymanagementetc•  Enclave…SGX,TrustZone,AMD,CHERI

GDPR – 2018 – right to an explanaion

SGX opportunity

• Nottheonlypiece,ofcourse•  StaAc/dynamicanalysisetc•  Unikernels&s/wverificaAon

• CanuseSGXon•  Container(SCONE)•  Pladormbasis,Hadoop,Flink,Sparkh,ps://www.microso).com/en-us/research/publicaAon/vc3-trustworthy-data-analyAcs-in-the-cloud

•  OrapplicaAonbasis

MARU….@ turing.ac.uk

• ATIw/Intel,Dstl,Docker,Microso)• Hiring:-h,ps://www.turing.ac.uk/jobs/research-associate-maru-project/

•  ComparewhatisinSGX•  Enter/leavecost,cryptmemoryo/hetc•  Hypervisor?

•  Comparew/containerontrustzone,cheri,AMDetc•  CommonAPIsforkeysetc•  Virtualize?

•  Pentest•  manysidechannelpb•  Whatifweakhomomorphiccrypto&diffpriv?

Public Cloud->Databox (or HAT)

• Databox(andhat)takeoppositeview• Re-decentralize• KeepanalyAcs/MLasaservice

•  Mixofdistributed,privpresML+•  Hierachyof3rdpartyaggregators,MPC•  h,p://www.databoxproject.uk/

• HATreversesdirecAonofvalue…•  Audit(distributedledger)•  Getpaid(money(realorvurt)•  h,ps://www.hatdex.org/

Container – migra0on&replica

• Replicate(tocloudenclave)•  forrecovery(fromfail,the),loss)

• Migrate(tootherpersonalcloud)•  forlowlatency

• Mostnewdataisappendonly–sousedistributedledger•  (tamperprooflogs–seedatakitindocker)

• Consistencyofreplicas–•  e.g.usefpaxos

Distributed Analy0cs

• MoAvese.g.•  Movecodetodata•  Keepdataclosetoowner/primaryuser•  Guaranteecanaudittrailaccess•  Addyourshere

•  Challenges•  DependsonMLtechnologyofchoice&goal

•  PCA/Clustering,randomforests•  Curvefimgn(regressionetc)•  ModelInferencing–e.g.Bayesianinference

•  DistrubuteddifferenAalprivacytricky•  HierarchicalversusP2P?

Distributed Analy0cs

• Hierarchyeasiest•  AggregaAonpoints/serversbroker“modellearnedsofar”•  Havetobetrustedbysubsetofleaves•  Leafcanchoosetochangeaggregator

• P2Pjustextensionofthistodynamic,fasterchoice• Distributed/ParallelML

•  Fromdatacenters•  ClusteringontupleseasyIfindependent•  Graphdataishard,butnotimpossible

Future Proof for GDPR •  PrivacybyDesignandbyDefault–HATaddressallGDPRprivacyrequirementfromitsdesignprincipletoitssecuritysoluAon.

•  HATecosystemdataexchangeisbasedonfullyspecifiedprivacyterms-Amespecific,recipientspecific,minimumdatapointsspecificwithfullinten-ondisclosed.ViolaAonagainstanyofsuchtermsmayresultabanfromtheEcosystem.

•  Consentbydesignandbydefault-•  thePCSTPoCmandatesa“specific,informedandfreelygivenandunambiguous”intensiondisclosureofdatausage,forevery

singlepersonaldataaccessinstances.•  HATtechnologyensuresthatanexchangeisonlyauthorisedandkeptvalidbyindividual’scasespecificconsent

•  RightsforIndividualsbydesignandbydefault–encapsulatedpersonaldatacontainersisolatedforeachindividual,allowsanindividualisinfullcontrolofitsHAT,henceinherentlyownsallofthefollowing:

•  RighttoAccess|Righttobeinformed|RighttorecAficaAon|Righttorestrictprocessing|Righttoobjecttomarket•  Rightofdataportability|Righttobeforgo,en|Righttoobjecttoautomateddecisionmakingandprofiling

•  Accountabilityandgovernance-PCSTCoPmandateseveryecosystemmembertohigherlevelofaccountabilityandgovernancepracAce.

•  Recordkeeping–HATecosystemautomaAcallytracksdataexchange,evenatamuchmoregranularlevelthanGDPRrequires–itdocumentstheexchangeparAes,Ameofaccess,detaileddatapoints,intensionandT&C,foreverysingletransacAon.

•  DataprotecAonbydesignandbydefault-TheHATDeX-servicedHATisdesignedwithmulAplelayersofprotecAon,coveringDataatRest,DatainTransitandDatainUse.(h,p://www.hatdex.org/wp-content/uploads/2016/06/hatdex-briefing-Issue-2_FINAL.pdf)

•  MandatorybreachnoAficaAon-HAT’sAPIdrivenecosystemautomaAcallyrecordsallexchangesbreachtrackingandinvesAgaAon

GDPRRoundtablediscussionconsultedafewHATresearchteammembersforthedesignofthelegislaAon.HATecosystemcanensureGDPRcompliance,andfurthermandatesAghtertermsthanGDPRasentryrequirementsfromallparAeswhowishtooperatewithinthisecosystemfollowingitsPCST(Privacy,ConfidenAality,SecurityandTrust)CodeofPracAce(h,p://hatcommunity.org/other-resources/).

h,p://hatdex.org/h,p://hatcommunity.org 14

Things we’re not covering today

•  Database(Farr/ATIworknow)•  Queryplanningw/privacy•  K-anonimity•  Weakhomomorphiccryptoetc

•  Threatmodeling•  AssumingimplicitJ•  SufficeittosayhypervisorvulnerabiliAesexist•  Soneedtrustedstuffonuntrustedpladorm…•  …onnewtrustedstuff…

•  DataSlaveryasaService:NoMore!

Who Am I?