Controlling Leakage and Disclosure Risk in Semantic Big Data Pipelines

Ernesto Damiani (joint work with Paolo Ceravolo)


Page 1: Controlling Leakage and Disclosure Risk in Semantic Big ...translectures.videolectures.net/site/normal_dl/tag=... · Big Data Threats: Breach • In terms of the ISO 15408 model, a



Outline

•  Introduction
•  Prerequisites and Vision
•  New Big Data Threats
•  Some ideas for a KNOW, PREVENT, DETECT, COUNTER paradigm to counter them.


BIG DATA INITIATIVE

Drive open research & innovation collaboration with UAE and international institutes and organisations to carry out world-leading research and deliver tangible value, training, knowledge transfer and skills development in line with the UAE strategic priorities in the areas of: Smart enterprise, smart infrastructure & smart society

Security Research Center

SECURITY OF THE GLOBAL ICT INFRASTRUCTURE
•  Network and Communications Security
•  Business Process Security and Privacy
•  Security and Privacy of Big Data Platforms

SECURITY ASSURANCE
•  Security Risk Assessment and Metrics
•  Continuous Security Monitoring and Testing

DATA PROTECTION AND ENCRYPTION
•  High Performance Homomorphic Encryption
•  Lightweight Cryptography and Mutual Authentication


SESAR LAB

•  Secure Software Architectures and Knowledge-based systems lab (SESAR) http://sesar.dti.unimi.it
•  Located on the new campus in Crema, 40 km south-east of Milan
•  Industry collaborations: SAP, British Telecom, Nokia Siemens, Cisco, Telecom Italia
•  Part of the BigData Community


Some activities


Vision

•  Big Data is not just a technological advance but represents a paradigm shift in extracting value from complex multi-party processes


From classic data warehouse to Big Data


Internal vs. External data sources


Processing Models

•  Batch vs streaming
•  Hash vs sketch


Data Models

•  DATA MODELS:
–  Non-relational (attribute-value)
–  Extended relational (column or row-partitioned)
–  Neo-relational (hybrid)
•  Large Data Sharing Infrastructure to feed MR computations


Designing Data Representations for Big Data Applications

•  Design as they teach you at school
•  Scale up -> Denormalize Instance (drop indexes, triggers)
•  Solve problems with read/write precedence -> Create write-to and read-from data replicas (keep consistency periodically)
•  Memcache the Denormalized Instance -> (loose ACID)


Relational denormalization refresher

•  Simple concept: flatten a repeating group in a single table
•  Instead of EMP (E#, D#, Ename) - DEPT (D#, DEPT, Address)
•  Use EMP (E#, Ename, DEPT, Address)
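
The flattening step can be illustrated with a minimal Python sketch. The sample rows and the helper name `denormalize` are hypothetical and only mirror the EMP and DEPT schemas named on the slide.

```python
# Hypothetical sample rows for the EMP and DEPT relations from the slide.
emp = [
    {"E#": 1, "D#": 10, "Ename": "Alice"},
    {"E#": 2, "D#": 10, "Ename": "Bob"},
    {"E#": 3, "D#": 20, "Ename": "Carol"},
]
dept = [
    {"D#": 10, "DEPT": "Research", "Address": "Crema"},
    {"D#": 20, "DEPT": "Sales", "Address": "Milan"},
]


def denormalize(emp_rows, dept_rows):
    """Flatten EMP and DEPT into EMP(E#, Ename, DEPT, Address)."""
    dept_by_id = {d["D#"]: d for d in dept_rows}
    flat = []
    for e in emp_rows:
        d = dept_by_id[e["D#"]]
        # Department attributes are copied into every employee row.
        flat.append({"E#": e["E#"], "Ename": e["Ename"],
                     "DEPT": d["DEPT"], "Address": d["Address"]})
    return flat


flat_emp = denormalize(emp, dept)
```

In this toy data the address of the Research department ends up stored twice, once per employee, which is the redundancy that denormalization trades for locality.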


Denormalization drawbacks

•  Makes rows longer -> longer data transfers

•  Needs more RAM for in-memory processing

•  Redundant relationships improve performance at the expense of update overhead


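Memcache

Typical usage:

public Data readData(String query) {
    // Look in the cache first.
    Data answer = memcache.execute(query);
    if (answer == null) {
        // Cache miss: read from the database and cache the result under the query.
        answer = database.read(query);
        memcache.write(query, answer);
    }
    return answer;
}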


Low-level representation

•  Key-value data stores
•  Persistent, distributed (key, value) maps
•  Organized in regions held by different servers
•  Every entity is a set of key-value pairs


Key-value reminder

•  A key has multiple components, specified as an ordered list.
–  The major key identifies the entity and consists of the leading components of the key.
–  The subsequent components are called minor keys. This organization is similar to a directory path specification in a file system (/Major/minor1/minor2/).
•  The “value” part of the key-value pair is an uninterpreted string of bytes of arbitrary length
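
A minimal Python sketch of this key layout follows. The helper names (`make_key`, `split_key`, `entity`) and the sample keys are hypothetical; real key-value stores expose their own key APIs.

```python
def make_key(major, minor=()):
    """Build a path-like key string from major and minor components."""
    return "/" + "/".join(list(major) + list(minor)) + "/"


def split_key(key, major_len):
    """Split a path-like key into (major, minor) component tuples."""
    parts = [p for p in key.split("/") if p]
    return tuple(parts[:major_len]), tuple(parts[major_len:])


# A toy store: keys are path-like strings, values are uninterpreted bytes.
store = {
    make_key(["Employee", "42"], ["Data", "EmpID"]): b"\x00\x2a",
    make_key(["Employee", "42"], ["Data", "Photo"]): b"\x89PNG",
    make_key(["Employee", "43"], ["Data", "EmpID"]): b"\x00\x2b",
}


def entity(store, major):
    """Return all key-value pairs that share the given major key."""
    prefix = make_key(major)
    return {k: v for k, v in store.items() if k.startswith(prefix)}
```

Because all pairs of one entity share the major-key prefix, a store can keep them together, which is what makes a whole entity cheap to fetch.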


Example

“Employee” : {
    “Data” : {
        “EmpID” : “anyByteArray”
        “Photo” : “anyByteArray”
        “DeptID” : “anyByteArray”
    }
}                                          REGION 1

“Department” : {
    “DeptID” : “anyByteArray”
    “DeptDescription” : “anyByteArray”
}                                          REGION 2

Slide callout: this is a key, not a column name!


Denormalized Example

“Employee” : {
    “EmpData” : {
        “Photo” : “anyByteArray”
        “EmpID” : “anyByteArray”
    }
    “DeptData” : {
        “Description” : “anyByteArray”
        “DeptID” : “anyByteArray”
        “DeptLocation” : “anyByteArray”
    }
}                                          REGION 1


Consensus

•  Since data items are replicated, operations can be attempted concurrently on replicas
•  Synchronization using leader election (Paxos)
•  Features
–  Reliability and availability
–  Easy-to-understand semantics
–  Performance, throughput, acceptable latency
•  http://labs.google.com/papers/chubby-osdi06.pdf


Data Batch processing: Map/Reduce

•  Map/Reduce is a programming model for efficient distributed computing
•  It works like a Unix pipeline:
–  cat input | grep | sort | uniq -c | cat > output
–  Input | Map | Shuffle & Sort | Reduce | Output
•  Efficiency from
–  Data routing based on keys, reducing seeks
–  Pipelining
•  A good fit for a lot of applications
–  Log processing
–  Web index building
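
The Input | Map | Shuffle & Sort | Reduce | Output pipeline can be sketched in a few lines of single-process Python. This is only an illustration of the programming model with hypothetical function names, not of Hadoop's distributed execution.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(lines):
    """Chain the three phases, like the Unix pipeline on the slide."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))
```

Routing by key in the shuffle step is what lets each reducer work on its own group independently of the others.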


Practical MapReduce = HDFS + Hadoop

Locality optimizations
•  Map-Reduce queries HDFS for locations of input data
•  Map tasks are scheduled close to the inputs when possible


Risk and Threats


Risk Components


Big Data Threats: Breach

•  In terms of the ISO 15408 model, a data breach occurs when “a digital information asset is stolen by attackers by breaking into the ICT systems or networks where it is held/transported”
•  Big Data Breach: theft of a Big Data asset executed by breaking into the ICT infrastructure of a collector, transformer, processor or user who holds it.
–  Many attacks documented in the field can be classified as Big Data Breaches involving Data Source assets
–  2014 Target data breach involved 40 million debit and credit card numbers.
•  A Big Data Breach requires pro-active hostile behavior (the break-in)


Big Data Threats: Leak

•  A Big Data Leak can be defined as the (total or partial) disclosure of a Big Data Asset at a certain stage of its lifecycle.
–  A Big Data Leak can happen when Big Data are (unwillingly) disclosed by the owner to the provider of an outsourced process, e.g. computing data analytics.
•  In terms of the attacker model, a Big Data Leak can be exploited even by an honest-but-curious attacker.


Big Data Threats: Degradation

•  Big Data Degradation can be defined as the injection of a doctored version of a Big Data Asset at a certain stage of its lifecycle.
–  Big Data Degradation can happen when Big Data are poisoned by the provider of an outsourced process, e.g. computing data analytics.
•  In terms of the attacker model, Big Data Degradation requires pro-active hostile behavior (the injection).


Big Data Threats as APTs

•  An advanced persistent threat (APT) is a set of stealthy and continuous processes, often orchestrated by human(s) targeting a specific entity.
–  “Advanced”: signifies sophisticated techniques using malware to exploit vulnerabilities in systems.
–  “Persistent”: continuously monitoring and extracting data from a specific target.


The Silos problem

•  Different data are held by different departments

•  Representation and processing choices were made independently and may conflict

•  Regulatory differences in collection and usage may make merging a challenge

•  Early merge, late merge or never merge?


Data representation

•  The implications for data modelling and semantics have been masterfully discussed in several works...
•  However, less attention has been devoted to aspects that are usually secondary in centralized approaches
•  One of these aspects is the implication of pre-injection of Join for Data Loss Prevention...

ESWC2016


Breaking the Silos


Tradeoffs

•  At ingestion time two tradeoffs must be made:
–  I/O per request vs Total Data Volume: denormalization, if done well, brings more locality to data and the amount of I/O per request decreases.
   •  A normalized relational store has to query multiple tables to fulfill each request, leading to non-localized fetches
   •  Non-localized fetches lead to more I/O, as each fetch requires a read and each read has a “block size” minimum.
–  Processing Complexity vs Total Data Volume:
   •  Non-localized fetches are followed by assembling operations that require CPU time.
   •  Denormalized data processing is simpler, but at the cost of increased total data volume in the store.
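
The first tradeoff can be made concrete with a small arithmetic sketch in Python. The block size, record sizes and function names are hypothetical values chosen only for illustration, and the model assumes each fetch reads whole blocks.

```python
import math


def io_bytes(record_sizes, block_size):
    """Bytes read when every record needs its own fetch of whole blocks."""
    return sum(math.ceil(size / block_size) * block_size
               for size in record_sizes)


def stored_bytes(n_emp, n_dept, emp_size, dept_size, denormalized):
    """Total store volume with or without department data copied per employee."""
    if denormalized:
        return n_emp * (emp_size + dept_size)
    return n_emp * emp_size + n_dept * dept_size


BLOCK = 4096             # hypothetical block size in bytes
EMP, DEPT = 200, 400     # hypothetical record sizes in bytes

normalized_io = io_bytes([EMP, DEPT], BLOCK)      # two non-localized fetches
denormalized_io = io_bytes([EMP + DEPT], BLOCK)   # one localized fetch
```

Under these assumed numbers the denormalized layout halves the I/O per request, while `stored_bytes(1000, 10, EMP, DEPT, True)` is roughly three times `stored_bytes(1000, 10, EMP, DEPT, False)`.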


Transparent De-normalization

•  Big Data tools support transparent denormalising at data ingestion time.
•  The user of a Big Data computation may well be unaware of
   1.  the number of replicas at run time
   2.  the Region border crossings generated at ingestion time for efficiency reasons.


De-normalization gray area

[Figure labels: Analytics Algorithms, Available Observation Space, Context, Gray Area]


Degradation via faulty values

Source: [13] with thanks


Need for a de-normalization index

•  The “gray area” is a critical issue for Big Data Leak prevention, especially for Big-Data-as-a-Service
•  The global likelihood of exposure of data in the gray area can be estimated via a Big Data storage’s degree of de-normalization, or D-index [11]
–  (Normalized) median of the number of replicas per data item held in the Big Data storage during a reference time interval Δ
•  Measurable via trusted probes [12], more later
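
A minimal Python sketch of such an index is given below. The slide does not say how the median is normalized, so the min-max scaling against an assumed upper bound `max_replicas` is an assumption of this sketch, as are the function and parameter names.

```python
from statistics import median


def d_index(replica_counts, max_replicas):
    """Normalized median of the number of replicas per data item.

    replica_counts: item id -> replicas observed during the interval.
    max_replicas: assumed upper bound on replicas per item (hypothetical
    normalization; the source does not specify one).
    """
    if not replica_counts:
        raise ValueError("no data items observed")
    if max_replicas <= 1:
        raise ValueError("max_replicas must exceed 1")
    med = median(replica_counts.values())
    # Map a median of 1 replica to 0.0 and max_replicas to 1.0.
    scaled = (med - 1) / (max_replicas - 1)
    return min(1.0, max(0.0, scaled))
```

The replica counts would come from the trusted probes mentioned above; here they are simply passed in as a dictionary.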


From D-index to disclosure probability (1)

•  The D-index is seen as a “propensity factor” to disclosure
•  Intuitively, it measures the overall “unrequested trips” that data item values have made to the neighborhoods of other related data items just because the “fuel price”, i.e. the storage cost in the Big Data system, is low.


From D-index to disclosure probability (2)

•  The D-index itself cannot be directly identified as a probability
•  Although, being normalized, its value falls in the [0,1] interval, it lacks some formal properties we would expect from a likelihood (for instance, there is no relation linking (1 - D-index) and the integrity of the data space).
•  A formal mapping procedure can be devised to turn the D-index into a rigorous probability or possibility measure [6], [7].


Need for an accrual consensus index

•  For each data item i, Φi is the normalized number of updates that originated each value of i currently held in the Big Data storage
•  Measures the basis for the consensus that originated each value
–  Small consensus basis -> higher likelihood of the data item degradation
•  Inspired by the Cassandra failure index [14]
•  Measurable via a trusted detector that outputs a value, Φi, associated with each item.
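
One possible reading of Φi can be sketched in Python as follows. The source leaves the normalization open, so this sketch assumes Φi is the share of observed updates that agree with the value currently held; the function names, the threshold and the sample values are all hypothetical.

```python
from collections import Counter


def consensus_index(updates, current_value):
    """Fraction of observed updates that agree with the value currently held.

    This normalization (agreeing updates / all updates) is an assumption of
    the sketch, not a definition taken from the source.
    """
    if not updates:
        return 0.0
    return Counter(updates)[current_value] / len(updates)


def weakly_supported(items, threshold=0.5):
    """Item ids whose current value rests on a small consensus basis.

    items: item id -> (list of update values, current value).
    """
    return sorted(item_id for item_id, (updates, current) in items.items()
                  if consensus_index(updates, current) < threshold)
```

A trusted detector would feed the update history; items returned by `weakly_supported` are the ones where degradation is more likely under this reading.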


Independent interpretations of Φ

Source:[14]


Leak vs Breach vs Degradation revisited

•  Big Data Breach: adversary breaks into the system and sees (a) all available data sources and (b) the internal state of the Big Data system.
–  No silos boundaries: full playground!
•  Big Data Leak: adversary collaborates in the computation of analytics and takes advantage of de-normalization to attract information in regions
•  Big Data Degradation: honest-but-curious adversaries will just peek, but a malicious attacker could doctor her own or other people’s data, leading to wrong decisions which may cause permanent damage.


Some ideas

•  Systematic study of Big Data Security practices is still in its infancy.
•  Organize best practices around the work on top-level cybersecurity functions ongoing at NIST (available at http://www.nist.gov/itl/upload/draft_framework_core.pdf)
–  Closely based on functions suggested by public comments.
•  These functions are Know, Prevent, Detect, Respond, and Recover.


A practical example


Data structure

From https://en.wikipedia.org/wiki/K-anonymity

•  This data has 2-anonymity with respect to the attributes 'Age', 'Gender' and 'State of domicile' since for any combination of these attributes there are always at least 2 rows with those exact attributes.
•  The attributes available to an adversary are called "quasi-identifiers". Each "quasi-identifier" tuple occurs in at least k records for a dataset with k-anonymity.
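
The definition translates directly into a short Python check. The sample rows below are hypothetical and are not the table from the Wikipedia article; the function name `k_anonymity` is likewise an illustrative choice.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k such that the rows are k-anonymous.

    k is the size of the smallest group of rows that share the same
    values for all the given quasi-identifier attributes.
    """
    if not rows:
        return 0
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical, already generalised sample data.
rows = [
    {"Age": "20-30", "Gender": "Female", "State": "A", "Disease": "Flu"},
    {"Age": "20-30", "Gender": "Female", "State": "A", "Disease": "Cancer"},
    {"Age": "30-40", "Gender": "Male", "State": "B", "Disease": "Flu"},
    {"Age": "30-40", "Gender": "Male", "State": "B", "Disease": "Flu"},
]
```

With these rows, the quasi-identifiers Age, Gender and State give k = 2, while adding Disease to the quasi-identifiers drops k to 1.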


Data structure


From https://en.wikipedia.org/wiki/K-anonymity

Our dataset in Neo4j


Achieving the desired K-anonymity

•  There are two common methods for achieving k-anonymity for some value of k:
–  Suppression: in our example we remove name
–  Generalisation: in our example age values can be substituted with a range
•  But the k-anonymity level of a given subset of data selected by a query depends on two factors:
–  the Obfuscation created by Suppression and Generalisation of some attributes in the original dataset
–  the Segmentation of the query result
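
The two obfuscation methods can be sketched in Python as follows. The attribute names, the '*' placeholder and the 10-year range width are assumptions made for illustration only.

```python
def suppress(row, attributes):
    """Suppression: replace the given attributes with '*'."""
    return {k: ("*" if k in attributes else v) for k, v in row.items()}


def generalise_age(row, width=10):
    """Generalisation: replace an exact age with a range of the given width."""
    low = (row["Age"] // width) * width
    return {**row, "Age": f"{low}-{low + width - 1}"}


def obfuscate(rows):
    """Apply both methods to every row of a hypothetical patient table."""
    return [generalise_age(suppress(row, {"Name"})) for row in rows]
```

For example, `obfuscate([{"Name": "Ann", "Age": 23, "Disease": "Flu"}])` hides the name and turns the age into the range 20-29, leaving the sensitive attribute untouched.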


Segmentation

•  Suppose we submit a query which specifies the gender and a range for the age:

MATCH (s:User), (d:Domicile), (r:Religion), (e:Disease),
      (s)-[q2:REL]->(r), (s)-[q1:REL]->(d), (s)-[q3:REL]->(e)
WHERE toInt(s.age) < 25 AND s.gender = "Female"
RETURN (s)-[]-();

•  The result has 2-anonymity w.r.t. Domicile and 1-anonymity w.r.t. Religion and Disease: guessing the value of the latter attributes will identify the patient.


Problem

•  In Big Data storage, a malicious user extracting/inspecting a region = selecting a subset of data
•  Segmentation of Big Data regions is difficult to control
•  Inferences are possible


A possible countermeasure: Redundant Relations

•  By adding redundant relations we can limit the effect of Segmentation

MATCH (s:User), (e:Disease)
WITH COLLECT(e) AS Disease, s
FOREACH (e2 in Disease |
    CREATE (s)-[q3:REL {context: "4321"}]->(e2))
RETURN (s)-[]-();
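
The effect of the query can be mimicked outside Neo4j with a small Python sketch over an edge list. The tuple layout (user, disease, context), the helper names and the sample data are hypothetical; the marker "4321" is the one used in the query, while "1234" is assumed here as the marker of the true relations.

```python
TRUE_CONTEXT = "1234"    # assumed marker of relations from the true data source
DECOY_CONTEXT = "4321"   # marker of the redundant relations, as in the query


def add_redundant_relations(edges, diseases):
    """Link every user to every disease, marking the new edges as redundant."""
    users = sorted({user for user, _, _ in edges})
    decoys = [(u, d, DECOY_CONTEXT) for u in users for d in diseases]
    return edges + decoys


def diseases_of(edges, user, context=None):
    """Diseases linked to a user, optionally filtered by context marker."""
    return sorted({d for u, d, c in edges
                   if u == user and (context is None or c == context)})


edges = [("u1", "Flu", TRUE_CONTEXT), ("u2", "Cancer", TRUE_CONTEXT)]
padded = add_redundant_relations(edges, ["Cancer", "Flu", "Heart disease"])
```

A query that ignores the context sees every user linked to every disease, so segmenting the result no longer singles out a patient; only a query that knows the true context recovers the original relation.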


Secret

•  The secret is a contextualization index that countersigns the relationships originating from the true data source rather than those added for redundancy. In our example:

MATCH (s:User), (d:Domicile), (r:Religion), (e:Disease),
      (s)-[q2:REL]->(r), (s)-[q1:REL]->(d),
      (s)-[q3:REL {context: "1234"}]->(e)
WHERE toInt(s.age) < 25 AND s.gender = "Female"
RETURN (s)-[q3]-(e), (s)-[q1]-(d), (s)-[q2]-(r);



Not a panacea w.r.t. distribution checks

•  An attacker can study the distributions among the contextual relationships

MATCH (s:User)-[q:REL {context: "1234"}]->(e:Disease)
RETURN id(s), Count(e) AS Relationships;


Hashing

•  All relationships are marked with the same context
•  A hash index is created over the triple: (s)-[REL]-(e)
•  The hash function is the secret
–  Given (s) and (e) nodes we know if the relation is in the original dataset
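
One way to realise this idea is sketched below in Python. Instead of keeping the hash function itself secret, the sketch swaps in a standard keyed hash (HMAC-SHA256) whose key plays the role of the secret; this substitution, the node identifiers and the function names are assumptions of the sketch.

```python
import hashlib
import hmac


def triple_tag(secret_key, subject_id, object_id, rel_type="REL"):
    """Keyed hash of the (subject)-[REL]-(object) triple."""
    message = "\x1f".join([subject_id, rel_type, object_id]).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def build_index(secret_key, true_relations):
    """Hash index over the relations that come from the original dataset."""
    return {triple_tag(secret_key, s, e) for s, e in true_relations}


def is_original(secret_key, index, subject_id, object_id):
    """True if the relation between the two nodes is in the original dataset."""
    return triple_tag(secret_key, subject_id, object_id) in index
```

Because every relationship carries the same context, counting relationships per context reveals nothing; only a holder of the key can tell original relations from redundant ones.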


Technology cannot do it alone (1)

•  The opportunistic one-shot attacks typical of the early days of Big Data have been supplemented by leakages that are more persistent and, in some cases, more worrisome.
•  We need to start designing Big Data systems not just to prevent attacks and recover from them, but also to detect successful attackers quickly and contain them so that any data leakage can be identified and countered.


References

[1] Hesman Saey, T., “Big Data, Big Challenges”, Science News, February 7, 2015.
[2] Chi, Guangqing, Jeremy R. Porter, Arthur G. Cosby, and David Levinson. 2013. "The Impact of Gasoline Price Changes on Traffic Safety: A Time Geography Explanation." Journal of Transport Geography 28 (1): 1–11.
[3] Bellandi V., Cimato S., Damiani E., Gianini G. and Zilli, A. “Towards Economics-Aware Risk Assessment on The Cloud”, IEEE Security and Privacy, to appear in November 2015.
[4] Demirkan, H., & Delen, D. (2013). Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decision Support Systems, 55(1), 412-421.
[5] Damiani, E., Oliboni, B., & Tanca, L. (2001). Fuzzy techniques for XML data smushing. In Computational Intelligence. Theory and Applications (pp. 637-652). Springer Berlin Heidelberg.
[6] Damiani, E., Cimato, S., & Gianini, G. (2014). “A risk model for cloud processes”. The ISC International Journal of Information Security, 6(2), 99-123.
[7] Bellandi, V., Cimato, S., Damiani, E., & Gianini, G. (2015). “Possibilistic assessment of process-related disclosure risks in the cloud”. In W. Pedrycz et al., eds., Computational Intelligence and Quantitative Software Engineering. Springer-Verlag, 2014.
[8] Chen, M., Mao, S., Zhang, Y., & Leung, V. C. (2014). Big data storage. In Big Data (pp. 33-49). Springer International Publishing.
[9] Forbes, “Big Data Breaches of 2014”, available at http://www.forbes.com/sites/moneybuilder/2015/01/13/the-big-data-breaches-of-2014/, 2015.
[10] B. Biggio, B. Nelson, P. Laskov, “Poisoning Attacks against Support Vector Machines”, Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012.
[11] E. Damiani, Toward Big Data Leak Analysis, Proceedings of IEEE PSBD 2015, San Jose, CA, 2015.
[12] Claudio Agostino Ardagna, Rasool Asal, Ernesto Damiani, Quang Hieu Vu: On the Management of Cloud Non-Functional Properties: The Cloud Transparency Toolkit. NTMS 2014: 1-4.
[13] Santosh Aditham, Nagarajan Ranganathan, A Novel Framework for Mitigating Insider Attacks in Big Data Systems, Proceedings of IEEE PSBD 2015, San Jose, CA, 2015.
[14] Naohiro Hayashibara, Xavier Défago, Rami Yared, and Takuya Katayama, The ϕ Accrual Failure Detector, JSTIS-RR-2004-010.