WP3: Data Provenance and Access Control
Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou, FORTH
September 9-10, 2013, Heraklion
Slide 2
Presentation Outline
WP3 status and outline
◦Research achievements
D3.2 status
Review comments
Health use case description
Demo
Next steps (on demo)
Slide 3
WP3: Work Plan View
[Figure: work plan timeline; axis ticks 0, 6, 12, 18, 24, 30, 36, 42]
Task 3.1 Provenance Management (FORTH)
Task 3.2 Privacy, DRM and Access Control (FORTH, KIT)
Task 3.3 Trust Management (EPFL)
D3.1 Access control specification language, reasoning and enforcement mechanisms
D3.2 Provenance management and propagation through SPARQL query and update languages
D3.3 Access control system and privacy-aware language
D3.4 Trust management and inference system
Slide 4
Research So Far (Outline)
Abstract models for access control (FORTH)
Abstract models for provenance (FORTH)
◦Provenance for SPARQL query
◦Provenance for SPARQL update
Privacy (KIT)
◦Privacy in smart grids (not integrated)
◦Some integration in the demo
◦Problems (non-critical) – to be discussed
Trust (EPFL)
Slide 5
Access Control
The selective exposure of information to different users/roles
Useful for applications involving sensitive information
In the context of LOD:
◦Encourages publication of data that may include sensitive information
Standard approach:
◦Data is annotated with specific tags determining whether it should be accessible by specific users/roles
Slide 6
Abstract Labels
Triples associated with abstract labels
A set of abstract tokens (a1, a2, …)
◦Explicit triples associated with such tokens via authorizations
Abstract operators (⊙, ⊗, ⊕)
◦a1 ⊙ a2: the triple occurred via inference from triples with labels a1, a2
◦⊗a1: the triple occurred via propagation from a triple with label a1
◦a1 ⊕ a2: the triple occurred in two different manners, one via a1, one via a2 (e.g., two different authorizations)
◦a1 ⊕ (a2 ⊙ (⊗a3)): …
Slide 7
Determining Accessibility
Concrete policy
◦Associate tokens to concrete values
◦Associate operators to concrete operations
◦Determine whether the final value corresponds to an accessible triple (access function)
Example
◦a1=1, a2=2, a3=3
◦⊙=min, ⊕=max, ⊗=ID function
◦Accessible iff result >1
◦a1 ⊕ (a2 ⊙ (⊗a3)) evaluates to 2 (i.e., triple is accessible)
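The example on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the project's implementation: labels are modelled as nested tuples, and the names `inf`, `prop` and `alt` are hypothetical stand-ins for the inference, propagation and alternative-derivation operators. The token values, the min/max/identity mapping and the "greater than 1" access function are taken from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

# Hypothetical encoding of an abstract label: a token name, or a nested tuple
#   ("inf", left, right) -> inference from two labelled triples
#   ("prop", child)      -> propagation from one labelled triple
#   ("alt", left, right) -> two alternative derivations of the same triple
Label = Union[str, Tuple]


@dataclass(frozen=True)
class ConcretePolicy:
    tokens: Dict[str, int]            # concrete value of each abstract token
    inf: Callable[[int, int], int]    # concrete inference operation
    prop: Callable[[int], int]        # concrete propagation operation
    alt: Callable[[int, int], int]    # concrete alternative-derivation operation
    access: Callable[[int], bool]     # access function on the final value


def evaluate(label: Label, policy: ConcretePolicy) -> int:
    """Recursively map an abstract label to its concrete value."""
    if isinstance(label, str):
        return policy.tokens[label]
    op = label[0]
    if op == "inf":
        return policy.inf(evaluate(label[1], policy), evaluate(label[2], policy))
    if op == "prop":
        return policy.prop(evaluate(label[1], policy))
    if op == "alt":
        return policy.alt(evaluate(label[1], policy), evaluate(label[2], policy))
    raise ValueError(f"unknown operator: {op!r}")


# Values from the slide: a1=1, a2=2, a3=3; min, max, identity; accessible iff > 1.
policy = ConcretePolicy(
    tokens={"a1": 1, "a2": 2, "a3": 3},
    inf=min,
    prop=lambda v: v,
    alt=max,
    access=lambda v: v > 1,
)

# a1 combined (as an alternative) with the inference of a2 and propagated a3.
label = ("alt", "a1", ("inf", "a2", ("prop", "a3")))
value = evaluate(label, policy)
accessible = policy.access(value)
print(value, accessible)
```

The expression evaluates to max(1, min(2, 3)) = 2, which passes the access function, matching the slide.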
Slide 8
SPARQL Query Provenance
What is the provenance of the result of a complex SPARQL query?
Adapting relational solutions
◦Positive fragment (semirings): works fine
◦Non-monotonic fragment (m-semirings): problem with OPTIONAL, DIFFERENCE; different semantics than SQL
Two alternative approaches
◦m-semirings: translation to SQL
◦spm-semirings: a new operation (and the corresponding properties) to capture the provenance of OPTIONAL, DIFFERENCE
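For the positive fragment, the semiring idea can be sketched as follows. This is a minimal illustration of semiring-annotated relations in general, not of the spm-semiring construction: each result row carries an annotation, joins combine annotations with the semiring product, and alternative derivations (projection, union) combine them with the semiring sum. The relation names and data are hypothetical, and OPTIONAL and DIFFERENCE are deliberately not covered.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    zero: Any                           # annotation of an absent row
    one: Any                            # neutral element of the product
    plus: Callable[[Any, Any], Any]     # alternative derivations
    times: Callable[[Any, Any], Any]    # joint use of inputs


# A relation maps a row (here a pair of values) to its annotation.
Relation = Dict[Tuple[str, str], Any]


def join_project(r: Relation, s: Relation, sr: Semiring) -> Relation:
    """Join r(x, y) with s(y, z) on y and project onto (x, z)."""
    out: Relation = {}
    for (x, y1), a in r.items():
        for (y2, z), b in s.items():
            if y1 == y2:
                key = (x, z)
                out[key] = sr.plus(out.get(key, sr.zero), sr.times(a, b))
    return out


def union(r: Relation, s: Relation, sr: Semiring) -> Relation:
    """Union of two relations; rows present in both add their annotations."""
    out = dict(r)
    for row, ann in s.items():
        out[row] = sr.plus(out.get(row, sr.zero), ann)
    return out


counting = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
boolean = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# Hypothetical data: annotations count how many times each row was asserted.
knows = {("ann", "bob"): 1, ("ann", "cat"): 1}
works_with = {("bob", "dan"): 2, ("cat", "dan"): 1}
direct = {("ann", "dan"): 1, ("eve", "dan"): 1}

paths = join_project(knows, works_with, counting)   # ann-dan: 1*2 + 1*1
combined = union(paths, direct, counting)


def to_bool(rel: Relation) -> Relation:
    return {row: ann > 0 for row, ann in rel.items()}


# The same query under the Boolean semiring only records derivability.
bool_paths = join_project(to_bool(knows), to_bool(works_with), boolean)
print(paths, combined, bool_paths)
```

Swapping the semiring changes what the annotation means (number of derivations, plain derivability) without changing the query evaluation code.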
Slide 9
SPARQL Update Provenance
What is the provenance of a new triple, inserted via a complex SPARQL Update?
Similar to CONSTRUCT (query)
But still different
◦CONSTRUCT creates a new triple but does not modify the dataset
◦Updates explicitly specify the named graph in which to put the new triple(s); triples with different provenance may be put in the same named graph
Named graphs alone are not sufficient for capturing the provenance of updates
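The last point can be illustrated with a small sketch. It assumes a toy quad store held in Python sets; all identifiers (`ex:patients`, `ex:sally`, `insert`, and so on) are hypothetical, and real provenance expressions would be richer than a set of source triples. Two inserts derived from different source triples land in the same named graph, so the graph name cannot tell their provenance apart, whereas a per-triple record can.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]   # (subject, predicate, object, named graph)

quads: Set[Quad] = set()
# Per-quad provenance: each entry is the set of source triples of one derivation.
provenance: Dict[Quad, Set[FrozenSet[Triple]]] = {}


def insert(triple: Triple, graph: str, sources: FrozenSet[Triple]) -> None:
    """Insert a triple into a named graph and record where it came from."""
    quad = (*triple, graph)
    quads.add(quad)
    provenance.setdefault(quad, set()).add(sources)


# Two updates with different sources, both targeting the same named graph.
src_a = frozenset({("ex:sally", "ex:hasObservation", "ex:obs1")})
src_b = frozenset({("ex:john", "ex:hasObservation", "ex:obs2")})
insert(("ex:sally", "rdf:type", "ex:Patient"), "ex:patients", src_a)
insert(("ex:john", "rdf:type", "ex:Patient"), "ex:patients", src_b)

graphs = {q[3] for q in quads}
distinct_sources = {s for q in quads for s in provenance[q]}
print(len(graphs), len(distinct_sources))
```

Here the two inserted triples share one named graph but have two distinct provenance records.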
Slide 10
D3.2 Status
Contents of D3.2
◦Abstract models for provenance (very similar to the abstract models for access control)
◦Provenance for SPARQL query results
◦Provenance for SPARQL update (inserted triples)
Review version uploaded on the wiki on 05/09/13
◦http://wiki.planet-data.eu/web/D3.2
◦Only one reviewer at the moment (Oscar)
◦Volunteers?
Slide 11
Review Comments
Generally happy (“impressed by D3.1”)
Applicability
◦Usefulness: convince industry to look into that
◦Focus on a real-world use case to demonstrate value
In a nutshell
◦Some implementation to show value
Solution: demo (use case)
◦Health use case
◦Also suitable to show synergy
Slide 12
Health Use Case
A use case to show applicability and usefulness
◦In collaboration with Computational Medicine Laboratory (CML) of FORTH
Health-related data are sensitive
Proposed by the reviewers (Anders Tornquist)
◦Insurance companies need controlled access to sensitive medical data to determine premiums, insurance policies, contract terms etc.
Relevant to access control/privacy challenges
◦But also related to streaming, data quality and trust
Slide 13
Personal Health Record
Personal Health Record (PHR)
◦Collection of data regarding a patient
◦Diseases, personal information, medications, clinical observations and findings, measurements, …
Properties
◦Sensitive
◦Dynamic, sometimes streaming
◦Not always of good quality
Slide 14
Relation to Other WPs
Relation to WP1
◦Part of the PHR data may be of streaming nature, e.g., vital signs’ measurements of hospitalized patients
Relation to WP2
◦Data often of poor quality
◦Up to 26.9% of the data can be erroneous (patient-provided data, faulty readings, sensors etc.)
Suggestion (for the review)
◦Outline how the technologies developed in WP1, WP2 could (potentially) be used to address these issues
◦Specific and concrete, but no implementation needed
Slide 15
Access Control and Privacy
PHR (normally) accessible only by the patient
◦Sensitive data
Doctors, nurses, hospitals, insurance companies, public services may require access
Informed Consent
◦Patient allows access to (parts of) their PHR to specific entities, for a specific purpose, in a specific timeframe etc.
Via Consent Forms
◦Formal, legal document
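The dimensions of informed consent listed above (entity, purpose, part of the PHR, timeframe) can be sketched as a simple check. This is a hypothetical data model for illustration only: the `Consent` fields, the `is_allowed` helper, the part and purpose names, and the dates are all assumptions, not the structure of an actual consent form or of the demo.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Consent:
    entity: str              # who is granted access
    purpose: str             # what the access is for
    parts: FrozenSet[str]    # which parts of the PHR are covered
    valid_from: date         # start of the timeframe (inclusive)
    valid_to: date           # end of the timeframe (inclusive)


def is_allowed(consents: Iterable[Consent], entity: str, purpose: str,
               part: str, on: date) -> bool:
    """True if some consent covers this entity, purpose, PHR part and date."""
    return any(
        c.entity == entity
        and c.purpose == purpose
        and part in c.parts
        and c.valid_from <= on <= c.valid_to
        for c in consents
    )


# Hypothetical consent given by one patient.
consents = [
    Consent(
        entity="BCAF",
        purpose="benefit-eligibility",
        parts=frozenset({"diagnoses"}),
        valid_from=date(2013, 9, 1),
        valid_to=date(2013, 12, 31),
    )
]

ok = is_allowed(consents, "BCAF", "benefit-eligibility", "diagnoses", date(2013, 9, 10))
wrong_part = is_allowed(consents, "BCAF", "benefit-eligibility", "medications", date(2013, 9, 10))
expired = is_allowed(consents, "BCAF", "benefit-eligibility", "diagnoses", date(2014, 1, 1))
print(ok, wrong_part, expired)
```

With no matching consent, access is denied, which mirrors the default that the PHR is accessible only by the patient.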
Slide 16
Objectives
We will use this use case to demonstrate the benefits of our approach
Different entities have access to the same data, without accessing sensitive information
◦Unless the owner of the data has explicitly allowed so (via the consent form)
Without replication
Slide 17
Health Use Case Setting
[Figure: Dataset (collection of PHRs)]
Slide 18
Architecture (Data Access)
[Figure: data access architecture; recoverable labels follow]
User interface: authentication, queries
◦user request (accessing entity, SPARQL query)
◦result (triples)
PACEM API
AUTH Module, AUTH API
◦AUTH DB: user credentials for authentication
CPRP Module, CPRP API
◦CPRP DB: purpose and role hierarchy; assignment of concrete policies to accessing entities
AC API
Annotation Module, Evaluation Module, Update Module
SPARQL to SQL Translation Module
MonetDB, abstract expressions DB
◦Example table with columns s, p, o, label; rows (&a, type, Person) and (Student, type, class); labels l1 ⊙ l2 and l2 ⊙ l3
Arrow labels: accessing entity; concrete policy; SPARQL; SQL, concrete policy; result (triples)
Slide 19
Dataset
Advanced Patient Data Generator (APDG)
◦Synthetic, but realistic data
◦Developed in the context of EURECA (FP7 IP)
Data associated with large medical schemas
◦HL7-RIM, SNOMED-CT
10K patients
750K instance triples
Slide 20
Data on HL7-RIM (1/2)
Slide 21
Data on HL7-RIM (2/2)
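[Figure: HL7-RIM instance graph; recoverable labels follow]
Classes: Observation, Entity, Role, Participation, …
Nodes: http://kandel…./entityno/BC_ZSH2012A1000000, http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3, …
Edge label: foaf:name; literal: “Sally Berry”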
Slide 22
Data on SNOMED-CT (1/2)
[Figure: SNOMED-CT annotation of an observation]
http://purl.bioontology…./408643008 skos:prefLabel “Infiltrating duct carcinoma of breast”
http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3: observation indicating that the patient has “infiltrating duct carcinoma of breast”
Slide 23
Data on SNOMED-CT (2/2)
[Figure: SNOMED-CT concept hierarchy; node labels follow]
Infiltrating duct carc. of breast
Neoplasm of breast
Malignant tumor of breast
Carc. of breast
Infiltrating lobular carc. of breast
Carc. in situ of breast
Lobular carc. in situ of breast
Intraductal carc. in situ of breast
Slide 24
HL7-RIM and SNOMED-CT
[Figure: the HL7-RIM instance graph combined with the SNOMED-CT concept hierarchy; recoverable labels follow]
HL7-RIM: Observation, Entity, http://kandel…./entityno/BC_ZSH2012A1000000, http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3, …, foaf:name, “Sally Berry”
SNOMED-CT: Infiltrating duct carc. of breast, Neoplasm of breast, Malignant tumor of breast, Carc. of breast, Infiltrating lobular carc. of breast, Carc. in situ of breast, Lobular carc. in situ of breast, Intraductal carc. in situ of breast
Slide 25
Demo Scenario
Breast Cancer Action Fund (BCAF) provides benefits for cancer patients
Requires info on patients’ status to give the benefit
Sally Berry wants to apply for the benefit
Alternative: insurance company wants access to (part of) the data for determining the insurance premium and the contract terms
Demo: http://daphne.ics.forth.gr:8084/pd-demo/login.jsp
Slide 26
Next Steps
Make more explicit the benefit of abstract models
◦Efficient updates (no recomputation required)
◦Efficient change of policies (no recomputation required)
Try more scenarios
Purpose and role hierarchies
More functionality
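The "efficient change of policies" benefit can be sketched as follows. The sketch assumes labels are stored once with the triples as nested tuples and only their evaluation changes when the policy changes; the triples, the token names, the `inf` operator name and both policies are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple, Union

Label = Union[str, Tuple]
Triple = Tuple[str, str, str]

# Abstract labels are computed once, at annotation time, and stored with the triples.
stored: Dict[Triple, Label] = {
    ("ex:obs1", "rdf:type", "ex:Observation"): "a1",
    ("ex:obs1", "rdf:type", "ex:Act"): ("inf", "a1", "a2"),
}


def evaluate(label: Label, tokens: Dict[str, bool],
             ops: Dict[str, Callable[..., bool]]) -> bool:
    """Map a stored abstract label to a concrete value under one policy."""
    if isinstance(label, str):
        return tokens[label]
    op, *args = label
    return ops[op](*(evaluate(a, tokens, ops) for a in args))


def accessible_triples(tokens: Dict[str, bool],
                       ops: Dict[str, Callable[..., bool]]) -> Set[Triple]:
    """Triples whose label evaluates to True; the stored labels are not modified."""
    return {t for t, lab in stored.items() if evaluate(lab, tokens, ops)}


# An inferred triple is accessible only if both of its premises are.
ops = {"inf": lambda x, y: x and y}

# Two concrete policies over the same stored labels.
strict = accessible_triples({"a1": True, "a2": False}, ops)
lenient = accessible_triples({"a1": True, "a2": True}, ops)
print(len(strict), len(lenient))
```

Switching from the strict to the lenient policy only re-evaluates the stored expressions; the annotation step is not repeated.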