
Privacy for the Personal Data Vault

Tamás Balogh

Thesis to obtain the Master of Science Degree in

Information Systems and Computer Engineering

Supervisors: Prof. Ricardo Jorge Fernandes Chaves

Master Researcher Christian Schaefer

Examination Committee

Chairperson: Prof. Luís Eduardo Teixeira Rodrigues

Supervisor: Prof. Ricardo Jorge Fernandes Chaves

Member of the Committee: Prof. Nuno Miguel Carvalho dos Santos

July 2014


Acknowledgments

First of all, I would like to thank Ericsson for providing me with the opportunity to work on this interesting research project. Special thanks go to Christian Schaefer for his great support during the thesis work.

I would like to thank my thesis supervisor, Prof. Ricardo Chaves, for his help and valuable feedback during the course of this work.

My gratitude also goes to the coordinators of the European Masters in Distributed Computing program, Prof. Johan Montelius, Prof. Luís Rodrigues and Prof. Luís Veiga, who guided me throughout my master's program.

Last but not least, I would like to thank my family and friends for supporting me all along.


Abstract

Privacy is an important consideration in how online business is conducted today. Personal user data is becoming a valuable resource that service providers collect and process voraciously. The user-centric design that forms the basis of the Personal Data Vault (PDV) concept tries to mitigate this problem by hosting data under strict user supervision. Once the user's data leaves this supervision, however, the privacy models currently offered for the PDV are no longer enough. The goal of this thesis is to investigate different privacy enhancing techniques that can be employed in scenarios where PDVs are used. We propose three privacy enhancing models, all built around the Sticky Policy paradigm (a policy attached to data, describing usage restrictions). Two of these models are inspired by previous research, while the third is our novel approach, which turns a simple Distributed Hash Table (DHT) into a privacy enforcing platform. We evaluate the proposed models from several perspectives, such as feasibility, trust model, and weaknesses.

Keywords

Personal Data Vault, privacy, Sticky Policy, trust, assurance


Resumo

Privacy is an important aspect to consider in the way commercial exchanges are conducted nowadays. Personal data is becoming a valuable resource that service providers collect and process copiously. A user-centric design, the basis of the Personal Data Vault (PDV) concept, tries to mitigate this problem by hosting this personal data under the strict supervision of the user. However, as soon as the user ceases to exercise this supervision, the privacy model currently provided by the PDV is no longer sufficient. The objective of this dissertation is to investigate different techniques for reinforcing this privacy that can be applied in situations where PDVs are used. Three privacy enhancing models are then proposed, all based on the “Sticky Policy” paradigm (policies attached to the data, describing the restrictions on their use). While two of these models are inspired by the existing state of the art, the third constitutes a new approach that transforms a simple Distributed Hash Table (DHT) into a privacy enforcing platform. Several evaluations of the proposed models were performed, bearing in mind different aspects such as feasibility, trust, and weaknesses.

Keywords (Palavras Chave)

Personal Data Vault, privacy, Sticky Policy, trust, assurance


Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 System requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Thesis Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.6 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Background 7

2.1 The Personal Data Vault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.1.1 PDV as an Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.1.2 PDVs in the Healthcare System . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Personal privacy concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Related Work 15

3.1 XACML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2 Usage Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2.1 UCON in practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.3 TAS3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.4 PrimeLife . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.5 Other Privacy Enforcement Techniques . . . . . . . . . . . . . . . . . . . . . . . . 23

3.5.1 DRM approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.5.2 Trusted platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.5.3 Cryptographic techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4 System Design 27

4.1 PrimeLife Policy Language (PPL) Integration . . . . . . . . . . . . . . . . . . . . . 28

4.2 Verifiable Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29


4.2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2.2 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2.4 Privacy Manager Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.4.A Verifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.4.B Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2.5 Interaction Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.5.A Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.5.B Forwarding Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.3 Trusted Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.3.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3.2 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3.4 Privacy Manager Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.3.4.A Trust Negotiator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.3.4.B Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3.5 Interaction Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3.5.A Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3.5.B Forwarding Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4 Mediated Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4.2 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.4.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.4.4 DHT Peer Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.4.4.A The Remote Retrieval Operation . . . . . . . . . . . . . . . . . . . 46

4.4.4.B Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.4.4.C Keyspace Assignment . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.4.4.D Business Ring Size . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.4.4.E Business Ring Description . . . . . . . . . . . . . . . . . . . . . . 49

4.4.5 Privacy Manager Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.4.5.A Sticky Policy Enforcement . . . . . . . . . . . . . . . . . . . . . . . 50

4.4.5.B Trust Management . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.4.6 Logging Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.4.7 Interaction Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.7.A Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.7.B Multiple Data Subject (DS) Interaction Model . . . . . . . . . . . . 54

4.4.7.C Multiple Data Controller (DC) Interaction Model . . . . . . . . . . . 55


4.4.7.D Log Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.4.7.E Indirect data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4.8 Prototype Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 58

4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5 Evaluation and Discussion 61

5.1 Comparison on Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.1.1 Establishing Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.1.2 Transparent User Data Handling . . . . . . . . . . . . . . . . . . . . . . . . 63

5.1.3 Data Across Multiple Control Domains . . . . . . . . . . . . . . . . . . . . . 65

5.1.4 Maintaining Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.1.4.A Direct Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.1.4.B Indirect Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.1.4.C Sticky Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.2 Comparison on Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.3 Comparison on Trust Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.4 Comparison on Vulnerabilities and Weaknesses . . . . . . . . . . . . . . . . . . . 72

5.4.1 Weaknesses of the Sticky Policy . . . . . . . . . . . . . . . . . . . . . . . . 72

5.4.2 Malicious Data Controller (DC) . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.4.3 Platform Vulnerabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6 Conclusion 77

6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


List of Figures

2.1 Personal Data Vault Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Personal Data Vault in the Healthcare System . . . . . . . . . . . . . . . . . . . . . 11

3.1 Overview of XACML Dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2 Collaboration Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.1 Verifiable Privacy: Abstract Architecture of a single Policy Enforcement Point (PEP) node . . . . . . . . . . . . . . . . 31

4.2 Verifiable Privacy: Interaction diagram between a PDV and a Service Provider (SP) 34

4.3 Verifiable Privacy: Example of Forwarding Chain on Personal Health Record . . . 36

4.4 Trusted Privacy: Abstract Architecture of a single PEP node . . . . . . . . . . . . . 39

4.5 Trusted Privacy: Interaction Model of the Data Flow . . . . . . . . . . . . . . . . . 41

4.6 Mediated Privacy: Architecture of a DHT node . . . . . . . . . . . . . . . . . . . . 44

4.7 Mediated Privacy: Business Ring formed around a healthcare scenario . . . . . . 45

4.8 Mediated Privacy: Privacy as a Service (PaaS) design for the Hospital Service Business Ring node . . . . . . . . . . . . 46

4.9 Mediated Privacy: DC - DS interaction model . . . . . . . . . . . . . . . . . . . . . 53

4.10 Mediated Privacy: Key Dissemination . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.11 Mediated Privacy: Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.12 Mediated Privacy: Indirect Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57


List of Tables

5.1 Requirements Comparison Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.2 Detailed Comparison on Maintaining Control . . . . . . . . . . . . . . . . . . . . . 67


List of Acronyms

BFS Breadth First Search

DC Data Controller

DFS Depth First Search

DHPol Data Handling Policy

DHPref Data Handling Preference

DHT Distributed Hash Table

DRM Digital Rights Management

DS Data Subject

MP Mediated Privacy

noSQL Not Only SQL

PaaS Privacy as a Service

PD Protected Data

PDP Policy Decision Point

PDV Personal Data Vault

PEP Policy Enforcement Point

PHR Personal Health Record

PM Privacy Manager

PPL PrimeLife Policy Language

RDBS Relational Database System

RDF Resource Description Framework

SQL Structured Query Language


TCG Trusted Computing Group

TP Trusted Privacy

TPM Trusted Platform Module

TTP Trusted Third Party

UCON Usage Control

UI User Interface

VP Verifiable Privacy

XACML eXtensible Access Control Markup Language


1 Introduction

Contents

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 System requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Thesis Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.6 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6


The majority of interactions on today's internet are driven by personal user data. These pieces of information come in different shapes and forms, some being more valuable than others. For example, banking details might be considered more valuable than a person's favourite playlist. What all of these data pieces have in common is that they all belong to some specific user. This property, however, is not reflected in how data is hosted and organized over the web, since the hosting entities of personal user data consist of multiple service providers. Data belonging to a single user is fragmented and kept independently under different control domains based on context. For example, data related to somebody's social life might be stored with a social network provider, while the same person's favourite playlist might be hosted by his music service provider. Different initiatives exist to unify this scattered data. The Personal Data Vault (PDV) can be considered one such proposed solution.

The PDV is a user-centric vision of how personal digital data should be hosted. Rather than having bits of information scattered around multiple sites, the PDV tries to capture them under a single control domain. Every user is associated with his own PDV, where he hosts his personal data. PDVs are not only secure storage systems, but also offer ways to make access control decisions on hosted data. External entities, such as different service providers, can request user data at the user's PDV in order to provide some functionality beneficial to the owner of the PDV. By unifying the source of personal user data, we expect to achieve more flexibility and better control over how data is disclosed. By employing an access control solution, users can have assurance that only authorized entities will get access to their data. Access control does not, however, provide any privacy guarantees with regard to how personal data is protected after it leaves the control domain of the PDV.

PrimeLife [5] was a European project that researched technical solutions for privacy guarantees. Its privacy enhancing model introduces a novel privacy policy language, which empowers both users and service providers to specify their intentions with regard to data handling. The privacy policy language, however, lacks the technical enforcement model needed to support its correct functioning. This enforcement model is required to provide trust and assurance to end users. A trust relationship needs to be established between remote entities prior to personal data exchange, while assurance needs to be provided as proof that user intentions have been respected.

We propose a novel privacy policy enforcement model with an integrated trust and assurance framework. Our solution utilizes the completely decentralized construct of a Distributed Hash Table (DHT) to sustain a mediated space between PDVs and service providers. This mediated space serves as a platform for privacy enhanced data sharing. Pointers to the shared data objects, which live in the mediated space, are kept by both the owner and the requester. This way, data owners stay in control of their shared data. A distributed logging mechanism supports our enforcement model in delivering first-hand assurance to end users.
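To give a flavour of this idea, the following minimal sketch (our own illustration with hypothetical names; the actual model is developed in Chapter 4) shows how a DHT-like mediated space can hand back a pointer that both parties keep, leaving the owner able to revoke the shared object later:

```python
import hashlib
import uuid

class MediatedSpace:
    """Toy stand-in for a DHT: shared objects live here, not at the requester."""

    def __init__(self):
        self.store: dict[str, dict] = {}

    def put(self, data: bytes, sticky_policy: str, owner: str) -> str:
        """Publish data with its sticky policy; return the pointer kept by both parties."""
        pointer = hashlib.sha1(uuid.uuid4().bytes).hexdigest()  # key in the DHT keyspace
        self.store[pointer] = {"data": data, "policy": sticky_policy,
                               "owner": owner, "log": []}
        return pointer

    def get(self, pointer: str, requester: str) -> bytes:
        entry = self.store[pointer]
        entry["log"].append(f"{requester} read under policy: {entry['policy']}")  # assurance log
        return entry["data"]

    def revoke(self, pointer: str, owner: str) -> None:
        """The data owner stays in control: a kept pointer allows later revocation."""
        if self.store[pointer]["owner"] == owner:
            del self.store[pointer]

space = MediatedSpace()
ptr = space.put(b"bob's address", "use only for delivery", owner="pdv-bob")
space.get(ptr, requester="webshop")
space.revoke(ptr, owner="pdv-bob")  # the shared object disappears from the mediated space
```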


1.1 Motivation

Personal user data is becoming a highly demanded and valuable resource, not just for the users themselves, but also for service providers. Data analytics are carried out at different sites in order to bring businesses forward. Sometimes these operations on personal user data are even carried out without the user's awareness.

Users are mostly unaware of how the explicit data that they provide, like name, address, phone number, etc., is handled by service providers, such as social networks or e-commerce systems. Moreover, users also lack control over the information that they are willing to share. The lack of control manifests in two ways: users are unable to specify the scope in which their data shall be used, and sometimes they are also unable to retrieve and remove personal information hosted on a service provider's network. The lack of awareness and control leaves the user defenceless against privacy violations.

The system in place today to avoid the privacy violations described above is built around a trust framework. The Privacy Policies offered by service providers are considered to be the pillars of this trust framework. These Privacy Policies are often presented to the end user in the form of static texts describing how personal user information is going to be treated by the data collector. Nowadays we are used to seeing more diverse privacy options that can be set by the end user, such as the sharing settings for a post on a social networking website.

The main problem with this approach to providing data privacy is that it is highly unbalanced. It offers the guarantees of a one-sided privacy system, since the data collector is the sole entity that decides how personal data is handled, without the involvement of the user. This leaves clients with a "take it or leave it" offer, which clients are often willing to take. The result of this compromise is that user data ends up under the full control of the data collector. Another problem with these Privacy Policies is that they are often lengthy and ambiguously stated, such that they become hard to decipher for the average user. Moreover, they offer only a static policy setting that might not fit every user's requirements. Their more dynamic counterpart, the user-settable privacy options, offers a bit more flexibility, but the implementation of these settings is again fully up to the data collector himself. This in turn means that the data collector can revoke or modify these privacy options without the consent of its users.

The lack of a system that promotes the user-centric vision with regard to privacy concerns motivates us to look for possible alternatives to improve how we handle personal data privacy today.

1.2 Problem Statement

The problem that this thesis focuses on is that of providing privacy guarantees for a system where PDVs are widely used. Although the PDV concept allows fine-grained access control over the user's personal data, it still fails to address the issue of how remotely stored data should be protected. It is important to note that once the user chooses to disclose some personal data, he is left vulnerable to privacy violations. User privacy can become compromised through unawareness and lack of control.

1.3 System requirements

In order to provide a higher degree of awareness and control to the end user, the underlying technology needs to provide a higher level of trust and assurance. The user-centric design of the PDV system, although it offers a comprehensive picture of how data should be organized, leaves many specifications open regarding the privacy requirements. The following list details the major requirements set by this thesis. These requirements form the foundation of a trust framework that in turn focuses on achieving a user-centric model. They are as follows:

1. Establishing trust between actors, such as service providers and data owners. Trustworthiness refers to the degree of assurance with which an actor can be trusted to carry out the actions that he is entrusted with. The user needs some mechanism to determine whether a service provider is going to treat his data according to a pre-agreed set of rules. Pre-agreed rules, or data handling rules, should be formulated in agreement between both parties, and they should adhere to the correct handling of personal data.

2. Transparent user data handling should be a priority for every service provider. Users need to get assurance that their preferences on how to handle their data are carried out by the actors. Assurances are a form of trustworthy reports that describe the business process that has been carried out over the user's data. Continuous assurance will turn into a higher degree of trust that users can develop over time.

3. Data protection across multiple control domains is needed in order to facilitate the safe interoperability of multiple service providers. Delegation of rights to forward user data is a common use case; therefore there should be a clear model that describes how delegations take place, and how the data protection rules apply to the third party who receives the data.

4. Maintaining control over distributed data promotes user centrality. In the user-centric model the owner of the personal data is considered to be the user, even when he chooses to share it with other parties. He must retain the right to exercise operations on his personal data, such as modification, revocation of rights, and deletion.

1.4 Contributions

The goal of this thesis work is to research the existing privacy enhancing techniques that could be employed in a PDV oriented system. The first contribution of this work is to investigate whether the privacy policy language proposed by the PrimeLife [5] project fits the highly distributed PDV system.

The second contribution is to categorize several different privacy enforcing models for the considered problem. These models are used to guarantee the correct functioning of the privacy policies established in the first contribution, by covering some of the existing privacy enforcing techniques proposed by related research. While formulating these alternatives, we propose a novel privacy enforcement model, which relies on the concept of a mediated space where shared objects live.

The third contribution is an evaluation of the privacy enhancing models formulated herein. This evaluation takes into account different tradeoff criteria, namely the initially proposed requirements, feasibility, trust source, and vulnerabilities. By doing so, we assess the strengths and weaknesses of our proposed models.

The final contribution is the development of a prototype implementation based on our novel enforcement model, to show that the proposed concept can be carried out with currently existing technology.

1.5 Thesis Scope

The design and evaluation of different privacy enforcement models used together with PDVs bears complexities beyond the scope of this thesis project. First of all, we refrain from discussing the detailed design and architecture of a PDV. Furthermore, we also do not consider every security aspect related to the PDV concept. Instead, we use PDVs as abstract building blocks, clearly defined in Section 2.1.

The definition and design of the privacy enforcing models proposed in this thesis are also not subject to a complete security evaluation, as we are more concerned with the privacy aspects. Assumptions on the existence of secure channels and storage systems are made throughout this thesis. Moreover, we also assume a well-defined identity framework which guarantees the identity provisioning and verification of every actor in the system.

Providing privacy guarantees is also a vast research field in its own right. This thesis is focused on enforcement techniques for privacy policy languages, such as the one outlined in the PrimeLife project. In order to define a clear goal for the thesis, the scope of the work regarding the design of the enforcement models is narrowed down to the set of requirements outlined in Section 1.3. The requirements target aspects such as trust establishment, data handling transparency, data across multiple control domains, and maintaining control. These requirements also serve as a basis for evaluation. We refrain from any quantitative performance measurements in our evaluations, since the thesis is carried out on a conceptual level.


1.6 Dissertation Outline

The upcoming chapters are organized as follows. Chapter 2 describes the background concepts used in this thesis, covering the research involving PDVs and a short study on privacy concerns. Chapter 3 presents relevant projects involving research in privacy enforcement techniques. Chapter 4 presents the three privacy enforcement models proposed herein, highlighting the novelty of the proposed solution, called Mediated Privacy (MP). Chapter 5 contains the evaluation of the models proposed in Chapter 4, based on our requirement set and other metrics, such as feasibility and trust source. Chapter 6 concludes the thesis with a summary of the conducted work and suggestions for future work.


2 Background

Contents

2.1 The Personal Data Vault . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Personal privacy concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


This chapter presents the relevant background material used to carry out this thesis. The first section details the concept of a Personal Data Vault (PDV). The second section describes the privacy concerns, namely awareness, control, and trustworthiness.

2.1 The Personal Data Vault

The interactions that people have over the Internet contain a significant percentage of personal user data. Users are asked to provide personal information in exchange for access to some advertised online service. For example, a person might use a social media site to stay connected with friends and share information about himself, such as name, address, likes, and dislikes. This person might also be part of other social community sites, such as a virtual book club or a career portal, where he has to share similar personal information again. Following this model, the data that belongs to a single user ends up at multiple hosting sites.

This model, although it suits the needs and desires of the service providers, leaves users in a difficult position when they want to interact with remotely hosted data. It is becoming increasingly difficult for users to collect their data from multiple control sites to provide interoperability. One downside is the phenomenon called lock-in: it is difficult for users to migrate between the services that they use, because the data that they previously shared with a service provider is locked in under its control domain. Another concern is data fragmentation, which allows data to exist in inconsistent states. A user can have his address hosted by different services, but under different formatting, which in turn may lead to confusion when interoperability needs to be provided. The root of all of these concerns is that the user lacks an appropriate fine-grained control mechanism over his own data.

In order to provide a solution for easy interoperability and fine-grained control, the Personal Data Vault proposes a user-centric design that tries to unify personal data under a single control domain.

“Built with security and legal protections that are better than most banks, your vault lets you store and organize all the information that powers your life. Whether using a computer, tablet or smartphone, your data is always with you.” [19]

The Personal Data Vault also appears under various other terminologies, like “Personal Data Store” or “Personal Data Locker”. The attempts to formalize the concept of a PDV are complementary in the sense that they all focus on providing better control over personal data for the end user. However, a clear formalization of the term is still missing, since projects are built with different aims in mind. Some of them conceptualize a raw storage service with the sole purpose of hosting data securely, while others focus on providing software solutions to manage already existing storage spaces or even to link different user accounts.


There have also been efforts to categorize the different approaches that research projects take in order to formalize what a PDV actually is [29]. These fall into three main categories:

1. Deployment of these unified user data stores can be facilitated by a centralized cloud-based service, which in turn grants the user full control over the hosted data. On the other hand, this requires a high level of trust in the hosting entity. Alternatively, deployment can also be split between multiple trusted hosting providers, or even kept on end users' local machines.

2. Federation is also an important consideration that focuses on interoperability between multiple different storage providers and individuals. It tries to outline different interaction models that facilitate the collaboration between different deployments.

3. Client-side solutions target individuals who use their own devices as data hosts, together with a social peer-to-peer network. Without the need for a centralized entity to govern data movement, this approach focuses on more ad-hoc solutions.

There is also a substantial difference in how these projects envision the data model and internal storage system used for hosting personal user data. While some lean towards using a Relational Database System (RDBS), others are looking into solutions such as Not Only SQL (noSQL) and semantic Resource Description Framework (RDF) stores.

Since security is a central concern of all of these solutions, they mostly come with an additional data access layer on top of the storage system. This access layer facilitates the interoperability between different entities in a secure manner. Fine-grained control can be achieved through access control mechanisms that rely on predefined policies. These policies can either be confirmed by the end user or constructed on the fly.

Another key aspect of these projects is the interoperability of different entities [11]. PDVs should integrate seamlessly with other entities and facilitate the secure sharing of data across different control domains. The security of these operations can be guaranteed by providing encrypted channels between entities. These interactions can be of multiple types depending on the acting sides. Person-to-person connections link individuals: independent entities that serve as representative hosts for a person. Person-to-community solutions try to form groups of persons depending on some social context. Person-to-business connections describe how individuals interact with different service providers. In order to achieve these features, interoperability needs to overcome the differences in the underlying data models with the aid of standardized APIs and protocols.

2.1.1 PDV as an Abstraction

For the purpose of this thesis work, the PDV is treated as an abstraction of a data layer together with a manager layer. We consider these to be entities made out of a single machine or multiple machines with high availability. Moreover, we consider them resilient in the face of failures and secure in the face of vulnerabilities and exploits that may be used directly by a potential attacker. Herein, we disregard these security aspects and focus on the privacy concerns that appear in the interoperability scenarios.

Figure 2.1: Personal Data Vault Abstraction

Figure 2.1 depicts the high-level abstraction of a single PDV entity. The data layer on the bottom of the abstraction represents the collection of hosting machines that facilitate secure storage of personal information. These machines can either be under the direct control of the data owner, or they can be multiple interconnected machines residing at external entities that are fully trusted. Again, the purpose of this project is not to investigate safe data storage for the PDV, but rather to focus on what happens to data once it leaves the PDV.

The manager layer above the data layer acts as a guard for the personal data. It guarantees that only authenticated and authorized requesters are able to get access to data. The rules describing the access control policies in place are under the full control of the PDV owner. Secondly, it also offers an external interface that facilitates interoperability with other PDVs and with external service provider entities.
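As a rough illustration of this two-layer abstraction (a minimal sketch under our own simplifying assumptions; class and method names are hypothetical, not part of the thesis prototype), the manager layer can be modeled as a guard that checks owner-defined policies before the data layer releases anything:

```python
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    """An owner-defined rule: which requester may read which data items."""
    requester_id: str
    allowed_keys: set[str] = field(default_factory=set)

class PersonalDataVault:
    """Abstraction of a PDV: a data layer guarded by a manager layer."""

    def __init__(self, owner_id: str):
        self.owner_id = owner_id
        self._data_layer: dict[str, bytes] = {}      # secure storage (simplified)
        self._policies: dict[str, AccessPolicy] = {} # under full control of the owner

    def store(self, key: str, value: bytes) -> None:
        self._data_layer[key] = value

    def grant(self, requester_id: str, key: str) -> None:
        policy = self._policies.setdefault(requester_id, AccessPolicy(requester_id))
        policy.allowed_keys.add(key)

    def request(self, requester_id: str, key: str) -> bytes:
        """External interface: only authenticated and authorized requesters get data."""
        policy = self._policies.get(requester_id)
        if policy is None or key not in policy.allowed_keys:
            raise PermissionError(f"{requester_id} is not authorized for {key!r}")
        return self._data_layer[key]

# Example: Bob's PDV discloses his health record only to the authorized hospital.
pdv = PersonalDataVault("bob")
pdv.store("phr/allergies", b"antibiotics")
pdv.grant("home-hospital", "phr/allergies")
print(pdv.request("home-hospital", "phr/allergies"))  # b'antibiotics'
```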

2.1.2 PDVs in the Healthcare System

Several research projects involving privacy enhancement [14][16][18] focus on the healthcare system as their main use case. The benefits of a safe and reliable information system interconnecting healthcare centers clearly outweigh the benefits in other domains, because of its potential to save human lives.

The information systems of healthcare centers operate on Personal Health Records. A Personal Health Record (PHR) is a collection of relevant medical records belonging to a single patient, containing information such as chronic diseases, check-ups, allergies, etc. PHRs are usually hosted by the healthcare center in which a patient was examined. This design requires PHRs to be shared among different health centers in cases of patient migration or emergency situations. This can become cumbersome, since it requires the interoperability of multiple independent services.
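As a simple illustration (our own hypothetical structure; the thesis does not prescribe a concrete PHR schema), a PHR can be modeled as a patient-keyed collection of dated medical entries:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MedicalEntry:
    """A single record added by a healthcare center, e.g. a check-up or an allergy."""
    recorded_on: date
    category: str      # e.g. "allergy", "check-up", "chronic disease"
    description: str
    author: str        # the healthcare center that added the entry

@dataclass
class PersonalHealthRecord:
    """All relevant medical records belonging to a single patient."""
    patient_id: str
    entries: list[MedicalEntry] = field(default_factory=list)

    def append(self, entry: MedicalEntry) -> None:
        self.entries.append(entry)

    def by_category(self, category: str) -> list[MedicalEntry]:
        return [e for e in self.entries if e.category == category]

# Bob's home hospital records his allergy; a foreign hospital can later query it.
phr = PersonalHealthRecord("bob")
phr.append(MedicalEntry(date(2014, 3, 1), "allergy", "antibiotics", "Home Hospital"))
print(phr.by_category("allergy"))
```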


Figure 2.2: Personal Data Vault in the Healthcare System

The user-centric design focusing on data unification fits the presented healthcare scenario operating on PHRs. Instead of healthcare centers hosting PHRs, they could be kept directly in a PDV, under the direct control of the owner of the PHR. Figure 2.2 illustrates how a PDV can become beneficial in an emergency scenario. Imagine Bob, owner of PDV-Bob, using the Home Hospital Service for his regular check-ups and treatments. During check-ups and treatments the Home Hospital Service extends Bob's PHR with relevant information, such as his allergy to antibiotics. His PHR is regularly updated in his PDV. Imagine Bob going on vacation in a foreign country and suffering an accident where he loses consciousness. As Bob is taken into the foreign hospital, the doctors determine that he needs antibiotics in order to prevent infections. Instead of a rushed procedure, the doctor can first discover the patient's identity from his ID card, then consult his PHR, from PDV-Bob, through the Foreign Hospital Service. Assuming that the hospital's staff are authorized to access Bob's PHR, the foreign doctor can discover his allergy and administer an alternative solution, potentially saving Bob's life. His treatment in the Foreign Hospital can be appended to his PHR and followed by the Home Hospital once Bob returns from his vacation.

2.2 Personal privacy concerns

The maintenance of personal privacy is becoming an increasingly important concern in how business is conducted over the internet today. The safeguarding of personal privacy rights relies on a tangled framework which incorporates legal regulations and business policies. Business policies are required to be built on top of the existing regulations that are in place at the location where the said business is conducted.

For example, the Data Protection Directive formulated by the European Union [12] is one such legal regulation that provides a set of guidelines on how personal user privacy has to be protected in the virtual space. In the literature [12][26][5] we can highlight two important terms in use: the Data Subject (DS) and the Data Controller (DC). The Data Subject is an individual who is the subject of personal data. This may commonly be associated with the average user or client who is sharing some personal data. The Personal Data Vault (PDV), being an entity under the control of its owner, can also be considered a DS. The Data Controller is an entity, or a collection of entities, in charge of deciding how personal data collected from the DS is used and processed. Most of these regulations target the interaction between the DS and DC to assure that personal data is only collected and processed with the consent of the DS.

The Data Protection Directive has been around since 1995; however, due to the changes in IT technology and best practices since then, the directive is becoming obsolete. It fails to take into account concerns surrounding technologies such as cloud-based services or social networks. A new directive has been proposed [13] in order to face these challenges, since business policies are becoming increasingly divergent from the initially established regulations.

This new regulation tries to clarify and improve privacy regulations. However, the implementation of new reforms is always time consuming, and with quickly changing technology there is no guarantee that these new regulations will not become obsolete once again. There is also great difficulty in formalizing how these regulations protect personal data across different political zones where other regulations are in place. Business policies associated with service providers are global, since their services are, in most cases, available regardless of physical location. Privacy regulations, on the other hand, are locally applicable laws that change across borders. The difficulty lies in integrating different local regulations together, since they are sometimes incompatible.

The privacy concerns formulated by this and other data protection directives can be categorized under three important aspects [17], namely: awareness, control, and trustworthiness.

Awareness:

The first concern related to privacy is awareness. DSs have to be aware of how the data that they share is going to be handled by the DC. Handling of data should be in accordance with the purpose of usage and the policies agreed upon by the DS. Policies describing user data handling are usually provided by DCs, and they include information on aspects like processing, modification, and forwarding of personal data.

These policies alone, however, offer only a limited amount of awareness to DSs of how their explicitly shared data is processed. More alarmingly, implicit data about user behaviour on the internet, like search keywords, visited pages, and clickstreams, is also collected and processed without the user's consent. Service providers, such as social networking websites and e-commerce systems, are notorious for their practices of collecting their users' personal information and, through different analytical and profiling techniques, using it for different purposes, such as targeted advertisement. Moreover, personal records may also be disclosed to third parties, such as governments, without the user being aware.

Unawareness of how these pieces of personal information are used surrounds many interactions over the web. In some cases, users can end up unknowingly giving consent to information sharing because of deceitful user interfaces or simple carelessness. When seeking comfort in the personal privacy policies provided by DCs, users can also be left confused by the complexity and abstractness of these statements. Misuse of personal data can lead to problems such as decontextualization: explicitly shared personal information can get processed and reposted under a context or purpose different from the one for which it was initially intended. This may lead to confusion and loss of personal privacy.

Control:

Control is the second aspect that surrounds privacy concerns. The policies governing personal data handling should be created in correlation with the user's preferences. Many service providers offer a set of privacy options which give the user the liberty to formulate different privacy profiles. These options, however, lack the fine-grained control which users need to have over their shared data. Policies should be flexible enough to let users formulate how their data can be processed or even disclosed to third parties. There is also a need to be able to modify or even revoke previously given consent. Users should be able to retrieve their personal data at will.

There is also another category of personal data, called indirect data, which completely lacks means of control by the DS. Indirect data is data that is not explicitly shared by a DS, but is still connected to his identity. For example, pictures that other people share of you over social networking sites can be considered indirect data. Frequently, systems offer little to no control over data objects which are not shared explicitly by a user, but are still tied to his identity. This in turn can lead to disclosure of personal data without the consent of the original data owner.

Another concern surrounds the way in which service providers physically host personal data. In order to offer features such as high availability and fault tolerance, systems often keep replicas and backup copies of data objects, sometimes across different control domains. This leads to difficulties when a user decides to discontinue the use of a service and requests the service provider to delete all previously shared data. In many cases these service providers retain backup copies for an indefinite amount of time, even after the request for deletion has been completed.

Trustworthiness:

The mechanisms to provide awareness and control are complemented with trust. Trust is given to DCs by DSs if they follow regulations and respect privacy policies. The existing privacy regulations should serve as the baseline of trust. However, as shown before, these can often lead to confusion whenever contradictory regulations are encountered.

Data Controllers are also trusted to have a secure system, resilient to vulnerabilities and outside attackers, such that personal data cannot be directly stolen. Failure to implement secure software solutions may lead to disastrous personal privacy violations in the case of data theft. Unfortunately, the technical means currently in use provide little to no assurance of how privacy compliant these systems are. Providing a highly trusted service should be a priority of every service provider, since the lack of trust discourages new clients from using the advertised services, which in turn is bad for business.

Trust also applies to all entities that get access to users' personal data. For example, in the case of a social networking site the service provider is trusted to offer a secure and privacy respecting service, but friends who have direct access to a person's information are also trusted not to use it without his consent.

2.3 Summary

The background of this thesis work involves privacy concerns in Personal Data Vaults (PDVs). A PDV is an entity associated with a person or a business, providing safe storage of and secure access to personal data. For the purpose of this thesis, we use PDVs as abstract building blocks which serve as the main sources of personal user data. The applicability of PDVs is demonstrated through an example in the healthcare domain, using Personal Health Records.

Privacy concerns, such as awareness, control, and trustworthiness, surround online interactions. Data Subject (DS) and Data Controller (DC) are the two terms commonly used to denote the user, whose data is being collected, and the service provider, who collects the data. PDVs are generally seen as DSs, while the DC role is mostly assumed by external service providers. Existing local regulations on privacy protection and business privacy policies are not enough to prevent privacy violations, as shown by the examples of unawareness, lack of control, and untrusted services.


3 Related Work

Contents

3.1 XACML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2 Usage Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.3 TAS3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.4 PrimeLife . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.5 Other Privacy Enforcement Techniques . . . . . . . . . . . . . . . . . . . . . . 23

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


Chapter 3 focuses on existing related work in the domain of privacy enforcement. The chapter begins with a short introduction to the XACML policy framework in Section 3.1. Afterwards it presents some of the relevant research projects involving privacy enforcement, highlighting the PrimeLife project in Section 3.4.

3.1 XACML

The eXtensible Access Control Markup Language (XACML) is an XML-based policy language standardized by OASIS [4]. The language provides a set of well-defined policy building blocks that facilitates the definition of complex access control scenarios. It supports multiple policies on a single resource, which are combined in order to produce an access decision. The language is attribute based, meaning that actors and resources can be flexibly described by a set of attributes. Version 3.0 of XACML also supports obligations for extended access control. Obligations are specific actions that have to be taken on a predefined trigger, usually after the access decision has been carried out. Its highly extensible design has earned it popularity among existing policy language frameworks.

Figure 3.1: Overview of XACML Dataflow (source: http://ptgmedia.pearsoncmg.com/images/ch7 9780131463073/elementLinks/07fig09.jpg)

Apart from the language itself, XACML also offers a high-level architecture that describes how the policy language can be used to build an access control engine. The dataflow of this architecture can be seen in Figure 3.1. Incoming access requests are routed through a Policy Enforcement Point (PEP), depicted at point (2) of Figure 3.1, which offers a well-defined communication interface to the rest of the architecture (3). The Context Handler dispatches the request to a Policy Decision Point (PDP) (4), which is responsible for returning an access decision. The PDP combines the relevant policies stored in the Policy Administration Point (1) with the required attributes (5). The required attributes (specific information on a Subject, a Resource, or the Environment) are collected by the Policy Information Point (6)(7)(8)(9) and supplied to the PDP (10). After the PDP successfully combines the relevant policies, it returns an access decision to the requester via the Context Handler (11)(12). Additional restrictions that might apply in the form of obligations have to be carried out by the PEP with the help of an Obligation Service (13).
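To make the flow concrete, the following minimal sketch (our own simplification, not the OASIS reference architecture; the deny-overrides rule stands in for XACML's full set of combining algorithms, and all names are hypothetical) models a PEP that delegates the decision to a PDP and discharges obligations on a Permit:

```python
from typing import Callable

# A policy maps request attributes to a decision; a real XACML PDP would
# parse XML policies and apply combining algorithms such as deny-overrides.
Attributes = dict[str, str]
Policy = Callable[[Attributes], str]  # returns "Permit", "Deny" or "NotApplicable"

class PolicyDecisionPoint:
    def __init__(self, policies: list[Policy]):
        self.policies = policies  # fetched from the Policy Administration Point

    def evaluate(self, attrs: Attributes) -> str:
        """Combine all applicable policies with a simple deny-overrides rule."""
        decisions = [p(attrs) for p in self.policies]
        if "Deny" in decisions:
            return "Deny"
        return "Permit" if "Permit" in decisions else "NotApplicable"

class PolicyEnforcementPoint:
    def __init__(self, pdp: PolicyDecisionPoint,
                 obligation_service: Callable[[Attributes], None]):
        self.pdp = pdp
        self.obligation_service = obligation_service

    def handle_request(self, attrs: Attributes) -> bool:
        decision = self.pdp.evaluate(attrs)  # via the Context Handler in full XACML
        if decision == "Permit":
            self.obligation_service(attrs)   # e.g. log the access, notify the owner
            return True
        return False

# Example policy: doctors may read PHRs.
doctor_policy: Policy = lambda a: (
    "Permit" if a.get("role") == "doctor" and a.get("resource") == "phr"
    else "NotApplicable"
)
pep = PolicyEnforcementPoint(
    PolicyDecisionPoint([doctor_policy]),
    obligation_service=lambda a: print("obligation: access logged", a))
print(pep.handle_request({"role": "doctor", "resource": "phr"}))  # True
```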

3.2 Usage Control

There have been many approaches over the years to safeguarding valuable digital objects. Traditional access control solutions offer a way to grant access to protected digital objects only to authorized entities. These solutions, however, often require a set of predefined entities in a closed system, such as a company. Trust management offers ways to employ access control on unknown entities over larger domains.

Digital Rights Management (DRM) solutions are client-side systems that offer the protection of disseminated digital objects. Each of these mechanisms focuses on different digital object protection solutions depending on context and requirements.

The Usage Control (UCON) [27][25] research tries to formalize a more extensive solution that offers digital object protection by embedding traditional access control, trust management, and DRM together with two novel approaches to data protection. UCON tries to capture the whole lifecycle of a data object, even after it goes beyond authorization. By focusing on the whole lifecycle, UCON provides the privacy features that previous systems with digital object protection lack. The two proposed concepts that allow UCON to provide a more extensive control mechanism than its predecessors are the mutability of attributes and the continuity of access decisions.

UCON is envisioned to follow attribute-based access control, which requires data requesters to possess a set of attributes that makes them eligible for authorization. Attributes are used to formulate the rights that a given subject has on a given object. Up to this point, this can be realized through the use of a traditional access control system. The mutability of attributes refers to the dynamic nature of attributes, which can be subject to change. Based on these dynamic changes, the authorization rules also have to adapt and be re-evaluated to provide a potentially new access decision.

Continuity of access decision means that UCON tries to enforce certain security policies not only during authorization, but also while the object is being used and after usage, thus covering the whole lifecycle of the object. It carries this out through policies that can appear in the following forms (a sketch follows the list):

Authorizations: a set of required attributes that have to be provided and verified during the

pre-authorization phase. This can include certain identity checks of the requesting party.

Conditions: seen as attributes that describe environmental aspects that can affect the access

decision. For example, an object can only be accessible during a given timeframe of the day. Such

1Figure 3.1 source: http://ptgmedia.pearsoncmg.com/images/ch7 9780131463073/elementLinks/07fig09.jpg

17

Page 36: Privacy for the Personal Data Vault Information Systems and Computer Engineering

3. Related Work

conditions have to be evaluated on the pre-authorization phase and during the ongoing usage of

the object.

Obligations: predefined rules that safeguard a protected object after the authorization phase

has granted access to it. Obligations can be activated in any phase during or after the access

decision, and provide privacy enhancing features.
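The interplay of these policy forms with UCON's two novelties can be made concrete with a minimal sketch; the attribute names and the authorization rule below are our own illustration, not taken from [27][25].

```python
# Minimal sketch of attribute mutability and continuity of access decision.
# The subject attribute ("credits") and the rule are illustrative only.

def decide(subject: dict, condition_ok: bool) -> bool:
    """Authorization rule over a mutable subject attribute and a condition."""
    return subject["credits"] > 0 and condition_ok

subject = {"credits": 2}                         # mutable attribute
active = decide(subject, condition_ok=True)      # pre-authorization decision
while active:                                    # ongoing usage phase
    subject["credits"] -= 1                      # usage mutates the attribute
    active = decide(subject, condition_ok=True)  # continuous re-evaluation
# the loop ends once the credits are spent: access is revoked during usage
```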

3.2.1 UCON in practice

Although there have been many proposed approaches to implement UCON [27][25][7], it is

generally considered to be a hard problem, given the complex and demanding set of requirements.

In general, UCON tries to realize a data protection framework that relies on the use of certain

enforcement points. These enforcement points can either be present on the server side, providing

a more traditional central approach; or on the client-side, which resembles a DRM system that

is controlling the secure dissemination of digital objects. Hybrid approaches have also been

proposed [27] that formalize a symmetric system where both the client and the server side become enforcement points. Another proposed solution in [7] is to harness the power of the

quickly growing cloud industry. It proposes the implementation of the UCON framework by shifting

the enforcement point into the cloud. A Software as a Service solution could provide safeguarding of user data through the policies and mechanisms described by the UCON research.

Another subset of projects focuses on the security aspects of the enforcement points. In order

to guarantee that these nodes are in fact safeguarding digital objects by enforcing policies, differ-

ent technical measures can be taken. In order to provide assurance, [24] proposes monitoring on

different levels of abstraction. In practice, it focuses on how specialized monitors, such as an OS

monitor, can be used to trigger and carry out events described in obligations. Assurance can be

complemented by providing trust in enforcement points. They propose an implementation that follows the design suggested by the Trusted Computing Group (TCG), which describes how Trusted Platform Module (TPM) enhanced hardware can be used to guarantee that a remote system is tamper-proof.

The features described by the UCON research served as part of the basis for the requirements

set for the models presented in this thesis work. The continuity of access decision captures the

idea of maintaining control over shared personal data objects that are no longer under the direct

control of the user. Moreover, some of the enforcement techniques associated with UCON are

also present in some of our proposed data protection models. We diverge, however, from the broad focus of UCON to a narrower scope involving privacy, which means that we are more concerned about what happens to shared data after disclosure than about the whole lifecycle of a digital object.


3.3 Trusted Architecture for Securely Shared Services (TAS3)

Trusted Architecture for Securely Shared Services [6] was a European research project from the Seventh Framework Programme (FP7), concluded in 2011, which addressed some of the security and privacy concerns regarding personal data distribution across data collectors. Its main

focus was to specify and design a security and trust framework that is generic enough to encompass multiple business domains and to provide user-centric data management in a completely heterogeneous setting.

In order to promote user-centrality, it examines the possibility of a PDV-like design where data

is kept under the direct control of the end users, rather than scattered around data collectors. The

interaction model required to support data sharing in such a design is facilitated by Vendor Re-

lationship Management (VRM) [21]. VRM describes a reverse Customer Relationship Manage-

ment (CRM) model where service providers are the ones who subscribe to the users’ personal

information store to get access to data.

It also addresses the difference between by-me and about-me data. By-me data counts as

a direct form of personal data that is submitted or shared by the data owner explicitly. As an

example, a personal CV containing the professional background information of a person is by-me data. On the other hand, if this person attaches a transcript of grades from an institute, that can

be considered about-me data, since its issuer and verifier is the institute rather than the individual.

Control over about-me data can be considered much more cumbersome than that of by-me data,

since about-me data is often hosted and controlled by entities other than the subject of the data.

A proposed solution is to keep updated links pointing to about-me data such that the data subject

can place a relevant data handling policy next to it.

Other subprojects from within TAS3 are examining how changes to the policy framework guard-

ing personal data can promote user-centrality. Today's unilateral policy system does not meet the requirements of data privacy, since it empowers the data collector to treat personal data at will. Traditionally, users are concerned about privacy, while service providers are concerned about access control over their resources. Instead of treating these two concepts separately, TAS3 tries to encapsulate them under a single bilateral policy framework that lets users formulate privacy policies and service providers formulate access control policies. In order to combine

these two policy types, a policy negotiation framework is proposed in [23]. This framework is

responsible for the creation of data protection policies, constraining access to the shared data.

These policies are then signed and distributed in a non-refutable manner in order to assure that a

potential privacy violation can be discovered. Every entity is then responsible for evaluating and

respecting these contractual agreements in the processing and usage of every shared object.

A large part of the research focus is directed towards designing a federated infrastructure

[14][9] which is generic enough to accommodate many different use cases across heterogeneous


systems. The need for high interoperability between independent organizations is partly met by providing a privacy enhancing solution that does not rely on a specific policy language. Constraining access to personal data in highly distributed architectures requires a complex decision-making process that sometimes relies on multiple independent Policy Enforcement Points (PEP), which are designed in an application-dependent manner. The incompatibility between different

policy frameworks used across different entities raises conflicts when a suitable protection policy

for a shared object has to be formulated. To provide interoperability across organizations a conflict

resolution framework is needed.

Policies and security concepts can have different implementations at different sites. The assumption that all organizations use the same terminology when it comes to data protection does not hold. In situations where two independent parties need to share data in a secure manner, a policy negotiation phase has to take place. In order to provide an automated solution, TAS3 proposes an ontology-based policy matching framework [10] which lets every actor express his security concerns in his own vocabulary and provides a generic way to map between vocabularies.

Another approach [14] addresses conflict resolution by introducing a central component

called the MasterPDP which governs and combines the independent access decisions coming

from the stateless Policy Decision Points (PDP).

A version that offers better scalability is proposed in [9]. Instead of having a central decision point, it introduces multiple application-independent Policy Enforcement Points (PEP) that serve as wrappers over every application-dependent PEP and mediate the access decisions between the PEP and the PDP. These application-independent PEPs communicate over an independent channel and serve the resolved policies to their application-dependent PEP.

The requirements set by our proposed models, defined in Section 1.3, can be seen as a subset

of the requirements formulated by TAS3. We specifically offer an evaluation of our proposed mod-

els that takes into account the differences between by-me and about-me data. Although offering

a generic solution greatly increases interoperability, our solutions are not built with federation as

the main focus.

3.4 PrimeLife

The PrimeLife Project [5] was a research project conducted in Europe under the Seventh

Framework Programme (FP7), concerned with privacy and identity management of individuals.

It addressed newly appearing privacy challenges in large collaborative scenarios where

users are leaving a life-long trail of data behind them as a result of every interaction with services.

Its extensive research domain investigates privacy enhancing techniques in areas such as policy

languages, infrastructure, service federation and cryptography.

The Privacy and Identity Management for Europe (PRIME) project [8], conducted under FP6 as the predeces-


sor of the PrimeLife project, also offers valuable insight on privacy and identity management. It

uses pseudonymous identities to achieve different levels of unlinkability between users and their

personal data trails in order to avoid profiling and preserve privacy. Moreover, it strives to give

back control to the end user by designing an architecture that enforces pre-agreed data protec-

tion policies of shared objects. The functioning of such a design is highly dependent on the trust

level given by the end users to service providers. PRIME tries to investigate the different layers

of trust. The system that lets individuals share data with a pre-agreed data handling policy needs

to be enforced by strong technical measures that provide trust and assurance. Major technical

solutions to achieve trust are rooted in verification of trusted platforms in order to guarantee that

remote services are privacy compliant.

The PrimeLife project follows the work outlined in PRIME. One of its major contributions is the

investigation and design of a suitable policy framework that encompasses the privacy features

which promote user-centrality and control of private data. The proposed solution is centred around

the development of the PrimeLife Policy Language (PPL) [33], a proposed extension of the existing XACML [4] standard.

Figure 3.2: Collaboration Scenario

The core idea of how PrimeLife is intended to use PPL to facilitate privacy options can be

described using a simple collaboration diagram in Figure 3.2. The scenario describes the inter-

action between the Data Subject (DS), who is considered the average user or data owner whose

privacy needs protection; Data Controller (DC), which denotes a wide range of service providers

that the user can be interacting with; and the Third Party, who is considered to be another entity

involved in the business process, like an associate of the service provider. The interaction is initi-

ated by the Data Subject who is requesting some sort of resource from the DC. The DC responds

with its own request, describing what kind of information he expects from the user in exchange

for the resource, and how he is willing to treat that information. The description provided by the

Figure 3.2 source: http://primelife.ercim.eu/images/stories/deliverables/d5.3.4-report_on_design_and_implementation-public.pdf


DC on how he will treat private personal data is called Data Handling Policy (DHPol). The DS

examines the list of information requested together with the DHPol, and combines it with his own

Data Handling Preference (DHPref). The DHPref is the user's way to describe how he prefers his disclosed personal information to be treated. A combination of the DHPol and DHPref

results in a Sticky Policy that is sent together with the requested personal data, in exchange for

the resource. The Sticky Policy contains all the relevant data protection rules which have to be

respected by the DC. The direct collaboration between DS and DC ends here. However, the DC

may decide to forward the collected personal data from the DS to a Third Party. In this case, the

DC has to consult the Sticky Policy first, in order to examine whether he is allowed to forward the

information collected from DS or not, and act accordingly. In order to support such a scenario an

expressive language is needed. The PPL is a highly descriptive and easily extensible language

that can support the collaboration scenario described above.

PPL builds on the concept of the existing Sticky Policy paradigm, which serves as the basis

for many privacy and data security related research projects [5][9][28][22]. Sticky Policies are

data access rules and obligations, formalized for machine interpretation, that are tied to the data object which they protect. The intuition behind it is that data moves across

multiple control domains together with its associated Sticky Policy, which in turn describes how the

data can be treated. This requires the data object to be closely coupled with its Sticky Policy. In

order to assure that these policies will not get stripped off and ignored, certain Policy Enforcement

Points (PEP) are required to enforce their usage.

One of the contributions that the PPL brings to the existing Sticky Policy paradigm is the two-

sided data handling policy/preference that lets the DS and DC formulate a sticky policy suitable for

both needs. As PPL is designed to be interpreted by machines, it also comes with an automated matching engine that resolves conflicts between DHPol and DHPref. It is a symmetric language

that requires both parties of the interaction to formulate their policies in this language.

The language is highly expressive, allowing complex policies to be formulated to accommodate different use case scenarios. Provisional actions and required credentials can be specified in order to require some authentication before authorization. Data can be kept under the protection of a purpose of usage, which constrains the actions that DCs can take with the collected data. It also allows users to express whether their data can be forwarded to

third parties or not, and under what conditions. More complex use cases can be modelled through

the use of obligations. Obligations are a set of actions that have to be taken when triggered by a

specific event. For example, an obligation could specify to send an acknowledgement back to the

data owner every time his shared personal data gets forwarded to a third party.
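As a minimal sketch of these concepts, the structure below models a Sticky Policy with a purpose, a downstream-usage flag and one obligation. PPL itself is an XML-based XACML extension; the Python representation and the field names are our own simplification, used only to make the concepts concrete.

```python
from dataclasses import dataclass, field

@dataclass
class Obligation:
    trigger: str     # the event that activates the obligation
    action: str      # the action the Data Controller must then carry out

@dataclass
class StickyPolicy:
    purposes: set                # purposes the data may be used for
    downstream_allowed: bool     # may the DC forward the data?
    obligations: list = field(default_factory=list)

# Use for contact purposes only; forwarding is allowed, but the data owner
# must receive an acknowledgement on every forward (the example above).
policy = StickyPolicy(
    purposes={"contact"},
    downstream_allowed=True,
    obligations=[Obligation(trigger="on_forward",
                            action="acknowledge_to_data_owner")],
)
```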

Research involving the development of the PPL [32] is also concerned with how individuals fit in this new policy framework. Novel methods for human-computer interaction are required

in order to ease the task of formulating complex data protection policies for the end user, since


DHPrefs rely fully on the assumption that the end user is able to comprehend and formulate his own policy. Moreover, situations where the policy matching engine is unable to combine a DHPol with a DHPref require explicit consent and interaction from the end user before the process can continue. In order to keep the demand for human interaction low, an expressive User

Interface (UI) needs to be provided.

This thesis considers the PrimeLife Policy Language (PPL) as its main tool by which privacy

guarantees are provided. However, instead of focusing on the language components of the PPL,

it targets the enforcement model that can be used together with it.

3.5 Other Privacy Enforcement Techniques

3.5.1 DRM approach

Digital Rights Management (DRM) systems offer a protection mechanism for digital content distributed over the web. They employ technical means, such as cryptography and access control, to safeguard access to protected content. To achieve this, specialized

software needs to be deployed on the machines of clients requesting access to these protected

data objects. Once the digital content is distributed to the client side, the DRM system prevents

unauthorized usage of it.

User privacy protection, just like distributed content protection, is concerned with the safeguarding of valuable digital objects; in the case of privacy protection, the valuable resource is the personal data itself. It is easy to observe the parallel between the requirements of privacy protection and distributed content protection, since both protect digital data.

been proposed to overcome the challenges of privacy protection [20]. The client-side DRM transforms into a Privacy Rights Management (PRM) system deployed at the Data Controller (DC). This new

component is then responsible for safeguarding private user data once it has been disclosed, by

enforcing the data protection policies applicable for the disclosed data.

It is worth mentioning that DRM systems are not bulletproof, in the sense that they fail to offer any kind of protection once digital data has been disclosed in plain sight. DRM offers only a limited amount of protection that can sometimes be overcome by technical means. A proposed PRM system would suffer from the same limitations. Moreover, the operator of

such a PRM system is required to be trusted by the users who are willing to disclose personal

information. Another consideration is that current DRM systems usually require a client-server

scenario, whereas with entities such as PDVs and interconnected service providers, we are facing

a much more distributed, peer-to-peer-like structure, where roles such as DS and DC can be applied interchangeably to a single entity depending on the context.


3.5.2 Trusted platform

Trust is one of the central requirements when it comes to sharing protected data between

unknown entities. The Trusted Computing Group (TCG) defines trust as “the expectation that a

device will behave in a particular manner for a specific purpose” [3]. The TCG offers a range of

technical solutions to accommodate the rising needs for secure systems.

Security is a concern on both the software and the hardware level. They propose an enhanced

hardware extension that serves as the basis of a trusted system. The Trusted Platform Mod-

ule (TPM) is a hardware component closely integrated with the motherboard that offers security features such as RSA key generation, cryptographic operations, and integrity checks. By possessing an embedded asymmetric keypair, the TPM is considered to be the root of trust for the platforms using it. Being a hardware component, it is also considered tamper-resistant.

Several solutions have been proposed [22][30] for achieving privacy protection through the

use of trusted hardware, and TPM in turn, by using software attestation techniques. By using

the functionality of the TPM, the integrity of a running application can be attested dynamically.

Checking the current state of an application against an expected value can bring assurance of the

validity of the application, proving that it has not been tampered with. Privacy protection solutions

use remote software attestation to prove that a given software component is in a valid state on

the remote machine. It can, for example, provide proof that a known privacy policy enforcing

software component is in place on a remote server, which brings assurance to the end user that

his protected data is in capable hands.

3.5.3 Cryptographic techniques

Cryptographic techniques are mainly used to ensure secrecy in the safe storage and transport of sensitive information. There are also initiatives researching their use in the privacy protection domain.

One of the proposed cryptographic models for privacy protection is called Type-based Proxy

Re-Encryption (PRE) [36][18]. It assumes a semi-trusted Policy Enforcement Point (PEP) of an honest-but-curious nature, meaning that it is trusted to carry out user intentions, but is also curious about the shared data for its own purposes. The PEP is trusted to hold the data encrypted with the data owner's public key, together with its Sticky Policy. When a request arrives that asks

for the data, first an authorization is carried out against the Sticky Policy. On permit, the PEP

re-encrypts the data, such that only the recipient can see it. In this setting the PEP becomes the

proxy that performs the re-encryption. They claim that if the receiving party and the PEP are not

conspiring, it is safe to assume that the PEP is not able to decipher the protected data. The scheme employs asymmetric keys and assumes that key dissemination and identities are managed and verified by a trusted third party.

They take this solution further with the type-based PRE which assumes that there are multiple


proxies from which the user can choose depending on the secrecy and security that he or she needs. The advantage of this is that if one of the proxies gets compromised, there is only a partial loss of data.

Following their vision, the PEP proxy can be the same as a semi-trusted service provider that is responsible for distributing personal data. A web-based health-record system, for example, is

responsible for the safe storage and management of personal health records. In this simplified

scenario there can be a doctor and a pharmaceutical company both requesting a personal health

record for different purposes. Let us assume that the owner of the health record specified in his

policy that data can be forwarded to his doctor, but not to any pharmaceutical company. Since the

health-record system is only semi-trusted, the user stores his data encrypted with his public key,

and only provides a re-encryption key tied to the identity of the trusted doctor. In this scenario the PEP of the health-record system will only be able to re-encrypt the ciphertext for the eligible doctor. Even if it tries to examine or forward the personal health record to the pharmaceutical company, all they will see is the ciphertext. Thanks to the encryption with the data owner's public key, his privacy will be protected, since neither the health-record system nor the pharmaceutical company will be able to decipher it.
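The flow of this scenario can be sketched with a toy ElGamal-style proxy re-encryption. The type-based schemes of [36][18] are far more elaborate; the parameters below are insecure, and deriving the re-encryption key from both secret keys is a didactic shortcut that real schemes avoid.

```python
import random

P, Q, G = 1019, 509, 4   # toy safe-prime group; G generates the order-Q subgroup

def keygen():
    sk = random.randrange(1, Q)
    return pow(G, sk, P), sk                      # (public key, secret key)

def encrypt(pk, m):
    r = random.randrange(1, Q)
    return (m * pow(G, r, P) % P, pow(pk, r, P))  # (m * g^r, pk^r)

def decrypt(sk, ct):
    c1, c2 = ct
    s = pow(c2, pow(sk, -1, Q), P)                # recover g^r
    return c1 * pow(s, -1, P) % P

def rekey(sk_owner, sk_doctor):
    return sk_doctor * pow(sk_owner, -1, Q) % Q   # rk = b / a (mod Q)

def reencrypt(rk, ct):
    c1, c2 = ct
    return (c1, pow(c2, rk, P))                   # pk_owner^r becomes pk_doctor^r

owner_pk, owner_sk = keygen()
doctor_pk, doctor_sk = keygen()
record = 42                                       # toy "health record"

ct = encrypt(owner_pk, record)                    # held by the semi-trusted PEP
ct_doc = reencrypt(rekey(owner_sk, doctor_sk), ct)
assert decrypt(doctor_sk, ct_doc) == record       # only the doctor can read it
# the pharmaceutical company holds no re-encryption key and sees only `ct`
```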

The solution outlined above, however, is only suitable for a subset of existing use cases. It

does not take into account, for example, service providers who are processing user data in an

automated manner. Such processing becomes impossible under the PRE model, since the data is

encrypted.

Other research projects [15] investigate the potential of self-destructive data. In order to avoid

the persistence of user data in data copies, the self-destructive data model offers a method to

render data unavailable after some period of time for everybody, even for the owner of the data.

Their motivation is to avoid unauthorized disclosure of information even if it means losing the

information completely. Some private data, such as private emails, does not need any persistence after it has been received and viewed.

They employ a cryptographic method called threshold-based secret sharing, where a symmet-

ric encryption key is split into multiple pieces, but can be reconstructed with a threshold amount

of key pieces. By their design, personal data gets encrypted with a randomly generated key, that

gets split into multiple pieces and scattered in pseudorandom locations on a Distributed Hash

Table (DHT). The cypthertext, together with hints about the key pieces, is then transmitted to the

recipient via some service. In order for the receiver to be able to decipher the data, he has to

recompute the shared key from its pieces. The receiver will only have to retrieve a subset of the

scattered key pieces from the DHT in order to recompute the encryption key. Once the key is

recomputed, it can access the received object. Their security model relies on the high churn rate

[34] of the DHT which makes key shares impossible to retain after a given time, either because

responsible nodes leave the system, or data gets expired and deleted. The churn rate refers to


the rate at which nodes enter and leave the DHT system.
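A minimal sketch of the underlying primitive, (k, n) threshold secret sharing in the style of Shamir's scheme over a prime field, is given below; the prime and the parameters are illustrative, not the values used in [15], and the scattering of shares over the DHT is only indicated in comments.

```python
import random

PRIME = 2**127 - 1   # a Mersenne prime, large enough for a 128-bit key

def split(secret: int, k: int, n: int):
    """Split `secret` into n shares such that any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

key = random.randrange(PRIME)           # the randomly generated encryption key
shares = split(key, k=3, n=5)           # scattered at pseudorandom DHT locations
assert reconstruct(shares[:3]) == key   # any 3 of the 5 shares suffice
# once churn erases more than n - k shares, the key is gone for everybody
```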

3.6 Summary

This chapter focused on the description of existing privacy enhancing techniques. The eXtensible Access Control Markup Language (XACML) is an accepted standard that comes with a descriptive resource protection language and a high-level architecture. Given its flexibility, it is employed as the basis for much privacy-related work.

Usage Control (UCON) represents a vast research area that is focused on the protection

of user data through its whole lifecycle: before authorization is granted, during authorization,

and after authorization. Two main concepts introduced by it are the mutability of attributes and

continuity of access decision.

TAS3 is another initiative focused on multiple aspects of privacy protection, mainly interoper-

ability and federation. The requirements formulated in Section 1.3 are a subset of the high level

requirements defined by TAS3.

The PrimeLife project offers a privacy protection model based around the PrimeLife Policy Lan-

guage (PPL) and the Sticky Policy paradigm. Given the highly descriptive nature of the PPL, the

presented research focuses on how it can be used together with Personal Data Vaults, and what

kind of enforcement models can be built to support it.

Digital Rights Management (DRM) systems exhibit similarities with privacy protection, although

they do not completely cover every aspect of it. The Trusted Computing Group (TCG) conducted

relevant research in developing a trusted computing platform, which in turn can be used for privacy

protection. Other initiatives include cryptographic methods to protect the correct dissemination of

user data, only granting access to authorized parties.


4 System Design

Contents

4.1 PrimeLife Policy Language (PPL) Integration
4.2 Verifiable Privacy
4.3 Trusted Privacy
4.4 Mediated Privacy
4.5 Summary


Chapter 4 is dedicated to describing the policy enforcement models proposed by this thesis. The chapter begins with an evaluation of the PrimeLife Policy Language (PPL) in Section 4.1 with regard to its integration into the PDV design. The descriptions of three privacy enforcement models follow, highlighting the novel solution proposed by this thesis in Section 4.4.

4.1 PrimeLife Policy Language (PPL) Integration

In order to meet the requirements in Section 1.3 we base our approaches on the existence

of a well-defined policy framework. This policy framework has to facilitate an extensible and

descriptive policy language that can easily be adapted in specialized use cases. The XACML

policy framework serves as a suitable choice, since its abstract architecture design and flexible

policy language makes it applicable in a variety of use cases. Unfortunately, XACML was designed to provide a descriptive access control mechanism and only comes with a weak privacy profile. The PrimeLife Policy Language (PPL), however, outlines a privacy-oriented XACML extension, which allows for a better approach. We will evaluate how the language features of the PPL can fit our requirements.

Trust between two parties who are about to exchange personal information has to be estab-

lished prior to any access control decision. PPL provides two language features that have to

be fulfilled by the data requester: CredentialRequirements and ProvisionalActions. Credential-

Requirements contains a set of credentials that have to be provided by the requester to attest

a required attribute. These credentials are usually tied to a verifiable identity. By verifying each other's credentials, both parties can assume a basic trust level. ProvisionalActions can refer to any action that has to be carried out prior to any access decision, such as signing a statement or spending some credential (if the requested resource can only be accessed for a limited amount of time).

Transparency of user data handling refers to the DS’s knowledge of how his personal data will

be treated by the DC. The PPL facilitates the use of the Sticky Policy paradigm through the Data

Handling Policy (DHPol) and Data Handling Preference (DHPref). The DHPol is the DC’s proposal

on how private user data will be used. The final policy that provides transparency, however, is the Sticky Policy itself. Sticky Policies are created by resolving the DHPol and DHPref that refer to

the same object, and are composed of Authorizations and Obligations. Authorizations describe a

specific purpose for which a data object can be used, while Obligations can be used to express

more fine-grained control.
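Greatly simplified with respect to the actual PPL matching engine, this resolution step can be sketched as follows; the representation of policies as purpose sets is our own.

```python
def match(dhpol_purposes: set, dhpref_purposes: set) -> dict:
    """Resolve a DC's DHPol against a DS's DHPref into a Sticky Policy."""
    agreed = dhpol_purposes & dhpref_purposes
    if not agreed:
        # a mismatch cannot be resolved automatically (see Section 3.4)
        raise ValueError("no common purposes: explicit user consent required")
    return {"purposes": agreed}

sticky = match({"contact", "marketing"}, {"contact"})
assert sticky == {"purposes": {"contact"}}
```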

Authorizations also contain authorizations on downstream usage together with a purpose.

Downstream usage refers to the disclosure of personal information from the DC to Third Parties.

This language feature allows for a description of how personal data can be forwarded and used

across multiple control domains. The purpose attached to the downstream usage gives the user


an even greater flexibility in describing under what circumstances data can be forwarded. In data

forwarding scenarios the forwarded data copy has to have a Sticky Policy at least as strict as the

original data copy, in order to avoid the degradation of the protection level.
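In the same simplified representation, the "at least as strict" rule amounts to a simple check: a forwarded copy may only narrow the set of allowed purposes and extend the set of obligations.

```python
def at_least_as_strict(fwd_purposes, fwd_obligations,
                       orig_purposes, orig_obligations) -> bool:
    return (fwd_purposes <= orig_purposes             # no purpose may be added
            and fwd_obligations >= orig_obligations)  # none may be dropped

# Narrowing purposes and adding an obligation keeps a forwarded copy valid:
assert at_least_as_strict({"research"}, {"delete_after_30_days", "notify_owner"},
                          {"research", "marketing"}, {"delete_after_30_days"})
```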

The main language feature that offers control for the end user is the Sticky Policy itself. Control over the usage of a specific piece of shared private information can be achieved by modifying the attached Sticky Policy. This method allows the user to modify as well as revoke accesses. Obligations, being part of Sticky Policies, allow the user to set constraints on data after it

has already been shared. One such Obligation, for example, could require the DC to delete the

collected data after a specified amount of time.

The architecture of the system outlined by the PrimeLife project requires specialized software, or multiple interconnected software components, responsible for carrying out the features described by the language. Moreover, it is supposed to do this in a highly automated manner, working with predefined access control policies, matching DHPol with DHPref, and enforcing Sticky Policies. This 'always-on' software component can be associated with traditional access

control systems of service providers, which portray the DC. However, this specialized software

also has to be present at the DS site, which often portrays the end user. The PDV is a suitable

data organization scheme which can integrate any kind of specialized software.

This PrimeLife architecture also bears a strong resemblance to the XACML architecture presented in Section 3.1, relying on components such as the Policy Enforcement Point (PEP)

and Policy Decision Point (PDP) to carry out access decisions. The PEP component, however,

becomes a crucial building block that is responsible for evaluating and enforcing Sticky Policies.

We will refer to this specialized software, often residing on a PEP and enforcing privacy policies,

as the Privacy Manager (PM).

As the Sticky Policy is considered to be the main element of user data protection, we also

introduce the abstraction of Protected Data (PD). The PD encapsulates the user data object and

its Sticky Policy under a single unbreakable logical unit. Throughout the formalization of policy

enforcement models we will use the PD terminology when talking about a shared data object

guarded by a Sticky Policy.
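As a minimal sketch, the PD abstraction can be represented as an immutable pairing of payload and policy; the field types are illustrative, and in the models below the payload is typically ciphertext.

```python
from dataclasses import dataclass

@dataclass(frozen=True)    # frozen: the data/policy coupling must not break
class ProtectedData:
    payload: bytes         # the user data object (possibly encrypted)
    sticky_policy: str     # machine-readable usage rules, e.g. serialized PPL

pd = ProtectedData(b"user data object", "<Policy>...</Policy>")
```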

The following sections detail the design of the enforcement models that are applied to provide privacy guarantees through Sticky Policy enforcement using the PM.

4.2 Verifiable Privacy

This section presents the Verifiable Privacy (VP) policy enforcement model together with

aspects of its architecture design and its interaction model.


4.2.1 Description

This model relies on remote software verification and monitoring solutions, hence its name:

Verifiable Privacy. This section is dedicated to describing a solution involving enhanced hardware security. As the software systems running on machines become more complex and stacked, keeping track of security aspects becomes increasingly difficult. Software bugs and vulnerabilities are an unavoidable side effect of every system in production. In order to mitigate the problem of insecure software solutions, today's hardware components are built with strong security aspects in mind.

The Trusted Computing Group (TCG) is a pioneer in the field of secure hardware. They offer

a range of integrated components that can help carry out certain security measures. One of their

main focus areas is the Trusted Platform Module (TPM), which is an embedded hardware component that provides a root of trust in the system. By providing strong cryptographic functionality together with key generation, integrity checks, storage and reporting, the TPM provides a form of attestation on the security measures of the software running on top of it. In more detail, the TPM provides signed attestations of PCRs (Platform Configuration Registers), which contain information regarding the integrity, configuration and state of a software component. These signed attestations can be verified by external parties. The TPM does not provide any security solution on its own; rather, it serves as a basis of trust between entities.

The Verifiable Privacy relies on a solution that harnesses the power of security-enhanced

hardware technology as an enforcement and trust mechanism. On top of this hardware a DRM-

like software solution is responsible for attesting and verifying privacy settings of sensitive data.

This DRM-like solution, referred to as the Privacy Manager (PM), intercepts all accesses to pri-

vate data from running applications, and performs local access control decisions. The correct

functioning of the TPM and the PM components is supported by another mechanism to keep the

running applications in a secure sandbox, isolated from unauthorized actions. In the next sections we will elaborate on how these components fit together and what their responsibilities are.

4.2.2 Prerequisites

Since the Verifiable Privacy employs security-enhanced hardware, it is a prerequisite that every actor and machine involved in the transactions be equipped with a TPM. We assume that these machines are secured from any physical tampering, rendering the

TPMs tamper-proof.

The TPM is also responsible for key generation and management for multiple purposes. It

generates asymmetric keys for both the Privacy Manager (PM) and any application running on

top of the platform. The TPM is also used to verify that the public keys of these software components

are indeed bound to that specific machine. It has an internal safe storage of known keys, which

can be used to re-encrypt data depending on the requester. Apart from the keys that are meant to


be used by software, TPMs come equipped with a root key, for which the private key is embedded

in the hardware. Encrypting data with the public counterpart of this root key will bring assurance

that any software accessing the data will have to consult the TPM first in order to release

the private information.

Moreover, the PM should also be present on each PDV and service provider, in order to facilitate an interface for exchanging privacy-related information between the actors. This component can be placed at different layers, as we will see later on, but its main purpose remains to carry out privacy-related actions, such as remote attestation, trust establishment, and policy enforcement.

We are also assuming the existence of a certain Trusted Third Party (TTP), which plays an

important role in the correct functioning of the monitoring and assurance system described below.

4.2.3 Architecture

This solution approaches the Sticky Policy enforcement problem by assuming that every machine involved in handling protected user data is essentially a Policy Enforcement Point (PEP). As such, it focuses on the design of a common architecture for PEPs that

will facilitate the interoperability of the system across multiple nodes, regardless of their control

domain.

When it comes to designing the architecture of a single PEP, we are faced with multiple choices. The base architecture, however, as depicted in Figure 4.1, stays the same. As one of our prerequisites, we have the TPM-equipped hardware at the bottom layer.

Figure 4.1: Verifiable Privacy: Abstract Architecture of a single PEP node

On top of the hardware layer we have an abstraction called the Common Platform. When

deciding what the Common Platform should be, we have to take into consideration the level of isolation that we need to provide in such a system. Applications will have to reside in their own isolated space, such that interactions that happen outside of this isolated space can be monitored.

This restriction becomes especially important when private data objects are transmitted to third


parties. The communication between applications and transmission of data between two separate

isolated spaces should happen with the consent and permission of the PM, which in turn enforces

Sticky Policies. In practice the Common Platform can be two things:

1. A trusted operating system could take the place of the Common Platform. The isolation

space, in this case, would be provided by the process Virtual Machines (VM) of the shared

operating system. Monitoring would then be done in the hosting operating system, since inter-process communications and external communications all go through the operating system.

2. Another solution would be to replace the Common Platform with a hypervisor, and let stan-

dalone services run in their own system virtual machine, thus offering isolation at the operating system level. Virtualization technology is maturing quickly, sometimes even achieving nearly native operating system speeds. Virtualization is also a commonly employed solution in cloud environments, which in turn host several client-oriented services on the web. System VMs are much more heavyweight than their process VM counterparts, so some planning is needed when instantiating new services, so as not to waste resources.

The strength of this model is also its drawback. Having applications running in their isolated spaces with the PM attached to them assures that they are subjected to continuous monitoring and verification. The Verifier component makes sure that only eligible applications get access to personal data, while the Monitor keeps track of ongoing system events to avoid misuse. Through monitoring and verification the system delivers proof of trust and assurance to its users. This, however, comes at the price of a strict architecture design.

4.2.4 Privacy Manager Architecture

The Privacy Manager (PM) is the specialized software component responsible for the localized enforcement of privacy policies. Whenever a Protected Data (PD) object is requested, either by an internal application or by an external entity, the PM is trusted to evaluate and

enforce the Sticky Policy of the respective PD. Moreover, it is also responsible for delivering trust

and assurance of its correct functioning through the Verifier and the Monitor components.

4.2.4.A Verifier

Verification is a proactive measure, taken prior to any data disclosure, which is at the heart of this model. Trust that the user's intentions are going to be carried out is partially rooted in the verification system. As complex systems are built in multiple layers, it is important to

provide verification from the lowest level (which is the hardware) to the highest one (which are the

applications providing a certain service).


Hardware verification is at the bottom layer and is done by the technology developed by the

TCG. The TPM assists the software verifier in attesting that a specific software component is

indeed running on top of the host platform. The states of different applications are kept hashed

in the TPM registers, and they are signed and transmitted to any requesting party, on demand.

This way, the requester can be assured that the machine he is communicating with has a software

component running in the provided state.

The Verifier component of the PM is responsible for carrying out the TPM-assisted software

verification. We distinguish two independent software components that need verification: one

being the application providing some service, and the other being the PM itself. In order to build

a trust framework, the verification of these components is carried out by different means.

The PM component has to be verified to be in a valid state, since it is the core policy enforcing

mechanism of the model. To provide assurance of a correctly functioning PM, remote software

verification is needed, where the verifier entity is independent of the verified subject. The pre-

ferred solution would be to make the communicating parties verify each other’s PMs. This would

require an open PM specification and design, such that all of its valid states are known prior to

any interaction, and are verifiable by anybody. Another alternative would be to outsource the re-

sponsibility of verification to a Trusted Third Party (TTP), which could be the developer of the PM or any other authority. An additional TTP, however, will affect the scalability and complexity of the whole system. Further discussion on the identity of the verifier is outside the scope of this thesis.

The verification of the application component, on the other hand, can be carried out locally at every node by the PM. The intuition is that since the PM is remotely verified, it can be trusted to carry out local verifications in a truthful manner. The Verifier component is entrusted to perform a TPM-attested local software verification of every application that requests access to some protected resource.
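The following sketch illustrates such a TPM-assisted local verification. Real TPM quotes are signed PCR digests; here an HMAC stands in for the TPM's signature, and the whitelist and all names are our own assumptions.

```python
import hashlib, hmac

KNOWN_GOOD = {hashlib.sha256(b"app-binary-v1.2").hexdigest()}  # valid states
TPM_KEY = b"stand-in for the TPM's attestation key"

def tpm_quote(measurement: str) -> bytes:
    """Stand-in for a TPM-signed attestation of a PCR measurement."""
    return hmac.new(TPM_KEY, measurement.encode(), hashlib.sha256).digest()

def verify_application(measurement: str, quote: bytes) -> bool:
    genuine = hmac.compare_digest(quote, tpm_quote(measurement))
    return genuine and measurement in KNOWN_GOOD   # attested and known-good

m = hashlib.sha256(b"app-binary-v1.2").hexdigest()
assert verify_application(m, tpm_quote(m))         # access may be granted
```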

4.2.4.B Monitor

Verification on its own only gives partial assurance about the behaviour of the communication

partner. Certificates confirming the state of a remote software component could be vague or not

descriptive enough. The Monitor component will complement the Verifier by providing a reactive

monitoring service, in order to keep track of ongoing actions in the system, and notify the PEP

whenever a potentially illegal operation is encountered.

Monitoring goes hand in hand with log keeping. Logs are a powerful mechanism for reviewing

past events that serve as evidence for or against a violation. The TPM assists the monitoring

system by providing authenticity for the logs, as long as the monitoring system is trusted to do proper bookkeeping.

The main reason behind executing applications in their own isolated space is that they can be


monitored from the outside. Both process and system VMs offer solutions to monitor and intercept

system calls and translate them into native behaviour. This way the monitoring service could

attach itself to crucial system calls and monitor their execution. A store operation, for example, could be evaluated before execution, in order to verify whether the Sticky Policy allows the data to be stored.
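As an illustration, an intercepted store operation could be evaluated roughly as follows; the function and field names are hypothetical, not part of any of the cited systems.

```python
def on_store_syscall(pd: dict, log: list):
    """Intercept a store operation and check it against the Sticky Policy."""
    allowed = "store" in pd["sticky_policy"]["actions"]
    log.append(("store", pd["id"], "permitted" if allowed else "blocked"))
    if not allowed:
        raise PermissionError("Sticky Policy forbids persisting this object")

audit_log = []
record = {"id": "phr-42", "sticky_policy": {"actions": {"read"}}}
try:
    on_store_syscall(record, audit_log)  # raises: the policy only allows "read"
except PermissionError:
    pass                                 # the blocked attempt remains logged
```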

Interactions often require data to be transmitted between different applications. These communicating parties can be either internal or external, and both need a method of monitoring. Two applications are internal if they both reside on the same machine, and external otherwise.

4.2.5 Interaction Models

In the following sections we examine the interaction of two separate entities, highlighting the

important parts of the protocol used for exchanging Protected Data (PD) objects. Afterwards, the case where multiple Data Controllers handle the same PD is examined.

4.2.5.A Data Flow

The Privacy Manager is responsible for managing private user data that has been shared

with a remote system. Just like a DRM system, the PM treats the user data as the protected

resource and applies access control on it. Moreover, it goes beyond the standard DRM system,

by providing fine-grained access control that evaluates data accesses on a per-application basis.

Every application is evaluated independently before granting access to data.

Figure 4.2: Verifiable Privacy: Interaction diagram between a PDV and a Service Provider (SP)

A high-level interaction diagram can be seen in Figure 4.2, which follows a simple scenario with the PDV playing the role of the DS and the SP being the DC. The App, which is considered to be the service running at the SP, requests some user data from the PDV. The first two steps are part

of the communication protocol between the two actors by which they establish trust and share

protected information. The third step describes the data access by the requester App on the SP


side, while the fourth step depicts a potential forwarding of data to an external entity. Note that an

internal access from a second application, App2, would also happen through the PM. An external

forwarding, on the other hand, will initiate another round of the communication protocol involving

steps 1 and 2, with the SP playing the role of the resource owner. In order to accommodate the

need for a system where the roles of DS and DC can be assigned to PDVs and service providers interchangeably depending on the context, the communication protocols should be the same, regardless of the real identity of the actors. As a result, an interaction diagram between two PDVs or two service providers would follow the same principles.

The first step of the communication protocol establishes the trust relationship between the two parties through mutual verification of each other's systems. Usually the data requester (the service provider in our case) initiates the protocol by sending a signed certificate proving the validity of the Privacy Manager (PM) component running on his machine. This certificate is usually signed by the TPM_SP, proving that the PM_SP has not been tampered with and is in a valid state. It also contains the public key of the PM_SP such that the user can encrypt sensitive data with it. A similar certificate from the PDV is sent back to the SP as proof that PM_PDV is also genuine and attested by TPM_PDV.

The second step in the communication protocol is the exchange of private information between the parties. In our case, the PDV is sharing some information with the SP. The shared data is encrypted with a secret key that is itself transferred under the protection of the receiver's public key. Moreover, the data is bundled together with its Sticky Policy, forming a Protected Data (PD) object. The Protected Data is kept in the secure storage of the PM_SP.
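As a minimal sketch of this second step, the following models the hybrid protection of a shared object. Textbook RSA with tiny fixed keys stands in for the PM's TPM-generated keypair and a seeded XOR stream stands in for the symmetric cipher; both are didactic placeholders, not the mechanisms assumed by the model.

```python
import random

N, E, D = 3233, 17, 2753        # classic textbook RSA toy key (insecure)

def xor_stream(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR with a key-seeded pseudorandom stream."""
    rng = random.Random(key)
    return bytes(b ^ rng.randrange(256) for b in data)

# PDV side: encrypt the data with a fresh secret key, wrap the key for PM_SP,
# and attach the Sticky Policy, forming the Protected Data message.
secret_key = random.randrange(1, N)
payload = xor_stream(b"user data object", secret_key)
wrapped_key = pow(secret_key, E, N)             # under PM_SP's public key
pd_message = (payload, wrapped_key, {"purposes": {"contact"}})

# SP side: only PM_SP, holding the private exponent, can unwrap the key.
data, wrapped, policy = pd_message
assert xor_stream(data, pow(wrapped, D, N)) == b"user data object"
```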

After the communication protocol has concluded, the copy residing on the SP side can be requested by applications running on the same machine. Since the data is kept encrypted with a secret key, in order for an application to get access to it, it first needs to be re-encrypted with Pub_App by the PM_SP. The PM_SP only does this after the state of App_SP is verified to be valid and in accordance with the Sticky Policy guarding the data. Once access is granted, the App receives a copy of the data protected by its public key.

The final step in the interaction diagram is the forwarding of shared data to a third party. This step is included because data forwarding happens very frequently in data processing systems.

Whenever data is forwarded, another round of the communication protocol described in steps 1 and 2 is initiated between the SP and the Third Party. In this interaction the SP assumes the role of the data owner (DS) and the Third Party becomes the requester (DC) who initiates the protocol. Trust is established between the parties just as before, but before the data transfer can take place the SP has to verify that the forwarding is in accordance with the Sticky Policy. As long as the PM_SP has been verified to be in a correct state, the PM is trusted to carry out the user preferences.


Every forwarding action on private user data should be logged by every party, such that proof

can be provided to the original data owner that his intentions were enforced during data process-

ing. Logs should be aggregated by the original data requester and provided to the original data owner on a periodic basis.

4.2.5.B Forwarding Chain

The Data Flow, described in Section 4.2.5.A, only specifies the interaction of two entities. The

Forwarding Chain, on the other hand, describes how data is shared across multiple parties. The

Forwarding Chain is a tree-like structure of nodes that share a copy of a Protected Data (PD)

object. The root of the Forwarding Chain is the source of the private data, such as a PDV.

Figure 4.3 illustrates how a Forwarding Chain might be built up around a single user object, in this case a Personal Health Record (PHR). Consider a follow-up to the healthcare scenario presented in Section 2.1.2. The owner of the PDV shares his PHR with a Hospital

Service under the protection of a Sticky Policy. The Hospital Service is in close collaboration with

two other entities: a Pharmacy and a Research Center, so it shares collected Protected Data with

them, assuming the Sticky Policy allows it. These two entities in turn can share the Protected Data

themselves, as in the case of the Research Center publishing information on a News Service,

thus creating a chain of forwarded data. It is worth noting that every link between two entities in

this diagram actually represents an interaction based on Section 4.2.5.A.

Figure 4.3: Verifiable Privacy: Example of Forwarding Chain on Personal Health Record

Whenever Protected Data is forwarded to a Third Party we can distinguish three different

scenarios:

1. In the simplest case the Protected Data is shared as a whole, without modification. In this case the data copy can be considered a duplicate.

2. DCs might decide to share only a fragment of the original data, in order to promote data

minimization. The Hospital Service, for example, might decide to share only a subset of the


information residing in a PHR with its Pharmacy partner. The data fragment, however, still

has to be protected with the same Sticky Policy to assure that it will not be misused.

3. In some cases the DC might want to disclose protected data under a stronger level of pro-

tection. In our example, the Hospital Service shares PHR with the Research Center under a

stricter Sticky Policy, thus limiting the scope of usage of the data. In case of Protected Data

forwarding, Sticky Policies can always be made stricter, but can never be made weaker. This

rule ensures that the original policies set by the data owner will always be respected.

In order to maintain the Forwarding Chain structure, every node is responsible for keeping routing tables with pointers to the previously disclosed data copies. The maintenance of up-to-date pointers is a crucial requirement for the logging and control system described below.

The Monitor component, being part of every PM, is responsible for keeping logs on every node. Logs provide traces of the processing of every Protected Data object, which can be viewed as assurance of data protection. Given the distributed nature of the Forwarding Chain, every node holds a fragment of the logs that are relevant to a single data piece. In our previous example, each entity keeps logs about the data processing done on the shared PHR. In order to turn logs into assurance they have to be aggregated and verified. Verification could also be carried out locally at every node, thus skipping the step of aggregation. The local solution, however, offers less assurance than its counterpart.

We delegate the responsibility of log aggregation and verification to a Trusted Third Party

(TTP), which can play the role of an audit company. The TTP has to collect these logs either by

direct collaboration or some other means, and perform a verification on them. A final digest is

then periodically sent to the data owner as the final form of assurance. Policy violations that show

up in the logs are also included in the digest.

In order to maintain control over already shared data objects, the Forwarding Chain also assists in manipulating and revoking accesses on Protected Data (PD). When a data owner wishes to update the Sticky Policy attached to some object, he can do so by using a push method that propagates his updates starting from the initial data requester. It is important that the pointers are kept fresh, such that the chain is not broken. Every party who has a copy of the shared data has to update its policy locally in case of an update and forward the update operation to all of its children in the chain. Every node is also responsible for collecting acknowledgements of the success of the operation and notifying the user about the process.
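The push-based update can be sketched as a simple tree traversal with aggregated acknowledgements; the Node structure and the names are illustrative.

```python
class Node:
    """A party holding a copy of the Protected Data in the Forwarding Chain."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)
        self.sticky_policy = None

    def push_update(self, new_policy):
        self.sticky_policy = new_policy          # update the local copy
        acks = [self.name]                       # acknowledge locally
        for child in self.children:              # propagate to all children
            acks += child.push_update(new_policy)
        return acks                              # aggregated acknowledgements

chain = Node("Hospital Service",
             [Node("Pharmacy"),
              Node("Research Center", [Node("News Service")])])
acks = chain.push_update({"purposes": {"treatment"}})
assert set(acks) == {"Hospital Service", "Pharmacy",
                     "Research Center", "News Service"}
# the data owner is notified once every copy has confirmed the update
```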

4.3 Trusted Privacy

This section describes the architecture and functioning of the model which closely resembles

the design outlined by the PrimeLife project [5], together with its predecessor, the PRIME project

[8]. Both are vast projects with years of research behind them. The scope of this description,


however, is to present the main underlying model and design that these projects follow in order to

provide policy enforcement.

4.3.1 Description

Much like the Verifiable Privacy described in Section 4.2, the Trusted Privacy relies on the use of specialized software: the Privacy Manager (PM). The architecture supporting the PM

component is relaxed by employing a middleware-oriented design. Apart from the basic Sticky

Policy enforcement that is guaranteed by the PM, it comes with a different view on the employed

trust framework. As its name suggests, the Trusted Privacy model relies on the correct functioning

of the trust framework, which is the composition of two independent sources of trust.

4.3.2 Prerequisites

The Trusted Privacy (TP) model assumes an active PM system on every participating actor, both for PDVs and service providers. In order to assure full functionality, these components should be fully compatible with one another and tamper-free.

For PDVs, the PM acts like a client-side protection system designed to govern every interaction on the user's personal data. Queries on user data are passed through the PM layer, which assures that only requesters that are found eligible can access the protected resource. Incorporating such a component into the PDVs is a straightforward task. On the other hand, the PM system must also be present at the service provider. This resembles a server-side system which acts as DRM software, protecting shared user data. The server-side PM component holds the responsibility to act and protect on behalf of the clients, by respecting Sticky Policies. As the PM is a central component of the model, it has to be fully trusted. Mechanisms describing how this trust can be achieved will follow.

The existence of TTPs is also assumed, as they play an important role in achieving the desired trust level.

4.3.3 Architecture

The PRIME project defines two different PM components: one for the client, and one for the

server. In the original PRIME project the client and server-side components are different systems

with different responsibilities. In the PrimeLife project, however, these two blend into a single

component.

Our scenario needs to accommodate PDVs together with service providers, and multiple interactions between them. This requirement leads to a need for uniformity. DS and DC are clear abstractions of the PDV and service provider roles. These roles are not fixed, but rather dynamically assumed, based on the context. For example, if PDV1 requests some data from PDV2, it is clear that PDV1 is the Data Controller and PDV2 is the Data Subject. However, if PDV1 decides to forward


the collected data to a service provider, PDV1 becomes the Data Subject and the service provider the Data Controller. It is easy to see that PDVs and service providers can assume both the Data Subject and the Data Controller role. The need for uniformity discouraged us from using distinct PM components; thus the PM residing on the service provider has to have the same functionality as the one on the PDVs.

Figure 4.4: Trusted Privacy: Abstract Architecture of a single PEP node

Conceptually, the PM sits on top of the persistence layer (or Database), as shown in Figure 4.4. This way it takes the role of a middleware that governs the access over the database system underneath. The example of Figure 4.4 depicts a PEP node with the installed PM middleware. The Database system on top of the OS is entrusted with the safekeeping of stored data, and only lets itself be queried from the layer sitting right above it, and not from any of the higher layers. This prevents the situation where Apps try to bypass the PM in order to get unrestricted access to some Protected Data (PD). The PM is a middleware that mediates the access to PD from the upper Application layer. Apart from safe storage and safe access to stored objects, the PM middleware also plays the role of a monitoring filter. Since interactions between Apps and remote or local systems are usually mediated through the OS, the convenient placement of the PM allows it to track ongoing interactions against policy violations.

4.3.4 Privacy Manager Architecture

The Privacy Manager (PM) middleware closely resembles the PM of the Verifiable Privacy (VP) model in its functionality; however, the mechanisms by which trust and assurance are provided are different. The description of the Trust Negotiator and the Monitor components follows.

4.3.4.A Trust Negotiator

Although using the PM means that every privacy rule is enforced by the middleware itself, our model still lacks components that provide trust in the infrastructure. The Trusted Privacy model builds its trust framework by outsourcing trust to TTPs, with the use of privacy seals and reputation systems.

With the introduction of a new component into the PM, called the Trust Negotiator, users can evaluate the trustworthiness of an entity they are about to interact with. It gathers trust information and compiles it in a meaningful way. If the user is actively taking part in the interaction, this trust


information should be presented to him in an intuitive way through the user interface. On the other hand, if the user is receiving a query, the PDV should be able to evaluate the trust level provided by this component in an automatic manner, and carry out a decision based on it. The sources and mechanisms by which trust is evaluated are:

Privacy and Trust Seals offer assurance that the remote party will not violate the privacy policies previously agreed upon. These seals are usually certified by TTPs. They provide proof that the system run by a remote party lives up to certain security and privacy standards, or uses a certain software solution. For example, a seal can provide assurance that a service provider uses the PM in its backend system. We can distinguish two types of trust seals:

1. Static Trust Seals are simple documents signed by the TTP, which attest the correct state and functioning of a system at a given moment in time. Since these static trust seals come with a certain validity window, they need to be re-evaluated and re-issued in order to provide up-to-date proof. Since today's infrastructure is highly dynamic, these certificates might not be up to date all the time, as new threats and vulnerabilities surface more frequently than certificates are re-issued.

2. Dynamic Trust Seals are generated in real time by the machine serving the user's request. Dynamic Seals are only trustworthy if the process by which they are generated is also trustworthy. Usually these documents are generated with the assistance of tamper-proof hardware, which attests their validity. Dynamic Trust Seals closely resemble the verification certificate provided by the Verifiable Privacy model in Section 4.2.4.A.

Through the security claims provided by Trust Seals, a trust score can be evaluated for every remote party. It is worth mentioning that a Dynamic Trust Seal, if attested in the correct way, always provides a higher trust score than its static counterpart. The flexibility of the Trusted Privacy model lets each individual PEP node decide what form of trust certification it is willing to provide.

Reputation Systems are considered to be the secondary source of trust in this model. The model assumes the existence of multiple independent reputation systems, such as customer feedback services or blacklist providers. Blacklist providers keep track of repeated policy violators and notify every actor who tries to initiate an interaction with them. A User Feedback system harnesses the power of the crowd in collecting individual opinions or experiences from previous interactions. External reputation providers also have to be trusted to base their rating on a well-defined and relevant scale. In the case of feedback from the crowd, on the other hand, the trust is divided between anonymous users who may or may not provide correct information.

The scores collected from the available Reputation Systems are combined into a reputation score. The reputation score should have its own scale, independent of the scales used by the actual sources of the score.


After the interaction with the relevant TTPs, the Trust Negotiator combines the trust score and the reputation score into a final score. Based on this final value, different levels of trust can be quantified, helping the automated decision making. The intuition behind the outsourcing of trust to multiple sources is that many independent trust scores from independent authorities can complement or cancel each other out, leaving the end user with a trustworthy estimate. This, of course, only works under the assumption that TTPs are truly independent and are not conspiring to provide a pre-agreed score.
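As an illustration, the sketch below combines a seal-based trust score with averaged reputation scores into a final value. The normalization, the weights and the threshold are our own assumptions; the model deliberately leaves the exact combination function open.

```java
// Minimal sketch of the Trust Negotiator's final score computation. The
// normalization, weights and threshold are our own assumptions; the model
// deliberately leaves the exact combination function open.
final class TrustScoring {
    private static final double SEAL_WEIGHT = 0.6;       // assumed weighting
    private static final double REPUTATION_WEIGHT = 0.4; // assumed weighting

    // sealScore: normalized score derived from Static/Dynamic Trust Seals;
    // reputationScores: normalized scores from independent reputation systems.
    static double finalScore(double sealScore, double[] reputationScores) {
        double reputation = 0;
        for (double s : reputationScores) reputation += s;
        reputation /= Math.max(1, reputationScores.length); // average sources
        return SEAL_WEIGHT * sealScore + REPUTATION_WEIGHT * reputation;
    }

    // Automated decision making: interact only above a chosen trust threshold.
    static boolean trusted(double sealScore, double[] reputationScores,
                           double threshold) {
        return finalScore(sealScore, reputationScores) >= threshold;
    }
}
```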

4.3.4.B Monitor

The Monitor component integrated in the PM is built to achieve the same functionality as described in Section 4.2.4.B. Instead of the isolated spaces used by the Verifiable Privacy model, this model uses the middleware approach to intercept and react to unauthorized operations issued by the application layer.

4.3.5 Interaction Models

The following section presents the interaction model that covers the data flow between two remote entities, with aspects regarding trust establishment and execution paths.

4.3.5.A Data Flow

The first interaction protocol between the two parties focuses on establishing trust by the use of the Trust Negotiator. The Trust Negotiator gathers all relevant trust and reputation scores and computes the final score for the remote party. If the final score satisfies the predefined trust threshold, the interaction continues with the exchange of the desired protected data.

Figure 4.5: Trusted Privacy: Interaction Model of the Data Flow

The PD can take multiple paths once it has been shared with an external entity. Figure 4.5 depicts how PD is handled by the PEP of a service provider. PD is passed through the PM to the Application Layer, which carries out the service provider logic. Two usual use cases include


storing and forwarding the processed data. Both of these operations have to pass through the PM middleware, which evaluates whether the data is allowed to be stored or forwarded, respectively. The evaluation is carried out based on the Sticky Policies attached to the data objects. Similarly, PDs returned as results of a database query are also subject to evaluation. The PM only lets data through to applications which are authorized to operate on the requested data.

4.3.5.B Forwarding Chain

Since monitoring is carried out individually at every PEP node, we are again faced with the problem of transforming logs into assurance. Just like the logging system described in Section 4.2.5.B, the Trusted Privacy also relies on the use of the Forwarding Chain when it comes to the modification of Sticky Policies by end users, and to log verification.

We introduce a slight deviation, however, in the way that logs are aggregated and verified along the Forwarding Chain. We eliminate the requirement of a TTP that plays the role of an audit company, and substitute it with a different scheme. The aggregation of logs is the responsibility of the original data requester, who is in direct contact with the PDV. In the PHR example presented earlier, the Hospital Service has to aggregate the PHR logs using a pull method. Every node in the chain is responsible for forwarding the pull request to its children, then returning the gathered logs to its parent.

Verification is carried out by the PDV that owns the shared data on which the logs were produced. By providing the logs to the end users in a direct manner, we intend to achieve a higher level of assurance than that of a simple digest from an external entity. PDVs are left with the responsibility to verify aggregated logs and alert the users first-hand of suspicious behaviour. This offers a much finer granularity of log verification, since PDVs can extract any requested information from the raw logs.

4.4 Mediated Privacy

In the upcoming sections the novel policy enforcement model proposed by this thesis work is

presented, tailored to fit the defined requirements.

4.4.1 Description

The Mediated Privacy sticky policy enforcement model makes use of a mediated space between DSs and DCs, on which shared data lives. The requirements based on the user-centric model motivated us to design this mediated space, in order to improve awareness and control over the disclosed personal information. The mediated space does not belong to a single controlling entity; instead, it focuses on providing a platform where DSs and DCs can interact on equal terms.


The idea of a mediated space can easily be captured by the concept of a Distributed Hash Table (DHT) [34]. DHTs are decentralized overlay networks, where each node is seen as equal. Nodes forming this overlay are responsible for maintaining a predefined keyspace, meaning that every node is responsible for a subset of the keyspace, called the keyspace slice. New data is entered under a key in the DHT, called the LookupKey, which is hashed in order to compute its place on the keyspace. Its place in the keyspace determines the node which will host the data physically.
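For a Chord-like ring (the structure we will assume throughout this chapter), this placement can be sketched as follows: the hashed LookupKey belongs to the first node identifier at or after its position, wrapping around the end of the identifier space. The class and method names below are illustrative only.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeSet;

// Minimal sketch of key placement, assuming a Chord-like ring with 160-bit
// identifiers shared by nodes and keys.
final class KeyPlacement {
    // Hash the LookupKey onto the keyspace (SHA-1 gives 160-bit positions).
    static BigInteger position(String lookupKey) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(lookupKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);
    }

    // The hosting node is the successor of the key's position on the ring.
    static BigInteger responsibleNode(BigInteger keyPosition,
                                      TreeSet<BigInteger> nodeIds) {
        BigInteger successor = nodeIds.ceiling(keyPosition);
        return successor != null ? successor : nodeIds.first(); // wrap-around
    }
}
```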

In this model we employ the concept of the DHT as our mediated space. Users are aware of all existing copies of their personal data throughout the system by simply maintaining a set of LookupKeys in the DHT. Awareness about who accesses the data is also improved by tracking search queries targeted at a LookupKey. By holding the LookupKey for each personal data item, users are in charge of modifying and deleting them at any given time, greatly improving control.

4.4.2 Prerequisites

One of the base prerequisites for our model is the existence of a DHT overlay network. DHTs are widely employed distributed data stores in today's data dominated world, since they scale well and offer quick lookups in O(log N). On the other hand, there are only a few systems that consider the DHT as a building block for data privacy [15]. Our design requires both DS and DC entities to be active peers of the DHT.

A follow-up assumption of the Mediated Privacy model states that data introduced into the DHT should only be queried and distributed through the DHT itself, avoiding the trading of personal data through outside copies. Distribution of the private data should only happen with the users' consent. DCs who wish to distribute user data are required to do so by sharing the LookupKey under which the specific data can be found. Such requirements rely on the actors of the system to obey this rule.

4.4.3 Architecture

In the upcoming sections we will present how the DHT overlay network is formed around the

DC and DS peers. Peers of the DHT, regardless of whether they are part of a PDV or a service

provider, all operate on three layers.

Figure 4.6 depicts the high level architecture of a single DHT node, based on three layers. The

bottom layer, which serves as the base for the other two layers, incorporates all the conventional

DHT functionalities. This includes the maintenance of the overlay topology, and the serving of

basic operations, such as insert and retrieve. The Privacy Manager layer, on top of the DHT

layer, is responsible for safeguarding protected data objects and trust establishment. The Logging

layer sits on top of the stack and is responsible for keeping track of every DHT event regarding

operations on private data. The following sections present in detail the functionality of every layer.


Figure 4.6: Mediated Privacy: Architecture of a DHT node

Business Ring

The mediated space, represented by the DHT, is used to store disseminated user data. Because of this, both PDVs and service providers are part of this network. Since sharing user data is a frequent operation, we are expecting the deployment of a large shared data structure.

The first important question to address is how a DHT is formed. The first solution that comes to mind is to have all the actors participate in a single DHT. The largest currently active DHT runs on 10-25 million nodes, and in practice can scale further [37]. The performance of operations like search, insert, or delete is bounded by O(log N). Even though the DHT is a highly scalable structure, using a single one results in some drawbacks. The drawback that we would like to point out is the requirement for uniformity. The behaviour of the DHT is uniform across all nodes, since it is a completely decentralized system. This does not fit our requirements, since laws and regulations regarding virtual data handling and privacy are not uniform across different regions of the world. Moreover, different regulations can be in place on the business model level as well. Although having a single DHT would be a simpler solution, it would introduce the problem of handling complicated legal and trust schemas.

Instead of having a single DHT, we introduce the concept of a Business Ring. We propose a solution where Business Rings are spawned as needed around a group of services that have a closely integrated business model. Service providers belonging to the same Business Ring are assumed to have an existing business agreement, which ties them together. In principle, these Business Rings can be formed around different branches of the existing industries. Competing service providers can either agree on belonging to the same Business Ring, or start their own. A mature Business Ring with a clear business model, however, is more likely to be targeted by users than a less mature one. For example, the Business Ring used in the healthcare scenario with PHRs presented in Section 4.3 could be formed according to Figure 4.7.

Figure 4.7: Mediated Privacy: Business Ring formed around a healthcare scenario

The black nodes represent PDVs, while the white nodes represent the service providers. The ring-like representation of the DHT from Figure 4.7 resembles a Chord network [35]. Every node of the Chord Business Ring is responsible for the slice of the keyspace lying between itself and its predecessor in the ring. The keyspace slices of the service providers can be

seen from the arrow markings. Note that the DHT solution used in an actual implementation can follow any kind of topology. We refrain from evaluating existing DHT solutions; rather, we try to describe a system in which any generic DHT solution can be used. For simplicity and better understanding, however, we will keep talking about a Chord-like structure.

The business model that ties the service providers together in the Business Ring of Figure 4.7 could be the public health services provided to users. Although these service providers offer independent services, they belong to the same logical ring, since they operate on the same set of PHRs. Together they form a clear business model, which is used as a basic characteristic of the Business Ring. Business Rings can vary from business to business, depending on how many service providers are part of them, how big the network is, or what kind of general data policies apply to participants.

Since both PDVs and service providers have to become peers of the DHT, we will investigate

how this requirement fits into their design.

PDV peer

By their design, PDVs are abstractions of 'always on' entities that provide safe user data storage together with safe data management. The responsibilities of a Business Ring node could easily be incorporated as an additional component inside the PDV. Since PDVs are always on for high availability, the downside of high churn rates can also be alleviated. Churn stands for the rate at which nodes enter and leave a DHT system. High churn rates force the system to focus more on self-maintenance, while a low churn rate guarantees a more stable system.


Service provider peer

When it comes to our requirement to incorporate service providers as Business Ring peers, we are faced with a more complex scenario. Given that backend systems of service providers significantly differ from one another, it is hard to envision a generic solution. There is, however, a common design practice that can achieve the above mentioned Business Ring design by providing Privacy as a Service (PaaS). The responsibilities of a single service provider's DHT peer could be advertised as a service, which in turn can have any flexible design.

Figure 4.8: Mediated Privacy: PaaS design for the Hospital Service Business Ring node

Figure 4.8 depicts the backend system architecture of the Hospital Service with the PaaS as one of its frontend services. Being a part of the Business Ring, the Hospital Service is required to maintain control over its assigned keyspace slice, depicted by the arrow. In its backend system, this could be load balanced and supported by multiple machines from its internal system through the PaaS. This design is flexible enough to be easily implemented in any backend system, while still maintaining the functionalities of a Business Ring peer.

4.4.4 DHT Peer Layer

The bottom layer that every peer operates on is the DHT layer, which is responsible for executing all the classical DHT related functionality (insert, retrieve, remove). Special considerations have to be taken, however, for every remote retrieval operation, in order to avoid untraced data copies. The local retrieval operation maintains its normal behaviour. Apart from the classical functions, there are a couple of other aspects that need to be addressed: membership, keyspace assignment, ring size, and ring description.

4.4.4.A The Remote Retrieval Operation

The classical remote retrieval operation of the DHT retrieves a data object belonging under a LookupKey in two phases. In the first phase an internal search operation is executed, which finds the host of the particular data object, depending on which node is responsible for the keyspace slice containing the LookupKey. After the right host is found, the second phase establishes a direct point-to-point connection between the requester and the host, over which the requested object is transmitted. This process, by its nature, creates an untraced data copy of the requested object, the PD in our case. In order to maintain references to all existing copies of a PD object, the retrieve operation is modified to act like a retrieve followed by an insert.

Our modified retrieve operation does not return the new data copy directly; instead, it inserts it back into the Business Ring under the keyspace slice of the requester. The first phase of the retrieval stays unmodified, but the second one is replaced by a DHT insert operation. The key for the insertion, called the CopyLookupKey, needs to be included in the request by the requester. The functionality of the retrieval operation stays the same, since in both cases the requester will have his own data copy on his local machine. The difference between the two, however, is that our modified retrieval keeps track of the data copy via its new CopyLookupKey, while the normal operation is not concerned with the tracking of data copies. The CopyLookupKey, pointing to the new data copy, can then be appended to the metadata of the original PD. This guarantees that the DS will be able to retrieve every CopyLookupKey pointing to different data copies.
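A minimal sketch of the modified operation follows, using an assumed Dht interface and a ProtectedData shape that keeps the copy pointers in its metadata; in a real deployment the re-insert would be performed through the ring rather than through local method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the modified remote retrieval. The Dht interface and the
// ProtectedData shape are illustrative assumptions.
interface Dht {
    ProtectedData lookup(String key);
    void insert(String key, ProtectedData value);
}

class ProtectedData {
    final String payload;
    final String stickyPolicy;
    final List<String> copyKeys = new ArrayList<>(); // metadata: known copies

    ProtectedData(String payload, String stickyPolicy) {
        this.payload = payload;
        this.stickyPolicy = stickyPolicy;
    }
}

final class TrackedRetrieval {
    // Phase 1 (unchanged): find the host and fetch the original PD.
    // Phase 2 (modified): insert the copy back into the ring under the
    // requester's CopyLookupKey and record the pointer in the original.
    static void retrieve(Dht ring, String lookupKey, String copyLookupKey) {
        ProtectedData original = ring.lookup(lookupKey);
        ProtectedData copy =
                new ProtectedData(original.payload, original.stickyPolicy);
        ring.insert(copyLookupKey, copy);     // copy lands in requester's slice
        original.copyKeys.add(copyLookupKey); // DS can enumerate every copy
    }
}
```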

4.4.4.B Membership

A Business Ring has to be bootstrapped in the beginning, in order for other peers to join the network. The most convenient way would be to let service providers bootstrap the DHT overlay, and advertise their services together with a reference to the Business Ring. After the initial setup, we have to devise strategies on how PDVs should join the network. As explained later, having a certain amount of users in a DHT is desirable in order to enable data access tracking. Moreover, having a large user base can also act as a social incentive for establishing trust in a given service. We distinguish several strategies:

1. PDVs who are involved with the services provided in a particular ring should be members of that Business Ring. This strict strategy states that only PDVs who share their data in the ring are allowed to be part of it. Joining Business Rings as a result of a successful data exchange should be an automated process. Leaving a network can be caused either by an expiring date of shared data or by manual intervention of the PDV owner, in case he decides not to keep track of shared data any longer. His previously shared data will persist in the ring if not deleted explicitly either by the data owner, the data collector, or a predefined obligation.

2. The previous strategy assumes that peers will have enough incentive to join and use the Business Ring. However, unpopular businesses could become buried, since without an initial user base nobody considers them safe enough to use. To accommodate this case, we could have a set of randomly chosen nodes from the existing PDVs join these rings. Their only duty would be to route messages and keep small shares of data, without taking part in any other interaction. A system could function with random nodes, since data owners need not be part of the desired network for the system to function. Operations on the DHT can also be done from outside the system, by executing them via a randomly chosen node that is part of the system.

Since the second strategy would introduce some indirection and complexity, we argue that the stricter first strategy suits our model better. One of our initial assumptions was that every entity's identity is verifiable. Following from this assumption, a Business Ring can be constructed strictly based on PDVs who are legitimate data sharers. The impact of anonymous nodes on a Business Ring is out of the scope of this thesis.

4.4.4.C Keyspace Assignment

An important consideration in the design of the system was to let the service providers decide the keys under which user data has to be inserted. Since every service provider has his own keyspace that he hosts locally, he is in charge of a set of keys. Whenever a DS wants to share an object with said service provider, he does so by inserting it under one of the keys chosen from the service provider's keyspace.

We also considered a random placement strategy, where the PDVs choose a random key under which their data is inserted. Once the service provider receives the chosen random key, he would have to issue a search on it. This scheme would introduce a performance penalty, since a single interaction would require two DHT lookups.

To avoid this overhead, we decided to put the service provider in charge of the keys under which the user objects are going to be kept. We argue that this scheme does not empower the service provider with more trust, since he is bound to receive the same data anyway. Moreover, after the data has been inserted, the service provider can retrieve it from his local machines, without the need for an extra DHT lookup.

To accommodate our design decision, we also have to take a look at how the keyspace is divided among the nodes in the Business Ring. Traditional DHT solutions strive to achieve a uniform distribution of keys among nodes in order to load balance the system. This, however, is not suitable for our needs, since the service providers are the real hosts of user data, while PDVs have a different role in the system. We propose an unbalanced key distribution schema which favours the service provider nodes. Our key distribution is represented by the arrows in Figure 4.7. The mechanism that determines how large the keyspace associated with a service provider can be is closely related to the trust framework of our system, and will be discussed later on.


4.4.4.D Business Ring Size

As mentioned in Section 4.4.4.B, the size of the Business Ring plays an important role in the operation tracking and logging system. The logging system presented later on relies on the existence of routing nodes in the DHT, which route operations such as insert and retrieve. A minimum predetermined DHT size (counting the PDV nodes) would be desirable to maintain, in order to make sure that every operation gets routed by at least one random router node. This minimum value could be computed depending on the size of the routing tables used by the particular DHT implementation.

A possible solution to achieve this minimum is to combine the membership strategies from Section 4.4.4.B: use the participants of the business model as a base, and compensate with random nodes until the minimum desired size is met. This solution, on the other hand, would require a centralized coordinator entity that governs the memberships.

An alternative strategy, less reliant on a centralized entity, is to start new Business Rings as

part of an already existing mature Business Ring with a stable userbase. The mature Business

Ring can serve as a nursery for the newly created one. After the new Business Ring gathers

enough momentum to build a stable userbase, it can be separated from the nursery.

4.4.4.E Business Ring Description

Every Business Ring should offer a description of the network. Nodes who join the network should have a way to see which service providers are involved in that particular ring. Service provider nodes could self-advertise their own descriptions regarding client restrictions and generally applicable policies.

The Business Ring description should also contain the keyspace sizes assigned to each service provider within the ring, for trust establishment reasons. The size of the ring also has to be public information, based on which different trust decisions can be carried out. Details related to the keyspace size and DHT size do not have to be precise at any given time. An estimate of these values is sufficient for the workings of the system. Such an estimate can be computed by a gossip algorithm [31] running on piggybacked routing messages in the system, making sure that each node has an estimate of both the service provider's keyspace size and the DHT size. However, the design of a system to provide accurate estimates is out of the scope of this thesis work.

Additionally, one might imagine that different business models have different policies regarding customer requirements. For example, a user could only join the ring of a bank if he is a customer there. All these extra policies regarding restrictions from the service provider side can also be taken into consideration.


4.4.5 Privacy Manager Layer

The Privacy Manager Layer stands for the PM component, which is responsible for the safeguarding of PD objects by enforcing Sticky Policies. Its main responsibility is to filter the incoming and outgoing operations that are happening on the DHT layer. This layer acts as a guard of the user data objects hosted at every node.

4.4.5.A Sticky Policy Enforcement

The main method of data safeguarding is Sticky Policy enforcement. Business Rings are required to operate on PD objects, which guarantee the existence of a Sticky Policy next to the shared data. There are two big use cases covered by Sticky Policies: local data usage and forwarding of data.

When a DC wants to process the collected PD, he just has to issue a local retrieval operation to the Business Ring through one of his own local nodes. Before the local nodes return the desired data, the PM evaluates the Sticky Policy against the requester's attributes and grants or denies access to it.

Forwarding of collected user data follows the same rules, but uses a remote retrieval operation. As stated in Section 4.4.2, entities are only allowed to externally forward LookupKeys, and not the actual PD, since data sharing has to happen through the ring. Third parties interested in collecting some shared data have to be part of the same Business Ring as the DS and DC. Only then can a third party issue a remote retrieval request for a PD object. The PM layer of the hosting entity is responsible for evaluating Sticky Policies before the actual data transfer can happen.

The PM layer is also in charge of the obligation engine, which makes sure all obligations are triggered and carried out. An obligation requiring the deletion of a PD object can easily be implemented by issuing a delete operation on the DHT layer. It is worth mentioning that the deletion is verifiable by the DS himself, since he also holds a reference to the LookupKey of the PD. By periodically interrogating the ring for known LookupKeys, the DS can always know which of his previously shared objects are still there, and which ones have been deleted.
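As a minimal sketch of this verification, assuming a reduced ring view in which a lookup returns null once an object has been deleted, a DS could audit his shared keys as follows (all names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of deletion verification from the DS side. Ring is an
// assumed, reduced view of the DHT: lookup returns null once an object has
// been deleted (e.g. by an obligation).
final class DeletionAudit {
    interface Ring { Object lookup(String key); }

    // For every previously shared LookupKey, report whether the object is
    // still present in the Business Ring.
    static Map<String, Boolean> audit(Ring ring, List<String> sharedKeys) {
        Map<String, Boolean> stillPresent = new HashMap<>();
        for (String key : sharedKeys) {
            stillPresent.put(key, ring.lookup(key) != null);
        }
        return stillPresent;
    }
}
```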

4.4.5.B Trust Management

Since trust is a required component of every framework, the PM layer also offers a trust negotiating mechanism for peers of the Business Ring.

The unbalanced keyspace assignment described in Section 4.4.4.C reduces the communication overhead between a DS and a DC, but it can also be used as a measure of trustworthiness. Taking the size of the keyspace slice of every service provider, we can offer a quantification by which a trust comparison can be carried out. The size of the assigned keyspace slice allows a service provider to host a limited set of shared user data, depending on the size of his slice.


A keyspace slice is made out of a set of lookup keys, which can be used to host a set of PD.

The intuition behind the keyspace slice as a trust measure is that trusted service providers are

allowed to host a bigger set of PD than less trusted ones. In this way, every service provider can

be assigned a trust level based on the size of its keyspace slice.

The establishment of these trust levels is the responsibility of the entity in charge of assigning keyspace slices, since it decides how big or small each slice can be. Letting service providers claim their own slices would lead to a greedy scenario, in which case the trust measurement loses its value. A better alternative is to involve the whole Business Ring in deciding the keyspace slice sizes. A minimum baseline keyspace slice size can be assigned to every node, leaving them all at the same bottom level of trust. This minimum value can vary from use case to use case, and its establishment is independent of this work. After the initial assignment, a consensus algorithm can be run across the peers in order to grant some more space to, or take away some space from, different entities. Since the majority of nodes is required to achieve consensus, we can assume that if the majority of the peers are trustworthy, then the keyspace assignment is also trustworthy. A trustworthy keyspace assignment leads to a quantitative trust measure that can be used to categorize each service provider into its own trust level, and to drive the automated decision making depending on it.
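A sketch of this quantification follows; the level names and thresholds are invented for illustration, since the model only requires that a larger consensus-granted slice maps to a higher trust level.

```java
import java.math.BigInteger;

// Minimal sketch of deriving a trust level from a consensus-granted keyspace
// slice. The level names and thresholds are invented for illustration.
final class SliceTrust {
    enum Level { BASELINE, MEDIUM, HIGH }

    // sliceSize: number of keys assigned to the provider by consensus;
    // keyspaceSize: total size of the ring's keyspace, for normalization.
    static Level levelOf(BigInteger sliceSize, BigInteger keyspaceSize) {
        double fraction = sliceSize.doubleValue() / keyspaceSize.doubleValue();
        if (fraction >= 0.10) return Level.HIGH;   // assumed threshold
        if (fraction >= 0.01) return Level.MEDIUM; // assumed threshold
        return Level.BASELINE;                     // minimum baseline slice
    }
}
```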

A secondary trust source can be derived from the description offered by the Business Ring itself. With the assumed node identities in place, a list of participating service providers can be derived from the provided Business Ring description. By looking at the individual service providers in the list, a DS can set his custom trust level. For instance, a DS may decide not to use the services of a Business Ring which has a government agency as its member. On the other hand, he might also be more comfortable with sharing data in a ring that has a well known, trusted service provider as its member.

4.4.6 Logging Layer

The Logging layer is the top layer, which offers a wrapper around every operation on a PD. Being at the top, it is responsible for saving traces of every operation. Logging is an essential mechanism for verifying the validity of claimed actions, as well as for maintaining assurance for the user that his intentions were carried out. Our logging mechanism focuses on saving data request traces throughout the Business Ring. Logging happens in an asynchronous manner, such that the performance of the service itself is not affected. We aim for a relaxed logging mechanism in which some loss is inevitable, but not fatal.

The request tracking system leverages the already existing lookup functionality of the DHT. As

specified before, data is not meant to leave the DHT without authorization, and it is meant to be

kept under a LookupKey which is known by the DS. That being said, access to data inside the

ring is only possible via the search mechanism of the DHT.


In order to perform any operation on data, first the node responsible for it has to be found. Since DHTs offer high scalability, nodes cannot store references to every other node in the system. Search solutions where nodes only keep routing tables of a restricted size are commonly used. Because of this design, every operation first has to go through several hops in order to get to the actual data host. Every such routing node has valuable information regarding the identity of the requester, as well as the key of the requested resource. Every such <Requester, ResourceKey> pair provides useful information for identifying who has been requesting access to a certain PD. The ResourceKey represents the LookupKey of the requested PD.

In order to have a functioning logging mechanism, we need to make sure that there are in fact routing nodes in the system, and not just a single node serving all requests. We have to assure that the size of the Business Ring is large enough, as addressed in Section 4.4.4.D.

Since every node is responsible for keeping logs based on its own routed messages, the log information referring to a certain key ends up scattered across multiple nodes. Composing a comprehensive log out of the individual pieces of log events scattered throughout the nodes is the next challenge. Once logging information is aggregated, we need a way to reveal it to the relevant data owner, whose data is being kept under the referenced key.

The first intuition is to keep the log object inside the same ring, in order to provide easy aggregation and quick access to it. The first problem is that this might cause a cascade of logging messages that might render the system unavailable. We could separate the logging operations from any other operation, such that logging on logging messages would be disabled. This solves the problem, but introduces a security threat: normal messages masked as logging messages could be sent to avoid tracing. We need an additional verification step, during which every router node checks the validity of a log message against some predefined standard, in order to avoid masked messages. For example, log messages can be composed of predefined fields, where each field can only take a predefined value from an existing value pool. The verification step would then check whether each value of the log message has been chosen with respect to the predefined value pool or not.

Another problem is under which key to place the <Requester, ResourceKey> log event chunks, such that they all get aggregated. A deterministic solution is needed, since the data owner has to figure out where the aggregation location is. Using the ResourceKey itself to keep logs would end up taking space from the service provider's keyspace. More importantly, the service provider would be in charge of hosting the aggregated logs, which is not desired. Instead, a hash can be computed on the ResourceKey to derive another key, the LogKey, deterministically; the data owner can then find the aggregated logs by computing the same hash on the LookupKey he is about to trace. Using a deterministic hash function will place the aggregates at a random node, depending on the overlay existing at that particular time.
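The derivation can be sketched as follows, together with the shape of the event a router node records. SHA-1 is assumed only to match 160-bit Chord-style identifiers; any hash function agreed upon by all peers would do.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of the deterministic LogKey derivation and of the event
// shape a router node records.
final class RouterLogging {
    // Immutable event stored under the LogKey; short-lived by design.
    record LogEvent(String requester, String resourceKey, long expiresAtMillis) {}

    // Router nodes and the data owner compute the same LogKey independently,
    // so scattered events accumulate under one key by collision.
    static String logKeyFor(String lookupKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(lookupKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is always available
        }
    }
}
```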

The <Requester, ResourceKey> log objects should be considered immutable objects that can only be read, but not modified or deleted externally by a request. Log objects should be designed as short-lived objects with an expiry date, such that every node can clean up its logs periodically. This assures that the system will not get clogged by logs.

Retrieval of aggregated logs happens using the pull method. Every PDV is responsible for periodically querying the ring for the LogKeys under which its log information is kept. This way, long term aggregates can be composed at the PDV site, to assure the persistence of logs.

4.4.7 Interaction Models

The following sections present the interaction models that arise with the employment of a

Business Ring. We examine separately how interactions with multiple Data Subjects and multiple

Data Controllers are handled.

4.4.7.A Data Flow

The first interaction model presented focuses on the data flow between a single DS and DC.

Figure 4.9 depicts the high level interaction diagram between the two.

Figure 4.9: Mediated Privacy: DC - DS interaction model

In the first step, the DC makes his request to the DS, together with the Data Handling Policy (DHPol) and a LookupKey, defining his intentions on data handling and the key under which the requested data is expected. The LookupKey is a valid key in the Business Ring, residing in the DC node's keyspace slice. After receiving the request, the DS interrogates the Business Ring for relevant details about the DC. These could include information on his keyspace size and other trust measures, which contribute to the reasoning in step 3. Depending on the trust level and the predefined data policies of the DS, the reasoning can have two outcomes: in step 3.a, access is granted and a PD object is created; in step 3.b, access is denied. After granting access, the DS issues


an insert operation to the Business Ring in step 3.a. The insert request tries to put the PD under the LookupKey provided by the DC. Steps 3.a.1 and 3.a.2, marked with blue arrows, represent the internal routing steps of the DHT. Once the request reaches the DC's node, the DS sends an acknowledgement back to the DC with the status of the operation.

When the DC wants to access the PD for processing, he issues a request to his Business Ring node in the form of a local retrieval operation. The request asks for the PD under the LookupKey with a specified PURPOSE. The PM layer of the DC node checks the PURPOSE attributes against the Sticky Policy of the PD. Based on its decision, the PM layer either discloses the PD to the DC or not.

At the end of the interaction the LookupKey, which serves as a reference for the shared PD, will be known both by the DC and the DS. By means of the mediated space, supported by the Business Ring, both actors share a pointer to the data object, and both can operate on it.

4.4.7.B Multiple DS Interaction Model

A request coming from a DC could target multiple DSs. A single LookupKey could be used to host multiple data objects from different DSs, by simply appending them on collision. Collision appears when two objects are inserted in the DHT under the same key. In this case, however, we lose the fine-grained control over individual data, since multiple PDs are now mapped to a single key. The DSs involved in the transaction would all share a single LookupKey. We argue that in order to maintain fine-grained control over user data, as well as secrecy, a one-to-one mapping of a single PD to a LookupKey is required.

To accommodate this case, instead of sending a single LookupKey, as presented in Figure 4.9, the DC sends a set of available lookup keys together with its request. The DSs involved in the request each choose a single LookupKey from the provided set. Once a key is taken by a DS, it is considered consumed, such that other DSs cannot use it. To offer a LookupKey dissemination mechanism, different solutions can be employed.

Commonly, the role of the DS is associated with PDVs. The network of PDVs, being individual data stores, can be seen as a social graph with PDVs as nodes, and edges based on friendships or other connections depending on the context. In this case, the multiple DS interaction model turns into a distributed query on the social graph. The DC executing the data request only has to talk to a single PDV node, and let it forward the request to the targeted PDV group. The PDV targeted directly by the DC's request becomes the entry point, also known as the root, from which the query is disseminated.

In order to achieve a one-to-one mapping of LookupKey to PD, we propose a key dissemination protocol in two rounds, presented in Figure 4.10. The first round distributes the query among the target PDVs, and collects status messages, shown by the blue arrow. Status messages flowing back to the root after the first round contain information on the size of the subtree for which


the reporting node is considered the root. After the first round, every node knows how many keys it will need, because it knows how many nodes its subtree consists of. The second round consists of the LookupKey dissemination, starting from the root and moving towards the leaves. At each step, a PDV takes a LookupKey from the provided set, and forwards the rest to its children, based on the subtree counts made in the previous round. During the second round, each PDV inserts the query results into the Business Ring and replies with a status message, which in turn gets forwarded to the DC. The extra round introduced into the algorithm causes some performance overhead; this, however, is necessary to achieve the required functionality. A minimal sketch of the two-round protocol follows Figure 4.10.

Figure 4.10: Mediated Privacy: Key Dissemination
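The two rounds can be sketched as a simple tree computation, under the assumption that the targeted PDV group has already been reduced to a dissemination tree and that message exchanges are abstracted into method calls; countSubtree corresponds to the first round and disseminate to the second. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the two-round key dissemination over a dissemination
// tree derived from the PDV social graph.
class PdvNode {
    final List<PdvNode> children = new ArrayList<>();
    String assignedKey;      // the single LookupKey this PDV consumes
    private int subtreeSize; // cached result of round one

    // Round one: distribute the query and report subtree sizes to the parent.
    int countSubtree() {
        subtreeSize = 1; // this node itself needs one key
        for (PdvNode child : children) {
            subtreeSize += child.countSubtree();
        }
        return subtreeSize; // status message flowing back to the root
    }

    // Round two: take one key, split the rest among children by subtree count.
    void disseminate(List<String> keys) {
        assignedKey = keys.get(0); // consume the first key of the slice
        int offset = 1;
        for (PdvNode child : children) {
            child.disseminate(keys.subList(offset, offset + child.subtreeSize));
            offset += child.subtreeSize;
        }
    }
}
```

The root is the PDV contacted directly by the DC: calling root.countSubtree() followed by root.disseminate(keys), with at least as many keys as the reported subtree size, reproduces the two rounds.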

4.4.7.C Multiple DC Interaction Model

The case symmetrical to the one described in Section 4.4.7.B is when multiple DCs are interested in the same PD. This also applies when a single data requester forwards user data to third parties. From the point of view of the original data owner, both the data requester and the third party are considered to be DCs.

Data sharing between multiple DCs relies on the assumption that once a PD object is inserted under a certain LookupKey in the Business Ring, it is only retrieved and distributed via the remote retrieval operation. Having a global pointer to the PD, such as the LookupKey, makes it possible to disseminate the pointer itself without the PD. This requires every DC who receives the pointer to execute a remote retrieval in the Business Ring. The Sticky Policy guarding the PD is responsible for filtering the requests of multiple DCs in a pro-active manner. By forcing DCs to execute remote retrievals on the Business Ring, we ensure that every request will leave a trace through the logging system, and that every data copy will have its own dedicated lookup key reference.


4.4.7.D Log Flow

Figure 4.11: Mediated Privacy: Logging

As described in Section 4.4.6, we are employing a distributed logging mechanism, where a

request event log is placed in a deterministic position inside the Business Ring, and retrieved by

the DS using a pull-method.

Figure 4.11 represents a scenario related to the PHR system of Figure 4.7. We assume that the Hospital Service executed a data request, by which it acquired a PD under the LookupKey from a DS. The DS could be any PDV belonging to the ring. Furthermore, assume that the Hospital Service wishes to share the collected PD with the Research Center. In order to do so, the Hospital Service shares the LookupKey under which the PD is kept.

As the Research Center only holds a pointer to the object, it is required to search for the PD inside the Business Ring, by executing a remote retrieval. Assuming a common DHT design, where messages are routed through intermediate nodes of the system, if the Business Ring is large enough the encounter of a Router Node is inevitable. The Router Node, being part of the ring, has the three-layered architecture depicted in Figure 4.6. Its DHT layer is responsible for the actual routing of the search request, but before it does so, the Logging layer is triggered first.

The Logging layer of the Router Node inserts an event record in the form of <ResearchCenter,

LookupKey> into the ring, under the key computed by the hash(LookupKey) function. Note that

any subsequent search for LookupKey originating from any source can be routed through any

random router node of the network. Since the hash(LookupKey) will always yield the same key,

the log messages belonging to the LookupKey will get appended under the same key by collision.

As a consequence of our logging mechanism with a deterministic hash function, there is no need to explicitly aggregate logs. Whenever the DS wants to verify the request traces of the shared PD under a LookupKey, it simply issues a pull request for hash(LookupKey). Multiple shared PD objects result in multiple pull requests targeted at different keys. By collecting the logs, the DS can get first-hand assurance derived from the traces.
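This pull can be sketched as follows, reusing the logKeyFor derivation from the RouterLogging sketch in Section 4.4.6; the Ring interface is an assumed, reduced view of the DHT exposing only log retrieval.

```java
import java.util.List;

// Minimal sketch of the DS-side pull of request traces.
final class TraceAudit {
    interface Ring { List<String> retrieveLog(String logKey); }

    // For every shared LookupKey, derive the LogKey and list the recorded
    // requesters; unexpected entries can be flagged to the PDV owner.
    static void pullTraces(Ring ring, List<String> sharedLookupKeys) {
        for (String lookupKey : sharedLookupKeys) {
            String logKey = RouterLogging.logKeyFor(lookupKey);
            for (String event : ring.retrieveLog(logKey)) {
                System.out.println(lookupKey + " was requested by: " + event);
            }
        }
    }
}
```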

Since the LookupKeys are chosen randomly by the DC, hash(LookupKey) will also result in a random key for the log placement. This means that any node of the system can end up as a potential host of logs for any DS. Assuming that the majority of nodes in the system are well-behaved, we can expect logs to be kept in an orderly fashion. Even if there are some malicious nodes who explicitly delete logs, or are unwilling to insert trace logs, the loss of a fraction of the overall logs is acceptable.

It should also be pointed out that the logging system is capable of recording potential unauthorized data accesses. In case the Sticky Policy attached to the PHR hosted at the Hospital Service disallows any data forwarding, no third party should be able to access it. However, if the Research Center and the Hospital Service conspire to exchange the PHR through the ring, regardless of its Sticky Policy, they can succeed in doing so. On the other hand, the random Router Node is still entrusted to log an entry about the request. As the Router Node is considered to be random and independent from both service providers, it is very unlikely to be compromised. Even if there is no proof of a policy violation, the record registered by the Router Node can raise suspicion, which can make the service provider lose its trustworthiness.

4.4.7.E Indirect data

So far we have been investigating direct data sharing scenarios between two clearly defined entities: the DS and the DC. There is, however, another type of data, called indirect data, described in Section 2.2. When it comes to indirect data, the definition of a clear DS gets fuzzy, since an indirect data object can have multiple subjects simultaneously. The Mediated Privacy model, however, sketches a solution based around additional data pointers, used to keep multiple references to indirect data objects.

Consider the example depicted in Figure 4.12, inspired by the PHR scenario of Figure 4.7. When the DS shares its PHR with the Hospital Service, the PHR gets inserted into the Business Ring under key K2. After the Hospital Service runs some tests on the DS, it decides to share the TestResults with him via the Business Ring, by inserting them under key K8. The TestResult is produced and controlled by the Hospital Service. On the other hand, the TestResult also belongs to the DS, since he is the true subject of the test. This makes the TestResult indirect data.

Figure 4.12: Mediated Privacy: Indirect Data


The Mediated Privacy model strives to provide a mechanism that lets the DS track and control his indirect data, to some extent. Indirect data, such as the TestResults, is inserted by the DC into the same ring under one of its free keys, which is K8 in our example. In order to communicate this knowledge to the user, the DC inserts the pointer to the indirect data as metadata next to the PHR. This connection offers a clear way to identify the indirect relationship between different data objects.

The StickyPolicy2 attached to the indirect data is a derivative of StickyPolicy1, provided by the DS. StickyPolicy2 is created according to the forwarding rules described in Section 4.2.5.B. This gives great flexibility to the user, in case he wishes to further share his TestResult with anybody else, say his health consultant.

4.4.8 Prototype Implementation Details

A prototype has been implemented in order to prove the viability of our novel proposed model, the Mediated Privacy (MP). Two separate modules have been implemented, namely the PPLModule and the DHTModule. With the use of these two modules, a minimalistic demo has been developed that simulates the interaction between two PDVs and two service providers. The implementation of the PDV and service provider themselves was outside the scope of this thesis.

The PPLModule has been developed in order to facilitate the functionalities involving the PrimeLife Policy Language (PPL). The PPL is an extension of the XACML standard, proposed by the PrimeLife project. The functionalities of the XACML language have been supported by an existing open source implementation, called the Balana engine [1]. The XACML implementation serves as an access control engine used to carry out authorizations. The PPLModule is concerned with the creation of the PPL elements: Data Handling Policy (DHPol), Data Handling Preference (DHPref) and Sticky Policies, which in turn are attached to the XACML policies. For the purpose of this prototype, the PPL elements only contain two relevant properties: the AuthorizationForPurpose and the AuthorizationsForDownstreamUsage. The AuthorizationForPurpose property defines the purposes under which a DC is authorized to access user data, while the AuthorizationsForDownstreamUsage property defines for what purposes the DC may disclose collected user data to a Third Party. Two additional services have also been implemented in the PPLModule, namely the PolicyMatchingEngine and the PolicyDecisionPoint. The PolicyMatchingEngine is responsible for creating Sticky Policies by matching a DHPol with a DHPref. A simplified implementation of the PolicyMatchingEngine has been carried out, which always prefers the DHPref of the DS over the DHPol of the DC. In practice, this means that the resulting Sticky Policy will be a subset of the DHPref. In a full implementation a reasoning engine is needed, which can provide a more flexible matching. The PolicyDecisionPoint is a service used to evaluate an existing Sticky Policy against an access request. Access requests are accompanied


by a AuthorizationForPurpose property, which represents the purpose of data usage. The Poli-

cyDecisionPoint is in charge of deciding whether a StickyPolicy allows data usage for a specified

purpose or not.
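A simplified sketch of the two services follows. The real prototype sits on top of the Balana XACML engine, which is omitted here; the record types below are hypothetical stand-ins that model purposes as plain strings rather than PPL's XML elements.

```java
import java.util.HashSet;
import java.util.Set;

public class PplServicesSketch {

    record DataHandlingPolicy(Set<String> purposes, Set<String> downstreamPurposes) {}
    record DataHandlingPreference(Set<String> purposes, Set<String> downstreamPurposes) {}
    record StickyPolicy(Set<String> purposes, Set<String> downstreamPurposes) {}

    // PolicyMatchingEngine: always prefer the DHPref of the DS over the DHPol
    // of the DC. Keeping only the purposes present in both sets makes the
    // resulting Sticky Policy a subset of the DHPref by construction.
    static StickyPolicy match(DataHandlingPolicy dhPol, DataHandlingPreference dhPref) {
        Set<String> purposes = new HashSet<>(dhPref.purposes());
        purposes.retainAll(dhPol.purposes());
        Set<String> downstream = new HashSet<>(dhPref.downstreamPurposes());
        downstream.retainAll(dhPol.downstreamPurposes());
        return new StickyPolicy(purposes, downstream);
    }

    // PolicyDecisionPoint: allow an access request only if its
    // AuthorizationForPurpose is covered by the Sticky Policy.
    static boolean permits(StickyPolicy policy, String requestedPurpose) {
        return policy.purposes().contains(requestedPurpose);
    }
}
```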

The DHTModule has been implemented in order to provide the DHT functionality required by the Business Ring. We decided to use OpenChord [2] as our base DHT implementation because of its simplicity. The DHTModule uses the Protected Data (PD) data abstraction, encapsulating data and Sticky Policy. The insert and retrieve DHT operations have been modified in order to support the logging mechanism outlined in Section 4.4.6. Before routing operations to the responsible Chord node, the DHTModule issues a customized insertLog operation. The insertLog operation acts as a normal Chord insert, except that it rehashes the value of the original LookupKey before the operation is carried out. The insertLog inserts a LogEvent into the Chord DHT. The LogEvent object contains information about the requested LookupKey, the identity of the requester and the nature of the operation (insert or retrieve). The LogEvent can then be retrieved by a modified retrieve operation, called the retrieveLog operation, which rehashes the LookupKey before the actual retrieval is carried out. Furthermore, a safeRetrieve operation has also been implemented, which, in addition to its LookupKey parameter, also takes an AuthorizationForPurpose parameter describing the purpose for which data is requested. After the hosting node of the requested data is found, the safeRetrieve operation evaluates the AuthorizationForPurpose against the StickyPolicy of the hosted PD. In case of an allow, the safeRetrieve returns the PD. In case of a deny, no data is returned.
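The sketch below captures the shape of these modified operations. The Chord interface stands in for the OpenChord API, which differs in detail, and requester identity handling is simplified; all names are illustrative.

```java
import java.security.MessageDigest;
import java.util.Set;

public class DhtModuleSketch {

    interface Chord {
        void insert(byte[] key, Object value);
        Object retrieve(byte[] key);
    }

    record StickyPolicy(Set<String> purposes) {}
    record ProtectedData(byte[] data, StickyPolicy policy) {}
    record LogEvent(byte[] lookupKey, String requesterId, String operation) {}

    private final Chord chord;

    DhtModuleSketch(Chord chord) { this.chord = chord; }

    // insertLog: rehash the original LookupKey so the LogEvent is stored under
    // a key derived from it, then perform a normal Chord insert of the event.
    void insertLog(byte[] lookupKey, String requesterId, String op) throws Exception {
        byte[] logKey = MessageDigest.getInstance("SHA-1").digest(lookupKey);
        chord.insert(logKey, new LogEvent(lookupKey, requesterId, op));
    }

    // safeRetrieve: log the request, then release the PD only if its Sticky
    // Policy authorizes the stated purpose; a deny returns nothing.
    ProtectedData safeRetrieve(byte[] lookupKey, String purpose, String requesterId)
            throws Exception {
        insertLog(lookupKey, requesterId, "retrieve");
        ProtectedData pd = (ProtectedData) chord.retrieve(lookupKey);
        if (pd != null && pd.policy().purposes().contains(purpose)) {
            return pd;
        }
        return null;
    }
}
```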

4.5 Summary

The first contribution of this thesis work was to investigate how the PrimeLife Policy Language

(PPL) proposed by the PrimeLife project can be used to accommodate a system with Personal

Data Vaults. As the language framework by itself is not enough to provide strong guarantees

without enforcement, we proposed three alternative designs of policy enforcement frameworks.

The Verifiable Privacy (VP) comes with a strict architectural design that facilitates remote software verification and monitoring of applications. The TPM-attested software verification stands at the basis of this model's trust framework. We introduced the concept of a Forwarding Chain, a platform used to maintain control over existing data copies.

The Trusted Privacy (TP) model is a relaxed version of Verifiable Privacy. Its trust framework

relies on trust outsourcing to multiple independent TTPs. A combined trust score is derived for

every entity based on static trust seals and reputation systems. The Forwarding Chain is also part

of this model with an extended responsibility of supporting log aggregation.

The Mediated Privacy (MP) enforcement model is our novel proposed solution, which envisions the design of a mediated space where shared data objects live. The DHT data structure captures our idea of a mediated space. Herein, the creation of a Business Ring platform used to accommodate a specific business model is proposed. Both Data Subjects and Data Controllers keep a reference to a shared data object via its LookupKey. The Business Ring comes with its own integrated trust framework based on keyspace slice sizes and consensus. Moreover, it also provides request tracking and log aggregation in order to provide an assurance system.


5 Evaluation and Discussion

Contents

5.1 Comparison on Requirements
5.2 Comparison on Feasibility
5.3 Comparison on Trust Models
5.4 Comparison on Vulnerabilities and Weaknesses
5.5 Discussion
5.6 Summary


Table 5.1: Requirements Comparison Table

| Requirement | Online Privacy Policy | Verifiable Privacy | Trusted Privacy | Mediated Privacy |
|---|---|---|---|---|
| Establishing Trust | Static Policy | Software Verification | Static/Dynamic Seals & Reputation Systems | Keyspace Slice Size & Business Ring Description |
| Transparent User Data Handling | Static Policy | Monitoring and Logging | Monitoring and Logging | Logging |
| Data Across Multiple Control Domains | Static Policy & Manual Permission | Forwarding Chain | Forwarding Chain | Business Ring |
| Maintaining Control | Only if DC allows | Forwarding Chain update | Forwarding Chain update | Direct DHT operation |

Chapter 5 focuses on the evaluation of the privacy enforcement models outlined in Chapter 4 based on different criteria, namely: defined requirements, feasibility, trust model and vulnerabilities. The chapter concludes with a short discussion on the subject of privacy enforcement.

5.1 Comparison on Requirements

In this section we will go through our initial requirements formulated in Section 1.3 and evaluate how each of the above presented models fits them. By comparing the models per requirement, we can observe some of the relevant tradeoffs between the systems.

Table 5.1 compiles all three models, together with the Privacy Policy model, which stands for the privacy protection system in place today. The Privacy Policy model is included simply for comparison purposes.

5.1.1 Establishing Trust

In today's Privacy Policy model, trust establishment is often a step that most of us tend to disregard. Reading a lengthy and abstract description on data usage takes too much time and focus from end users, delaying their access to the desired functionality. In some situations, trust is established based on word-of-mouth reputation, such as recommendations of friends and colleagues. This, however, cannot be considered an accurate trust measure, since it relies on people's subjective perception. A more objective, automated trust reasoning process is clearly needed. The quantification of a trust level, however, is not a straightforward job.


Both the Verifiable Privacy (VP) and the Trusted Privacy (TP) models strive to achieve trust by proving that the overall system run by the Data Controller (DC) is trustworthy and secure. These proofs are the result of software verification techniques. Software verification follows the idea that a verified software system should run and behave according to some predefined requirements. Verification of the integrity of the software components can either be done statically or dynamically, and both require a verifier entity, which is usually a Trusted Third Party (TTP). The TTP supporting static verification provides a certificate proving that the DC is following some security standard. The downside of static verification is that certificates are only issued periodically, leaving an open time window for vulnerabilities. Dynamic verification, on the other hand, provides more assurance, since it is carried out in real time. This extra assurance, however, comes with the need for enhanced hardware. The verification of the dynamically issued certificate is carried out by a TTP. While the VP focuses on highly dynamic software verification using TPM-enhanced hardware, the TP settles for the static counterpart.

Software verification by itself is not enough to provide extensive privacy guarantees. The TP model has an outsourced trust model that strives to combine multiple sources of trust into a single score. While the VP relies heavily on a software verification scheme, the TP compensates with a mechanism that outsources the responsibility of trust verification. By combining the independent trust scores from different TTPs and Reputation Systems, it strives to provide an accurate measure.

The trust frameworks of both VP and TP rely on the existence of multiple TTPs. The Mediated Privacy (MP) model, on the other hand, comes with a built-in trust measure that is able to provide a reliable quantification of trust. The keyspace slice size is a quantity that can be measured by any peer of the system. Instead of focusing on proving the trustworthiness of some software, the MP is built around the concept of a trustworthy crowd. Along with the keyspace slice size, the userbase size is also measured through the peers of the network. The size of a DC's userbase is a well adopted social incentive, but is often misused as an accurate trust measure. The keyspace slice size, on the other hand, is a value established by the users of the system by mutual agreement.

Given the different nature of the approaches for trust establishment presented by the three models, it is hard to highlight a single model that provides a higher trust level than the other two. We can conclude, however, that the trust establishment mechanism of the VP is best suited when it comes to establishing trust between two physical machines. The other two solutions, on the other hand, focus on providing proof of trust for a DC entity in a broader sense.

5.1.2 Transparent User Data Handling

The only transparency provided by DCs today is formulated in the static Privacy Policy constructs. Privacy Policies, dictated by the DCs, only provide a one-sided agreement, not considering Data Subject (DS) preferences. Assurance of the claims of these Privacy Policies is usually not provided. There is a need for a two-sided agreement solution where the privacy preferences of both DCs and DSs are considered. Moreover, the new solution also has to deliver assurance to the DS, proving that his data has been handled according to pre-agreed rules.

All three privacy protection models presented above are based on the usage of the Sticky Policy paradigm, instead of static privacy policies. Transparency in this context thus translates to assurance that the pre-agreed Sticky Policies have been met. The most common way of getting assurance is by verification of logs. Logging is part of all three models, but is realized in different ways. Keeping event logs is the responsibility of the Monitor component of the PM of every machine according to the VP and the TP models. The equivalent component in the MP model is the Logging layer present on every node of the Business Ring. The Monitor of the VP model offers the most thorough log keeping solution, since applications are running in their isolated spaces. The middleware approach of the TP model offers a similar monitoring solution, as long as the application layer does not bypass the middleware. The MP model only offers logging functionalities on Business Ring peers, and is not concerned with the application layer as the previous models are.

The similarity of the three models in regard to their logging mechanism lies in the fact that, given the distributed nature of the problem, logs are scattered across multiple nodes. Since logs only turn into assurance once they are verified, they first need to be collected for verification. The aggregation of logs is done through different means in each of the models. The VP alleviates the problem by employing an external trusted audit company, which is in charge of aggregating the logs. The TP and the MP both provide a pull-based solution, supported by the Forwarding Chain and the Business Ring respectively. A log-pull operation in the TP triggers a traversal of the Forwarding Chain, while the log-pull in the MP only requires a DHT lookup. In both cases the integrity of the logging platform (the Forwarding Chain and the Business Ring) plays an important role in the functioning of the logging system. We argue that the Business Ring offers a better platform than the Forwarding Chain, since the loss of a single node in the Forwarding Chain will result in the loss of a complete subtree. The problems relating to the loss of a node in the Business Ring, on the other hand, can be alleviated by data replication.

We can also differentiate between the log verification mechanisms based on where the verification is carried out. The VP assumes the existence of a TTP, while the TP and MP both deliver the raw logs to the DS himself. The solution employed by the VP is more coarse grained, since the TTP only provides an assurance digest. The solution of the TP and MP, on the other hand, is more fine grained, since data owners can process raw logs based on their own requirements.

In conclusion, we can state that the VP and TP models offer a more localized logging solution that focuses on the application layer. The solution in MP, on the other hand, is built to support a resilient logging system with log aggregation in mind. Moreover, the TP and MP both offer a first-hand log delivery mechanism, which results in a higher assurance level than the one provided by the log digest in the VP.

5.1.3 Data Across Multiple Control Domains

Some Privacy Policies dedicate a section to state that collected user data could potentially be disclosed to third parties, but they usually fail to address any details related to this transaction. Not only is the DS asked to agree to such data forwarding, but oftentimes the identity of the third party remains undisclosed. A better data forwarding scheme is indispensable for the sake of control.

Sticky Policies are the main tools that dictate whether data forwarding can take place, and under what circumstances. A general rule that applies to all of the models is that data forwarded to third parties should have the same or a more restrictive Sticky Policy than the original Protected Data (PD). The Privacy Manager (PM) component of the DCs is entrusted to create new Sticky Policies according to this rule and to make sure that data is only forwarded together with them. Data protection across multiple domains, however, also relies on the properties of the dissemination platform on which the data is forwarded. An undefined platform, where entities can directly connect to one another for data sharing purposes, makes maintaining control over shared data nearly impossible. The VP and TP models use the Forwarding Chain as their dissemination platform, while the MP has its own novel solution represented by the Business Ring.

One of the main differences between the two platforms is that, while the Forwarding Chain is a highly dynamic construct with ad-hoc properties, the Business Ring is defined around existing business models. A separate Forwarding Chain is built for every single shared PD object, while there is only a single DHT encapsulating every internal data exchange in the case of the Business Ring. Moreover, the restriction-free nature of the Forwarding Chain allows any DC eligible for PD exchange to be part of the chain, leaving the DS without any information about the identities of the potential third parties. The Business Ring, on the other hand, requires every DC who wishes to share PD to be part of the same ring, because of our requirement that data should only be exchanged through the DHT itself, never externally. The Business Ring description, provided to every DS, can contain information regarding the identity of every potential DC and third party. A DS can assume that his shared data will not be disclosed to DCs outside of the ring in the case of the MP, while in the case of the VP and TP there is no restriction with regard to the identity of the DCs of the Forwarding Chain.

In conclusion, the open nature of the Forwarding Chain used by the VP and TP offers a more

flexible solution, but lacks the structured property of the MP. By its design, the Business Ring

offers a clearer data forwarding model than its counterpart.


5.1.4 Maintaining Control

Maintaining Control, being the second aspect of privacy protection, is one of the more complex and harder requirements. In order to provide a more thorough analysis, Table 5.2 gives a categorization of control based on the nature of the data.

5.1.4.A Direct Data

Every PD object shared explicitly by the data owner is considered direct data. DCs are left with the choice whether to provide frontend services for manipulating direct user data or not. This becomes especially important when a user tries to remove some previously shared data. For example, an e-commerce site might save your shipping address long after you stopped using its services. This becomes even more complicated in scenarios where PD has already been shared across multiple third parties.

Both VP and TP offer control over direct data via the Forwarding Chain platform. The DS issues the modified PD to the original data requester, who in turn is responsible for pushing the modification on through the chain. One of the downsides of this solution is that, since every node hosts its own data copy, every modification operation initiated by a DS triggers a cascade of requests that is flooded over the Forwarding Chain. The MP, on the other hand, assures that the DS will possess a lookup key associated with every shared PD object. If there are two live copies of the same PD under different DCs within the same ring, the DS maintains a lookup key for each of them. Modification of PD can be achieved via a simple DHT insert that replaces the old version. The Business Ring not only keeps track of all the existing data copies throughout the ring, but also lets the DS modify every PD separately.

The benefit of the Business Ring over the Forwarding Chain is much more obvious when we take a look at how data deletion is handled. A delete operation on a PD might not get propagated to every DC of the Forwarding Chain due to a broken link. In the Business Ring, however, remove is an already implemented and supported operation of the DHT, which can be used for this exact purpose. DSs hold a LookupKey associated with every existing copy of a previously shared PD object. In case they wish to remove a previously shared PD, they just have to perform a remove operation inside the Business Ring. The physical host of a PD only carries out the operation once the identity of the requester is verified to be the original DS associated with the PD, as the sketch below illustrates. The mechanism issuing identities and providing verification is not part of this research.
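A minimal sketch of the host-side check follows; how identities are issued and verified is abstracted away, as noted above, and all names are hypothetical.

```java
public class OwnerVerifiedRemove {

    interface HostedStore {
        Entry lookup(byte[] lookupKey);
        void delete(byte[] lookupKey);
        record Entry(String ownerId, Object pd) {}
    }

    // The physical host deletes a PD only if the requester's (externally)
    // verified identity matches the original DS recorded for that PD.
    static boolean handleRemove(byte[] lookupKey, String verifiedRequesterId,
                                HostedStore store) {
        HostedStore.Entry entry = store.lookup(lookupKey);
        if (entry == null || !entry.ownerId().equals(verifiedRequesterId)) {
            return false; // unknown key, or requester is not the original DS
        }
        store.delete(lookupKey);
        return true;
    }
}
```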

5.1.4.B Indirect Data

Table 5.2: Detailed Comparison on Maintaining Control

| Maintaining Control of | Online Privacy Policy | Verifiable Privacy | Trusted Privacy | Mediated Privacy |
|---|---|---|---|---|
| Direct Data | Only if DC allows | via Forwarding Chain | via Forwarding Chain | via Business Ring |
| Indirect Data | N/A | N/A | N/A | via Metadata |
| Sticky Policy | N/A | via Forwarding Chain | via Forwarding Chain | via Business Ring |

A different fraction of data, called indirect data and described in Section 2.2, is often left out of privacy research. Given the mashup-like structure of the web, however, indirect data is becoming increasingly important. The only model that addresses the handling of indirect data is the MP. Given the ambiguous nature of indirect data, we refrain from deciding who is the correct DS of such an object; instead we provide a platform by which users can be aware of indirect data. Since awareness is the first step towards achieving privacy protection, we consider this an important feature. The Business Ring increases awareness of indirect data by maintaining data pointers in the form of metadata attached to PD objects. The data pointers are simple lookup keys pointing to the shared indirect data inside the ring. Inserting data pointers is the responsibility of the entity who shares the indirect data, while keeping an updated view on the existence of indirect data is the responsibility of every DS node of the ring, through pulling the metadata, as sketched below.
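The following is an illustrative sketch of this DS-side pull, assuming a hypothetical BusinessRing interface: by periodically reading the metadata attached to its own PD objects, a DS node learns the lookup keys of indirect data shared about it.

```java
import java.util.ArrayList;
import java.util.List;

public class IndirectDataDiscovery {

    interface BusinessRing {
        List<byte[]> metadataOf(byte[] pdLookupKey); // pointers stored next to a PD
    }

    // Collect the lookup keys of all indirect data reachable from the
    // DS's own PD objects via their attached metadata pointers.
    static List<byte[]> discoverIndirectData(BusinessRing ring, List<byte[]> ownPdKeys) {
        List<byte[]> indirectKeys = new ArrayList<>();
        for (byte[] pdKey : ownPdKeys) {
            indirectKeys.addAll(ring.metadataOf(pdKey));
        }
        return indirectKeys;
    }
}
```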

5.1.4.C Sticky Policy

Some of the use cases focus on the manipulation of the access rights and conditions regarding previously shared data. Conceptually, this can be achieved by updating the Sticky Policy of a PD with a new version. This becomes especially relevant in use cases where a DS wants to revoke the access rights of a PD.

The modification of a Sticky Policy can be achieved in a manner similar to that of the direct data, since both the Forwarding Chain and the Business Ring operate on PD objects. Just as stated before, a potentially broken Forwarding Chain can leave behind unauthorized data copies, while the Business Ring makes sure to provide the DS with a data pointer to every existing copy. Thus the MP offers a finer granularity of control, since it allows the DS to modify the Sticky Policies of individual PD copies.

5.2 Comparison on Feasibility

This section is dedicated to the analysis of the assumptions made by the proposed models, in order to evaluate their feasibility. In this analysis we will consider criteria such as prerequisites, architectural constraints, and other assumptions.

The first observation we can make after going through our models is the difference between the approach taken by the VP and TP and that of the MP. Both VP and TP focus on a privacy enforcement model that is carried out on a per-machine basis. As a result, privacy guarantees can only be provided if the interacting parties are both operating on machines built according to a specific design. A homogeneous setup, where every machine is designed alike with privacy features in mind, is a highly demanding assumption. The concept of PDVs is still in its maturing phase, thus integrating a privacy oriented design into it is still acceptable. This can hardly be said, however, of the already existing infrastructure. This becomes especially clear when looking into how the backends of information systems are built today. Service providers tend to have highly optimized and distributed backend systems, designed to fit their own specific use, which differ from one another. Achieving homogeneity, by converting every machine into a Policy Enforcement Point (PEP) node, implies a radical change required from service providers.

The MP, on the other hand, instead of focusing on a per-machine schema, comes with a design which resembles an additional data access layer. The requirement for homogeneity still holds across PDVs, since they have to be active participants of the Business Ring in order for them to share information. Requiring homogeneity across all machines would have the same implications as presented above. To avoid this, the MP describes a privacy enforcement model where the privacy guarantees are enforced on a per-entity basis. By this requirement every entity, be it DS or DC, PDV or service provider, should have its dedicated set of nodes inside the Business Ring. These dedicated machines form a data access layer, which is enhanced with privacy enforcing techniques.

As pointed out above, all three models can be fitted into the design of the loosely defined PDV concept. Differences arise in the strategies by which they can be incorporated into existing infrastructure. If we abandon the requirement for homogeneity of VP and TP, we can assume a per-entity design, where each entity has its own dedicated PEP, or set of PEPs. This brings all three models onto similar terms with regard to integration efforts. However, we argue that the MP is formalized with an integration model in mind, through a PaaS solution, while the other models lack this.

The second concern with regard to feasibility is the architectural requirements of a single PEP node. The VP model sketches the most demanding PEP design, because apart from specialized software, it also requires TPM-enhanced hardware. In order to achieve the isolation of separate applications running on top of it, its architecture has to follow either the system VM or the process VM design. The TP relaxes this architecture model by replacing the TPM with external trust sources and positioning the PM above the OS. Instead of a whole specialized software system, the TP only requires the existence of the PM middleware.

The PEP node of the MP corresponds to a Business Ring node, and follows the three-layered architecture design described in Section 4.6. Both TP and MP only require the existence of a specialized software component in the form of a software layer. But while the software layer of the MP is a purely conceptual data access layer, the middleware of the TP is a software layer that has to fully cover the functionality of the OS, providing a privacy enhanced layer to the application layer on top.

The architectural requirements of both VP and TP, although more strict in terms of design, are generic enough to support any application running on top of them. The MP node, however, has the most flexible PEP structure. The strictness of the design is inversely proportional to the level of enforcement that a single node can provide, as presented in Section 5.4.

Another aspect of feasibility is the assumptions made with regard to the existence of TTPs. The VP uses a TTP to outsource the responsibility of aggregation and verification of logs, while the TP relies on the existence of several TTPs in order to sustain its trust framework. The MP, on the other hand, does not require a TTP, because it is capable of providing a built-in log aggregation and verification system, together with a self-reliant trust framework.

Important considerations with regard to feasibility are the performance penalties that every model introduces. We refrain from making any quantitative performance measurements, since the thesis is carried out on a conceptual level. However, we can make estimates regarding the performance of some aspects.

Operations maintaining control over previously distributed data copies can be subject to a rough performance evaluation. Such an operation can either require all data copies to be modified or deleted, or it might only target the Sticky Policies of the data copies. The Forwarding Chain and the Business Ring are the two subjects of comparison, since they provide the platforms on which these operations are carried out. Suppose that there are N copies of a shared PD that need to be updated. In the case of the Forwarding Chain, these copies are organized in a tree-like structure with N nodes, one for each copy. A simple traversal of the tree, which visits every data copy, has a time complexity of O(N). In the case of the Business Ring, on the other hand, data copies are organized inside the DHT in an unpredictable manner. A single update operation in the DHT has a time complexity of O(log(U)), where U denotes the size of the userbase (the number of participants in the Business Ring). An operation that modifies all data copies would therefore have a time complexity of O(N log(U)), which is slightly worse than O(N).

The MP, however, makes up for this performance penalty in scenarios where only a single data copy needs to be updated. In the case of the Forwarding Chain, the only entry point being the root of the dissemination tree, a Breadth First Search (BFS) or a Depth First Search (DFS) is required in order to find the node holding a particular data copy. The complexity of both search algorithms is O(N), since the whole tree needs to be traversed in the worst case. The DHT of the Business Ring, on the other hand, only requires a single operation, since every data copy has its own reference LookupKey. This puts its time complexity at O(log(U)). U is assumed to be a much larger number than N, since the size of the userbase can reach magnitudes of thousands, while the number of existing data copies is much smaller. This means that for smaller N the Forwarding Chain will yield better performance, but the Business Ring keeps a constant cost of O(log(U)), independent of N. Thus for larger N the Business Ring shows better performance.
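As a back-of-the-envelope illustration of these complexities, assume N = 20 existing data copies and U = 10000 ring participants; the values are purely illustrative, and a base-2 logarithm is used, matching Chord's routing:

```latex
\begin{align*}
  \text{chain, update all copies} &: O(N) = 20 \text{ visits}\\
  \text{ring, update all copies}  &: O(N \log U) \approx 20 \cdot \log_2 10\,000 \approx 20 \cdot 13.3 \approx 266 \text{ hops}\\
  \text{chain, update one copy}   &: O(N) = 20 \text{ visits in the worst case}\\
  \text{ring, update one copy}    &: O(\log U) \approx 13.3 \text{ hops}
\end{align*}
```

The crossover is visible already at these sizes: the ring pays roughly 13 hops per copy regardless of N, while the chain pays a full traversal for any single-copy update.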

In conclusion, the VP model presents the most assumptions and requirements, such as TPM-enhanced hardware, a strict PEP design and the existence of a TTP. The TP is considered to be a relaxed version of the VP, but it still needs the presence of multiple TTPs in order to function reliably. The MP comes with the least requirements with regard to architecture, at the cost of some performance penalty. The source of this performance penalty is the requirement for a DHT operation on every data exchange, which involves hops across several nodes. The VP and TP assume a point-to-point PEP without any indirection.

5.3 Comparison on Trust Models

This section provides an overview and comparison of the trust frameworks designed by the three formulated models. Special attention is dedicated to the way in which the source of trust shifts from one model to the other.

In order to provide a comprehensive evaluation of the trust models, we first introduce some general concepts regarding trust. The PRIME research project [8] depicts a categorization of the trust model based on layers. Each layer contributes to the final trust score, by which users decide whether to trust an entity or not. These layers are as follows:

1. Socio-cultural layer

This layer refers to the general socio-cultural background of every activity. Cultural background, for example, can have an impact on the attitude adopted towards strangers by different people. While some social groups are more inward oriented, others are not. This in turn can influence how easily previously unknown service providers are accepted into the trust zone.

2. Institutional layer

This layer refers to the underlying legal regulations, which provide generally structured rules. The data protection laws described in Section 2.2 belong to this layer.

3. Service Area layer

This layer refers to the difference in trust based on the branch of an existing industry. The banking sector, for example, is supposedly more trusted than the social networking sector.

4. Application layer

This layer targets the trust in a particular service provider. The user's perception of a direct interaction with a service provider becomes a deciding factor of trustworthiness. Irregular events during this interaction can degrade trust. For example, a booking website which offers incomplete information on the booking, but asks for banking details, is unlikely to be used by anybody.

5. Media layer

This layer refers to the communication channels via which interactions are conducted. Trust in the medium is a strong requirement most of the time. The internet is considered such a medium, where strong trust levels can be achieved via secure encryption.

The VP model's trust framework is focused on remote software and log verification in order to provide privacy guarantees. The source of trust is the verification carried out by the TTPs. The user is required to completely trust these entities. This trust framework offers a design which mostly targets the Application layer, since verification is carried out on the remote parties of the interaction.

The TP offers a different trust framework, which is composed of two components: the trust score and the reputation score. The trust score strives to achieve trust guarantees similar to those of the VP. TTPs attest the verification of the remote platforms. The reputation score can either assume another TTP, like blacklist authorities, or rely on the crowd, like feedback systems. This trust framework also focuses on the Application layer, since trust is evaluated on individual service providers. Unlike the VP, however, the trust source of the TP is scattered among multiple entities belonging to an independent crowd. The independent crowd is made up of TTPs providing the trust and reputation scores. The independence property is described in Section 4.3.4.A.

The MP takes a different approach for its trust framework, which no longer relies on remote platform and software verification. The source of trust shifts from the TTPs to what we call a collaborating crowd. The collaborating crowd is made up of the peers of a specific Business Ring, which collaborate to provide trust measures. The keyspace slice size, one such trust measure defined in Section 4.4.5.B, relies on the correct collaboration of the Business Ring members achieved through consensus. The MP can partly be seen as part of the Application layer of the trust categorization, since it targets the trustworthiness of service providers. On the other hand, it can also be considered part of the Media layer, given the construct and properties of a Business Ring. A Business Ring is regarded as a mediated space, a channel on which different interactions can be carried out. Of the three models, only the MP possesses this property.

Trust can also be divided into two major groups: social trust and technological trust. Social trust encapsulates all social aspects of trust, which are derived by humans based on reputation, interaction history, and other social incentives. Technological trust, on the other hand, is focused on the trust achieved by technical mechanisms, such as tamper-free hardware, cryptographic techniques, and others.

The VP is focused on providing strong technological trust via remote software verification techniques. The high degree of technological trust compensates for the missing, or weak, social trust. The details of the interacting parties become irrelevant once there is a technological assurance attesting that both are safe to interact with. The technological trust offered by the software verification, however, can be broken by the exploitation of vulnerabilities and software bugs. Similarly, the TP provides technological trust through verification, but it also proposes an improvement of the social trust, by combining independent trust sources.

The MP, on the other hand, is not focused on technological trust, but on social trust via the

collaborating crowd of the Business Ring. The elevated social trust level compensates for the

shortcomings of the technological trust. By trusting the majority of the crowd, the Business Ring

can provide trustworthy assurance in the form of verifiable logs.

5.4 Comparison on Vulnerabilities and Weaknesses

In this section we examine the weak points of every enforcement model, and discuss the possibilities of overcoming them.

5.4.1 Weaknesses of the Sticky Policy

Sticky Policies serve as the base data protection mechanism underlying each of the presented models. As described in Section 3.4, Sticky Policies are created by combining a Data Handling Policy (DHPol) with a Data Handling Preference (DHPref). The combining of the two may trigger a negotiation phase, where both parties try to enforce some data handling constraints on the other. Creating the logic behind such reasoning is not always straightforward. Imagine the following case: the user wants his data to be deleted after 24 hours, and the service provider states that it will delete the data, but that it will also keep a backup copy. Such scenarios can result in a misleading agreement.

Moreover, there are other cases where service providers simply refuse to give in to the user's data handling preferences. Imagine, again, that the user wants his data to be removed after 24 hours, but the service provider simply denies this request in his DHPol. One might jump to the conclusion that in these cases the user will simply discontinue the interaction with the service provider. In real world use cases, however, users are eager to get services on demand as fast as possible. This can lead to a compromise on the user's side, letting the service provider decide what the actual Sticky Policy will look like.

The basic requirement of the Sticky Policy paradigm is that the protected data should always be accompanied by its Sticky Policy. Privacy enforcement models, like the ones proposed in this thesis, are required to safeguard the bond between data and policy. Exploits that induce policy separation from the data, however, could lead to privacy violations, since the bare data object is no longer associated with its protecting policy.

Moreover, exploits can exist to switch or modify Sticky Policies. A malicious DC could replace the Sticky Policy of a shared Protected Data (PD) object, allowing it to extend the number of operations it can perform on the PD. The switched Sticky Policy can either be a forged version, or an older, deprecated version of the previous policy. The most common method to alleviate this problem is to introduce integrity checks and provide non-repudiation. Integrity checks are necessary to assure that the Sticky Policy attached to the data object is the right one. Non-repudiation guarantees that the DS is the truthful creator of the Sticky Policy.
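One common way to realize both checks is for the DS to sign data and policy together, so the policy cannot be swapped, downgraded, or stripped without detection. The sketch below illustrates this; it is an assumption-laden illustration, not the mechanism prescribed by any of the three models.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class PolicyBinding {

    static byte[] sign(byte[] data, byte[] policy, PrivateKey dsKey) throws Exception {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initSign(dsKey);
        sig.update(data);    // the bond covers the data object...
        sig.update(policy);  // ...and the exact Sticky Policy attached to it
        return sig.sign();
    }

    static boolean verify(byte[] data, byte[] policy, byte[] bond, PublicKey dsKey)
            throws Exception {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(dsKey);
        sig.update(data);
        sig.update(policy);
        return sig.verify(bond);
    }

    public static void main(String[] args) throws Exception {
        KeyPair ds = KeyPairGenerator.getInstance("RSA").generateKeyPair();
        byte[] data = "test result".getBytes();
        byte[] bond = sign(data, "purpose=treatment".getBytes(), ds.getPrivate());
        // A switched policy fails verification, providing the integrity check;
        // the DS key provides non-repudiation of the policy's origin.
        System.out.println(verify(data, "purpose=any".getBytes(), bond, ds.getPublic()));
    }
}
```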

5.4.2 Malicious Data Controller (DC)

A maliciously behaving DC is one that is consciously operated, or has been tampered with, to bypass a restriction or ignore a required operation. A tampered DC system can be considered unsafe to interact with, because of the high potential for privacy violation. Since the technical enforcement mechanisms no longer protect the user's data, malicious DCs gain unrestricted access. Moreover, the traces of such a violation can also be disguised by the malicious DC, by refusing to comply with the required logging system. Ignoring logs, or providing compromised ones, can help the malicious DC mask a policy violation as a correct access. In order to prevent a malicious DC, all three models have a specialized software component, called the Privacy Manager (PM), which prevents misusage. On the other hand, systems might get compromised even if they follow one of the presented enforcement models.

The weakest model with regard to the possibility of a malicious DC is the Mediated Privacy (MP). Without the use of a remote software verification technique, the DSs interacting with a DC in the Business Ring can only observe its behaviour through its external actions. This gives a large amount of flexibility and freedom to the internal application layer, which processes shared PD objects. Moreover, the MP does not employ any of the strict monitoring schemas of the other two models, leaving the PD vulnerable to exploits in the application layer.

The Trusted Privacy (TP) model, relying on a middleware solution, offers better protection against tampering, because of the software verification techniques employed. TTPs are required to evaluate and issue static certificates attesting to the correct state of the PM on the DC's system. A correctly functioning PM, however, cannot guarantee the correctness of the DC. Malicious software in the application layer could bypass the PM middleware in order to carry out unrestricted and unmonitored operations. The likelihood of this bypass, however, depends on the design of the middleware itself.

The Verifiable Privacy (VP) model offers a much stronger software verification mechanism, which provides dynamic assurance. The verification of the PM is carried out in real time at every operation, minimizing the likelihood of a tampered PM. On the other hand, the existence of a malicious application running on the DC could go unnoticed, even by a fully functioning PM. First, the malicious application has to pass the verification of the PM and get access to a PD. After getting the PD, however, the malicious application could open an encrypted channel to an outside machine, to which it can send PD objects unrestricted. The encrypted channel, independent of any encryption used by the VP itself, leaves the monitor blind to every data exchange with the outside machine, making unverified data disclosure possible. In this way, the trustworthy PEP of the DC becomes a proxy for an unverified machine, via a malicious application. Once the external machine receives the PD, it can perform any operation on it.

As presented above, none of the three models is tamper-free, leaving the possibility of a malicious DC a standing threat. The effort required to convert a correct DC into a malicious one varies from model to model. Lacking software verification and internal monitoring, the MP requires the least effort in the face of tampering. The TP and VP require higher effort, since bypassing the existing enforcement system is more troublesome, but not impossible.

5.4.3 Platform Vulnerabilities

The distributed platforms, namely the Forwarding Chain and the Business Ring, on which the presented models operate, can also have vulnerabilities. Both platforms are responsible for maintaining pointers to existing PD objects, such that the DS can exercise control over them. Moreover, these platforms are also used by the logging system.

As explained in Section 5.1.4, a broken link in the Forwarding Chain can cause the loss of control over a subset of shared PDs. This can lead to new and deprecated PDs coexisting, introducing new security holes. Moreover, the broken link also causes the loss of a subset of the usage logs. The missing logs can lead to weak assurance, followed by a degradation of the trust framework.

The Business Ring platform, essentially being a decentralized structure with more flexible properties than the Forwarding Chain, can be targeted by a botnet attack. A botnet living inside a Business Ring can cause multiple problems, depending on its size. A larger botnet might interfere with the consensus algorithms, rendering the trust framework biased. A smaller botnet can cause message and log disruptions, causing the system to stop functioning. A viable solution against botnets is the use of an identity management system, where each node's identity is verifiable. Moreover, smaller botnets could be discovered by the system itself and eliminated by running a consensus driven mechanism.

Both vulnerabilities can cause major disturbances in the correct functioning of the two platforms. We argue, however, that the cost of a botnet attack is much higher than that of a broken link, making it less likely to happen. Botnet attacks need additional resources, put to a specific use, while the breaking of the Forwarding Chain can be caused by multiple events. One such event is a targeted attack on a single node of the chain. On the other hand, a simple machine failure caused by an accident can also create a broken chain; thus the Forwarding Chain is less resilient than the Business Ring.

5.5 Discussion

One of the main conclusions of the above evaluation is that none of the three formulated models can offer a definitive privacy enforcement solution that covers every aspect of privacy protection. Section 5.4 points out potential weaknesses in the design of the models. Moreover, software bugs in the implementation of a particular design are inevitable, leaving an even bigger window open for vulnerabilities. Since technology is not enough to provide a fully satisfying solution, alternatives must be considered.

A large subset of privacy violations happen because of unaware or careless usage of technology. Sharing private information online is a socially acceptable norm, often even rewarded through means of upwards pointing thumbs. Users often do not realize the implications of such data sharing. This is caused by the failure to recognize the world wide web for what it is: a public domain. Examples of real life privacy invasions can be observed on the open streets of a city, or on public talk shows. A private investigator, for example, is allowed to track a person through the public streets of a city. Similarly, a talk show guest may disclose private personal information in front of the public. In both cases private information of individuals gets uncovered, but these real world examples are hardly considered privacy violations. In the case of public streets, an individual is well aware that his behaviour in public can be subject to observation. Talk show guests might even get some remuneration for their participation in the show, and inherently for sharing private information. Similarities between real world and online interactions cannot be ignored. A shift in the public consciousness towards online interactions has to take place, in order for individuals to realize the implications of online information sharing.

The trust models depicted by the formulated privacy enforcement models can also be subject to evaluation through real life models and interactions. Trust models that rely strongly on TTPs require an authoritative trust from end users. This authoritative trust in real life requires a trust model similar to the one used for governmental agencies. Both exhibit properties of a semi-closed system which is trusted to carry out some service. Trust in these entities is highly reliant on subjective opinions, which can range to extremes. Similarly, trust in TTPs has the same social implications.

The trust model of the Mediated Privacy (MP) model relies on the functioning of a collaborating crowd. The collaborating crowd exhibits the properties of a real life community tied together by a common interest, which in our case is the business model provided inside a Business Ring. A trust model which focuses on the collective power of individuals is much more appealing than that of an authoritative trust. Although the technical means of achieving a strong collaborative online crowd that can provide trust are not well defined today, we argue that the MP provides valid ideas in the domain.

5.6 Summary

Chapter 5 is dedicated to the evaluation of our proposed privacy enforcement models considering multiple different criteria, namely: initial requirements, feasibility, trust source, vulnerabilities and weaknesses. After evaluating the initial requirements, we can conclude that the VP and TP models focus on a strongly localized solution, with strong PEPs, while the MP is more concerned with distributed interaction models. Moreover, the data dissemination platform of the Business Ring provides a better control model and greater resilience than that of the Forwarding Chain. The feasibility evaluation helped us rule out solutions requiring homogeneity, due to integration problems, and also points out the high requirements of the VP and TP models, making them less feasible. The trust source evaluation offers a categorization of the proposed solutions based on the traits of their trust frameworks. We can observe a shift in the trust source, moving from the TTPs of the VP and TP towards the collaborative crowd. The evaluation on vulnerabilities shows that the MP model is the most easily compromised, but the TP and VP also exhibit weaknesses. Finally, we conclude the chapter with a discussion about our proposed solutions in light of current real life aspects.


6 Conclusion

Contents

6.1 Summary
6.2 Future work


Chapter 6 concludes with a short summary of the thesis, highlighting the contributions made by this work. The chapter ends with a section on proposed future work.

6.1 Summary

The goal of this thesis project was to explore different privacy enforcement models employing the Personal Data Vault (PDV) as the source of personal data. As the related research suggested, we turned our attention towards privacy enhancing models that employ the use of privacy policy languages and the sticky policy paradigm. The PrimeLife project, in particular, offered the PrimeLife Policy Language (PPL), which formulates Sticky Policies based on a Data Handling Policy (DHPol) of a Data Controller (DC) combined with the Data Handling Preference (DHPref) of a Data Subject (DS). Our first contribution consisted of evaluating whether the PPL framework fits the scenario where PDVs are widely employed. Section 4.1 evaluates the language elements of the PPL, which were found to be appropriate as the basis for privacy protection for the PDV.

The second contribution was targeted at the design of several policy enforcement models, which can guarantee the correct functioning of the PPL framework. A set of requirements was defined in Section 1.3 in order to focus the scope of the enforcement models on four highlighted aspects of privacy concerns, namely: establishing trust between entities, providing transparency in data handling, protecting data across multiple control domains in forwarding scenarios, and maintaining control over previously shared data.

Herein, three different policy enforcement models are proposed: two based on previous research (Verifiable Privacy (VP) and Trusted Privacy (TP)), and one novel approach called Mediated Privacy (MP). The VP provides privacy guarantees through remote software verification methods attested by enhanced hardware solutions. The TP offers a similar design, but a different trust framework, which relies on the combination of independent trust sources in order to provide a quantification of trust. The MP is our novel proposed solution for privacy enforcement, which introduces the concept of a mediated space serving as a platform for user data exchange. We chose the Distributed Hash Table (DHT) data structure as our mediated space because of its completely decentralized properties. Furthermore, we proposed the concept of a Business Ring, which is used to define a mediated space based on a business model. Business Rings are employed to provide data sharing and hosting between Data Subjects and Data Controllers. References to objects saved inside the ring are kept by both the DS and the DC, making future interactions with previously shared data possible for both parties. Together with the Business Ring, we also propose a stand-alone trust framework, which does not require the existence of a Trusted Third Party (TTP). The proposed trust framework quantifies trust based on the keyspace slice assigned to a particular node inside the Business Ring. The keyspace slice property relies on the collaboration of the crowd, thus providing a trust model that is based on a network of equal peers, rather than TTPs.

The third contribution was to evaluate the models sketched in the second contribution based on a set of different criteria. All three models were found to exhibit privacy enhancing features satisfying our initial requirements. However, differences have been pointed out between them. We found the MP the most suitable model for maintaining fine grained control over shared data. The evaluation on feasibility focused on the integration efforts that the enforcement models would require. The VP proved to have the most demanding assumptions. Our comparison on trust source offers a categorization of the models based on the traits of their trust frameworks. Finally, the evaluation based on vulnerabilities and weaknesses pointed out that all of the models can be subject to exploitation. However, the exploitation effort is considerably higher in the case of the VP and TP models than in the MP model.

The considered problem of user privacy protection spans a large research field. Our proposed solution only accounts for some of the privacy aspects, given the short time and limited scope of the thesis work. Likewise, our proposed models are only speculations on a conceptual level, focusing only on some design aspects. Our evaluation, although not exhaustive, offers a basis for discussion on how privacy protection might be carried out in the future.

6.2 Future work

A more extensive design and evaluation of the proposed Mediated Privacy (MP) should be considered. Research with regard to security aspects that have not been covered in this work can contribute to the development of the Mediated Privacy into a more complete model. The weak guarantees in the face of compromised nodes, pointed out in Section 5.4, could be accounted for with the use of technologies proposed in the other two models. Investigating whether software verification and monitoring can become part of the MP is the main direction of future research.

Another proposal targets the evaluation of the sticky policy paradigm, which serves as a basis for every enforcement model. The main research topic would be to see how a strong technical bond can be achieved between a sticky policy and a data object, in order to avoid policies being stripped off. Strategies can include research in the direction of digital watermarking and cryptographic solutions to achieve the strong bond.


Bibliography

[1] Balana, open source XACML implementation. URL https://svn.wso2.org/repos/wso2/trunk/commons/balana/.

[2] OpenChord, open source Chord implementation. URL http://open-chord.sourceforge.net/.

[3] Trusted Computing Group. URL http://www.trustedcomputinggroup.org/developers/glossary.

[4] OASIS eXtensible Access Control Markup Language (XACML) TC. URL https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml.

[5] PrimeLife, 2011. URL http://primelife.ercim.eu/.

[6] Trusted Architecture for Securely Shared Services, 2011. URL www.tas3.eu.

[7] T. Ali, M. Nauman, F.-e. Hadi, and F. B. Muhaya. On usage control of multimedia content in and through cloud computing paradigm. In Future Information Technology (FutureTech), 2010 5th International Conference on, pages 1–5, May 2010. doi: 10.1109/FUTURETECH.2010.5482751.

[8] Christer Andersson, Jan Camenisch, Stephen Crane, Simone Fischer-Hübner, Ronald Leenes, Siani Pearson, John Sören Pettersson, and Dieter Sommer. Trust in PRIME. In Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005, pages 552–559. IEEE, 2005.

[9] David W. Chadwick and Stijn F. Lievens. Enforcing "sticky" security policies throughout a distributed application. In Proceedings of the 2008 Workshop on Middleware Security, MidSec '08, pages 1–6, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-363-1. doi: 10.1145/1463342.1463343. URL http://doi.acm.org/10.1145/1463342.1463343.

[10] I. Ciuciu, Gang Zhao, D. W. Chadwick, Q. Reul, R. Meersman, C. Vasquez, M. Hibbert, S. Winfield, and T. Kirkham. Ontology based interoperation for securely shared services: Security concept matching for authorization policy interoperability. In New Technologies, Mobility and Security (NTMS), 2011 4th IFIP International Conference on, pages 1–5, Feb 2011. doi: 10.1109/NTMS.2011.5721052.

[11] Drummond Reed and Joe Johnston (Connect.Me), and Scott David (K&L Gates). The personal network: A new trust model and business model for personal data. May 2011. URL http://blog.connect.me/whitepaper-the-personal-network/.

[12] European Parliament and Council of the European Union. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, 1995. URL http://europa.eu/legislation_summaries/information_society/data_protection/l14012_en.htm.

[13] European Parliament and Council of the European Union. Commission proposes a comprehensive reform of data protection rules to increase users' control of their data and to cut costs for businesses, 2012. URL http://europa.eu/rapid/press-release_IP-12-46_en.htm?locale=en.

[14] Kaniz Fatema, David W. Chadwick, and Stijn Lievens. A multi-privacy policy enforcement system. In Simone Fischer-Hübner, Penny Duquenoy, Marit Hansen, Ronald Leenes, and Ge Zhang, editors, Privacy and Identity Management for Life, volume 352 of IFIP Advances in Information and Communication Technology, pages 297–310. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-20768-6. doi: 10.1007/978-3-642-20769-3_24. URL http://dx.doi.org/10.1007/978-3-642-20769-3_24.

[15] Roxana Geambasu, Tadayoshi Kohno, Amit Levy, and Henry M. Levy. Vanish: Increasing data privacy with self-destructing data. In Proc. of the 18th USENIX Security Symposium, 2009.

[16] Tyrone Grandison, Srivatsava Ranjit Ganta, Uri Braun, and James H. Kaufman. Protecting privacy while sharing medical data between regional healthcare entities. In Klaus A. Kuhn, James R. Warren, and Tze-Yun Leong, editors, MedInfo, Studies in Health Technology and Informatics, pages 483–487. IOS Press. ISBN 978-1-58603-774-1.

[17] Johannes A. Buchmann. Internet Privacy: Options for Adequate Realisation. Springer Berlin Heidelberg, 2013. URL http://link.springer.com/book/10.1007/978-3-642-37913-0.

[18] L. Ibraimi, Q. Tang, P. H. Hartel, and W. Jonker. Exploring type-and-identity-based proxy re-encryption scheme to securely manage personal health records. International Journal of Computational Models and Algorithms in Medicine (IJCMAM), 1(2):1–21, 2010. ISSN 1947-3133.

[19] Personal Inc. Personal data vault definitions, 2003. URL http://hub.personaldataecosystem.org/wagn/Personal_Data_Vault.

[20] Steve Kenny and Larry Korba. Applying digital rights management systems to privacy rights management. Computers & Security, 21(7):648–664, 2002. ISSN 0167-4048. doi: http://dx.doi.org/10.1016/S0167-4048(02)01117-3. URL http://www.sciencedirect.com/science/

article/pii/S0167404802011173.

[21] T. Kirkham, S. Winfield, S. Ravet, and S. Kellomaki. The personal data store approach to

personal data security. Security & Privacy, IEEE, 11(5):12–19, Sept 2013. ISSN 1540-7993.

doi: 10.1109/MSP.2012.137.

[22] Gina Kounga and Liqun Chen. Enforcing sticky policies with tpm and virtualization. In Liqun

Chen, Moti Yung, and Liehuang Zhu, editors, Trusted Systems, volume 7222 of Lecture

Notes in Computer Science, pages 32–47. Springer Berlin Heidelberg, 2012. ISBN 978-

3-642-32297-6. doi: 10.1007/978-3-642-32298-3 3. URL http://dx.doi.org/10.1007/

978-3-642-32298-3_3.

[23] U.M. Mbanaso, G.S. Cooper, David Chadwick, and Anne Anderson. Obligations for pri-

vacy and confidentiality in distributed transactions. In MiesoK. Denko, Chi-sheng Shih,

Kuan-Ching Li, Shiao-Li Tsao, Qing-An Zeng, SooHyun Park, Young-Bae Ko, Shih-Hao

Hung, and JongHyuk Park, editors, Emerging Directions in Embedded and Ubiquitous

Computing, volume 4809 of Lecture Notes in Computer Science, pages 69–81. Springer

Berlin Heidelberg, 2007. ISBN 978-3-540-77089-3. doi: 10.1007/978-3-540-77090-9 7.

URL http://dx.doi.org/10.1007/978-3-540-77090-9_7.

[24] Ricardo Neisse, Alexander Pretschner, and Valentina Di Giacomo. A trustworthy usage

control enforcement framework. In ARES, pages 230–235. IEEE, 2011. ISBN 978-1-4577-

0979-1.

[25] AsmundAhlmann Nyre. Usage control enforcement - a survey. In AMin Tjoa, Gerald Quirch-

mayr, Ilsun You, and Lida Xu, editors, Availability, Reliability and Security for Business,

Enterprise and Health Information Systems, volume 6908 of Lecture Notes in Computer

Science, pages 38–49. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-23299-2. doi:

10.1007/978-3-642-23300-5 4. URL http://dx.doi.org/10.1007/978-3-642-23300-5_4.

[26] Information Commissioner’s Office. Key definitions of the data protection act. URL http:

//ico.org.uk/for_organisations/data_protection/the_guide/key_definitions.

[27] Jaehong Park and Ravi Sandhu. Towards usage control models: Beyond traditional access

control. In Proceedings of the Seventh ACM Symposium on Access Control Models and

83

Page 102: Privacy for the Personal Data Vault Information Systems and Computer Engineering

Bibliography

Technologies, SACMAT ’02, pages 57–64, New York, NY, USA, 2002. ACM. ISBN 1-58113-

496-7. doi: 10.1145/507711.507722. URL http://doi.acm.org/10.1145/507711.507722.

[28] S. Pearson and Marco Casassa Mont. Sticky policies: An approach for managing privacy

across multiple parties. Computer, 44(9):60–68, Sept 2011. ISSN 0018-9162. doi: 10.1109/

MC.2011.225.

[29] Markus Sabadello. Startup technology report. 2012. doi: http://pde.cc/2012/08/str201201/.

[30] Ravi Sandhu and Xinwen Zhang. Peer-to-peer access control architecture using trusted

computing technology. In Proceedings of the Tenth ACM Symposium on Access Control

Models and Technologies, SACMAT ’05, pages 147–158, New York, NY, USA, 2005. ACM.

ISBN 1-59593-045-0. doi: 10.1145/1063979.1064005. URL http://doi.acm.org/10.1145/

1063979.1064005.

[31] D. Shah. Gossip Algorithms. Foundations and trends in networking. Now Publishers, 2009.

ISBN 9781601982360. URL http://books.google.ee/books?id=EVBoyrxHp_wC.

[32] Jenny Nilsson (Karlstad University) Simone Fischer-Hubner (Karlstad Univer-

sity). Trust and assurance control – ui prototypes. PrimeLife, 2009. URL

http://primelife.ercim.eu/images/stories/deliverables/d4.2.1-trust_and_

assurance_ui_prototypes-public.pdf.

[33] Dave Raggett (W3C) Slim Trabelsi (SAP), Gregory Neven (IBM). Report on design and

implementation. PrimeLife, 2011. URL http://primelife.ercim.eu/images/stories/

deliverables/d5.3.4-report_on_design_and_implementation-public.pdf.

[34] R. Steinmetz and K. Wehrle. Peer-to-Peer Systems and Applications. Lecture Notes in Com-

puter Science / Information Systems and Applications, incl. Internet/Web, and HCI. Springer,

2005. ISBN 9783540291923. URL http://books.google.ee/books?id=A8CLZ1FB4qoC.

[35] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord:

A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the ACM

SIGCOMM ’01 Conference, San Diego, California, August 2001.

[36] Q. Tang. On using encryption techniques to enhance sticky policies enforcement. Techni-

cal Report TR-CTIT-08-64, Centre for Telematics and Information Technology University of

Twente, Enschede, 2008.

[37] Liang Wang and J. Kangasharju. Measuring large-scale distributed systems: case of bit-

torrent mainline dht. In Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International

Conference on, pages 1–10, Sept 2013. doi: 10.1109/P2P.2013.6688697.

84