Addressing Exploitability of Smart City Data

Addressing exploitability of Smart City data. Enrico Daga, Mathieu d’Aquin, Alessandro Adamou, Enrico Motta. Data Science Group, Knowledge Media Institute, The Open University, Milton Keynes (UK). Feedback: @enridaga @datasciencegr #kmiou. September 13th, 2016, Trento (Italy). IEEE International Smart Cities Conference (ISC2). http://events.unitn.it/en/isc2-2016

Transcript of Addressing Exploitability of Smart City Data

Page 1: Addressing Exploitability of Smart City Data

Addressing exploitability of Smart City data

Enrico Daga, Mathieu d’Aquin, Alessandro Adamou, Enrico Motta

Data Science Group, Knowledge Media Institute, The Open University, Milton Keynes (UK)

Feedback: @enridaga @datasciencegr #kmiou

September 13th, 2016 - Trento (Italy)

IEEE International Smart Cities Conference (ISC2)

http://events.unitn.it/en/isc2-2016

Page 2: Addressing Exploitability of Smart City Data


Smart Bins to make garbage collection more efficient

Monitor parking spaces to support citizens’ mobility

Observe busyness of places to better tune services

Forecast car accidents to improve drivers’ awareness

MK:Smart is an integrated innovation and support programme leveraging large-scale city data to drive growth in Milton Keynes (UK) [1].

Smart City data

https://datahub.mksmart.org

Data Hub life cycle: Onboarding, Acquisition, Processing, Delivery.

It is a loop!


Page 3: Addressing Exploitability of Smart City Data

Top MK!

Top MK is a virtual card-playing game where each card represents a ward in Milton Keynes, with characteristics such as area, population, level of qualifications, etc. Two players, one human and the other automatic, try to win the other’s cards by choosing the characteristic that has the best chance to win against the other card.

https://data.beta.mksmart.org/apps/topmk/

Page 4: Addressing Exploitability of Smart City Data

The problem of exploitability

• Data come from different owners and have different licenses.
• Data are processed into new data before being reused.
• What are the policies that apply to the output data?
• Can we make use of it in a commercial setting?

Could Top Trumps sell this game?

"Data exploitability" is the assessment of the policies associated with the data resulting from the computation of diverse datasets in complex data flows.

Page 5: Addressing Exploitability of Smart City Data

Under the hood - 1/5

The Entity-Centric API (ECApi) offers an entity-based access point to the information offered by the Data Hub [2].

https://data.mksmart.org/entity/ward/newport_pagnell_north

{
  "global:religion": [{
    "global:sikh": ["16"],
    "global:no_religion": ["2323"],
    ...
  }],
  "global:maritalStatus": [{
    "global:in_a_registered_same-sex_civil_partnership": ["11"],
    "global:married": ["3290"],
    ...
  }],
  "global:economicActivity": [{
    "global:unemployed:_never_worked": ["15"],
    "global:unemployed:_age_50_to_74": ["33"],
    "global:in_employment": ["3785"],
    "global:unemployed:_age_16_to_24": ["48"],
    "global:long-term_unemployed": ["49"],
    ...
  }],
  "global:percentInBasicSkills": [{
    "global:literacy_level_1": ["47.41344196"],
    "global:literacy_level_2": ["46.23217923"],
    "global:numeracy_level_1_2.5percentci": ["18.13034623"],
    "global:numeracy_level_1": ["32.38289206"],
    ...
  }],
  "global:peopleInAgeGroups": [{
    "global:age_85_to_89": ["152"],
    "global:age_20_to_24": ["393"],
    ...
  }],
  "global:qualifications": [{
    "global:full-time_students:_age_18_to_74:_economically_inactive": ["61"],
    "global:highest_level_of_qualification:_level_4_qualifications_and_above": ["1413"],
    "global:highest_level_of_qualification:_level_1_qualifications": ["1042"],
    "global:highest_level_of_qualification:_level_3_qualifications": ["794"],
    "global:highest_level_of_qualification:_level_2_qualifications": ["1050"],
    "global:full-time_students:_age_18_to_74:_economically_active:_unemployed": ["17"],
    "global:highest_level_of_qualification:_apprenticeship": ["327"],
    "global:highest_level_of_qualification:_other_qualifications": ["271"],
    "global:full-time_students:_age_18_to_74:_economically_active:_in_employment": ["84"],
    "global:no_qualifications": ["1167"],
    "global:schoolchildren_and_full-time_students:_age_18_and_over": ["163"],
    "global:schoolchildren_and_full-time_students:_age_16_to_17": ["165"],
    "global:all_usual_residents_aged_16_and_over": ["6064"]
  }]
}

(Some logic here)

Entity-Centric API (ECApi)
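The nested shape of the ECApi response (attribute groups holding lists of objects, with every value a string wrapped in a list) can be read with a few lines of Python. This is only a sketch: the sample is a small excerpt of the response shown on this slide, and the helper names `attribute_values` and `share` are hypothetical, not part of the ECApi.

```python
import json

# Excerpt of the ward record shown on the slide (values are strings in lists).
SAMPLE = """
{
  "global:qualifications": [{
    "global:highest_level_of_qualification:_level_4_qualifications_and_above": ["1413"],
    "global:no_qualifications": ["1167"],
    "global:all_usual_residents_aged_16_and_over": ["6064"]
  }]
}
"""


def attribute_values(entity, group):
    """Flatten one attribute group into a {attribute: float} mapping."""
    flat = {}
    for block in entity.get(group, []):
        for name, values in block.items():
            if values:  # skip attributes with an empty value list
                flat[name] = float(values[0])
    return flat


def share(entity, group, attribute, total):
    """Ratio between one attribute and the group's total attribute."""
    values = attribute_values(entity, group)
    return values[attribute] / values[total]


ward = json.loads(SAMPLE)
quals = attribute_values(ward, "global:qualifications")
level4_share = share(
    ward,
    "global:qualifications",
    "global:highest_level_of_qualification:_level_4_qualifications_and_above",
    "global:all_usual_residents_aged_16_and_over",
)
print(round(level4_share, 3))
```

A card characteristic such as "level of qualifications" in Top MK could plausibly be derived from a ratio of this kind, although the deck does not say how the game computes it.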

Page 6: Addressing Exploitability of Smart City Data

Under the hood - 2/5: Provenance

The Data Hub offers a provenance access point that exposes the metadata of the datasets, including ownership and licenses.

https://data.mksmart.org/entity/ward/newport_pagnell_north.prov

{
  "dataset": "urn:census/ks501-qualification",
  "description": {
    "global:owner": ["Milton Keynes Council"],
    "global:title": ["Census 2011 - Qualifications in Milton Keynes' wards"],
    "global:uuid": ["3f6c6107-835c-45ee-b8b4-83c2099b4084"],
    "global:issued": ["2015-10-12 19:18:36"],
    "global:distribution": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/distribution/3527333636"],
    "global:modified": ["2016-09-06 12:03:14"],
    "global:type": ["http://data.mksmart.org/entity/thing/www:uri/www.w3.org/ns/dcat#Dataset"],
    "global:format": ["CSV"],
    "global:landingPage": ["http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:homepage": ["https://datahub.mksmart.org/dataset/census-2011-qualifications-in-milton-keynes-wards/"],
    "global:name": ["census-2011-qualifications-in-milton-keynes-wards"],
    "global:attribution": [""],
    "global:policy": ["http://data.mksmart.org/entity/policy/open-government-license"],
    "@id": "urn:census/ks501-qualification",
    "global:api": ["https://datahub.mksmart.org/data-catalogue-api/?action=dataset&name=census-2011-qualifications-in-milton-keynes-wards"]
  },
  "attributes": [
    "global:qualifications/global:all_usual_residents_aged_16_and_over",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_in_employment",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_active:_unemployed",
    "global:qualifications/global:full-time_students:_age_18_to_74:_economically_inactive",
    …
  ]
},

“global:qualifications” attributes come from the "Census 2011 - Qualifications in Milton Keynes' wards" dataset, distributed under the Open Government License.
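To illustrate how a provenance record links an attribute back to its dataset, owner and policy, here is a minimal Python sketch. It assumes the `.prov` endpoint returns a list of records shaped like the one on this slide (the trailing comma in the excerpt suggests a list, but the deck does not confirm it); `sources_of` is a hypothetical helper and the record is trimmed to the fields it needs.

```python
# One provenance record, trimmed from the slide's example.
PROVENANCE = [
    {
        "dataset": "urn:census/ks501-qualification",
        "description": {
            "global:owner": ["Milton Keynes Council"],
            "global:policy": [
                "http://data.mksmart.org/entity/policy/open-government-license"
            ],
        },
        "attributes": [
            "global:qualifications/global:all_usual_residents_aged_16_and_over",
        ],
    },
]


def sources_of(provenance, attribute_path):
    """Return (dataset, owner, policy) for each record that lists the attribute."""
    hits = []
    for record in provenance:
        if attribute_path not in record.get("attributes", []):
            continue
        description = record.get("description", {})
        owners = description.get("global:owner", [])
        owner = owners[0] if owners else None
        for policy in description.get("global:policy", []):
            hits.append((record["dataset"], owner, policy))
    return hits


for dataset, owner, policy in sources_of(
    PROVENANCE,
    "global:qualifications/global:all_usual_residents_aged_16_and_over",
):
    print(dataset, owner, policy)
```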

Page 7: Addressing Exploitability of Smart City Data

Under the hood - 3/5: License

Licenses are described as machine-readable policies: permissions, prohibitions or duties [3].

http://data.mksmart.org/entity/policy/open-government-license

{
  "global:type": ["http://data.mksmart.org/entity/thing/www:uri/datahub.mksmart.org/ns/schema/RedistributionPolicy"],
  "global:landingPage": [
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.mksmart.org/policy/open-government-license/",
    "http://data.mksmart.org/entity/thing/www:uri/https://datahub.beta.mksmart.org/policy/open-government-license/"
  ],
  "global:description": [""],
  "global:title": ["Open Government License"],
  "global:homepage": [
    "https://datahub.beta.mksmart.org/policy/open-government-license/",
    "https://datahub.mksmart.org/policy/open-government-license/"
  ],
  "global:name": ["open-government-license"],
  "global:api": [
    "https://datahub.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license",
    "https://datahub.beta.mksmart.org/data-catalogue-api/?action=policy&id=open-government-license"
  ],
  "global:permission": [
    "http://data.mksmart.org/entity/thing/www:uri/permission:publish-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:redistribute-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:copy-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:reproduce-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:combine-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:adapt-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:transmit-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:extract-1441",
    "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441"
  ]
}

Good news: this is the OGL, so the data can be used in commercial applications.
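Checking whether a policy grants a given permission can be sketched in Python as follows. The sketch assumes that the action name is the part of each permission URI between `permission:` and the trailing numeric suffix (for example `-1441`), which is inferred from the example on this slide rather than documented; `permitted_actions` and `allows` are hypothetical helpers, and the policy is trimmed to a few of its permissions.

```python
import re

# Excerpt of the Open Government License policy record from the slide.
POLICY = {
    "global:title": ["Open Government License"],
    "global:permission": [
        "http://data.mksmart.org/entity/thing/www:uri/permission:use-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:commercialize-1441",
        "http://data.mksmart.org/entity/thing/www:uri/permission:derive-1441",
    ],
}

# Assumption: "permission:<action>-<digits>" at the end of the URI.
_ACTION = re.compile(r"permission:([A-Za-z_-]+?)-\d+$")


def permitted_actions(policy):
    """Collect the action names found in the policy's permission URIs."""
    actions = set()
    for uri in policy.get("global:permission", []):
        match = _ACTION.search(uri)
        if match:
            actions.add(match.group(1))
    return actions


def allows(policy, action):
    """True when the policy lists a permission for the given action."""
    return action in permitted_actions(policy)


print(allows(POLICY, "commercialize"))
```

Note that the policy record spells the action `commercialize`, so a lookup has to use that spelling.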

Page 8: Addressing Exploitability of Smart City Data

Under the hood - 4/5: Data flow

Data flows can be represented with the Datanode ontology [4] as graphs of data “nodes”.

(The logic here)

http://purl.org/datanode/ns/

http://purl.org/datanode/docs/

This is the semantics behind the code!
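A data flow described as a graph of data nodes can be sketched as a list of (source, relation, target) triples. In the Python sketch below the node names are hypothetical, and "processed into" is the only relation label taken from this deck; a real description would use the relations defined by the Datanode ontology.

```python
from collections import deque

# Hypothetical flow: a census dataset feeds a ward entity, which feeds the game.
FLOW = [
    ("census_dataset", "processed into", "ward_entity"),
    ("ward_entity", "processed into", "topmk_cards"),
]


def downstream(flow, start):
    """Breadth-first list of the nodes reachable from `start`."""
    reached = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source, _relation, target in flow:
            if source == node and target not in visited:
                visited.add(target)
                reached.append(target)
                queue.append(target)
    return reached


print(downstream(FLOW, "census_dataset"))
```

Keeping the relation label on every edge matters because policy propagation depends on the kind of relation between two nodes, not only on their connectivity.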

Page 9: Addressing Exploitability of Smart City Data

Under the hood - 5/5: Reasoning on Policy Propagation

Machine-readable policies and data flows allow us to reason on policy propagation, exploiting Policy Propagation Rules (PPR) [5].

https://github.com/enridaga/pprreasoner/

Provenance and License:
has(dataset1,permission:commercialise)
has(dataset1,duty:attribution)

Data flow:
relation(node23,node16,processed into)

Policy Propagation Rule:
propagates(permission:commercialise,processed into)

Rule engine:
has(X,P) ∧ propagates(P,R) ∧ relation(R,X,Y) → has(Y,P)

Propagated policies:
has(output, duty:attribution)
has(output, permission:commercialise)

These are the policies of the output data!
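The rule has(X,P) ∧ propagates(P,R) ∧ relation(R,X,Y) → has(Y,P) can be applied by simple forward chaining until no new facts appear. The Python sketch below is an illustration, not the PPR reasoner linked on this slide: the function `propagate` is hypothetical, the flow is reduced to a single assumed edge from `dataset1` to `output`, and the propagation rule for `duty:attribution` is an assumption added so that the result matches the propagated policies shown on the slide.

```python
def propagate(has, propagates, relations):
    """Forward-chain has(X,P) & propagates(P,R) & relation(R,X,Y) -> has(Y,P)."""
    derived = set(has)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for relation, source, target in relations:
            for node, policy in list(derived):
                if (
                    node == source
                    and (policy, relation) in propagates
                    and (target, policy) not in derived
                ):
                    derived.add((target, policy))
                    changed = True
    return derived


# Facts from provenance and license: has(node, policy).
HAS = {
    ("dataset1", "permission:commercialise"),
    ("dataset1", "duty:attribution"),
}
# Policy Propagation Rules: propagates(policy, relation).
PROPAGATES = {
    ("permission:commercialise", "processed into"),
    ("duty:attribution", "processed into"),  # assumed, not shown on the slide
}
# Data flow: relation(relation, source, target); a single assumed edge.
RELATIONS = [("processed into", "dataset1", "output")]

result = propagate(HAS, PROPAGATES, RELATIONS)
for node, policy in sorted(result):
    if node == "output":
        print(f"has({node}, {policy})")
```

A policy with no matching propagation rule for a relation simply stops at that edge, which is how a permission can be lost along a data flow while a duty is carried through.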

Page 10: Addressing Exploitability of Smart City Data

The problem of exploitability (reprise)

Could Top Trumps sell this game?

Yes (but they must include attribution statements).

How can we make it work at scale?

• Represent the diversity of datasets, licenses and data flows
• Support developers in the assessment of policies associated with the data and how they affect their data flows

Page 11: Addressing Exploitability of Smart City Data

Metadata Supply Chain - 1/2: Approach

Data cataloguing as the backbone of data governance. Follow the journey of the data and trace the semantics, respecting the diversity of datasets, licenses and data flows.

[Diagram: the (Meta)data Catalogue collects Record, Content, Data flow and Provenance metadata along the Onboarding, Acquisition, Processing and Delivery stages.]

Onboarding: Set up a catalogue record of the data source.

Acquisition: Extract content metadata (timeliness, validity, …).

Processing: Describe the data flow. Reason on policy propagation.

Delivery: Provide provenance information.

Page 12: Addressing Exploitability of Smart City Data

Metadata Supply Chain - 2/2: Evaluation (can we really do that?)

Considering a given set of assumptions (details in the paper…):

Onboarding
• Data provider specifies a single License
• Same License for any user
• License is described in the catalogue
• License policies are referenced by Policy Propagation Rules

Acquisition
• Data source is accessible
• Acquisition processes respect the data source License

Processing
• Data flows can be described with Datanode
• ETL pipelines do not violate the policies
• Process executions do not influence policy propagation

Delivery
• Data flow descriptions and License policies enable reasoning on policy propagation
• End-user access methods provide provenance information

An end-to-end solution for exploitability assessment can be implemented.

Page 13: Addressing Exploitability of Smart City Data

Lessons learnt

• Assessing exploitability of smart city data is possible following a holistic approach to data cataloguing:
  • understanding the semantics of data flows;
  • understanding the role of policies (licences).

• New open challenges:
  • Handle the diversity of policies and consequently the size of Policy Propagation Rules [3].
  • Support data providers in the selection of the right license [6].
  • Support developers in the definition of data flows [7].
  • Integrate validation of propagated policies [8].
  • Integrate validation of data flows with respect to policies.
  • Reasoning with process execution traces (not only at design time).

• We need an end-user evaluation “in the wild”.

Page 14: Addressing Exploitability of Smart City Data


Thank you


https://dsg.kmi.open.ac.uk/data-exploitability-how-to-achieve-it/

Page 15: Addressing Exploitability of Smart City Data

References

[1] M. d’Aquin, J. Davies, and E. Motta. Smart cities’ data: Challenges and opportunities for semantic technologies. Internet Computing, IEEE, 19(6):66–70, 2015.

[2] A. Adamou and M. d’Aquin. On requirements for federated data integration as a compilation process. In Proceedings of the 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES), pages 75–80, 2015.

[3] Open Digital Rights Language (ODRL) Version 2.1. https://www.w3.org/ns/odrl/2/ODRL21 (accessed 09/09/2016).

[4] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Describing semantic web applications through relations between data nodes. Technical Report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, 2014.

[5] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. Propagation of policies in rich data flows. In Proceedings of the 8th International Conference on Knowledge Capture, page 5. ACM, 2015.

[6] E. Daga, M. d’Aquin, E. Motta, and A. Gangemi. A bottom-up approach for licences classification and selection. In 2015 Workshop on Legal Domain And Semantic Web Applications (LeDA-SWAn 2015), 1 June 2015, Portoroz, Slovenia.

[7] E. Daga, M. d’Aquin, A. Gangemi, and E. Motta. An incremental learning method to support the annotation of workflows with data-to-data relations. In 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 19-23 November 2016 (accepted).

[8] H.-P. Lam and G. Governatori. The making of SPINdle. In A. Paschke, G. Governatori, and J. Hall, editors, Proc. RuleML’09, pages 315–322. Springer-Verlag, 2009.