Combining non-sensitive data revealing sensitive information
10.09.2015
1
COMBINING NON-SENSITIVE DATA REVEALING SENSITIVE INFORMATION A customer driven student project at NTNU
Seminar at AFIN (Avdeling for forvaltningsinformatikk), UIO,
10. September 2015
ACANDO IS ONE OF THE LEADING NORDIC MANAGEMENT AND IT CONSULTING COMPANIES
2014
• More than 1850 employees in 19 locations in eight countries
• Four areas of business:
• IT Management Consulting
• Management Consulting
• Enterprise Solutions
• Digital Consulting & Solutions
Company history:
• 1981: Frontec founded
• 1982: Resco founded
• 1999: Acando founded by Custos, Kinnevik, Orkla & Bank of America
• 2003: Merger of Acando and Frontec
• 2006: Merger with Resco and name change to Acando
• 2006: Acquisition of IQ Consultancy and formation of Acando UK
• 2007: Acquisition of Abeo A/S and formation of Acando Norge
• 2014: Acquisition of Connecta in Sweden
• 2014: Acquisition of e-vita in Norway
WE HAVE A STRONG PRESENCE IN NORTHERN EUROPE
Countries shown on the map: Norway, Sweden, Finland, Latvia, India
Offices shown on the map: Stockholm, Hamburg, Falun, Riga, Kristiansand, Västerås, München, Pori, Ålesund, Bangalore, Malmö, Ludvika, Göteborg, Stuttgart, Düsseldorf, Helsinki, Vantaa, Oslo, Frankfurt am Main, Trondheim, Braunschweig
Johan H. Gedde-Dahl, Advisor
[email protected]
Mobile: +47 97 13 86 46
Acando Norge, Tordenskioldsgate 8-10, 0160 Oslo, +47 93 00 10 00, www.acando.no
Johan is an experienced enterprise architect with more than 25 years of
experience from SpareBank 1, Nordea, Forsvaret and NETS. He has expertise in
translating business strategies into a working enterprise architecture, Master
Data Management and IT infrastructure. He has more than 10 years of
experience as a project leader.
Johan has a real commitment and the drive to succeed. He has extensive
experience with complex decision models and negotiations for large IT
contracts.
Johan is part of a strategic focus area in Acando involving Semantics, Analytics
and Big/Smart Data.
SOME AREAS OF INTEREST:
SEMANTIC SOLUTIONS
Is your data getting enough attention?
DATA MANAGEMENT: Ensure access to correct, well-understood data
of known quality, in the right place and at the
right time
DATA DOCUMENTATION: An overview of the data, correctly interpreted
COMMUNICATION: A uniform, shared language in verbal
and digital interaction.
Concept management.
RETRIEVAL, ANALYSIS & PREDICTION: Find the «needle in the haystack».
Knowledge-based decision support.
SEMANTIC SOLUTIONS
Concept management
− Delivers a concept management solution (Skattedirektoratet)
− Method and concept work (content) for several government agencies
Solution partner: Knowledge Integration
Information architecture: from business level to solutions
Information management, incl. architecture management
− Management and governance, incl. making data available
− Metadata
− Concepts
− Data quality
− Information security
Analysis & predictions («Big data»)
− source material with or without known/predefined meaning
− structured, unstructured and/or semi-structured information
Partnering with various actors based on needs/customer «case»
System development incl. «semantic web» solutions
The customer segment for these focus areas is
both the public and the private sector.
The solution strategy is needs-based and
not locked to a «technology stack».
STUDENT DRIVEN PROJECT AT NTNU
12 CHOSEN PROJECTS FOR TDT4290 – AUTUMN OF 2015 - OUT OF 48 APPLICATIONS
TDT4290 KPRO 2015 - project groups, customers and supervisors
Group 1 - room 054 (floor 0 IT-building). Akuttmottaket, St. Olav: Decision support tool for managing non-cooperative patients. Contact: Florentin Moser
Group 2 - room 454 (floor 4 IT-building). Oracle: GIS Query Tool. Contact: Norvald H. Ryeng
Group 3 - room 454 (floor 4 IT-building). iMaxfocus/Olympiatoppen: iMaxfocus Olympic. Contact: Sten Nilsen
Group 4 - room 454 (floor 4 IT-building). DataDrivenFinance AS: Data-Driven Claims Handling. Contact: Jan-Martin Hunderi
Group 5 - room 354 (floor 3 IT-building). Netlight AS: CrowdShelf. Contact: Peder Kongelf
Group 6 - room 242 (floor 2 IT-building). Nidaros Domkirkes Restaureringsarbeider: Cultural digital experiences in Nidaros Cathedral and the Archbishop's Palace. Contact: Inge Sørgård
Group 7 - room Applab (floor 0 IT-building). Seniornett Trondheim: Support for Senior Citizen participation in the Information Society. Contact: Arne Sølvberg
Group 8 - room Oasen (floor 1 IT-building). Thales Norway AS: Automated XMPP conformance testing. Contact: Christian Tellefsen
Group 9 - room 260 (floor 2 IT-building). Acando AS: Creating sensitive data from open sources. Contact: Lars Bjørnar Listhaug, Hjørdis Hoff, Johan H. Gedde-Dahl, John Arne Øye
Group 10 - room Oasen (floor 3 IT-building). Sintef Fisheries & Aquaculture: Virtual Reality 3D-Visualization. Contact: Jørgen Haavind Jensen
Group 11 - room Oasen (floor 4 IT-building). UNINETT, NTNU, Dokkhuset: User-friendly management of distributed networked musical and dance collaborations. Contact: Otto J Wittner, Leif Arne Rønningen
Group 12 - room Oasen (floor 2 IT-building). Syklistenes Landsforening: CountMe – Mobile app for counting cyclists and pedestrians. Contact: Richard Sanders
PURPOSE OF PROJECT – INTEGRATING NON-SENSITIVE DATA TO REVEAL SENSITIVE INFORMATION
Background:
More and more government agencies, companies and private individuals are publishing data
that is not sensitive in nature.
• Open data is available through service interfaces and as datasets published on web sites such as
DIFI: http://data.norge.no/.
• When data is made available to “anyone” as Open Data, the data owner has little or no control over how
the data is used.
• Public data can be misused.
Goal of project:
• Verify the hypothesis that integrating datasets and analyzing the combined data may
yield sensitive and/or “unwanted”/revealing information.
• How much (and what) open, non-sensitive data can be combined before it changes character from
plain data into sensitive information?
• Can open, non-sensitive, semantically enriched data become sensitive when combined/integrated with
other non-sensitive data and analyzed?
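A minimal Python sketch of the kind of linkage the hypothesis is about. Everything in it is invented for illustration: the two toy tables, their field names and the records are assumptions, not project data. Neither table is sensitive on its own, but joining them on shared attributes (postcode and birth year) ties a named person to a trade union membership.

```python
from collections import defaultdict

# Hypothetical open dataset A: named role holders (not sensitive on its own).
role_holders = [
    {"name": "Person A", "postcode": "7010", "birth_year": 1970},
    {"name": "Person B", "postcode": "7010", "birth_year": 1982},
    {"name": "Person C", "postcode": "0160", "birth_year": 1970},
]

# Hypothetical open dataset B: survey rows published without names.
survey_rows = [
    {"postcode": "7010", "birth_year": 1982, "union_member": True},
    {"postcode": "0160", "birth_year": 1970, "union_member": False},
]


def link(named, unnamed, keys):
    """Attach a name to each unnamed row whose key values match exactly one named row."""
    index = defaultdict(list)
    for row in named:
        index[tuple(row[k] for k in keys)].append(row)
    linked = []
    for row in unnamed:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # an unambiguous match re-identifies the row
            linked.append({**matches[0], **row})
    return linked


revealed = link(role_holders, survey_rows, ["postcode", "birth_year"])
for r in revealed:
    print(r["name"], "union member:", r["union_member"])
```

Only unambiguous matches are kept, so the number of linked rows gives a rough measure of how revealing a given combination of sources is.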
SOME MORE HYPOTHESES…
• Individuals in small communities are more easily identifiable than people in larger towns
• Celebrities and well-known individuals are more easily identifiable in large datasets than
ordinary people
• You need fewer than four different sources to identify an individual
• The intention of “Personopplysningsloven” (POL, the Norwegian Personal Data Act) is only met if you look at
one source in isolation
• The definition of “sensitive” information is not sufficient. It needs a context.
• Other???
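One way to probe the first and third hypotheses is to measure how many records become unique as attributes are combined. The sketch below is an assumption-laden illustration: the records are invented, and each attribute stands in for one data source.

```python
from collections import Counter


def unique_records(records, keys):
    """Return the records whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]


def unique_share(records, keys):
    """Share of records that are unique on `keys` (0.0 to 1.0)."""
    return len(unique_records(records, keys)) / len(records)


# Invented toy population: two people in a small municipality, four in a large one.
people = [
    {"municipality": "Small", "birth_year": 1970, "occupation": "teacher"},
    {"municipality": "Small", "birth_year": 1985, "occupation": "farmer"},
    {"municipality": "Large", "birth_year": 1970, "occupation": "teacher"},
    {"municipality": "Large", "birth_year": 1970, "occupation": "teacher"},
    {"municipality": "Large", "birth_year": 1985, "occupation": "nurse"},
    {"municipality": "Large", "birth_year": 1985, "occupation": "nurse"},
]

for keys in (["municipality"], ["municipality", "birth_year"]):
    print(keys, round(unique_share(people, keys), 2))
```

In this toy population nobody is unique on municipality alone, while adding birth year singles out exactly the two people in the small municipality.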
RESOURCES AVAILABLE TO THE PROJECT - IN AND FROM OTHER ORGANIZATIONS
• The Brønnøysund Register Centre is an example of a large government agency that holds large amounts of
information concerning a wide range of public and private life. Some of it is sensitive and protected,
and some is open and public, like data from the register for business enterprises / Enhetsregisteret.
− Data from Brønnøysundregistrene covering the last ten years is available on an FTP server, ready to use
− An update was made available on 7 September, including final information for 2014
• IBM has made available a portion of their "Bluemix" cloud service, ready to use
− Including the analysis platform Watson
• Norge.no
− A site with a lot of public information
• Blogs
• Other sources on the net
− … be creative ….
WHAT IS SENSITIVE INFORMATION?
According to Norwegian law (personopplysningsloven, the Personal Data Act):
§ 2. Definitions, item 8
- sensitive personal data is information about:
a) racial or ethnic background, or political,
philosophical or religious beliefs,
b) the fact that a person has been suspected of, charged with, indicted for or
convicted of a criminal offence,
c) health,
d) sex life,
e) trade union membership
• Other possible viewpoints on revealing
«unwanted» truths about someone:
− Privileged or proprietary information which, if
compromised through alteration, corruption, loss,
misuse, or unauthorized disclosure, could cause
serious harm to the organization owning it.
− Sensitive information is defined as information
that is protected against unwarranted disclosure
− Information that, if revealed, gives a
person/party an advantage over an individual
− Knowledge that might result in loss of an
advantage or level of security if disclosed to
others
Economic circumstances, relationships and friends, preferences, hobbies, etc. are all
non-sensitive according to Norwegian law...
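Because the legal definition is a closed list of categories, a check made attribute by attribute against that list can be sketched in a few lines of Python. The attribute names and their mapping to categories below are hypothetical; only the five categories come from the definition quoted above.

```python
from enum import Enum


class SensitiveCategory(Enum):
    """The five categories a) to e) of the quoted definition."""
    ETHNICITY_OR_BELIEF = "a"
    CRIMINAL_SUSPICION = "b"
    HEALTH = "c"
    SEX_LIFE = "d"
    UNION_MEMBERSHIP = "e"


# Hypothetical mapping from dataset attribute names to legal categories.
ATTRIBUTE_CATEGORY = {
    "religion": SensitiveCategory.ETHNICITY_OR_BELIEF,
    "conviction": SensitiveCategory.CRIMINAL_SUSPICION,
    "diagnosis": SensitiveCategory.HEALTH,
    "union_member": SensitiveCategory.UNION_MEMBERSHIP,
}


def legally_sensitive(attributes):
    """Return the attributes that fall under one of the listed categories."""
    return {a: ATTRIBUTE_CATEGORY[a] for a in attributes if a in ATTRIBUTE_CATEGORY}


columns = ["income", "friends", "hobbies", "diagnosis"]
flagged = legally_sensitive(columns)
print(sorted(flagged))
```

A check like this looks at one attribute at a time, so income, friends and hobbies pass unflagged and the check cannot see what a combination of them reveals.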
WHAT'S GOING ON IN THE MARKET
• Government agencies are measured on how much data they make available
− Kartverket
− Brønnøysundregistrene
− And all the others…
• The term "sensitive information" is not well defined in the context of the Internet
− The term is being (mis)used
− Our experience: government regulations are not sufficient when different datasets are combined
− example: "Personopplysningsloven"
− If person A can extract information about person B that gives person A an advantage:
− What kind of information is this, and how should it be exposed?
• There appears to be no common, holistic government plan for the information published by
government agencies
STAKEHOLDERS AND INCREASED FOCUS IN THE MARKET
• Who does this topic concern?
− Everyone - this should concern each and every one of us…
• Who can gain an advantage from information made available by government agencies?
• Data owners
− Can the data owner be held liable for making data available without assessing its potential use?
PROJECT DELIVERABLES AND DATES
November 18th: Final presentation day.
Recipe :
• 7 students
• 350 hours each
• One interesting topic
= Interesting findings!!!
WHY ACANDO?
COMPETENT: We are experts within our chosen industries and offerings
COMPREHENSIVE: Our offering covers all aspects of management and IT consulting
CAPACITY: We are one of the largest consultancy firms in the Nordics
CLOSE: We work close to our clients and have a strong presence in northern Europe
CULTURE: We value the Nordic business culture and its heritage and strongly believe
that our clients, business partners and employees thrive/excel in an environment where work is not just work but fun
COMBINING NON-SENSITIVE DATA REVEALING SENSITIVE INFORMATION A student project at NTNU
25. August 2015
Additional slides
EXAMPLES
CONCEPTS
WHAT IS A CONCEPT
• A concept is a unit of thought
• Corresponds to an object
• An object can be anything
• Concepts can have a definition
• Usually have a name (term)
• Optional alternative names
• Each term can have translations – one per language
• May be associated with symbols
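The properties listed above can be read as a small data structure. The Python sketch below is one possible reading, not a model taken from the slides: the class names, the language codes and the example concept are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """Name of a concept in one language, with optional alternative names."""
    preferred: str
    alternatives: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """A unit of thought: optional definition, one term per language, optional symbols."""
    identifier: str
    definition: Optional[str] = None
    terms: Dict[str, Term] = field(default_factory=dict)  # language code -> Term
    symbols: List[str] = field(default_factory=list)

    def add_term(self, language: str, preferred: str, alternatives=()) -> None:
        # Enforce "one per language".
        if language in self.terms:
            raise ValueError(f"term for language {language!r} already set")
        self.terms[language] = Term(preferred, list(alternatives))

    def label(self, language: str, fallback: str = "nb") -> Optional[str]:
        # Fall back to another language when no translation exists.
        term = self.terms.get(language) or self.terms.get(fallback)
        return term.preferred if term else None


# Invented example concept with a Norwegian and an English term.
concept = Concept("concept-001", definition="Illustrative definition text")
concept.add_term("nb", "enhet", alternatives=["virksomhet"])
concept.add_term("en", "entity")
print(concept.label("en"), concept.label("fr"))
```

Keying the terms by language code is a design choice that makes the "one per language" rule easy to enforce and gives a natural fallback when a translation is missing.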
PROPERTIES Example:
EXAMPLES: CONCEPTS FROM LÅNEKASSEN
Multilingual concepts.
English concepts do not seem to be available on the Lånekassen web site.
WHY IS A KNOWN AND WELL-DEFINED LANGUAGE IMPORTANT?
Examples of what a well-defined and known language can contribute to:
• communication becomes more efficient because we
speak the same language: the same name for the same thing
• the same message is conveyed and interpreted the same way every time
• access to the definitions behind the words we use (openness)
• support for multiple languages
• fewer misunderstandings and wrong decisions
• uniform metadata about digital content (structured, semi-structured, unstructured)
• a shorter path to relevant knowledge: accurate search and navigation
• linking relevant topics/interests to roles and/or users
http://www.vg.no/nyheter/innenriks/norsk-politikk/byraakratspraak-koster-300-millioner-i-aaret/a/23402276/
COMMUNICATION IS DIFFICULT! From NAV (Norwegian Labour and Welfare Administration)
(Snik = to sneak, or a «sneaker»)
Snik: “person som skaffer seg fordeler
ved å smiske og holde seg frampå” (Språkrådet)
My free translation: A person who acquires advantages by bootlicking and being pushy
Correct message:
Noen ganger er det ikke nødvendig å vente på oss. Vi
hjelper så fort vi kan, men iblant kan du raskt og enkelt
hjelpe deg selv. På nett.
My translation:
You don’t always have to wait for us. We will help you as
fast as we can, but sometimes you can quickly and easily
help yourself. Online.
EXAMPLES OF CONCEPT USAGE
• Home page
• Digital schema / Electronic dialog
• Digital information: look up, «tagging»
• Common dictionary
• Data
• Data exchange
• Search and discovery
• Analysis and prediction
• Common language
• Enterprise architecture
DOCUMENTED DATA
METADATA
20060701
Data type: Date
Format: YYYYMMDD
Label: Creation date
Class: Business organisation / Entity
Entity identifier: 990 983 291
Entity name: NAV ICT
Registration date: 20110912
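The raw value 20060701 only becomes a date once the metadata is applied. A minimal Python sketch of that step follows. The dictionary keys and the mapping from the format tokens to `strptime` directives are assumptions for illustration; the value, data type, format and label are taken from the slide.

```python
from datetime import date, datetime

# Assumed mapping from the metadata's format tokens to strptime directives.
FORMAT_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}


def to_strptime(fmt):
    """Translate a format such as 'YYYYMMDD' into '%Y%m%d'."""
    for token, directive in FORMAT_TOKENS.items():
        fmt = fmt.replace(token, directive)
    return fmt


def interpret(raw, metadata):
    """Interpret a raw string using its metadata; other types pass through unchanged."""
    if metadata["data_type"] == "Date":
        return datetime.strptime(raw, to_strptime(metadata["format"])).date()
    return raw


metadata = {"label": "Creation date", "data_type": "Date", "format": "YYYYMMDD"}
value = interpret("20060701", metadata)
print(metadata["label"], value.isoformat())
```

Without the metadata the same eight digits could just as well be read as an integer or an identifier, which is the point of documenting data.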
WHEN CORRECT DATA IS NOT ENOUGH
NEEDLE IN THE HAYSTACK
ANALYSIS, SEARCH AND DISCOVERY
FIND KNOWLEDGE/PATTERNS IN DATA SETS WHERE DATASETS MAY BE INTEGRATED
SEARCH AND ANALYSIS: UNCOVER INFORMATION