Friday lunchtime lecture: Open data - the dark side


  • 8/13/2019 Friday lunchtime lecture: Open data - the dark side

    Open Data: the Dark Side

    Alan Patrick

    @freecloud

    January 2014


    broadsight Copyright Broadsight Ltd 2

    (Dis)Contents

    Original Sins

    Whose Data is it anyway?

    Open Data from a hacker's point of view

    Spear Phishing, and other things Bad Guys will do

    The Politics of Data

    Some Solutions


    (Dis)Claimer

    Open Data usage is like any new technology applied to our lives: it can be used for good or ill.

    History shows us that in the early days of any new online technology's life, over-optimism about benefits is always rife.

    History also shows us that the Dark Side is nearly always underestimated.

    My aim today is to show that the Dark Side of Open Data is real, serious, and underestimated - and could cause a major backlash.


    (Dis)Course

    "Those who cannot remember the past are doomed to repeat it" (George Santayana)


    (Dis)Missed

    "History is a pack of lies about events that never happened told by people who weren't there"

    (George Santayana)


    The Original Sin of the Internet

    The Original Sin of the Internet was to assume all the Bad Guys would be on the Outside

    "the possibility that we may do bad things with computer code was simply not considered. Thus, from the very beginning, the world of computing and the Internet was based on imperfections, flaws and sometimes poorly understood processes"

    (Cybercrime & warfare, Warren & Streeter)


    The Original Sin of Open Data?

    There is a worrying assumption that Open Data will only be used by well-intentioned people to deliver helpful services

    "the possibility that we may do bad things with Open Data was simply not considered. Thus, from the very beginning, the world of Open Data over the Internet was based on imperfections, flaws and sometimes poorly understood processes"

    (Open Data crime & warfare, Broadsight Review, 2020)


    "imperfections, flaws and sometimes poorly understood processes"

    A realistic look at Open Data

    Provenance: Much Open Data is taken from sources far removed in purpose, context and time from its eventual re-use. Few [data] were created with open public usage in mind.

    Practices: In order to use data accurately, one needs to understand the practices that created that data.

    Propriety: Use of the data can destroy public trust as it is removed from the shared social experience it originated in.

    Processes: Substantial problems for use cannot be avoided. (TBD)

    (Center for Technology in Government, SUNY, Albany, 2012)


    Whose data is it, anyway?


    How to guarantee losing the good will of all your data suppliers:

    Go over the heads of the data suppliers - take people's very private data and try and open it up without asking them first

    Argue that the ends justify the means without showing any understanding of the asymmetric risks your data suppliers are facing with the means

    Dissemble about the commercial arrangements, and constraints to control or penalise malpractice

    Finally give in and consult people only when many campaigning groups are mobilising

    Wider sharing of medical data has large benefits, but it also has large risks, and glossing over that loses trust


    If you scan Social Media, there is now a high degree of scepticism

    What do most people think is going to happen?

    The benefits will be private, the losses public.

    Records will not be responsibly used or carefully looked after by private companies - will they really put privacy before profit?

    People will not be compensated for collateral damage from any data leakage, abuse or errors

    A large majority of people say they are going to opt out.

    Many still believe that even opted-out data will be sold, stolen or accidentally lost on a train, or dumped onto the internet


    A Hacker's point of view

    History tells us any potential goldmine will be mined.

    Triangulation of Open Data sources

    Buy other data for triangulation

    Open Data has arrived together with Big Computing

    Which side are all the sharpest knives on?

    It's a read/write game.


    The Eternal Triangulation - data finds data, and then it finds you

    How a graduate student, Latanya Sweeney, de-anonymised anonymised health data from the Massachusetts Group Insurance Commission (GIC) in 1997:

    Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes.

    $20 bought the complete voter rolls of Cambridge, Mass. - a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter.

    Only six people in Cambridge shared his birth date, only three of them men, and of them, only he lived in his ZIP code.

    In a theatrical flourish, Dr. Sweeney sent the Governor's health records (which included diagnoses and prescriptions) to his office.
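
The linkage in this story can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dr. Sweeney's actual method or data: every record and name below is made up, and the `link` helper is hypothetical. It joins an "anonymised" dataset to a named one on ZIP code, birth date and sex, and keeps only the matches that point to exactly one person.

```python
from collections import defaultdict

# Entirely made-up records for illustration; no real people or data.
# "Anonymised" health data: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition B"},
    {"zip": "02138", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "condition C"},
]

# A purchasable list that carries names alongside the same fields.
voter_roll = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Four", "zip": "02138", "birth_date": "1962-05-17", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Tuple of quasi-identifier values used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, identified):
    """Return (name, anonymised_record) pairs where exactly one person matches."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for record in anonymised:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # ambiguous or missing matches are skipped
            matches.append((candidates[0], record))
    return matches


reidentified = link(health_records, voter_roll)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this toy data the second health record stays ambiguous because two voters share its ZIP code, birth date and sex; the other two records each resolve to a single name.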


    The Eternal Triangulation... is eternal

    In 2000, Dr Sweeney showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.

    Little has changed... if anything, it's worse now

    "...this anonymization process is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone's actions has a good chance of matching identifiable public records."

    (Pete Warden, O'Reilly Strata, 2011, quoting Arvind Narayanan, professor of computer science at Princeton.)
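
A rough way to see why three fields are enough is to count how many records become unique as fields are combined. The sketch below assumes a made-up five-person population and a hypothetical helper, `unique_fraction`; it does not reproduce the 87 percent figure, which came from US census data.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    if not records:
        return 0.0
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


# Made-up toy population, for illustration only.
people = [
    {"zip": "10001", "birth_date": "1980-03-02", "sex": "F"},
    {"zip": "10001", "birth_date": "1980-03-02", "sex": "M"},
    {"zip": "10001", "birth_date": "1975-11-30", "sex": "F"},
    {"zip": "10002", "birth_date": "1980-03-02", "sex": "F"},
    {"zip": "10002", "birth_date": "1980-03-02", "sex": "F"},
]

by_sex = unique_fraction(people, ("sex",))
by_zip_sex = unique_fraction(people, ("zip", "sex"))
by_all_three = unique_fraction(people, ("zip", "birth_date", "sex"))
print(by_sex, by_zip_sex, by_all_three)
```

Each field on its own identifies almost nobody; the share of unique records rises once all three are combined.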


    Spear Phishing and other things Bad Guys do

    "People with bad intentions are going to send you incredibly attractive offers"

    (Jeff Jonas, Chief Scientist, IBM Entity Analytics)


    They are going to triangulate you from various data sources and send you very believable scripts based on that very personal data:

    Hobbies, Location, Lifestyle, Worries, Friends and acquaintances, People you trust

    "Hi Mr Patrick. This is the Doctor's Surgery. Re your examination last week for Man Flu, we thought you might like to read this: www.innocentwebname.org"


    GAME OVER!


    It's not just Bad Guys.

    Cocktail Party, 2020, 20/20 vision


    ...all you need is people who collect lots of data, and share it as a business model

    Google Glasses, 2020 Vision

    iPhone at same hotel last week, same time

    iPhone at hotel Tue. last week

    Charged for fraud last year, not guilty but the community is dubious...

    Social Media stream says 89% probability he's gay and BNP

    Her shopping data shows she has 80% chance of being pregnant

    Medical records show her husband is infertile. No fertility treatment program recorded


    Far fetched?

    The infidelity App map: How iPhone can secretly keep track on love cheats (Daily Mail, 2011. Researchers found that they could get stored location data out of iPhones if they knew the phone numbers)

    How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did (Forbes, 2012. Target stores' algorithms identify pregnant girl)

    Gay? Conservative? High IQ? Your Facebook 'likes' can reveal traits (NBC, 2013. University of Cambridge's Psychometrics Centre algorithms.)

    BNP member? Beware the Man in the Middle with a Mission! (Guardian, 2009 - BNP membership list appears on Wikileaks)

    The other 2 cases are hypothetical, but could come from Government data already in the frame for being opened up


    Even Good Guys cause problems! The road to hell is always paved with good intentions - and bad business cases

    "We are making a strong case for the release of an Open National Address Dataset. We know that many of you within the data, business and public sector communities support this call, as do many individual citizens" (data.gov.uk)

    As one commentator on the blog post pointed out:

    "There is no analysis of the disbenefits to the householders of have every fly by night marketing companies having their addresses or it use by fraudsters or identity theft."

    ...dodgy grammar, but spot-on analysis. The Original Sin writ large.


    The politics of Data

    "the infographic will be the new stump speech, questioning the data will be the new rebuttal"

    (Alistair Croll, O'Reilly Data blog)


    The politics of Open Data

    "the more informed that strong political partisans were (about global warming), the less they agreed with each other"

    (Nate Silver, The Signal and the Noise, quoting a paper from Nature)

    3 main politically driven forces:

    Available data is seldom the whole story, but pressure groups will use it like a lamp-post when looking for car keys (searching only where the light is)

    Some of the data will fuel politically contested issues

    The data itself will become political (Who collected it? How accurate is it? Whose agenda does it serve?) and debased


    The politics of Open Crime Data

    Crime Mapping - what you measure, gets undone

    Inaccurate data: In December 2011, Surrey Street in Portsmouth was reported as having 136 crimes, when in fact it had just two.

    ...breeds inaccurate data: Direct Line Insurance in the same year found that 11% of respondents claim to have seen but not reported an incident because they feared it would make it more difficult to rent or sell their house.

    ...and politically unacceptable data: A service called "Ghetto Tracker" appeared online at the beginning of this week (USA, Sep 2013) and quickly drew criticism for its racist and classist overtones... but the service, renamed, remains

    ...and ultimately: Some communities in the US are starting to resist using crime mapping owing to the above dynamics.


    The realpolitik of Open Data (I)

    The influential are gaining more influence.

    A recent study of who uses the British mySociety TheyWorkForYou.com open government initiative found that:

    "people above the age of 54 tend to be over-represented, while those younger than 45 are under-represented in comparison to the Internet population. In terms of demographics there is a strong male bias and a strong overrepresentation of people with a university degree that also translates into strong participation from high income groups."


    The realpolitik of Open Data (II)

    Open (Government) Data is "what modern deregulation looks like"

    The current transparency agenda [of the UK government, supported by prominent Open data advocates] should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer.

    Further, whilst democratic ends are claimed in the desire to enable the public to hold the state to account via these measures, there is an issue in utilising a dichotomy between the state and a notion of the public which does not differentiate between citizens and commercial interests

    (Jo Bates, This is what modern deregulation looks like, 2012, Manchester Metropolitan University)


    Next Steps


    The Downside Case

    Get the data out, we will deal with the problems later

    1. The combination of enthusiasts who see no problems, and commercial interests who intend to make money from the causes of the problems, will ensure data will get out without adequate protections

    2. The people who experience the problems will have little redress initially, but resistance will increase via social media channels

    3. There will be scandals, lessons will be learned, but little will be done

    4. ...until there is one scandal too many, and too many people will have been damaged, and the pressure to Do Something will be unavoidable.

    5. Finally there will be (over) regulation, an OfData will be formed, and it will all settle down to business as usual


    Working for an Upside Case

    1. Accept there is an Original Sin problem: design for Bad Guys in the Architecture in the systems, regulations and economics of Open Data.

    2. Take strong steps to prevent hacking: highly secure reference databases, strong anti-hacking capability, screen data for triangulation issues.

    3. Know whose data it is: seek permission from data owners for its use, and ensure the taxpayer is not funding private profits, nor on the hook for losses.

    4. Toll booths on the roads paved with good intentions - Streamlining legal action on those whose data misuse caused the damage would force planning for hacking and misuse into the service fabric from the get-go

    5. Governance of Open Data: Oversight by publicly accountable bodies, and regulation of commercial data practices before Pandora's Box is opened. There is a case for an OfData sooner rather than later.
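
One possible reading of "screen data for triangulation issues" in step 2 is a pre-release check that no combination of quasi-identifiers singles out a small group. The sketch below uses a k-anonymity style test, which is a technique assumed here and not named in the talk; the helpers `generalise`, `smallest_group` and `passes_screen`, the field names and the records are all hypothetical.

```python
from collections import Counter


def generalise(record):
    """Coarsen quasi-identifiers: 3-character ZIP prefix and birth year only."""
    return {
        "zip": record["zip"][:3] + "**",
        "birth_year": record["birth_date"][:4],
        "sex": record["sex"],
    }


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same `fields` values."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values()) if counts else 0


def passes_screen(records, fields, k):
    """True if every combination of `fields` is shared by at least k records."""
    return smallest_group(records, fields) >= k


# Made-up release candidate, for illustration only.
raw = [
    {"zip": "10001", "birth_date": "1980-03-02", "sex": "F"},
    {"zip": "10002", "birth_date": "1980-07-19", "sex": "F"},
    {"zip": "10003", "birth_date": "1975-11-30", "sex": "M"},
    {"zip": "10004", "birth_date": "1975-01-12", "sex": "M"},
]
coarse = [generalise(r) for r in raw]

raw_ok = passes_screen(raw, ("zip", "birth_date", "sex"), k=2)
coarse_ok = passes_screen(coarse, ("zip", "birth_date", "sex".replace("sex", "sex")), k=2) if False else passes_screen(coarse, ("zip", "birth_year", "sex"), k=2)
print(raw_ok, coarse_ok)
```

A check like this is only a first filter: it says nothing about what other datasets an attacker can buy or already holds, which is the harder part of the triangulation problem.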


    Appropriate Technologies

    Different rules for different Tiers

    Tier 1: Data with no public interest implications

    Tier 2: Data with public interest implications

    Tier 3: Data with public interest implications that includes personal information

    (Final report on Open Data Dialogue - Research Councils UK)
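
The three tiers could be encoded as a simple classification step applied before any release decision. This is a sketch only: the `Tier` enum and `classify` function are hypothetical, and the handling of personal data that has no stated public interest implications is an assumption, since the slide does not define that case.

```python
from enum import IntEnum


class Tier(IntEnum):
    NO_PUBLIC_INTEREST = 1        # Tier 1
    PUBLIC_INTEREST = 2           # Tier 2
    PUBLIC_INTEREST_PERSONAL = 3  # Tier 3


def classify(public_interest: bool, personal_info: bool) -> Tier:
    """Map two yes/no judgements about a dataset onto the three tiers."""
    # Assumption: any personal information is treated as Tier 3, the
    # strictest tier, even if no public interest implication was flagged.
    if personal_info:
        return Tier.PUBLIC_INTEREST_PERSONAL
    if public_interest:
        return Tier.PUBLIC_INTEREST
    return Tier.NO_PUBLIC_INTEREST


print(classify(public_interest=True, personal_info=True).name)
```

Using an ordered enum means release rules can be written as comparisons, for example requiring extra sign-off for anything at or above Tier 2.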


    What can you do as individuals?

    1. Be Vigilant: The pressure to release private data will be across the board; Tier 3 data is the gold everyone wants.

    2. Be Prepared: It will be good, responsible citizens who will bear the brunt of the mistakes and misdemeanours as they are easier to hack and have assets. Good people will need to start to generate bad data.

    3. Opt Out - where you have a choice, and demand a choice where you can't

    4. Agitate: Take action against plans that look unwise or downright foolhardy; use Social Media especially to do so.

    5. Get involved in pushing for a good outcome - organisations are springing up to lobby for citizens' digital rights in the UK and globally.