© 2011 IBM Corporation
Privacy by Design (PbD): Confessions of an Architect
Privacy by Design | Time to Take Control, Toronto, Canada
January 28th, 2011
Jeff Jonas, IBM Distinguished Engineer, Chief Scientist, IBM Entity Analytics
Background
Early 1980s: Founded Systems Research & Development (SRD), a custom software consultancy
1989 – 2003: Built numerous systems for Las Vegas casinos, including a technology known as Non-Obvious Relationship Awareness (NORA)
2005: IBM acquires SRD; now Chief Scientist of IBM Entity Analytics
Personally architected, designed and deployed roughly 100 systems, a number of which contained multiple billions of transactions describing hundreds of millions of entities
Selected Affiliations:
– EPIC, Member, Advisory Board
– Privacy International, Member, Advisory Board
– Markle Foundation, Member, Task Force on National Security in the Information Age
– Senior Associate, Center for Strategic and International Studies (CSIS)
A Late Bloomer to Privacy
1980 – 2001 No clue whatsoever
2001 – 2006 Slowly waking up
2007 – 2011 Today, at best, a student of privacy
A Journey Fraught with Reflection and Rethinking
The greater my privacy and civil liberties awareness,
the greater the number of imperfections that appear in the rearview mirror
Katrina – Missing Persons Reunification Project
Information about the status of persons quickly ends up scattered across countless databases
– Over 50 such web sites/organizations were identified as having victim-related data
– Many people were registered duplicate times in the same database
– Many people were registered duplicate times across databases
– Many people were registered as missing in one database and found in another database
Connecting found persons previously reported as missing becomes nearly impossible
– Too many databases
– Constantly changing data
Katrina Reunification Project Statistics
Total data sources 15
Usable records 1,570,000
Unique persons 36,815
Total loved ones reunited >100
Katrina – Missing Persons Reunification Project
Privacy by Design
– Contractually authorized to delete all the data after the reunification office completed its work
– Hence, a few months later, all collected data and reporting products were deleted
DESTRUCTION OF EVIDENCE!
Data Decommissioning – Destruction of Accountability
“G2”: My Skunk Works Project
G2: Sensemaking on Streams
1) Evaluate new information against previous information … as it arrives.
2) Determine if what is being observed is relevant.
3) Deliver this relevant, actionable insight fast enough to do something about it … as it’s happening.
4) Do this with sufficient accuracy and scale to really matter.
From Pixels to Pictures to Insight
Observations
Contextualization
Information in Context
Relevance
Consumer (an analyst, a system, the sensor itself, etc.)
G2: Sensemaking on Streams
Domain: People, organizations, places, things, events … proteins, asteroids, and more.
Will simultaneously commingle and make sense of structured, unstructured, biographic, biometric and geospatial data
Multi-lingual
Even curious: if it is unsure, it determines whether the question is worth researching and may choose to ask Google, or maybe even a Jeopardy champion, to clear up any confusion
Harnessing Big Data. New Physics.
More data: better the predictions
More data: bad data … good
More data: less compute
Smarter Planet: Example G2 Use Cases
Traffic optimization
– Route suggestions pushed to drivers, just-in-time, to avert significant traffic events
Optimize individual lives
– Search results optimized based on predictions about where you are going next
Pandemic response
– A nation able to work right through an extreme global pandemic with real-time citizen recommendations (e.g., “quarantine yourself!”)
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. ALTHOUGH EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM InfoSphere Sensemaking V1.1.0.0
Following two years of skunk works development guided by privacy by design goals …
it is just possible that more privacy and civil liberties enhancing capabilities were baked in, during conception and design, than in any other general purpose advanced analytics technology commercially available … on Earth … to date.
PbD: Full Attribution
ABOUT THE FEATURE
– Every record knows where it came from and when
– No merge/purge data survivorship processing
IMPORTANCE
– The Universal Declaration of Human Rights has four articles containing the word “arbitrary”; e.g., Article 9 reads “No one shall be subjected to arbitrary arrest, detention or exile.” If you don’t know where the data came from, how can a decision based on it be non-arbitrary?
– The ability to identify every original record is essential for reconciliation and audit
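A minimal sketch of full attribution, assuming a simple in-memory model; the class names, fields and source names below are hypothetical, not the product's API. Each original record stays immutable with its source and observation time, and an entity accumulates records instead of collapsing them into a single merge/purge "survivor".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRecord:
    """An original record, never edited: it knows where it came from and when."""
    source: str            # originating system (hypothetical names below)
    record_id: str         # the record's key in that source system
    observed_at: datetime  # when the record was received
    attributes: dict       # field values exactly as supplied by the source


@dataclass
class Entity:
    """A resolved entity that keeps every contributing record (no survivorship)."""
    records: list = field(default_factory=list)

    def add(self, rec: SourceRecord) -> None:
        self.records.append(rec)

    def values(self, name: str) -> list:
        """Every value asserted for an attribute, each paired with its provenance."""
        return [(r.attributes[name], r.source, r.record_id, r.observed_at)
                for r in self.records if name in r.attributes]


# Usage: two sources disagree on a phone number; both values are retained.
t = datetime(2011, 1, 28, tzinfo=timezone.utc)
person = Entity()
person.add(SourceRecord("crm", "C-17", t, {"phone": "340-900-9000"}))
person.add(SourceRecord("billing", "B-4", t, {"phone": "340-900-9001"}))
for value, source, rid, when in person.values("phone"):
    print(value, "from", source, rid, when.date())
```

Because nothing is overwritten, any downstream decision can be traced back to the exact source record that supported it.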
PbD: Data Tethering
ABOUT THE FEATURE
– Adds, changes and deletes from source systems can be processed
– Real-time, sub-second (not requiring periodic batch reloading)
IMPORTANCE
– Data currency in information sharing environments is important; e.g., when erroneous derogatory data is corrected in a source system, it is vital that the correction is made everywhere, immediately
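Data tethering can be sketched as applying each source-system event to a derived view as it arrives. The event shape (`add`/`change`/`delete` plus a source name and record key) and the phone index are assumptions made for illustration only.

```python
from collections import defaultdict


class TetheredView:
    """A derived copy kept current by source-system events instead of batch reloads."""

    def __init__(self):
        self.records = {}                 # (source, record_id) -> current attributes
        self.by_phone = defaultdict(set)  # hypothetical secondary index: phone -> keys

    def _unindex(self, key):
        """Remove whatever the previous version of the record contributed to the index."""
        old = self.records.get(key)
        if old and "phone" in old:
            keys = self.by_phone[old["phone"]]
            keys.discard(key)
            if not keys:
                del self.by_phone[old["phone"]]

    def apply(self, op, source, record_id, attributes=None):
        key = (source, record_id)
        if op in ("add", "change"):
            self._unindex(key)
            self.records[key] = dict(attributes)
            if "phone" in attributes:
                self.by_phone[attributes["phone"]].add(key)
        elif op == "delete":
            self._unindex(key)
            self.records.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")


view = TetheredView()
view.apply("add", "court", "R1", {"name": "Pat Smith", "phone": "340-900-9000", "flag": "warrant"})
view.apply("add", "court", "R2", {"name": "John T Smith", "phone": "703 111-2000"})
# The source corrects R1 (the derogatory flag was in error) and deletes R2.
view.apply("change", "court", "R1", {"name": "Pat Smith", "phone": "340-900-9000"})
view.apply("delete", "court", "R2")
```

The key design choice is that a change first withdraws everything derived from the old version, so an erroneous value cannot linger in any index after the source fixes it.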
PbD: Analytics on Anonymized Data
ABOUT THE FEATURE
– Owners of data can anonymize selected fields before an information transfer
– Despite the cryptographic form of the data, deep predictive analytics (including some fuzzy matching) can still be accomplished when fusing this data for discovery and analysis
IMPORTANCE
– With every copy of data, there is an increased risk of unintended disclosure
– Data anonymized before transfer and anonymized at rest reduces the risk of unintended disclosure
– And with full attribution, re-identification remains possible by design, to ensure reconciliation and audit
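The slides do not say which cryptographic method is used. The sketch below swaps in a common approach, keyed hashing (HMAC-SHA256) over normalized values, and assumes the data owner and the analytics system share a secret key. Emitting several tokens per person (here a hypothetical initial-plus-surname token) is one way some fuzzy matching can survive anonymization.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret agreed between the data owner and the analytics system.
SHARED_KEY = b"example-key-not-for-production"


def normalize(text: str) -> str:
    """Strip accents, case and punctuation so trivial variations hash identically."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents.casefold() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def token(value: str) -> str:
    """Keyed one-way hash: without SHARED_KEY, guesses cannot be checked against it."""
    return hmac.new(SHARED_KEY, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(first: str, last: str, phone: str) -> dict:
    """Replace selected fields with tokens before transfer; several tokens per person."""
    digits = "".join(c for c in phone if c.isdigit())
    return {
        "full_name": token(f"{first} {last}"),
        "initial_last": token(f"{first[:1]} {last}"),  # tolerates Pat vs. Patrick
        "phone": token(digits),
    }


def shared_tokens(a: dict, b: dict) -> set:
    """Fields on which two anonymized records agree, computed without seeing plaintext."""
    return {k for k in a if a[k] == b.get(k)}


a = anonymize("Patrick", "Smith", "340-900-9000")
b = anonymize("Pat", "Smith", "(340) 900-9000")
print(sorted(shared_tokens(a, b)))  # ['initial_last', 'phone']
```

Tokens only ever match exactly; the "fuzziness" comes from choosing which normalized variants to tokenize. Because the owner retains the original records, a token match can still be traced back to its source records for reconciliation and audit.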
PbD: Tamper Resistant Audit Logs
ABOUT THE FEATURE
– Who searches for what is logged in a consistent manner
– Even the database administrator cannot alter the evidence contained in this log
IMPORTANCE
– Every now and then, people with access and privileges look at records without a legitimate business purpose; e.g., an employee with access to a banking system looking up their neighbor
– Tamper resistant logs make it possible to audit user behavior and can have a chilling effect on misuse
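One common way to make an audit log tamper-evident is a hash chain, sketched below as an assumption rather than the product's actual design: each entry stores the hash of its predecessor, so altering an earlier entry breaks the links that follow. A privileged user could still rewrite the tail of the chain, so the sketch assumes the latest hash is periodically copied somewhere the administrator cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of who searched for what."""

    def __init__(self):
        self.entries = []

    def append(self, user: str, query: str, timestamp: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"user": user, "query": query, "ts": timestamp, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]  # latest head; assumed to be copied to an external store

    def verify(self) -> bool:
        """Recompute every link; an entry altered without rewriting all later ones fails."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("user", "query", "ts", "prev")}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append("teller42", "name=Smith street=Main", "2010-11-24T10:02:00Z")
log.append("teller42", "name=Jones", "2010-11-24T10:05:00Z")
ok_before = log.verify()
log.entries[0]["query"] = "name=Jones"  # someone quietly edits the first entry
ok_after = log.verify()
print(ok_before, ok_after)  # True False
```

The user name and queries are hypothetical; the point is that the edit is detectable even though the log itself was writable.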
PbD: False Negative Favoring Methods
EXISTING BEST PRACTICE
[Figure: Existing records 1 (Patrick T Smith, 340-900-9000) and 2 (Patricia Smith, 340-900-9000), one annotated “Student”; new record 3 (Pat T Smith, 340-900-9000) could belong to either (“??”). Existing practice assigns it to the closest candidate: “Closest. Hence, for sure.”]
PbD: False Negative Favoring Methods
ABOUT THE FEATURE
– A false negative occurs when something that is true is not detected
– Sometimes a new record could belong to either of two different entities
– Usually systems select the stronger of the two, but had there been only one choice, the record would have matched the other
– This is now properly handled, in real-time
IMPORTANCE
– If a new record gets arbitrarily assigned, you may have inadvertently created a false positive
– False positives can adversely affect people’s lives; e.g., the police find themselves knocking down the wrong door, or an innocent passenger is denied the ability to board a plane
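A minimal sketch of false negative favoring, using the Smith records from the slides. The scoring function and the 0.6 threshold are hypothetical; the point is the decision rule: if the new record would qualify against more than one entity on its own, it is left unresolved rather than handed to the strongest candidate.

```python
THRESHOLD = 0.6  # hypothetical: minimum score for a record to qualify as a match


def compatible(x: str, y: str) -> bool:
    """Name tokens agree if one is a prefix of the other (pat ~ patrick, t ~ t)."""
    return x.startswith(y) or y.startswith(x)


def similarity(a: dict, b: dict) -> float:
    """Toy score in [0, 1]: phones must agree, then count compatible name tokens."""
    if a["phone"] != b["phone"]:
        return 0.0
    ta, tb = a["name"].lower().split(), b["name"].lower().split()
    unused, hits = list(tb), 0
    for tok in ta:
        for i, other in enumerate(unused):
            if compatible(tok, other):
                hits += 1
                del unused[i]  # each token may be used once; break follows immediately
                break
    return hits / max(len(ta), len(tb))


def resolve_closest(new: dict, entities: dict):
    """The 'existing best practice': take the strongest qualifying candidate."""
    best_score, best_id = max((similarity(new, rec), eid) for eid, rec in entities.items())
    return best_id if best_score >= THRESHOLD else None


def resolve_favoring_false_negatives(new: dict, entities: dict):
    """Assign only when exactly one entity would match on its own; otherwise leave unresolved."""
    candidates = [eid for eid, rec in entities.items() if similarity(new, rec) >= THRESHOLD]
    return candidates[0] if len(candidates) == 1 else None


entities = {1: {"name": "Patrick T Smith", "phone": "340-900-9000"},
            2: {"name": "Patricia Smith", "phone": "340-900-9000"}}
new = {"name": "Pat T Smith", "phone": "340-900-9000"}
print(resolve_closest(new, entities), resolve_favoring_false_negatives(new, entities))  # 1 None
```

The closest-match rule confidently picks Patrick; the false negative favoring rule declines, because Patricia would also have qualified had she been the only candidate.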
PbD: False Negative Favoring Methods
NEW BEST PRACTICE
[Figure: The same records 1 (Patrick T Smith), 2 (Patricia Smith) and 3 (Pat T Smith), all 340-900-9000, one annotated “Student”. New record 3 would match record 1 or record 2 (“100%”, “100%”) had either been the only choice, so it is not arbitrarily assigned to the closer one.]
PbD: Self-Correcting False Positives
John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984
John T Smith, 123 Main Street, 703 111-2000, DL: 009900991
A plausible claim: these two people are the same …
Until this record comes into view:
John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991
Which reveals this is a FALSE POSITIVE
PbD: Self-Correcting False Positives
New Best Practice: FIXED IN REAL-TIME (not end of month)
[Figure: The same three records after the Sr record arrives. John T Smith Jr (DOB: 03/12/1984) stands alone, while John T Smith and John T Smith Sr, both 123 Main Street, 703 111-2000, DL: 009900991, are now resolved together.]
PbD: Self-Correcting False Positives
ABOUT THE FEATURE
– A false positive is an assertion (claim) that is made but is not true
– With every new data point presented, all prior assertions are re-evaluated to ensure they are still correct, and any that are now incorrect are repaired
– If two people were thought to be the same because they share the same name, address and phone, and it is later discovered they are a Jr and a Sr (two different people), this is now remedied
– In real-time, not at the end of the month
IMPORTANCE
– False positives can adversely affect people’s lives
– Without self-correcting false positives, databases start to drift from the truth and become visibly wrong, necessitating periodic reloading to fix this
– With periodic monthly reloading, wrong decisions remain possible all month until the next reload, even though the correcting information had already arrived
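The Jr/Sr example can be sketched by re-deriving every entity assertion whenever a record arrives. The rules here are assumptions: a shared driver's license number is strong evidence, while matching name, address and phone is weaker evidence that is vetoed by conflicting generational suffixes. The record layout and ids are hypothetical, and recomputing from scratch is used only for brevity; a production engine would presumably re-evaluate just the entities the new record touches.

```python
class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def same_contact(a: dict, b: dict) -> bool:
    return (a["name"], a["addr"], a["phone"]) == (b["name"], b["addr"], b["phone"])


def resolve_entities(records: list) -> list:
    """Re-derive every 'same person' assertion from all records seen so far."""
    ds = DisjointSet(len(records))
    # Strong evidence: a shared driver's license number means the same person.
    first_with_dl = {}
    for i, r in enumerate(records):
        if r.get("dl"):
            ds.union(i, first_with_dl.setdefault(r["dl"], i))

    def suffixes(root: int) -> set:
        """Generational suffixes (Jr, Sr, ...) known anywhere in a group."""
        return {r["suffix"] for j, r in enumerate(records)
                if r.get("suffix") and ds.find(j) == root}

    # Weaker evidence: same name, address and phone, unless suffixes conflict.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            ri, rj = ds.find(i), ds.find(j)
            if (ri != rj and same_contact(records[i], records[j])
                    and len(suffixes(ri) | suffixes(rj)) <= 1):
                ds.union(i, j)

    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(ds.find(i), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


base = {"name": "John T Smith", "addr": "123 Main Street", "phone": "703 111-2000"}
jr = {**base, "id": "A", "suffix": "Jr", "dob": "03/12/1984"}
plain = {**base, "id": "B", "dl": "009900991"}
sr = {**base, "id": "C", "suffix": "Sr", "dl": "009900991"}

before = resolve_entities([jr, plain])     # plausible claim: A and B are one person
after = resolve_entities([jr, plain, sr])  # Sr's license ties B to C, so A splits off
print(before, after)  # [['A', 'B']] [['A'], ['B', 'C']]
```

Nothing about the earlier A/B merge is stored as a fact; it is re-earned from the evidence each time, which is what lets the arrival of the Sr record undo it.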
PbD: Information Transfer Accounting
Basic Data
Name: Mark T Smith
Address: POB 1346
City: Seattle
Phone: (310) 555-0000
Tax ID: 556-99-9999
Balance: $361.43
PbD: Information Transfer Accounting
Who Looked
Date        Name          Why
01/09/2010  Ken Wales     Teller trans
11/24/2010  Susan Callie  Fraud invest
PbD: Information Transfer Accounting
Sent Where
Date        Sent to   Why
04/19/2010  ADP       Payroll synch
06/01/2010  Amex      Marketing alliance
07/16/2010  S&J Inc   Third party deal
12/31/2010  IRS       Annual compliance
PbD: Information Transfer Accounting
ABOUT THE FEATURE
– Can record who inspected each record, and store this with the record, much like a credit report lists the recent parties who have inquired
– Can record which records were transferred to secondary systems, allowing users to inspect information flows
IMPORTANCE
– It is often cumbersome to learn who has seen which records, or which records have been shared system-to-system
– Users can now easily be provided such disclosures, increasing transparency and control; e.g., the ability to recall or cancel information transfers from selected sharing partners
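A minimal sketch of information transfer accounting, assuming each record carries its own inquiry and transfer lists; the class, method names and record id are hypothetical, and the sample entries reuse the slide's data. Actually recalling data still depends on the receiving partner honoring the request; the code only identifies whom to ask.

```python
from dataclasses import dataclass, field


@dataclass
class AccountedRecord:
    """A record that carries its own disclosure history, like a credit report's inquiries."""
    record_id: str
    data: dict
    inquiries: list = field(default_factory=list)  # (date, who looked, why)
    transfers: list = field(default_factory=list)  # (date, sent to, why)

    def view(self, date: str, who: str, why: str) -> dict:
        self.inquiries.append((date, who, why))
        return dict(self.data)

    def send(self, date: str, recipient: str, why: str) -> dict:
        self.transfers.append((date, recipient, why))
        return dict(self.data)

    def disclosure_report(self) -> dict:
        """What a data subject could be shown: who looked, and where the record went."""
        return {"who_looked": list(self.inquiries), "sent_where": list(self.transfers)}

    def recipients_for(self, why: str) -> list:
        """Partners that received the record for a purpose, e.g. to ask them to recall it."""
        return sorted({to for _, to, purpose in self.transfers if purpose == why})


rec = AccountedRecord("cust-001", {"name": "Mark T Smith", "city": "Seattle"})
rec.view("01/09/2010", "Ken Wales", "Teller trans")
rec.view("11/24/2010", "Susan Callie", "Fraud invest")
rec.send("04/19/2010", "ADP", "Payroll synch")
rec.send("06/01/2010", "Amex", "Marketing alliance")
rec.send("07/16/2010", "S&J Inc", "Third party deal")
report = rec.disclosure_report()
print(rec.recipients_for("Marketing alliance"))  # ['Amex']
```

Routing every read and every send through the record itself is what makes the history complete; access paths that bypass these methods would not be accounted for.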
A Wide Range of Privacy by Design Features
Data Tethering                    By design
Analytics on Anonymized Data      By design
Tamper Resistant Audit Log        By design
Information Transfer Accounting   By design
Full Attribution                  Mandatory
False Negative Favoring           Mandatory
Self-Correcting False Positives   Mandatory
IBM InfoSphere Sensemaking V1.1.0.0
Smarter & More Responsible
IBM InfoSphere Sensemaking V1.1.0.0
Challenge
Try to find another general purpose advanced analytics technology with more privacy and civil liberties enhancing features baked in by design!
In this competition everyone wins.
And more like-minded, nifty features to come …
IBM InfoSphere Sensemaking V1.1.0.0
Date of availability: January 28th, 2011 (TODAY!)
~~ Caveat: Limited availability, subject to lab approval ~~
Related Reference Material
Big Data. New Physics.
Decommissioning Data: Destruction of Accountability
Source Attribution, Don’t Leave Home Without It
Data Tethering: Managing the Echo
Out-bound Record-level Accountability in Information Sharing Systems
To Anonymize or Not Anonymize, That is the Question
Immutable Audit Logs (IAL’s)
Big Data Flows vs. Wicked Leaks
Privacy-Enhancing Technology, State of the Union
Yesterday: Stand-alone privacy-enhancing technologies
– Exist
– If they cost extra, adoption is low and slow
– Some researchers wander off, placing attention elsewhere
Today: Privacy by Design
– Baked in
– No additional cost
– Some privacy and civil liberties enhancing functionality can even be embedded without an off switch
Finally …
Privacy by design is more than just technology.
Equal, if not more, attention must be placed on privacy by design when conceiving process and policy.