![Page 1: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/1.jpg)
SHARING HEALTH RESEARCH DATA
De-identification Methods & Experiences
Dr. Khaled El Emam
Electronic Health Information Laboratory
![Page 2: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/2.jpg)
Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
Motivations for De-identification
• Obtaining patient consent/authorization is not practical for large databases and introduces bias
• Compliance with regulations / legislation
• Contractual obligations
• Maintaining public / consumer / client trust
• Costs of breach notification
![Page 3: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/3.jpg)
![Page 4: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/4.jpg)
![Page 5: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/5.jpg)
![Page 6: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/6.jpg)
![Page 7: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/7.jpg)
A Balance
![Page 8: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/8.jpg)
Definition of De-identified Data
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.
![Page 9: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/9.jpg)
Re-identification Attacks
• Let’s clear up this issue at the outset
• There are some claims that health data are easy to re-identify
• Examples are often used to support that argument
• The evidence does not support these claims
  – When data are properly de-identified, the probability of a successful re-identification attack is very small
• Let’s consider a few highly publicized examples
![Page 10: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/10.jpg)
AOL
• AOL released search queries, replacing usernames with pseudonyms
• New York Times reporters re-identified user 4417749
• Her search terms included: “tea for good health”, “numb fingers”, “hand tremors”, “dry mouth”, “60 single men”, “dog that urinates on everything”, “landscapers in Lilburn, Ga”, “homes sold in shadow lake subdivision gwinnett county georgia”
• She was Thelma Arnold, a widow living in Lilburn, Ga., who has three dogs
![Page 11: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/11.jpg)
AOL?
• It is well known that a large percentage of individuals run ‘vanity’ searches that include their own names, and Thelma Arnold did
• It is also known that location information can be inferred from an individual’s search queries
• Search queries cannot be considered de-identified, even if the username is replaced with a pseudonym
![Page 12: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/12.jpg)
Weld
• Governor Weld of Massachusetts was unwell during a public appearance, and the story was covered in the media
• Semi-publicly available insurance claims data were matched with voter registration lists
• It was possible to determine which claims records belonged to the Governor
![Page 13: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/13.jpg)
Weld?
• This re-identification attack was done before HIPAA came into effect; the insurance claims data would not pass either of the HIPAA de-identification standards
• A recent analysis indicated that Weld was likely re-identified because he was a famous person and a lot of information about him was already in the media (his admission date, diagnosis, and discharge date); the voter registration list was arguably not necessary
• The success rate of such an attack would be lower for ordinary members of the public because the voter registration list is incomplete
![Page 14: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/14.jpg)
Netflix
• Netflix publicly released movie ratings data as part of a competition to develop a recommendation algorithm
• Researchers re-identified a couple of records by matching them with a publicly available, identifiable movie ratings database (IMDB)
• This resulted in the cancellation of a second competition and in litigation against Netflix for exposing personal information
![Page 15: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/15.jpg)
Netflix?
• The re-identifications were not actually verified by Netflix
• The authors of the attack acknowledge that the Netflix data were not de-identified (usernames were only replaced with pseudonyms)
• The false positive rate of the matching was not evaluated (how many people in the IMDB database were actually in the Netflix database?)
![Page 16: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/16.jpg)
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028071
![Page 17: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/17.jpg)
Attribute vs Identity Disclosure
• Attribute disclosure: discovering something new about an individual in the database without knowing which record belongs to that individual
• Identity disclosure: determining which record in the database belongs to a particular individual (for example, determining that record number 7 belongs to Bob Smith)
• HIPAA is concerned only with identity disclosure
![Page 18: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/18.jpg)
Attribute vs Identity Disclosure

| | HPV Vaccinated | NOT HPV Vaccinated |
|---|---|---|
| Religion A | 5 | 40 |
| Religion B | 40 | 5 |

Statistically significant relationship (chi-square, p<0.05): high risk of attribute disclosure
![Page 19: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/19.jpg)
![Page 20: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/20.jpg)
Attribute vs Identity Disclosure

| | HPV Vaccinated | NOT HPV Vaccinated |
|---|---|---|
| Religion A | 5 | 6 |
| Religion B | 6 | 5 |

After suppression: no statistically significant relationship (chi-square), so low risk of attribute disclosure
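The significance claims on the two tables above can be checked with a Pearson chi-square test on each 2x2 table. The following is a minimal sketch using only the Python standard library; it assumes no continuity correction (the slides do not say which variant was used), and the function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom,
    no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of religion and vaccination.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom chi2 = z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: Religion A, Religion B; columns: HPV vaccinated, not vaccinated.
before = [[5, 40], [40, 5]]
after = [[5, 6], [6, 5]]

for label, table in (("before suppression", before), ("after suppression", after)):
    chi2, p = chi_square_2x2(table)
    print(f"{label}: chi2={chi2:.2f}, p={p:.3g}, significant={p < 0.05}")
```

Note that every expected count in the second table is 5.5, close to the common rule-of-thumb minimum of 5, so an exact test would be a reasonable cross-check there.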
![Page 21: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/21.jpg)
Stigmatizing Analytics
![Page 22: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/22.jpg)
Definition of De-identified Data
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual
![Page 23: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/23.jpg)
Direct Identifiers
• Fields that would uniquely identify individuals in a database
• Name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, SSN, SIN, implanted device number
![Page 24: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/24.jpg)
Dealing with Direct Identifiers
• Defensible approaches:
  – Remove those fields
  – Convert them to one-time or persistent pseudonyms
  – Randomize the values
• If done properly, these approaches ensure that the probability of recovering the original value is very small
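The difference between one-time and persistent pseudonyms can be sketched as follows. This assumes a keyed hash (HMAC-SHA256) for persistent pseudonyms, which is one common choice rather than the method prescribed in the slides; a plain unkeyed hash of an MRN would be easy to reverse by hashing every plausible MRN. The key handling and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key held only by the data custodian; never released with the data.
SECRET_KEY = secrets.token_bytes(32)


def persistent_pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Same identifier + same key -> same pseudonym, so one patient's records
    stay linkable across extracts. Without the key, the mapping cannot be
    recomputed by hashing candidate identifiers."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def one_time_pseudonym() -> str:
    """A fresh random value on every call, so records cannot be linked at all."""
    return secrets.token_hex(16)


mrn = "MRN-0012345"  # hypothetical medical record number
print(persistent_pseudonym(mrn) == persistent_pseudonym(mrn))  # True
print(one_time_pseudonym(), one_time_pseudonym())
```

Persistent pseudonyms preserve longitudinal analysis at the cost of protecting the key; one-time pseudonyms avoid key management but break linkage between visits.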
![Page 25: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/25.jpg)
Quasi-Identifiers
• Sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, aboriginal identity, total years of schooling, marital status, criminal history, total income, visible minority status, activity difficulties/reductions, profession, event dates (such as admission, discharge, procedure, death, specimen collection, visit/encounter), codes (such as diagnosis codes, procedure codes, and adverse event codes), country of birth, birth weight, and birth plurality
![Page 26: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/26.jpg)
![Page 27: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/27.jpg)
![Page 28: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/28.jpg)
![Page 29: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/29.jpg)
![Page 30: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/30.jpg)
![Page 31: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/31.jpg)
![Page 32: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/32.jpg)
![Page 33: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/33.jpg)
Re-identification Risk Measurement
• Risk measurement will depend on:
  – Granularity of the quasi-identifiers
  – Region of the country in question
  – Risk metric used (e.g., uniqueness or groups of 5)
  – Threshold for what is acceptable risk
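The risk metrics listed above can all be computed from the sizes of equivalence classes, i.e. groups of records that share the same quasi-identifier values. This is a minimal sketch assuming records are Python dicts and that each record's re-identification probability is 1 / (size of its class); `risk_summary` and the toy records are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_summary(records, quasi_identifiers, k=5):
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    n = len(records)
    unique = sum(size for size in sizes.values() if size == 1)
    below_k = sum(size for size in sizes.values() if size < k)
    return {
        "pct_unique": 100.0 * unique / n,                  # uniqueness metric
        "pct_in_groups_smaller_than_k": 100.0 * below_k / n,  # "groups of k" metric
        "max_risk": 1.0 / min(sizes.values()),             # worst-case record
        # Mean of 1/size over records == number of classes / number of records.
        "average_risk": len(sizes) / n,
    }


# Hypothetical toy data, not from any real data set.
records = [
    {"sex": "M", "age": 55, "zip3": "112"},
    {"sex": "F", "age": 53, "zip3": "114"},
    {"sex": "F", "age": 53, "zip3": "114"},
    {"sex": "M", "age": 24, "zip3": "134"},
    {"sex": "M", "age": 24, "zip3": "134"},
    {"sex": "M", "age": 24, "zip3": "134"},
]
print(risk_summary(records, ["sex", "age", "zip3"]))
```

Which of these numbers is compared against the threshold is exactly the "risk metric" choice in the list above; the same data can pass one metric and fail another.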
![Page 34: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/34.jpg)
De-identification Standards
• The HIPAA Privacy Rule specifies two de-identification standards (45 CFR 164.514):
  – Safe Harbor
  – Statistical method (also known as the expert statistician method)
![Page 35: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/35.jpg)
HIPAA Safe Harbor
Safe Harbor Direct Identifiers and Quasi-identifiers
1. Names
2. ZIP Codes (except the first three digits)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
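Two items in the list (ZIP codes and dates) are generalizations rather than outright removals. The sketch below shows one way to apply them. It also reflects two details of the Safe Harbor text that the slide omits: a three-digit ZIP prefix must be replaced with 000 when the area it covers has 20,000 or fewer people, and ages over 89 must be aggregated into a single "90 or older" category. The function names and the `restricted_zip3` set (which a real implementation would derive from current Census data) are hypothetical.

```python
from datetime import date


def safe_harbor_zip(zip_code: str, restricted_zip3: set[str]) -> str:
    """Keep only the first three digits, or '000' if that prefix covers
    20,000 or fewer people (the caller supplies those prefixes)."""
    zip3 = zip_code[:3]
    return "000" if zip3 in restricted_zip3 else zip3


def safe_harbor_date(d: date) -> int:
    """Drop every element of the date except the year."""
    return d.year


def safe_harbor_age(age: int) -> str:
    """Aggregate ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


restricted = {"999"}  # hypothetical placeholder, not real Census-derived prefixes
print(safe_harbor_zip("11215", restricted), safe_harbor_date(date(2007, 3, 14)), safe_harbor_age(93))
```

This covers only three of the eighteen items; the remaining identifiers must be removed outright.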
![Page 36: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/36.jpg)
![Page 37: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/37.jpg)
Two Problems with Safe Harbor
• It may remove too much information from the ZIP code and date fields, which are useful for many analytical purposes
• It does not provide adequate protection: it is easy to have a Safe Harbor compliant data set with a high risk of re-identification
![Page 38: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/38.jpg)
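High Risk Safe Harbor Data - I
• If the adversary knows that Bob, a 55-year-old male, is in the database:

| Gender | Age | ZIP | Lab Test |
|---|---|---|---|
| M | 55 | 112 | Albumin, Serum |
| F | 53 | 114 | Alkaline Phosphatase |
| M | 24 | 134 | Creatine Kinase |

• Only one record is a 55-year-old male, so the adversary learns that Bob's record is the first one, including his lab test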
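The attack on this table amounts to filtering records by what the adversary already knows about Bob; if exactly one record matches, the probability of correct re-identification is 1. A minimal sketch using the three records from the slide (the function name and field names are illustrative):

```python
# The three records from the slide (ZIP is the three-digit ZIP prefix).
records = [
    {"gender": "M", "age": 55, "zip3": "112", "lab_test": "Albumin, Serum"},
    {"gender": "F", "age": 53, "zip3": "114", "lab_test": "Alkaline Phosphatase"},
    {"gender": "M", "age": 24, "zip3": "134", "lab_test": "Creatine Kinase"},
]


def matching_records(records, known):
    """Return the records consistent with everything the adversary knows."""
    return [r for r in records if all(r[field] == value for field, value in known.items())]


bob = {"gender": "M", "age": 55}  # adversary's background knowledge
matches = matching_records(records, bob)
# Probability of picking Bob's record = 1 / number of matching records.
print(len(matches), matches[0]["lab_test"])  # 1 Albumin, Serum
```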
![Page 39: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/39.jpg)
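High Risk Safe Harbor Data - II
• 2.24 million visits by 1.6 million patients in New York discharge data for 2007
• Compliant with Safe Harbor

| Fields | % of patients unique |
|---|---|
| age, gender, ZIP3 | 2.54% |
| age, gender, ZIP3, LOS | 21.49% |

Adding length of stay (LOS) to the quasi-identifiers raises the share of unique patients from 2.54% to 21.49%.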
![Page 40: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/40.jpg)
Statistical Method Conditions
• A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
  I. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
  II. Documents the methods and results of the analysis that justify such determination
![Page 41: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/41.jpg)
Re-identification Risk Spectrum
![Page 42: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/42.jpg)
Overall Risk
![Page 43: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/43.jpg)
Overall Risk
![Page 44: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/44.jpg)
Overall Risk
![Page 45: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/45.jpg)
Overall Risk
![Page 46: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/46.jpg)
Overall Risk
![Page 47: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/47.jpg)
Overall Risk
![Page 48: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/48.jpg)
Managing Re-identification Risk
![Page 49: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/49.jpg)
Different Types of Data Releases
• The same data set can be disclosed with different thresholds:
  – Public data set
  – Release with conditions to known data recipients, including the requirement to sign a data sharing agreement, a prohibition on re-identification, and a requirement to pass these conditions on to all sub-contractors
  – The more conditions imposed, the higher the quality of the data set that can be released
![Page 50: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/50.jpg)
Example – CA Hospital Discharges
• Context: data release to a researcher who will sign a data use agreement and follows good practices for managing sensitive health information
• There were ~2.1 million patients who had ~3 million visits
• Risk threshold = 0.2, using the average risk across all patients
• Variables:
  – Year of birth
  – Gender
  – Year of admission
  – Days since last visit
  – Length of stay
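With average risk as the metric, each record's risk is 1 divided by the size of its equivalence class, so the average across all records equals the number of equivalence classes divided by the number of records; a threshold of 0.2 therefore corresponds to an average class size of 5. The sketch below widens year-of-birth bands until that threshold is met. It is a simple stand-in for a generalization-hierarchy search, not the method used in this example, and it assumes hypothetical toy data with only year of birth and gender (not the CA data set).

```python
from collections import Counter


def average_risk(records, quasi_identifiers):
    """Mean of 1/class size over all records == number of classes / number of records."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return len(classes) / len(records)


def generalize_year(year, width):
    """Map a year to the start of its band, e.g. width=10: 1957 -> 1950."""
    return year - year % width


def find_band_width(records, threshold=0.2, widths=(1, 5, 10, 20)):
    """Return the narrowest band width (and its risk) that meets the threshold."""
    for width in widths:
        generalized = [{**r, "yob": generalize_year(r["yob"], width)} for r in records]
        risk = average_risk(generalized, ["yob", "gender"])
        if risk <= threshold:
            return width, risk
    return None, None


# Hypothetical toy data: 40 records, years of birth 1950-1969, alternating gender.
records = [{"yob": 1950 + i % 20, "gender": "M" if i % 2 == 0 else "F"} for i in range(40)]
print(average_risk(records, ["yob", "gender"]))
print(find_band_width(records))
```

Trying the narrowest bands first keeps as much detail as possible; a real hierarchy search would consider all quasi-identifiers jointly and might also suppress outliers.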
![Page 51: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/51.jpg)
Risk Level
![Page 52: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/52.jpg)
Hierarchy
![Page 53: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/53.jpg)
De-identified Data
![Page 54: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/54.jpg)
Key Practical Considerations
• Data warehouses: de-identifying data extracts, instead of whole data warehouses, results in higher quality de-identified data
• Beware of correlated data: data in multiple medical domains are correlated, so one has to be aware of inference attacks on the data
• Automation: automation can detect outliers and perform selective suppression, which results in higher quality de-identified data
• Transparency: it is important to ensure that methods have received peer and regulator scrutiny
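Selective suppression can be illustrated with a deliberately crude sketch: records in equivalence classes smaller than k are treated as outliers and their quasi-identifier values are blanked, while other fields are kept. This is an assumption-laden simplification, not the automated method the slide refers to; practical tools suppress far fewer cells, and the suppressed records can themselves form a small class. The function name and data are hypothetical.

```python
from collections import Counter


def suppress_small_classes(records, quasi_identifiers, k=5):
    """Set the quasi-identifier values to None for every record whose
    equivalence class has fewer than k members.
    Returns (new_records, number_of_records_suppressed)."""

    def class_key(r):
        return tuple(r[q] for q in quasi_identifiers)

    sizes = Counter(class_key(r) for r in records)
    out, n_suppressed = [], 0
    for r in records:
        if sizes[class_key(r)] < k:
            # Copy the record so the input list is not modified.
            r = {**r, **{q: None for q in quasi_identifiers}}
            n_suppressed += 1
        out.append(r)
    return out, n_suppressed


# Hypothetical toy data.
records = [
    {"sex": "M", "age": 55, "zip3": "112", "lab_test": "Albumin, Serum"},
    {"sex": "F", "age": 53, "zip3": "114", "lab_test": "Alkaline Phosphatase"},
    {"sex": "F", "age": 53, "zip3": "114", "lab_test": "Creatine Kinase"},
]
suppressed, n = suppress_small_classes(records, ["sex", "age", "zip3"], k=2)
print(n, suppressed[0])
```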
![Page 55: Sharing Health Research Data](https://reader035.fdocuments.us/reader035/viewer/2022070320/55867c49d8b42a5c6e8b4650/html5/thumbnails/55.jpg)
Contact
@kelemam
www.ehealthinformation.ca