Validating EMR Audit Automation Carl A. Gunter University of Illinois Accountable Systems Workshop.

Post on 14-Dec-2015

Validating EMR Audit Automation

Carl A. Gunter, University of Illinois

Accountable Systems Workshop

Situation

• Access to hospital Electronic Medical Record (EMR) data carries a risk of high loss in the event of false negatives (incorrect refusal of access).

– Example: a doctor acting on an emergency cannot get access to a list of allergies.

• Hospital has highly trained personnel in whom much trust is vested.

Consequences

• Hospital access systems give liberal access to records, relying on accountability.

• Insider threats are serious and abuses are widely documented.

• Accesses are too numerous to review manually by experts.

• Automated support is required.

Root Problem Statement

Ideal Approach

• Obvious approach: develop an anomaly detector (AD) with rules and train classifiers on bad and good accesses.

• Run the AD on the audit logs and investigate positives manually with domain experts.

Problem

• This requires considerable dependence on experts.

• Assumes experts know how to provide labels.

• Assumes experts can formulate rules.

• Assumes labeled training sets exist and that researchers will be able to get access to them.

Validation Problem Statement

• The primary validation approach applied by researchers in this area can be called the Random Object Access Model (ROAM).

• ROAM is based on the premise that anomalous users and accesses look random.

• Strategy

– Develop rules and train a classifier on a real data set augmented with synthetic random users and accesses.

– Test the ability to recognize random users or accesses.
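The ROAM augmentation step can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the function name `inject_random_users`, the dictionary layout of the access log, and the toy user and patient identifiers are all assumptions made for the example.

```python
import random

def inject_random_users(access_log, patient_ids, n_fake, accesses_per_user, rng):
    """Add synthetic users whose accesses are drawn uniformly at random.

    access_log maps a user id to the list of patient ids that user accessed
    (a hypothetical layout). Returns (augmented_log, labels) where labels
    marks real users with 0 and synthetic random users with 1.
    """
    # Copy the real log so the original data is left untouched.
    augmented = {user: list(patients) for user, patients in access_log.items()}
    labels = {user: 0 for user in access_log}
    for i in range(n_fake):
        fake_id = f"synthetic_{i}"
        augmented[fake_id] = [rng.choice(patient_ids) for _ in range(accesses_per_user)]
        labels[fake_id] = 1
    return augmented, labels

# Toy data, invented for illustration only.
rng = random.Random(0)
real_log = {"nurse_a": ["p1", "p2", "p1"], "dr_b": ["p3", "p3", "p4"]}
patients = ["p1", "p2", "p3", "p4", "p5", "p6"]
augmented, labels = inject_random_users(
    real_log, patients, n_fake=2, accesses_per_user=3, rng=rng
)
```

A detector would then be scored on how well it separates the users labeled 1 from those labeled 0, which is exactly the assumption the ROAM assessment questions.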

Primary Validation Approach

Pro

• It is likely that illegitimate accesses appear random.

• A good ROAM classifier prepares for expert review to identify false positives.

• ROAM classifier may find legitimate but interesting hospital information flows.

• Provides a ready testing strategy reminiscent of “fuzzing”.

Con

• There is no current quantified evidence that random accesses and illegitimate accesses have strong overlap.

• Indeed, there is evidence that in some cases legitimate accesses look random.

• Some illegitimate accesses may be systematic in ways that defy detection by ROAM classifiers.

ROAM Assessment

• What are the prospects for alternative models?

• Example: introduce specific attacks experienced “in the wild”, similar to network traces enriched with known attacks.

• Another idea: look at problems like masquerading and open terminals.

• Behaviors are not random, but may display learnable characteristics.

Beyond ROAMing

Explored an alternative validation model based on topic classification. Idea:

• Patients are “documents” and diagnoses, drugs, etc. are their “words”.

• Use Latent Dirichlet Allocation (LDA) to learn topics that can be used to classify patients.

• Use this to characterize users as readers of documents.

• Detect unusual readers.

• Detect readers of random topics.

Modeling and Detecting Anomalous Topic Access, Siddharth Gupta, Casey Hanson, Carl A. Gunter, Mario Frank, David Liebovitz, and Bradley Malin. IEEE Intelligence and Security Informatics, June 2013.
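One way to picture "users as readers of documents" is to average the topic distributions of the patients a user has accessed. The sketch below assumes the per-patient topic distributions have already been learned by LDA; the averaging rule, the function name `user_topic_profile`, and the toy numbers are assumptions for illustration and are not taken from the cited paper.

```python
def user_topic_profile(accessed_patients, patient_topics):
    """Average the topic distributions of the patients a user accessed.

    patient_topics maps a patient id to a topic distribution (a list of
    non-negative weights that sums to 1), assumed to come from LDA.
    """
    if not accessed_patients:
        raise ValueError("user has no accesses to summarize")
    n_topics = len(next(iter(patient_topics.values())))
    profile = [0.0] * n_topics
    for patient_id in accessed_patients:
        for topic, weight in enumerate(patient_topics[patient_id]):
            profile[topic] += weight
    # Dividing by the number of accesses keeps the profile a distribution.
    return [weight / len(accessed_patients) for weight in profile]

# Hypothetical three-topic distributions (e.g. neoplasm, obstetric, kidney).
patient_topics = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.6, 0.3, 0.1],
    "p3": [0.1, 0.1, 0.8],
}
oncology_reader = user_topic_profile(["p1", "p2"], patient_topics)
```

Under this representation each user becomes a point in topic space, so an unusual reader is a point far from the other users in the same role.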

Random Topic Access Model (RTAM)

Topic Distributions

Diagnosis Topics

[Figure: example diagnosis topics. Panels: Neoplasm Topic, Obstetric Topic, Kidney Topic.]

Multidimensional Scaling: Patient Diagnosis

RTAM: Random Users

• r ~ Dir(α) with n dimensions, where n is the number of topics.

a.) Direct or Masquerading User (α<1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.

b.) Purely Random User (α=1): user is characterized by completely random behavior, with little semantic congruence to the hospital setting.

c.) Indirect User: user type resembles an even blend of the topics of many specialized users.
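Drawing the three user types from a symmetric Dirichlet distribution can be sketched with the standard library alone, using the fact that normalized Gamma draws are Dirichlet distributed. The slide gives α<1 and α=1 for the first two types; using α>1 for the indirect user is an assumption made here because large α yields evenly blended vectors. The specific values 0.2 and 5.0, the topic count, and the helper names are hypothetical.

```python
import random

def sample_dirichlet(alpha, n_topics, rng):
    """Draw one vector from a symmetric Dirichlet(alpha) over n_topics.

    Uses the standard construction: independent Gamma(alpha, 1) draws
    divided by their sum.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(alpha, n_topics, n_users, rng):
    """Average weight of the dominant topic over n_users synthetic users."""
    shares = (max(sample_dirichlet(alpha, n_topics, rng)) for _ in range(n_users))
    return sum(shares) / n_users

rng = random.Random(42)
N_TOPICS = 5  # hypothetical number of topics

masquerading = sample_dirichlet(0.2, N_TOPICS, rng)   # alpha < 1: mass on few topics
purely_random = sample_dirichlet(1.0, N_TOPICS, rng)  # alpha = 1: uniform over the simplex
indirect = sample_dirichlet(5.0, N_TOPICS, rng)       # alpha > 1 (assumed): even blend
```

Comparing `mean_largest_share` for small and large α shows the intended contrast: small α produces users dominated by one topic, like a specialist at someone else's terminal, while large α spreads weight across topics.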

• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme based on k-nearest neighbors (k-NN) for classification.

• Methodology

1. LDA: define patient topics, and user typing to represent users in the topic space.

2. RTA user injection: generate three types of anomalous users and insert into each role at a 5% mix rate.

3. Detection (k-NN): if the ratio of the avg. distance from a user to its k nearest spatial neighbors to the avg. pairwise distance among those neighbors is greater than a threshold, call the user anomalous.

4. Evaluation Metric: best Area Under the Curve (AUC) for each , role combination.
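The detection and evaluation steps above can be sketched as follows. The ratio follows the description in step 3; Euclidean distance, the choice k=3, the rank-based AUC computation, and the toy topic vectors are assumptions for illustration rather than details confirmed by the slides.

```python
import math

def knn_ratio_score(user, others, k):
    """Avg. distance from user to its k nearest neighbors, divided by the
    avg. pairwise distance among those neighbors. Larger means more isolated."""
    if k < 2 or len(others) < k:
        raise ValueError("need k >= 2 and at least k other users")
    neighbors = sorted(others, key=lambda o: math.dist(user, o))[:k]
    avg_to_neighbors = sum(math.dist(user, n) for n in neighbors) / k
    pairs = [(a, b) for i, a in enumerate(neighbors) for b in neighbors[i + 1:]]
    avg_pairwise = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    if avg_pairwise == 0.0:
        raise ValueError("neighbors coincide; ratio is undefined")
    return avg_to_neighbors / avg_pairwise

def auc(scores, labels):
    """Probability that a random anomalous user outscores a random normal
    user, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Four users of one role clustered in a 3-topic space, plus one injected user.
users = [
    (0.80, 0.10, 0.10),
    (0.70, 0.20, 0.10),
    (0.75, 0.15, 0.10),
    (0.70, 0.10, 0.20),
    (0.10, 0.10, 0.80),  # injected synthetic user
]
labels = [0, 0, 0, 0, 1]
scores = [
    knn_ratio_score(u, users[:i] + users[i + 1:], k=3)
    for i, u in enumerate(users)
]
role_auc = auc(scores, labels)
```

In this toy role the injected user receives the highest score, so any threshold between its score and the rest separates it cleanly. Sweeping the threshold over real data is what produces the AUC reported per role.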

Random Topic Access Detection (RTAD)

Results - I

The best AUC across all evaluated dimensions is plotted for each role performing poorly for .

Results - II

The best AUC across all evaluated dimensions is plotted for each role performing well or near average for .

• Other strategies besides ROAM may capture new types of threats.

• Good progress on technical measures of validation; need links to expert review and ground truth.

• More evaluation studies are needed.

• Important to integrate access audit with general business intelligence: understanding the roles and workflows of the organization.

Discussion and Conclusions