Modeling and Detecting Anomalous Topic Access
![Page 1: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/1.jpg)
Modeling and Detecting Anomalous Topic Access
Siddharth Gupta1, Casey Hanson2, Carl A Gunter3, Mario Frank4, David Liebovitz5, Bradley Malin6
1,2,3,4Department of Computer Science, 3,5Department of Medicine, 6Department of Biomedical Informatics
1,2,3University of Illinois at Urbana-Champaign, 4University of California, Berkeley, 5Northwestern University, 6Vanderbilt University
![Page 2: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/2.jpg)
• Motivation and Challenges
• Our Contributions
• Dataset Description
• Random Topic Access (RTA) Model
• Random Topic Access Detection (RTAD) Model
• Evaluation and Results
Outline of the talk
![Page 3: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/3.jpg)
Reported in April 2013
• The University of Florida: two offenders illegitimately accessed the records of 15,000 patients over 3 years (March 2009 - October 2012).
• Personal information, including names, addresses, dates of birth, medical record numbers and Social Security numbers, was compromised for the purposes of billing fraud.
• One of the offenders was an insider in the hospital without prior.
• How can we efficiently model and detect these types of attacks in the healthcare system?
EMR Access Breach
![Page 4: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/4.jpg)
• Two broad classes of threats:
• Inside Threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft and curiosity accesses to EMRs.
• Outside Threats: an outside entity hiring an insider to commit fraud, a visitor accessing records on open computers in some scenarios, or an untrustworthy patient seeking information from other patients' records.
• Ramifications: Irreversible violation of patient privacy and subsequent high cost for hospitals.
• Deterrent: The current legal deterrent is a set of regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.
Motivation
![Page 5: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/5.jpg)
• Build a classifier on labeled data to differentiate anomalous users from legitimate users.
• Real healthcare data is not labeled.
• Current methods use injection of synthetic anomalous users and evaluate on them.
Classical Detection Methodologies
![Page 6: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/6.jpg)
• In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset.
• We call such a system ROA (random object access).
• The resulting user doesn’t appear to be a plausible attacker in the real hospital setting.
Random Object Access
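The ROA mechanism described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `make_roa_user`, the patient identifiers, and the number of accesses are hypothetical, and sampling uniformly with replacement is one plausible reading of "random patients", not a detail given in the talk.

```python
import random

def make_roa_user(patient_ids, n_accesses, seed=None):
    """Build one synthetic ROA user as a list of randomly chosen patients.

    Assumption: patients are drawn uniformly and with replacement, so the
    same patient may be accessed more than once.
    """
    rng = random.Random(seed)
    return [rng.choice(patient_ids) for _ in range(n_accesses)]

# Hypothetical patient population and access volume.
patients = [f"patient_{i}" for i in range(1000)]
fake_user_accesses = make_roa_user(patients, n_accesses=25, seed=0)
```

Because every patient is equally likely, the resulting access list has no clinical theme, which is why such a user does not look like a plausible attacker.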
![Page 7: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/7.jpg)
• Random Topic Access (RTA): we introduce and study a random topic access model or RTA aimed at users whose access may be illegitimate but is not fully random because it is focused on common semantic themes.
• User Simulation: we utilize the latent topic framework to simulate illegitimate users and model them as samples from a Dirichlet distribution over topic multinomials.
• Anomaly Detection Framework: we study RTA to detect and evaluate users with suspicious access patterns.
Our Contributions
![Page 8: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/8.jpg)
Data Set
Fig a) Summary Statistics for Audit Logs
Fig b) Summary Statistics for Patient Records
![Page 9: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/9.jpg)
• Random Topic Access (RTA) Model: a mechanism for utilizing latent topic structures to represent real users in the population and allow for the synthetic generation of semantically relevant anomalous users.
• Topic modeling can provide a concise description of how a user behaves in the context of his peers and the meaning of that behavior.
• Model users as samples from a Dirichlet distribution over topic multinomials.
Random Topic Access (RTA) Model
![Page 10: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/10.jpg)
Latent Dirichlet Allocation (LDA)
[Figure: LDA maps each patient's Diagnosis Raw Feature vector (binary, e.g. 1 0 1 0 1 ...) to a Diagnosis Topic Feature vector (topic proportions, e.g. 0.2 0.1 0.7).]
![Page 11: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/11.jpg)
Topic Distributions
![Page 12: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/12.jpg)
Topic Distributions
[Figure: Diagnosis Topics. Panels: Neoplasm Topic, Obstetric Topic, Kidney Topic.]
![Page 13: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/13.jpg)
Characterizing Users
[Figure: User and Accessed Patient Topic Distributions. Axes: P(Topic) vs. Topic ID (Topic 1, Topic 2, Topic 3), with series for Patient 1, Patient 2 and the User. A second plot shows Number of Accesses per patient (Patient 1: 100 times; Patient 2: 30 times).]
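One way to read this slide is that a user's topic distribution is built from the topic distributions of the patients the user accessed, weighted by how often each was accessed. The sketch below assumes exactly that (an access-weighted average). The weighting scheme, the function name, and the patient topic values are illustrative assumptions; only the access counts (100 and 30) come from the slide.

```python
import numpy as np

def user_topic_distribution(patient_topics, access_counts):
    """Access-weighted average of patient topic distributions (assumed scheme)."""
    patient_topics = np.asarray(patient_topics, dtype=float)
    weights = np.asarray(access_counts, dtype=float)
    weights = weights / weights.sum()      # normalize counts to weights
    return weights @ patient_topics        # convex combination of the rows

# Hypothetical topic distributions over three topics for two patients.
patient_topics = [
    [0.7, 0.2, 0.1],   # Patient 1
    [0.1, 0.1, 0.8],   # Patient 2
]
access_counts = [100, 30]  # access counts shown on the slide

user = user_topic_distribution(patient_topics, access_counts)
```

Since the weights sum to one and each patient row sums to one, the result is again a valid probability distribution over topics, dominated by the most frequently accessed patient.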
![Page 14: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/14.jpg)
Multidimensional Scaling: Patient Diagnosis
![Page 15: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/15.jpg)
RTA: Simulating Users
• r ~ Dir(α) with n dimensions, where n is the number of topics.
a.) Directed or Masquerading User (α<1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.
b.) Purely Random User (α=1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting.
c.) Indirect User (α>1): the user type resembles an even blend of the topics of many specialized users.
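The three simulated user types can be illustrated by sampling from a symmetric Dirichlet distribution with different concentration values. This is a minimal sketch, assuming a symmetric Dirichlet and NumPy's `Generator.dirichlet`; the function name `simulate_rta_users`, the topic count, the sample size, and the specific concentration values are illustrative choices rather than settings taken from the talk.

```python
import numpy as np

def simulate_rta_users(alpha, n_topics, n_users, seed=0):
    """Draw n_users topic distributions from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.full(n_topics, alpha), size=n_users)

def mean_peak(users):
    """Average weight of each user's dominant topic."""
    return users.max(axis=1).mean()

n_topics = 3  # hypothetical number of topics
directed = simulate_rta_users(0.1, n_topics, 500)       # alpha < 1
purely_random = simulate_rta_users(1.0, n_topics, 500)  # alpha = 1
indirect = simulate_rta_users(100.0, n_topics, 500)     # alpha > 1

for name, users in [("directed", directed),
                    ("purely random", purely_random),
                    ("indirect", indirect)]:
    print(f"{name}: mean dominant-topic weight = {mean_peak(users):.2f}")
```

Small concentration values put most of a user's mass on a single topic, a value of 1 spreads users uniformly over the topic simplex, and large values pull every user toward an even blend of all topics.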
![Page 16: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/16.jpg)
Population Distribution
[Figure: simulated user populations for α = 0.01 and α = 0.1 (A. Directed Users), α = 1 (B. Purely Random Users) and α = 100 (C. Indirect Users).]
![Page 17: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/17.jpg)
Role Distribution
[Figure: role NMH Resident Fellow CPOE. Panels: Masquerading Users, Purely Random Users, Indirect Users. Legend: Anomalous Users, Real Users.]
![Page 18: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/18.jpg)
• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme, k-nearest neighbor (k-NN), for classification.
• Methodology
1. LDA: define patient topics and user typing to represent users in the topic space.
2. RTA user injection: generate three types of anomalous users and insert them into each role at a 5% mix rate.
3. Detection (k-NN): if the ratio of the avg. distance from a user to its k nearest spatial neighbors to the avg. pairwise distance among those neighbors is greater than a threshold, call the user anomalous.
4. Evaluation Metric: best Area Under the Curve (AUC) for each , role combination.
Random Topic Access Detection (RTAD)
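The detection and evaluation steps can be sketched as follows. This is an illustrative reading of the k-NN ratio rule, not the authors' implementation: Euclidean distance, the function names, the value of k, and the role profile used to simulate real users are all assumptions. The AUC is computed with the standard rank-based (Mann-Whitney) formulation so the example depends only on NumPy.

```python
import numpy as np

def knn_ratio_scores(points, k):
    """Score each point by (avg distance to its k nearest neighbors) divided by
    (avg pairwise distance among those neighbors). Requires k >= 2 and assumes
    the neighbors are not all identical (otherwise the denominator is zero)."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))   # Euclidean distance (assumed)
    upper = np.triu_indices(k, 1)                # each neighbor pair once
    scores = np.empty(len(points))
    for i in range(len(points)):
        order = np.argsort(dist[i])
        neighbors = order[order != i][:k]        # exclude the point itself
        to_neighbors = dist[i, neighbors].mean()
        among = dist[np.ix_(neighbors, neighbors)][upper].mean()
        scores[i] = to_neighbors / among
    return scores

def auc(scores, labels):
    """Probability that a random anomalous user outscores a random real user."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
# Hypothetical role: 95 real users clustered around one topic profile.
real = rng.dirichlet([50.0, 30.0, 20.0], size=95)
# 5 injected directed users (alpha < 1), giving a 5% mix rate.
injected = rng.dirichlet([0.1, 0.1, 0.1], size=5)

users = np.vstack([real, injected])
labels = np.array([0] * len(real) + [1] * len(injected))
scores = knn_ratio_scores(users, k=5)   # k = 5 is an arbitrary choice
print(f"AUC = {auc(scores, labels):.3f}")
```

Sweeping a threshold over the scores traces out the ROC curve, and the AUC summarizes it in a single number, so no particular threshold has to be fixed to compare roles or user types.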
![Page 19: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/19.jpg)
Results - I
The best AUC across all evaluated dimensions is plotted for each role that performs poorly for .
![Page 20: Modeling and Detecting Anomalous Topic Access](https://reader035.fdocuments.us/reader035/viewer/2022062400/56816923550346895de055aa/html5/thumbnails/20.jpg)
Results - II
The best AUC across all evaluated dimensions is plotted for each role that performs well or near average for .