© 2014 Noblis, Inc.
Detecting Fraud Conspiracy Using Organizational Big Data
June 17, 2014
Presented to the ACT-IAC Analytics and Big Data Committee
Topics
Overview of Analytic Approach
Case Examples Exposing Potential Fraud Conspiracy
• Provider K
• Provider H
• Hospital T
Next Steps and Summary
A High-Performance Computing Platform
Cray XMT2
• 128-node configuration with high-bandwidth 3D torus interconnect
• 4 terabytes of shared memory
• 100 terabytes of Lustre storage
• 10-gigabit networking
Torque Clusters (2)
• 16 nodes, 512 cores with 512 GB RAM
• 20 terabytes of usable storage
• 10-gigabit networking
Fraud detection algorithm development and analysis were enabled by a high-performance computing platform
An Approach to Reveal New Insights From Agency Data
From existing agency data sources
A new way to structure and analyze data
Discovers previously unknowable patterns and relationships
Central provider is 100 standard deviations away from peer average in type and volume of connections
Small private psychiatric hospital has same volume of distant connections as a major cancer center
One provider has two distinct geographic clusters, and is linked to entirely different kinds of providers in each location
Case Examples
Case | Likely Fraud Type
Provider K | Referral fraud, billing fraud
Provider H | Stolen or “loaned” National Provider Identifier (NPI)
Hospital T | Patient brokerage, billing fraud
Federal Agency Data Set
A public use data set from the Centers for Medicare and Medicaid Services (CMS) that links medical providers who treat the same Medicare beneficiaries
The data provide a map of how doctors, hospitals, and other providers team together to deliver care to the over-65 population
Approximately 1M providers, approximately 50M provider relations, over 1B claims
Each provider relation records the number of times that two providers billed Medicare for the same patient within any 30-day window in 2011
Extract ‘Latent Social Network’ From Claims Data
Medicare Claims File
Claim ID | Procedure | Provider ID | Date | Beneficiary ID
0012599 | Chiropractic Adjustment | 1122334455 (Jones) | 2013-07-21 | 39485922
⁞ | ⁞ | ⁞ | ⁞ | ⁞
0012784 | Lung Capacity Test | 1234567890 (Smith) | 2013-07-23 | 39485922
0012785 | Ear Exam | 1029384756 | 2013-07-23 | 44992029
Same beneficiary: Event_1 (Jones), then Event_2 (Smith) after a time lag
Does this imply a relationship between Jones and Smith?
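The pairing idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the CMS data set: the function name `build_pair_counts`, the tuple layout of a claim, and the choice to count every ordered pair of claims from different providers within the window are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

def build_pair_counts(claims, window_days=30):
    """Count ordered provider pairs that billed for the same beneficiary.

    claims: iterable of (provider_id, claim_date, beneficiary_id) tuples.
    Returns {(earlier_provider, later_provider): count}.
    Hypothetical helper; the counting rule is an assumption.
    """
    by_beneficiary = defaultdict(list)
    for provider, claim_date, beneficiary in claims:
        by_beneficiary[beneficiary].append((claim_date, provider))

    window = timedelta(days=window_days)
    counts = defaultdict(int)
    for events in by_beneficiary.values():
        events.sort()  # chronological order (ties broken by provider ID)
        for i, (first_date, first_provider) in enumerate(events):
            for later_date, later_provider in events[i + 1:]:
                if later_date - first_date > window:
                    break  # sorted, so no later claim can fall in the window
                if later_provider != first_provider:
                    counts[(first_provider, later_provider)] += 1
    return dict(counts)

# The three claims shown on the slide, with names standing in for NPIs.
claims = [
    ("Jones", date(2013, 7, 21), "39485922"),
    ("Smith", date(2013, 7, 23), "39485922"),
    ("1029384756", date(2013, 7, 23), "44992029"),
]
pair_counts = build_pair_counts(claims)
print(pair_counts)
```

Only Jones and Smith share a beneficiary, so they form the single link in this toy example; the third claim contributes nothing because its beneficiary appears once.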
Generated Data Exposes “Normal” and “Anomalous” Behavior
Jones and Smith may not know each other, or Jones may pass beneficiary IDs to Smith and get a kickback when Smith bills for them
How would we know?
• A single pair of events shows nothing
• Pairs of events will occur with a certain frequency and distribution
• Pairs of events that happen ‘too often’ are suspect
How do we exploit this data?
Two Complementary Approaches
Whole Graph Analysis
• Analyze entire graph for outliers based on sets of patterns and normal behaviors
• Requires HPC
• Develop ranked list of potential leads
Sub-Graph Investigation
• Examine behavioral patterns in specific cases
• Requires understanding of service delivery and types of fraud
• Vetting of leads for further investigation
[Diagram labels linking the two approaches: New Patterns for Detection; Anomalous Behavior]
Whole Graph Anomaly Detection
Define a measure of provider or beneficiary behavior
Define peer groups that are meaningful
Must be able to calculate behavior values for each member of a group
Calculate:
• Group average for each peer group
• Distance of each provider from his peer average
• Standard deviation of the distances
Analyze providers or beneficiaries whose values are significantly different from their peer average
Validate each “peer-group/behavior-measure” combination by using government resources that identify fraudsters
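The calculation steps on this slide might look like the following sketch. The slides do not say which distance metric or which standard-deviation formula was used, so this example assumes Euclidean distance and the population standard deviation of the distances; `rank_outliers` and the toy behavior values are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def rank_outliers(behaviors, peer_group_of):
    """Score each provider by distance from its peer-group average.

    behaviors: {provider: [float, ...]} behavior vectors.
    peer_group_of: {provider: peer group name}.
    Returns [(provider, distance in standard deviations)], largest first.
    Assumes Euclidean distance and population std dev (not from the slides).
    """
    groups = defaultdict(list)
    for provider in behaviors:
        groups[peer_group_of[provider]].append(provider)

    scores = {}
    for members in groups.values():
        dims = len(behaviors[members[0]])
        average = [
            sum(behaviors[p][k] for p in members) / len(members)
            for k in range(dims)
        ]
        distance = {p: math.dist(behaviors[p], average) for p in members}
        spread = statistics.pstdev(distance.values())
        for p in members:
            scores[p] = distance[p] / spread if spread > 0 else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy one-dimensional behavior values for a single peer group.
behaviors = {"A": [10.0], "B": [10.0], "C": [10.0], "D": [10.0], "E": [60.0]}
peer_group_of = {name: "podiatry" for name in behaviors}
ranked = rank_outliers(behaviors, peer_group_of)
print(ranked[0])
```

Sorting the scores in descending order yields a ranked list of leads directly, with provider "E" first in this toy group.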
Example Behavior: Outbound Links to Medical Specialties
Each provider has a set of outbound links
[Diagram: NPI:12345 (podiatry) with weighted outbound links to NPI:23456 (radiology), NPI:45678 (radiology), NPI:56789 (internist), NPI:67890 (cardiology), NPI:54321 (physical therapy), and NPI:65432 (dermatology); link weights shown: 95, 127, 70, 77, 144, 90]
Specialty | Outbound Weight Sum
radiology | 222
physical therapy | 144
dermatology | 90
internist | 77
cardiology | 70
Specialty | Outbound Links
radiology | 2
physical therapy | 1
dermatology | 1
internist | 1
cardiology | 1
Counting the peers of each specialty gives us an idea of the provider’s social network, in terms of the specialties his patients see
Adding up the weights of the links for each specialty tells us about the strength of his association with each specialty
The size of the social network and the weight of links both contribute to overall behavior
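As a worked example, the two per-specialty tallies for NPI:12345 can be reproduced with a short script. The specialties and weights are taken from the slide; which radiology NPI carries 95 and which carries 127 is an assumption, chosen only so that the totals match the slide.

```python
from collections import Counter

# Outbound links of NPI:12345 (podiatry): (target NPI, specialty, weight).
# The pairing of 95 and 127 with specific radiology NPIs is assumed.
outbound_links = [
    ("23456", "radiology", 95),
    ("45678", "radiology", 127),
    ("56789", "internist", 77),
    ("67890", "cardiology", 70),
    ("54321", "physical therapy", 144),
    ("65432", "dermatology", 90),
]

link_counts = Counter()   # how many peers in each specialty
weight_sums = Counter()   # total link weight to each specialty
for _npi, specialty, weight in outbound_links:
    link_counts[specialty] += 1
    weight_sums[specialty] += weight

# Print specialties from heaviest to lightest association.
for specialty, total in weight_sums.most_common():
    print(f"{specialty}: links={link_counts[specialty]}, weight sum={total}")
```

The weight sums, taken in a fixed specialty order, form the provider's behavior vector that is later compared with the specialty average.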
Behavior: What is “Normal”?
[Chart: outbound weight by specialty (radiology, physical therapy, dermatology, internist, cardiology) for NPI:12345 compared with the average for podiatry, with a 1 Std Dev band; axis from 0 to 250]
For each specialty (e.g., podiatry), calculate the average outbound referral weight to every other specialty
For each provider, calculate the distance from his/her/its specialty average
Calculate the standard deviation of those distances
Ad hoc definition of an outlier:
“A provider whose behavior vector is more than 3 Std. Dev. from his/her/its specialty average.”
Whole-Graph Anomaly Detection
1% of providers >3 standard deviations from their peer group average
The anomaly detection process naturally generates a ranked list of ‘leads’ by sorting the providers in descending order of their extremity
The top ten anomalous providers are shown in the table below
Surname | State | Peer Group | Number of Standard Deviations From Peer Group Average
K | NY | Chiropractor | 122.12
B | NY | Physical Therapy | 90.43
B3 | VT | Family Medicine | 79.13
D2 | IL | Social Worker | 61.15
D | NY | Emergency Medicine | 56.89
E | TX | Specialist | 46.33
S | TX | Optometrist | 46.01
K3 | IL | OB/GYN | 45.23
V | MA | Internal Medicine | 43.62
G2 | LA | Neurology | 40.04
[Histogram: # of providers (0 to 6000) by standard deviations from peer group average (0 to 140), with providers K, B, B3, D, and D2 labeled]
Case #1: Provider K
Identified Pattern: Many standard deviations away from peer average in type and volume of connections
Qualifying Questions:
• Is this a data error?
• Who is he linked to in the graph?
• Is it really a large institution in the wrong peer group?
• Maybe he has already been caught?
Likely Fraud Type: Referral fraud, billing fraud
Examine the Local Subgraphs of Suspicious Cases: Use the Latent Social Network Graph to Build Cases
For the anomalous providers at the top of the list, we assembled “Case” descriptions and pulled information from the Web about the persons and organizations involved
Starting with ‘K’ at the top, a single query revealed that ‘K’, ‘B’, and ‘D’ had extremely heavy links to each other, so they were treated as the core of a single case
Provider ‘K’ has a large latent social network
Provider ‘K’ has large edge weight totals, which reflect how many times his name appeared on Medicare claims
Direction | Links | Weights
Inbound | 446 | 874,960
Outbound | 460 | 890,565
Provider K Case: Network Diagram of Key Associates
The diagram is centered on “Provider K,” who is the most central provider in his social network in terms of edge weights (line thickness reflects edge weight)
• Two providers in K’s network (“N” and “G,” highlighted) appear in “Final Audit” documents on the NY OMIG web site, which describe overpayments by Medicare and Medicaid
• Based on our research, there may be additional anomalous behavior occurring in K’s network of providers (B, D, K2); however, none of these providers are currently listed on the OMIG web site
[Diagram node labels: N, K2, G, D, K, B]
Provider K Case: Case File Includes Known Legal Actions Against Entities
The New York State OMIG website includes ‘Final Audit’ documents describing overpayments ranging from a few thousand dollars to millions
Provider K Case: Google Street View of Business Practice Locations
Four addresses are listed on Provider K’s website
• Only three can actually be found on Google Maps
• Two of the locations are shown at the right
Were hundreds of thousands of chiropractic adjustments done in these storefronts in 2011?
Case #2: Provider H
Identified Pattern: Has connections in distant geographic locations, linked to entirely different kinds of providers in each location
Qualifying Questions:
• How far does the typical rehab doc’s network extend?
• Did the provider move during 2011?
• What does a rehab doc’s ‘social network’ look like?
Likely Fraud Type: Stolen or “loaned” National Provider Identifier
Whole Graph Analysis Discovered a Physician With Two Distinct Geographic Clusters
Most provider-provider connections are local (<60 miles); some are distant (hospitals, radiologists, labs)
Some providers (2 to 10 percent, depending on specialty) move during the course of the year; their connections also show this bifurcated pattern
A rehab physician in Colorado appears to have a compromised provider ID
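One way to separate local from distant connections is to compute great-circle distances between practice locations. The sketch below uses the standard haversine formula with the 60-mile cutoff mentioned on this slide; the function names, the coordinates, and the provider labels are hypothetical, and the slides do not state how distances were actually computed.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def split_links_by_distance(home, linked_locations, local_miles=60):
    """Partition linked providers into (local, distant) lists of (name, miles)."""
    local, distant = [], []
    for name, (lat, lon) in linked_locations.items():
        miles = haversine_miles(home[0], home[1], lat, lon)
        bucket = local if miles < local_miles else distant
        bucket.append((name, round(miles)))
    return local, distant

# Hypothetical coordinates (approximate city centers), for illustration only.
denver = (39.74, -104.99)
links = {"Boulder clinic": (40.01, -105.27), "Houston lab": (29.76, -95.37)}
local, distant = split_links_by_distance(denver, links)
print("local:", local)
print("distant:", distant)
```

A provider whose distant links cluster tightly around a second location, rather than scattering across hospitals and labs, would show the bifurcated pattern described above.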
Analysis of Geography + Social Networks
Network data integrated with geographic data revealed new behaviors
[Chart: number of links by state (AZ, CO, TX); axis from 0 to 16]
“Colorado links” form a profile consistent with rehab physician: radiology, cardio care, rehab
“Texas links” form a profile consistent with primary care: other PC, hospitals, labs
Case #3: Hospital T
Identified Pattern: Same volume of distant, one-way connections as a major cancer center, but is a small community psych facility
Qualifying Questions:
• Given a case with convicted fraudsters, can we find them in our data?
• What does their behavior look like in our data?
• Are other providers linked to them in suspicious ways?
Likely Fraud Type: Patient brokerage, billing fraud
Analyzing a Known Case Using the Network Data
Start with a known fraud case
Find links of other providers to and from the known fraudsters
Based on knowledge of healthcare delivery, evaluate if the kind, weight, physical distance, etc. of each link is plausible
Implausible links become the starting point of further investigation
• Refocus on ‘implausibly linked’ providers
• Filter out plausibly linked providers
• Include and study other linked providers
Consolidate discoveries into coherent case description
Distill knowledge into repeatable algorithm
Iterate
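The iterative loop on this slide can be sketched as a graph expansion that follows only implausible links. In practice the plausibility judgment rests on knowledge of healthcare delivery; here it is reduced to a caller-supplied predicate, and `expand_case`, the edge layout, and the weight threshold in the example are all hypothetical.

```python
def expand_case(edges, seeds, is_plausible, max_rounds=3):
    """Grow a case from known fraudsters by following implausible links.

    edges: {(source, destination): weight}.
    seeds: providers in the known fraud case.
    is_plausible: callable(source, destination, weight) -> bool, standing in
        for an analyst's judgment (an assumption of this sketch).
    Returns the set of providers in the expanded case.
    """
    case = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        next_frontier = set()
        for (src, dst), weight in edges.items():
            if src in frontier or dst in frontier:
                other = dst if src in frontier else src
                # Plausibly linked providers are filtered out here.
                if other not in case and not is_plausible(src, dst, weight):
                    next_frontier.add(other)
        if not next_frontier:
            break
        case |= next_frontier
        frontier = next_frontier  # refocus on the implausibly linked providers
    return case

# Toy graph: heavy links are treated as implausible purely for illustration.
edges = {("P", "A"): 100, ("A", "B"): 100, ("P", "C"): 1, ("C", "D"): 100}
case = expand_case(edges, {"P"}, lambda src, dst, weight: weight < 50)
print(sorted(case))
```

In this toy graph the case grows from P to A and then to B, while C is filtered out as plausibly linked, so D is never reached.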
Start with Known Fraud Case
From 2003 through August 2012, Provider P billed Medicare nearly $70M for services that were not properly rendered, for patients who did not qualify for the services being billed, and for patients who were procured through bribes and kickbacks
Four convictions were obtained in June 2013 for conspiracy, healthcare fraud, wire fraud, and kickback offenses
In a separate plea in March 2013, a patient broker pled guilty to receiving kickbacks
Can these providers be found in our data?
Can we use the data to understand their fraudulent behavior?
Walk the Graph: Start With Provider P
Nearest neighbor sub-graph of Provider P
Edges (thickness of line) weighted by number of shared patients
Labeled by grouped primary taxonomy; color coded by state
• Red: Alabama
• Green: California
• Dark Blue: Florida
• Light Blue: Rhode Island (clinical laboratory)
• Black: Colorado (clinical laboratory)
Why are there one-way links to Provider P from a community hospital and a rehab physician in Wetumpka, Alabama?
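Finding one-way links into a provider is a simple query once the graph is held as a directed edge list. The sketch below assumes edges are stored as a dictionary keyed by (source, destination); the provider labels and weights are invented for illustration and do not come from the case data.

```python
def one_way_inbound(edges, target):
    """Return providers that link to `target` with no link back from it.

    edges: {(source, destination): weight}. Hypothetical helper.
    """
    return sorted(
        src for (src, dst) in edges
        if dst == target and (target, src) not in edges
    )

# Invented weights; only the direction of each link matters here.
edges = {
    ("Community hospital", "Provider P"): 40,
    ("Rehab physician", "Provider P"): 25,
    ("Provider P", "Clinical lab"): 60,
    ("Clinical lab", "Provider P"): 55,
}
suspicious = one_way_inbound(edges, "Provider P")
print(suspicious)
```

The clinical lab is excluded because its link to Provider P is reciprocated, leaving the two one-way sources as candidates for a closer look.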
Re-Center Around Wetumpka and Focus on Links Outside of Alabama
We re-center the analysis around the unusual Wetumpka providers and extract their out-of-state linkages
The result is a map of previously unknown, likely fraudulent activity
• Red: Georgia
• Orange: Florida
• Green: Louisiana
[Map node labels: Wetumpka, Boca Raton, Lauderhill, Plantation, North Miami, Hallandale Beach, Hollywood, Moultrie, Thomasville, Atlanta, Montgomery, Waynesboro, Jennings, Shreveport, Baton Rouge]
[Map callouts: Known fraud network; Previously unknown hub; Previously unknown other Georgia network; Previously unknown other Louisiana network; Incidental other connections]
What is going on here?
Graph Analysis Reveals Divergent Behaviors
Louisiana provider draws primarily local patients
• Green: Louisiana
• Blue: Mississippi
Georgia provider draws patients from across the US
• Dark Red: Alabama
• Bright Red: Georgia
• Orange: Florida
• Dark Blue: Michigan
• Azure: Colorado
Quantifying Divergent Behavior
The observed pattern for Hospital T is most similar to major national cancer research centers; Hospital T has 59 beds
The pattern is the number of long-distance, one-way patient links into a small hospital
This pattern analysis enables the identification of other suspect cases
Org Name | ToCity | ToState | All Links | In Links in >120
H. LEE MOFFITT CANCER CENTER AND RESEARCH INSTITUTE | TAMPA | FL | 102 | 62
ARTHUR G JAMES CANCER HOSPITAL AND RESEARCH INSTITUTE | COLUMBUS | OH | 72 | 43
Hospital T | | GA | 34 | 29
UNIVERSITY OF MIAMI | MIAMI | FL | 64 | 23
OKLAHOMA HEART HOSPITAL, LLC | OKLAHOMA CITY | OK | 51 | 21
AMERICAN ONCOLOGIC HOSPITAL | PHILADELPHIA | PA | 80 | 12
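One possible way to quantify the pattern is to express the long-distance links as a share of all links for each organization. This is an illustrative calculation and does not appear in the slides; it assumes the final column of the table counts a subset of the links in the "All Links" column.

```python
# (organization, all links, long-distance inbound links), from this slide's table.
rows = [
    ("H. Lee Moffitt Cancer Center and Research Institute", 102, 62),
    ("Arthur G James Cancer Hospital and Research Institute", 72, 43),
    ("Hospital T", 34, 29),
    ("University of Miami", 64, 23),
    ("Oklahoma Heart Hospital, LLC", 51, 21),
    ("American Oncologic Hospital", 80, 12),
]

# Share of links that are long-distance, largest first.
# Assumes the last column is a subset of "All Links" (not stated on the slide).
shares = sorted(
    ((long_links / all_links, org) for org, all_links, long_links in rows),
    reverse=True,
)
for share, org in shares:
    print(f"{org}: {share:.0%}")
```

Under that assumption, Hospital T has the largest long-distance share of any organization in the table, which is consistent with the slide's point that its pattern resembles a national referral center rather than a small facility.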
In Summary
Our approach:
• Leverages our high-performance computing platform to rapidly detect and qualify leads
• Exposes previously unknown fraudulent behavior
• Can be applied to almost any large data set
• Increases in power by combining data sets