Click here to view the presentation.
![Page 1: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/1.jpg)
Predictive Tax Compliance
Presentation to the IRS
SPSS
Benjamin Chard, Senior Solution [email protected]
Sarah Mattingly, IRS Account [email protected]
SRA
Ted Fischer, Project [email protected] or [email protected]
![Page 2: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/2.jpg)
Agenda
Introduction to Data Mining
Predictive Tax Compliance
Using Clementine for Audit Selection
What’s New in Clementine Version 11.1
IRS Refund Fraud Detection Project Case Study
![Page 3: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/3.jpg)
Where Does Data Mining Fit?
Operational Setting
• Reporting
• Case Mgt
• Claim Scoring
Build Models
• Data Mining Workbench
Existing Data
• Historical Claims
• Current Claims
![Page 4: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/4.jpg)
‘Data Mining’ vs. ‘Query/Reporting’
Reporting (Tables, Graphics, OLAP)
Provides a very good view of what is happening, but only within a limited view of the data and only in models defined by the user.
[Chart: Count (0 to 600) by YEAR (1999, 2000, 2001) and incident type (A&B, Assault, B&E, carjacking, Larceny, Murder, MV, Rape, Robbery, other)]
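Incident Count - by day and shift

| Day | 00-04 | 04-08 | 08-12 | 12-16 | 16-20 | 20-24 |
|---|---|---|---|---|---|---|
| Sunday | 48 | 15 | 43 | 62 | 73 | 68 |
| Monday | 25 | 39 | 101 | 131 | 199 | 100 |
| Tuesday | 21 | 27 | 106 | 179 | 191 | 102 |
| Wednesday | 29 | 38 | 101 | 177 | 177 | 103 |
| Thursday | 38 | 50 | 105 | 168 | 197 | 107 |
| Friday | 33 | 40 | 88 | 147 | 209 | 107 |
| Saturday | 45 | 21 | 52 | 82 | 116 | 112 |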
![Page 5: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/5.jpg)
‘Statistics’ vs. ‘Data Mining’
Statistics: Hypothesis Testing
![Page 6: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/6.jpg)
What is Data Mining?
Three classes of data mining algorithms:
• Predict (“Relationships”): Predict who is likely to exhibit specific behavior in the future.
• Cluster (“Differences”): Group cases that exhibit similar characteristics.
• Associate (“Patterns”): What events occur together? Given a series of actions, what action is likely to occur next?
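As a small illustration of the "Associate" class, the sketch below counts how often two events occur together and derives support and confidence for a rule of the form "if A then B". The event names and cases are invented for illustration and do not come from the presentation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cases: each set lists the events observed on one case.
cases = [
    {"late_filing", "amended_return", "large_refund"},
    {"late_filing", "large_refund"},
    {"amended_return"},
    {"late_filing", "amended_return", "large_refund"},
    {"large_refund"},
]


def pair_counts(cases):
    """Count how many cases contain each unordered pair of events."""
    counts = Counter()
    for events in cases:
        for pair in combinations(sorted(events), 2):
            counts[pair] += 1
    return counts


def rule_stats(cases, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent."""
    n_both = sum(1 for e in cases if antecedent in e and consequent in e)
    n_antecedent = sum(1 for e in cases if antecedent in e)
    support = n_both / len(cases)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(cases, "late_filing", "large_refund")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Support says how common the combination is overall; confidence says how often the second event follows given the first.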
![Page 7: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/7.jpg)
Predictive Tax Compliance
![Page 8: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/8.jpg)
Predictive Tax Compliance
Tax Collection
• Risk Models
Audit Selection
• Audit Models
Non-Filer Discovery
• Soft-Matching
• Prioritization Models
Register, Assess, Collect
DATA WAREHOUSE
DATA MINING & PREDICTIVE ANALYTICS TOOLS
Right work to the right resources at the right time
![Page 9: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/9.jpg)
Predictive Modeling
Build a predictive profile of claims that, after investigation, were flagged as improper payments regardless of amount.
Select positive investigations:
• Maximize the claims with the highest dollar adjustment found per audit hour.
• Minimize the number of no-change audits.
[Decision tree: Credit ranking (1=default)]
Root: Bad 52.01% (168), Good 47.99% (155), Total 100.00% (323)
Split: Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
• Weekly pay: Bad 86.67% (143), Good 13.33% (22), Total 51.08% (165)
  Split: Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
  • Young (< 25); Middle (25-35): Bad 90.51% (143), Good 9.49% (15), Total 48.92% (158)
  • Old (> 35): Bad 0.00% (0), Good 100.00% (7), Total 2.17% (7)
• Monthly salary: Bad 15.82% (25), Good 84.18% (133), Total 48.92% (158)
  Split: Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
  • Young (< 25): Bad 48.98% (24), Good 51.02% (25), Total 15.17% (49)
    Split: Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
    • Management; Clerical: Bad 0.00% (0), Good 100.00% (8), Total 2.48% (8)
    • Professional: Bad 58.54% (24), Good 41.46% (17), Total 12.69% (41)
  • Middle (25-35); Old (> 35): Bad 0.92% (1), Good 99.08% (108), Total 33.75% (109)
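To make the chi-square figures on the tree concrete, the sketch below recomputes the statistic for the first split (Paid Weekly/Monthly) from the Bad/Good counts printed on the slide: 143/22 in one branch and 25/133 in the other. The slide does not say which chi-square variant the tree tool reports, so both the Pearson and the likelihood-ratio forms are computed here; the function and variable names are illustrative.

```python
import math


def chi_square_2x2(table):
    """Return (pearson, likelihood_ratio) chi-square statistics.

    table is a list of rows; here each row is one branch of the split
    and the columns are the Bad and Good counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell adds nothing to the likelihood-ratio sum
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio


# Bad/Good counts for the two branches of the first split, as printed on the slide.
first_branch = [143, 22]
second_branch = [25, 133]
pearson, likelihood_ratio = chi_square_2x2([first_branch, second_branch])
print(f"Pearson: {pearson:.2f}, likelihood ratio: {likelihood_ratio:.2f}")
```

With these counts the Pearson statistic comes out near 162.3 and the likelihood-ratio statistic near 179.66, so the printed value of 179.6665 appears to be a likelihood-ratio chi-square (to within a small difference). Either way, with one degree of freedom the split is highly significant.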
![Page 10: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/10.jpg)
Anomaly Detection
Find emerging trends in claims data. Use data mining to show the emerging patterns in current-year data. Reported results will present specific cases that either:
• Exhibit a common pattern, or
• Exhibit an unusual pattern
Unusual cases are deployed to the field investigators for further analysis.
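One simple way to surface "unusual" cases is a robust outlier score. The sketch below uses the modified z-score based on the median absolute deviation; this is a generic technique chosen for illustration, not the method used in the presentation, and the claim amounts are hypothetical.

```python
from statistics import median


def unusual_cases(amounts, threshold=3.5):
    """Return the indices of cases whose amount is far from the median.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is
    the median absolute deviation. The 3.5 cutoff is a common convention.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against
    flagged = []
    for i, x in enumerate(amounts):
        modified_z = 0.6745 * (x - med) / mad
        if abs(modified_z) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical claimed amounts for current-year cases.
claims = [1200, 1350, 1280, 1190, 1420, 9800, 1310, 1260]
flagged = unusual_cases(claims)
print("Cases to route to investigators:", flagged)
```

The median-based score is used instead of the mean and standard deviation because a single extreme case would otherwise inflate the spread and hide itself.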
![Page 11: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/11.jpg)
Case Study: Audit Selection Goals
Build models to predict different outcomes:
• Positive Adjustment (Y/N)
• DPH group membership
• Actual $$ Adjustment
Historical cases selected for model build:
• Cases with prior audit: prior audit and organizational data
• All cases: organizational data only
Deployment:
• For each outcome, combine predictions for those with and without previous audit data
• For each outcome, predict using organizational data only
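The deployment idea of combining predictions for cases with and without previous audit data can be sketched as a simple fallback: score with the richer model when prior audit data exists, otherwise with the organizational-data-only model. The `Case` structure, field names, and the two stand-in scoring functions below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    case_id: str
    org_features: dict
    prior_audit_features: Optional[dict] = None  # None when never audited


def model_with_prior(features):
    """Stand-in for a model built on prior audit plus organizational data."""
    return 0.8 if features.get("prior_adjustment", 0) > 0 else 0.3


def model_org_only(features):
    """Stand-in for a model built on organizational data only."""
    return 0.6 if features.get("employees", 0) > 100 else 0.2


def score_case(case):
    """Use the richer model when prior audit data exists, else fall back."""
    if case.prior_audit_features is not None:
        features = {**case.org_features, **case.prior_audit_features}
        return model_with_prior(features)
    return model_org_only(case.org_features)


cases = [
    Case("A", {"employees": 250}, {"prior_adjustment": 5000}),
    Case("B", {"employees": 40}),
    Case("C", {"employees": 500}),
    Case("D", {"employees": 10}, {"prior_adjustment": 0}),
]

# Rank all cases on one list, highest predicted score first.
ranked = sorted(cases, key=score_case, reverse=True)
print([c.case_id for c in ranked])
```

In practice the two models' scores would need to be on a comparable scale before being merged into one ranking; the sketch assumes both return a probability of a positive adjustment.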
![Page 12: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/12.jpg)
Clementine Workbench
![Page 13: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/13.jpg)
Case Study: Results
![Page 14: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/14.jpg)
Text Mining and Linguistic Extraction
![Page 15: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/15.jpg)
Text Mining Timeline: Text Extraction
Timeline: 70’s, 80’s, 90’s, Now
Example text: “Mr. Smith aka Mr. Ahmed was seen on the corner of Church St. and Magnolia Ave. on Nov 13th”
• Bag of «Words» extraction: Mr., Smith, aka, was, seen, with, Ahmed, on, the, corner, of, Church, etc.
• Expressions extraction: Mr. Smith, was seen, Mr. Ahmed, corner, Church St., Magnolia Ave., Nov 13th
• Named Entities extraction: Mr. Smith -> Person; Mr. Ahmed -> Person; aka -> Alias; was seen -> location; Church St. -> Address; Magnolia Ave. -> Address; Nov 13th -> Date
• Events/Sentiment extraction: Mr. Smith (Person) -> aka (Alias) -> Mr. Ahmed (Person) was seen (location) -> Church and Magnolia (address) -> November 13 (Date)
• Combined with structured data: Mr. Ahmed in database wanted for questioning. Suspect -> send agent to this location
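The difference between bag-of-words and named-entity extraction can be sketched on the slide's example sentence. The regular expressions below are deliberately naive stand-ins written for this one sentence; real linguistic extraction relies on dictionaries and grammar rules, and nothing here reflects how the product itself works.

```python
import re

text = ("Mr. Smith aka Mr. Ahmed was seen on the corner of "
        "Church St. and Magnolia Ave. on Nov 13th")

# Bag of words: every token, with no notion of phrases or meaning.
bag_of_words = re.findall(r"[A-Za-z0-9]+", text.lower())

# Naive entity patterns (illustrative only).
patterns = {
    "Person": r"Mr\. [A-Z][a-z]+",
    "Address": r"[A-Z][a-z]+ (?:St|Ave)\.",
    "Date": r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
            r"\d{1,2}(?:st|nd|rd|th)?",
    "Alias": r"\baka\b",
}

# Collect (matched text, entity type) pairs for every pattern.
entities = [
    (match.group(0), label)
    for label, pattern in patterns.items()
    for match in re.finditer(pattern, text)
]

print(bag_of_words)
for value, label in entities:
    print(f"{value} -> {label}")
```

The bag of words loses the fact that "Mr." and "Smith" belong together, while the typed entities can be matched against structured records such as a persons database.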
![Page 16: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/16.jpg)
Text Mining Management
General Dictionaries
• Organization, Location, Name, Phone Number, etc.
Custom Built Subject Dictionaries
• Tax Code, Form Names, Commodity, Business, etc.
Interactive Synonym Dictionaries
Exclude Dictionaries
NEW!: Classification algorithms enable you to aggregate concepts from a wide variety of unstructured text data and group them into a small number of categories.
![Page 17: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/17.jpg)
What’s New
![Page 18: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/18.jpg)
Binary Classifier – Automation of Many Models
Sophisticated users: hundreds of models (scripting)
Binary Classifier Node imitates this, but easily, with a pre-built node
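The idea of building many candidate models and ranking them automatically can be sketched in a few lines. The example below is not the Binary Classifier node or its scripting interface; it uses hypothetical records and trivially simple one-field threshold rules as the "models" so that the build, evaluate, and rank loop is easy to see.

```python
# Hypothetical labeled records: (income, deductions, flag), where flag is 1
# when the audit produced a positive adjustment.
train = [(30, 2, 0), (45, 20, 1), (80, 5, 0), (60, 30, 1), (25, 12, 1), (90, 8, 0)]
holdout = [(50, 25, 1), (70, 4, 0), (35, 15, 1), (85, 10, 0)]


def make_rule(field, threshold):
    """A one-field threshold classifier: predict 1 when value > threshold."""
    return lambda record: 1 if record[field] > threshold else 0


def accuracy(rule, records):
    """Share of records whose predicted flag matches the actual flag."""
    return sum(rule(r) == r[2] for r in records) / len(records)


# Build one candidate per (field, threshold) pair seen in the training data.
candidates = []
for field in (0, 1):
    for threshold in sorted({r[field] for r in train}):
        rule = make_rule(field, threshold)
        candidates.append({
            "field": field,
            "threshold": threshold,
            "train_acc": accuracy(rule, train),
            "holdout_acc": accuracy(rule, holdout),
        })

# Rank candidates by performance on data they were not built from.
ranked = sorted(candidates, key=lambda c: c["holdout_acc"], reverse=True)
best = ranked[0]
print(len(candidates), "candidates; best:", best)
```

Ranking on holdout data rather than training data is the important design choice: with hundreds of candidates, some will fit the training sample well by chance.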
![Page 19: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/19.jpg)
Time Series Algorithm
ARIMA & Exponential Smoothing
Expert Modeler – finds best model automatically
Forecast Multiple Series at once
Data Preparation Tools
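For readers unfamiliar with exponential smoothing, here is a minimal sketch of its simplest form, where each new level is a weighted blend of the latest observation and the previous level. The series is hypothetical, and the `best_alpha` helper is only a toy analogue of automatic model selection, not a description of how the Expert Modeler works.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last one is the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


def best_alpha(series, candidates):
    """Pick the alpha with the lowest one-step-ahead squared error."""
    def sse(alpha):
        levels = simple_exponential_smoothing(series, alpha)
        # levels[t] is the forecast for series[t + 1]
        return sum((actual - forecast) ** 2
                   for forecast, actual in zip(levels, series[1:]))
    return min(candidates, key=sse)


# Hypothetical monthly counts of refund claims.
monthly_claims = [100, 110, 105, 120, 125, 130]
levels = simple_exponential_smoothing(monthly_claims, alpha=0.5)
forecast = levels[-1]
chosen = best_alpha(monthly_claims, [0.2, 0.5, 0.8])
print(f"forecast={forecast}, chosen alpha={chosen}")
```

To forecast multiple series at once, the same function would simply be applied to each series in turn.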
![Page 20: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/20.jpg)
Optimal Binning
Splitting up numeric data into sub-ranges
New capability to make this optimal for prediction
Existing Capability – Equal bins
New Capability – Optimal bins
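The contrast between equal bins and bins chosen for prediction can be sketched as follows. The supervised cut below minimizes the entropy of the target on each side of the cut; that is one common approach, assumed here for illustration, since the slide does not describe the product's actual algorithm. The ages and outcomes are hypothetical.

```python
import math


def equal_width_edges(values, n_bins):
    """Cut points that split the value range into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def best_supervised_cut(values, labels):
    """Single cut point that minimizes the weighted entropy of the target."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= cut]
        right = [label for value, label in pairs if value > cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut


# Hypothetical ages with a binary outcome (1 = bad, 0 = good).
ages = [19, 22, 24, 31, 38, 45, 52, 60]
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
print("Equal-width cut:", equal_width_edges(ages, 2))
print("Supervised cut:", best_supervised_cut(ages, outcome))
```

The equal-width cut falls at the midpoint of the range regardless of the outcome, while the supervised cut lands where the outcome actually changes.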
![Page 21: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/21.jpg)
SPSS Reporting
SPSS Statistics and Graphs Within Clementine
![Page 22: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/22.jpg)
Configuration Management
[Diagram: Audit Process, Analytical Data Storage, Data Mining, Audit Selection]
Predictive Enterprise Services (PES) Top Four
![Page 23: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/23.jpg)
Deployment and Integration
Configuration Management
Exporting Data, Models and Streams
Explore and Describe
![Page 24: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/24.jpg)
1. Improve Collaboration
In a single project there is the potential to create a large number of models and versions of models:
• different output variables
• different algorithms
• different settings
• different training samples
X # different data sets
X # different users
X # different locations
![Page 25: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/25.jpg)
2. Improve Transparency
Provide information on which models are run on which data.
For audit standards, track who has made changes to the model and when.
From their desktops, your analytics team can see which models were most recently run on which data, so that they can provide this information for internal audits.
![Page 26: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/26.jpg)
3. Automate Process
Combine Clementine, SPSS, SAS & other processes
Scheduling & notification
![Page 27: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/27.jpg)
4. Centralize and Control Access
![Page 28: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/28.jpg)
Contact information
Project personnel:
• Ted Fischer – [email protected] or [email protected], 301-731-3534
• Anthony Colyandro – [email protected] or [email protected], 301-731-3524
SRA Director of Business Intelligence:
• Dave Vennergrund – [email protected], 703-803-1614
![Page 29: Click here to view the presentation.](https://reader033.fdocuments.us/reader033/viewer/2022061115/54641795af795967228b627e/html5/thumbnails/29.jpg)
How do I get SPSS software?
IRS
Cathy J. Allen
Enterprise System Management, Software Management Section
Idea Branch - MS 5850
(304) 264-7279 - voice
(304) 279-5309 - cell
(304) 260-3033 - [email protected]
SPSS Contacts:
Account Executive – Sarah Mattingly
Email: [email protected] – 703-740-2446
C – 703-389-6485
Account Manager – Matt Madden
W - 312 651 3894