Data Mining of E-Mails to Support Periodic & Continuous Assurance
Glen L. Gray, California State University at Northridge
Roger Debreceny, University of Hawai`i at Mānoa
5th Symposium on Information Systems Assurance
Toronto: October 2007
In this Presentation
Continuous monitoring of emails – why?
Technologies
  Social Network Analysis
  Text analysis
Challenges
Opportunities
Continuous Monitoring of Emails – Why?
Increased focus on forensic approaches to auditing
Increased interest in continuous assurance and monitoring of business processes
Emails = Organization’s DNA
Evidential matter on:
  Employee & management fraud (overrides)
  Compliance (e.g., HIPAA)
  Loss of intellectual property
  Corporate policies
Enron Email Archive
Released by Federal Energy Regulatory Commission
  500K emails
  151 Enron employees
Cleaned version at Carnegie Mellon: www.cs.cmu.edu/~enron/
Relational DB version at USC: www.isi.edu/~adibi/Enron/Enron_Dataset_Report.pdf
Email Mining Targets
[Diagram: Email Data Mining, with branches labeled Key Word Queries, Deception Clues, Volume & Velocity, Social Network Analysis, Content Analysis, and Log Analysis]
Content Analysis
Key Word Queries
Yes, people do say self-incriminating things in their emails
  Fraud
  Corporate dysfunction
Overwhelming false positives
  Need “smart” compound queries
Good continuous auditing (CA) candidate
  Already scanning for spam, porn, etc.
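
One way to read the call for “smart” compound queries is to require a risk phrase and a business-context term in the same message, rather than flagging on a single word. The sketch below is illustrative only: the term lists and the `compound_hit` function are hypothetical, not taken from the presentation or from any commercial tool.

```python
import re

# Hypothetical term lists, chosen only for illustration.
RISK_TERMS = {"write off", "backdate", "off the books"}
CONTEXT_TERMS = {"invoice", "revenue", "quarter", "audit"}


def _contains(text: str, term: str) -> bool:
    """Whole-word, case-insensitive match for a single word or phrase."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def compound_hit(body: str) -> bool:
    """Flag a message only when a risk term AND a context term both appear."""
    has_risk = any(_contains(body, term) for term in RISK_TERMS)
    has_context = any(_contains(body, term) for term in CONTEXT_TERMS)
    return has_risk and has_context


if __name__ == "__main__":
    print(compound_hit("Please backdate the invoice before the audit."))
    print(compound_hit("I need to write off this weekend, too tired."))
```

A single-keyword query would flag both sample messages; the compound query flags only the first, which is the kind of false-positive reduction the slide asks for.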
Sender Deception -- Content
Deceptive emails include:
  Fewer first-person pronouns to dissociate themselves from their own words
  Fewer exclusive words, such as but and except, to indicate a less complex story
  More negative emotion words because of the sender’s underlying feeling of guilt
  More action verbs to, again, indicate a less complex story
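
The four cues above can be approximated as per-word rates. The following is a minimal sketch under stated assumptions: the word lists are tiny hypothetical samples, not a validated lexicon, and the `cue_rates` function is an illustration rather than the method used in the underlying research.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
EXCLUSIVE = {"but", "except", "without", "unless"}
NEGATIVE_EMOTION = {"hate", "worthless", "afraid", "angry", "sorry"}
ACTION_VERBS = {"go", "went", "take", "took", "move", "moved", "send", "sent"}


def cue_rates(body: str) -> dict:
    """Return the share of words in each cue category (0.0 for an empty body)."""
    words = re.findall(r"[a-z]+", body.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)

    def rate(vocabulary):
        return sum(counts[w] for w in vocabulary) / total

    return {
        "first_person": rate(FIRST_PERSON),
        "exclusive": rate(EXCLUSIVE),
        "negative_emotion": rate(NEGATIVE_EMOTION),
        "action_verbs": rate(ACTION_VERBS),
    }


if __name__ == "__main__":
    print(cue_rates("I went home but I sent it"))
```

The rates only become meaningful when compared against a baseline, for example the same sender's earlier messages; a single message's rates say little on their own.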
Sender Deception -- Identification
Writeprint features
  Lexical -- characters & words
    Function words
    Root words
  Syntactic -- sentences
  Structural -- paragraphs
  Content-specific
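
To make the feature categories concrete, the sketch below computes a handful of word-level, sentence-level, and paragraph-level measures for one message. It is a toy illustration: the feature names, the short function-word list, and the `writeprint_features` function are assumptions, and a real writeprint would use a far larger feature set.

```python
import re

# Hypothetical short list; real studies use much longer function-word lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "but")


def writeprint_features(body: str) -> dict:
    """Compute a few simple stylistic features for one message body."""
    words = re.findall(r"[A-Za-z]+", body)
    lowered = [w.lower() for w in words]
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input

    features = {
        # Word-level measures
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "vocabulary_richness": len(set(lowered)) / n_words,
        # Sentence-level measure
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        # Paragraph-level measure
        "paragraph_count": len(paragraphs),
    }
    # Relative frequency of each function word
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = lowered.count(fw) / n_words
    return features


if __name__ == "__main__":
    sample = "The cat sat. The dog ran.\n\nBut why?"
    for name, value in writeprint_features(sample).items():
        print(name, round(value, 3))
```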
Sender Deception -- Identification
Number of potential features unlimited
  Optimum number can vary by context and language
Developing user profiles and comparing new emails to profiles would be challenging for real-time CA
Temporal/Log Analysis
Volume & Velocity
Volume = number of emails a person sends and/or receives over a period of time.
Velocity = how quickly the volume changes.
Many external factors (e.g., vacations, seasonal activities, etc.) impact these numbers
  Need “rolling histogram”
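
One possible reading of the “rolling histogram” is a trailing window of per-day counts against which each new day is compared. The sketch below assumes daily counts for a single person, a fixed window, and a z-score threshold; the function names and default parameters are hypothetical choices, not values from the presentation.

```python
from collections import deque
from statistics import mean, pstdev


def velocity(daily_counts):
    """Day-over-day change in volume."""
    return [later - earlier for earlier, later in zip(daily_counts, daily_counts[1:])]


def flag_volume_spikes(daily_counts, window=28, threshold=3.0):
    """Return indexes of days whose volume deviates sharply from the trailing window.

    A day is compared only against the `window` days before it, so slow
    seasonal drift moves the baseline instead of raising alerts.
    """
    history = deque(maxlen=window)
    flagged = []
    for day, count in enumerate(daily_counts):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window has sigma == 0; this sketch skips it.
            if sigma > 0 and abs(count - mu) / sigma > threshold:
                flagged.append(day)
        history.append(count)
    return flagged


if __name__ == "__main__":
    counts = [10, 12, 11, 9, 10, 50, 11]
    print(velocity(counts))
    print(flag_volume_spikes(counts, window=4, threshold=3.0))
```

Because the spike itself enters the window afterwards, the days that follow it are judged against an inflated baseline; a production version would need to decide whether flagged days should be excluded from the history.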
Volume & Velocity
Key issue -- determining the optimum time intervals to sample the data
Continuous monitoring cannot be continuous in terms of sampling in real time
Comparing hourly, daily, and even weekly volumes and velocities will result in many false positives
Optimum time interval could vary by job title
Social Network Analysis
Social Network Analysis
Social relationships as an undirected graph
Importance of understanding relationships within the flow of email exchanges
Social Network Analysis in Emails
Emails semi-structured data
  sender
  primary recipient(s)
  copied recipient(s)
  date
  subject line
Social groups and cliques
  CA = who doesn’t belong?
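
The header fields listed above are enough to build an undirected, weighted graph and to ask who is corresponding with a group without being part of it. The sketch below is a minimal illustration: the message dictionary schema (`sender`, `to`, `cc`), the function names, and the sample addresses are all hypothetical.

```python
from collections import Counter


def build_graph(messages):
    """Build an undirected weighted graph: one edge per sender/recipient pair.

    `messages` is an iterable of dicts with a 'sender' string and optional
    'to' and 'cc' lists (a hypothetical schema for this sketch).
    """
    edges = Counter()
    for msg in messages:
        recipients = set(msg.get("to", [])) | set(msg.get("cc", []))
        for recipient in recipients:
            if recipient != msg["sender"]:
                # frozenset makes the edge direction-free.
                edges[frozenset((msg["sender"], recipient))] += 1
    return edges


def outside_contacts(edges, group):
    """Count messages exchanged between group members and non-members."""
    group = frozenset(group)
    hits = Counter()
    for pair, weight in edges.items():
        inside = pair & group
        outside = pair - group
        if inside and outside:
            hits[next(iter(outside))] += weight
    return hits


if __name__ == "__main__":
    sample = [
        {"sender": "a@x", "to": ["b@x"], "cc": ["c@x"]},
        {"sender": "b@x", "to": ["a@x", "z@y"]},
    ]
    graph = build_graph(sample)
    print(outside_contacts(graph, {"a@x", "b@x", "c@x"}))
```

Here the group is supplied by hand; discovering the groups and cliques automatically would require a community-detection step that this sketch leaves out.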
Thread Analysis – This?
[Diagram: email thread with nodes labeled S, R, and C arranged along a Time axis]
Thread Analysis – Or this?
[Diagram: alternative email thread with nodes labeled S, R, and C arranged along a Time axis]
Integrating Content Analysis and Social Network Analysis
[Diagram: Email Data Mining, with branches labeled Key Word Queries, Deception Clues, Volume & Velocity, Social Network Analysis, Content Analysis, and Log Analysis]
Challenges of Email Mining
Textual
  Inconsistent use of abbreviations
  Misspelled words
  Smileys etc. etc.
  Replies, replies, and more replies…
Inability to identify:
  Identities of email participants
    [email protected]
  Roles and responsibilities
What Do the Enron Emails Show?
People do say the darnedest things
What did he know and when did he know it?
Verified numerous bodies of email data mining research
  Content analysis
  Social network analysis
Tools
Content monitoring
  eSoft Corporation’s ThreatWall
  Symantec’s Mail Security 8x00 Series
  Vericept Corporation’s Vericept Content 360º
  Reconnex Corporation’s iGuard Appliance
  InBoxer, Inc. Anti-Risk Appliance
Social networks
  Microsoft SNARF
  Heer’s Vizster
Research Opportunities
Research Questions
Role of email monitoring in overall CA environment?
Join SNA with examination of textual patterns.
Link SNA with control environment
Frauds/control overrides footprint?
What email cleaning is required for CA purposes?
Privacy and policy issues?
Lessons from existing commercial products?