Sensitive Information Sweep
Using Cornell’s Spider
Wyman Miles, Cornell University
Kerry Havens, University of Colorado at Boulder
Steve Lovaas, Colorado State University
Overview
• Quick Background
• The Technical Problem (Kerry)
• The Organizational Problem (Steve)
• Spider (Wyman)
• Summary & Questions
What is “Sensitive Information”?
• A Growing Concern
• A Moving Target
• SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,…
• Data in Context – Aggregation
Why Are We All Here?
• The Front Page!
• CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year.
• Identity theft is the fastest growing crime in the U.S.
• By far the biggest culprit? Lost or stolen computers.
Regulations, Standards, & Laws
• Federal – HIPAA, FERPA, SarbOx, GLB,… Identity Theft Protection Act?
• State – Many states are passing identity theft protection laws; New York & Colorado have state CISOs
• Industry – PCI DSS
The Technical Problem: Finding sensitive information in a haystack
Kerry Havens
University of Colorado at Boulder
SSN Remediation
• At CU-Boulder, SSNs were used as a student identifier before 2004
• House Bill 03-1175 was approved in 2003 requiring institutions to change this method to ensure the privacy of a student’s social security number
• CU-Boulder started issuing student IDs to new students in July 2004 and converting SSNs to SIDs in 2005
Where the data is not stored
• File type exclusions – fine tuning
– Binary files where the data cannot be read
– Received input from community for fine tuning
• False positives
– International telephone numbers
– Examples for web form validation
• Why is the department webpage asking for SSNs?
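A file type exclusion can be as simple as a skip list keyed on file extension. The sketch below is illustrative only: the extensions in SKIP_EXTENSIONS and the function names are assumptions, not the list CU-Boulder actually used, and any real list would need tuning for the local environment.

```python
from pathlib import PurePath

# Hypothetical skip list of binary formats where text cannot be read directly.
# These entries are assumptions for illustration, not CU-Boulder's list.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".png", ".zip", ".iso"}

def should_scan(path, skip_extensions=SKIP_EXTENSIONS):
    """Return True unless the file's extension is on the skip list."""
    return PurePath(path).suffix.lower() not in skip_extensions

def filter_paths(paths, skip_extensions=SKIP_EXTENSIONS):
    """Keep only the paths that are worth handing to the scanner."""
    return [p for p in paths if should_scan(p, skip_extensions)]
```

Comparing the lowercased suffix means PHOTO.JPG and photo.jpg are treated alike; files with no extension are scanned by default.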
OS and File Encoding Problems
• HTML encoding problems
• Representations (pictures) of sensitive data are not found
– Examples include PDF
• Searching a UNIX filesystem
– Preparing the file before searching for private data
– For example, using strings to extract text from text/binary hybrids like .doc or .xls
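The file preparation step can be approximated in a few lines. This is a rough sketch of what a strings-style pass does, not the strings utility itself and not Spider's code: it pulls runs of printable ASCII out of raw bytes so a regular expression can then be applied to the result. The function name and the default minimum length of 4 are assumptions. Text stored in a 16-bit encoding would be missed by this ASCII-only version.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # Printable range is space (0x20) through tilde (0x7e), plus tab.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group(0).decode("ascii") for m in pattern.finditer(data)]

# Made-up bytes standing in for a text/binary hybrid file.
sample = b"\x00\x01SSN: 523-45-6789\x00\xff\xfeab\x00grade book\x00"
print(extract_strings(sample))
```

The short run "ab" is dropped because it is under the minimum length, which is what keeps binary noise out of the output.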
Where the data is stored
• Typical file types of discovered data
– Gradebooks
– Course web pages
– Homework assignments
– Travel authorization forms
– Personal financial documents
– Email
Regular Expressions
• Returns too much data: /\d{3}-\d{2}-\d{4}/
• Searching for environment specific data in the hope that common data will lead us to more data:
/\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/
• State specific information can be found at http://www.ssa.gov/employer/stateweb.htm
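To see why the first pattern returns too much, both patterns can be run over the same text. This is a minimal sketch using Python's re module; the sample string is made up and contains no real identifiers, and the helper name find_candidates is an assumption.

```python
import re

# Naive pattern from the slide: any 3-2-4 digit group joined by dashes.
NAIVE = re.compile(r"\d{3}-\d{2}-\d{4}")

# Refined pattern from the slide, written on one line.
REFINED = re.compile(
    r"\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b"
)

def find_candidates(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up sample data; none of these are real identifiers.
sample = "id 523-45-6789, part no. 1999-12-31234, raw 521456789, intl 912-34-5678"

print(find_candidates(NAIVE, sample))
print(find_candidates(REFINED, sample))
```

The naive pattern matches inside the longer part number and accepts a leading 9; the refined pattern rejects both and also picks up the undelimited number with a Colorado prefix.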
Regular Expressions
• Let’s dissect this…
/\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/
– \b: Boundary
– [0-7]: First acceptable digit
– \d{2}, \d{4}, \d{6}: 2, 4, or 6 digits in a row
– [-|\s]: Delimited by dash or space
– (52[1-4]|65[0-3]): Colorado specific prefix, not delimited
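The same dissection can be written as a commented pattern. This sketch uses Python's re.VERBOSE flag so each piece carries its annotation. One deliberate difference, stated as an assumption about intent: inside square brackets a | is a literal pipe character, so the slide's [-|\s] also accepts a pipe as a delimiter. Since the annotation says "dash or space", the sketch uses [-\s] instead. The names SSN_PATTERN and is_candidate are illustrative.

```python
import re

SSN_PATTERN = re.compile(
    r"""
    \b                      # boundary: match must start at a word edge
    (
        [0-7]\d{2}          # first acceptable digit, then two more digits
        [-\s]               # delimited by dash or space
        \d{2}               # two digits in a row
        [-\s]               # delimited by dash or space
        \d{4}               # four digits in a row
      |
        (52[1-4]|65[0-3])   # Colorado-specific prefix
        \d{6}               # six digits in a row, not delimited
    )
    \b                      # boundary: match must end at a word edge
    """,
    re.VERBOSE,
)

def is_candidate(text):
    """Return True if text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None
```

An undelimited nine-digit number only counts when it starts with one of the listed prefixes, which is what keeps arbitrary nine-digit numbers out of the results.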
CU Experiences
• Pitfalls
– Users’ interpretations of the log file
– Fine tuning file extension exceptions and regular expressions
• Recommendations
– Keep current environment in mind
The Organizational Problem: a really big haystack
Steve Lovaas
Network Security Manager
Colorado State University
Organizational Vision
• Support from the top
– Cabinet-level committee driving the project
– Spurred by headlines and state mandates
– VP for IT who really gets security
• Campus PR campaign
– Web site
– Public meetings
• Tied SSN purge to the rollout of a new CSUID in Fall 2006
Using Resources
• Project Constraints
– Tight timeline
– No budget
– Not a trivial programming project
• Buy / Build / Leverage tools?
• Goal: 100% coverage vs. Best Effort
• Spider chosen for Windows, Linux, Mac
• Manual searching on AIX, mainframe
Ultimate Responsibility
• Original thought: deans / dept. heads
• Revised edition: individual employees
• Developed a personal attestation for every employee to sign, submitted in bulk by colleges
• More work for central IT
• Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT
Individual Attestation Form
• Every employee
• 2 choices:
– I don’t interact with SSNs in the course of my job
– SSNs in all electronic files under my control have been removed or encrypted
• VP for IT must approve exceptions
CSU Experiences
• Pitfalls
– Beta tool for a live project requires quick response and careful management of user expectations & acceptance
– Careful of deadlines, it’s a lot of work!
• Recommendations
– Don’t do this kind of project without active support from the very top
– Anticipate the need for analysis/parsing tools
– Have a supported encryption solution for exceptions
Cornell Spider
Wyman Miles
Sr. Security Engineer
Cornell University
A Brief History of Spider
• Early 2005, scan Web for SSNs
• Later, scan disk images for SSNs/CCNs
• March 2006, debut at BU Security Camp
• April 2006, Educause, demand for a Windows version
• Version 1.0 in May, 2.0 in June
A Brief History, II
• June 2006, major feedback from Steve: bug reports, tests, feature requests
• Engine developed that same month: internal incident response
• OSX Spider Sept 2006
• Windows Spider rewrite
• April 2007, GPL release of all Spiders
Current Spider
• SSN, SIN, CCN, NINO discovery in many file types
• Various data type validators
• Web scanning, back to its roots
• Scan for data in unallocated space
• Faster. More readable source
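As an illustration of what a data type validator can do, a credit card number candidate can be checked with the Luhn checksum before it is reported, which discards many random digit strings. The slides do not describe how Spider's validators are implemented, so this is a generic sketch of the Luhn check, not Spider's code; the function name is an assumption.

```python
def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    # Ignore spaces and dashes; keep only ASCII digits.
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A regular expression finds strings with the right shape; a checksum like this then filters out those that cannot be valid card numbers.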
Various Spiders
• Windows Spider, aka Spider3
• OSX Spider
• Engine, general UNIX spider
• LinSpider, our oldest version
• Spider Simple: Windows Spider preconfigured to skip noisy files
Future Spider
• Feature set convergence between Engine, OSX, Windows
• Community Development
• Possible I2 hosting of distribution and documentation
• More documentation!
• Client-Server model revisited
Spider Log
Spider at Cornell
• Incident response: a compromise has happened, what was at risk?
• Pre-emptive
– Dan Elswit, CALS Security Officer
Spider in CIT
• CIT abandoned SSNs a few years ago, but they remain
• Tech support uses Spider Simple to discover lurking SSNs
• Manual process
Athletics
• Spider Simple
• Unique log names to network share
• Centralized analysis
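One way to get unique log names on a shared location is to build the name from the machine's hostname and a timestamp. The slides do not say how Athletics named its logs, so the "spider-" prefix, the timestamp layout, and the function name below are all hypothetical.

```python
import socket
from datetime import datetime, timezone

def unique_log_name(hostname=None, now=None):
    """Build a log file name from the hostname and a UTC timestamp."""
    # The naming scheme here is an assumption, not Spider's own convention.
    host = hostname or socket.gethostname()
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    return f"spider-{host}-{stamp}.log"
```

With one file per machine per run, logs written to the same share do not overwrite each other and can be collected for centralized analysis.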
Spider Downloads
• http://www.cit.cornell.edu/security/tools
Summary
• Purging sensitive information is something we’re going to have to get good at
• Get support from the highest levels
• Tune regular expressions and file/ext skip lists for your environment
• Anticipate parsing needs, exceptions
• New Spider features, more users, broader OS support
• Spider also for ongoing support, forensics
Questions?
• Wyman Miles: [email protected]
• Kerry Havens: [email protected]
• Steve Lovaas: [email protected]
• The Spider users’ list: [email protected]