Sensitive Information Sweep
Using Cornell’s Spider
Wyman Miles, Cornell University
Kerry Havens, University of Colorado at Boulder
Steve Lovaas, Colorado State University
Overview
• Quick Background
• The Technical Problem (Kerry)
• The Organizational Problem (Steve)
• Spider (Wyman)
• Summary & Questions
What is “Sensitive Information”?
• A Growing Concern
• A Moving Target
• SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,…
• Data in Context – Aggregation
Why Are We All Here?
• The Front Page!
• CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year.
• Identity theft is the fastest-growing crime in the U.S.
• By far the biggest culprit? Lost or stolen computers.
Regulations, Standards, & Laws
• Federal – HIPAA, FERPA, SarbOx, GLB,… Identity Theft Protection Act?
• State – Many states are passing identity theft protection laws; New York & Colorado have state CISOs
• Industry – PCI DSS
The Technical Problem: Finding sensitive information in a haystack
Kerry Havens
University of Colorado at Boulder
SSN Remediation
• At CU-Boulder, SSNs were used as a student identifier before 2004
• Colorado House Bill 03-1175, approved in 2003, required institutions to stop using SSNs as identifiers to protect the privacy of students’ Social Security numbers
• CU-Boulder started issuing student IDs to new students in July 2004 and converting SSNs to SIDs in 2005
Where the data is not stored
• File type exclusions and fine tuning
  – Binary files where the data cannot be read
  – Received input from the community for fine tuning
• False positives
  – International telephone numbers
  – Examples for web form validation
• Why is the department web page asking for SSNs?
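The file type exclusion step can be pictured as a directory walk that skips anything on an extension skip list. This is a minimal Python sketch, not Spider's code: the `SKIP_EXTENSIONS` set and the `candidate_files` function are hypothetical, and the extensions listed are placeholders to be tuned with community input.

```python
import os
import tempfile

# Hypothetical skip list: extensions assumed to contain binary data that a
# plain-text pattern search cannot read. Tune it for your own environment.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".png", ".gif", ".mp3", ".zip"}


def candidate_files(root):
    """Yield paths under root whose extension is not on the skip list."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so "LOGO.PNG" is skipped too.
            ext = os.path.splitext(name)[1].lower()
            if ext not in SKIP_EXTENSIONS:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Usage: build a throwaway directory and list what would be scanned.
    with tempfile.TemporaryDirectory() as root:
        for name in ("grades.xls", "LOGO.PNG", "notes.txt"):
            open(os.path.join(root, name), "w").close()
        print(sorted(os.path.basename(p) for p in candidate_files(root)))
        # prints ['grades.xls', 'notes.txt']
```

Skipping by extension is cheap but trusts file names: a renamed file slips past it, which is one reason the skip list needs ongoing tuning.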
OS and File Encoding Problems
• HTML encoding problems
• Representations (pictures) of sensitive data are not found
  – Examples include PDF
• Searching a UNIX filesystem
  – Preparing the file before searching for private data
  – For example, using strings to extract text from text/binary hybrids like .doc or .xls
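The `strings` preprocessing step can be approximated in a few lines. This is a minimal Python sketch, not the tool the CU team used; `extract_strings` and `MIN_LEN` are illustrative names. Like a default `strings` run, it only finds single-byte ASCII text, so UTF-16 text, which some Office formats use, would need a separate pass (for example GNU `strings -e l`).

```python
import re
from typing import List

# Minimum run length; GNU strings also defaults to 4 characters.
MIN_LEN = 4


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> List[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long,
    roughly what `strings` prints for a binary or text/binary hybrid file."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01SSN 123-45-6789\x00\xffab\x00Gradebook"
    print(extract_strings(sample))  # ['SSN 123-45-6789', 'Gradebook']
```

The extracted text can then be fed to the same regular expressions used on plain-text files.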
Where the data is stored
• Typical file types of discovered data
  – Gradebooks
  – Course web pages
  – Homework assignments
  – Travel authorization forms
  – Personal financial documents
  – Email
Regular Expressions
• Returns too much data (too many false positives): /\d{3}-\d{2}-\d{4}/
• Search for environment-specific data, in the hope that finding common local patterns will lead us to more data: /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/
• State-specific information (such as SSN area number assignments) can be found at http://www.ssa.gov/employer/stateweb.htm
Regular Expressions
• Let’s dissect this…
/\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/
  – \b: Boundary
  – [0-7]: First acceptable digit
  – \d{2}, \d{4}, \d{6}: 2, 4, or 6 digits in a row
  – [-|\s]: Delimited by dash or whitespace (inside brackets the | is a literal pipe, so [-\s] would suffice)
  – (52[1-4]|65[0-3])\d{6}: Colorado-specific prefix, not delimited
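The dissected pattern can be transcribed into Python's `re` module in verbose mode, which allows a comment on each piece. This is a sketch under two assumptions: that Python's syntax behaves like the original Perl-style pattern for these constructs, and that the delimiter class should be `[-\s]` rather than `[-|\s]`, since a `|` inside brackets is a literal pipe. The names `NAIVE` and `COLORADO` are illustrative.

```python
import re

# Naive pattern from the slides: any ddd-dd-dddd string.
NAIVE = re.compile(r"\d{3}-\d{2}-\d{4}")

# Environment-specific pattern; whitespace and # comments are ignored
# under re.VERBOSE, so each piece can carry its slide annotation.
COLORADO = re.compile(
    r"""
    \b                      # word boundary
    (
        [0-7]\d{2}          # area number; first digit 0-7
        [-\s]\d{2}          # dash or whitespace, then 2-digit group
        [-\s]\d{4}          # dash or whitespace, then 4-digit serial
      |
        (52[1-4]|65[0-3])   # Colorado area prefixes 521-524, 650-653
        \d{6}               # then 6 more digits, no delimiters
    )
    \b                      # word boundary
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    text = "Student 523-45-6789; order 812-45-6789; id 521456789"
    print(NAIVE.findall(text))  # ['523-45-6789', '812-45-6789']
    # group(0) is used because findall() would return the capture groups.
    print([m.group(0) for m in COLORADO.finditer(text)])
    # ['523-45-6789', '521456789']
```

The naive pattern flags the out-of-range "812" number and misses the undelimited Colorado number; the tuned pattern does the reverse.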
CU Experiences
• Pitfalls
  – Users’ interpretations of the log file
  – Fine tuning file extension exceptions and regular expressions
• Recommendations
  – Keep the current environment in mind
The Organizational Problem: A really big haystack
Steve Lovaas
Network Security Manager
Colorado State University
Organizational Vision
• Support from the top
  – Cabinet-level committee driving the project
  – Spurred by headlines and state mandates
  – VP for IT who really gets security
• Campus PR campaign
  – Web site
  – Public meetings
• Tied SSN purge to the rollout of a new CSUID in Fall 2006
Using Resources
• Project constraints
  – Tight timeline
  – No budget
  – Not a trivial programming project
• Buy / Build / Leverage tools?
• Goal: 100% coverage vs. Best Effort
• Spider chosen for Windows, Linux, Mac
• Manual searching on AIX, mainframe
Ultimate Responsibility
• Original thought: deans / dept. heads
• Revised edition: individual employees
• Developed a personal attestation for every employee to sign, submitted in bulk by colleges
• More work for central IT
• Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT
Individual Attestation Form
• Every employee
• 2 choices:
– I don’t interact with SSNs in the course of my job
– SSNs in all electronic files under my control have been removed or encrypted
• VP for IT must approve exceptions
CSU Experiences
• Pitfalls
  – A beta tool for a live project requires quick response and careful management of user expectations & acceptance
  – Be careful with deadlines; it’s a lot of work!
• Recommendations
  – Don’t do this kind of project without active support from the very top
  – Anticipate the need for analysis/parsing tools
  – Have a supported encryption solution for exceptions
Cornell Spider
Wyman Miles
Sr. Security Engineer
Cornell University
A Brief History of Spider
• Early 2005, scan Web for SSNs
• Later, scan disk images for SSNs/CCNs
• March 2006, debut at BU Security Camp
• April 2006, Educause, demand for a Windows version
• Version 1.0 in May, 2.0 in June
A Brief History, II
• June 2006, major feedback from Steve: bug reports, tests, feature requests
• Engine developed that same month: internal incident response
• OSX Spider Sept 2006
• Windows Spider rewrite
• April 2007, GPL release of all Spiders
Current Spider
• SSN, SIN, CCN, NINO discovery in many file types
• Various data type validators
• Web scanning, back to its roots
• Scan for data in unallocated space
• Faster, with more readable source
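The slides don't say which validators Spider implements. As an illustration only, here are two checks commonly used to filter candidate matches: the Luhn checksum that valid card numbers satisfy, and the SSA rule that area numbers 000, 666 and 900-999, group 00, and serial 0000 are never issued. The function names are hypothetical.

```python
import re


def luhn_valid(candidate: str) -> bool:
    """Luhn (mod 10) check for a 13-19 digit card number; spaces and dashes
    are allowed as separators."""
    digits = re.sub(r"[ -]", "", candidate)
    if not re.fullmatch(r"[0-9]{13,19}", digits):
        return False
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_plausible(candidate: str) -> bool:
    """Reject SSN-shaped strings whose parts were never issued: area 000,
    666 or 900-999, group 00, or serial 0000."""
    m = re.fullmatch(r"([0-9]{3})[-\s]?([0-9]{2})[-\s]?([0-9]{4})", candidate)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test number)
    print(ssn_plausible("666-12-3456"))       # False (area 666 is never issued)
```

Running a validator after the regular expression match trades a little CPU time for fewer false positives in the log that users have to interpret.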
Various Spiders
• Windows Spider, aka Spider3
• OSX Spider
• Engine, general UNIX spider
• LinSpider, our oldest version
• Spider Simple: Windows Spider preconfigured to skip noisy files
Future Spider
• Feature set convergence between Engine, OSX, Windows
• Community Development
• Possible I2 hosting of distribution and documentation
• More documentation!
• Client-Server model revisited
Spider Log
Spider at Cornell
• Incident response: a compromise has happened, what was at risk?
• Pre-emptive
  – Dan Elswit, CALS Security Officer
Spider in CIT
• CIT abandoned SSNs a few years ago, but they remain in older files
• Tech support uses Spider Simple to discover lurking SSNs
• Manual process
Athletics
• Spider Simple
• Logs written to a network share under unique names
• Centralized analysis
Spider Downloads
• http://www.cit.cornell.edu/security/tools
Summary
• Purging sensitive information is something we’re going to have to get good at
• Get support from the highest levels
• Tune regular expressions and file/extension skip lists for your environment
• Anticipate parsing needs, exceptions
• New Spider features, more users, broader OS support
• Spider also for ongoing support, forensics
Questions?
• Wyman Miles: [email protected]
• Kerry Havens: [email protected]
• Steve Lovaas: [email protected]
• The Spider users’ list: [email protected]