Sample of Data Security and Knowledge Discovery
Research at the University of Texas at Dallas
Dr. Bhavani Thuraisingham
Dr. Latifur Khan
Dr. Murat Kantarcioglu
Dr. Kevin Hamlen
September 20, 2007
204/20/23 18:16
Outline
• Data and Applications Security
- Information sharing, geospatial data management, surveillance, secure web services, privacy, dependable information management, intrusion detection
• Data Mining and Knowledge Discovery
- Data mining for security applications, data mining for bioinformatics, data mining for data and software quality
Research Group: Data and Applications Security
• Core Group
- Prof. Bhavani Thuraisingham (Professor & Director, Cyber Security Research Center)
- Prof. Latifur Khan (Director, Data Mining Laboratory)
- Prof. Murat Kantarcioglu (joined Fall 2005; PhD, Purdue)
- Prof. Kevin Hamlen (peer-to-peer systems security; joined 2006 from Cornell U.)
• Students and Funding
- Over 20 PhD students, 40 MS students (combined)
- Research grants: Air Force Office of Scientific Research, NSF, NGA, Raytheon, - - - -
Vision 1: Assured Information Sharing
[Figure: Agencies A, B and C each have a component holding their own Data/Policy, and each publishes Data/Policy to a shared Data/Policy for Coalition]
1. Friendly partners
2. Semi-honest partners
3. Untrustworthy partners
Research funded by two
grants from AFOSR
Vision 2: Secure Geospatial Data Management
Data Source A
Data Source B
Data Source C
SECURITY/QUALITY
Semantic Metadata Extraction
Decision Centric Fusion
Geospatial data interoperability through web services
Geospatial data mining
Geospatial semantic web
Tools for Analysts
Research supported by Raytheon on one grant; working on robust prototypes on a second grant
Vision 3: Surveillance and Privacy
Raw video surveillance data
Face Detection and Face Derecognizing system
Suspicious Event Detection System
Manual Inspection of video data
Comprehensive security report listing suspicious events and people detected
Suspicious people found
Suspicious events found
Report of security personnel
Faces of trusted people derecognized to preserve privacy
Example Projects
• Assured Information Sharing
- Secure Semantic Web Technologies
- Social Networks and game playing
- Privacy Preserving Data Mining
• Geospatial Data Management
- Secure Geospatial semantic web
- Geospatial data mining
• Surveillance
- Suspicious Event Detection
- Privacy preserving Surveillance
- Automatic Face Detection, RFID technologies
• Cross Cutting Themes
- Data Mining for Security Applications (e.g., Intrusion detection, Mining Arabic Documents); Dependable Information Management
Social Networks
• Individuals engaged in suspicious or undesirable behavior rarely act alone
• We can infer that those associated with a person positively identified as suspicious have a high probability of being either:
- Accomplices (participants in suspicious activity)
- Witnesses (observers of suspicious activity)
• Making these assumptions, we create a context of association between users of a communication network
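The association idea on this slide can be illustrated with a small sketch. It assumes the communication network is available as (sender, receiver) pairs and treats any direct contact as an association; the function names and sample users are hypothetical and are not taken from the group's tools.

```python
from collections import defaultdict


def build_contacts(messages):
    """Build an undirected contact graph from (sender, receiver) pairs."""
    contacts = defaultdict(set)
    for sender, receiver in messages:
        contacts[sender].add(receiver)
        contacts[receiver].add(sender)
    return contacts


def associates_of(contacts, suspicious):
    """Return users directly linked to any positively identified suspicious user."""
    linked = set()
    for person in suspicious:
        linked |= contacts.get(person, set())
    # People already identified as suspicious are not reported again.
    return linked - set(suspicious)


# Hypothetical communication log.
messages = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
contacts = build_contacts(messages)
flagged = associates_of(contacts, {"bob"})
```

Here `flagged` holds the candidate accomplices or witnesses for further review; deciding which of the two roles applies is outside this sketch.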
Privacy Preserving Data Mining
• Prevent useful results from mining
- Introduce “cover stories” to give “false” results
- Only make a sample of data available so that an adversary is unable to come up with useful rules and predictive functions
• Randomization and Perturbation
- Introduce random values into the data and/or results
- Challenge is to introduce random values without significantly affecting the data mining results
- Give range of values for results instead of exact values
• Secure Multi-party Computation
- Each party knows its own inputs; encryption techniques used to compute final results
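Two of the approaches listed above can be sketched in a few lines. The first part perturbs each record with zero-mean noise so that individual values are hidden while an aggregate (the mean) stays close to the true value. The second part computes a joint sum with additive secret sharing, which is used here as a simple stand-in for the encryption techniques the slide mentions. All names, the salary data, the noise spread and the modulus are illustrative assumptions.

```python
import random


def perturb(values, spread, rng):
    """Add zero-mean uniform noise in [-spread, spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]


def mean(values):
    return sum(values) / len(values)


def make_shares(secret, n_parties, modulus, rng):
    """Split a secret into n additive shares that sum to it modulo `modulus`."""
    shares = [rng.randrange(modulus) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % modulus)
    return shares


def secure_sum(inputs, modulus, rng):
    """Each party shares its input; only per-party partial sums are revealed."""
    n = len(inputs)
    # shares[i][j] is the share that party i sends to party j.
    shares = [make_shares(x, n, modulus, rng) for x in inputs]
    partials = [sum(shares[i][j] for i in range(n)) % modulus for j in range(n)]
    return sum(partials) % modulus


# A seeded generator keeps the sketch reproducible; a real deployment
# would need a cryptographically secure source of randomness.
rng = random.Random(0)

# Randomization: hypothetical salary records.
salaries = [rng.uniform(30_000, 90_000) for _ in range(10_000)]
released = perturb(salaries, spread=5_000, rng=rng)
true_mean = mean(salaries)
released_mean = mean(released)

# Secure multi-party computation: three parties with private inputs.
total = secure_sum([12, 30, 7], modulus=2**61 - 1, rng=rng)
```

The perturbation half shows the trade-off named on the slide: a larger `spread` hides individual records better but moves the mined aggregate further from the truth.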
Framework for Geospatial Data Security
DATA PRESENTATION COMPONENTS
Access Control Module
Geospatial Data Registration
spatial and temporal registration of geospatial data
Data Integration Services & Data Repository Access
DATA ACCESS LAYER
DAC/RBAC Policy Specification
Policy Reasoning Engine
Trust & Privacy Management
Authentic Data Publication
Auditing
Misuse Detection
SECURITY LAYER
Open Geospatial Consortium Framework
Core & Application Schemas
Geospatial Features
Geography Markup Language
Metadata
GIS Web Services
Traditional GIS
Wrapper
Geospatial Data Repositories
Data Mining for Surveillance
• We define an event representation measure based on low-level features
• This allows us to define “normal” and “suspicious” behavior and classify events in unlabeled video sequences appropriately
• A visualization tool can then be used to enable more efficient browsing of video data
Data Mining for Intrusion Detection / Worm Detection
Training Data
Classification
Hierarchical Clustering (DGSOT)
Testing
Testing Data
SVM Class Training
DGSOT: Dynamically Growing Self-Organizing Tree
SVM: Support Vector Machine
Intrusion Detection: Results
Training Time, FP and FN Rates of Various Methods

Method                   Average Accuracy   Total Training Time   Average FP Rate (%)   Average FN Rate (%)
Random Selection         52%                0.44 hours            40                    47
Pure SVM                 57.6%              17.34 hours           35.5                  42
SVM + Rocchio Bundling   51.6%              26.7 hours            44.2                  48
SVM + DGSOT              69.8%              13.18 hours           37.8                  29.8
Information Assurance Education
Current Courses
- Introduction to Information Security: Prof. Sha
- Trustworthy Computing: Prof. Sha
- Cryptography: Profs. Sudborough, Kantarcioglu
- Information Assurance: Prof. Yen
- Data and Applications Security: Prof. Bhavani Thuraisingham
- Biometrics: Prof. Bhavani Thuraisingham
- Privacy: Prof. Murat Kantarcioglu
- Secure Language: Prof. Kevin Hamlen
- Digital Forensics: Prof. Bhavani Thuraisingham
Future Courses
- Network Security: Profs. Venkatesan, Sarac
- Security Engineering: Profs. Bastani, Cooper
- Intrusion Detection: Profs. Khan, Thuraisingham
- Digital Watermarking: Prof. Prabhakaran
Courses at AFCEA and AF Bases
- Knowledge Management, Data Mining for Counter-terrorism, Data Security
- Preparing a course on SOA and NCES with Prof. Alex Levis (GMU) and Prof. Hal Sorenson (UCSD)
Knowledge Discovery in Images
• Goal: Find unusual changes
• Process:
- Use data mining to model normal differences between images
- Find places where differences don’t match the model
• Questions to be answered:
- What are the right mining techniques?
- Can we get useful results?
Change Detection:
• Trained a neural network to predict the “new” pixel from the “old” pixel
- Neural networks are good for multidimensional continuous data
- Multiple nets give a range of “expected values”
• Identified pixels where the actual value is substantially outside the range of expected values
- Anomaly if three or more bands (of seven) are out of range
• Identified groups of anomalous pixels
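The per-pixel rule on this slide can be sketched as follows, assuming the per-band predictions of the trained networks are already available as plain lists. The `margin` parameter is an assumed way of expressing "substantially outside" the expected range, and all names and pixel values are hypothetical.

```python
def expected_range(predictions):
    """Per-band (low, high) range spanned by the predictions of several networks.

    predictions: one list per network, each holding one predicted value per band.
    """
    low = [min(band) for band in zip(*predictions)]
    high = [max(band) for band in zip(*predictions)]
    return low, high


def out_of_range_bands(actual, low, high, margin=0.0):
    """Count bands whose actual value lies more than `margin` outside the range."""
    return sum(
        1
        for value, lo, hi in zip(actual, low, high)
        if value < lo - margin or value > hi + margin
    )


def is_anomalous(actual, low, high, margin=0.0, min_bands=3):
    """Flag a pixel when at least `min_bands` bands are out of range."""
    return out_of_range_bands(actual, low, high, margin) >= min_bands


# Hypothetical predictions from three networks for one seven-band pixel.
predictions = [
    [10, 20, 30, 40, 50, 60, 70],
    [12, 22, 28, 41, 52, 58, 71],
    [11, 19, 31, 39, 49, 61, 69],
]
low, high = expected_range(predictions)

pixel_changed = [11, 40, 60, 80, 50, 60, 70]  # three bands far off
pixel_stable = [11, 21, 29, 40, 51, 59, 90]   # only one band far off

changed_flag = is_anomalous(pixel_changed, low, high, margin=5)
stable_flag = is_anomalous(pixel_stable, low, high, margin=5)
```

Grouping the flagged pixels into regions, the last step on the slide, would follow as a separate pass over the image.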
Multimedia/Image Mining
Images Segments Blob-tokens
Automatically annotate images then retrieve based on the textual annotations.
Web Page Prediction: Problem Description
Office of Admission (P1)
VIP web page (P2)
Financial Aid Information (P3)
What page is next?
Web Page Prediction: Architecture
User sessions
Feature Extraction
Markov Model -> Markov prediction
SVM -> SVM output -> Sigmoid mapping -> SVM prediction
ANN -> ANN output -> Sigmoid mapping -> ANN prediction
Dempster’s Rule (fusion) -> Final Prediction
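The fusion step in this architecture can be illustrated with a minimal sketch of Dempster's rule of combination. It assumes each predictor's output has already been mapped to a belief mass per candidate page and that all mass sits on single pages; the page labels follow the problem-description slide, while the numeric scores and function names are made up for illustration.

```python
from itertools import product


def as_masses(scores):
    """Turn {page: mass} into a mass function keyed by singleton hypothesis sets."""
    return {frozenset({page}): mass for page, mass in scores.items()}


def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset of hypotheses -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses is discarded.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical per-model beliefs about the next page.
markov = as_masses({"P1": 0.6, "P2": 0.3, "P3": 0.1})
svm = as_masses({"P1": 0.5, "P2": 0.4, "P3": 0.1})
ann = as_masses({"P1": 0.2, "P2": 0.7, "P3": 0.1})

two_way = dempster_combine(markov, svm)
fused = dempster_combine(two_way, ann)
best = max(fused, key=fused.get)
predicted_page = next(iter(best))
```

With these made-up scores the Markov and SVM beliefs alone favor P1, but the strong ANN belief in P2 tips the fused prediction, which shows how the rule weighs agreement across the three models.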
Misuse/Misinformation/ Insider threat
• 50% of corporate breaches or losses of information that were made public in the past year were insider attacks
• 50% of those insider attacks were thefts of information by employees
• It is hard to model individuals!
• Role-based access control provides tools to model given roles
• Challenge: How to develop models for predicting normal usage of a role vs. misuse?
• Challenge: How to integrate misuse, auditing and access control systems?
• Current status: We are developing a misuse detection system based on clustering; risk-based analysis
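As a rough illustration of modeling normal usage of a role rather than of an individual, the sketch below summarizes a role's past sessions as a single cluster (centroid plus radius) and flags sessions that fall far outside it. This is a simplified stand-in, not the group's system: the features, the one-cluster-per-role choice, the `slack` factor and all names are assumptions.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_role_profile(sessions):
    """Summarize normal sessions of a role as (center, radius)."""
    center = centroid(sessions)
    radius = max(distance(s, center) for s in sessions)
    return center, radius


def is_misuse(session, profile, slack=1.5):
    """Flag a session lying more than `slack` radii from the role's center."""
    center, radius = profile
    return distance(session, center) > slack * radius


# Hypothetical features per session: (records read, records exported).
clerk_sessions = [(20, 1), (25, 0), (22, 2), (18, 1)]
clerk_profile = build_role_profile(clerk_sessions)

typical = is_misuse((21, 1), clerk_profile)
bulk_export = is_misuse((300, 250), clerk_profile)
```

A fuller version would learn several clusters per role and feed the flags into the auditing and access control components that the slide lists as an integration challenge.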
Time Constrained KDD: Proposal to AFOSR with UIUC
• The military must continually carry out the following operations:
- Surveillance: monitor the behavior of people or objects to see if they are deviating from the norm
- Maneuver: place the enemy in a position of disadvantage through the flexible application of combat power
- Mass: concentrate the effects of overwhelming combat power at the decisive place and time
- Attack: an attempt to actively strike at the enemy, as opposed to a defensive plan
• Track the enemy and DETER him during the surveillance and maneuver stages through
- Knowledge Discovery: extract concepts from the stream data arriving from the sensors
- Time Constrained Activity Analysis: extract knowledge from the enemy activities arriving in the form of streams
- Ontology Management: develop ontologies, then conduct multi-modal analysis of the captured multimedia data and resolve conflicts and uncertainty
- Resource Allocation: utilize the knowledge discovered, apply decision theories and determine resource allocation
Some Experiences with Tools
• Tools developed in-house
- Image mining tool, Data Sharing Tool
- Intrusion detection/Malicious code detection tools, Web page prediction tool
- Multimedia mining/Image extraction including MPEG7 feature descriptors
- Cluster visualization tool
• External tools
- Oracle data mining product
- IDIS data mining tool
- WEKA data mining tool
- XML SPIE and QUIP
- INTEL OpenCV
Technical and Professional Accomplishments
Publications of research in top journals and conferences, books: IEEE Transactions, ACM Transactions, 8 books published and 2 books in preparation, including one on UTD research (Data Mining Applications; Awad, Khan and Thuraisingham)
Member of Editorial Boards/Editor in Chief: Journal of Computer Security, ACM Transactions on Information and System Security, IEEE Transactions on Dependable and Secure Computing, IEEE Transactions on Knowledge and Data Engineering, Computer Standards and Interfaces - - -
Advisory Boards / Memberships / Other: Purdue University CS Department, invitations to write articles in Encyclopedia Britannica on data mining, keynote addresses, talks at DFW NAFTA and Chamber of Commerce, commercialization discussions of data mining tools for security
Awards and Fellowships: IEEE Fellow, AAAS Fellow, BCS Fellow, IEEE Technical Achievement Award, IEEE Senior Member
Our Model: R&D, Technology Transfer, Standardization and Commercialization
Basic Research (6-1 Type): Funding agencies such as NSF, AFOSR, NGA, - - - -, etc.; publish our research in top journals (ACM and IEEE Transactions)
Applied Research: Some federal funding (e.g., from government programs) and commercial corporations (e.g., Raytheon); our current collaboration with AFRL-ARL
Technology Transfer / Development: Work with corporations such as Raytheon to showcase our research to sponsors (e.g., GEOINT) and transfer research to operational programs such as DCGS
Standardization: Our collaborations with OGC, OASIS and standardization of our research (e.g., GRDF)
Commercialization: Patents; work with VCs, corporations, SBIR, STTR for commercialization of our tools (e.g., our work on data mining tools)
Our Vision for Assured Information Sharing/KDD
Time Constrained KDD (Future)
Link Analysis (AFOSR, Texas)
Game Theory (AFOSR)
Dependable Information Management (Texas)
Misinformation/Misuse (AFOSR)
Geospatial (NGA, Raytheon)
Semantic Web (NSF, AFOSR)
Incentive-based Knowledge Management (Future)
Assured Information Sharing/KDD
Privacy Preserving Data Mining (Texas)
Technologies will contribute to Assured Information Sharing
Our Collaborations in Assured Information Sharing and KDD
Time Constrained KDD (UIUC)
Link Analysis (UGA, UAZ)
Game Theory (UTD Management School)
Dependable Information Management (UCR, UTSA)
Misinformation/Misuse (Purdue)
Geospatial (UMN, UCD, Purdue, WVU, UCF)
Semantic Web (UMBC, UTSA)
Knowledge Management (SUNY Buffalo)
Assured Information Sharing/KDD
Privacy Preserving Data Mining (Purdue)