Semantically-Enabled Digital Investigations

Page 1: Semantically-Enabled Digital Investigations

Semantically-enabled Digital Investigations

by Spyridon Dosis

Page 2: Semantically-Enabled Digital Investigations

Outline

• Problem

• Background

• Developed Method

• Demonstration

• Conclusions

2023-04-15 ISACA Dagen 2013

Page 3: Semantically-Enabled Digital Investigations

Problem Area

• Complex attacks against networked systems

• Multiple data sources of possible evidentiary value

– Volume & Variety

– “looking for a needle in a stack of needles” – Paul Pillar, CIA CoA

• Analysis of the collected digital data

– Least formalized process step

– Relies on investigators’ expertise and experience

Page 4: Semantically-Enabled Digital Investigations

Digital Evidence / Investigations

• Reliable digital data that support hypothesizing about a security incident

• Sound methods for collecting and interpreting digital data

• Reconstruct events found to be criminal (DF)

• Investigate and learn from information security breaches (IR)

Page 5: Semantically-Enabled Digital Investigations

Forensic Tools

• Interpreters between data abstraction layers

– e.g. reconstruct raw disk data into a filesystem hierarchy and objects (files, directories)

• Evidence-centric rather than investigation-centric design

• Limited tool interoperability

– Manual integration of tool findings

– Multiple (proprietary, undocumented) data formats/models

Page 6: Semantically-Enabled Digital Investigations

A Digital Investigation Example

Page 7: Semantically-Enabled Digital Investigations

Semantic Web & Linked Data Technologies

• “… information is given well-defined meaning, better enabling computers and people to work in cooperation” – (Tim Berners-Lee, 2001)

• Ontology – “explicit and formal specification of a conceptualization”

– Entities, attributes, relationships

• Metadata – context-based or domain-specific annotation of data

• Reasoning and inference of implicit facts

Page 8: Semantically-Enabled Digital Investigations

Semantic Web Architecture

• URI/IRI enables global data object identification

• XML provides a machine-readable, validatable data encoding scheme

• RDF(S) is a metadata data model and knowledge representation (KR) language

– Subject-Property-Object/Value statements

– Class and Property hierarchies

• OWL 2 is a more expressive KR language for specifying ontologies

– Restrictions, Equivalence, Cardinality, Property Chains

• Rule and RDF-query languages
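
The Subject-Property-Object statements and class hierarchies listed above can be illustrated with a minimal sketch in plain Python. It is not the presenter's implementation and uses no RDF library: triples are plain tuples, the `ex:` names are hypothetical, and only one RDFS-style rule (type propagation along `rdfs:subClassOf`) is applied.

```python
# Triples are (subject, property, object) tuples. All "ex:" names are
# hypothetical and exist only for this illustration.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:File", SUBCLASS_OF, "ex:DigitalObject"),
    ("ex:DigitalObject", SUBCLASS_OF, "ex:Evidence"),
    ("ex:file1", RDF_TYPE, "ex:File"),
    ("ex:file1", "ex:hasPathName", "/tmp/a.exe"),
}


def infer_types(triples):
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        types = [(s, o) for s, p, o in inferred if p == RDF_TYPE]
        subclasses = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for instance, cls in types:
            for sub, sup in subclasses:
                if cls == sub and (instance, RDF_TYPE, sup) not in inferred:
                    inferred.add((instance, RDF_TYPE, sup))
                    changed = True
    return inferred


closure = infer_types(triples)
```

After the call, `closure` also states that `ex:file1` is an `ex:DigitalObject` and an `ex:Evidence`, two facts that were only implicit in the asserted triples.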

Page 9: Semantically-Enabled Digital Investigations

Method Overview

• Data Collection

• Semantic Representation

• Ontological Reasoning

• Rule-based Reasoning

• Integrated Query

Page 10: Semantically-Enabled Digital Investigations

Domain Ontologies

• Introduced a set of lightweight domain-specific OWL ontologies

– Storage Media

– Network Traffic

– Windows Firewall Log, WHOIS RIR DB

– Malicious Networks Reputation List

– Malware Detection

Page 11: Semantically-Enabled Digital Investigations

Evidence Representation (Graph)

Page 12: Semantically-Enabled Digital Investigations

Semantic Representation

• Resource Unique Identification Scheme

• Parsing tools able to process each source type with respect to the domain ontology
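
One way a resource identification scheme could work is sketched below. The slide does not describe the actual scheme, so everything here is an assumption: the `http://example.org/evidence/` namespace, the `mint_uri` helper, and the choice of hashing a "natural key" (for example a file path or an IP address) so that the same artifact always receives the same URI.

```python
import hashlib
from urllib.parse import quote

# Hypothetical namespace; the scheme used in the presented work is not given.
BASE = "http://example.org/evidence/"


def mint_uri(source_id, kind, natural_key):
    """Build a deterministic URI for an artifact.

    source_id   -- label of the evidence source (e.g. a disk image name)
    kind        -- artifact type (e.g. "File", "IPAddress")
    natural_key -- a value that identifies the artifact within its source
    """
    digest = hashlib.sha1(natural_key.encode("utf-8")).hexdigest()
    return f"{BASE}{quote(source_id, safe='')}/{quote(kind, safe='')}/{digest}"


uri_a = mint_uri("disk01", "File", "/Users/alice/Downloads/setup.exe")
uri_b = mint_uri("disk01", "File", "/Users/alice/Downloads/setup.exe")
uri_c = mint_uri("disk01", "File", "/Users/alice/Documents/notes.txt")
```

Because the URI depends only on its inputs, a parser can be re-run over the same source without creating duplicate resources.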

Page 13: Semantically-Enabled Digital Investigations

Evidence Integration

• Automated linking among homogeneous and heterogeneous evidence sources based on key properties & matching rules
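
Linking on a key property can be sketched as a simple join. The example assumes that disk files and malware reports both carry an MD5 value and that a case-insensitive match on it is the matching rule. The record layout and the `link_on_key` helper are hypothetical; the link property name is borrowed from the sample query later in the deck.

```python
def link_on_key(left, right, left_key, right_key, link_property):
    """Emit (left_id, link_property, right_id) for records whose key values match."""
    index = {}
    for record in right:
        index.setdefault(str(record[right_key]).lower(), []).append(record["id"])
    links = []
    for record in left:
        for right_id in index.get(str(record[left_key]).lower(), []):
            links.append((record["id"], link_property, right_id))
    return links


# Hypothetical records from two evidence sources.
disk_files = [
    {"id": "disk:file1", "md5": "D41D8CD98F00B204E9800998ECF8427E"},
    {"id": "disk:file2", "md5": "0cc175b9c0f1b6a831c399e269772661"},
]
malware_reports = [
    {"id": "vt:report9", "md5": "d41d8cd98f00b204e9800998ecf8427e"},
]

links = link_on_key(disk_files, malware_reports, "md5", "md5",
                    "integration:MediaFileToVTFile")
```

Here `links` holds a single triple connecting `disk:file1` to `vt:report9`, which a query can later traverse as an ordinary graph edge.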

Page 14: Semantically-Enabled Digital Investigations

Evidence Correlation

• Link instances of dissimilar type across a shared domain

• Temporal Correlation

– Rules for establishing time instant & interval relations among recovered artifacts

• Mereological Correlation

– “partOf” transitivity relations
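
Both kinds of correlation can be sketched in a few lines. This is an illustration under stated assumptions, not the rule set used in the presented work: `transitive_closure` derives the implied "partOf" pairs, and `shortly_after` is a hypothetical temporal rule that selects events falling within a chosen window after an anchor time.

```python
from datetime import datetime, timedelta


def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def shortly_after(events, anchor_time, window):
    """Names of events with anchor_time < t <= anchor_time + window, sorted."""
    return sorted(name for name, t in events.items()
                  if anchor_time < t <= anchor_time + window)


# Hypothetical containment chain: file -> directory -> partition -> disk.
part_of = {("file", "dir"), ("dir", "partition"), ("partition", "disk")}
all_part_of = transitive_closure(part_of)

# Hypothetical timestamps of recovered artifacts.
download = datetime(2013, 4, 15, 12, 0, 0)
events = {
    "a.tmp": datetime(2013, 4, 15, 12, 1, 0),
    "b.log": datetime(2013, 4, 15, 12, 30, 0),
    "c.doc": datetime(2013, 4, 15, 11, 59, 0),
}
recent = shortly_after(events, download, timedelta(minutes=5))
```

The window size is an investigator's choice; "shortly" has no fixed meaning, so it is a parameter rather than a constant.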

Page 15: Semantically-Enabled Digital Investigations

Semantic Integration & Correlation

Page 16: Semantically-Enabled Digital Investigations

Integrated Query

• Purpose-built triplestore (graph) database engine can store the final dataset

– Up to billions of triples

• SQL-like (SPARQL) queries against the integrated and correlated evidence set

• Graph pattern matching techniques
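
Graph pattern matching can be sketched as follows: a query is a list of triple patterns in which terms starting with `?` are variables, and a solution is a binding of variables that makes every pattern match some triple in the dataset. This is a naive illustration in plain Python with hypothetical `ex:` data, not how a triplestore engine evaluates queries.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings under which every pattern matches some triple."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical dataset and query.
data = [
    ("ex:file1", "rdf:type", "ex:File"),
    ("ex:file1", "ex:hasMD5", "abc123"),
    ("ex:flow1", "rdf:type", "ex:TCPFlow"),
]
query = [
    ("?f", "rdf:type", "ex:File"),
    ("?f", "ex:hasMD5", "?md5"),
]
solutions = list(match(query, data))
```

The shared variable `?f` is what joins the two patterns, in the same way that shared variables join the patterns of a SPARQL `WHERE` clause.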

Page 17: Semantically-Enabled Digital Investigations

A PoC Instantiation

• Evidence Manager

• Filtering / Pre-processing

• Semantic Parser

• Inference Engine

• Classification, Inverse & Transitive Properties

• Rule & Query Engines

Page 18: Semantically-Enabled Digital Investigations

Experiment A

Page 19: Semantically-Enabled Digital Investigations

Experiment B

Page 20: Semantically-Enabled Digital Investigations

Sample Query

• “Is any file resident on the disk malicious and, if so, where was it downloaded from and which ISP did the IP address belong to?”

Page 21: Semantically-Enabled Digital Investigations

Sample Query

SELECT DISTINCT ?pathName ?uri ?ipvalue ?asnumber ?link
WHERE {
  # File recovered from the disk image
  ?file rdf:type digitalmedia:File .
  ?file digitalmedia:hasPathName ?pathName .
  ?file digitalmedia:hasMD5 ?md5 .
  # Links to the HTTP body that carried the file and to its malware report
  ?httpbody integration:HTTPContentToMediaFile ?file .
  ?file integration:MediaFileToVTFile ?vtfile .
  ?vtfile virustotal:hasAVReport ?report .
  ?report virustotal:hasPermanentLink ?link .
  # HTTP exchange and the TCP flow it belongs to
  ?httpresp http:body ?httpbody .
  ?httpreq http:requestURI ?uri .
  ?httpreq http:resp ?httpresp .
  ?http packetcapture:hasHTTPRequest ?httpreq .
  ?http rdf:type packetcapture:HTTP .
  ?tcpflow packetcapture:hasApplicationLayerProtocol ?http .
  # Destination IP address and its WHOIS range and autonomous system
  ?tcpflow packetcapture:hasDestinationIP ?destip .
  ?destip packetcapture:hasIPValue ?ipvalue .
  ?destip integration:PcapIPToWHOISIpAddr ?whoisip .
  ?whoisip whois:isContainedInRange ?range .
  ?range whois:hasRange ?rangeValue .
  ?range whois:isContainedInAS ?as .
  ?as whois:hasNetName ?netname .
  ?as whois:hasASNumber ?asnumber .
}

Page 22: Semantically-Enabled Digital Investigations

Example Hypotheses / Queries

• Have there been any unsuccessful connection attempts from systems in the same network as the one that hosted the malicious file?

• Which disk files have been created or accessed shortly after the malicious file was downloaded?

• Has there been any successful connection between our system and a known malicious host?

• Which files have been accessed shortly before the host communicated with any blacklisted network host?

• Which websites have been visited by the user shortly before the download of the malicious file?

Page 23: Semantically-Enabled Digital Investigations

Summary

• Ability to represent and integrate heterogeneous data

• Supports the formulation and execution of complex queries

• Expandable (ontologies, rules, queries)

• Computational complexity depends on the ontology, the rules, and the amount of data

• Reliance on online data sources may affect the accuracy of the results

Page 24: Semantically-Enabled Digital Investigations

Future Work

• Advanced reasoning capabilities (e.g. detect anti-forensic inconsistencies)

• Extended analysis techniques (e.g. additional data sources, user activities)

• Large-scale performance evaluation, distributed architecture

• User-friendly graphical interface for rule/query formulation and result navigation

Page 25: Semantically-Enabled Digital Investigations

Thank you
