October 2014 Webinar: Cybersecurity Threat Detection
Transcript of October 2014 Webinar: Cybersecurity Threat Detection
Securely explore your data
CYBERSECURITY THREAT DETECTION
Deriving Insights with Sqrrl and Spark GraphX
Adam Fuchs, CTO
October 2014
WHAT WE’LL DISCUSS
3 © 2014 Sqrrl Data, Inc. | All Rights Reserved
• Security Analytics using (Big) Cybersecurity Data
• You’ve been breached – what’s at stake?
• Dealing with the new security dilemma
• The ‘Linked Data’ Approach
• Case study: internal network breach
• Overview of scenario
• Data modeling with Sqrrl
• Detecting anomalies with Sqrrl and GraphX
• Visual, contextual research and analysis
THE NUMBERS DON’T LIE
229 (Source: Mandiant)
87% (Source: Verizon)
90% (Source: Verizon)
$12.7M (Source: Ponemon)
TARGETED ATTACKS HAVE CHANGED THE GAME
Source: Battery Ventures
WHAT DOES THIS MEAN FOR US?
• You’ve been breached. Deal with it.
• Empower the investigator
• Research and respond: better, faster, smarter
• It’s all about speed to understanding
THE SECURITY DATA DILEMMA
Dissolution of the secure perimeter
Detecting attacks requires more (i.e., BIG) data
But your tools can’t handle the big data wave
So attackers are spilling in
BIG DATA TRANSFORMED
Data sources: Perimeter Data, Network Data, Endpoint Data, Security Data (VPN, FW), Network Data (Proxy, NetFlow), Application Data (HR, USB)
Linked Contextual Knowledge: Users, Websites, Internal Servers, Client Devices, Assets
Analysis: Search, Exploration, Reports, Anomalies, Machine Learning
ARCHITECTURAL OVERVIEW
Physical: Commodity Hardware
Data Storage: HDFS + Accumulo
Data Model: Raw Events, Entity/Relationship Model
Processing: Query Engine, Bulk/Graph Processing
Interface: Visualization / API, ML + Anomaly Detection
Security: Audit, Cryptography, Labeling + Policy
BREACH DETECTION SCENARIO
BREACH: Compromised Laptop
NETFLOW: NETWORK SCAN
WINDOWS EVENT LOGS: PASS THE HASH
NETFLOW: EXFIL
STOLEN CREDENTIALS, WINDOWS EVENT LOGS: Unusually excessive logins
DB DUMP, MSSQL EVENT LOG: Unscheduled backup
Phases: RECON / DELIVERY, EXPLOIT / INSTALL, C2 / ACTION
Timeline: mins, hours, days, weeks, months
CASE STUDY MODEL
Data Sources: DNS records, Netflow, Host logs, Database logs, External Alerts
Linked Meta Model:
• Entities: Users, Hosts
• Relationships: login, flow
CASE STUDY EXAMPLE MAPPING
Netflow Records
startTime      endTime        sourceIP       destIP         sourcePort  destPort  protocol  tcpFlags  bytesIn  bytesOut
10/22/14 8:58  10/22/14 8:58  10.0.2.15      192.168.0.123  37051       139       TCP       ...RS.    100      3355
10/22/14 8:45  10/22/14 8:45  10.0.2.15      192.168.0.6    0           3328      ICMP      ......    40       100
10/22/14 8:59  10/22/14 8:59  192.168.0.119  10.0.2.15      139         60071     TCP       .A..S.    46       351

Mapped flow edges:
10.0.2.15 -> 192.168.0.123: Class=Flow, totalBytes = 3455
10.0.2.15 -> 192.168.0.6: Class=Flow, totalBytes = 140
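The mapping from Netflow rows to Flow edges can be sketched in a few lines of Python. This is a minimal illustration and not Sqrrl's ingest code: the function name `build_flow_edges` and the dictionary layout are hypothetical, and the rule totalBytes = bytesIn + bytesOut is an assumption inferred from the slide's numbers (100 + 3355 = 3455, 40 + 100 = 140).

```python
from collections import defaultdict

# Netflow rows copied from the slide: (sourceIP, destIP, bytesIn, bytesOut).
NETFLOW_ROWS = [
    ("10.0.2.15", "192.168.0.123", 100, 3355),
    ("10.0.2.15", "192.168.0.6", 40, 100),
    ("192.168.0.119", "10.0.2.15", 46, 351),
]


def build_flow_edges(rows):
    """Aggregate Netflow rows into directed Flow edges keyed by (source, dest).

    Assumption: totalBytes is the sum of bytesIn and bytesOut over all rows
    that share the same source and destination address.
    """
    total_bytes = defaultdict(int)
    for source_ip, dest_ip, bytes_in, bytes_out in rows:
        total_bytes[(source_ip, dest_ip)] += bytes_in + bytes_out
    return {
        edge: {"Class": "Flow", "totalBytes": total}
        for edge, total in total_bytes.items()
    }


flow_edges = build_flow_edges(NETFLOW_ROWS)
for (source_ip, dest_ip), props in flow_edges.items():
    print(f"{source_ip} -> {dest_ip}: {props}")
```

Keying edges by the (source, dest) pair means repeated flows between the same two hosts collapse into one edge whose totalBytes keeps growing, which is one plausible reading of how raw events become a single relationship in the model.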
CASE STUDY EXAMPLE DATA
Jane: Class=User: id=Jane, loginAttempts=82
192.168.10.94: Class=Host: id=192.168.10.94, hostname=kali, bytesTransfered={2014-09-30/01:00: 64472381}
74.129.94.19: Class=Host: id=74.129.94.19, bytesTransfered={2014-09-30/01:00: 64472381}
192.168.10.120: Class=Host: id=192.168.10.120, hostname=msserv, bytesTransfered={2014-09-30/04:00: 42318}
Relationships shown: login, flow
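One way to picture these entities in code is as typed records with a property map, where bytesTransfered is bucketed by hour. The sketch below is an assumption for illustration only: the `Entity` dataclass and the `hosts_by_transfer` helper are hypothetical names, not part of any Sqrrl API, and the edges are omitted because their endpoints are not recoverable from the slide text.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_class: str  # "User" or "Host", matching the slide's Class= labels
    entity_id: str
    properties: dict = field(default_factory=dict)


# Entity values copied from the slide; property names are kept as written there.
ENTITIES = [
    Entity("User", "Jane", {"loginAttempts": 82}),
    Entity("Host", "74.129.94.19",
           {"bytesTransfered": {"2014-09-30/01:00": 64472381}}),
    Entity("Host", "192.168.10.94",
           {"hostname": "kali",
            "bytesTransfered": {"2014-09-30/01:00": 64472381}}),
    Entity("Host", "192.168.10.120",
           {"hostname": "msserv",
            "bytesTransfered": {"2014-09-30/04:00": 42318}}),
]


def hosts_by_transfer(entities, bucket):
    """Return (id, bytes) for hosts with traffic in `bucket`, largest first."""
    hits = [
        (e.entity_id, e.properties["bytesTransfered"][bucket])
        for e in entities
        if e.entity_class == "Host"
        and bucket in e.properties.get("bytesTransfered", {})
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


top = hosts_by_transfer(ENTITIES, "2014-09-30/01:00")
print(top)
```

Querying the 01:00 bucket returns the internal host kali and the external address with the same byte count, which is the kind of pairing an investigator would want to drill into.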
INVESTIGATION PROCESS
1. Set the Stage
• Define the security-centric entity/relationship model
• Extract and maintain the model
2. Enable Search and Discovery
• Visually navigate assets and actors in the network
• Drill down to the raw data seeding the model
3. Automate Analysis
• Use behavioral analytics to build expectations of ‘normal’
• Flag entities as potentially ‘abnormal’ and sniff them out
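The third step, building an expectation of 'normal' and flagging what falls outside it, can be illustrated with a simple robust statistic. The sketch below uses the modified z-score (median and median absolute deviation); this is a swapped-in technique chosen for brevity, not the method described in the webinar. Only Jane's count of 82 comes from the slides; the other users and their counts are made up for the example.

```python
import statistics


def flag_abnormal(counts, threshold=3.5):
    """Return keys whose value is more than `threshold` robust z-scores
    away from the median of all values."""
    values = list(counts.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked as unusual
    flagged = []
    for name, value in counts.items():
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        robust_z = 0.6745 * (value - median) / mad
        if abs(robust_z) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical login-attempt counts; only Jane=82 appears in the slides.
login_attempts = {"Jane": 82, "alice": 4, "bob": 6, "carol": 5, "dave": 3, "erin": 7}
print(flag_abnormal(login_attempts))  # ['Jane']
```

The median-based score is used here because a single extreme account would inflate a mean and standard deviation enough to hide itself.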
APACHE SPARK 101
We use Spark because:
1. Meets core processing requirements
• Pre-canned algorithms
• Native support for graph processing
• Simple programmability
2. Good performance
• Low latency for many small jobs
• Scalability for big jobs
3. Natural fit
• Ties with the Hadoop ecosystem simplify integration
ROUND-TRIPPING WITH SPARK
Input Data (DNS, Netflow, Windows Logs, DB logs, Alert data) -> Ingest/Extract -> Sqrrl Graph Store -> Serve/Analyze -> Sqrrl UI
Algorithmic Enrichment: SqrrlGraphInputFormat, SqrrlGraph.update(uuid, values)
STRUCTURAL FEATURES
Triangle Counting:
• Given node A, find edges AB, AC, BC
• For nodes B, C in A’s neighborhood, is P(BC) > E/N^2?
Node Degree:
• Given node A, how many nodes within 1 or 2 edges?
PageRank:
• Iteratively transfer weight proportionally to neighbors
• Converges on entity importance
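The three structural features can be sketched in plain Python on a toy undirected graph. This is an illustration under assumptions, not GraphX code: the graph, the function names, and the damping factor of 0.85 are all hypothetical, and P(BC) is read here as the fraction of A's neighbor pairs that are themselves connected.

```python
from itertools import combinations

# Hypothetical toy graph: one triangle (A, B, C) with a tail C-D-E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangle_count(adj, node):
    """Count neighbor pairs (B, C) of `node` that share an edge BC."""
    return sum(1 for b, c in combinations(sorted(adj[node]), 2) if c in adj[b])


def neighbor_link_probability(adj, node):
    """Fraction of the node's neighbor pairs that are connected."""
    k = len(adj[node])
    pairs = k * (k - 1) // 2
    return triangle_count(adj, node) / pairs if pairs else 0.0


def neighborhood_size(adj, node, hops):
    """Number of distinct nodes within `hops` edges, excluding `node`."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        frontier = {n for f in frontier for n in adj[f]} - seen
        seen |= frontier
    return len(seen) - 1


def pagerank(adj, damping=0.85, iterations=50):
    """Iteratively pass each node's weight to its neighbors in equal shares."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in adj}
        for node, neighbors in adj.items():
            share = damping * rank[node] / len(neighbors)
            for neighbor in neighbors:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


adj = build_adjacency(EDGES)
density = len(EDGES) / len(adj) ** 2  # the slide's E/N^2 baseline
print(triangle_count(adj, "A"), neighbor_link_probability(adj, "A") > density)
print(neighborhood_size(adj, "A", 1), neighborhood_size(adj, "A", 2))
print(max(pagerank(adj).items(), key=lambda item: item[1]))
```

Comparing the neighbor link probability against E/N^2 asks whether a node's neighbors are more tightly connected to each other than two randomly chosen nodes would be.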
SPARK OUTLIER DETECTION
• Use GraphX to load Sqrrl graph model
  • Entities: Users, Hosts
  • Relationships: Flows, Logins (both user and host)
  • Loads an RDD with Sqrrl graph in Spark
• For every node, generate features:
  • GraphX built-in methods:
    • Degree, Triangle Count, PageRank
  • Implemented in Spark by Sqrrl:
    • edgeWeightTotal, totalNeighborDegree
Detail on data flow and algorithms
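The slides name the two custom features but do not define them. The sketch below assumes plausible meanings: edgeWeightTotal as the sum of weights on a node's incident edges, and totalNeighborDegree as the sum of the degrees of its neighbors. It runs in plain Python rather than Spark, the function name `node_features` is hypothetical, and using the Flow edges' totalBytes as the edge weight is also an assumption.

```python
from collections import defaultdict


def node_features(weighted_edges):
    """Compute per-node features for an undirected weighted graph.

    Assumed definitions (not confirmed by the slides):
      edgeWeightTotal     = sum of weights of edges touching the node
      totalNeighborDegree = sum of the degrees of the node's neighbors
    """
    neighbors = defaultdict(set)
    edge_weight_total = defaultdict(float)
    for a, b, weight in weighted_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
        edge_weight_total[a] += weight
        edge_weight_total[b] += weight
    return {
        node: {
            "degree": len(adjacent),
            "edgeWeightTotal": edge_weight_total[node],
            "totalNeighborDegree": sum(len(neighbors[n]) for n in adjacent),
        }
        for node, adjacent in neighbors.items()
    }


# Flow edges from the example mapping slide, weighted by totalBytes.
FLOWS = [
    ("10.0.2.15", "192.168.0.123", 3455),
    ("10.0.2.15", "192.168.0.6", 140),
]
features = node_features(FLOWS)
for node, values in features.items():
    print(node, values)
```

Each node ends up with one row of numbers, which is the shape needed to assemble a feature matrix for the later analysis steps.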
SPARK OUTLIER DETECTION
• Transform statistics to feature matrix, run PCA
  • Creates ranked list of high-variance dimensions, most likely indicative of an entity’s “outlierness”
  • PCA run with Spark MLlib
• Top feature pairs:
  • totalNeighborDegree vs. edgeWeightTotal
  • Degree vs. edgeWeightTotal
• Create “distance” measure using pairs to flag anomalies
Detail on data flow and algorithms
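The slides do not say which “distance” measure is applied to the feature pairs. As one possible reading, the sketch below scores each entity by its Mahalanobis distance from the mean of a two-feature pair such as (Degree, edgeWeightTotal). The choice of Mahalanobis distance is an assumption, the PCA step is not reproduced here, and the host names and feature values are invented for the example.

```python
import math


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T with the closed-form 2x2 inverse.
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(squared, 0.0)))
    return distances


# Hypothetical (degree, edgeWeightTotal) pairs; none of these come from the slides.
pairs = {
    "host-a": (4, 1200.0),
    "host-b": (5, 1500.0),
    "host-c": (3, 900.0),
    "host-d": (6, 1700.0),
    "host-e": (4, 1300.0),
    "host-f": (5, 64000.0),
}
scores = dict(zip(pairs, mahalanobis_2d(list(pairs.values()))))
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # host-f
```

With so few points a fixed cutoff is unreliable, so the example ranks entities by distance and leaves the choice of threshold open.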