The Heatmap - Why is Security Visualization so Hard?
Raffael Marty, CEO
The Heatmap: Why is Security Visualization so Hard?
Area41 Zurich, Switzerland June 2, 2014
Security. Analytics. Insight.
Heatmaps
I am Raffy - I do Viz!
IBM Research
The (New) Threat Landscape
APT1, Unit 61398 (61398部队)
Attacks have changed:
• Targeted
• Objectives beyond monetization
• Low and Slow
• Multiple access vectors
• Remotely controlled
Motivations have changed:
• Nation state sponsored
• Political, economic, and military advantage
• Monetization / Crimeware
• Religion
• Hacktivism
Security approaches failed due to:
• Reliance on past knowledge / signatures
• Systems are too rigid (e.g., schema)
• Poor scalability
• Limited knowledge exchange
How Compromises Are Detected
Mandiant M-Trends 2014 Threat Report
• Attackers in networks before detection: 229 days
• Average time to resolve a cyber attack: 27 days
• Successful attacks per company per week: 1.4
• Average cost per company per year: $7.2M
Our Security Goals
• Find Intruders and ‘New Attacks’
• Discover Exposure Early
• Communicate Findings
Visualize Me Lots (>1TB) of Data
SecViz is Hard!
Visualize 1TB of Data - What Graph?
[Figure: candidate charts of the same data. Category labels include drop, reject, NONE, ctl, accept, and event types such as DNS Update Failed, Packet Flood, UDP Flood, and Port Flood; the value axis runs from 1 to 100000000.]
How much information does each of the graphs convey?
The Heatmap
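Matrix A, where the entries a_ij are integer values mapped to a color scale.
Color scale legend: a_ij = 1, 10, 20, 30, 40, 50, 60, 70, 80, >90
[Figure: example matrix with rows and columns; one cell is labeled with the value 42]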
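A minimal sketch of the value-to-color mapping, assuming the legend values act as lower bounds of discrete color bins (the slide does not say how the boundaries are treated, and `color_index` and `to_color_matrix` are hypothetical names):

```python
from bisect import bisect_right

# Lower bounds of the color bins, read off the slide's legend.
# Treating them as bin boundaries (and 90 as the start of the ">90" bin)
# is an assumption of this sketch.
THRESHOLDS = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90]

def color_index(value):
    """Return 0 for an empty cell, otherwise the 1-based index of its color bin."""
    return bisect_right(THRESHOLDS, value)

def to_color_matrix(matrix):
    """Map every integer a_ij of matrix A to a color bin index."""
    return [[color_index(v) for v in row] for row in matrix]

A = [[0, 1, 9],
     [10, 42, 95]]
C = to_color_matrix(A)
```

Each index in `C` would then be looked up in a palette with one color per bin plus a background color for index 0.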
Mapping Data to a Heatmap
rows = source ip
columns = time
values = how often <row_item> was seen in each time bin
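The mapping can be sketched roughly as follows, assuming events arrive as `(epoch_seconds, source_ip)` pairs; the function name `build_heatmap` and the 60-second bin are illustrative choices, not from the talk:

```python
from collections import Counter

def build_heatmap(events, t_start, bin_seconds, n_bins):
    """Count how often each source ip was seen per time bin.

    events: iterable of (epoch_seconds, source_ip) pairs (assumed input shape).
    Returns (row_labels, matrix) with one row per source ip.
    """
    counts = Counter()
    for ts, src in events:
        col = int((ts - t_start) // bin_seconds)
        if 0 <= col < n_bins:          # ignore events outside the window
            counts[(src, col)] += 1
    rows = sorted({src for src, _ in counts})
    matrix = [[counts[(src, col)] for col in range(n_bins)] for src in rows]
    return rows, matrix

events = [(0, "10.0.0.1"), (30, "10.0.0.1"), (61, "10.0.0.2"),
          (125, "10.0.0.1"), (500, "10.0.0.3")]
rows, matrix = build_heatmap(events, t_start=0, bin_seconds=60, n_bins=3)
```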
Mapping Log Records to Heatmaps
May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)
[Figure: heatmap rows are users (root, ram, peg, sue); columns are time bins of width ∆t]
f() = +1: each log record adds 1 to the cell for its user and time bin
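As a rough illustration of this step, the sketch below parses the sudo record from the slide and increments one cell. The regular expression, the choice of the acting user (`by ram`) as the row, the one-hour ∆t, and the assumed year are all assumptions of this sketch; syslog timestamps carry no year, and `%b` parsing assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern for the "session opened" record shown on the slide.
SESSION_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ sudo: .*"
    r"session opened for user (?P<target>\S+) by (?P<actor>\w+)\("
)

def add_record(cells, line, year=2014, bin_seconds=3600):
    """Apply f() = +1 to the (user, time bin) cell of one log record."""
    m = SESSION_RE.match(line)
    if m is None:
        return False
    when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    # Time bin = number of whole bins since the start of the assumed year.
    offset = (when - datetime(year, 1, 1)).total_seconds()
    cells[(m["actor"], int(offset // bin_seconds))] += 1
    return True

LINE = ("May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): "
        "session opened for user root by ram(uid=0)")
cells = Counter()
add_record(cells, LINE)
add_record(cells, LINE)
```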
Why Heatmaps?
• Scales well to a lot of data (can aggregate ad infinitum)
• Shows more information than a bar chart
• Flexible ‘measure’ mapping
  • frequency count
  • sum(variable) [avg(), stddev(), …]
  • distinct count(variable)
• BUT information content is limited!
• Aggregates too highly in time and potentially value dimensions
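The three measures listed above could be computed per cell along these lines. This is a sketch under assumed inputs: records are `(epoch_seconds, row_item, value, other)` tuples, where `value` might be bytes and `other` might be a machine name; none of these field names come from the talk.

```python
from collections import defaultdict

def aggregate(records, bin_seconds):
    """Compute three heatmap measures per (row_item, time_bin) cell."""
    count = defaultdict(int)     # frequency count
    total = defaultdict(int)     # sum(variable)
    seen = defaultdict(set)      # basis for distinct count(variable)
    for ts, row, value, other in records:
        cell = (row, ts // bin_seconds)
        count[cell] += 1
        total[cell] += value
        seen[cell].add(other)
    distinct = {cell: len(items) for cell, items in seen.items()}
    return dict(count), dict(total), distinct

records = [(0, "ram", 100, "hostA"), (10, "ram", 50, "hostA"),
           (20, "ram", 25, "hostB"), (70, "sue", 10, "hostA")]
count, total, distinct = aggregate(records, bin_seconds=60)
```

The same cell can look very different depending on the measure: three events, 175 units of volume, but only two distinct machines.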
Data Visualization Workflow
Overview -> Zoom / Filter -> Details on Demand
Data Visualization Workflow - Overview
Heatmap
• Can pack millions of records (although highly aggregated)
• Allows for zoom-in to expose detail
• By itself exposes patterns
• Great ‘navigation’ tool to drill into different, ‘non-scalable’ visualizations
• No other visualization possesses these properties
HeatMap Challenges - Display
1. Labels
• 1000s of rows leave <1px per label
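The slide only states the problem. One possible workaround, sketched here as an assumption rather than the talk's solution, is to draw only every k-th label so that each drawn label gets a minimum height (`visible_labels` and the 10px minimum are hypothetical):

```python
import math

def visible_labels(labels, height_px, min_label_px=10):
    """Pick evenly spaced (row_index, label) pairs that fit the display height."""
    if not labels:
        return []
    max_labels = max(1, height_px // min_label_px)
    step = math.ceil(len(labels) / max_labels)   # show every step-th row label
    return [(i, labels[i]) for i in range(0, len(labels), step)]

labels = [f"10.0.{i // 256}.{i % 256}" for i in range(5000)]
shown = visible_labels(labels, height_px=800)
```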
HeatMap Challenges - Display
2. Mouse-Over
• What information to show?
• Position - x/y coordinates
• Original records
• Query backend for each position?
HeatMap Challenges - Display
3. Sorting
• Random
• Alphabetically
• Based on values
• Similarity
  • What algorithm?
  • What distance metric?
• Leverage third data field / context?
[Figure: the same heatmap with random row order and with rows clustered; rows are users]
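Two of the sorting options can be sketched as follows. The similarity ordering uses a greedy nearest-neighbor walk with Euclidean distance, which is just one simple answer to the "what algorithm, what distance metric" questions and not necessarily what the talk's tooling uses.

```python
def sort_by_total(labels, matrix):
    """'Based on values': order rows by their total, largest first."""
    order = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]), reverse=True)
    return [labels[i] for i in order], [matrix[i] for i in order]

def distance(a, b):
    """Euclidean distance between two rows (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_similarity_order(matrix):
    """'Similarity': start at row 0, then always append the closest unused row."""
    if not matrix:
        return []
    remaining = set(range(1, len(matrix)))
    order = [0]
    while remaining:
        last = matrix[order[-1]]
        nxt = min(remaining, key=lambda i: (distance(last, matrix[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order

labels = ["a", "b", "c", "d"]
matrix = [[5, 0, 0, 5], [0, 3, 3, 0], [5, 0, 1, 5], [0, 3, 2, 0]]
by_total_labels, _ = sort_by_total(labels, matrix)
similar = greedy_similarity_order(matrix)
```

With the similarity order, rows `a` and `c` end up adjacent, as do `d` and `b`, which is what makes clusters visible as bands.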
HeatMap Challenges - Display
4. Overplotting
• How to summarize multiple rows in one pixel?
  • Sum?
• Overplot x and y axes?
• Undo overplot on zoom?
[Figure: 1 row -> 1 pixel; n rows -> 1 pixel (summed, ∑); 1 row -> m pixels]
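The three cases from the figure (1 row -> 1 pixel, n rows -> 1 pixel, 1 row -> m pixels) can be handled by one resampling routine. This is a sketch for the row axis only; the `combine` parameter is a hypothetical hook for swapping the sum for another summary such as `max`.

```python
def rows_to_pixels(matrix, height_px, combine=sum):
    """Resample matrix rows onto height_px pixel rows."""
    n = len(matrix)
    if n == 0 or height_px <= 0:
        return []
    pixels = []
    for p in range(height_px):
        lo = p * n // height_px
        hi = (p + 1) * n // height_px
        if hi > lo:
            # n rows -> 1 pixel: combine the group column by column
            group = matrix[lo:hi]
            pixels.append([combine(col) for col in zip(*group)])
        else:
            # 1 row -> m pixels: repeat the row
            pixels.append(list(matrix[lo]))
    return pixels

m = [[1, 0], [2, 1], [0, 0], [3, 3]]
squeezed = rows_to_pixels(m, 2)                  # 4 rows into 2 pixel rows
peaks = rows_to_pixels(m, 2, combine=max)        # same, keeping the maximum
stretched = rows_to_pixels([[1, 2], [3, 4]], 4)  # 2 rows over 4 pixel rows
```

Re-running the routine on the zoomed-in row range is one way to undo the overplot on zoom.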
HeatMap Challenges - Interaction
1. Time Selection
• Take screen resolution into account (you have 1000 pixels and you query 1005 seconds?)
• Choose start AND end time?
• Communicate to user what data is available?
[Figure: time range selector with start time and end time]
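One way to take the screen resolution into account, sketched under the assumption of whole-second bins, is to derive the bin width from the selected range and the available pixels (`plan_time_bins` is a hypothetical helper):

```python
import math

def plan_time_bins(start, end, width_px):
    """Return (bin_seconds, n_bins) so that the range fits into width_px columns."""
    span = end - start
    if span <= 0 or width_px <= 0:
        raise ValueError("need a positive time span and pixel width")
    bin_seconds = max(1, math.ceil(span / width_px))
    n_bins = math.ceil(span / bin_seconds)
    return bin_seconds, n_bins

awkward = plan_time_bins(0, 1005, 1000)   # the slide's 1005 seconds on 1000 pixels
exact = plan_time_bins(0, 1000, 1000)
day = plan_time_bins(0, 86400, 1000)
```

For the slide's example this yields 2-second bins and only 503 columns, so nearly half the width stays unused unless the UI snaps the selection to a range that divides evenly.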
HeatMap Challenges - Interaction
2. Zoom and Pan
• Re-query for more detail?
HeatMap Challenges - Interaction
3. Color Scales / Ranges
• discrete
• continuous
• different colors
• multiple anchors
HeatMap Challenges - Interaction
4. Exposure - Mapping data to color
[Figure: histogram of cell values (axes: values, frequency); dark colors are underutilized]
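When the value distribution is skewed, a linear mapping pushes almost every cell to one end of the color scale. The sketch below contrasts it with a logarithmic mapping, which is one common exposure adjustment and an assumption here; the talk does not say which transform its tool applies.

```python
import math

def linear_exposure(value, vmax):
    """Position on the color scale in [0, 1], proportional to the value."""
    return value / vmax if vmax > 0 else 0.0

def log_exposure(value, vmax):
    """Position on the color scale in [0, 1], compressing large values."""
    return math.log1p(value) / math.log1p(vmax) if vmax > 0 else 0.0

values = [1, 2, 3, 5, 8, 1000]       # made-up skewed cell values
vmax = max(values)
linear = [round(linear_exposure(v, vmax), 3) for v in values]
logged = [round(log_exposure(v, vmax), 3) for v in values]
```

With the linear mapping, every value except the outlier lands in the bottom 1% of the scale; the log mapping spreads them over roughly the lower third.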
HeatMap Challenges - Interaction
5. Pivot
• destinationAddress
• sourceAddress WHERE destinationAddress = 81.223.6.41
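A pivot amounts to re-aggregating with a different row field and a filter on the previously selected row. A minimal in-memory sketch, assuming records are dictionaries with `ts`, `sourceAddress`, and `destinationAddress` keys (the sample records other than the slide's 81.223.6.41 address are made up):

```python
from collections import Counter

def pivot(records, row_field, where=None, bin_seconds=60):
    """Count records per (row value, time bin), optionally filtered by `where`."""
    cells = Counter()
    for rec in records:
        if where and any(rec.get(k) != v for k, v in where.items()):
            continue
        cells[(rec[row_field], rec["ts"] // bin_seconds)] += 1
    return cells

records = [
    {"ts": 0, "sourceAddress": "10.0.0.1", "destinationAddress": "81.223.6.41"},
    {"ts": 5, "sourceAddress": "10.0.0.2", "destinationAddress": "81.223.6.41"},
    {"ts": 65, "sourceAddress": "10.0.0.1", "destinationAddress": "81.223.6.41"},
    {"ts": 70, "sourceAddress": "10.0.0.3", "destinationAddress": "192.0.2.7"},
]
by_dest = pivot(records, "destinationAddress")
by_src = pivot(records, "sourceAddress",
               where={"destinationAddress": "81.223.6.41"})
```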
HeatMap Challenges - Backend
Different backend technologies (big data)
• Key-value store
• Search engine
• GraphDB
• RDBMS
• Columnar - can answer analytical questions
• Hadoop (Map Reduce)
  • good for operations on ALL data
Other things to consider:
• Caching
• Joins
What’s the HeatMap Not Good At
• Showing relationships -> link graphs
• Showing multiple dimensions and their inter-relatedness -> parallel coordinates (|| coords)
Heatmaps Are Good Starting Points … BUT
Overview -> Zoom / Filter -> Details on Demand
Leverage Data Mining to Summarize Data
Overview -> Zoom / Filter -> Details on Demand
Overview:
• Leverage data mining (clustering) to create an overview
• Summarizing dozens of dimensions into a two-dimensional overview
Self Organizing Maps
• Clustering based on a single data dimension
  • for example “attackers”
• It’s hard to
  • engineer the right features
  • avoid over-learning
  • interpret the clusters
[Figure: self-organizing map with 3 clusters, labeled 1, 2, 3]
Raffael.Marty@pixlcloud.com
Examples
Vincent
This heatmap shows behavior over time. In this case, we see activity per user. We can see that ‘vincent’ is visually different from all of the other users. He shows up very lightly over the entire time period. This seems to be something to look into.
We were able to find this purely visually, without understanding the data.
Firewall Heatmap
Showing Activity per Destination Address
Changing Color Exposure
Zoom In
Pivot to Source Address
Seriate
Expanding Detail
[Figure: detail view with columns source, destination port, source port]
Intra-Role Anomaly - Random Order
[Figure: heatmap with rows = users, columns = time, values = dc(machines), the distinct count of machines]
Intra-Role Anomaly - With Seriation
Intra-Role Anomaly - Sorted by User Role
[Figure: heatmap rows grouped by role: Administrator, Sales, Development, Finance; one group is annotated “Admin???”]
Firewall Data
• Millions of rows
• High-cardinality fields
• Where to start analysis?
  • Formulate some hypotheses
  • Informs visualization process and data preparation
• Our hypothesis and assumption
  • Machines that get passed and blocked might be of interest
  • Low-frequency sources are not interesting
firewall data        data type     cardinality   distribution
source ip            ipv4          10-10^6       depends
dest ip              ipv4          10-10^6       depends
source port          int           65535         depends
dest port            int           65535         highly skewed
bytes in/out         int           -             skewed
action               bool / int    3             -
direction / iface    bool / str    small         -
Visual Mapping
[Figure: heatmap with rows = source (10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4) and columns = time bins of width ∆t (aggregation)]
Color mapping: block, pass, block & pass
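The three-way color mapping can be sketched by recording which actions occur in each cell. The input shape, `(epoch_seconds, source_ip, action)` with actions `"block"` and `"pass"`, and the one-hour bin are assumptions of this sketch.

```python
from collections import defaultdict

def classify_cells(records, bin_seconds=3600):
    """Label each (source, time bin) cell as 'block', 'pass', or 'block & pass'."""
    seen = defaultdict(set)
    for ts, source, action in records:
        if action in ("block", "pass"):      # ignore any other action values
            seen[(source, ts // bin_seconds)].add(action)
    labels = {}
    for cell, actions in seen.items():
        labels[cell] = "block & pass" if len(actions) == 2 else next(iter(actions))
    return labels

records = [(0, "10.0.0.1", "pass"), (100, "10.0.0.1", "block"),
           (200, "10.0.0.2", "block"), (4000, "10.0.0.1", "pass")]
labels = classify_cells(records)
```

Cells labeled `block & pass` are the ones the hypothesis singles out as potentially interesting.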
Low-Frequency Behavior
[Figure: two heatmaps with rows = source ip (36k rows): outbound with sum <= 10 and inbound with sum <= 10]
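The split into low- and high-frequency sources at a row sum of 10 could look like the following; the dictionary input and the helper name are hypothetical, and the counts are made up for illustration.

```python
def split_by_frequency(rows, threshold=10):
    """Split rows (source ip -> per-bin counts) by their total count."""
    low = {ip: counts for ip, counts in rows.items() if sum(counts) <= threshold}
    high = {ip: counts for ip, counts in rows.items() if sum(counts) > threshold}
    return low, high

rows = {
    "10.0.0.1": [1, 0, 2],     # sum 3  -> low frequency
    "10.0.0.2": [5, 5, 0],     # sum 10 -> low frequency (boundary case)
    "10.0.0.3": [40, 2, 17],   # sum 59 -> high frequency
}
low, high = split_by_frequency(rows)
```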
Outbound Blocks
What’s That?
Oct 25 11:56:14.123128 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 3660196221:3660197653(1432) ack 906644 win 32936 (DF)
Oct 25 11:57:18.140007 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
Oct 25 11:58:22.156195 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
Oct 25 11:59:26.170915 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
less pflog.txt | grep xl1 | grep "rule 238" | sed -e 's/\(Oct .. ..\):..:..\........*/\1/' | uniq -c
6 Oct 25 03
8 Oct 25 05
3 Oct 25 06
25 Oct 25 07
9 Oct 25 08
117 Oct 25 09
127 Oct 25 10
169 Oct 25 11
178 Oct 25 12
158 Oct 25 13
187 Oct 25 14
354 Oct 25 15
111 Oct 25 16
104 Oct 25 17
33 Oct 25 18
17 Oct 25 19
A clear increase in rule 238 traffic
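For readers who prefer a script over the shell pipeline, a rough Python equivalent is sketched below. It mirrors the substring matching of `grep` and the run-length counting of `uniq -c`; the function name and regular expression are this sketch's own, and the sample lines are the log records quoted in this deck.

```python
import re
from itertools import groupby

# Captures "Mon DD HH" from the start of a pflog line.
HOUR_RE = re.compile(r"^(\w{3} +\d+ \d{2}):\d{2}:\d{2}\.\d+")

def hourly_counts(lines, iface="xl1", rule="rule 238"):
    """Count matching lines per hour, like grep | grep | sed | uniq -c."""
    hours = []
    for line in lines:
        if iface in line and rule in line:
            m = HOUR_RE.match(line)
            if m:
                hours.append(m.group(1))
    # groupby counts consecutive runs, which is what uniq -c does.
    return [(hour, len(list(group))) for hour, group in groupby(hours)]

sample = [
    "Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)",
    "Oct 25 11:56:14.123128 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 3660196221:3660197653(1432) ack 906644 win 32936 (DF)",
    "Oct 25 11:57:18.140007 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)",
    "Oct 25 11:58:22.156195 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)",
    "Oct 25 11:59:26.170915 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)",
]
counts = hourly_counts(sample)
```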
High Frequency Sources Over Time
[Figure: heatmap of sources with sum > 10 (672 rows); color mapping: block, pass, block & pass]
High Frequency Traffic Split Up
[Figure: two heatmaps, inbound and outbound, with the following labeled rows]
192.168.0.201
195.141.69.42
195.141.69.43
195.141.69.44
195.141.69.45
195.141.69.46
212.254.110.100
212.254.110.101
212.254.110.107
212.254.110.108
212.254.110.109
212.254.110.110
212.254.110.98
212.254.110.99
62.245.245.139
Outbound Traffic - Some Questions To Ask
• What happened mid-way through?
• Why is anything outbound blocked?
• What are the top and bottom machines doing?
• Did we get a new machine into the network?
• Some machines went away?
195.141.69.42
195.141.69.42 - Interactions
[Figure: dimensions shown: action, port, dest]
Zooming in on Top Rows
Rows: 212.254.110.100 212.254.110.101 212.254.110.102 212.254.110.103 212.254.110.104 212.254.110.105 212.254.110.106 212.254.110.107 212.254.110.108 212.254.110.109 212.254.110.110 212.254.110.111 212.254.110.112 212.254.110.113 212.254.110.114 212.254.110.115 212.254.110.116 212.254.110.117 212.254.110.118 212.254.110.119 212.254.110.120 212.254.110.121 212.254.110.122 212.254.110.123 212.254.110.124 212.254.110.125 212.254.110.126 212.254.110.127 212.254.110.66 212.254.110.96 212.254.110.97 212.254.110.98 212.254.110.99
• Hardly any pass-block
Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)
212.254.110.102
Oct 16 13:14:05.627835 rule 0/0(match): pass in on xl0: 66.220.17.151.80 > 212.254.110.102.1977: S 1841864015:1841864019(4) ack 1308753921 win 16384 (DF)
SYN ACK for real Web traffic passed
This Guy Sure Keeps Busy
212.254.144.40
dest port
Recap
• Attackers are very successful
• Data could reveal adversaries
• We have a big data analytics problem
• We need the right analytics and visualizations
• Security visualization is hard
• Data visualization workflow is a promising approach
• Heatmaps are great for overviews
• We need a set of heuristics and workflows