The Heatmap - Why is Security Visualization so Hard?

Raffael Marty, CEO. The Heatmap - Why is Security Visualization so Hard? Area41, Zurich, Switzerland, June 2, 2014.

description

This presentation explores why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. It explores the question of how to visualize a billion events. To do so, the presentation dives deeply into heatmaps (matrices) as an example of a simple type of visualization. While these heatmaps are very simple, they are incredibly versatile and help us think about the problem of security visualization. They help illustrate how data mining and user experience design help get a handle on the security visualization challenges, enabling us to gain deep insight for a number of security use cases.

Transcript of The Heatmap - Why is Security Visualization so Hard?

Page 1: The Heatmap - Why is Security Visualization so Hard?

Raffael Marty, CEO

The Heatmap - Why is Security Visualization so Hard?

Area41 Zurich, Switzerland June 2, 2014

Page 2: The Heatmap - Why is Security Visualization so Hard?

Security. Analytics. Insight.

Heatmaps

Page 3: The Heatmap - Why is Security Visualization so Hard?

I am Raffy - I do Viz!

IBM Research

Page 4: The Heatmap - Why is Security Visualization so Hard?

The (New) Threat Landscape

Attacks have changed:
• Targeted
• Objectives beyond monetization
• Low and Slow
• Multiple access vectors
• Remotely controlled

APT 1 Unit 61398 (61398部队)

Motivations have changed:
• Nation state sponsored
• Political, economic, and military advantage
• Monetization / Crimeware
• Religion
• Hacktivism

Security approaches failed due to:
• Reliance on past knowledge / signatures
• Systems are too rigid (e.g., schema)
• Poor scalability
• Limited knowledge exchange

Page 5: The Heatmap - Why is Security Visualization so Hard?

How Compromises Are Detected

Mandiant M-Trends 2014 Threat Report

• Attackers in networks before detection: 229 days
• Average time to resolve a cyber attack: 27 days
• Successful attacks per company per week: 1.4
• Average cost per company per year: $7.2M

Page 6: The Heatmap - Why is Security Visualization so Hard?

Our Security Goals

• Find Intruders and ‘New Attacks’
• Discover Exposure Early
• Communicate Findings

Page 7: The Heatmap - Why is Security Visualization so Hard?

Visualize Me Lots (>1TB) of Data

SecViz is Hard!

Page 8: The Heatmap - Why is Security Visualization so Hard?

Visualize 1TB of Data - What Graph?

[Figure: example graphs. Category labels: drop, reject, NONE, ctl, accept; DNS Update Failed, Log In, IP Fragments, Max Flows Initiated, Packet Flood, UDP Flood, Aggressive Aging, Bootp, Renew, Log Out, Release, NACK, Conflict, DNS Update Successful, DNS record not deleted, DNS Update Request, Port Flood. Axis ticks: 1, 10000, 100000000.]

How much information does each of the graphs convey?

Page 9: The Heatmap - Why is Security Visualization so Hard?

The Heatmap

Matrix A, where a_ij are integer values mapped to a color scale.

[Figure: matrix with rows and columns; one example cell holds the value 42. Color scale for a_ij: 1, 10, 20, 30, 40, 50, 60, 70, 80, >90.]
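The mapping from a cell value to a color can be sketched as a lookup against the stops of the color scale. This is a minimal sketch, not the speaker's implementation: the stops are read off the slide's legend (1, 10, 20, ..., 80, >90), the handling of values that fall exactly on a stop is an assumption, and `color_index` and `to_color_grid` are hypothetical names.

```python
import bisect

# Stops taken from the slide's legend. Treating each stop as the inclusive
# lower bound of the next bucket is an assumption.
STOPS = [10, 20, 30, 40, 50, 60, 70, 80, 90]


def color_index(value):
    """Return a color bucket 0..9 for an integer cell value a_ij."""
    return bisect.bisect_right(STOPS, value)


def to_color_grid(matrix):
    """Map every cell of matrix A to its color bucket."""
    return [[color_index(v) for v in row] for row in matrix]


grid = to_color_grid([[1, 42], [9, 500]])
```

A renderer would then index a palette of ten colors with these bucket numbers.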

Page 10: The Heatmap - Why is Security Visualization so Hard?

Mapping Data to a Heatmap

rows = source ip

columns = time

values = how often <row_item> was seen in each time bin

Page 13: The Heatmap - Why is Security Visualization so Hard?

Mapping Log Records to Heatmaps

May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)

[Figure: heatmap with rows root, ram, peg, sue; each column is a time bin of width ∆t; each log record adds ƒ() = +1 to its cell]
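The record-to-cell mapping can be sketched as follows. This is only an illustration under stated assumptions: the row key is taken to be the invoking user (the name after "by"), the year is not in the log line and is passed in, the bin width defaults to one hour, and `heatmap_counts` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches sudo session-open records like the one on the slide.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ sudo: "
    r".*session opened for user (?P<target>\S+) by (?P<actor>\w+)\("
)


def heatmap_counts(lines, bin_seconds=3600, year=2014):
    """Count records per (user, time bin); each record adds +1 to its cell."""
    origin = datetime(year, 1, 1)
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip records that are not sudo session opens
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        # Bin index = whole bins elapsed since the start of the year.
        time_bin = int((ts - origin).total_seconds()) // bin_seconds
        counts[(m["actor"], time_bin)] += 1
    return counts


sample = [
    "May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)",
    "May 5 23:58:10 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)",
    "May 6 00:01:00 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by sue(uid=0)",
    "not a sudo record",
]
cells = heatmap_counts(sample)
```

Here the two records from `ram` fall into the same hourly bin and share one cell, while the record from `sue` lands in the next column.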

Page 15: The Heatmap - Why is Security Visualization so Hard?

Why Heatmaps?

• Scales well to a lot of data (can aggregate ad infinitum)
• Shows more information than a bar chart
• Flexible ‘measure’ mapping
  • frequency count
  • sum(variable) [avg(), stddev(), …]
  • distinct count(variable)
• BUT information content is limited!
• Aggregates too highly in time and potentially value dimensions
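The flexible 'measure' mapping can be illustrated with one aggregation function that swaps the per-cell measure. This is a sketch under assumptions: records are `(row, column, value)` tuples, and `MEASURES` and `aggregate` are hypothetical names.

```python
from collections import defaultdict

# The three measures named on the slide, applied to the values in one cell.
MEASURES = {
    "count": len,                                 # frequency count
    "sum": sum,                                   # sum(variable)
    "distinct": lambda values: len(set(values)),  # distinct count(variable)
}


def aggregate(records, measure):
    """Group (row, column, value) records by cell and apply one measure."""
    cells = defaultdict(list)
    for row, col, value in records:
        cells[(row, col)].append(value)
    fn = MEASURES[measure]
    return {cell: fn(values) for cell, values in cells.items()}


records = [("ram", 0, 120), ("ram", 0, 120), ("ram", 0, 80), ("sue", 1, 40)]
by_count = aggregate(records, "count")
by_sum = aggregate(records, "sum")
by_distinct = aggregate(records, "distinct")
```

The same records produce three different matrices, which is why the choice of measure is part of the visual mapping rather than a detail of the backend.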

Page 16: The Heatmap - Why is Security Visualization so Hard?

Data Visualization Workflow

Overview -> Zoom / Filter -> Details on Demand

Page 17: The Heatmap - Why is Security Visualization so Hard?

Data Visualization Workflow - Overview

Heatmap
• Can pack millions of records (although highly aggregated)
• Allows for zoom-in to expose detail
• By itself exposes patterns
• Great ‘navigation’ tool to drill into different, ‘non-scalable’ visualizations
• No other visualization possesses these properties

Page 18: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Display

1. Labels
• 1000s of rows leave <1px per label

Page 19: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Display

2. Mouse-Over
• What information to show?
• Position - x/y coordinates
• Original records
• Query backend for each position?

Page 20: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Display

3. Sorting
• Random
• Alphabetically
• Based on values
• Similarity
  • What algorithm?
  • What distance metric?
• Leverage third data field / context?

[Figure: the same heatmap with random row order and with rows clustered; rows = user]
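One possible answer to the "what algorithm, what distance metric" questions is a greedy nearest-neighbor ordering with Euclidean distance. This is only one of many seriation approaches and not necessarily the one used for the slide's figure; `seriate` is a hypothetical name.

```python
import math


def seriate(rows):
    """Order row labels so that similar rows end up next to each other.

    rows maps a label to its vector of cell values. Starting from the first
    label, the closest unplaced row (Euclidean distance) is appended each time.
    """
    labels = list(rows)
    if not labels:
        return []
    order = [labels[0]]
    remaining = set(labels[1:])
    while remaining:
        last = rows[order[-1]]
        # sorted() makes tie-breaking deterministic.
        nearest = min(sorted(remaining), key=lambda k: math.dist(last, rows[k]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


activity = {
    "a": [9, 9, 0, 0],
    "b": [0, 0, 9, 9],
    "c": [9, 8, 0, 0],
    "d": [0, 1, 9, 9],
}
ordered = seriate(activity)
```

The greedy pass is quadratic in the number of rows, so for thousands of rows a clustering step or sampling would likely be needed first.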

Page 21: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Display

4. Overplotting
• How to summarize multiple rows in one pixel?
• Sum?
• Overplot x and y axes?
• Undo overplot on zoom?

[Figure: three cases: 1 row -> 1 pixel; n rows -> 1 pixel (∑); 1 row -> m pixels]
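The "n rows -> 1 pixel" case with a sum can be sketched like this. It assumes a dense list-of-lists matrix and that summing is the chosen summary; `collapse_rows` is a hypothetical name.

```python
def collapse_rows(matrix, pixel_rows):
    """Sum groups of adjacent rows so the matrix fits into pixel_rows rows."""
    n = len(matrix)
    if n <= pixel_rows:
        return [list(row) for row in matrix]  # 1 row -> 1 pixel (or more)
    width = len(matrix[0])
    out = [[0] * width for _ in range(pixel_rows)]
    for i, row in enumerate(matrix):
        target = i * pixel_rows // n  # which pixel row absorbs data row i
        for j, value in enumerate(row):
            out[target][j] += value
    return out


small = collapse_rows([[1, 0], [2, 5], [0, 0], [4, 1]], pixel_rows=2)
```

Undoing the overplot on zoom then amounts to calling the function again on the visible slice of rows with the same pixel budget.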

Page 22: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Interaction

1. Time Selection
• Take screen resolution into account (you have 1000 pixels and you query 1005 seconds?)
• Choose start AND end time?
• Communicate to user what data is available?

[Figure: time selector with start time and end time]
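The 1000-pixel versus 1005-second question can be made concrete with a small calculation. This sketch assumes integer-second bins and at most one bin per pixel column; `plan_bins` is a hypothetical name and rounding the bin width up is just one possible policy.

```python
import math


def plan_bins(start, end, pixels):
    """Pick a whole-second bin width so the time range fits into the pixels.

    Returns (bin_width_seconds, number_of_bins).
    """
    span = end - start
    if span <= 0 or pixels <= 0:
        raise ValueError("need a positive time span and pixel count")
    width = max(1, math.ceil(span / pixels))
    return width, math.ceil(span / width)


tight = plan_bins(0, 1005, 1000)
exact = plan_bins(0, 1000, 1000)
```

Under this policy the extra five seconds force two-second bins and leave roughly half of the columns unused, which is one reason a tool might instead snap the selected range to the screen width.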

Page 23: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Interaction

2. Zoom and Pan
• Re-query for more detail?

Page 24: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Interaction

3. Color Scales / Ranges
• discrete
• continuous
• different colors
• multiple anchors

Page 25: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Interaction

4. Exposure - Mapping data to color

[Figure: histogram of cell values (axes: values, frequency); dark colors are underutilized]
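One way to adjust the exposure is to assign colors by rank rather than by raw value, so a few extreme values do not push everything else into a single shade. The sketch below contrasts a linear mapping with a rank-based one; this is an assumed technique for illustration, not necessarily what the speaker's tool does, and both function names are hypothetical.

```python
def linear_levels(values, levels=10):
    """Map values to color levels proportionally to the maximum value."""
    vmax = max(values)
    return [min(levels - 1, v * levels // (vmax + 1)) for v in values]


def rank_levels(values, levels=10):
    """Map values to color levels by the rank of each distinct value."""
    distinct = sorted(set(values))
    rank = {v: i for i, v in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)  # avoid division by zero for one value
    return [rank[v] * (levels - 1) // top for v in values]


skewed = [1, 1, 2, 3, 1000]
linear = linear_levels(skewed, levels=5)
ranked = rank_levels(skewed, levels=5)
```

With the linear mapping only the outlier leaves the lowest level; the rank-based mapping spreads the small values across the scale at the cost of hiding how large the outlier really is.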

Page 27: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Interaction

5. Pivot

destinationAddress -> sourceAddress WHERE destinationAddress = 81.223.6.41
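The pivot can be read as re-aggregating the same events with a different row field and a filter taken from the selected row. A minimal sketch, assuming events are dictionaries with the slide's `sourceAddress` and `destinationAddress` fields plus a hypothetical precomputed `time_bin`; `pivot` is a hypothetical name and the source addresses below are made-up sample data.

```python
from collections import Counter


def pivot(events, row_field, where=None):
    """Count events per (row value, time bin), optionally filtered."""
    counts = Counter()
    for event in events:
        if where and any(event.get(k) != v for k, v in where.items()):
            continue  # event does not match the selected row
        counts[(event[row_field], event["time_bin"])] += 1
    return counts


events = [
    {"sourceAddress": "10.0.0.1", "destinationAddress": "81.223.6.41", "time_bin": 0},
    {"sourceAddress": "10.0.0.2", "destinationAddress": "81.223.6.41", "time_bin": 0},
    {"sourceAddress": "10.0.0.1", "destinationAddress": "10.9.9.9", "time_bin": 1},
]
by_destination = pivot(events, "destinationAddress")
by_source = pivot(
    events, "sourceAddress", where={"destinationAddress": "81.223.6.41"}
)
```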

Page 28: The Heatmap - Why is Security Visualization so Hard?

HeatMap Challenges - Backend

Different backend technologies (big data)
• Key-value store
• Search engine
• GraphDB
• RDBMS
• Columnar - can answer analytical questions
• Hadoop (Map Reduce)
  • good for operations on ALL data

Other things to consider:
• Caching
• Joins

Page 29: The Heatmap - Why is Security Visualization so Hard?

What’s the HeatMap Not Good At

• Showing relationships -> link graphs
• Showing multiple dimensions and their inter-relatedness -> parallel coordinates (|| coords)

Page 30: The Heatmap - Why is Security Visualization so Hard?

Heatmaps Are Good Starting Points … BUT

Overview -> Zoom / Filter -> Details on Demand

Page 31: The Heatmap - Why is Security Visualization so Hard?

Leverage Data Mining to Summarize Data

Overview -> Zoom / Filter -> Details on Demand

Overview
• Leverage data mining (clustering) to create an overview
• Summarizing dozens of dimensions into a two-dimensional overview

Page 32: The Heatmap - Why is Security Visualization so Hard?

Self Organizing Maps

• Clustering based on a single data dimension
  • for example “attackers”
• It’s hard to
  • engineer the right features
  • avoid over-learning
  • interpret the clusters

[Figure: self-organizing map with 3 clusters, labeled 1, 2, 3]

Page 33: The Heatmap - Why is Security Visualization so Hard?

Raffael.Marty@pixlcloud.com

Examples

Page 34: The Heatmap - Why is Security Visualization so Hard?

Vincent

This heatmap shows behavior over time. In this case, we see activity per user. We can see that ‘vincent’ is visually different from all of the other users. He shows up very lightly over the entire time period. This seems to be something to look into.

We were able to find this purely visually, without understanding the data.

Page 35: The Heatmap - Why is Security Visualization so Hard?

Firewall Heatmap

Page 36: The Heatmap - Why is Security Visualization so Hard?

Showing Activity per Destination Address

Page 37: The Heatmap - Why is Security Visualization so Hard?

Changing Color Exposure

Page 38: The Heatmap - Why is Security Visualization so Hard?

Zoom In

Page 39: The Heatmap - Why is Security Visualization so Hard?

Pivot to Source Address

Page 40: The Heatmap - Why is Security Visualization so Hard?

Seriate

Page 41: The Heatmap - Why is Security Visualization so Hard?

Expanding Detail

[Figure labels: source, destination port, source port]

Page 42: The Heatmap - Why is Security Visualization so Hard?

Intra-Role Anomaly - Random Order

[Figure: heatmap; rows = users, columns = time, values = dc(machines), the distinct count of machines]

Page 43: The Heatmap - Why is Security Visualization so Hard?

Intra-Role Anomaly - With Seriation

Page 45: The Heatmap - Why is Security Visualization so Hard?

Intra-Role Anomaly - Sorted by User Role

Administrator

Sales

Development

Finance

Admin???

Page 46: The Heatmap - Why is Security Visualization so Hard?

Firewall Data

• Millions of rows
• High-cardinality fields
• Where to start analysis?
  • Formulate some hypotheses
  • Informs visualization process and data preparation
• Our hypothesis and assumption
  • Machines that get passed and blocked might be of interest
  • Low-frequency sources are not interesting

firewall data      data type    cardinality  distribution
source ip          ipv4         10-10^6      depends
dest ip            ipv4         10-10^6      depends
source port        int          65535        depends
dest port          int          65535        highly skewed
bytes in/out       int          -            skewed
action             bool / int   3            -
direction / iface  bool / str   small        -

Page 47: The Heatmap - Why is Security Visualization so Hard?

Visual Mapping

[Figure: heatmap; rows = source (10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4); columns = time bins ∆t (aggregation); color mapping: block, pass, block & pass]
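The three-way color mapping can be sketched by collecting the set of actions seen per source and time bin. The sketch assumes events are `(source, time_bin, action)` tuples with the actions spelled "block" and "pass"; `cell_states` is a hypothetical name.

```python
def cell_states(events):
    """Label each (source, time bin) cell as block, pass, or block & pass."""
    seen = {}
    for source, time_bin, action in events:
        seen.setdefault((source, time_bin), set()).add(action)

    def label(actions):
        if {"block", "pass"} <= actions:
            return "block & pass"
        if "block" in actions:
            return "block"
        if "pass" in actions:
            return "pass"
        return "other"  # any action value outside the assumed two

    return {cell: label(actions) for cell, actions in seen.items()}


states = cell_states([
    ("10.0.0.1", 0, "pass"),
    ("10.0.0.1", 0, "block"),
    ("10.0.0.2", 0, "block"),
    ("10.0.0.3", 1, "pass"),
])
```

Cells labeled "block & pass" correspond to the hypothesis that machines that get both passed and blocked might be of interest.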

Page 48: The Heatmap - Why is Security Visualization so Hard?

Low-Frequency Behavior

[Figure: two heatmaps of sources with sum <= 10, one outbound and one inbound; 36k rows; rows = source ip]
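The split into low-frequency and high-frequency sources can be sketched as a filter on each row's total. The threshold of 10 comes from the slide; the dictionary layout and the name `split_by_frequency` are assumptions.

```python
def split_by_frequency(rows, threshold=10):
    """Split rows into (low, high) by the sum of their cell values.

    rows maps a source ip to its list of per-time-bin counts.
    """
    low = {ip: cells for ip, cells in rows.items() if sum(cells) <= threshold}
    high = {ip: cells for ip, cells in rows.items() if sum(cells) > threshold}
    return low, high


low, high = split_by_frequency({
    "10.0.0.1": [1, 0, 2],
    "10.0.0.2": [40, 35, 60],
    "10.0.0.3": [5, 5, 0],
})
```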

Page 49: The Heatmap - Why is Security Visualization so Hard?

Outbound Blocks

What’s That?

Oct 25 11:56:14.123128 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 3660196221:3660197653(1432) ack 906644 win 32936 (DF)
Oct 25 11:57:18.140007 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
Oct 25 11:58:22.156195 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
Oct 25 11:59:26.170915 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)

less pflog.txt | grep xl1 | grep "rule 238" | sed -e 's/\(Oct .. ..\):..:..\........*/\1/' | uniq -c

  6 Oct 25 03
  8 Oct 25 05
  3 Oct 25 06
 25 Oct 25 07
  9 Oct 25 08
117 Oct 25 09
127 Oct 25 10
169 Oct 25 11
178 Oct 25 12
158 Oct 25 13
187 Oct 25 14
354 Oct 25 15
111 Oct 25 16
104 Oct 25 17
 33 Oct 25 18
 17 Oct 25 19

A clear increase in rule 238 traffic
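The same hourly count can be sketched in Python for cases where the result should feed a heatmap directly. This is an approximation of the shell pipeline rather than an exact port: it counts per hour across the whole input, whereas `uniq -c` only counts adjacent runs, and the name `hourly_counts` is hypothetical.

```python
import re
from collections import Counter

# Captures "Mon DD HH" from pf log lines that matched rule 238.
HOUR_RE = re.compile(r"^(\w{3} +\d+ \d\d):\d\d:\d\d\.\d+ rule 238/")


def hourly_counts(lines, iface="xl1"):
    """Count rule 238 records on one interface per hour."""
    counts = Counter()
    for line in lines:
        m = HOUR_RE.match(line)
        if m and f" on {iface}:" in line:
            counts[m.group(1)] += 1
    return counts


log = [
    "Oct 25 11:56:14.123128 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 3660196221:3660197653(1432) ack 906644 win 32936 (DF)",
    "Oct 25 11:57:18.140007 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)",
    "Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)",
]
per_hour = hourly_counts(log)
```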

Page 50: The Heatmap - Why is Security Visualization so Hard?

High Frequency Sources Over Time

[Figure: heatmap of sources with sum > 10; 672 rows; color mapping: block, pass, block & pass]

Page 51: The Heatmap - Why is Security Visualization so Hard?

High Frequency Traffic Split Up

[Figure: two heatmaps, inbound and outbound, with these labels:]

192.168.0.201
195.141.69.42
195.141.69.43
195.141.69.44
195.141.69.45
195.141.69.46
212.254.110.100
212.254.110.101
212.254.110.107
212.254.110.108
212.254.110.109
212.254.110.110
212.254.110.98
212.254.110.99
62.245.245.139

Page 52: The Heatmap - Why is Security Visualization so Hard?

Outbound Traffic - Some Questions To Ask

• What happened mid-way through?

• Why is anything outbound blocked?

• What are the top and bottom machines doing?

• Did we get a new machine into the network?

• Some machines went away?

195.141.69.42

Page 53: The Heatmap - Why is Security Visualization so Hard?

195.141.69.42 - Interactions

[Figure labels: action, port, dest]

Page 54: The Heatmap - Why is Security Visualization so Hard?

Zooming in on Top Rows

212.254.110.100 212.254.110.101 212.254.110.102 212.254.110.103 212.254.110.104 212.254.110.105 212.254.110.106 212.254.110.107 212.254.110.108 212.254.110.109 212.254.110.110 212.254.110.111 212.254.110.112 212.254.110.113 212.254.110.114 212.254.110.115 212.254.110.116 212.254.110.117 212.254.110.118 212.254.110.119 212.254.110.120 212.254.110.121 212.254.110.122 212.254.110.123 212.254.110.124 212.254.110.125 212.254.110.126 212.254.110.127 212.254.110.66 212.254.110.96 212.254.110.97 212.254.110.98 212.254.110.99

• Hardly any pass-block

Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)

Page 55: The Heatmap - Why is Security Visualization so Hard?

Zooming in on Top Rows

• Hardly any pass-block

212.254.110.102

Oct 16 13:14:05.627835 rule 0/0(match): pass in on xl0: 66.220.17.151.80 > 212.254.110.102.1977: S 1841864015:1841864019(4) ack 1308753921 win 16384 (DF)

SYN ACK for real Web traffic passed

Page 56: The Heatmap - Why is Security Visualization so Hard?

This Guy Sure Keeps Busy

212.254.144.40

dest port

Page 57: The Heatmap - Why is Security Visualization so Hard?

Recap

• Attackers are very successful
• Data could reveal adversaries
• We have a big data analytics problem
• We need the right analytics and visualizations
• Security visualization is hard
• Data visualization workflow is a promising approach
• Heatmaps are great for overviews
• We need a set of heuristics and workflows