Temporal Multi-View Inconsistency Detection for Network Traffic Analysis. WWW’15, Florence, Italy. Houping Xiao (1), Jing Gao (1), Deepak Turaga (2), Long Vu (2), and Alain Biem (2). (1) Department of Computer Science and Engineering, University at Buffalo; (2) IBM T.J. Watson Research Center.


Temporal Multi-View Inconsistency Detection for Network Traffic Analysis

WWW’15 Florence, Italy

Houping Xiao (1), Jing Gao (1), Deepak Turaga (2), Long Vu (2), and Alain Biem (2)

(1) Department of Computer Science and Engineering, University at Buffalo; (2) IBM T.J. Watson Research Center


Outline

• Motivation

• Challenges

• Proposed Framework
  • Temporal Multi-View Inconsistency Detection (TMVID)

• Experiments

• Conclusions


Motivation

• Multi-view information
  • Network traffic data typically involve multiple views
  • Example: network traffic data can be collected through different protocols, such as TCP, UDP, and ICMP

• Question
  • Which hosts behave suspiciously?

• Our solution
  • Calculate the degree to which each host receives inconsistent information across multiple views
  • The higher the degree of inconsistency, the more suspicious the host


How to Find Inconsistent Behavior

• Single-view approach
  • Apply several anomaly detection algorithms to each view of the data and then compare the detector scores
  • However, the detector scores may be noisy, and this approach fails to consider the intrinsic relationships between different views

• Instead, analyze the behavior of each host across multiple views


Our Solution

• First, apply existing anomaly detection algorithms
  • Convert data from different views into comparable features and discard noisy information
  • However, even after anomaly detectors have been applied to each view, it is still challenging to compare anomaly detector outputs from different views

[Figure: raw detector scores from network traffic flow on 4 views. x-axis: Host ID; y-axis: Detector Score]


Our Solution

• Project multi-view data into a new space where inconsistent and consistent hosts can be well separated

• Identify detector clusters and compare at the cluster level

• In each source, detectors can be partitioned into clusters so that detectors in the same cluster share similar behavior patterns on hosts across multiple views

• The behavior of the underlying detector cluster should be consistent across multiple views
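As a rough illustration of comparing at the cluster level, the sketch below averages detector scores within each detector cluster, separately for each view, for a single host. The cluster assignment, the score values, the view names, and the function name `cluster_means` are all assumptions made for this example; the slides do not say how the clusters are computed.

```python
from collections import defaultdict


def cluster_means(scores, assignment):
    """Average detector scores within each detector cluster.

    scores: detector scores for one host in one view.
    assignment: assignment[i] is the cluster id of detector i (hypothetical).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for score, cluster in zip(scores, assignment):
        sums[cluster] += score
        counts[cluster] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}


# Toy data: 4 detectors in 2 clusters, one host observed in 2 views.
assignment = [0, 0, 1, 1]
view_scores = {
    "view_1": [0.25, 0.75, 1.0, 3.0],
    "view_2": [0.5, 0.5, 2.0, 2.0],
}
profiles = {v: cluster_means(s, assignment) for v, s in view_scores.items()}
```

In this toy case the raw detector scores differ between the two views, yet the cluster-level profiles coincide, which is the kind of agreement a consistent host would be expected to show.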


Temporal Behavior

• Observations
  • The behavior of hosts evolves over time
  • The temporal patterns of hosts’ behavior must be taken into consideration when finding inconsistency
  • Example: a very high volume of network traffic from a host is normal on weekdays but suspicious on weekends

• Solution
  • In each view, timestamps are partitioned into clusters
  • Temporal behavior over timestamp clusters should be consistent across multiple views
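The weekday/weekend example can be made concrete with a small sketch that groups a host's traffic samples into two timestamp clusters and summarizes each one. The two clusters are fixed by hand here, and the function name `weekday_weekend_profile` and the traffic volumes are hypothetical; how the framework actually derives timestamp clusters is not stated on the slide.

```python
from datetime import datetime
from statistics import mean


def weekday_weekend_profile(samples):
    """Mean traffic volume per timestamp cluster.

    samples: list of (datetime, volume) pairs for one host in one view.
    The clusters "weekday" and "weekend" are an assumption of this sketch.
    """
    clusters = {"weekday": [], "weekend": []}
    for ts, volume in samples:
        key = "weekend" if ts.weekday() >= 5 else "weekday"
        clusters[key].append(volume)
    return {k: mean(v) for k, v in clusters.items() if v}


# Toy data: 2015-05-18 is a Monday, 2015-05-23 is a Saturday.
samples = [
    (datetime(2015, 5, 18, 10), 900.0),
    (datetime(2015, 5, 19, 10), 1100.0),
    (datetime(2015, 5, 23, 10), 50.0),
    (datetime(2015, 5, 24, 10), 150.0),
]
profile = weekday_weekend_profile(samples)
```

Computing such a profile in every view gives one short vector per host and view, and those vectors can then be compared across views.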


Proposed Framework

[Figure: framework diagram. Component 1, Anomaly Detector System: Detector 1, ..., Detector N are applied to View 1, ..., View M, producing the observed tensor (axes labeled host and detector) that feeds Temporal Multi-View Inconsistency Detection (TMVID).]
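A minimal sketch of how an observed tensor could be assembled for each view is given below, assuming the detector scores arrive as (detector, host, timestamp, score) records. That input format, the function name `build_observed_tensor`, and the view names are assumptions; the detector × host × timestamp layout follows the dimensions listed in the experiments.

```python
def build_observed_tensor(records, n_detectors, n_hosts, n_timestamps):
    """Arrange detector scores as a detector x host x timestamp array.

    records: iterable of (detector, host, timestamp, score) tuples with
    zero-based integer indices (a hypothetical input format).
    Entries without a record are left at 0.0.
    """
    tensor = [[[0.0] * n_timestamps for _ in range(n_hosts)]
              for _ in range(n_detectors)]
    for det, host, ts, score in records:
        tensor[det][host][ts] = score
    return tensor


# One tensor per view; view names and scores are toy values.
raw = {
    "tcp": [(0, 0, 0, 0.2), (1, 1, 2, 0.9)],
    "udp": [(0, 1, 1, 0.4)],
}
tensors = {view: build_observed_tensor(recs, 2, 2, 3)
           for view, recs in raw.items()}
```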


Proposed Framework

[Figure: framework diagram, Components 1 and 2. Component 1, Anomaly Detector System, produces the observed tensor from Detector 1, ..., Detector N and View 1, ..., View M. Component 2, Joint Probabilistic Tensor Factorization, relates the observed tensor to a latent tensor through a detector cluster assignment matrix, an identity matrix, and a timestamp cluster assignment matrix.]


Proposed Framework

[Figure: full framework diagram. Component 1, Anomaly Detector System; Component 2, Joint Probabilistic Tensor Factorization, which yields the latent tensor; Component 3, Inconsistency Score Computation, which turns the latent tensor into inconsistency scores and identifies the inconsistent hosts.]


Joint Probabilistic Tensor Factorization

• Latent tensor: each entry stands for the detector score at the u-th detector cluster and the w-th timestamp cluster for the v-th host

• d-th projection matrix: constructs the multi-linear mapping between the observed detector tensors and the latent tensors

• Residue tensor: each entry is assumed to follow a Gaussian distribution
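The slide's formula did not survive extraction, so the block below is only one way to write a model matching the three bullets above. All symbols are assumptions: $\mathcal{X}^{(k)}$ is the observed tensor of view $k$, $\mathcal{Z}^{(k)}$ the latent tensor, $C^{(k)}$ and $D^{(k)}$ the detector and timestamp cluster assignment matrices, $I$ the identity matrix on the host mode, and $\mathcal{E}^{(k)}$ the residue tensor. A zero-mean Gaussian with a shared variance $\sigma^2$ is also assumed.

```latex
\mathcal{X}^{(k)} = \mathcal{Z}^{(k)} \times_1 C^{(k)} \times_2 I \times_3 D^{(k)} + \mathcal{E}^{(k)},
\qquad k = 1, \dots, M
% entrywise, for detector i, host v, timestamp t:
x^{(k)}_{ivt} = \sum_{u} \sum_{w} c^{(k)}_{iu} \, z^{(k)}_{uvw} \, d^{(k)}_{tw} + e^{(k)}_{ivt},
\qquad e^{(k)}_{ivt} \sim \mathcal{N}(0, \sigma^2)
```

Here $\times_d$ denotes the mode-$d$ tensor-matrix product, so the identity on the second mode leaves the host dimension unchanged while detectors and timestamps are mapped to their clusters.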


Joint Probabilistic Tensor Factorization

• Parameter set: [equation not recovered]

• The log-likelihood of the parameter set given the observed tensors: [equation not recovered]
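For orientation, if every residue entry is taken to be independent and drawn from $\mathcal{N}(0, \sigma^2)$, a Gaussian log-likelihood of the following form results. This is a sketch under that assumption, not the slide's own formula. The notation is hypothetical: $\Theta$ collects the latent tensors $\mathcal{Z}^{(k)}$, the projection matrices $C^{(k)}$ and $D^{(k)}$, and $\sigma^2$; $\mathcal{X}^{(k)}$ is the observed tensor of view $k$; $n$ is the total number of observed entries over all $M$ views.

```latex
\log L(\Theta) = -\frac{1}{2\sigma^2} \sum_{k=1}^{M}
  \left\| \mathcal{X}^{(k)} - \mathcal{Z}^{(k)} \times_1 C^{(k)} \times_2 I \times_3 D^{(k)} \right\|_F^2
  - \frac{n}{2} \log\!\left( 2\pi\sigma^2 \right)
```

For fixed $\sigma^2$, maximizing this expression is equivalent to minimizing the total squared factorization error.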


Joint Probabilistic Tensor Factorization

• Assumptions:
  • The behavior of anomaly detectors should be similar across different views
  • The behavior of hosts over timestamps should be similar across different views

• Based on these assumptions, we introduce the penalized log-likelihood function: [equation and the definitions of its terms not recovered]


Joint Probabilistic Tensor Factorization

• Goal: optimize an objective with two parts
  • Factorization error
  • Constraints: projection matrices should be similar across views
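The two labeled parts, factorization error and cross-view similarity of the projection matrices, can be sketched as a single objective function. The squared-error form, the pairwise squared-difference penalty, the weight `lam`, and all function names are assumptions made for illustration; the slide's actual objective was not recovered.

```python
def reconstruct(Z, C, D):
    """X[i][v][t] = sum over u, w of C[i][u] * Z[u][v][w] * D[t][w]."""
    n_dc, n_tc = len(C[0]), len(D[0])
    return [[[sum(C[i][u] * Z[u][v][w] * D[t][w]
                  for u in range(n_dc) for w in range(n_tc))
              for t in range(len(D))]
             for v in range(len(Z[0]))]
            for i in range(len(C))]


def sq_diff(A, B):
    """Sum of squared differences between equally shaped nested lists."""
    if isinstance(A, (int, float)):
        return (A - B) ** 2
    return sum(sq_diff(a, b) for a, b in zip(A, B))


def objective(X, Z, C, D, lam):
    """Factorization error plus a cross-view similarity penalty.

    X, Z, C, D are lists indexed by view; lam weights the penalty.
    """
    views = range(len(X))
    error = sum(sq_diff(X[k], reconstruct(Z[k], C[k], D[k])) for k in views)
    penalty = sum(sq_diff(C[k], C[l]) + sq_diff(D[k], D[l])
                  for k in views for l in views if k < l)
    return error + lam * penalty


# Toy data: 2 views, 2 detectors, 1 detector cluster, 1 host,
# 2 timestamps, 1 timestamp cluster.
C = [[1.0], [1.0]]          # detectors x detector clusters
D = [[1.0], [1.0]]          # timestamps x timestamp clusters
Z = [[[2.0]]]               # detector clusters x hosts x timestamp clusters
X = reconstruct(Z, C, D)    # every entry equals 2.0

consistent = objective([X, X], [Z, Z], [C, C], [D, D], lam=0.5)
C_shifted = [[1.0], [0.0]]  # view 2 now disagrees with view 1
shifted = objective([X, X], [Z, Z], [C, C_shifted], [D, D], lam=0.5)
```

With identical projection matrices and an exact reconstruction the objective is zero; changing one view's detector matrix raises both the error term and the penalty term.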


Inconsistency Score Computation

• Inconsistency Score

[Figure and equation not recovered; the only surviving symbols are k, C, and D]
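Since the score formula itself is missing from this transcript, the sketch below uses a stand-in definition chosen purely for illustration: a host's score is the spread of its latent representation across views, measured as the summed squared distance to the cross-view mean. The function name `inconsistency_scores` and the toy latent values are hypothetical, and this should not be read as the authors' formula.

```python
def inconsistency_scores(latent):
    """Hypothetical score: spread of each host's latent slice across views.

    latent[k][u][v][w] is the latent score of host v in view k at detector
    cluster u and timestamp cluster w. For every host, the score is the sum
    over views of the squared distance to the cross-view mean.
    """
    n_views = len(latent)
    n_dc = len(latent[0])
    n_hosts = len(latent[0][0])
    n_tc = len(latent[0][0][0])
    scores = []
    for v in range(n_hosts):
        total = 0.0
        for u in range(n_dc):
            for w in range(n_tc):
                values = [latent[k][u][v][w] for k in range(n_views)]
                center = sum(values) / n_views
                total += sum((x - center) ** 2 for x in values)
        scores.append(total)
    return scores


# Toy latent tensors: 2 views, 1 detector cluster, 2 hosts, 2 timestamp clusters.
latent = [
    [[[1.0, 2.0], [0.0, 4.0]]],   # view 1
    [[[1.0, 2.0], [4.0, 0.0]]],   # view 2
]
scores = inconsistency_scores(latent)
# Hosts ordered from most to least inconsistent.
ranking = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
```

Host 0 looks the same in both views and gets a score of zero, while host 1 flips its pattern between views and is ranked first.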


Experiment Set-up

• Datasets:
  • Synthetic datasets
  • Two real-world datasets, collected from IBM enterprise networks
    • Network Traffic Flow Data
    • Domain Name System Data


Effectiveness Comparison

Table 1: Statistics of Synthetic Data Sets

          # detectors  # hosts  # timestamps  # views
Synth-1   50           1000     200           3
Synth-2   50           2000     300           10
Synth-3   100          5000     500           50

Table 2: F-Measure Comparison

          Vote/mean  Vote/min  Vote/max  Mean  Min  Max  NMF  TMVID
Synth-1   .81        .85       .80       .10   .15  .10  .80  1.0
Synth-2   .86        .92       .85       .20   .21  .20  .82  1.0
Synth-3   .90        .95       .85       .25   .30  .28  .87  1.0

Results: A higher F-measure is better. Table 2 shows that the proposed TMVID achieves the highest F-measure on all three synthetic data sets.
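For reference, the F-measure can be computed from the set of hosts a method flags and the set of truly inconsistent hosts. The sketch assumes the balanced F1 variant (the slides only say "F-Measure"), and the host ids are made up.

```python
def f_measure(predicted, actual):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical host ids.
truth = {3, 7, 9, 12}
perfect = f_measure({3, 7, 9, 12}, truth)   # every flagged host is correct
partial = f_measure({3, 7, 20, 21}, truth)  # half of the flagged hosts are correct
```

A value of 1.0, as reported for TMVID in Table 2, means the flagged set matches the ground truth exactly.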


Scalability vs. # Hosts

TMVID running time

# Hosts      Time (s)
1.0 × 10^2   1.3
1.0 × 10^3   10.5
5.0 × 10^3   50.8
1.0 × 10^4   107.8
5.0 × 10^4   533.4
1.0 × 10^5   1107.5

Pearson correlation: 0.9998

Results: The running time of the proposed algorithm grows almost linearly with the number of hosts
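The linearity claim can be checked directly from the table by computing the Pearson correlation between the number of hosts and the running time. The helper `pearson` below is a plain implementation written for this sketch; only the six data points come from the slide.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Values copied from the scalability table.
hosts = [1.0e2, 1.0e3, 5.0e3, 1.0e4, 5.0e4, 1.0e5]
seconds = [1.3, 10.5, 50.8, 107.8, 533.4, 1107.5]
r = pearson(hosts, seconds)
```

The coefficient comes out very close to 1, in line with the near-linear scaling stated on the slide.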


Scalability vs. # Views

[Figure: running time (sec) versus number of views]

Results: The running time of the proposed algorithm grows linearly with the number of views


Network Traffic Flow

[Figure: inconsistency score of each host. x-axis: Host ID; y-axis: Inconsistency Score]

Results: Most hosts are considered consistent, while only a small set of hosts receive very high inconsistency scores


Case Study

[Figure: two panels, "Top 1 Inconsistent Host" and "Top 1 Consistent Host". y-axis: Detector Score]

Results: For the inconsistent host, the detector score patterns of the views, over both timestamp clusters and detector clusters, are well separated in the subspace found by the joint probabilistic tensor factorization, while the behavior of the consistent host is almost the same across views


Case Study

[Figure: two panels, "Top 2 Inconsistent Hosts" and "Top 2 Consistent Hosts". x-axis: Timestamp; y-axis: Detector Score]

Results: For inconsistent hosts, the patterns of the detector clusters are quite different across views, while for consistent hosts the patterns are similar across views, ignoring noise


Case Study

[Figure: two panels, "Top 2 Inconsistent Hosts" and "Top 2 Consistent Hosts". x-axis: Detector ID; y-axis: Detector Score]

Results: For inconsistent hosts, the patterns of the timestamp clusters vary considerably across views; in particular, the patterns of views 2 and 4 are clearly different from those of views 1 and 3. For consistent hosts, the patterns are quite similar across views


Conclusions

• Developed a novel framework (TMVID) for inconsistency detection across multiple views of temporal data

• Proposed joint probabilistic tensor factorization to extract the common behavior hidden in multiple views, and showed how to calculate an inconsistency score for each host

• Demonstrated the efficacy of TMVID in capturing inconsistency in multi-view temporal data on synthetic and real-world network traffic data sets


Thank You! Questions?