Network Tomography Based on Flow Level Measurements
Dogu Arifler
Ph.D. Defense
Committee Members:
Prof. Ross Baldick
Prof. Melba M. Crawford
Prof. Gustavo de Veciana (Co-advisor)
Prof. Brian L. Evans (Co-advisor)
Prof. Theodore S. Rappaport
Prof. Sanjay Shakkottai
April 19, 2004
2
Outline
Introduction: Background and motivation; Overview of contributions
Methodology for inferring network resource sharing: Conditional sampling; Flow filtering; Dimensionality reduction
Validation: Simulation studies; Application to real data with the bootstrap
Conclusion: Summary; Future work
3
Inference of network properties
Motivation: Network managers need information about properties of networks to better plan for services and diagnose performance problems
Problem: In general, properties of networks outside one's administrative domain are unknown
Little or no information on routing and topology
Little or no information on link and server utilizations
Solution: Network tomography
Inferring characteristics of networks from available network traffic measurements
Application of statistical methods to network measurements
4
Inference of congested resource sharing
Internet service providers: diagnose misconfigurations and link failures
End users: assess routing diversity; infer how resources are allocated
Content providers: balance workload among servers; plan placement of caches
Wireless service providers: evaluate adequacy of backhaul link capacity; determine if the access point is configured properly
[Figure: example scenarios, including a wireless hot spot, a congested content server, and a link failure]
5
Related work
Brute force: via the Unix utility traceroute
Requires cooperation of routers along a packet's route
Providers are unwilling to disclose information due to security concerns
Topology visualization: skitter [CAIDA], rocketfuel [UWA]
Location-based approximations [Savage, Cardwell, Anderson, 1999]
Packets destined for a given network address generally follow the same path
Statistical techniques on packet level measurements
Correlation of end-to-end packet losses [Harfoush, Bestavros, Byers, 2000]
Clustering based on minimizing entropy of inter-packet spacing [Katabi, Bazzi, Yang, 2001]
Correlation of end-to-end packet losses and delays [Rubenstein, Kurose, Towsley, 2002]
6
Network tomography based on flows
Packet level measurements are data intensive to collect and store, dependent on the cooperation of the network and/or the collaboration of users, and complex to analyze
Propose a significantly different strategy to infer network properties: correlation of passive flow level measurements available at a local measurement site
A flow is a sequence of packets associated with a given instance of an application, e.g. the packets corresponding to the transfer of a Web page, file, or e-mail
A flow is an abstraction at higher protocol layers, i.e. closer to the application layer
7
Flow level measurements
Flow records are summary information and are easier to collect and store
State-of-the-art networking equipment can collect flow records (e.g. Cisco NetFlow, sFlow, Argus)
Records contain source/destination IP addresses, port numbers, number of packets and bytes in the flow, and the start time and end time of the flow
[Figure: packets of a flow observed on a monitored link between the flow's start time and end time (its response time), delimited by a timeout; records keyed by flow identifiers are stored in a data warehouse]
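As a rough sketch of how such records can be represented and post-processed, assuming illustrative field names (real NetFlow records carry more attributes than shown here):

```python
from dataclasses import dataclass

# Hypothetical minimal flow record with the fields named on the slide.
@dataclass
class FlowRecord:
    src: str          # source IP address
    dst: str          # destination IP address
    size_bytes: int   # total bytes in the flow
    start: float      # flow start time (seconds)
    end: float        # flow end time (seconds)

    @property
    def response_time(self) -> float:
        return self.end - self.start

    @property
    def throughput(self) -> float:
        # Perceived throughput: flow size divided by response time
        return self.size_bytes / self.response_time

r = FlowRecord("10.0.0.1", "10.0.0.2", size_bytes=16_000, start=100.0, end=102.0)
print(r.throughput)  # 8000.0 bytes/s
```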
8
TCP flows
Approximately 80% of flows in the Internet are transferred via TCP [CAIDA, 1999]
TCP adapts its data transmission rate to the available network capacity; congested link bandwidth is shared roughly fairly among flows
One performance measure for TCP flows is perceived throughput: the amount of data in bytes (flow size) divided by the response time
Premise: Throughputs of TCP flows that temporally overlap at a congested resource are correlated
[Figure: two flows sharing the available capacity of a link over time]
9
Overview of contributions
New approach to network tomography based on flow level measurements
Methodology for inferring congested resource sharing:
1. Conditional sampling strategy: estimation of the correlation matrix from pairwise correlations
2. Flow filtering criteria: preprocessing flow records by omitting flows based on size in bytes, duration, and number of packets
3. Dimensionality reduction: exploratory factor analysis via the principal component method
4. Validation with measured data: bootstrap methods to estimate confidence intervals for factor analysis results
10
Outline
Introduction: Background and motivation; Overview of contributions
Methodology for inferring network resource sharing: Conditional sampling; Flow filtering; Dimensionality reduction
Validation: Simulation studies; Application to real data with the bootstrap
Conclusion: Summary; Future work
11
Throughput of a flow class
Flow class is a collection of flow records that have a common identifier, e.g. source/destination address
How can one infer which flow classes share resources? Correlate the flow class throughput processes computed from the records (Contribution #1)
[Figure: flow records collected at a measurement site, grouped into classes 1 and 2 over time]
12
Conditional sampling of random processes
Which flow class throughput samples can be used to capture flow class throughput correlations?
Use a pairwise approach to estimate the correlation matrix: estimate throughput correlations between class pairs by using samples at times when both classes of the pair are active
Construct the correlation matrix R with elements R_{c_i, c_j}, the estimated throughput correlation of classes c_i and c_j (Contribution #1)
[Figure: activity of each class during discretized times n; the red and blue classes are compared only over the times when both are active]
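A minimal Python sketch of this conditional sampling strategy, assuming throughput sample paths in which None marks times when a class is inactive (the representation is illustrative, not the dissertation's):

```python
import math

def conditional_corr(x, y):
    """Pearson correlation of two class throughput sample paths, using only
    the discretized times at which BOTH classes are active (a sample is None
    when the class has no active flows)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)

def correlation_matrix(samples):
    """samples: dict class name -> throughput samples (None = inactive).
    Each off-diagonal entry R[c_i, c_j] is estimated from its own
    pairwise-conditioned sample set."""
    names = list(samples)
    return {(a, b): (1.0 if a == b else conditional_corr(samples[a], samples[b]))
            for a in names for b in names}

# Two classes whose throughputs move together wherever both are active.
s = {"c1": [1.0, 2.0, 3.0, None, 5.0],
     "c2": [2.0, 4.0, 6.0, 8.0, None]}
R = correlation_matrix(s)
print(round(R[("c1", "c2")], 4))  # 1.0: perfectly correlated on the shared active times
```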
13
Flow filtering
Can one better capture correlations due to resource sharing if only a subset of flow records is used?
Throughputs of short TCP flows are noisy, because they do not have an opportunity to "learn" the congestion state
The amount of temporal overlap between a long TCP flow and a short TCP flow is small
What is the impact of short flows and long flows on throughput correlations? Model the instantaneous link bandwidth available to a flow as an autoregressive process, and analyze the effect of flow duration and the amount of overlap between flows on throughput correlation (Contribution #2)
14
Autoregressive model for available bandwidth
Suppose that the link bandwidth available to a flow at time i is a first-order autoregressive process denoted by B(i)
Express the perceived throughputs of flows f1 and f2 as

T_{f_k} = (1 / d_{f_k}) * sum_{i = s_{f_k}}^{e_{f_k}} B(i) + e_{f_k},  k = 1, 2

where the noise terms e_{f_1} and e_{f_2} model the inability of a short TCP flow to "learn" the congestion state of the network (Contribution #2)
[Figure: flow f1 starts at s_{f_1} = 0 and lasts d_{f_1}, ending at e_{f_1}; flow f2 spans s_{f_2} to e_{f_2} with duration d_{f_2}; the two flows overlap in time]
15
[Figure: correlation between flow throughputs. Left panel: perfectly overlapping flows, correlation vs. the common duration of f1 and f2. Right panel: duration of f1 = 20, correlation vs. the start time of f2, for d_{f_2} = 10, 20, 30, and 40]
High correlation for temporally overlapping flows
Correlation depends on the overlap relative to the longer flow
The effect of noise vanishes as flow duration increases, and correlation approaches 1
Contribution #2
16
Flow filtering criteria
Resource sharing flow classes:
Long flows with large amounts of overlap result in high throughput correlations, but this situation does not arise frequently
Long flows overlapping with short flows result in lower correlations
"Noisy" short flows result in lower correlations even when the amount of overlap is large
Removing large- and small-sized flows helps in capturing positive throughput correlations due to resource sharing
Long (short) flows will typically be large (small) in size
Unlike the duration of a flow, the size of a flow is invariant to the capacity of links
Flow size is therefore the proper attribute to consider for filtering out flows
Contribution #2
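The filtering criteria can be sketched as a simple predicate over flow records; the size thresholds below are the example values used on the slides, and the dictionary fields are illustrative:

```python
def filter_flows(records, min_bytes=4_000, max_bytes=32_000,
                 min_packets=2, min_duration=1.0):
    """Keep only flows likely to reflect the congestion state: drop very
    small flows (noisy, never 'learn' congestion) and very large ones
    (little overlap with typical flows). Thresholds are example values
    (4 kB < size < 32 kB), not universal constants."""
    return [r for r in records
            if min_bytes < r["bytes"] < max_bytes
            and r["packets"] >= min_packets
            and r["duration"] >= min_duration]

flows = [
    {"bytes": 900,     "packets": 1,   "duration": 0.2},   # tiny: dropped
    {"bytes": 16_000,  "packets": 12,  "duration": 3.0},   # kept
    {"bytes": 500_000, "packets": 400, "duration": 60.0},  # huge: dropped
]
print(len(filter_flows(flows)))  # 1
```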
17
Exploratory factor analysis
Interpretation of flow class throughput correlation matrix to infer resource sharing is difficult
Correlation structure of flow class throughputs can often be represented by a few latent factors
Orthogonal factor model (m ≤ p): the p standardized throughput variables are expressed in terms of m common factors, X = Λ F + ε, so that R = Λ Λ^T + Ψ with Ψ diagonal
No hypothesis is made on m, but the factors must have high explanatory power
Λ_ij is the loading (or weight) of factor j on variable i
Contribution #3
18
Principal component method
Use the spectral decomposition of R to estimate Λ and Ψ, with eigenvalue-eigenvector pairs (λ_i, ξ_i), 1 ≤ i ≤ p:

R = λ_1 ξ_1 ξ_1^T + ... + λ_m ξ_m ξ_m^T + ... + λ_p ξ_p ξ_p^T

Determine the m "significant" eigenvalues of R using Kaiser's rule [Kaiser, 1960]: retain eigenvalues greater than 1, the variance of a single normalized variable; the variances of the factors are given by the eigenvalues

Approximate R by the first m terms plus specific variances:

R ≈ Λ̂ Λ̂^T + Ψ̂,  where Λ̂ = [ sqrt(λ_1) ξ_1, ..., sqrt(λ_m) ξ_m ] and Ψ̂_ii = 1 - Σ_{j=1}^{m} Λ̂_ij^2

[Figure: scree plot of eigenvalues 1 through p; the first m exceed the threshold of 1]
Contribution #3
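A sketch of the principal component method in Python (using NumPy for the eigendecomposition); the toy correlation matrix is illustrative:

```python
import numpy as np

def factor_loadings(R, kaiser_threshold=1.0):
    """Principal component method: spectral decomposition of correlation
    matrix R, keeping the m eigenvalues above Kaiser's threshold of 1
    (the variance of a single normalized variable).
    Returns (loadings of shape (p, m), specific variances psi)."""
    eigvals, eigvecs = np.linalg.eigh(R)        # ascending order
    order = np.argsort(eigvals)[::-1]           # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int(np.sum(eigvals > kaiser_threshold))
    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # columns sqrt(lambda_j) * xi_j
    psi = 1.0 - np.sum(L**2, axis=1)            # diagonal of Psi_hat
    return L, psi

# Toy correlation matrix: variables 1-2 form one correlated pair,
# variables 3-4 another.
R = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])
L, psi = factor_loadings(R)
print(L.shape[1])  # 2 significant factors (eigenvalues 1.8 and 1.6)
```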
19
Inference of resource sharing
The structure of a p×p correlation matrix R is explained by a p×m factor loading matrix Λ
Columns of Λ represent shared congested resources
Magnitudes of the loadings tell us which shared resource has the most effect on the variability of a class's throughput
The loading matrix can be rotated via varimax rotation to obtain a Λ* that potentially gives a better description of resource sharing
Contribution #3
Example: consider five flow classes and suppose that the correlation matrix has two significant eigenvalues. Λ* is then a 5×2 matrix whose rows correspond to classes 1-5 and whose columns correspond to factors 1 and 2. Boxing the factor loading with the largest magnitude in each row shows that classes 1, 2, and 5 share one resource, while classes 3 and 4 share another.
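The row-wise largest-loading rule can be sketched as follows; the loading values are illustrative stand-ins for the boxed matrix on the slide:

```python
def shared_resource_groups(loadings):
    """Group flow classes by the factor (column) with the largest-magnitude
    loading in each class's row: classes assigned to the same factor are
    inferred to share a congested resource."""
    groups = {}
    for cls, row in enumerate(loadings):
        factor = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(factor, []).append(cls)
    return groups

# Illustrative rotated loading matrix for the five-class, two-factor example:
# rows are classes 1-5 (0-indexed), columns are factors 1-2.
L_star = [[0.9, 0.1],
          [0.8, 0.2],
          [0.1, 0.9],
          [0.2, 0.8],
          [0.7, 0.3]]
print(shared_resource_groups(L_star))  # {0: [0, 1, 4], 1: [2, 3]}
```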
20
Outline
Introduction: Background and motivation; Overview of contributions
Methodology for inferring network resource sharing: Conditional sampling; Flow filtering; Dimensionality reduction
Validation: Simulation studies; Application to real data with the bootstrap
Conclusion: Summary; Future work
21
TCP simulations
Primary goals of the simulations:
Evaluate the effectiveness of exploratory factor analysis in identifying flow classes that share resources in a controlled environment
Find a range of flow sizes that better captures the network's congestion dynamics
Simulations are performed using OPNET Modeler, a discrete-event environment for network modeling and simulation (http://www.opnet.com)
Simulate 2-hour-long file download activity
File requests from users arrive according to a Poisson process
Each user downloads a file whose size is chosen from a lognormal distribution with mean 16 kB, std 131 kB [Downey, 2001]
File sizes, request times, and download response times are recorded to create NetFlow-like data for statistical analysis
22
Assessment of factor model
Need a metric to evaluate whether the loadings correctly determine which classes are associated with which resources
Define the squared error loss between Λ_0, the "ideal" loading matrix, and Λ̂, the estimated loading matrix
Couple explanatory power with the squared error loss to evaluate factor analysis in inferring resource sharing:
Assess inference accuracy
Empirically search for size thresholds for filtering out flows to improve accuracy
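A possible form of this loss, sketched under the assumption that it is the sum of squared elementwise differences (the slide does not show the exact formula) and that the estimate has already been sign- and column-aligned with the ideal matrix:

```python
def squared_error_loss(L_hat, L0):
    """Assumed form of the loss: sum of squared elementwise differences
    between the estimated and 'ideal' loading matrices. Sign and column
    alignment of L_hat to L0 is assumed to have been done beforehand."""
    return sum((a - b) ** 2
               for ra, rb in zip(L_hat, L0)
               for a, b in zip(ra, rb))

L0    = [[1.0, 0.0], [0.0, 1.0]]   # "ideal" loadings
L_hat = [[0.9, 0.1], [0.2, 0.8]]   # estimated loadings
loss = squared_error_loss(L_hat, L0)
print(loss)  # approximately 0.1
```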
23
Tree topology with three bottlenecks
Consider a scenario in which users in seven subnets download files from a file server
Each file server-subnet pair is a flow class
Bottlenecks A1, A2, and A3 are loaded equally
The effects on inference of the load offered by the classes and of filtering out small and/or large flows will be investigated
[Figure: tree topology with file server S1, bottleneck links A1, A2, and A3, and seven subnets, each a 10 Mbps LAN with 10 workstations]
24
Tree topology with three bottlenecks: results
[Figure: two panels plotted against the load offered by each class on its corresponding bottleneck (0.075 to 0.225), for the original data and for filtered subsets v > 4 kB, v > 8 kB, v < 16 kB, v < 32 kB, and 4 kB < v < 32 kB. Left: explanatory power (% variance, 45 to 90). Right: accuracy of loadings (squared error loss, 0 to 0.5)]
Explanatory power increases with increasing offered load
Squared error loss decreases with increasing offered load
Filtering out small and large flows (4 kB < v < 32 kB) has significant benefits
Compromise between statistical accuracy and reliability of inference!
25
Interaction of coupled traffic
Consider a “linear” network to evaluate the effect of interactions of coupled network traffic
Can throughputs of two flow classes that do not share a link be correlated due to interactions through another flow class?
Results of fluid simulations show that degree of correlation between throughputs of classes not sharing a link is negligible
[Figure: "linear" network with file servers 1, 2, and 3 and three 10 Mbps LANs with 10 workstations each]
26
Interaction of coupled traffic: an example
Consider the "linear" network below, in which background traffic utilizes 20% of the bottleneck links
Discard flows with sizes < 4 kB or > 32 kB; based on 2 significant factors, determine the factor loadings
Rotated factor loading estimates: rows correspond to classes, columns correspond to shared links
[Figure: linear network with file servers 1, 2, and 3 and three 10 Mbps LANs with 10 workstations each; the bottleneck links are 80% utilized, with offered class loads of 40%, 40%, and 20% plus background traffic]
27
Wireless LANs
802.11b wireless LANs with 20 users
Differentiate between two cases in which poor throughput performance (40 kbps) is being reported:
Underprovisioned backhaul link: the 1 Mbps backhaul link is underprovisioned for the traffic generated by wireless users (stations operate at 11 Mbps)
Poor signal strength: the access point's location is not optimal with respect to users (stations operate at 1 Mbps)
Discard flows with sizes < 4 kB or > 32 kB and correlate the throughputs of 4 users; the eigenvalues are
Underprovisioned backhaul link: {3.0254, 0.6139, 0.2066, 0.1541}
Poor signal strength: {1.2571, 0.9530, 0.9416, 0.8484}
[Figure: the two scenarios, each with a file server reached through the access point over a 1 Mbps or 11 Mbps backhaul link]
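Applying Kaiser's rule to the two eigenvalue sets above illustrates the distinction; a short sketch:

```python
def significant_factors(eigvals, threshold=1.0):
    """Kaiser's rule: eigenvalues above 1 (the variance of one normalized
    variable) indicate a common factor; also report the fraction of total
    variance explained by the largest eigenvalue."""
    m = sum(1 for e in eigvals if e > threshold)
    top_share = max(eigvals) / sum(eigvals)
    return m, top_share

backhaul = [3.0254, 0.6139, 0.2066, 0.1541]   # underprovisioned backhaul link
signal   = [1.2571, 0.9530, 0.9416, 0.8484]   # poor signal strength

m1, s1 = significant_factors(backhaul)
m2, s2 = significant_factors(signal)
print(m1, round(s1, 2))  # one dominant factor explaining ~76% of variance
print(m2, round(s2, 2))  # largest eigenvalue barely above 1: ~31%, no strong shared bottleneck
```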
28
Discussion of wireless LAN results
Consider bottlenecks with capacity 1 Mbps: M active users, each having N_i active flows; M is almost constant (has low variance); the total number of active flows is N = N_1 + N_2 + ... + N_M
Underprovisioned backhaul link (per-flow allocation): the bandwidth allocated to each flow is 1/N of the resource capacity, so there is one common source of variability
Access point (per-user scheduling): the bandwidth allocated to each of user i's flows is 1/(M N_i) of the resource capacity, so each user has its own source of variability
[Figure: users 1 through M behind a 1 Mbps backhaul link, and users 1 through M sharing a 1 Mbps access point]
29
Summary of methodology
[Figure: methodology pipeline for network tomography: flow filtering, conditional sampling, exploratory factor analysis, and the bootstrap]
30
The bootstrap
Validation with real data is extremely difficult! Unlike controlled simulations, we do not know the routing information
We would like to be able to make inferential statements:
Estimate 95% confidence intervals for eigenvalues and loadings
Modify Kaiser's rule for selecting significant eigenvalues
The bootstrap, a computer-based method, can be used to compute confidence intervals [Efron and Tibshirani, 1993]
From the data at hand, construct the empirical distribution F̂_n and generate many realizations
No distributional assumptions on the data are required
Applicable to any statistic s(X), simple or complicated
[Figure: B independent bootstrap replications; resamples X*1, X*2, ..., X*B, each of size n, are drawn from F̂_n, yielding replications s(X*1), s(X*2), ..., s(X*B)]
Contribution #4
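A sketch of the percentile bootstrap for a generic statistic; the data and statistic here are toy stand-ins for the eigenvalues and loadings of the throughput correlation matrix:

```python
import random
import statistics

def bootstrap_ci(data, stat, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the data with replacement B times,
    recompute the statistic on each replication, and take the empirical
    alpha/2 and 1 - alpha/2 quantiles as the confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(B))
    lo = reps[int((alpha / 2) * B)]
    hi = reps[int((1 - alpha / 2) * B) - 1]
    return lo, hi

# Toy data; no distributional assumptions are needed.
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 4.7]
lo, hi = bootstrap_ci(data, statistics.mean)
print(lo <= statistics.mean(data) <= hi)  # True
```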
31
Real data: preprocessing
Two NetFlow datasets from UT Austin’s border router
Assume that traffic is stationary over one-hour periods
Choose two incoming flow classes that are very likely to experience congestion at the server:
Select IP addresses associated with AOL and HotMail
Divide each class into two: AOL1, AOL2 and HotMail1, HotMail2
Filter flow records based on
Packets: discard flows consisting of only 1 packet
Duration: discard flows with duration shorter than 1 second
Size: discard flows with sizes < 8 kB or > 64 kB

Dataset        Collection date   Period           TCP records
Dataset2002    11/06/2002        12:58-2:07 PM    5,173,385
Dataset2004    01/21/2004        12:58-1:26 PM    4,440,697
32
Real data: eigenvalues
Parent class (AOL and HotMail) throughput correlation is -0.07 for Dataset2002 and 0.05 for Dataset2004
95% bootstrap confidence intervals for the eigenvalues of the throughput correlation matrix of the 4 classes AOL1, AOL2, HotMail1, and HotMail2 are given below
2 significant factors, with explanatory power of 72% for Dataset2002 and 63% for Dataset2004

Eigenvalue   Dataset2002 95% confidence interval   Dataset2004 95% confidence interval
1            (1.5457, 1.7900)                      (1.3646, 1.4786)
2            (1.0861, 1.3206)                      (1.0237, 1.1603)
3            (0.7058, 0.9150)                      (0.8230, 0.9690)
4            (0.2194, 0.4458)                      (0.5413, 0.6379)
33
Real data: factor loadings
Based on the 2 significant factors, determine the factor loadings
Rotated factor loading estimates: rows correspond to the classes AOL1, AOL2, HotMail1, and HotMail2; columns correspond to shared infrastructure
Estimate 95% bootstrap confidence intervals for the loadings to establish accuracy
With 95% confidence, we can identify which flow classes share infrastructure!
[Figure: rotated loading matrices for Dataset2002 and Dataset2004]
34
Outline
Introduction: Background and motivation; Overview of contributions
Methodology for inferring network resource sharing: Conditional sampling; Flow filtering; Dimensionality reduction
Validation: Simulation studies; Application to real data with the bootstrap
Conclusion: Summary; Future work
35
Methodology for inferring resource sharing
1. Define the flow classes of interest, C
2. Set flow filtering thresholds for packets, duration, and size
3. Determine flows F that satisfy the filtering criteria
4. Compute flow class throughputs at discretized times
5. Through conditional sampling, estimate pairwise correlations
6. Find number of factors m using eigenvalues of the correlation matrix and modified Kaiser's rule
7. Perform exploratory factor analysis based on m factors
8. Rotate factor loadings using varimax rotation
9. Determine which flow classes have the largest loading on a given factor: Inference of shared congested resources
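Steps 4-9 above can be sketched end to end as follows (a simplified illustration using NumPy: varimax rotation is skipped, the plain Kaiser rule stands in for the modified version, and the throughput paths are synthetic):

```python
import numpy as np

def cond_corr(x, y):
    """Step 5: pairwise correlation conditioned on both classes being active
    (NaN marks times when a class is inactive)."""
    m = ~np.isnan(x) & ~np.isnan(y)
    return float(np.corrcoef(x[m], y[m])[0, 1])

def infer_sharing(paths):
    """Steps 5-9 (varimax omitted): correlation matrix -> Kaiser's rule ->
    loadings -> group classes by the factor with the largest loading."""
    names = list(paths)
    p = len(names)
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = cond_corr(paths[names[i]], paths[names[j]])
    lam, xi = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]
    lam, xi = lam[order], xi[:, order]
    m = max(1, int(np.sum(lam > 1.0)))        # step 6: Kaiser's rule
    L = xi[:, :m] * np.sqrt(lam[:m])          # step 7: factor loadings
    groups = {}
    for k, name in enumerate(names):          # step 9: largest loading per row
        groups.setdefault(int(np.argmax(np.abs(L[k]))), []).append(name)
    return groups

# Synthetic throughput paths: c1/c2 driven by one congestion signal,
# c3/c4 by another; c1 is inactive for its first 20 samples.
rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=200), rng.normal(size=200)
paths = {"c1": s1 + 0.1 * rng.normal(size=200),
         "c2": s1 + 0.1 * rng.normal(size=200),
         "c3": s2 + 0.8 * rng.normal(size=200),
         "c4": s2 + 0.8 * rng.normal(size=200)}
paths["c1"][:20] = np.nan
groups = infer_sharing(paths)
print(groups)  # c1 grouped with c2, c3 grouped with c4
```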
36
Impact of research
Application of a structural analysis technique, factor analysis, to explore network properties
Methodology for inferring resource sharing
Use of bootstrap methods to make inferential statements about resource sharing
Possible applications:
Network monitoring and root cause analysis of poor performance
Problem diagnosis and off-line evaluation of the congestion status of networks
Route configuration by service providers
Configuration and placement of access points in wireless LANs
Development of new network service charging schemes
37
Future work
An active measurement approach
Probe packets have been used in previous network research
Propose "probe flows" for on-demand inference, control of temporal overlaps, and sending "right-sized" flows
Key question: How many probe flows are required for reliable inference?
Wireless networks
Investigate the possibility of clustering wireless users experiencing "similar network conditions" based only on flow measurements
Explore the applicability to optimal access point and/or backhaul link configuration more extensively
Validation with more extensive datasets
Use flow records from major Internet service providers, possibly accompanied by routing information
38
Outline
Introduction: Background and motivation; Overview of contributions
Methodology for inferring network resource sharing: Conditional sampling; Flow filtering; Dimensionality reduction
Validation: Simulation studies; Application to real data with the bootstrap
Conclusion: Summary; Future work
39
Publications related to dissertation
Journal
D. Arifler, G. de Veciana, and B. L. Evans, "Network tomography based on flow level measurements," IEEE/ACM Trans. on Networking, submitted Feb. 2004.
Conferences
D. Arifler, G. de Veciana, and B. L. Evans, "Network tomography based on flow level measurements," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, May 2004, to appear.
D. Arifler, G. de Veciana, and B. L. Evans, "Inferring path sharing based on flow level TCP measurements," in Proc. IEEE Int. Conf. on Communications, June 2004, to appear.
40
Other publications
Self-similarity
D. Arifler and B. L. Evans, "Modeling the self-similar behavior of packetized MPEG-4 video using wavelet-based methods," in Proc. Int. Conf. on Image Processing, Sep. 2002.
Measurement-based network traffic analysis
S. Li, S. Park, and D. Arifler, "SMAQ: A measurement-based tool for traffic modeling and queueing analysis. Part I: Design methodologies and software architecture," IEEE Communications Magazine, vol. 36, no. 8, pp. 56-65, Aug. 1998.
S. Li, S. Park, and D. Arifler, "SMAQ: A measurement-based tool for traffic modeling and queueing analysis. Part II: Network applications," IEEE Communications Magazine, vol. 36, no. 8, pp. 66-77, Aug. 1998.