Network Flow Analysis
Mark Meiss
Presentation for NaN-Group
October 4, 2004
Overview
Data description
– The Internet2 (Abilene) data network
– Netflow traffic data
Data collection
Data analysis
– Techniques
– Preliminary results
Future work
What is Abilene?
Internet2 (Abilene) is a nationwide high-speed data network for research and higher education.
– Network backbone runs at 10 Gbps
– Over 220 member institutions
– Peers with over 40 other research networks
Abilene uses the same protocols as Internet1 but only carries academic traffic.
– This is like the old NSFnet or vBNS
Why is Abilene Interesting?
The Abilene network is a transit network.
– It includes both international and domestic traffic.
– It offers a good view of server networks.
– Commercial transit networks do not share traffic data.
The Abilene network is uncongested.
– Statistics will not be biased by packet loss.
The Abilene network contains students.
– Students are unconcerned about niceties of law.
– There is a lot of peer-to-peer and “grey” traffic.
What is “Netflow”?
In the early 1990s, Cisco introduced a new network router architecture.
The “line cards” in their new routers contained a hardware hash table for current network connections.
Somebody got the bright idea of sending entries from the table onto the network before clearing them from the hash table.
What is a Network Flow?
A network flow consists of one or more packets sent from a source (IP, port) to a destination (IP, port) using a certain transport protocol during some time interval.
Example:
Source: 156.56.103.1, port 80
Dest.: 149.159.250.21, port 6132
Protocol: TCP
Packets: 20
The above network flow would be typical for a Web connection.
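
The definition above can be written down as a small data structure. This is only a sketch: the names `FlowKey`, `Flow`, and `reverse` are made up for illustration and are not part of Netflow itself. It encodes the example flow from the slide and shows how to form the key of the opposite direction.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class FlowKey:
    """The 5-tuple that identifies a flow (hypothetical name)."""
    src_ip: IPv4Address
    src_port: int
    dst_ip: IPv4Address
    dst_port: int
    protocol: str


@dataclass
class Flow:
    """A flow key plus the packet count observed for it."""
    key: FlowKey
    packets: int


def reverse(key: FlowKey) -> FlowKey:
    """Return the key of the flow going in the opposite direction."""
    return FlowKey(key.dst_ip, key.dst_port, key.src_ip, key.src_port, key.protocol)


# The example flow from the slide: a Web server answering a client.
example = Flow(
    key=FlowKey(
        IPv4Address("156.56.103.1"), 80,
        IPv4Address("149.159.250.21"), 6132,
        "TCP",
    ),
    packets=20,
)
```

Because `FlowKey` is frozen, it is hashable and can be used directly as a dictionary key when counting flows.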
Wait a Minute!
Don’t TCP connections involve two-way communication?
– Yes, so every TCP connection is actually two flows from the point of view of Netflow.
UDP and ICMP are stateless, so how can they be aggregated into flows?
– We assume that packets with matching 5-tuples during some period of time are part of the same flow.
Isn’t it hard for a router to keep up with this?
– Yes, so most modern routers sample the flow data at a ratio of about 100:1.
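
The 5-tuple rule above can be sketched in a few lines. This is an illustration, not the router's actual algorithm: the `Packet` tuple, the `aggregate` function, and the 30-second idle timeout are all assumptions chosen for the example. Packets are assumed to arrive in time order.

```python
from collections import namedtuple

# Hypothetical packet summary: a timestamp in seconds plus the 5-tuple.
Packet = namedtuple("Packet", "time src_ip src_port dst_ip dst_port proto")


def aggregate(packets, idle_timeout=30.0):
    """Group time-ordered packets into flows by matching 5-tuple.

    A packet that arrives more than idle_timeout seconds after the
    previous packet with the same 5-tuple starts a new flow.
    """
    active = {}    # 5-tuple -> flow currently being built
    finished = []  # flows that were closed by the timeout
    for p in packets:
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        flow = active.get(key)
        if flow is not None and p.time - flow["end"] > idle_timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time, "packets": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
    # Whatever is still active at the end of the input is also reported.
    finished.extend(active.values())
    return finished
```

Note that the two directions of one TCP connection have different 5-tuples, so they come out as two separate flows, as described above.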
How is Netflow transmitted?
Most modern routers support the “Netflow v5” format for representing flows.
– This includes a variety of additional information about each flow.
The router uses UDP to send packets containing between 1 and 30 flow records to a management workstation.
– (In this case, the management workstation is sitting on my desk.)
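Netflow-v5 Header Format
(fields in wire order, 24 bytes total)
version number (2 bytes) | # of flows in packet (2 bytes)
router uptime (ms) (4 bytes)
export time (sec. since 1970-01-01 00:00:00 UTC) (4 bytes)
export time (ns) (4 bytes)
sequence number (4 bytes)
engine type (1 byte) | engine ID (1 byte) | [padding] (2 bytes)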
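Netflow-v5 Flow Record Format
(fields in wire order, 48 bytes total)
source IP address (4 bytes)
destination IP address (4 bytes)
IP address of next-hop router (4 bytes)
SNMP ifIndex (in) (2 bytes) | SNMP ifIndex (out) (2 bytes)
total number of packets (4 bytes)
total number of octets (4 bytes)
router uptime at start of flow (ms) (4 bytes)
router uptime at end of flow (ms) (4 bytes)
source port (2 bytes) | destination port (2 bytes)
[padding] (1 byte) | TCP flags (1 byte) | protocol (1 byte) | ToS (1 byte)
source AS (2 bytes) | destination AS (2 bytes)
source mask (1 byte) | dest. mask (1 byte) | [padding] (2 bytes)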
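
A minimal sketch of decoding one Netflow v5 export packet with Python's `struct` module, assuming the standard v5 layout (24-byte header, 48-byte records, network byte order). The function name `parse_packet` and the dictionary keys are made up for illustration; this is not the collector used for the talk.

```python
import struct
from ipaddress import IPv4Address

# "!" selects network byte order with no alignment padding.
HEADER = struct.Struct("!HHIIIIBBH")                  # 24 bytes
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")   # 48 bytes


def parse_packet(data: bytes):
    """Decode one export packet into (header dict, list of flow dicts)."""
    (version, count, uptime_ms, unix_secs, unix_nsecs,
     sequence, engine_type, engine_id, _pad) = HEADER.unpack_from(data, 0)
    if version != 5:
        raise ValueError(f"not a v5 packet (version {version})")
    if not 1 <= count <= 30:
        raise ValueError(f"implausible record count {count}")
    if len(data) < HEADER.size + count * RECORD.size:
        raise ValueError("packet is truncated")

    header = {
        "count": count, "uptime_ms": uptime_ms, "unix_secs": unix_secs,
        "unix_nsecs": unix_nsecs, "sequence": sequence,
        "engine_type": engine_type, "engine_id": engine_id,
    }
    flows = []
    for i in range(count):
        (src, dst, nexthop, if_in, if_out, packets, octets,
         first_ms, last_ms, sport, dport, tcp_flags, proto, tos,
         src_as, dst_as, src_mask, dst_mask) = RECORD.unpack_from(
            data, HEADER.size + i * RECORD.size)
        flows.append({
            "src_ip": IPv4Address(src), "dst_ip": IPv4Address(dst),
            "nexthop": IPv4Address(nexthop),
            "if_in": if_in, "if_out": if_out,
            "packets": packets, "octets": octets,
            "first_ms": first_ms, "last_ms": last_ms,
            "src_port": sport, "dst_port": dport,
            "tcp_flags": tcp_flags, "protocol": proto, "tos": tos,
            "src_as": src_as, "dst_as": dst_as,
            "src_mask": src_mask, "dst_mask": dst_mask,
        })
    return header, flows
```

In practice `data` would be the payload of one UDP datagram received from the router.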
How Much Data is There?
The Abilene routers generate between 700,000,000 and 800,000,000 flows per day.
– At 48 bytes per record, that amounts to around 35 GB of data.
– Flows come in at a rate of about 3.4 Mbps.
– Data compresses at a ratio of about 2.8:1.
Most existing tools can’t handle this volume of data.
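
The figures above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes every export packet carries the full 30 records and counts only the Netflow header and records, not the UDP/IP headers; the function name `daily_volume` is made up for illustration.

```python
RECORD_BYTES = 48         # one v5 flow record
HEADER_BYTES = 24         # one v5 packet header
RECORDS_PER_PACKET = 30   # assumption: every export packet is full
SECONDS_PER_DAY = 86_400
COMPRESSION_RATIO = 2.8   # from the slide


def daily_volume(flows_per_day):
    """Return (bytes of flow records per day, average export rate in Mbps)."""
    record_bytes = flows_per_day * RECORD_BYTES
    header_bytes = flows_per_day / RECORDS_PER_PACKET * HEADER_BYTES
    mbps = (record_bytes + header_bytes) * 8 / SECONDS_PER_DAY / 1e6
    return record_bytes, mbps


for flows in (700_000_000, 750_000_000, 800_000_000):
    record_bytes, mbps = daily_volume(flows)
    print(f"{flows:,} flows/day: {record_bytes / 1e9:.1f} GB raw, "
          f"{record_bytes / COMPRESSION_RATIO / 1e9:.1f} GB compressed, "
          f"{mbps:.2f} Mbps")
```

At the midpoint of 750 million flows per day this gives 36 GB of records and a little under 3.4 Mbps, consistent with the numbers on the slide.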
What’s the Motivation?
Okay, so I’m storing egregious amounts of data and making my hard drive whimper… what for?
Flow Data as a Behavioral Network
Think of a single flow as defining an edge from a source node to a destination node.
The resulting network describes the Internet as it’s actually being used.
– Many possible biases are eliminated.
– A lot of dynamic information is included.
Most structural analysis of the Internet has (necessarily) focused on its physical structure.
Imagine a Google based on data about where people actually go!
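
A minimal sketch of the flows-as-edges idea, using only the standard library. Flows are assumed to be reduced to `(source IP, destination IP, octets)` triples; the function names `build_graph` and `degrees` are made up for illustration, and the addresses other than those from the earlier example are invented.

```python
from collections import defaultdict


def build_graph(flows):
    """Collapse flows into weighted directed edges.

    flows: iterable of (src_ip, dst_ip, octets).
    Returns {(src_ip, dst_ip): total octets}.
    """
    edges = defaultdict(int)
    for src, dst, octets in flows:
        edges[(src, dst)] += octets
    return dict(edges)


def degrees(edges):
    """Number of distinct neighbours per node, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return {node: len(peers) for node, peers in neighbours.items()}


flows = [
    ("156.56.103.1", "149.159.250.21", 12_000),
    ("156.56.103.1", "149.159.250.21", 3_000),   # same edge, more octets
    ("156.56.103.1", "10.0.0.9", 500),           # hypothetical third host
]
edges = build_graph(flows)
node_degrees = degrees(edges)
```

Repeated flows between the same pair of hosts add weight to one edge rather than creating parallel edges.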
Behavioral Anomaly Detection
My main interest is in recognizing different types of behavior based on flow data.
– Can I determine whether a port is running a peer-to-peer application?
– Can I see the spread of a new worm across the network?
– Can I determine what kind of behavior is the prelude to an attack?
– Can I find new peer-to-peer applications before the word is out?
Preliminary Results
I wish this section had more, but I’m really just getting off the ground…
The size of data has been a major challenge.
– The network formed by a day of flow data has about 29.7 million nodes and 128 million edges.
– Just finding a way of converting a set of captured flows to a sparse matrix representation has been difficult.
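
One possible way to turn captured flows into a sparse matrix is sketched below: map each address to an integer index, sum octets per (row, column) pair, and lay the result out in compressed sparse row (CSR) form. This is an illustration in plain Python, not the approach actually used for the data above, and at 128 million edges it would need a far more memory-efficient implementation. The name `flows_to_csr` is made up.

```python
def flows_to_csr(flows):
    """Convert (src, dst, octets) triples to a CSR adjacency matrix.

    Returns (index, indptr, indices, data), where index maps each node
    to its row/column number and row i occupies
    indices[indptr[i]:indptr[i + 1]] and data[indptr[i]:indptr[i + 1]].
    """
    index = {}    # node -> integer id, assigned in order of first appearance
    weights = {}  # (row, col) -> summed octets
    for src, dst, octets in flows:
        i = index.setdefault(src, len(index))
        j = index.setdefault(dst, len(index))
        weights[(i, j)] = weights.get((i, j), 0) + octets

    n = len(index)
    indptr = [0] * (n + 1)
    for i, _j in weights:
        indptr[i + 1] += 1          # count entries per row
    for i in range(n):
        indptr[i + 1] += indptr[i]  # prefix sum gives row start offsets

    indices = [0] * len(weights)
    data = [0] * len(weights)
    cursor = indptr[:-1]            # next free slot in each row (a copy)
    for (i, j), w in sorted(weights.items()):
        indices[cursor[i]] = j
        data[cursor[i]] = w
        cursor[i] += 1
    return index, indptr, indices, data
```

The three returned arrays are the same ones that sparse-matrix libraries typically accept when constructing a CSR matrix.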
Degree Distribution
Determining Clients and Servers
Every network connection involves two hosts:
– The client is the system that initiates the connection.
– The server is the system that accepts the connection.
Because of sampling, we’re as likely to see the client-to-server side as the server-to-client side.
– This makes the direction basically meaningless.
We can guess which is which using the port information.
– The more common port number indicates the server.
– The less common port number indicates the client.
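
The port heuristic above can be sketched as follows. One assumption is made here: "more common" is taken to mean "appears in more flows across the whole data set". The function name `guess_roles` and the `10.0.0.x` addresses are made up for illustration.

```python
from collections import Counter


def guess_roles(flows):
    """Guess (server_ip, client_ip) for each flow from port popularity.

    flows: list of (src_ip, src_port, dst_ip, dst_port).
    Returns one (server_ip, client_ip) tuple per flow, or None when both
    ports are equally common and no guess can be made.
    """
    port_counts = Counter()
    for _src, sport, _dst, dport in flows:
        port_counts[sport] += 1
        port_counts[dport] += 1

    roles = []
    for src, sport, dst, dport in flows:
        if port_counts[sport] > port_counts[dport]:
            roles.append((src, dst))   # source side uses the common port
        elif port_counts[sport] < port_counts[dport]:
            roles.append((dst, src))   # destination side uses the common port
        else:
            roles.append(None)         # tie: direction stays unknown
    return roles


flows = [
    ("156.56.103.1", 80, "149.159.250.21", 6132),
    ("10.0.0.5", 40001, "156.56.103.1", 80),
    ("156.56.103.1", 80, "10.0.0.9", 51000),
]
roles = guess_roles(flows)
```

Here port 80 appears in all three flows, so the host using it is labeled the server regardless of which direction was sampled.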
Strength Distribution
This is the distribution of the total number of octets in and out of each node.
Special problem for client/server version of the network
– If we direct all flows from server to client, what do we do when we only have a volume for the opposite direction?
– For now, I treat the network as being undirected for studying strength.
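
With the network treated as undirected, the strength of a node is simply the sum of octets over all flows that touch it. A minimal sketch, assuming flows are `(src, dst, octets)` triples and using the made-up function name `strength`:

```python
from collections import defaultdict


def strength(flows):
    """Total octets in and out of each node, ignoring flow direction."""
    totals = defaultdict(int)
    for src, dst, octets in flows:
        totals[src] += octets
        if dst != src:          # avoid counting a self-loop twice
            totals[dst] += octets
    return dict(totals)


node_strength = strength([
    ("a", "b", 100),
    ("b", "a", 40),
    ("a", "c", 5),
])
```

The strength distribution is then the histogram of these per-node totals.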
AS Numbers
An “autonomous system” is the basic building block of the Internet.
– An AS is responsible for its own interior routing.
– An AS is usually a large organization.
For example, IU has its own AS, as does AT&T.
Top 10 ASes on Internet2
By degree
1. Hotmail
2. Microsoft
3. Microsoft-Europe
4. North Carolina (NCREN)
5. Michigan (MERIT)
6. University of Washington
7. MIT
8. UC-Berkeley
9. UMass
10. China (CERNET)
By strength
1. Abilene
2. University of Oregon
3. Hotmail
4. Microsoft
5. North Carolina (NCREN)
6. UCSD
7. UCLA
8. Michigan (MERIT)
9. University of Washington
10. UMass
TCP Ports
Top 10 TCP Ports on Internet2
By degree
1. Web
2. Gnutella
3. MS Messenger
4. SQL Server
5. Web (Encrypted)
6. Gnutella
7. Mail
8. Web Tunneling (8082)
9. BitTorrent
10. Usenet
By strength
1. Web
2. iperf
3. iperf
4. Usenet
5. RTP (Streaming)
6. iperf
7. SSH
8. BitTorrent
9. Port 388 ?!?
10. FTP
Where Do I Go Next?
Start to look at the dynamics of the network.
Focus on individual ports.
Examine clustering coefficients.
Attempt to filter out spoofed traffic.
Consider the server-only and client-only networks.
– This will involve treating flows as edges in a bipartite graph.
Cluster nodes, ASes, and ports.
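
One way to read the bipartite idea above is sketched here, as an assumption rather than the planned method: treat each flow as an edge between a client and a server, then build a server-only network by linking two servers whenever they share a client. The function name `server_projection` and the host labels are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def server_projection(pairs):
    """Project a client/server bipartite graph onto the servers.

    pairs: iterable of (client, server) edges.
    Returns {(server_a, server_b): number of shared clients},
    with server_a < server_b.
    """
    servers_of = defaultdict(set)
    for client, server in pairs:
        servers_of[client].add(server)

    shared = defaultdict(int)
    for servers in servers_of.values():
        # Every pair of servers visited by the same client gains one link.
        for a, b in combinations(sorted(servers), 2):
            shared[(a, b)] += 1
    return dict(shared)


projection = server_projection([
    ("c1", "s1"), ("c1", "s2"),
    ("c2", "s1"), ("c2", "s2"),
    ("c3", "s1"),
])
```

Swapping the roles of client and server in the input gives the corresponding client-only network.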
Thank You!
Any thoughts, questions, comments, complaints, or observations are all welcome!