Geography of Internet2 Netflow - Carnegie Mellon University · •Internet2 Network Netflow...

Geography of Internet2 Netflow David A. J. Ripley Indiana University, Center for Applied Cybersecurity Research. Tony H. Grubesic Indiana University, Department of Geography. Timothy C. Matisziw University of Missouri, Department of Geography/Department of Civil and Environmental Engineering.



Geography of Internet2 Netflow

David A. J. Ripley, Indiana University, Center for Applied Cybersecurity Research.

Tony H. Grubesic, Indiana University, Department of Geography.

Timothy C. Matisziw, University of Missouri, Department of Geography/Department of Civil and Environmental Engineering.


• geography

• 1. a. The science which has for its object the description of the earth's surface, treating of its form and physical features, its natural and political divisions, the climate, productions, population, etc., of the various countries.

[Oxford English Dictionary, 2nd Edition, 1989.]


• Historically – spatial and temporal movement of goods, money, people and information.

• The Internet doesn’t move goods or people directly, but it can influence their movement.

• The Internet transports information.

– Information ≈ Money


• Access and Awareness

– What kinds of data are available?

– How are they acquired, from whom?

– What are their strengths and limitations?

• Unfamiliar territory

– Opportunities!


• Interest in Internet traffic volumes

– Backbone traffic levels not interesting enough

– IP-level a bit too specific

• Policy/access difficulties

– Interested in relative connectedness (and volume of traffic) between locations.

• For certain values of “location.”


• Internet2 Network Netflow records:

– Sourced from 9 core routers

– 100:1 Sampling

– /21 Anonymization required

– ASN is not present/reliable

– Includes CPS

• So it’s not just academic traffic.

– Typically approximately 2 billion records per day.
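A minimal sketch of what /21 anonymization does to an address, assuming it means zeroing the low 11 bits of each IPv4 address so that only the 21-bit prefix survives. The function name `anonymize_21` and the example addresses are illustrative, not taken from the Internet2 tooling.

```python
import ipaddress


def anonymize_21(addr: str) -> str:
    """Zero the low 11 bits of an IPv4 address, keeping only its /21 prefix.

    Assumption: this is how the /21 anonymization policy is applied.
    """
    ip = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << 11) & 0xFFFFFFFF  # 21 leading one-bits
    return str(ipaddress.IPv4Address(ip & mask))


if __name__ == "__main__":
    # Two hosts inside the same /21 become indistinguishable.
    print(anonymize_21("10.1.15.200"))
    print(anonymize_21("10.1.9.3"))
```

Any two hosts that share a /21 collapse to the same anonymized address, which is why individual end systems cannot be recovered from the records.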


• Needed some way to identify source and destination locations

– The policy-mandated /21 anonymization presents a problem.

– As does the lack of ASN

• How much of a problem is the anonymization?


• IP Anonymization – how much of a problem is it?

– Surprisingly, not quite as much as we thought.

– We’re not interested in specific hosts.

• 12,000 prefixes announced on I2 Network;

– 36% are 21 bits or shorter

– Assuming constant utilization*, that’s 98% of addresses unambiguously identified to the level of an AS.
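The gap between 36% of prefixes and 98% of addresses follows from prefix sizes: an IPv4 prefix of length p covers 2^(32-p) addresses, so short prefixes dominate the address count. The sketch below shows the arithmetic on a made-up list of prefix lengths (not the actual I2 routing table), assuming non-overlapping prefixes and uniform utilization.

```python
def coverage(prefix_lengths, cutoff=21):
    """Return (fraction of prefixes, fraction of addresses) at or below cutoff.

    Assumptions: IPv4, non-overlapping prefixes, every address equally used.
    """
    short = [p for p in prefix_lengths if p <= cutoff]
    frac_prefixes = len(short) / len(prefix_lengths)
    total_addrs = sum(2 ** (32 - p) for p in prefix_lengths)
    frac_addrs = sum(2 ** (32 - p) for p in short) / total_addrs
    return frac_prefixes, frac_addrs


if __name__ == "__main__":
    # Toy data only: one /16, one /19 and three /24s.
    toy_lengths = [16, 19, 24, 24, 24]
    by_prefix, by_address = coverage(toy_lengths)
    print(f"{by_prefix:.0%} of prefixes, {by_address:.1%} of addresses")
```

In this toy table, two of five prefixes are /21 or shorter, yet they hold nearly all of the address space; the same effect explains the figures on the slide.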


• Admittedly, there are some assumptions that definitely need to be tested.

• But still not as bad as we initially thought it might be.


• ASN is a reasonable identifier for our purposes.

– It may be a very poor way to determine location.

• Local vs. national organizations; IU vs. Comcast.

– But it gives us a location for the flow of information/money/whatever

• Which may not have anything to do with the location of the end systems.


• Scripted WHOIS lookups

– Registering entity’s name and postal address.

• Geocoding

– Most GIS packages will do this, with greater precision than we were interested in.

– Centroid of ZIP code areas.
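One way the lookup and geocoding steps could be scripted is sketched below. It is not the authors' script. It assumes ARIN-style `Key: value` WHOIS output with `OrgName` and `PostalCode` fields, and a caller-supplied table of ZIP centroids; the sample record and coordinates are placeholders.

```python
import socket


def whois_query(query: str, server: str = "whois.arin.net", port: int = 43) -> str:
    """Send one WHOIS query (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_registrant(text: str) -> dict:
    """Extract the registering entity's name and postal code.

    Assumption: the reply uses 'OrgName:' and 'PostalCode:' lines.
    """
    wanted = {"OrgName": "name", "PostalCode": "zip"}
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        field = wanted.get(key.strip())
        if sep and field and field not in out:
            out[field] = value.strip()
    return out


def geocode_zip(zip_code: str, centroids: dict):
    """Look up the centroid (lat, lon) of a 5-digit ZIP area, or None."""
    return centroids.get(zip_code[:5])


if __name__ == "__main__":
    # Placeholder record and centroid, for illustration only.
    sample = "OrgName:        Example University\nPostalCode:     12345-6789\n"
    record = parse_registrant(sample)
    print(record, geocode_zip(record["zip"], {"12345": (40.0, -86.0)}))
```

Geocoding to a ZIP centroid is deliberately coarse here, matching the level of precision the slide describes.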


• 24 hour period

• ~2 billion flow records

• 175,000 unique sIPs, 232,000 unique dIPs

• 29,709 ZIP code areas.

• Aggregate traffic hourly by sZIP/dZIP pair.
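The hourly aggregation can be sketched as a single pass over the flow records. The record layout `(timestamp, source prefix, destination prefix, bytes)` and the `prefix_to_zip` mapping are assumptions for illustration; real NetFlow records carry more fields.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_hourly(flows, prefix_to_zip):
    """Sum bytes per (hour, source ZIP, destination ZIP).

    flows: iterable of (unix_timestamp, src_prefix, dst_prefix, nbytes).
    prefix_to_zip: dict mapping an anonymized prefix to a ZIP code.
    Flows whose endpoints cannot be mapped to a ZIP are skipped.
    """
    totals = defaultdict(int)
    for ts, src, dst, nbytes in flows:
        szip = prefix_to_zip.get(src)
        dzip = prefix_to_zip.get(dst)
        if szip is None or dzip is None:
            continue
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        totals[(hour, szip, dzip)] += nbytes
    return dict(totals)


if __name__ == "__main__":
    # Toy records and ZIP codes, for illustration only.
    zips = {"10.1.8.0": "11111", "10.2.0.0": "22222"}
    flows = [(0, "10.1.8.0", "10.2.0.0", 100), (1800, "10.1.8.0", "10.2.0.0", 50)]
    print(aggregate_hourly(flows, zips))
```

Because the records are sampled at 100:1, byte totals produced this way are estimates of relative volume rather than exact counts.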


• Capturing Connectivity

– Degree of a node – a simple measure of connectivity: the number of connections from that node to other nodes.

– Weight with traffic volume?

– Calculate for both ingress and egress

• Although there are issues here with NetFlow
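The connectivity measures above can be sketched as follows: plain in- and out-degree per node, plus the same counts weighted by traffic volume. The input format (a dict of `(source, destination)` pairs to byte totals) is an assumption, and the node labels are placeholders.

```python
from collections import defaultdict


def degrees(pair_bytes):
    """Compute ingress/egress degree and traffic-weighted degree per node.

    pair_bytes: dict mapping (src, dst) to total bytes sent from src to dst.
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    out_bytes, in_bytes = defaultdict(int), defaultdict(int)
    for (src, dst), nbytes in pair_bytes.items():
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
        out_bytes[src] += nbytes
        in_bytes[dst] += nbytes
    nodes = set(out_nbrs) | set(in_nbrs)
    return {
        n: {
            "out_degree": len(out_nbrs.get(n, ())),  # distinct destinations
            "in_degree": len(in_nbrs.get(n, ())),    # distinct sources
            "out_bytes": out_bytes.get(n, 0),        # egress volume
            "in_bytes": in_bytes.get(n, 0),          # ingress volume
        }
        for n in nodes
    }


if __name__ == "__main__":
    # Toy traffic matrix, for illustration only.
    toy = {("A", "B"): 10, ("A", "C"): 5, ("B", "A"): 2}
    for node, stats in sorted(degrees(toy).items()):
        print(node, stats)
```

Keeping ingress and egress separate matters because NetFlow records are unidirectional, so traffic from A to B says nothing by itself about traffic from B to A.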


• Repeatable methodology

– Approximate endpoints by… location/corporate entity.

– Not ideal, but maybe good enough

– Working with what’s available

• Novel research community & direction

– Additional efforts in the pipeline


• Questions, comments?
