

Measuring Internet Speed: Current Challenges and Future Recommendations

Nick Feamster, Princeton University

Jason Livingood, Comcast

Abstract

Government organizations, regulators, consumers, Internet service providers, and application providers alike all have an interest in measuring user Internet “speed”. Access speeds have increased by an order of magnitude in past years, with gigabit speeds available to tens of millions of homes. Approaches must evolve to accurately reflect the changing user experience and network speeds. This paper offers historical and technical background on current speed testing methods, highlights their limitations as access network speeds continue to increase, and offers recommendations for the next generation of Internet “speed” measurement.

1 Introduction

Various governmental organizations have begun to rely on so-called “Internet speed tests” to measure broadband Internet speed. Examples of these programs include the Federal Communications Commission’s “Measuring Broadband America” program [7], California’s CALSPEED program [4], the United Kingdom’s Home Broadband Performance Program [23], and various other initiatives in states including Minnesota [18], New York [19–21], and Pennsylvania [26]. These programs have various goals, ranging from assessing whether ISPs are delivering on advertised speeds to assessing potentially underserved rural areas that could benefit from broadband infrastructure investments.

The accuracy of measurement is critical to these assessments, as measurements can inform everything from investment decisions to policy actions and even litigation. Unfortunately, these efforts sometimes rely on outmoded technology, making the resulting data unreliable or misleading. This paper describes the current state of speed testing tools, outlines their limitations, and explores paths forward to better inform the various technical and policy ambitions and outcomes.

Some current speed test tools were well suited to measuring access link capacity a decade ago but are no longer useful, because they made a design assumption that the Internet Service Provider (ISP) last-mile access network was the most constrained (bottleneck) link. Because new technologies have significantly increased Internet access speeds, this is no longer a good assumption. Ten years ago, a typical ISP in the United States may have delivered tens of megabits per second (Mbps). Today, speeds ten times faster (hundreds of Mbps) are common, and gigabit speeds are available to tens of millions of homes. The performance bottleneck has often shifted from the ISP access network to a user’s device, home WiFi network, network interconnections, speed testing infrastructure, and other areas.

A wide range of factors can influence the results of an Internet speed test, including: user-related considerations, such as the age of the device; wide-area network considerations, such as interconnect capacity; test-infrastructure considerations, such as test server capacity; and test design, such as whether the test runs while the user’s access link is otherwise in use. Additionally, the typical web browser opens multiple parallel connections between an end user and servers on increasingly localized content delivery networks (CDNs), reflecting an evolution of applications that ultimately affects the user experience.

These developments suggest the need to evolve our understanding of the utility of existing Internet speed test tools, and to consider how these tools may need to be redesigned to present a more representative measure of a user’s Internet experience.

2 Background

In this section, we discuss and define key network performance metrics, introduce the general principles of Internet “speed tests”, and explore the basic challenges facing any speed test.

2.1 Performance Metrics

When people talk about Internet “speed”, they are generally talking about throughput. End-to-end Internet performance is typically measured with a collection of metrics—specifically throughput (i.e., “speed”), latency, and packet loss. Figure 1 shows an example speed test from a mobile phone on a home WiFi network. It shows the results of a “native” speed test from the Ookla Android speed test application [?] run in New Jersey, a canonical Internet speed test. This native application reports the user’s ISP, the location of the test server destination, and the following performance metrics:

Throughput is the amount of data that can be transferred between two network endpoints over a given time interval. For example, throughput can be measured between two points in a given ISP’s network, or it can be measured for an end-to-end path, such as between a client device and a server at some other place on the Internet. Typically a speed test measures

arXiv:1905.02334v3 [cs.NI] 31 Oct 2019

Figure 1: Example metrics from an Ookla Speedtest in New Jersey, a canonical Internet speed test.

both downstream (download), from server to client, and upstream (upload), from client to server (Bauer et al. [2] offer an in-depth discussion of throughput metrics). Throughput is not a constant; it changes from minute to minute based on many factors, including what other users are doing on the Internet. Many network performance tests, such as the FCC test [7] and Ookla’s speed test, include additional metrics that reflect the user’s quality of experience.

Latency is the time it takes for a single data packet to travel to a destination. Typically latency is measured in terms of round-trip latency, since measuring one-way latency would require tight time synchronization and the ability to instrument both sides of the Internet path. Latency generally increases with distance, due to factors such as the speed of light for optical network segments; other factors can influence latency, including the amount of queueing or buffering along an end-to-end path, as well as the actual network path that traffic takes from one endpoint to another. TCP throughput is inversely proportional to end-to-end latency [?]; all things being equal, then, a client will see higher throughput to a nearby server than to a distant one.
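The inverse relationship between TCP throughput, RTT, and packet loss is commonly summarized by the Mathis model; this approximation comes from the broader TCP literature, not from this paper:

```latex
% Mathis et al. approximation for steady-state TCP throughput:
% MSS = maximum segment size, RTT = round-trip time, p = packet loss rate
\text{Throughput} \approx \frac{\text{MSS}}{\text{RTT}\,\sqrt{p}}
```

Under this approximation, doubling the RTT roughly halves the achievable throughput of a single connection, which is one reason server proximity matters so much for speed tests.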

Jitter is the variation between two latency measurements. Large jitter values are problematic, particularly for real-time applications such as voice and video calls.

Packet Loss Rate is typically computed as the number of lost packets divided by the number of packets transmitted. Although high packet loss rates generally correspond to worse performance, some amount of packet loss is normal, because a TCP sender typically uses packet loss as the feedback signal to determine the best transmission rate. Many applications, such as video streaming, are designed to adapt well to packet loss without noticeably affecting the end user experience, so there is no single level of packet loss that automatically translates to poor application performance. Additionally, certain network design choices, such as increasing buffer sizes, can reduce packet loss, but at the expense of latency, leading to a condition known as “buffer bloat” [3, 13].

Figure 2: TCP Dynamics.

2.2 Speed Test Principles and Best Practices

Active Measurement. Today’s speed tests are generally referred to as active measurement tests, meaning that they attempt to measure network performance by introducing new traffic into the network (i.e., so-called “probe traffic”). This is in contrast to passive tests, which observe traffic passing over a network interface to infer performance metrics. For speed testing, active measurement is the recognized best practice, but passive measurement can be used to gauge other performance factors, such as latency, packet loss, video quality, and so on.

Measuring the Bottleneck Link. A typical speed test sends traffic that traverses many network links, including the WiFi link inside the user’s home network, the link from the ISP device in the home to the ISP network, and the many network-level hops between the ISP and the speed test server, which is often hosted on a network other than the access ISP. The throughput measurement that results from such a test in fact reflects the capacity of the most constrained link, sometimes referred to as the “bottleneck” link—the link along the end-to-end path that is the limiting factor in end-to-end throughput. If a user has a 1 Gbps connection to the Internet but their home WiFi network is limited to 200 Mbps, then any speed test from a device on the WiFi network to the Internet will not exceed 200 Mbps. Bottlenecks can exist in an ISP access network, in a transit network between a client and server, in the server or server data-center network, or elsewhere. In many cases the bottleneck is located somewhere along the end-to-end path that is not under the ISP’s or user’s direct control.

Use of Transmission Control Protocol. Speed tests typically use the Transmission Control Protocol (TCP) to measure throughput. In keeping with the nature of most Internet application transfers today—including, most notably, web browsers—most speed tests use multiple parallel TCP connections.
Understanding TCP’s operation is critical to the design of an accurate speed test. Any TCP-based speed test should: (1) run long enough to measure steady-state transfer; (2) recognize that TCP transmission rates naturally vary over time; and (3) use multiple TCP connections. Figure 2 shows TCP’s dynamics, including the initial slow start phase. During TCP slow start, the transmission rate is far lower than the network capacity. Including this period as part of a throughput calculation will result in a throughput measurement that is less than the actual available network capacity; if the test duration is too short, the test will tend to underestimate throughput. As a result, accurate speed test tools must account for TCP slow start. Additionally, instantaneous TCP throughput continually varies because the sender tries to increase its transfer rate in an attempt to find and use any spare capacity (a process known as “additive increase, multiplicative decrease”, or AIMD).

Inherent Variability. A speed test measurement can produce highly variable results. Figure 3 shows an illustrative example of typical variability that a speed test might yield, both for Internet Health Test (IHT) and Ookla Speedtest. These measurements were performed successively on the same Comcast connection, provisioned for 200 Mbps downstream and 10 Mbps upstream throughput. Notably, successive tests yield different measurements. IHT, a web front-end to a tool called the Network Diagnostic Test (NDT), also consistently and significantly under-reports throughput, especially at higher speeds.

(a) Five successive runs of Ookla Speedtest yield variable results on downstream throughput. (b) Internet Health Test runs in succession to six different servers. The test measures consistently lower throughput and also shows variability, both to different servers and across successive test runs.

Figure 3: Successive runs of different throughput tests.
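The way slow start drags down short measurements can be illustrated with a toy numerical model. This is a simplification for illustration only, not the methodology of any real test; the 100 Mbps capacity and the per-RTT update rules are invented assumptions.

```python
# Toy model of TCP rate evolution: exponential growth during slow start,
# then a simplified AIMD sawtooth. Units are Mbps per RTT "tick".
# All constants are illustrative assumptions, not real test parameters.

def measured_throughput(duration_rtts, capacity=100.0):
    """Average the per-RTT sending rate over the whole test window."""
    rate = 1.0          # slow start begins far below capacity
    slow_start = True
    total = 0.0
    for _ in range(duration_rtts):
        total += min(rate, capacity)
        if slow_start:
            rate *= 2                # exponential growth per RTT
            if rate >= capacity:
                slow_start = False
                rate = capacity
        elif rate >= capacity:
            rate = capacity / 2      # modeled loss at capacity: multiplicative decrease
        else:
            rate += 1.0              # additive increase per RTT
    return total / duration_rtts

print(round(measured_throughput(5), 1))    # short test: dominated by slow start
print(round(measured_throughput(200), 1))  # longer test: closer to capacity
```

Under these assumptions, a 5-RTT test reports only a small fraction of capacity, while a 200-RTT test approaches the AIMD steady-state average; neither reaches the full 100 Mbps.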

3 Limitations of Existing Speed Tests

Existing speed tests have a number of limitations that have become more acute in recent years, largely as a result of faster ISP access links and the proliferation of home wireless networks. The most profound change is that as network access links have become faster, the network bottleneck has moved from the ISP access link to elsewhere on the network. A decade ago, the network bottleneck was commonly the access ISP link; with faster ISP access links, the bottleneck may have moved to any number of places, from the home wireless network to the user’s device itself. Other design factors may also play a role, including how measurement samples are taken and the provisioning of the test infrastructure itself.

3.1 User-Related Considerations

The home wireless network. Speed tests that are run over a home wireless connection often reflect a measurement of the user’s home wireless connection, not that of the access ISP, because the WiFi network itself is usually the lowest-capacity link between the user and the test server [1, 5, 16, 25, 27, 30]. Many factors affect the performance of the user’s home wireless network, including: distance to the WiFi Access Point (AP) and WiFi signal strength, technical limitations of a wireless device and/or AP, other users and devices operating on the same network, interference from nearby APs using the same spectrum, and interference from non-WiFi household devices that operate on the same spectrum (e.g., microwave ovens, baby monitors, security cameras).

Figure 4: Distribution of download speeds across different device types. Older devices do not support 802.11ac, so fail to consistently hit 100 Mbps.
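The bottleneck principle at work here can be sketched in a few lines. The link capacities below are hypothetical, chosen to mirror the 1 Gbps / 200 Mbps example from Section 2.2.

```python
# Minimal sketch: an end-to-end transfer is limited by the most
# constrained ("bottleneck") link on the path. Capacities are hypothetical.

def end_to_end_throughput_mbps(link_capacities_mbps):
    """The slowest hop bounds the whole path."""
    return min(link_capacities_mbps)

path = {
    "ISP access link": 1000,  # 1 Gbps service
    "home WiFi": 200,         # older WiFi network
    "device NIC": 1000,
}
print(end_to_end_throughput_mbps(path.values()))  # 200
```

No speed test run across this path can report more than 200 Mbps, regardless of how well the ISP access link performs.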

Many past experiments demonstrate that the user’s WiFi—not the ISP—is often the network performance bottleneck. Sundaresan et al. found that whenever downstream throughput exceeded 25 Mbps, the user’s home wireless network was almost always the bottleneck [30]. Although the study is from 2013, and both access link speeds and wireless network speeds have since increased, the general trend of home wireless bottlenecks is still prevalent.

Client hardware and software. Client types range from dedicated hardware, to software embedded in a device on the user’s network, to native software made for a particular operating system, to web browsers. Client type has an important influence on test results, because some clients may be inherently limited or confounded by user factors. Dedicated hardware examples include the SamKnows whitebox and the RIPE Atlas probe. Embedded software refers to cases where the software is integrated into an existing network device such as a cable modem, home gateway device, or WiFi access point. A native application is software made specifically to run on a given operating system, such as Android, iOS, Windows, or Mac OS. Finally, web-based tests simply run from a web browser. In general, dedicated hardware and embedded software approaches tend to minimize the effect of user-related factors and are more accurate as a result.

Many users continue to use older wireless devices in their homes (e.g., old iPads and home routers) that do not support higher speeds. Factors such as memory, CPU, operating system, and network interface card (NIC) can significantly affect throughput measurements. For example, if a user has a 100 Mbps Ethernet card in a PC connected to a 1 Gbps Internet connection, their speed tests will never exceed 100 Mbps; that result reflects a device limitation, not a capacity issue in the ISP network. As a result, many ISPs document recommended hardware and software standards [32], especially for 1 Gbps connections. The limitations of client hardware can be more subtle. Figure 4 shows an example using iPhones released between 2012 and 2015: any user with an iPhone 5s or older is unlikely to reach 100 Mbps, likely due to the lack of a newer 802.11ac wireless interface.

(a) Ookla router-based test. (b) Ookla native desktop test.

Figure 5: Ookla Speedtest, router-based test and native desktop test from the same home network.

Router-based testing vs. device-based testing. Figure 5 shows an example of two successive speed tests. Figure 5a uses software embedded in the user’s router, so that no other effects of the local network could interfere. Figure 5b shows the same speed test (i.e., Ookla Speedtest), on the same network, performed immediately after the router-based test using native software on a mobile device over WiFi. The throughput reported from the user’s mobile device on the home network is almost half of the throughput reported when the speed test is taken directly from the router.

Competing “cross traffic”. At any given time, a single network link simultaneously carries traffic from many senders and receivers. Thus, any single network transfer must share the available capacity with the competing traffic from other senders—so-called cross traffic. Although sharing capacity is natural for normal application traffic, a speed test that shares the available capacity with competing cross traffic will naturally underestimate the total available network capacity. Client-based speed tests cannot account for cross traffic, because the client cannot see the volume of other traffic on the same network; a test that runs on the user’s home router, by contrast, can account for cross traffic when conducting throughput measurements.
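A minimal sketch of why client-side tests under-report in the presence of cross traffic follows. All numbers are hypothetical, and a real router would read actual byte counters from its interfaces rather than receive the cross-traffic volume as an input.

```python
# Hypothetical illustration: a speed test flow only obtains whatever
# capacity the competing cross traffic leaves unused on the link.

def client_test_result_mbps(link_capacity, cross_traffic):
    """A client-side test sees only the leftover capacity."""
    return max(link_capacity - cross_traffic, 0.0)

def router_test_result_mbps(link_capacity, cross_traffic):
    """A router-based test can count the cross traffic on the link and
    add it back into its capacity estimate."""
    return client_test_result_mbps(link_capacity, cross_traffic) + cross_traffic

print(client_test_result_mbps(200.0, 50.0))  # 150.0 (underestimate)
print(router_test_result_mbps(200.0, 50.0))  # 200.0 (full capacity)
```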

3.2 Wide-Area Network Considerations

Impaired ISP access network links. An ISP’s “last mile” access network links can become impaired. For example, the quality of a DOCSIS connection to a home can be degraded by factors such as a squirrel chewing through a line or a bad ground wire. Similarly, fixed wireless connections can be impaired by weather or by leaves blocking the antenna. To mitigate the potential for an individual impairment to unduly influence ISP-wide results, tests should be conducted with a large number of users.

Access ISP capacity. Capacity constraints can exist within an ISP’s network, whether in the access network, regional (metropolitan area) network, or backbone network. Regional and backbone networks usually have excess capacity, so the only periods when they may be constrained would be the result of a disaster (e.g., hurricane damage) or temporary conditions such as fiber cuts or BGP hijacking. Usually ISP capacity constraints arise in the last-mile access networks, which are by nature shared in the first mile or first network element (e.g., passive optical networking (PON), DOCSIS, DSL, 4G/5G, WiFi, point-to-point wireless).

Transit and interconnect capacity. Another significant consideration is the connection to “transit” and “middle mile” networks. The interconnects between independently operated networks may also introduce throughput bottlenecks. As user speeds reach 1 Gbps, ensuring that there are no capacity constraints on the path between the user and the test server—especially across transit networks—is a major consideration. In one incident in 2013, a bottleneck in the Cogent transit network reduced NDT throughput measurements by as much as 90%; test results improved when Cogent began prioritizing NDT test traffic over other traffic. Transit-related issues have often affected speed tests. In the case of the FCC’s MBA platform, this prompted the addition of servers on the Level 3 network to isolate the issues experienced with M-Lab’s infrastructure and the Cogent network, and M-Lab has also added additional transit networks to reduce its reliance on a single network.

Middleboxes. End-to-end paths often include devices along the path, called “middleboxes”, which can affect performance. For example, a middlebox may perform load balancing or security functions (e.g., malware detection, firewalls). As access speeds increase, the capacity of middleboxes may increasingly be a constraint, which will mean that test results reflect the capacity of those middleboxes rather than the access link or other measurement target.

Rate-limiting. Application-layer or destination-based rate limiting, often referred to as throttling, can also cause the performance that users experience to diverge from conventional speed tests. Choffnes et al. have developed Wehe, which detects application-layer rate limiting [31]; thus far, that research has focused on the de-prioritization and rate-limiting of HTTP-based video streaming. Such rate limiting could exist at any point on the network path, though most commonly it may be expected in an access network or on the destination server network. In the latter case, virtual servers or other hosted services may be priced by peak bitrate, and therefore a hard-set limit on total peak bitrate or per-user-flow bitrate may exist. Web server software such as Nginx has features for configuring rate limiting [22], and cloud-based services may charge by total network usage or peak usage; for example, Oracle charges for total bandwidth usage [24], and FTP services often enforce per-user and per-flow rate limits [12].
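As an illustration of the server-side rate limiting described above, nginx’s `limit_rate` directives cap per-connection bandwidth. The path and the specific values here are hypothetical examples, not a configuration drawn from the paper.

```nginx
# Hypothetical example: cap each download connection's bandwidth.
server {
    listen 80;
    location /downloads/ {
        limit_rate_after 10m;   # serve the first 10 MB at full speed
        limit_rate       500k;  # then cap the connection at 500 KB/s
    }
}
```

A speed test run against a server configured this way would measure the configured cap, not the capacity of the network path.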

Rate-boosting. Rate-boosting is the opposite of rate limiting: it can enable a user to temporarily exceed their normal provisioned rate for a limited period. For example, a user may have a 100 Mbps plan but may be allowed to burst to 250 Mbps for limited periods if spare capacity exists. This effect was noted in the FCC’s first MBA report in 2011 and led to the use of a longer-duration test to measure “sustained” speeds [8]. Such rate-boosting techniques appear to have fallen out of favor, perhaps partly due to greater access speeds or the introduction of new technologies such as DOCSIS channel bonding.

3.3 Test Infrastructure Considerations

Because speed tests based on active measurements rely on performing measurements to some Internet endpoint (i.e., a measurement server), another possible source of a performance bottleneck is the server infrastructure itself.

Test infrastructure provisioning. The test server infrastructure must be adequately provisioned so that it does not become the bottleneck for the speed tests. In the past, test servers have been overloaded, misconfigured, or otherwise not performing as necessary, as has periodically been the case with M-Lab servers used for both FCC MBA testing and NDT measurements. Similarly, the data center switches or other network equipment to which the servers connect may experience technical problems or be subject to other performance limitations. In the case of the FCC MBA reports, at one point this resulted in discarding data collected from M-Lab servers due to severe impairments [6, 9]. The connection between a given data center and the Internet may also be constrained, congested, or otherwise technically impaired, as was the case when some M-Lab servers were single-homed to a congested Cogent network. Finally, the servers themselves may be limited in their capacity: if, for example, a server has a 1 Gbps Ethernet connection (with real-world throughput below 1 Gbps), then that server cannot be expected to measure several simultaneous 1 or 2 Gbps tests. Many other infrastructure-related factors can affect a speed test, including server storage input and output limits, available memory and CPU, and so on. Designing and operating a high-scale, reliable, high-performance measurement platform is a difficult task, and as more consumers adopt 1 Gbps services this may become even more challenging [17].
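The server-capacity arithmetic above is simple but easy to overlook; a quick sketch with hypothetical numbers:

```python
# Back-of-the-envelope sketch: a test server's uplink bounds the number
# of full-rate tests it can serve at once. Values are hypothetical.

def max_concurrent_tests(uplink_gbps, per_test_gbps):
    """Whole tests that fit before the server becomes the bottleneck."""
    return int(uplink_gbps // per_test_gbps)

print(max_concurrent_tests(1, 1))   # 1: a 1 Gbps server cannot serve two 1 Gbps tests
print(max_concurrent_tests(10, 2))  # 5
```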

Different speed test infrastructures have different means for incorporating measurement servers. Ookla allows volunteers to run servers on their own and contribute these servers to the list of possible servers that users can perform tests against. Ookla uses empirical measurements over time to track the performance of individual servers; those that perform poorly over time are removed from the set of candidate servers that a client can use. Measurement Lab, on the other hand, uses a fixed, dedicated set of servers as part of a closed system and infrastructure. For many years, these servers have been (1) constrained by a 1 Gbps uplink and (2) shared with other measurement experiments (recently, Measurement Lab has begun to upgrade to 10 Gbps uplinks). Both of these factors can and did contribute to the platform introducing its own set of performance bottlenecks.

(a) Internet Health Test mistakenly locating a client in Princeton, NJ to Philadelphia, PA (50+ miles away), and performing a speed test to a server in New York City. (b) Ookla Speedtest directing a client in Princeton, NJ to an on-net Speedtest server in Plainfield, NJ. Ookla also allows a user to select another nearby server.

Figure 6: IHT and Ookla geolocation.

Server placement and selection. A speed test estimates the available capacity of the network between the client and the server. The throughput of the test will therefore naturally depend on the distance between these endpoints as measured by a packet’s round-trip time (RTT). This is extremely important, because TCP throughput is inversely proportional to the RTT between the two endpoints. For this reason, speed test clients commonly attempt to find the “closest” throughput measurement server to provide the most accurate test result, and many speed tests, such as Ookla’s, use thousands of servers distributed around the world. To select the closest server, some tests use a process called “IP geolocation”, whereby a client’s location is determined from its IP address. Unfortunately, IP geolocation databases are notoriously inaccurate, and client location can often be off by thousands of miles. Additionally, latency typically exceeds what geographic distance alone would suggest, since network paths between two endpoints can be circuitous, and other factors, such as network congestion on a path, can affect latency. Some speed tests mitigate these effects with additional techniques. For example, Ookla’s Speedtest uses IP geolocation to select an initial set of servers that are likely to be close, and the client then selects from that list the server with the lowest RTT (other factors, such as server network capacity, may also play into selection). Unfortunately, Internet Health Test (which uses NDT) and others rely strictly on IP geolocation.
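The two-stage selection that the text attributes to Ookla can be sketched as follows; the server names and RTT values are invented for illustration.

```python
# Stage 1 (assumed done elsewhere): IP geolocation narrows thousands of
# servers down to a nearby shortlist. Stage 2: probe each candidate and
# keep the one with the lowest measured RTT.

def pick_server(candidate_rtts_ms):
    """Choose the candidate server with the lowest measured RTT."""
    return min(candidate_rtts_ms, key=candidate_rtts_ms.get)

shortlist = {
    "Plainfield, NJ": 9.0,     # on-net, nearby
    "New York, NY": 18.0,
    "Philadelphia, PA": 24.0,  # where inaccurate geolocation might point
}
print(pick_server(shortlist))  # Plainfield, NJ
```

RTT-based selection corrects for geolocation error because it measures the network path directly rather than guessing from an IP address.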

Figure 6 shows stark differences in server selection between two tests: Internet Health Test (which relies on IP geolocation and has a smaller selection of servers) and Ookla Speedtest (which uses a combination of IP geolocation, GPS-based location from mobile devices, and RTT-based server selection across a much larger set of servers). Notably, Internet Health Test not only mis-locates the client (determining that a client in Princeton, New Jersey is in Philadelphia) but also selects a server in New York City, more than 50 miles from Princeton. In contrast, the Ookla test selects an on-network Comcast server in Plainfield, NJ, merely 21 miles away, and also gives the user the option of choosing a closer server through the “Change Server” option.

Figure 7: Throughput vs. number of TCP threads. [29]

3.4 Test Design Considerations

Number of parallel connections. A significant consideration in the design of a speed test is the number of parallel TCP connections that the test uses to transfer data between the client and server, since the goal of a speed test is to send as much data as possible, which is usually only possible with multiple TCP connections. Using multiple connections in parallel allows a TCP sender to more quickly and more reliably achieve the available link capacity. In addition to achieving a higher share of the available capacity (because the throughput test is effectively sharing the link with itself), a transfer using multiple connections is more resistant to network disruptions that may result in the sender re-entering TCP slow start after a timeout due to lost packets.
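The benefit of parallel connections during ramp-up can be illustrated with a toy slow-start model. This is an idealized sketch, not a real measurement: it assumes each connection simply doubles its rate every RTT until the aggregate saturates the link, and all parameters are illustrative.

```python
def bytes_sent(capacity_bps, n_conns, duration_s, mss=1460, rtt=0.02):
    """Toy model: each connection starts at one MSS per RTT and doubles
    its rate every RTT (slow start); the aggregate is capped at link
    capacity. Returns total bytes transferred during the test, showing
    that more connections reach capacity in fewer RTTs."""
    per_conn_rate = [mss * 8 / rtt] * n_conns  # initial rate per conn, bps
    total_bytes = 0.0
    t = 0.0
    while t < duration_s:
        aggregate = min(sum(per_conn_rate), capacity_bps)
        total_bytes += aggregate / 8 * rtt          # bytes moved this RTT
        per_conn_rate = [r * 2 for r in per_conn_rate]  # slow-start doubling
        t += rtt
    return total_bytes

# On a 1 Gbps link, four connections move more data in a 1-second test
# than one connection, because they exit the ramp-up phase sooner.
one_conn = bytes_sent(1e9, 1, 1.0)
four_conns = bytes_sent(1e9, 4, 1.0)
```

With four connections the aggregate crosses capacity roughly two RTTs earlier than with one, so a short test records a measurably higher average rate.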

A single TCP connection typically cannot achieve throughput approaching full link capacity, for two reasons: (1) a single connection takes longer to reach higher sending rates, because TCP slow start needs more time to approach link capacity; and (2) a single connection is more susceptible to temporarily slowing its transmission rate when it experiences packet loss (a common occurrence on an Internet path). Past research concluded that a speed test should use at least four parallel connections to accurately measure throughput [29]. For the same reason, modern web browsers typically open as many as six parallel connections to a single server in order to maximize use of the available network capacity between the client and web server.

Test duration. The length of a test and the amount of data transferred also significantly affect test results. As previously

described, a TCP sender does not immediately begin sending traffic at full capacity but instead begins in TCP slow start until the sending rate reaches a pre-configured threshold, at which point it begins AIMD congestion avoidance. As a result, if a transfer is too short, a TCP sender will spend a significant fraction of the total transfer in TCP slow start, ensuring that the transfer rate falls far short of available capacity. As access speeds increase, most test tools have also needed to increase test duration.

Throughput calculation. The methods that tests use to calculate results appear to vary widely, and often the method is not disclosed. Tests may discard some high and/or low results, may use the median or the mean, may take only the highest result and discard the rest, and so on. This makes different tests difficult to compare. Finally, some tests may include all of the many phases of a TCP transfer, even though some of those phases necessarily run at rates below the capacity of the link:

• the slow start phase at the beginning of a transfer (which occurs in every TCP connection);

• the initial additive increase phase of the TCP transfer, when the sender is actively increasing its sending rate but before it experiences the first packet loss that results in multiplicative decrease;

• any packet loss episode that results in a TCP timeout and a subsequent re-entry into slow start.
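A test that wants to report steady-state capacity must filter these low-rate phases out of its samples. A minimal sketch, assuming the test records periodic per-interval rate samples (the 25% warm-up fraction is an illustrative assumption, not any particular test's value):

```python
def steady_state_throughput(samples_bps, warmup_fraction=0.25):
    """Estimate throughput from periodic rate samples by discarding an
    initial warm-up window dominated by TCP slow start, rather than
    naively dividing total bytes by total elapsed time."""
    if not samples_bps:
        raise ValueError("no samples")
    start = int(len(samples_bps) * warmup_fraction)
    steady = samples_bps[start:] or samples_bps[-1:]
    return sum(steady) / len(steady)

# A ramp-up followed by steady state near 940 Mbps: the naive mean is
# dragged down by the slow-start samples, the filtered mean is not.
samples = [50e6, 200e6, 600e6, 940e6, 935e6, 941e6, 938e6, 940e6]
naive = sum(samples) / len(samples)
steady = steady_state_throughput(samples)
```

On these samples the naive average is roughly 25% below the filtered estimate, which is the same bias the surrounding text attributes to tests that include slow start in their result.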

Estimating the throughput of the link is not as simple as dividing the amount of data transferred by the total time elapsed over the course of the transfer. A more accurate estimate of the transfer rate would instead measure the transfer during steady-state AIMD, excluding the initial slow start period. Many standard throughput tests, including the FCC/SamKnows test, omit the initial slow start period. The Ookla test implicitly omits this period by discarding low-throughput samples from its average measurement. Tests that include this period will report a lower average throughput than the link capacity can support in steady state.

Self-selection bias. Speed tests that are initiated by a user suffer from self-selection bias [14]: many users initiate such tests only when they are experiencing a technical problem or are reconfiguring their network. For example, when configuring a home wireless network, a user may run a test over WiFi, then re-position their WiFi access point and run the test again. These measurements may help the user optimize the placement of the wireless access point but, by design, they reflect the performance of the user's home wireless network, not that of the ISP. Tests that are user-initiated ("crowdsourced") are more likely to suffer from self-selection bias, and it can be difficult to use their results to draw conclusions about an ISP, a geographic region, and so forth.

Infrequent testing. If tests are too infrequent or are taken only at certain times of day, the resulting measurements may not accurately reflect a user's Internet capacity. An analogy would be looking out a window once per day in the evening, seeing it was dark outside, and concluding that it must be dark



24 hours a day. Additionally, if a user conducts a test only when there is a transient problem, the resulting measurement may not be representative of the performance that the user typically experiences. Automated tests run multiple times per day, at randomly selected times during both peak and off-peak periods, can account for some of these factors.
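Such a schedule can be sketched as follows. The 19:00-23:00 peak window and the number of daily tests are illustrative assumptions, not any platform's actual configuration:

```python
import random

def schedule_tests(tests_per_day=4, rng=None):
    """Pick test times (minutes after midnight) spread across the day:
    half at random minutes inside a peak window (19:00-23:00) and half
    at random off-peak minutes, so measurements are not biased toward a
    single time of day."""
    rng = rng or random.Random()
    n_peak = tests_per_day // 2
    peak = [19 * 60 + rng.randrange(4 * 60) for _ in range(n_peak)]
    off_peak = [rng.randrange(19 * 60) for _ in range(tests_per_day - n_peak)]
    return sorted(peak + off_peak)

times = schedule_tests(rng=random.Random(1))
```

Randomizing the minute within each window avoids always sampling the same instant (e.g., exactly on the hour), when synchronized tests from many users could themselves create congestion.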

4 The Future of Speed Testing

Speed testing tools will need to evolve as end-user connections approach and exceed 1 Gbps, especially given that so many policy, regulatory, and investment decisions are based on speed measurements. As access network speeds increase and performance bottlenecks move elsewhere on the path, speed test design must evolve to keep pace with both faster network technology and evolving user expectations. We recommend the following:

Retire outmoded tools such as NDT. NDT, also known as the Internet Health Test [15], may appear at first glance to be suitable for speed tests. This is not the case, though it continues to be used for speed measurement despite its unsuitability and proven inaccuracy [11]. Its inadequacy for measuring access link speeds has been well-documented [2]. One significant problem is that NDT still uses a single TCP connection, nearly two decades after this was shown to be inadequate for measuring link capacity. NDT is also incapable of reliably measuring access link throughput at speeds of 100 Mbps or more, just as we enter an era of gigabit speeds. The test also includes the initial TCP slow start period in the result, leading to a lower average throughput than the link capacity can support in TCP steady state. It also faces all of the user-related considerations discussed in Section 3. It is time to retire the use of NDT for speed testing and look ahead to better methods.

Use native, embedded, and dedicated measurement techniques and devices. Web-based tests (many of which rely on Javascript) cannot transfer data at rates that exceed several hundred megabits per second. As network speeds increase, speed tests must be "native" applications or run on embedded devices (e.g., home router, Roku, Eero, AppleTV) or otherwise dedicated devices (e.g., Odroid, Raspberry Pi, SamKnows "white box", RIPE Atlas probes).

Control for factors along the end-to-end path when analyzing results.
Section 3 outlined many factors other than the capacity of the ISP link that can affect the results of a speed test, ranging from cross-traffic in the home to server location and provisioning. As access ISP speeds increase, these limiting factors become increasingly important, as bottlenecks elsewhere along the end-to-end path become increasingly prevalent.

Measure to multiple destinations. As access network speeds begin to approach and exceed 1 Gbps, it can be difficult to identify a single destination and end-to-end path that can support the capacity of the access link. Looking ahead, it may make sense to perform active speed test measurements to multiple destinations simultaneously, to mitigate the possibility that any single destination or end-to-end network path becomes the network bottleneck.

Augment active testing with application quality metrics. In many cases, a user's experience is limited not by the access network speed but by the performance of a particular application (e.g., streaming video) under the available network conditions. As previously mentioned, even the most demanding streaming video applications require only tens of megabits per second, yet user experience can still suffer as a result of application performance glitches, such as changes in resolution or rebuffering. As access network speeds increase, it will be important not just to continue "speed testing" but also to develop new methods that can monitor and infer quality metrics for a variety of applications.

Adopt standard, open methods to facilitate better comparisons. It is currently very difficult to directly compare the results of different speed tests, because the underlying methods and platforms are so different. Some tools select the highest result of several sequential tests, others take the average of several, and still others average several tests after the highest and lowest results have been discarded. As the FCC has stated [10]: "A well documented, public methodology for tests is critical to understanding measurement results." Furthermore, tests and networks should disclose any circumstances that result in the prioritization of speed test traffic.

Beyond being well-documented and public, the community should also come to agreement on a set of standards for measuring access link performance and adopt those standards across test implementations.
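The earlier recommendation to measure to multiple destinations could be structured as below. This is a sketch under stated assumptions: `measure_one` stands in for a real per-server measurement routine, and the server names and observed rates are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def multi_destination_speed(destinations, measure_one):
    """Run one throughput measurement per destination in parallel and
    sum the per-destination rates, so that no single server or network
    path caps the reported result. `measure_one(dest)` is a
    caller-supplied function returning achieved bits per second."""
    with ThreadPoolExecutor(max_workers=len(destinations)) as pool:
        rates = list(pool.map(measure_one, destinations))
    return {
        "per_destination": dict(zip(destinations, rates)),
        "aggregate_bps": sum(rates),
    }

# Hypothetical per-path rates: each server alone would cap a test well
# below a 1 Gbps access link, while the aggregate approaches it.
observed = {
    "s1.example.net": 420e6,
    "s2.example.net": 380e6,
    "s3.example.net": 150e6,
}
result = multi_destination_speed(list(observed), observed.get)
```

Reporting both the per-destination and aggregate figures lets an analyst distinguish an access-link bottleneck (all paths slow) from a single slow server or transit path.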

References

[1] Apple: Resolve Wi-Fi and Bluetooth Issues Caused by Wireless Interference, 2019. https://support.apple.com/en-us/HT201542. (Cited on page 3.)

[2] S. Bauer, D. D. Clark, and W. Lehr. Understanding Broadband Speed Measurements. In Technology Policy Research Conference (TPRC), 2010. (Cited on pages 2 and 7.)

[3] Bufferbloat. https://www.bufferbloat.net. (Cited on page 2.)

[4] CALSPEED Program, 2019. http://cpuc.ca.gov/General.aspx?id=1778. (Cited on page 1.)

[5] Avoiding Interference in the 2.4-GHz ISM Band, 2006. https://www.eetimes.com/document.asp?doc_id=1273359. (Cited on page 3.)

[6] FCC Re: Measuring Broadband America Program (Fixed), GN Docket No. 12264, Aug. 2013. https://ecfsapi.fcc.gov/file/7520939594.pdf. (Cited on page 5.)

[7] FCC: Measuring Broadband America Program. https://www.fcc.gov/general/measuring-broadband-america. (Cited on pages 1 and 2.)

[8] Measuring Broadband America. Technical report, Federal Communications Commission, 2011. https://transition.fcc.gov/cgb/measuringbroadbandreport/Measuring_U.S._-_Main_Report_Full.pdf. (Cited on page 5.)

[9] FCC MBA Report 2014, 2014. https://www.fcc.gov/reports-research/reports/measuring-broadband-america/measuring-broadband-america-2014. (Cited on page 5.)



[10] M-Lab Discussion List: MLab speed test is incorrect?, 2018. https://groups.google.com/a/measurementlab.net/forum/#!topic/discuss/vOTs3rcbp38. (Cited on page 7.)

[11] Letter to FCC on Docket No. 17-108, 2014. https://ecfsapi.fcc.gov/file/1083088362452/fcc-17-108-reply-aug2017.pdf. (Cited on page 7.)

[12] FTP Rate Limiting, 2019. https://forum.filezilla-project.org/viewtopic.php?t=25895. (Cited on page 5.)

[13] J. Gettys. Bufferbloat: Dark Buffers in the Internet. In IEEE Internet Computing, 2011. (Cited on page 2.)

[14] J. J. Heckman. Selection bias and self-selection. In Econometrics, pages 201–224, 1990. (Cited on page 6.)

[15] Internet Health Test, 2019. http://internethealthtest.org/. (Cited on page 7.)

[16] Change the Wi-Fi Channel Number to Avoid Interference, 2018. https://www.lifewire.com/wifi-channel-number-change-to-avoid-interference-818208. (Cited on page 3.)

[17] J. Livingood. Measurement Challenges in the Gigabit Era, June 2018.https://blog.apnic.net/2018/06/21/measurement-challenges-in-the-gigabit-era/. (Cited on page 5.)

[18] CheckspeedMN. https://mn.gov/deed/programs-services/broadband/checkspeedmn. (Cited on page 1.)

[19] A.G. Schneiderman Encourages New Yorkers To Test Internet Speeds And Submit Results As Part Of Ongoing Investigation Of Broadband Providers, 2017. https://ag.ny.gov/press-release/ag-schneiderman-encourages-new-yorkers-test-internet-speeds-and-submit-results-part. (Cited on page 1.)

[20] Are You Getting The Internet Speeds You Are Paying For? https://ag.ny.gov/SpeedTest. (Cited on page 1.)

[21] New York State Broadband Program Office - Speed Test. https://nysbroadband.ny.gov/speed-test. (Cited on page 1.)

[22] Nginx Rate Limiting, 2019. https://www.nginx.com/blog/rate-limiting-nginx/. (Cited on page 5.)

[23] Broadband Speeds: Research on fixed line home broadband speeds, mobile broadband performance, and related research. https://www.ofcom.org.uk/research-and-data/telecoms-research/broadband-research/broadband-speeds. (Cited on page 1.)

[24] Oracle IaaS Pricing, 2019. https://cloud.oracle.com/en_US/iaas/pricing. (Cited on page 5.)

[25] Six Things That Block Your Wi-Fi, and How to Fix Them, 2011.https://www.pcworld.com/article/227973/six_things_that_block_your_wifi_and_how_to_fix_them.html. (Cited on page 3.)

[26] A Broadband Challenge: Reliable broadband internet access remains elusive across Pennsylvania, and a Penn State faculty member is studying the issue and its impact, 2018. https://news.psu.edu/story/525994/2018/06/28/research/broadband-challenge. (Cited on page 1.)

[27] The 2.4 GHz Spectrum Congestion Problem and AP Form-Factors, 2015. http://www.revolutionwifi.net/revolutionwifi/2015/4/the-dual-radio-ap-form-factor-is-to-blame-for-24-ghz-spectrum-congestion. (Cited on page 3.)

[28] SamKnows Test Methodology White Paper, Dec. 2011.https://availability.samknows.com/broadband/uploads/Methodology_White_Paper_20111206.pdf.(Not cited.)

[29] S. Sundaresan, W. De Donato, N. Feamster, R. Teixeira, S. Crawford, and A. Pescape. Broadband Internet Performance: A View from the Gateway. In ACM SIGCOMM, pages 134–145, Aug. 2011. (Cited on page 6.)

[30] S. Sundaresan, N. Feamster, and R. Teixeira. Home Network or Access Link? Locating Last-mile Downstream Throughput Bottlenecks. In International Conference on Passive and Active Network Measurement (PAM), pages 111–123, 2016. (Cited on page 3.)

[31] Wehe, 2019. https://dd.meddle.mobi. (Cited on page 4.)

[32] Xfinity Internet Minimum System Recommendations. https://www.xfinity.com/support/articles/requirements-to-run-xfinity-internet-service. (Cited on page 4.)
