
AROMA: Routing Oblivious Measurement Analytics
Anonymous Author(s)

ABSTRACT
The Software Defined Networking (SDN) model separates the control and data plane functionality in the network and allows for centralized network algorithms. Such algorithms require a network-wide view to reach informed decisions, which is typically achieved by periodically updating the control view with telemetry data from multiple measurement switches. However, getting an accurate network-wide view (without "overcounting" flows or packets that traverse multiple measurement switches) is challenging. Therefore, existing solutions often simplify the problem by making assumptions on the routing or measurement switch placement.

We introduce AROMA, a network-wide and routing oblivious measurement infrastructure that generates a uniform sample of packets and flows, regardless of the topology, and without assumptions on the routing. Our techniques are built to work in the data plane of programmable PISA switches. Additionally, we provide control plane algorithms that utilize the samples for a variety of essential measurement tasks and present formal accuracy guarantees for these algorithms. Using extensive emulations on real network traces, we show that our algorithms are competitively accurate compared to the best existing solution, without making any assumptions or restrictions on the routing and the measurement switch placement.

ACM Reference Format:
Anonymous Author(s). 2019. AROMA: Routing Oblivious Measurement Analytics. In Proceedings of ACM SIGMETRICS / IFIP Performance (SIGMETRICS'19). ACM, New York, NY, USA, Article 4, 13 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Many network applications, such as load balancing, QoS enforcement, anomaly and intrusion detection, and traffic engineering [6, 9, 11, 20, 27, 33, 38], rely on network-wide analytics of the network traffic to reach informed decisions. To provide these capabilities, data is collected at multiple measurement switches and sent for analysis at a central collector. The collector combines the data collected from all measurement switches to assemble a network-wide view of the traffic.

Packet sampling is a commonly used technique for performing network-wide measurements. Typically, each packet that traverses a measurement switch is sampled with some probability p. Sampled packets are then sent to the collector, either individually or aggregated. The packet sample can be processed by the collector for diverse measurement tasks such as identifying the heavy hitter flows, calculating hierarchical heavy hitters, estimating the flow size distribution, and identifying useful traffic patterns such as super-spreaders and port scans [32, 45].

In a similar manner, flow sampling is useful for obtaining network-wide flow-level measurements. Flows are generally defined as a sequence of related packets, e.g., all the packets between a source-destination pair, all the packets of a 5-tuple, etc. Such measurements are advantageous for network management tasks such as traffic engineering, anomaly detection, and attack detection. Flow sampling differs from packet sampling in that it samples each flow with some probability p′. Regardless of the volume or number of packets in a flow, all flows are sampled with the same probability. A seemingly inherent difficulty of flow sampling is that the flows need to be sampled as their packets traverse the network, and therefore the probability of sampling a flow appears to be correlated with the number of packets in that flow. Avoiding this correlation would require knowing the flow size distribution and accordingly adjusting the probability of sampling a flow for the packets of each flow. Maintaining such per-flow information in the network incurs substantial overhead. Existing solutions such as [39] distribute this overhead by assigning different flow ranges to different switches; yet, this solution requires per-switch configuration that is impacted by the routing and workload.

In both packet and flow sampling techniques, the same packet may traverse multiple measurement switches and may therefore be sampled multiple times. This is a fundamental problem that must be addressed in any uniform sampling scheme. Intuitively, a uniform sample selects each packet with the same probability, regardless of the number of switches that the packet traverses. In order to support uniform sampling, some solutions restrict the measurement switch placement to ensure that each packet traverses at most a single measurement switch [28, 47]. However, this approach is undesirable as it requires an understanding of the routing and the topology of the network. Alternative approaches mark packets that have already been sampled so that they are not sampled a second time [3], though this method requires un-marking the packets as they leave the network.

Packet sampling has been widely used in classic switch monitoring capabilities such as sampled NetFlow or sFlow, and is a basic functionality of legacy switches. However, to enable efficient network-wide measurements, we argue that efficient packet and flow sampling should be basic building blocks in emerging programmable switches. Programmable switches allow for Terabit-per-second scale routing along with a limited programming capability. Such switches are prime candidates for performing the measurement due to their centrality and massive throughput. However, their programming model is very limited, supporting only basic arithmetic operations and therefore making it difficult to use existing techniques for uniform sampling [7]. Furthermore, the amount of available memory is restricted and does not suffice for maintaining per-flow state.


To overcome these challenges, we introduce Approximate Routing Oblivious Measurement Analytics (AROMA), a sampling-based measurement analytics infrastructure that combines flow and packet samples in a routing and workload oblivious manner. AROMA supports a variety of network-wide measurement tasks, provides accuracy guarantees, and is compatible with the emerging Protocol Independent Switch Architecture (PISA) programmable switches.

The main benefits of AROMA are:

• Uniform packet and flow sampling: The underlying technique used in AROMA is a k-partition hash based structure, which supports sampling based on the packet or flow identifier. Therefore, AROMA can be configured to perform packet sampling, flow sampling, or both. Furthermore, any packet has an equal chance of being sampled regardless of the number of measurement switches it goes through (as long as it traverses at least one measurement switch). Similarly, any flow has an equal chance of being sampled regardless of the number of packets in the flow.

• Routing and workload oblivious: Each measurement switch produces a sample that is independent of the samples produced by other measurement switches. Therefore, there is no need to coordinate between the measurement switches (e.g., there is no need to tag a packet so that measurement switches that see it in the future do not sample it [3]) and no per-switch configuration is required (i.e., there is no need to divide responsibility for sampling parts of the traffic across the measurement switches [39]). Such per-switch configurations require knowledge about routing and traffic distribution. Acquiring this information and configuring the network accordingly incur substantial overhead in setting up the measurement infrastructure. Furthermore, network and traffic dynamics require continually updating these configurations, creating yet additional overhead. Our method avoids such overheads, and requires no information on routing, workload, or network topology. It is also the first to avoid such overheads for flow sampling.

• Support for a wide range of measurements: We provide numerous controller algorithms that utilize the packet and flow samples to accomplish a variety of network measurement tasks, such as estimating the number of (distinct) packets and flows in the measurement, estimating per-flow frequency, identifying the heavy hitter flows, calculating hierarchical heavy hitters, estimating the flow size distribution, and identifying super-spreaders.

• Practicality: AROMA can be implemented using P4 and deployed on PISA switches. Furthermore, the overall switch resources required by AROMA are minimal, which allows the switch to perform additional functionality. This is a substantial advantage over other monitoring techniques such as [10, 34, 35, 41], which often utilize a large portion of the switch resources.

• Accuracy: We evaluate AROMA on real network traces and show that it attains attractive accuracy over a variety of network tasks, and that it is close in performance to existing solutions that provide similar guarantees but are not compatible with PISA switches.

To the best of our knowledge, AROMA is the first framework to achieve generic, routing and workload oblivious flow and packet sampling that can be deployed on PISA programmable switches.

The rest of this paper is organized as follows: we first present the AROMA framework and data structures (§ 2). Next, we analyze the accuracy bounds and convergence time of AROMA (§ 3), followed by an evaluation (§ 4). We conclude the paper with an overview of related work (§ 5) and a conclusion (§ 6).

Symbol : Definition
S : The packet stream
⟨fid_i, pid_i⟩ : A packet with flow identifier fid_i and packet identifier pid_i
U : The universe of flow identifiers
f_x : The frequency of flow x ∈ U
V̂ : An estimate for |S|
ε : The goal error parameter
δ : The goal error probability
M : The number of samples required for the accuracy guarantee
γ : A space factor that allows faster convergence (γ ≥ 1); we use γ·M slots instead of M
M̃ : The number of samples the algorithm produced (M̃ ∈ [0, γ·M])
p̂ : The estimated sampling probability (p̂ = M̃/V̂)
T : The actual sample produced (|T| = M̃)
T_x : The number of times x appears in the sample T
f̂_x : An estimate for the frequency of flow x (i.e., f̂_x = T_x/p̂)
θ : Heavy hitter threshold
M̃(t) : The number of samples the algorithm produced by time t

Table 1: List of symbols and notations

2 THE AROMA FRAMEWORK
2.1 Overview
In a nutshell, AROMA collects flow and packet samples within the data plane of programmable switches. These samples are then merged by the controller to form uniform network-wide samples. The controller then estimates various statistical properties from the network-wide samples. AROMA is partitioned into a data plane module that stores and maintains the samples, and a measurement analysis module which runs on the centralized controller.

We leverage a combination of packet sampling and flow sampling (as in [39]), which together allow our system to be general and support a large variety of tasks [40]. Further, AROMA allows for measurement tasks to be defined after the samples have been collected, a property which is called late binding in the literature.

Additionally, AROMA employs routing oblivious sampling techniques (see also [16]) which guarantee that all packets (or flows) have an equal chance to be sampled. AROMA uses a two-phase hash-based sampling technique to first select a sample slot, and then decide which element the slot should sample. Each sampling slot maintains only the element whose hash value is minimal. Here, an element is either a packet (in packet sampling) or a flow (in flow sampling). The controller then merges the sampling slots from all switches, and obtains for each slot the element whose hash value is (globally) minimal. That is, an element is sampled only if its hash value is globally minimal for its corresponding slot. As a result, our system is routing oblivious and its output is mathematically identical regardless of network topology and routing.

Finally, AROMA runs on P4 programmable switches, and naturally fits within the switch constraints. Furthermore, our solution requires a minimal number (2-3) of pipeline stages and can be configured to run with any amount of memory. This allows AROMA to operate alongside higher-level applications such as load balancing or attack detection.

We first formally define our model, assumptions, and notations (Section 2.2). Next, we show a method to store and collect a uniform sample (of flows or packets) within the data plane using PISA programmable switches (Section 2.3). Then we present the controller algorithm that merges the samples collected distributively into a network-wide uniform sample, and survey ways to utilize the samples to perform various measurement tasks (Section 2.4).

2.2 Preliminaries
We model the traffic as an ordered stream of packets S ∈ (U × N)*, where each packet ⟨fid_i, pid_i⟩ has a flow identifier fid_i ∈ U, and a packet identifier pid_i ∈ N that matches each packet to a single flow. Flow identifiers can be source IPs, source and destination IP pairs, or 5-tuples. For packet identifiers, in the case of TCP, we can use the TCP sequence number as part of a unique packet identifier. In general, the works of [21, 48] explain how the packet header fields can be used to derive unique packet identifiers. In this work, we assume the existence of unique packet identifiers.

We use k measurement switches R_1, ..., R_k for measurements. We assume that each packet traverses at least one measurement switch. Yet, some packets may traverse multiple measurement switches, and the routing rules may change during the measurement. Formally, our only assumption is that if each measurement switch sees a subset of the stream (S_i ⊆ S), then all the measurement switches together cover all the packets: ∪_{i=1}^{k} S_i = S.

Our model is more general than that of other network-wide measurement solutions that assume that packets only visit a single measurement switch [28, 31, 44], or that each flow is routed through a single fixed path [34]. The same model is also used in related work [3, 7] in a software context.

The term flow refers to the set of packets that share the same flow identifier. Given a flow identifier x, its frequency f_x is the number of packets with x as their flow identifier, i.e., f_x = |{i | fid_i = x}|.

2.3 Data Plane Sampling Module
We now introduce our data plane sampling infrastructure. Section 2.3.1 provides a high-level overview of the algorithm, while Section 2.3.2 provides the P4 implementation details, adhering to the PISA architecture.

2.3.1 Algorithm overview. Each measurement switch allocates a fixed-size block of memory for M slots. Measurement switches calculate a hash value in (0, 1] based on the packet (or flow) identifier. Each slot stores the item (packet/flow) with the minimal hash value among all the items that were assigned to the slot. Interestingly, hash collisions mean that we require time before the slots are filled. We show that the number of filled slots behaves like a variant of the Coupon Collector problem, but instead of trying to fill all the slots as in the common analysis, we attempt to fill a certain percentage of all slots (say 90%). This relaxation asymptotically reduces the time required to collect the sample at the expense of a slightly inflated memory consumption.

Figure 1: We compute two hashes for each observed item (packet or flow). h1 determines in which slot to compete; if the slot is empty we add the new item. Otherwise, we add it only if its h2 value is smaller than that of the stored item.

Formally, each measurement switch observes a stream of packets ⟨fid_i, pid_i⟩, with a flow identifier fid_i and a packet identifier pid_i. We denote by x_i the identifier used, i.e., x_i = fid_i for flow sampling and x_i = pid_i for packet sampling.

Each switch maintains a data structure MEM, which contains γ·M memory slots, and each slot stores exactly one identifier. The value γ ≥ 1 is selected to ensure a sample of size at least M. We discuss the relation between γ and the time to collect M samples in Section 3.2. Let us denote memory slot j as MEM[j]. Two values are maintained within the slot: a hash value MEM[j].hash and an identifier MEM[j].id.

We use two independent random hash functions: h1 : U → [0, γ·M) for mapping an identifier to a memory slot, and h2 : U → (0, 1] to decide which item to sample. Importantly, the hash functions are identical across the measurement switches that participate in the measurement, as this allows merging the data structures into a network-wide uniform sample and thereby a global view. At a high level, each slot receives a fraction of the incoming packets, and stores the single identifier x that has the smallest h2(x) of all those observed by that memory slot.

We initialize all MEM[j].hash to 1. As illustrated in Figure 1, upon observing each packet and determining its identifier x_i (= "D" in the example), the switch does the following:

(1) Computes the two hash values h1(x_i) and h2(x_i) for the current packet. In the example, h1(x_i) points to the fourth slot, and h2(x_i) = 0.2.

(2) Looks up the hash value stored in MEM[h1(x_i)].hash, and ignores the packet if MEM[h1(x_i)].hash ≤ h2(x_i).

(3) Otherwise, if h2(x_i) < MEM[h1(x_i)].hash, then we replace the existing sample in the slot:
MEM[h1(x_i)].hash ← h2(x_i)
MEM[h1(x_i)].id ← x_i


Algorithm 1: Maintaining uniform flow samples

 1  control FlowSampling(inout headers hdr,
                         inout metadata meta) {
 2    register< bit<32> > (1 << m) MEM_hash;
 3    register< bit<64> > (1 << m) MEM_id;
 4    apply {
        // Prepare flow identifier x
 5      meta.flowid[31: 0] = hdr.ipv4.srcAddr;
 6      meta.flowid[63:32] = hdr.ipv4.dstAddr;
        // Compute hashes h1(x) in [0, 2^m), h2(x) in [0, 2^32)
 7      hash(meta.h1, HashAlgorithm.crc32, 0, {meta.flowid}, 1 << m);
 8      hash(meta.h2, HashAlgorithm.crc32_custom, 0, {meta.flowid}, 1 << 32);
 9      bit<32> existing_sample_h2;
10      MEM_hash.read(existing_sample_h2, meta.h1);
        // If the current packet has a smaller hash, replace the existing sample
11      if (meta.h2 < existing_sample_h2) {
12        MEM_hash.write(meta.h1, meta.h2);
13        MEM_id.write(meta.h1, meta.flowid);
14      }
15    }
16  }

We want to run two instances of our sampling algorithm simultaneously, one for packet sampling and the other for flow sampling. Recall that we select x_i = pid_i to sample packets, and x_i = fid_i to sample flows.

For correctness and accuracy guarantees, we require that at least M out of the γ·M slots are not empty, so that we obtain an M-sized uniform sample. We can choose γ ≥ 1 to expedite this process; in practice, a choice of γ = 1.5-2 suffices. See Section 3 for a detailed analysis of the performance for different γ values.

2.3.2 Implementation on programmable switches. To achieve Tbps-level aggregated throughput and low forwarding latency, a PISA [14] programmable switch uses a packet processing pipeline architecture that allows only simple operations per pipeline stage, and only has a certain number of hardware stages. The work of [10] summarized the PISA limitations. Most relevant to our case are the limited number of programmable pipeline stages, the limitations on memory access, and the limitations on arithmetic operations. AROMA's P4 implementation requires O(1) memory accesses per packet, and can be implemented using only 2 pipeline stages. Thus, it leaves plenty of room for the measurement switch to perform other network applications.

We now discuss some of the implementation details for performing uniform sampling in the data plane. At each programmable switch, we allocate two register memory arrays, each with γ·M entries. Note that we select γ such that γ·M = 2^m for some m ∈ N. Such a selection simplifies the implementation: we only have access to random bits, so hashing into an arbitrary (non power-of-two) range is more difficult to implement and incurs additional overheads.

We denote the register arrays as MEM_hash[i] and MEM_id[i], and store them in adjacent pipeline stages. The power-of-two sizing of the arrays allows easy addressing using an m-bit hash function h1.

Since each array entry stores an integer, we use a 32-bit hash function for h2(x). That is, the encoding of the real-valued number h2(x) ∈ (0, 1] is the integer 2^32 · h2(x) − 1. Hash functions h1 and h2 are implemented using CRC32 with different polynomials, and h1 is truncated to m bits. We also initialize all entries in MEM_hash to the maximum value 2^32 − 1.

As demonstrated by the P4 code shown in Algorithm 1, for each incoming packet ⟨fid_i, pid_i⟩, the programmable switch determines x_i (x_i ∈ {fid_i, pid_i}) and does the following:

(1) Access parsed header fields, such as the IPv4 source and destination addresses, to retrieve x_i (lines 5-6), then compute h1(x_i) ∈ [0, 2^m) and h2(x_i) ∈ [0, 2^32) (lines 7-8).
(2) Compare the value found in MEM_hash[h1(x_i)] to h2(x_i):
• If MEM_hash[h1(x_i)] <= h2(x_i): do nothing.
• If MEM_hash[h1(x_i)] > h2(x_i): replace the existing entry by setting MEM_hash[h1(x_i)] ← h2(x_i) and MEM_id[h1(x_i)] ← x_i, as shown in lines 12 and 13.

The resources required for performing uniform sampling in a PISA switch are quite minimal; using only two hardware pipeline stages is sufficient. Thus, we can simultaneously implement both flow sampling and packet sampling in existing programmable switch targets.
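For development and offline analysis it can help to mirror this data plane logic on the host. The following Python sketch is an illustrative reference model of the per-packet update under the integer encoding described above; the names (update, M_BITS) are ours, and zlib's CRC32 salted in two ways stands in for the two CRC32 polynomials the switch would use.

    import zlib

    M_BITS = 14                                   # m: the table has 2^m slots (illustrative value)
    SLOTS = 1 << M_BITS

    def h1(ident: bytes) -> int:
        # Slot index in [0, 2^m); the switch truncates a CRC32 to m bits.
        return zlib.crc32(b"slot:" + ident) & (SLOTS - 1)

    def h2(ident: bytes) -> int:
        # 32-bit encoding of a value in (0, 1]; a smaller integer means a smaller hash.
        return zlib.crc32(ident)

    def update(mem_hash, mem_id, ident: bytes):
        # Per-packet update mirroring Algorithm 1: keep the identifier with the minimal h2.
        i = h1(ident)
        v = h2(ident)
        if v < mem_hash[i]:
            mem_hash[i] = v
            mem_id[i] = ident

    # Registers start at the maximum value 2^32 - 1 (the encoding of h2 = 1):
    mem_hash = [2**32 - 1] * SLOTS
    mem_id = [None] * SLOTS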

2.4 Using the Samples in Control Plane
The controller merges the samples collected from all switches to form a global uniform sample set. This process is described in Section 2.4.1. In the subsequent sections, we briefly describe how the controller uses the sample set to perform various network measurement tasks. Additionally, Section 2.5 discusses useful measurement tasks that require minor extensions to AROMA.

2.4.1 Merging samples. First, we describe how to merge two samples into a single sample, as illustrated in Figure 2. Repeatedly applying this algorithm allows the controller to merge all the samples.

Given the samples collected by two switches, MEM_a[·] and MEM_b[·], AROMA merges them into MEM_ab[·] as follows: it iterates over the γ·M slots, and for each slot j it compares MEM_a[j].hash and MEM_b[j].hash and selects the smaller of the two values as the new value for MEM_ab[j].hash. It then sets MEM_ab[j].id and MEM_ab[j].count accordingly.

It is straightforward to prove that the resulting values in MEM_ab are the same as if all packets were observed by either switch a or switch b or both, as the smaller h2 hash value in each slot prevails. We repeat this process to merge the samples collected at all the switches, obtaining the global sample MEM_global[j].{hash, id}. We further trim MEM_global[·] to ignore empty slots.
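The merge is simple enough to state as code. The sketch below is an illustrative Python rendering of the controller-side merge (the function names and the (hash, id, count) triple representation are ours, not the paper's); hashes are kept as real values in (0, 1], with 1.0 marking an empty slot, and the count field anticipates the extension of Section 2.5.

    from functools import reduce

    def merge_samples(mem_a, mem_b):
        # Slot-by-slot merge: keep the entry whose h2 value is smaller; if both
        # switches sampled the same identifier, keep the larger count (Section 2.5).
        merged = []
        for (ha, ida, ca), (hb, idb, cb) in zip(mem_a, mem_b):
            if ida == idb:
                merged.append((min(ha, hb), ida, max(ca, cb)))
            elif ha <= hb:
                merged.append((ha, ida, ca))
            else:
                merged.append((hb, idb, cb))
        return merged

    def global_sample(per_switch_samples):
        # Fold all per-switch samples into one network-wide sample; slots whose
        # hash is still 1.0 are empty and are skipped when reading out samples.
        return reduce(merge_samples, per_switch_samples)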

Figure 2: An example of the merge process of two samples collected at different measurement switches. When the same slot contains different items, we select the one whose h2 value is smaller. In this example, we select D and discard L.

2.4.2 Number of packets/flows. Perhaps the most fundamental measurement task is to estimate the actual number of packets and flows within the measurement. As mentioned, in the routing oblivious setting, this is not equivalent to summing the number of flows or packets over the different measurement switches, as we do not know how many measurement switches each of them traversed. Similarly to the HyperLogLog algorithm [24], by looking at the hash value in each slot, we roughly know how many distinct identifiers contested for that slot.

More precisely, if the hash value in slot i is MEM_global[i].hash ∈ (0, 1], the expected total number of distinct identifiers hashed to this slot is 1/MEM_global[i].hash. Thus, by scaling up the harmonic mean of each slot's estimate, the total number of distinct identifiers seen by all the slots can be estimated by:

\[ \hat{V} = \frac{(\gamma \cdot M)^2}{\sum_{i=0}^{\gamma \cdot M - 1} MEM_{global}[i].hash}. \]

Given this estimated number of different packets/flows, we also get an estimate of the sampling probability. For that, assume that M̃ ∈ [0, γ·M] slots were filled (i.e., we have a uniform sample of size M̃); then the estimated sampling probability is p̂ = M̃/V̂. We require the estimated sampling probability for other measurement tasks (such as frequency estimation, superspreaders, and frequency distribution estimation).
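Continuing the Python sketch above, the two estimators of this subsection can be written as follows (again with illustrative names, and with hashes stored as real values in (0, 1] so that empty slots hold 1.0).

    def estimate_cardinality(mem_global):
        # V_hat = (gamma*M)^2 divided by the sum of the per-slot hash values (Section 2.4.2).
        gamma_m = len(mem_global)
        return gamma_m ** 2 / sum(h for (h, _id, _cnt) in mem_global)

    def estimate_sampling_probability(mem_global):
        # p_hat = M_tilde / V_hat, where M_tilde is the number of non-empty slots.
        m_tilde = sum(1 for (h, _id, _cnt) in mem_global if h < 1.0)
        return m_tilde / estimate_cardinality(mem_global)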

2.4.3 Distributed frequency estimation. To estimate the flow size f_x for a flow x, we inspect the uniform packet sample set and look at the packet identifiers. We denote by

T_x = |{0 ≤ i < γ·M | MEM_global[i].id belongs to flow x}|

the number of packets in the global uniform sample set that belong to flow x. Subsequently, we divide T_x by the estimated sampling probability p̂ to get the estimated flow size f̂_x = T_x/p̂.

2.4.4 Distributed heavy hitters. We can use the uniformly sampled packets to estimate heavy hitters, defined as those flows whose size f_x exceeds a θ fraction of the total packet traffic (|S|), i.e., f_x > θ·|S|. Our algorithm outputs every flow whose frequency in the sample is at least a θ-fraction of the sample size. For example, if θ = 1% and we gathered M̃ = 10000 samples, we output every flow that appears in the sample at least 100 times. If an application is more recall-oriented or precision-oriented, it is possible to change the threshold to get (with high probability) 100% accuracy in one of them (at the cost of degrading the other).
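A controller-side Python sketch of these two estimators might look as follows (illustrative names; flow_of is an assumed helper that recovers a packet's flow from the stored packet identifier, which Section 2.2 argues can be derived from header fields; p_hat comes from the packet-sampling instance).

    from collections import Counter

    def flow_frequencies(packet_sample, flow_of):
        # T_x: how many sampled packets belong to each flow x (Section 2.4.3).
        return Counter(flow_of(pid) for (h, pid, _cnt) in packet_sample if h < 1.0)

    def estimate_flow_size(packet_sample, flow_of, p_hat, x):
        # f_hat_x = T_x / p_hat.
        return flow_frequencies(packet_sample, flow_of)[x] / p_hat

    def heavy_hitters(packet_sample, flow_of, theta):
        # Report every flow whose share of the sample is at least theta (Section 2.4.4).
        t_counts = flow_frequencies(packet_sample, flow_of)
        m_tilde = sum(t_counts.values())
        return [x for x, cnt in t_counts.items() if cnt >= theta * m_tilde]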

2.4.5 Hierarchical heavy hitters. We look at the uniformly sampled packets to determine hierarchical heavy hitters. We report a prefix as a hierarchical heavy hitter if it appears in more than θ·M̃ sampled packets. Note that this is a simplified definition compared to the one in [8]. The exact definition is a bit cumbersome, but we also support it.
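Under the simplified definition above, the controller-side check is a single pass over the sampled packets, sketched below (prefixes_of is an assumed helper that returns the hierarchy of prefixes, e.g., of the source address, for a sampled packet).

    from collections import Counter

    def hierarchical_heavy_hitters(packet_sample, prefixes_of, theta):
        # Report every prefix that covers more than a theta-fraction of the sampled packets.
        per_prefix = Counter()
        m_tilde = 0
        for (h, pid, _cnt) in packet_sample:
            if h < 1.0:
                m_tilde += 1
                per_prefix.update(prefixes_of(pid))
        return [p for p, cnt in per_prefix.items() if cnt > theta * m_tilde]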

2.4.6 Superspreaders. We define a superspreader as a source IP address that communicates with more than θ destination IP addresses. Such an IP address appears in many flows and is therefore likely to appear in the uniform flow sample. Given a uniform sample of M̃ flows, we can examine the flow identifiers and check whether any source IP address appears more than θ·p̂ times; such a source IP is sending to more than θ destination IPs in expectation.
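The corresponding controller-side check over the flow sample is sketched below (src_of is an assumed helper that extracts the source IP from a stored flow identifier; p_hat here is the flow-sampling probability estimated as in Section 2.4.2).

    from collections import Counter

    def superspreaders(flow_sample, src_of, theta, p_hat):
        # A source that appears in more than theta * p_hat sampled flows is
        # estimated to contact more than theta distinct destinations.
        per_source = Counter(src_of(fid) for (h, fid, _cnt) in flow_sample if h < 1.0)
        return [src for src, cnt in per_source.items() if cnt > theta * p_hat]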

2.5 Extensions
In this section, we discuss several extensions to AROMA that require storing additional information within the uniform sample. We start by describing the possible extensions and then survey the measurement tasks they facilitate.

Count: The same identifier may appear many times at a single measurement switch. If we maintain a count of the number of repeated appearances, the count can be useful for estimating flow sizes (when counting repeated flow identifiers) or for detecting retransmissions (when counting repeated packet identifiers). To implement counting in the data plane, we add a register memory array MEM_count. When a sample first enters a slot, we reset the corresponding counter's value to 1. When a packet arrives with the same identifier as the existing sample in the slot, we increment the count. In the case of sampling flow identifiers, it is also straightforward to count bytes instead of packets.

When merging different samples with the same identifier, we choose the larger of the two counts. That is, when merging MEM_a and MEM_b, for any i ∈ [γ·M], if MEM_a[i].id = MEM_b[i].id, we set

MEM_ab[i].count = max(MEM_a[i].count, MEM_b[i].count).

Under the assumption that every flow has at least one measurement switch that sees all its traffic, this yields the flow size for the sampled flow identifier. Then, for a sampled flow x = MEM_global[i].id, we know its size f_x = MEM_global[i].count. Alternatively, if no such measurement switch exists, we can estimate the size of the flow using packet sampling.

TTL: The IP Time-to-Live (TTL) value of packet samples allows us to order samples of the same packet across multiple switches. To implement TTL recording for packet sampling in the data plane, we add a register memory array MEM_TTL. We record the TTL value from the packet's IP header whenever we sample a new packet. Using the TTL field enables the controller to quantify the traffic between every two measurement switches. Such information may be useful for traffic optimization.


2.5.1 Flow size distribution estimation. Since we uniformly sample flow identifiers regardless of flow size, the counts associated with the sampled flow identifiers {f_x} accurately represent the global flow size distribution. That is, we can use the number of sampled flows that have size i to estimate the overall number of flows of that size. To that end, we divide the number of i-sized flows in our flow sample by the estimated flow sampling probability.
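A minimal Python sketch of this estimator, assuming the (hash, id, count) slot representation used in the sketches above and the estimated flow-sampling probability p_hat:

    from collections import Counter

    def flow_size_distribution(flow_sample, p_hat):
        # Estimate F_i, the number of flows of size i, by scaling the histogram
        # of the counts stored with the uniformly sampled flow identifiers.
        sampled_sizes = Counter(cnt for (h, _fid, cnt) in flow_sample if h < 1.0)
        return {size: num / p_hat for size, num in sampled_sizes.items()}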

2.5.2 Identifying frequent paths and traffic splits. By looking at sampled packets and their TTL at every measurement switch, the controller can deduce the routing paths. This deduction allows estimating the actual switches that processed packets from a given large flow. It further allows us to create an approximate count for each path in the network, a capability that may be desirable in a routing-oblivious setting.

2.5.3 Estimating the retransmission rate. Under normal conditions, each packet is only seen once by any measurement switch. If we observe a count greater than one for a sampled packet, we know that several identical copies of the same packet have gone through one measurement switch. If the number is very high (e.g., >100), this is likely a routing loop; if the number is small (e.g., <5), it is likely due to packet loss and retransmissions.

Therefore, by looking at the counts associated with the packet samples, we can estimate the global average retransmission rate of the network. Let R be the average count of the samples, and let V̂ be the estimated total packet count; the global retransmission rate can then be estimated from (R − 1)·V̂.

2.5.4 Detecting suspected routing loops. A packet entering a routing loop will visit some measurement switches repeatedly, until exhausting its TTL. If we happen to sample this packet, we will see an abnormally high count.

When there is no problem in the network, the probability of retransmission is low. Thus, when we see a packet sample with a count higher than 5 (or a threshold proportional to V̂), we assume that there is a routing loop in the system. The controller can then use the packet sample (and routing information) to pinpoint the source of the problem.

3 ANALYSIS
In this section, we provide rigorous bounds on the accuracy of the algorithm. As a general note, we refer here to a packet stream which can be distributed in any way between the measurement switches, as long as each packet is measured at least once. All the results in this section also apply to flow sampling by replacing the packet stream with a "flow stream". Specifically, we analyze the guarantee for estimating flow sizes which, by simple reductions, also extends to HH and HHH. For superspreaders the analysis is also applicable, although the condition regarding the minimum number of packets (M) is replaced by a similar lower bound on the number of flows. The entire section assumes that the hash functions are independent. In practice, simpler hash functions suffice for offering similar accuracy guarantees [15].

Our goal is to estimate flow sizes, with high probability, up to an additive error of |S|·ε. This type of guarantee is standard in streaming algorithms and appears in [7, 19, 36, 43] and many others. However, due to the nature of our algorithm, we cannot provide this guarantee immediately but rather require convergence time. That is, unlike the solution of [7], we cannot always produce a uniform sample of size M after seeing M packets. Thus, we require more traffic to sample enough packets and achieve the desired accuracy. Intuitively, while [7] stores the packets with the highest (h2) hash in a heap, we first apply h1 and map the packet into a slot, and each such slot can hold a single packet. This simplification allows us to meet the programming model of the PISA switch, but as multiple packets may be hashed to the same slot before all the slots are full, it may take a while before we gather enough samples. Formally, hash collisions in h1 mean that some packets may not be sampled even if not all slots are full.

We denote by M̃(t) ∈ [0, γ·M] the number of non-empty slots in our algorithm after seeing t packets. We then utilize the result of [7], which shows the accuracy guarantee one gets from analyzing a uniform sample of size M. We say that our algorithm has converged once M̃(t) ≥ M, at which point we provide the accuracy guarantee.

LEMMA 3.1 ([7]). Let T ⊆ S be a random packet subset of size M ≥ ⌈3ε^{-2} log_2(2/δ)⌉. For a flow x ∈ U, let T_x be its frequency in the sample T. Then

\[ \Pr\left[\,\left| f_x - T_x \cdot |S|/M \right| \ge |S|\varepsilon \,\right] \le \delta. \]

If γ = 1, then the process of collecting the samples from the nonempty slots is known as the Coupon Collector problem [13]. In the Coupon Collector problem, a collector wishes to gather all M coupons while getting a single coupon, uniformly at random, at each step. Since the time to collect the next distinct coupon, having already collected i of them, is distributed geometrically with mean M/(M − i), the expected time to collect all coupons is Σ_{i=0}^{M−1} M/(M − i) = M ln M + O(M). To derive a high-probability bound, observe that the probability that a given coupon is not collected after r steps is (1 − 1/M)^r ≤ e^{−r/M}. By using the union bound and setting r = M ln(M/δ), we get that Pr[M̃(r) < M] ≤ M·e^{−r/M} ≤ δ. This analysis is directly applicable to our method for γ = 1, as we uniformly hash every packet into one of the M slots and the goal is to fill all slots. We can then choose M to guarantee the desired result with probability 1 − δ/2 and use the union bound to derive the standard (ε, δ)-guarantee. We summarize this in the following theorem.

THEOREM 3.2. For any ε, δ > 0, let M = ⌈3ε^{−2} log_2(4/δ)⌉; our algorithm (with γ = 1 and thus M slots) guarantees approximating flow sizes up to an (|S|ε)-additive error, with probability 1 − δ, given that the number of packets it processes is at least M·ln(2M/δ).

The above solution works well if the measurement is long enough with respect to the error parameters ε and δ. However, this may prove to be too lengthy for accurate measurements. For example, if ε = δ = 1%, we guarantee the convergence of the algorithm only after about 4.4 million packets.

3.1 Trading Space for Convergence Time
To shorten the convergence time, we explore the space-to-convergence-time tradeoff that γ values larger than 1 offer. Schematically, by increasing γ we pay a constant factor in the amount of space required, but reduce the convergence time asymptotically, as we now show. We note that while the coupon collector analysis above is standard, to the best of our knowledge, the process described in this section is novel to our work. In particular, we get that for any constant γ > 1 the number of packets until convergence drops to O(M), as summarized in the following theorem.

THEOREM 3.3. Let M ∈ N+, γ > 1, and denote β = 1 + 1/ln γ + ln(2/δ)/(M·ln γ). When allocated γ·M slots, the algorithm fills at least M of them after seeing β·M packets, with probability at least 1 − δ/2.

PROOF. For a subset of indices K ⊆ [γ·M] of size |K| = M − 1, define by I_K the indicator of the event that all packets were mapped only into the indices of K. This event is bad as it means that the algorithm cannot produce an M-sized uniform packet sample and fails to provide the approximation guarantee. Observe that the probability of this event is Pr[I_K] = ((M−1)/(γ·M))^{β·M}. Since the number of such subsets K is \binom{\gamma M}{M-1}, we can use the union bound to get that the probability that such a subset exists is at most

\[ \Pr\left[\exists K \subseteq [\gamma M] : |K| = M-1 \wedge I_K\right] \le \binom{\gamma M}{M-1} \cdot \left(\frac{M-1}{\gamma M}\right)^{\beta M} \le (e \cdot \gamma)^M \cdot \left(\frac{1}{\gamma}\right)^{\beta M} = \delta/2, \]

where the last inequality follows from the known binomial coefficient bound \binom{n}{k} \le (e \cdot n / k)^k. □

We once again use Lemma 3.1 to provide the error guarantee. To exemplify the reduction in convergence time, consider the above parameters (ε = δ = 1%) and γ = 2, which means doubling the space used. Our result now implies guaranteed convergence after only 630K packets, a reduction of over 85%.
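The numbers above can be reproduced with a few lines of Python (an illustrative back-of-the-envelope check, assuming the base-2 logarithm in the definition of M as in Theorems 3.2-3.4):

    import math

    eps, delta, gamma = 0.01, 0.01, 2.0
    M = math.ceil(3 * eps ** -2 * math.log2(4 / delta))      # 259,316 samples
    beta = 1 + 1 / math.log(gamma) + math.log(2 / delta) / (M * math.log(gamma))
    print(M, math.ceil(beta * M))                            # about 633K packets, i.e., the ~630K quoted above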

THEOREM 3.4. For any ε, δ > 0, let M = ⌈3ε^{−2} log_2(4/δ)⌉; our algorithm (with γ > 1 and γ·M slots) guarantees approximating flow sizes up to an (|S|ε)-additive error, with probability 1 − δ, given that it processes at least M·(1 + 1/ln γ + ln(2/δ)/(M·ln γ)) packets.

3.2 Empirical Convergence Time
The above analysis provides bounds on the number of packets required to collect M samples for a given γ value. In Figure 3, we show the actual convergence time (the minimal time t for which M̃(t) ≥ M), on synthetic data where all packet identifiers are unique. As the figure shows, even setting γ = 1 + 2^{−4}, which means a 6% space overhead, reduces the convergence time by nearly 80%. This observation also indicates that the above analysis provides an upper bound on the convergence time, while the performance is likely to be better in practice. Notice that the figure evaluates the number of packets required to fill M slots regardless of the underlying data, and is therefore applicable to any workload. We note that this is a simulation of the convergence rate and is therefore not restricted to having γ·M be a power of two.
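A simulation in the spirit of Figure 3 takes only a few lines of Python (an illustrative sketch with names of our choosing; since the identifiers are unique, hashing them with h1 is modeled as drawing slots uniformly at random):

    import random

    def convergence_time(M, gamma, seed=0):
        # Count how many unique identifiers must arrive before M of the
        # int(gamma * M) slots are non-empty.
        rng = random.Random(seed)
        slots = int(gamma * M)
        filled = set()
        packets = 0
        while len(filled) < M:
            filled.add(rng.randrange(slots))
            packets += 1
        return packets

    # Example: convergence_time(259316, 1 + 2**-4) typically lands far below the analytic bound.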

3.3 Pre-Convergence Performance
Another perspective for analyzing our algorithm is to consider the guarantee it provides at time t, even if t is smaller than the convergence time. That is, at time t we can extract M̃(t) uniform samples that allow for a guarantee of (ε(t), δ), where ε(t) is a monotonically decreasing function of t.

Figure 3: To provide the accuracy guarantee for ε = δ = 1%, we need M = 259316 samples. The (Analysis) curve shows the convergence time upper bound given in Theorems 3.2 and 3.4. The x-axis shows the space overhead of the algorithm. The red (Required) horizontal line shows M, which is a lower bound on the convergence time for any γ value. The number of packets needed to obtain M samples in practice (Actual) is computed 10 times and appears with 95% error bars.

To that end, we need to find a lower bound M(t) on the number of samples collected as a function of time t, such that Pr[M̃(t) < M(t)] < δ/2. We now show that we can guarantee M(t) ≥ min{t, γ·M}/3 − O(√(t log δ^{−1})). According to Lemma 3.1, this means that we have an (ε(t), δ) guarantee for

\[ \varepsilon(t) \approx \sqrt{\frac{3 \ln(4/\delta)}{\min\{t, \gamma \cdot M\}/3}}. \]

This bound, while quite loose, shows that we have the same asymptotic performance as the algorithm of [7], which cannot be implemented in P4 and runs in logarithmic worst-case time. In the proof we use the following version of the Chernoff bound [37].

LEMMA 3.5 (Chernoff Bound). Let X ∼ Bin(n, p) be a binomial random variable with mean np; then, for any ψ ∈ [0, 1]:

\[ \Pr[X < np(1 - \psi)] \le e^{-(\psi^2 \cdot np)/2}. \]

For convenience, we denote Y = min{t, γ·M}.

THEOREM 3.6. After seeing t packets, the number of non-empty slots satisfies

\[ \Pr\left[\tilde{M}(t) < Y/3 - \sqrt{2Y \ln(4/\delta)}\right] < \delta/2. \]

PROOF. Consider the first Y/3 packets; since the number of filled slots was at most Y/3, each such packet reached an empty slot with probability at least 2/3. Therefore, we can lower bound the number of slots these packets filled by a binomial random variable with mean Y/3 · 2/3 = 2Y/9. Similarly, the number of slots the next Y/3 packets fill can be lower bounded by a binomial variable with mean Y/9. Together, during the first 2Y/3 packets we fill at least Y/3 slots in expectation. For any ψ > 0, we can then use the Chernoff bound on each such interval and get that, with probability at least 1 − 2e^{−(ψ^2·Y/9)/2}, we have filled at least (Y/3)(1 − ψ) slots. Setting ψ = √(18 ln(4/δ)/Y) completes the proof. □

We conclude the following performance guarantee before theconvergence of the algorithm.


COROLLARY 3.7. For any ε, δ > 0, let M = ⌈3ε^{−2} log_2(4/δ)⌉ and let t ≤ γ·M; then our algorithm estimates flow sizes with an (ε(t), δ)-guarantee for

\[ \varepsilon(t) = \sqrt{\frac{3 \ln(4/\delta)}{t/3 - \sqrt{2t \ln(4/\delta)}}}. \]

4 EVALUATION
Dataset: We used the CAIDA Anonymized Internet Trace 2018 [1], from the "Equinix-New York" high-speed monitor (New York 2018). We summarize the number of distinct flows, for a given stream length, in Table 2.

Length  2^16  2^17  2^18  2^19  2^20  2^21  2^22  2^23  2^24  2^25
#flows   15K   26K   41K   66K  107K  183K  314K  550K  967K  1.69M

Table 2: The number of distinct 5-tuples in the measurement as a function of the number of packets in the trace. As we show below, when the number of packets/flows is large with respect to the memory size, FlowRadar fails to decode the flow sizes.

Metrics: We consider the following performance metrics (a short computational sketch follows the list):
(1) Root Mean Square Error (RMSE): Measures the difference between the predicted values of an estimator and the actual values. Formally, for each flow x the estimated frequency is f̂_x and the real frequency is f_x. RMSE is calculated as:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{|U|} \sum_{x \in U} (\hat{f}_x - f_x)^2}. \]

(2) F1 Score: A quantity that represents the quality of a set approximation (e.g., the set of heavy hitters). This metric combines precision (the fraction of reported flows that are correct) and recall (the fraction of true flows that were reported) into a single numerical value: F1 = 2·(precision·recall)/(precision + recall).

(3) Weighted Mean Relative Difference (WMRD): Consider the set of flow sizes {f_x | x ∈ U} and let z be the size of the largest flow. Denote by F_i = |{x ∈ U | f_x = i}| the number of flows of size i, and let F̂_i be the estimate produced by an algorithm for F_i. Define the sum of absolute errors as E = Σ_{i=1}^{z} |F_i − F̂_i| and the sum of averages as A = Σ_{i=1}^{z} (F_i + F̂_i)/2. The metric is then defined as WMRD = E/A. WMRD is always between 0 and 2, with a perfect match being 0 and complete disagreement being 2.
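The three metrics can be computed as in the following illustrative Python sketch (function and argument names are ours):

    import math

    def rmse(est, true, universe):
        # Root Mean Square Error over all flows in the universe (missing flows count as 0).
        return math.sqrt(sum((est.get(x, 0) - true.get(x, 0)) ** 2 for x in universe) / len(universe))

    def f1(reported, actual):
        # F1 score of a reported set against the ground-truth set (e.g., heavy hitters).
        tp = len(reported & actual)
        precision = tp / len(reported) if reported else 0.0
        recall = tp / len(actual) if actual else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def wmrd(F_est, F_true):
        # Weighted Mean Relative Difference between two flow-size histograms
        # (dictionaries mapping a flow size i to the number of flows of that size).
        sizes = set(F_est) | set(F_true)
        E = sum(abs(F_true.get(i, 0) - F_est.get(i, 0)) for i in sizes)
        A = sum((F_true.get(i, 0) + F_est.get(i, 0)) / 2 for i in sizes)
        return E / A if A else 0.0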

Evaluation Parameters: In Figures 4-6 we used the first 2^25 ≈ 33.55 million packets. The x-axis in these plots is the allocated per-switch space. We define a heavy hitter as a flow whose size is at least 0.1% of the overall number of packets in the measurement. For the first 2^25 packets of the New York 2018 dataset this amounts to 35 heavy hitters. Similarly, we define a hierarchical heavy hitter as a source network whose size is more than 0.1% of the overall traffic; this follows the HHH definition of [46]. We define a superspreader as a source IP that communicated with at least θ = 1000 distinct destination IPs. For these parameters, we measured 54 such sources.

In Figures 7-8, where the number of packets varies, we keep the 0.1% threshold for HH and HHH and set the SS threshold such that there are approximately 50 superspreaders for each data point.

4.1 Evaluation Results
4.1.1 Comparison with uniform sampling and software solutions. We start our evaluation by motivating the need for AROMA, comparing its accuracy to that of naive uniform sampling. Recall that the inherent problem in utilizing plain sampling for network-wide measurement is that packets that traverse longer paths have more opportunities to be sampled, and thus appear in the sample with a higher probability. Additionally, we compare AROMA to the BEFMR18 software routing oblivious algorithm of [7], which is not implementable in hardware (see Section 5.5). We used the Internet hop-count distribution measured by [23, 42], and assumed a measurement switch at each hop. Specifically, this model assumes that the probability of k hops (for a given flow) is

\[ \Pr[k \text{ hops}] = \frac{1 + o(1)}{N} \sum_{m=0}^{k} \frac{c_{m+1} (\ln N)^{k-m}}{(k-m)!}, \]

where c_i is the i'th Taylor coefficient of the reciprocal of the Gamma function 1/Γ(z) [2, Table 6.1.36]. The work of [42] models the actual hop-count distribution of the Internet at that time as the distribution for N = 98400, which gives a median hop count of 12.

We deploy measurement switches on each hop and normalize the frequency at the controller by either the mean or the median of the hop-count distribution. Figure 4 shows the results of this evaluation, where Median (Mean) is uniform sampling normalized by the median (mean) value, AROMA is our algorithm, and Software refers to [7]. Figure 4a shows the results for estimating per-flow frequency. As can be observed, the mean normalization provides better accuracy for (plain) random sampling. Our method and Software are almost identical and are considerably more accurate over a wide range of sampling probabilities. Figure 4b shows the F1 score when calculating heavy hitters; as can be observed, our method and the software method offer higher F1 values than plain random sampling. Note that for this application, it is unclear which normalization (by mean or by median) is superior. Intuitively, the performance of uniform sampling suffers from flows whose hop count is significantly different from the mean or median and are therefore grossly underestimated or overestimated.

We conclude that network-wide distributed sampling performs better in practice than (plain) uniform sampling.

4.1.2 Comparison with existing hardware solutions. Figure 5 shows an evaluation of AROMA's performance compared to existing works, for various measurement tasks. We also compared with FlowRadar [34] in various configurations. Throughout the evaluation, we consider FlowRadar's estimate of the size of a flow that it failed to decode to be zero. For the FatTree topology we assume that the flows are distributed uniformly, the easiest setting for FlowRadar.

• 1SFR - FlowRadar where all packets go through a single switch.

• FTFR - FlowRadar deployed on all switches of a k=8 fat tree with FlowDecode (the faster decoding procedure).

• FTFRND - FlowRadar deployed on all switches of a k=8 fat tree with NetDecode (the more accurate but slower decoding procedure).


Figure 4 ((a) Frequency Estimation, (b) Heavy Hitters): Root Mean Square Error for frequency estimation (lower is better), and F1 score for heavy hitters (higher is better), when comparing our method to (plain) random sampling, on an Internet-like hop-count distribution.

Figure 5 ((a) Heavy Hitters, (b) Hierarchical Heavy Hitters, (c) Superspreaders): F1 scores (higher is better) on the New York 2018 dataset, for various network measurement tasks and per-switch space.

• EOFR - FlowRadar deployed on all edge switches of a k=8 fat tree with packets only measured once (on the first edge switch they visit).

Figure 5a shows the F1 metric for heavy hitter measurement (higher F1 values are better). FlowRadar (in all tested scenarios) fails to provide any meaningful information until circa 1 MB of space; from that point on, it rapidly improves with more space until it provides an exact measurement (F1=1), which is better than our approach. The success of FlowRadar depends on the measurement length and workload, and there are various configurations where our method is superior.

Figure 5b shows results for the hierarchical heavy hitters task, and Figure 5c for superspreader measurements; as can be observed, the qualitative behavior is the same as in the heavy hitter case. AROMA can operate and provide accurate measurements while FlowRadar fails unless it is allocated enough space. Thus, for these tasks, our algorithms are superior when given a small amount of space, and inferior when there is enough space to run FlowRadar efficiently.

Figure 6 shows results for our packet sampling algorithm on the frequency estimation problem, and for our flow sampling algorithm on the flow size distribution estimation problem. In Figure 6a, we can see that our approach continuously improves when given more space. In contrast, the various FlowRadar configurations are very inaccurate until there is enough memory, and then they have no error at all. Still, AROMA outperforms FlowRadar in many configurations.

In Figure 6b, we see results for the flow size distribution estimation task. As can be observed, our method is very accurate. In the FlowRadar configurations, we again see the "cliff" where the algorithms do not work until there is enough memory allocated. Notice that the required memory for them to work is several megabytes, whereas our algorithm is accurate even with a few kilobytes.

Next, we allocate 250KB for each switch and monitor the accu-racy throughout the trace. Figure 7 shows the F1 score (higher isbetter) for various applications, where varying the stream length.As can be seen, initially FlowRadar con�gurations achieve accuratemeasurement (F1 score of 1). Then, as the measurement prolongs,

Page 10: AROMA: Routing Oblivious Measurement Analyticsxiaoqic/documents/draft-AROMA.pdf · module that stores and maintains the samples, and a measurement analysis module which runs on the

Figure 6: Root Mean Square Error and Weighted Mean Relative Difference (lower is better) for the frequency estimation and flow size distribution estimation tasks on the New York 2018 dataset. (a) Frequency Estimation, (b) Flow Size Distribution Estimation.

Figure 7: F1 score (higher is better) on the New York 2018 dataset for various network measurement tasks, varying the measurement's length. (a) Heavy Hitters, (b) Hierarchical Heavy Hitters, (c) Superspreaders.

Then, as the measurement lengthens, we encounter more flows, and the FlowRadar configurations begin to fail. Once we reach 32 million packets, all the FlowRadar configurations become ineffective. In contrast, our sampling-based approach is relatively accurate throughout the measurement. The conclusion is that AROMA is superior when there is insufficient memory for an accurate measurement.

Figure 8 shows results for the frequency estimation and flow size distribution estimation tasks (lower is better). In Figure 8a, we see that the accuracy of our method degrades gracefully throughout the measurement, whereas FlowRadar's accuracy degrades less gracefully as the measurement lengthens. When the measurement is long enough, our approach is more accurate than all FlowRadar configurations. Figure 8b shows the flow size distribution estimation accuracy throughout the measurement. Here, the results are qualitatively similar, but our method does much better than the FlowRadar configurations.

4.1.3 Performance breakdown. For the HH, HHH, and SS tasks, we used F1 as a single metric for evaluating the performance of the algorithms. However, the underlying precision and recall of the algorithms are not the same. Specifically, our algorithms provide

near-perfect precision and recall, while FlowRadar gives perfect precision but poorer recall. The reason is that FlowRadar provides the exact sizes of the flows it can decode and can thus determine whether each decoded flow is a heavy hitter. For completeness, we show the precision and recall performance in Figure 9.
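For reference, the F1 score is the harmonic mean of precision and recall over the set of reported items:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$

Consequently, FlowRadar's perfect precision combined with poor recall still yields a low F1 score, consistent with the behavior in Figures 5 and 7.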

5 RELATED WORK
The problem of network-wide measurement is well studied under multiple assumptions. We now survey alternative solutions and compare them to our work.

5.1 Single measurement switch, per-packet solutions

Numerous solutions assume that we only count each packet at a single measurement switch [9, 17, 28], thereby avoiding the fundamental problem we address in this work. When each packet is only measured at a single measurement switch, the network-wide problem is simplified into a merging problem, in which data from different measurement switches is merged by the controller.


Figure 8: Root Mean Square Error and Weighted Mean Relative Difference (lower is better) on the New York 2018 dataset, varying the measurement's length. (a) Frequency Estimation, (b) Flow Size Distribution Estimation.

Indeed, many measurement algorithms support such merging [4, 5, 12, 18, 31]. In our context, placing measurement switches so that each packet is only measured once may restrict the measurement switch placement, and it requires knowledge of the topology and routing protocols in the network, which is undesirable.
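For concreteness, here is a minimal sketch (ours, not taken from the cited works) of how two identically configured Count-Min summaries [18] can be merged by element-wise addition of their counters:

```python
import zlib

class CountMin:
    """A minimal Count-Min sketch: rows x cols counters with per-row hashing."""
    def __init__(self, rows=4, cols=1024):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # Deterministic per-row hash so identically configured sketches agree.
        return zlib.crc32(f"{row}:{key}".encode()) % self.cols

    def update(self, key, count=1):
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += count

    def query(self, key):
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))

    def merge(self, other):
        """Merge by element-wise addition; valid only when both sketches use
        the same dimensions and hash functions."""
        assert (self.rows, self.cols) == (other.rows, other.cols)
        for r in range(self.rows):
            for c in range(self.cols):
                self.table[r][c] += other.table[r][c]
```

Note that this merge is only meaningful if every packet is counted at exactly one switch; when packets traverse several measurement switches, the merged counters over-count, which is precisely the routing-dependent problem AROMA sidesteps.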

5.2 Packet Marking
The work of [3] suggests marking measured packets by exploiting unused bits in the IP header fields. That way, we can make sure that each packet is only measured once, regardless of the number of measurement switches it traverses. However, this simple and effective method implicitly restricts measurement switch deployment. Intuitively, we need to verify that the unused bits are clear before packets enter our network; otherwise, the method may fail due to a proprietary use of these bits in other networks. Worse yet, an attacker can mark its packets and avoid detection. Further, good citizenship dictates clearing the bits as packets leave the network. Such requirements restrict the measurement switch deployment and require an understanding of the routing and topology of the system. Our approach requires no such knowledge.
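A minimal sketch of the marking logic described above (the flag position and field names are illustrative and are not the exact encoding used by [3]):

```python
MEASURED_FLAG = 0x1  # hypothetical position of an otherwise-unused IP-header bit

def process_packet(pkt, sketch):
    """Count a packet only if no upstream measurement switch has marked it."""
    if pkt.flags & MEASURED_FLAG:
        return                       # already measured elsewhere; skip it
    sketch.update(pkt.flow_key)      # count the packet exactly once
    pkt.flags |= MEASURED_FLAG       # mark it so downstream switches skip it
```

The sketch also makes the failure mode explicit: any packet that arrives with the bit already set, whether due to proprietary use in an upstream network or an adversary setting it deliberately, is silently excluded from the measurement.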

5.3 Single per-flow Path Solutions
FlowRadar [34], EverFlow, and Trajectory Sampling [21, 48] assume that each flow is routed on a single path. The single-path limitation is restrictive, as routing changes are an integral part of modern networks. Such changes are often made for failure recovery, traffic engineering, and load balancing. Worse yet, multicast capabilities and Multipath TCP [25] mean that the same network flow is routed through multiple paths. Our work differs from FlowRadar in the following ways: (i) Space: FlowRadar measures each flow precisely, so its required memory is linear in the number of flows, whereas the space required by our solution is sub-linear in the number of flows. (ii) Assumptions: FlowRadar assumes that each flow is routed on a single path and thus does not support multi-path technologies; it also cannot cope with routing changes due to failures and optimization.

5.4 Flow Sampling Techniques
Several solutions have been proposed for flow sampling [22, 30, 39]. Specifically, in [39], the authors present cSamp, a flow sampling method that performs hash-based packet selection to coordinate between the measurement switches. cSamp performs network-wide monitoring by distributing responsibilities across the measurement switches in the network. The framework is responsive to routing, topology, and network dynamics, and shifts the responsibilities according to the network changes. In contrast, AROMA achieves network-wide uniform flow sampling without assigning specific responsibilities and is therefore not affected by the network dynamics.
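To make the contrast concrete, here is a minimal sketch of cSamp-style hash-range sampling under our own simplifying assumptions (the controller-computed range boundaries and the per-OD-pair bookkeeping of the real system are omitted):

```python
import zlib

def flow_hash(flow_key: str) -> float:
    """Map a flow key deterministically into [0, 1)."""
    return (zlib.crc32(flow_key.encode()) & 0xFFFFFFFF) / 2**32

def should_sample(flow_key: str, assigned_range: tuple) -> bool:
    """A switch samples a flow only if its hash lands in the hash range the
    controller assigned to this switch for the flow's path."""
    lo, hi = assigned_range
    return lo <= flow_hash(flow_key) < hi

# Two switches on the same path split [0, 1), so each flow is sampled exactly once.
print(should_sample("10.0.0.1>10.0.0.2:443", (0.0, 0.5)))
print(should_sample("10.0.0.1>10.0.0.2:443", (0.5, 1.0)))
```

The assigned ranges must be recomputed whenever routing or traffic shifts; if a flow's path no longer covers the switches responsible for its hash range, it goes unsampled. AROMA avoids this coordination entirely because every switch applies the same sampling rule independently.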

5.5 Other routing oblivious solutions
The BEFMR18 software algorithm [7] performs network-wide measurements through uniform packet sampling. That is, their work makes the same assumptions as our own; however, the difference is twofold. First, our work complies with a more restrictive model that is implementable in high-performance programmable switches. Further, our method also performs flow sampling, which extends its expressiveness. Specifically, the work of [40] demonstrates that a combination of flow and packet sampling satisfies the needs of many network applications. When considering flow sampling [22, 30, 39], our method is the first that makes no assumptions on the statistical properties of the underlying traffic.

Next, we survey the BEFMR18 algorithm, explain its inner workings, and discuss why it is not implementable as is within the PISA architecture. In BEFMR18, each packet has a key that is derived from its header fields, and each measurement switch computes the hash value of each packet. Measurement switches maintain a k-sized reservoir of the packets whose hash values are smallest.


Figure 9: The precision and recall breakdown of the algorithms for the HH and SS tasks. FlowRadar always has perfect precision, as any flow it decodes is fully accurate. (a) Heavy Hitters Recall, (b) Heavy Hitters Precision, (c) Superspreaders Recall, (d) Superspreaders Precision.

In addition, measurement switches also maintain a count-distinct algorithm (e.g., [24, 29]) to estimate the number of packets.

The controller collects data from all measurement switches and processes it to identify the k packets with globally minimal hash values. It then merges all the count-distinct algorithms to obtain the total number of packets in the measurement. The authors show that the collected sample is uniform and can solve heavy hitters and per-flow frequency estimation. Intuitively, even if a packet traverses multiple measurement switches, it receives the same hash value in each of them and does not affect the globally minimal hashes in the system.
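A minimal software sketch of this scheme, as we understand it (the heap-based reservoir is our illustration of where the logarithmic per-packet update cost, mentioned in the conclusion, comes from):

```python
import heapq
import zlib

K = 4  # reservoir size; in practice k is much larger

def pkt_hash(pkt_id: str) -> float:
    """Shared hash: the same packet receives the same value at every switch."""
    return (zlib.crc32(pkt_id.encode()) & 0xFFFFFFFF) / 2**32

class MeasurementSwitch:
    """Keeps the k packets with the smallest hash values seen locally."""
    def __init__(self):
        self.reservoir = []  # max-heap on hash value (stored negated)

    def process(self, pkt_id: str):
        h = pkt_hash(pkt_id)
        if len(self.reservoir) < K:
            heapq.heappush(self.reservoir, (-h, pkt_id))      # O(log k) update
        elif h < -self.reservoir[0][0]:
            heapq.heapreplace(self.reservoir, (-h, pkt_id))   # O(log k) update

def controller_merge(switches):
    """Recover the k packets with the globally minimal hash values.
    A packet seen at several switches collapses to a single entry because it
    hashes identically everywhere, so the sample is not biased by routing."""
    seen = {pkt: -neg_h for sw in switches for neg_h, pkt in sw.reservoir}
    return sorted(seen.items(), key=lambda kv: kv[1])[:K]
```

Per the conclusion, AROMA replaces this heap with a k-partition structure in which each packet touches a single slot, reducing the per-packet update cost from logarithmic in k to a constant.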

5.6 k-partition algorithms
We note that our approach can be categorized as a type of k-partition sketch. The usage of such sketches is common in the literature [24, 26, 29]. Most well known is the HyperLogLog [24] algorithm, which stores in each slot only the maximal number of leading zeroes observed among the hashes mapped to it, and uses harmonic averaging to estimate the number of distinct elements. While the structure is similar, AROMA utilizes it in a novel manner to collect random samples, whereas the existing techniques utilize it to estimate the number of distinct items.
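For reference, the raw HyperLogLog estimate over $m$ registers $M[1], \dots, M[m]$ (where $M[j]$ is the maximal leading-zero-based rank observed in slot $j$) is

$$\hat{n} \;=\; \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M[j]} \right)^{-1},$$

where $\alpha_m$ is a bias-correction constant ($\alpha_m \approx 0.7213/(1 + 1.079/m)$ for large $m$) [24]. AROMA reuses a similar slot layout but, as noted above, to retain samples rather than distinct-count statistics.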

6 CONCLUSION
We introduced AROMA, a network-wide measurement infrastructure that enables uniform network-wide flow and packet sampling in PISA switches. AROMA does not make any assumptions regarding routing and is very flexible with respect to the placement of the measurement switches in the network.

We proved formal accuracy guarantees and demonstrated the ability to perform a variety of network measurement tasks. We evaluated AROMA through simulations with different topologies, per-switch memory, and measurement lengths. We showed that AROMA outperforms uniform sampling and that it allows accurate measurements in memory-constrained configurations where the previous works are inapplicable.

AROMA's novelty extends beyond programmable switches. It is the first technique to perform flow sampling without assumptions on the workload or coordination between the switches. Interestingly, it also has advantages in software implementations; specifically, it improves the update time of the existing (software) network-wide packet sampling technique [7] from logarithmic to constant.


REFERENCES
[1] The CAIDA equinix-nyc anonymized internet traces 2018.
[2] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, volume 55. Courier Corporation, 1965.
[3] Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish, and Liron Schiff. Detecting heavy flows in the SDN match and action model. Computer Networks, 2018.
[4] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In ACM PODS, 2012.
[5] Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, and Justin Thaler. A high-performance algorithm for identifying frequent items in data streams. In ACM IMC, 2017.

[6] Arpit Gupta, Rob Harrison, Ankita Pawar, Rüdiger Birkner, Marco Canini, Nick Feamster, Jennifer Rexford, and Walter Willinger. Sonata: Query-driven network telemetry. In ACM SIGCOMM, 2018.

[7] Ran Ben Basat, Gil Einziger, Shir Landau Feibish, Jalil Moraney, and Danny Raz. Network-wide routing-oblivious heavy hitters. In IEEE/ACM ANCS (short paper), 2018.
[8] Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, and Erez Waisbard. Constant time updates in hierarchical heavy hitters. In ACM SIGCOMM, 2017.
[9] Ran Ben Basat, Gil Einziger, Isaac Keslassy, Ariel Orda, Shay Vargaftik, and Erez Waisbard. Memento: Making sliding windows efficient for heavy hitters. In ACM CoNEXT, 2018.
[10] Ran Ben-Basat, Xiaoqi Chen, Gil Einziger, and Ori Rottenstreich. Efficient measurement on programmable switches using probabilistic recirculation. In IEEE ICNP, 2018.
[11] Theophilus Benson, Ashok Anand, Aditya Akella, and Ming Zhang. MicroTE: Fine grained traffic engineering for data centers. In ACM CoNEXT, 2011.
[12] Radu Berinde, Piotr Indyk, Graham Cormode, and Martin J. Strauss. Space-optimal heavy hitters with strong error bounds. ACM Trans. Database Syst., 2010.
[13] Gunnar Blom, Lars Holst, and Dennis Sandell. Problems and Snapshots from the World of Probability. Springer Science & Business Media, 2012.
[14] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In ACM SIGCOMM Computer Communication Review, 2013.
[15] Kai-Min Chung, Michael Mitzenmacher, and Salil P. Vadhan. Why simple hash functions work: Exploiting the entropy in a data stream. Theory of Computing, 2013.
[16] Edith Cohen and Haim Kaplan. What you can do with coordinated samples. CoRR, abs/1206.5637, 2012.
[17] Graham Cormode. Continuous distributed monitoring: A short survey. In AlMoDEP, 2011.
[18] Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. J. Algorithms, 55, 2004.
[19] Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. Frequency estimation of internet packet streams with limited space. In ESA, 2002.
[20] Gero Dittmann and Andreas Herkersdorf. Network processor load balancing for high-speed links. In Proc. of the 2002 Int. Symp. on Performance Evaluation of Computer and Telecommunication Systems, volume 735.
[21] N. G. Duffield and Matthias Grossglauser. Trajectory sampling for direct traffic observation. IEEE/ACM Trans. Netw., 2001.
[22] Nick G. Duffield, Carsten Lund, and Mikkel Thorup. Flow sampling under hard resource constraints. In ACM SIGMETRICS, 2004.
[23] Aiguo Fei, Guangyu Pei, Roy Liu, and Lixia Zhang. Measurements on delay and hop-count of the internet. In IEEE GLOBECOM - Internet Mini-Conference, 1998.
[24] Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm. In AOFA, 2007.
[25] Alan Ford, Costin Raiciu, Mark J. Handley, and Olivier Bonaventure. TCP Extensions for Multipath Operation with Multiple Addresses. RFC 6824, 2013.
[26] Éric Fusy and Frédéric Giroire. Estimating the number of active flows in a data stream over a sliding window. In ANALCO, 2007.
[27] Pedro García-Teodoro, Jesús E. Díaz-Verdejo, Gabriel Maciá-Fernández, and E. Vázquez. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers and Security, pages 18–28, 2009.
[28] Rob Harrison, Qizhe Cai, Arpit Gupta, and Jennifer Rexford. Network-wide heavy hitter detection with commodity switches. In ACM SOSR, 2018.
[29] Stefan Heule, Marc Nunkesser, and Alexander Hall. HyperLogLog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In ACM EDBT, 2013.
[30] Nicolas Hohn and Darryl Veitch. Inverting sampled traffic. In ACM IMC, 2003.
[31] Qun Huang, Xin Jin, Patrick P. C. Lee, Runhui Li, Lu Tang, Yi-Chao Chen, and Gong Zhang. SketchVisor: Robust network measurement for software packet processing. In ACM SIGCOMM, 2017.
[32] Jaeyeon Jung, V. Paxson, A. W. Berger, and H. Balakrishnan. Fast portscan detection using sequential hypothesis testing. In IEEE Symposium on Security and Privacy, 2004.
[33] Abdul Kabbani, Mohammad Alizadeh, Masato Yasuda, Rong Pan, and Balaji Prabhakar. AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers. In IEEE HOTI, 2010.
[34] Yuliang Li, Rui Miao, Changhoon Kim, and Minlan Yu. FlowRadar: A better NetFlow for data centers. In USENIX NSDI, 2016.
[35] Zaoxing Liu, Antonis Manousis, Gregory Vorsanger, Vyas Sekar, and Vladimir Braverman. One sketch to rule them all: Rethinking network flow monitoring with UnivMon. In ACM SIGCOMM, 2016.
[36] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Efficient computation of frequent and top-k elements in data streams. In ICDT, 2005.
[37] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[38] B. Mukherjee, L.T. Heberlein, and K.N. Levitt. Network intrusion detection. IEEE Network, 1994.
[39] Vyas Sekar, Michael K. Reiter, Walter Willinger, Hui Zhang, Ramana Rao Kompella, and David G. Andersen. cSamp: A system for network-wide flow monitoring. In USENIX NSDI, 2008.
[40] Vyas Sekar, Michael K. Reiter, and Hui Zhang. Revisiting the case for a minimalist approach for network flow monitoring. In ACM IMC, 2010.
[41] Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, S. Muthukrishnan, and Jennifer Rexford. Heavy-hitter detection entirely in the data plane. In ACM SOSR, 2017.
[42] P. Van Mieghem, Gerard Hooghiemstra, and Remco Hofstad. A scaling law for the hopcount in internet. 2001.
[43] Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig. Elastic Sketch: Adaptive and fast network-wide measurements. In ACM SIGCOMM, 2018.
[44] Ke Yi and Qin Zhang. Optimal tracking of distributed heavy hitters and quantiles. Algorithmica, 2013.
[45] Minlan Yu, Lavanya Jose, and Rui Miao. Software defined traffic measurement with OpenSketch. In USENIX NSDI, 2013.
[46] Yin Zhang, Sumeet Singh, Subhabrata Sen, Nick Duffield, and Carsten Lund. Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications. In ACM IMC, 2004.
[47] Haiquan Zhao, Ashwin Lall, Mitsunori Ogihara, and Jun Xu. Global iceberg detection over distributed data streams. 2010.
[48] Yibo Zhu, Nanxi Kang, Jiaxin Cao, Albert Greenberg, Guohan Lu, Ratul Mahajan, Dave Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao, and Haitao Zheng. Packet-level telemetry in large datacenter networks. In ACM SIGCOMM, 2015.