Block Permutations in Boolean Space to Minimize TCAM for Packet Classification Authors: Rihua Wei,...

Post on 26-Dec-2015

Block Permutations in Boolean Space to Minimize TCAM for Packet Classification

Authors: Rihua Wei, Yang Xu, H. Jonathan Chao

Publisher: IEEE INFOCOM, 2012

Presenter: Jia-Wei, Yo

Date: 2012/2/8

Introduction

Ternary Content Addressable Memories (TCAMs) have been widely used to implement packet classification because of their parallel search capability and constant processing speed.

In rule r1, both the source port and the destination port contain the range [1,5]. Each of them needs to be expanded to three prefixes, i.e., “001”, “01*”, and “10*”. Combining the prefix specifications of the two ranges consumes 3 x 3 = 9 TCAM entries, which causes the well-known range expansion problem.
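The expansion above can be reproduced with a short sketch. The function name `range_to_prefixes` and the 3-bit field width are assumptions chosen to match the slide's example; they are not taken from the paper.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into ternary prefixes of `width` bits."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... and still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1  # trailing wildcard bits
        fixed_bits = width - free_bits
        fixed = format(lo >> free_bits, "b").zfill(fixed_bits) if fixed_bits else ""
        prefixes.append(fixed + "*" * free_bits)
        lo += size
    return prefixes


port_prefixes = range_to_prefixes(1, 5, 3)
# One TCAM entry is needed per (source prefix, destination prefix) pair.
entries = [(src, dst) for src in port_prefixes for dst in port_prefixes]
print(port_prefixes, len(entries))
```

For the range [1,5] this yields the three prefixes 001, 01* and 10*, and the cross product of the two port fields has 9 entries.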

• The authors propose a novel technique called Block Permutation (BP) to compress the packet classification rules stored in TCAMs.

Related work

In Figure 3 (b), the rule elements are spread sparsely and no two neighboring rule elements have the same action; thus, there are no two elements in the Karnaugh map that can be directly merged using logic optimization.

Block Permutation

Permutation: 01-- <> 11--

Ex : 0110 => Ex' : 1110

Before       After
B1 : 0001    B1  : 0001
B2 : 1101    B2' : 0101
B3 : 0010    B3  : 0010
B4 : 1110    B4' : 0110
B5 : ****    B5  : ****

B1 and B2' merge to B6; B3 and B4' merge to B7.
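The example above can be traced with a minimal sketch. It assumes that “-” marks a free bit of the two swapped blocks and that two entries merge when they differ in exactly one specified bit; the helper names `inside`, `permute_point` and `merge` are hypothetical.

```python
def inside(point, block):
    """True if the binary point lies in the block ('-' marks a free bit)."""
    return all(b in ("-", p) for p, b in zip(point, block))


def permute_point(point, left, right):
    """Swap the blocks `left` and `right`; points outside both stay put."""
    if inside(point, left):
        dst = right
    elif inside(point, right):
        dst = left
    else:
        return point
    # Overwrite the fixed bits with those of the destination block.
    return "".join(p if d == "-" else d for p, d in zip(point, dst))


def merge(a, b):
    """Merge two entries that differ in exactly one specified bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "*" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "*" + a[i + 1:]
    return None


before = {"B1": "0001", "B2": "1101", "B3": "0010", "B4": "1110"}
after = {name: permute_point(p, "01--", "11--") for name, p in before.items()}
b6 = merge(after["B1"], after["B2"])
b7 = merge(after["B3"], after["B4"])
print(after, b6, b7)
```

Here B2 and B4 move to 0101 and 0110, so the merged entries are 0*01 (B6) and 0*10 (B7). B5 is left out because the wildcard block covers the whole space and maps to itself.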

Terms and Concepts

1. Block size: The size of a block is defined as the number of points that are contained in the block. For example, the size of the block “0**1” is 4.

2. Distance: The number of bit positions in which the two Boolean representations are both specified (not wildcards) and differ. For example, the distance between the two points “0001” and “1101” is 2. EX: the distance between “0*01” and “01*0” is 1, and between “0*01” and “0101” it is 0.

3. Direction: If the Boolean representations of two blocks have wildcards (don’t care bits) that all appear in the same bit positions, we say these two blocks are in the same direction. EX: “0*01” and “0*10” are in the same direction.
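The three definitions translate directly into code. This is a minimal sketch that represents blocks as strings over “0”, “1” and “*”; the function names are hypothetical.

```python
def block_size(block):
    """Number of points in the block: each wildcard doubles it."""
    return 2 ** block.count("*")


def distance(a, b):
    """Count positions where both bits are specified and differ."""
    return sum(1 for x, y in zip(a, b) if "*" not in (x, y) and x != y)


def same_direction(a, b):
    """True if the wildcards of a and b sit in the same bit positions."""
    return all((x == "*") == (y == "*") for x, y in zip(a, b))


print(block_size("0**1"))
print(distance("0001", "1101"), distance("0*01", "01*0"), distance("0*01", "0101"))
print(same_direction("0*01", "0*10"), same_direction("0*01", "01*0"))
```

The printed values match the examples in the definitions: size 4, distances 2, 1 and 0, and only the first pair shares a direction.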

Target Blocks and Assistant Blocks: A pair of target blocks consists of the two blocks that we aim to merge by a permutation.

B6 and B7 are the target blocks.

To merge this pair of targets, we perform the operation “--10<>--11” over two other blocks, “**10” and “**11”. These two blocks are the corresponding assistant blocks.

Exchange rows 10 and 11.
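One simple way to see how assistant blocks relate to a target pair is to enumerate the swaps that bring the second target to distance 1 from the first. The sketch below is an illustration under assumptions, not the paper's search procedure: it assumes both targets are in the same direction, takes B6 = 0*01 and B7 = 0*10 from the earlier example, and ignores the effect on other blocks. All function names are hypothetical.

```python
def candidate_permutations(t1, t2):
    """List (left, right) assistant pairs that bring t2 to distance 1 from t1."""
    diff = [i for i, (x, y) in enumerate(zip(t1, t2))
            if "*" not in (x, y) and x != y]
    if len(diff) < 2:
        return []  # distance 0 or 1: no permutation needed
    pairs = []
    for keep in diff:  # the one bit that will remain different
        left, right = ["-"] * len(t1), ["-"] * len(t1)
        for i in diff:
            left[i] = t2[i]
            right[i] = t2[i] if i == keep else t1[i]
        pairs.append(("".join(left), "".join(right)))
    return pairs


def move_block(block, right):
    """Move a block lying entirely inside the left assistant block to `right`."""
    return "".join(b if r == "-" else r for b, r in zip(block, right))


def merge(a, b):
    """Merge two entries that differ in exactly one specified bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "*" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "*" + a[i + 1:]
    return None


pairs = candidate_permutations("0*01", "0*10")
b7_moved = move_block("0*10", "--11")
merged = merge("0*01", b7_moved)
print(pairs, b7_moved, merged)
```

The first candidate is the swap --10 <> --11 used on the slide: it moves B7 to 0*11, which then merges with B6 into 0**1.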

Classifier compression

Wp: assistant block size

tar: target block

p: permutation

1. GET_TARGET: Try to find all possible targets.

Permutation: ---0 <> ---1 (assistant block size: 3)

Target blocks (distance: 2):

B6 : 0*01 => B6' : 0*00

B7 : 0*10 => B7' : 0*11

The distance is still 2, so the targets can’t be merged.

2. EVAL_PERM: This step has two tasks. One is to search all possible permutations for the targets obtained in the previous step. The other is to determine whether these permutations are worth performing and which permutation yields the largest compression with the least overhead.

The “best” permutation to perform is the one with the largest net gain: the number of blocks reduced minus the number of new blocks caused by the splitting of existing blocks.
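The selection rule can be sketched as a scoring function. Everything below is an assumption for illustration: the `Candidate` type, the tallies in the example list, the rule that a permutation is worth performing only when its gain is positive, and the tie-break on fewer new blocks are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    blocks_reduced: int  # blocks removed by merging targets
    new_blocks: int      # extra blocks created by splitting existing ones

    @property
    def gain(self):
        return self.blocks_reduced - self.new_blocks


def select_best(candidates):
    """Return the candidate with the largest positive gain, or None."""
    worthwhile = [c for c in candidates if c.gain > 0]
    # Ties on gain are broken in favour of fewer new blocks (less overhead).
    return max(worthwhile, key=lambda c: (c.gain, -c.new_blocks), default=None)


# Illustrative tallies only; they are not results from the paper.
candidates = [
    Candidate("--10<>--11", blocks_reduced=1, new_blocks=0),
    Candidate("--00<>--01", blocks_reduced=1, new_blocks=2),
]
best = select_best(candidates)
print(best.name, best.gain)
```

In a fuller implementation the two tallies would be computed by applying each permutation to the classifier rather than supplied by hand.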

Permutation: --00 <> --01

B4 : 1111 => 1111
     1101 => 1100   (produces two new small blocks, and B4 disappears)

B3 : 1100 => 1101

=> Invalid
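The splitting in this example can be checked mechanically by permuting every point of a block and testing whether the image is still a single block. The sketch assumes that B4 is the block 11*1 (the two points 1111 and 1101 listed above) and that B3 is the point 1100; the helper names are hypothetical.

```python
from itertools import product


def points(block):
    """Expand a ternary block such as '11*1' into its set of binary points."""
    choices = ["01" if c == "*" else c for c in block]
    return {"".join(bits) for bits in product(*choices)}


def permute_point(point, left, right):
    """Swap the blocks `left` and `right` ('-' marks a free bit)."""
    for src, dst in ((left, right), (right, left)):
        if all(s in ("-", p) for p, s in zip(point, src)):
            return "".join(p if d == "-" else d for p, d in zip(point, dst))
    return point


def as_single_block(pts):
    """Return the one block covering exactly pts, or None if none exists."""
    cols = list(zip(*sorted(pts)))
    block = "".join(c[0] if len(set(c)) == 1 else "*" for c in cols)
    # The bounding block equals pts only if it holds the same number of points.
    return block if 2 ** block.count("*") == len(pts) else None


image_b4 = {permute_point(p, "--00", "--01") for p in points("11*1")}
image_b3 = {permute_point(p, "--00", "--01") for p in points("1100")}
print(sorted(image_b4), as_single_block(image_b4))
print(sorted(image_b3), as_single_block(image_b3))
```

The image of B4 is the pair of points 1100 and 1111, which no single ternary entry covers, so the block is split in two; B3 simply moves to 1101.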

3. PERFORM: Perform the permutation selected in the EVAL_PERM step to merge the target blocks.

Transformation implementation

A pipeline structure is used to implement the series of transformations. If there are N transformations, an N-stage pipeline is designed.

The one-block structure (a one-stage pipeline) normally requires far fewer hardware resources than the pipeline structure, but its single stage has to be very complicated, which greatly reduces the working speed.

The authors propose a solution called stage-grouping, which reduces the number of stages to trade off between speed and cost.
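The relation between the N-stage pipeline and stage-grouping can be modelled in software as function composition. This is only a functional sketch under assumptions: each transformation is taken to be one block swap on the header bits, the names `make_swap`, `group_stages` and `run_pipeline` are hypothetical, and hardware cost and clock speed are not modelled.

```python
from functools import reduce


def make_swap(left, right):
    """Build one transformation: swap two blocks on a header bit string."""
    def transform(header):
        for src, dst in ((left, right), (right, left)):
            if all(s in ("-", h) for h, s in zip(header, src)):
                return "".join(h if d == "-" else d for h, d in zip(header, dst))
        return header
    return transform


def group_stages(transforms, group_size):
    """Compose every `group_size` consecutive transformations into one stage."""
    stages = []
    for i in range(0, len(transforms), group_size):
        chunk = transforms[i:i + group_size]
        # Bind chunk as a default argument so each stage keeps its own list.
        stages.append(lambda h, chunk=chunk: reduce(lambda acc, t: t(acc), chunk, h))
    return stages


def run_pipeline(stages, header):
    """Pass the header through every stage in order."""
    for stage in stages:
        header = stage(header)
    return header


transforms = [make_swap("01--", "11--"), make_swap("--10", "--11")]
full_pipeline = group_stages(transforms, 1)  # N stages, one swap each
grouped = group_stages(transforms, 2)        # one stage doing both swaps
print(len(full_pipeline), len(grouped))
print(run_pipeline(full_pipeline, "1110"), run_pipeline(grouped, "1110"))
```

Both layouts map the header 1110 to 0111; grouping changes only how many stages the work is divided into, which is the speed versus cost trade-off described above.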

Experiment

The experiments ran on a Linux workstation driven by Intel Xeon E5335 2.0 GHz CPUs, with the code written in C/C++ and the parameters Nr = 150, Wmax = 102, and Wmin = 54.

The corresponding transformations were implemented on an Altera Cyclone III FPGA, with Quartus II as the FPGA synthesis tool. The Altera Cyclone was chosen for its low price and appropriate clock rate: this kind of FPGA can run on a clock of up to 400 MHz or even higher, which is enough for the targeted throughput of 100M packets per second.
