VoQ Crossbar Schedulers

31
4 1 2 3 1 3 2 4 4 1 3 2 4 1 3 2 4 1 3 2 4 1 3 2 4 1 3 2 4 1 3 2 4 1 3 2 4 1 3 2 Hong Kong University of Science and Technology - MSc(IT) 2003 Fall Semester - Track 1 CSIT560 Internet Infrastructure: Switches and Routers Final Project Presentation #T1-6-2 VoQ Crossbar Schedulers VoQ Crossbar Schedulers Start Start Group Members: Cyrus LO – 02784753 interman @ ust . hk Ricky CHAN - 02784557 rickyccm @ ust . hk Ricky KWAN - 02784674 rickykcy @ ust . hk Presented on 1 December 2003

description

Hong Kong University of Science and Technology - MSc(IT) 2003 Fall Semester - Track 1 CSIT560 Internet Infrastructure: Switches and Routers Final Project Presentation #T1-6-2. VoQ Crossbar Schedulers. Group Members: Cyrus LO – 02784753 [email protected] Ricky CHAN - 02784557 [email protected] - PowerPoint PPT Presentation

Transcript of VoQ Crossbar Schedulers

Page 1: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Hong Kong University of Science and Technology -MSc(IT) 2003

Fall Semester - Track 1 

CSIT560 Internet Infrastructure: Switches and Routers

Final Project Presentation #T1-6-2

VoQ Crossbar SchedulersVoQ Crossbar Schedulers

StartStart

Group Members:Cyrus LO – 02784753 [email protected]

Ricky CHAN - 02784557 [email protected]

Ricky KWAN - 02784674 [email protected]

Presented on 1 December 2003

Page 2: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Agenda• Part I. Introduction

Project goals (30secs by Ricky Chan)

• Part II. Some well-known MWM/MSM scheduling algorithms What are the differences between MSM & MWM ? Overview of PIM algorithm What is the common problem? (2’30 mins by Ricky Chan)

• Part III. Our findings Requirements and assumptions A new conceptual algorithm –

Real De-synchronized Round-Robin Model (RDESRR) What is RDESRR ? (6 mins by Ricky Kwan)

• Part IV. Experimental results SIM Results Result comparisons : RDESRR Vs SSRR & ISLIP Strengths & Weaknesses (4 mins by Cryus Lo)

• Part V. Conclusion Conclusions (1 min by Cryus Lo)

• Part VI. Q & A (1 min by all)

Page 3: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part I. Introduction

Page 4: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Project goals

By further studying and analysis of the pointer desynchronization effect of some iterative arbitration algorithms, our group has the following goals:

•Tackle the common problem of some well-known MSM algorithms

•Propose a new conceptual algorithm for overcoming the problem

•Simulate the proposed algorithm by using SIM

•Justify the results

Page 5: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part II.

Some Well-known MWM/MSM scheduling algorithms

Page 6: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

What are the differences between MSM & MWM ?

• Maximum weight matching (MWM) – stable for independent, 100% throughput under any traffic.

• Maximum size matching (MSM) – stable for i.i.d, uniform, admissible traffic.

• Both are too complex to implement, time complexities are O(N3logN) – MWM and O(N5/2) – MSM

•Only rely on some simple iterative algorithms to approximate MSMapproximate MSM•Some simple iterative algorithms:

•Parallel Iterative Matching (PIM)Round-Robin Matching (RRM)iterative SLIP (iSLIP).Fcfs in Round-Robin Matching (FIRM)Static Round-Robin Matching (SRR – SSRR, DSRR, Rotating DSRR & DRDSRRDRDSRR etc)

Derandomized rotating double Derandomized rotating double static round-robin !static round-robin !

Page 7: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Overview of PIM algorithm

When no new matching can be found, the algorithm stops.

3.Accept - If an input receives a grant, it accepts one by selecting an output randomly among those that granted to this output..

2. Grant - If an unmatched output receives any requests, it grants to one by randomly selecting a request uniformly over all requests.

1. Request - Each unmatched input sends a request to every output for which it has a queued cell.

The basic matching algorithm. Each iteration of the algorithm follows these three steps:

Page 8: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Is any improvement for PIM algorithms ?

• PIM -> RRM -> ISLIP -> FIRM -> Pointer Desynchronized (SRR)

• Some small change on pointers but make big improvement

• SRR is desynchronized the highest priority pointers

• It has no guarantee the grant result is desynchronized.

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 10

1

2

3

Both grant to 3

Page 9: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Is any improvement for PIM algorithms ?

•If all grants are desynchronized.

Expect to get better result.

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1

Grant Desynchronized(Real Desynchronized)

Page 10: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part III. Our Findings

Page 11: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Assumptions

• Not enough knowledge in Hardware• Look for the theoretical solution• We ignore all hardware implementation issues

Requirements

• The system is stable• 100% Throughput in uniform traffic

•Target to desynchronize the grant results (so called Real Desynchronize)

Page 12: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

A new algorithm – RDESRR

Concept:• Real Desynchronized Round Robin Model (RDESRR)

• Based on 2 phases RRM model (Request and Grant)

• Add a small share memory that each outputs can read/write (called Share Bits)

• The size of the memory is 1 bit per input

• If the bit is set, the corresponding input has already granted by an output

• If the bit is not set, the output may grant to corresponding input port

Page 13: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Conceptual model

0

1

2

3

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1

3

0

1

2

Share Bits

Page 14: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR model

• 2 phases only

• Request. Each input sends a request to every output for which it has a queued cell.

• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to

the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

Page 15: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo - Request

Step 1: Request

0

1

2

3

0

1

2

3

Page 16: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – Add a share memory in Output

Step 2: Grant

0

1

2

3

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•Add a small share memory that each outputs can read/write (called Share Bits)

Page 17: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

3 02 1

3 02 1

3 02 1

3 02 1

RDESRR Demo – Output check the share bits

0

1

2

3

0

1

2

3

Step 2: Grant

3

0

1

2

Share Bits

•The output check the corresponding bit is set or not

Page 18: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – When share bit is occupied

0

1

2

3

0

1

2

3

Step 2: Grant

3

0

1

2

3 02 1

3 02 1

3 02 1

3 02 1

Share Bits

•if not set, the output will set the bit and notifies the input its request was granted•The share bit is First Come First Serve

Page 19: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – Output looks for next request

0

1

2

3

0

1

2

3

Step 2: Grant

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•If set, the output will look for next request until all requests have gone through

Page 20: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – All share bits are allocated

0

1

2

3

0

1

2

3

Step 2: Grant

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•Fully allocate the share bit will result for fully grant all input request

Page 21: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

3 02 1

3 02 1

3 02 1

3 02 1

RDESRR Demo – Pointer update/Share bit reset

0

1

2

3

0

1

2

3

3

0

1

2

Share Bits

•The pointer gi to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input•If no request is received, the pointer stays unchanged•Share bits are also reset

Page 22: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR model (Recap)

• 2 phases only

• Request. Each input sends a request to every output for which it has a queued cell.

• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to

the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

Page 23: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part IV. Experimental Results

Page 24: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

SIM Results• Run the test for 32x32 port in SIM using –l 1000000

Total Latency Avg Match Size0.1 0.0588 3.1958 0.2 0.1447 6.3938 0.3 0.2686 9.5947 0.4 0.4501 12.7940 0.5 0.7198 15.9960 0.6 1.1398 19.1980 0.7 1.8636 22.3961 0.8 3.2619 25.5986 0.9 7.5087 28.8003 1.0 715.5900 31.9850

RDESRR

Page 25: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Result comparisons : RDESRR Vs SSRR & ISLIP

• The latency in load = 1 is 715.59• The average match size in load = 1 is 31.985 (99.95% match)

Average Delay under Uniform Traffic

1.0E-02

1.0E-01

1.0E+00

1.0E+01

1.0E+02

1.0E+03

1.0E+04

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

NormalizedLoad

Ave

rage

Del

ay

RR-DESY

SSRR

ISLIP

Page 26: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Strengths • 2 phase operations

• Each input only get one grant, • no arbiter in input port

Unknown factorsParallel processing for ports scheduling.

All outputs access the share bits simultaneously Multi-process protection for Share Bits

1. Prevent deadlock, Check and set bit should be atomic action

2. Any chance for two outputs access same bit at the same time?

• Save Cost•It is simple to implement because base on RR

•Has a lot of potential varieties to study

•May achieve Maximum Size Match after few iterations.

Page 27: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part V. Conclusions

Page 28: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Final thought…• Future research and study

1. Hardware implementation feasibility?

2. Apply to ISLIP, FIRM, may get better result?

• Comments on SIM1. We have successfully build the SIM on Linux platform so we expect it

can also be build on Windows platform with Cywin library

2. It cannot simulate the parallel processing behaviour therefore we cannot know what happen in the real scheduler

3. The processing time of SIM is quite slow so we cannot have enough time to simulate more revised algorithms and more iterations

Conclusion: Conclusion: RDESRR is a conceptual model which get RDESRR is a conceptual model which get

the good result in SIMthe good result in SIM

Page 29: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part VI. Q & A

Page 30: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Q & AQ & A

Page 31: VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

PresentationPresentation End End