Coordinator Tutorial v1.0

Revision History:

Rev   By                          Date         Description
0.1   Shih-Hao Li, Harshil Vyas   05/25/2007   Initial draft version.
0.2   Shih-Hao Li                 05/30/2007   Minor corrections.
1.0   Shih-Hao Li                 10/31/2007   Edited contents for new data structures and added load balancing section.


Table of Contents

COORDINATOR TUTORIAL

1 INTRODUCTION
  1.1 802.11 BASICS
  1.2 MERU SYSTEM
    1.2.1 Controller and AP
    1.2.2 Communication Subsystem
    1.2.3 ESS Profile
    1.2.4 Virtual Cell

2 COORDINATOR DETAILS
  2.1 COMPONENT THREADS
  2.2 TOPOLOGY GRAPH
  2.3 ATS NODE
  2.4 STA NODE
  2.5 PERBSSID NODE
  2.6 ASSIGN MANAGER AND HANDOFF MANAGER

3 COORDINATOR AND WNC AGENT

4 PROCESS FLOW
  4.1 TOPOLOGY MANAGER
    4.1.1 Probe Indication
    4.1.2 Frame Report
      4.1.2.1 RSSI Report
      4.1.2.2 BSSID Report
    4.1.3 MAC State
  4.2 ASSIGN MANAGER
  4.3 HANDOFF MANAGER
  4.4 TIMING REPORT

5 LOAD BALANCING
  5.1 OVERVIEW
  5.2 FEATURE DESCRIPTIONS
    5.2.1 Station Limit
    5.2.2 Limit Increment
  5.3 NEW ALGORITHM

6 MISCELLANEOUS
  6.1 SCANNING AP
  6.2 CHANNEL VS. RADIO

Meru Networks, Inc. Proprietary & Confidential


1 Introduction

This document describes the major components and functions of Coordinator in a Meru system. It first introduces 802.11 basics, then gives an overview of the Meru system, and finally describes the details of Coordinator and its process flow.

1.1 802.11 Basics

A standalone Access Point (AP) broadcasts a beacon periodically (e.g. every 100 msec). The beacon serves two major functions. First, it carries information about the network configuration, such as supported data rates, security mode, and channel information. For any station (STA) that wants to connect to an AP, the beacon provides enough information for the station to decide whether the network configuration is suitable. Second, once a station is connected to an AP, the beacon acts like a keep-alive message from the AP: stations confirm that they are still in the network by receiving its beacons.

There are 4 MAC addresses in an 802.11 frame:

source MAC (i.e. STA MAC),

transmitter MAC,

receiver MAC,

Basic Service Set Identifier (BSSID).

BSSID is used to identify the network. It is a 6-byte value, normally a MAC address, which needs to be unique in the network. In a private network, the BSSID can be a random number. For example, when two laptops are connected directly without an AP, the network is called an Ad Hoc network. A BSSID is still used, but no extra MAC address is needed between the two laptops: each laptop only needs to configure the MAC address of its wireless interface, so the BSSID can be generated randomly with the local bit set to one. If there is an AP in between, a MAC address is configured to be used as the BSSID. The least significant bit of the first octet indicates whether the address is a multicast address. The second least significant bit indicates whether it is a local address. If all bits in the six bytes are 1s, it is the broadcast address. Figure 1 shows the MAC address format.



Figure 1: MAC Address format.

In a MAC address, the first three bytes are defined as the Organizationally Unique Identifier (OUI), which is assigned by IEEE and does not have the local bit set. The BSSID in an 802.11 frame is like a unique tag that identifies which network a packet belongs to.
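The bit tests described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example and the sample addresses are arbitrary.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into six bytes."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six octets")
    return octets

def is_multicast(mac):
    # Least significant bit of the first octet: 1 means multicast.
    return bool(mac[0] & 0x01)

def is_local(mac):
    # Second least significant bit of the first octet: 1 means local address.
    return bool(mac[0] & 0x02)

def is_broadcast(mac):
    # All 48 bits are 1s.
    return mac == b"\xff" * 6

def oui(mac):
    # The first three octets of the address.
    return mac[:3]

bssid = parse_mac("02:00:00:00:00:01")   # arbitrary example with the local bit set
print(is_local(bssid), is_multicast(bssid), is_broadcast(bssid))
```

Note that the broadcast address also has the multicast bit set, since every bit in it is 1.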

When a STA comes up, it does not know which networks are available or which one to join. So it sends a broadcast probe request, in which the destination MAC and BSSID are all 0xFFs. Every AP that hears this broadcast request replies with a message called a probe response, which contains information similar to that in the beacon. One of the important fields is the Service Set Identifier (SSID), which is the string name of the network, or a human-readable form of the BSSID.

After receiving a probe response, the STA normally sends a direct probe, in which the SSID is copied from the response. The difference between a broadcast probe and a direct probe is that the former is sent when the STA does not know the network name, while the latter specifies the network name. Both probes are still sent to the broadcast MAC address and broadcast BSSID. The terms broadcast and direct refer to the network name, not the MAC address. For a broadcast probe, all the APs reply. For a direct probe, only the APs supporting that network name reply. In the probe response, the AP tells the STA which BSSID to use. Once the STA receives the probe response, it knows both the AP's MAC address and the BSSID. All further packet exchanges are unicast messages.
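As a rough sketch of the difference between the two probe types, the function below decides which APs would answer a probe. The data layout (a dictionary from AP name to the set of SSIDs it serves) and all names are assumptions made for this example.

```python
def responders(aps, probe_ssid=None):
    """Return the names of the APs that would answer a probe request.

    aps: dict mapping AP name -> set of SSIDs served by that AP.
    probe_ssid: None for a broadcast probe, or a network name for a direct probe.
    """
    if probe_ssid is None:
        # Broadcast probe: no network name given, so every AP replies.
        return sorted(aps)
    # Direct probe: only APs supporting that network name reply.
    return sorted(name for name, ssids in aps.items() if probe_ssid in ssids)

aps = {"ap1": {"corp", "guest"}, "ap2": {"guest"}, "ap3": {"lab"}}
print(responders(aps))            # broadcast probe
print(responders(aps, "guest"))   # direct probe for the "guest" network
```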

In a single frame, the AP's MAC address (i.e. the MAC address of the AP's wireless interface) and the BSSID can be the same or different. In an enterprise network, however, they are always different, because the BSSID is not assigned to a specific wireless interface. It is like a representation of the whole network.

One AP can support multiple networks, each configured with one wireless profile, called an Extended Service Set profile (ESS profile). For example, suppose only one AP is switched on and it has five wireless profiles configured. When scanning for wireless networks, you will see five available networks. Logically there are five networks, but all are provided by a single AP. In this case, the AP's wireless MAC address remains the same, but the BSSID is different for each wireless network. The packet exchanges are actually between the STA's MAC address and the BSSID rather than the AP's wireless MAC address.



The typical packet sequence between a STA and an AP in an enterprise network is shown in Figure 2. After receiving a successful association response from an AP, the STA is connected to the AP on layer 2, just like plugging in an Ethernet cable in a wired network.

Figure 2: Packet sequence between STA and AP.

1.2 Meru System

A basic Meru system consists of a Controller, multiple APs (also called ATS, for Access Transceiver Station), and several STAs. Meru provides a unique feature that allows STAs to communicate with the fewest interruptions inside the network.

1.2.1 Controller and AP

A standalone AP handles everything by itself. It has no external entity to help decide which STAs should be allowed or which configurations it should provide. In an enterprise network, much of the control is taken away from the AP and kept in a centralized device called the Controller. The Controller can help with load balancing, Quality of Service (QoS) support, and better protection for the enterprise network. The Controller tracks all AP configurations, client connectivity, and the movements of clients inside the network.

In a Meru system, each AP has its Controller configuration burned into flash, which contains information such as which Controller to connect to at startup. When an AP comes up, it tries to find a Controller. This process is called discovery. There are two ways to discover a Controller. The first is L2 discovery, where the AP sends a broadcast frame on layer 2. All Controllers reply to this frame and the AP connects to one of them. The second is L3 discovery, where the AP sends a discovery frame to a specific IP address configured inside the AP. Generally, L3 discovery is the preferred method.

Once an AP comes up and gets the IP address of the Controller, it sends a join request to the Controller. When the Controller receives this request, it accepts the request, authorizes the AP, and sends configurations to the AP. In a Meru system, the AP does not store any configuration (i.e. ESS profile). All the ESS profiles are stored inside the Controller. An ESS profile specifies the network name, supported data rates, security mode, and other information. After the AP downloads the profile and configures itself, it starts beaconing the profile and is ready to serve clients. In a real scenario, any given physical spot in an enterprise network is covered by at least four or five APs, all of which provide the same networks. For example, if there are five profiles, all of them are downloaded to all the APs in the network. This gives clients the illusion that the network is available everywhere in the building. In fact, the service is provided by different APs through the same Controller.

Coordinator is the heart of the Controller. It controls the entry and movement of clients in the network. When a client enters a network, it sends a probe request and receives a probe response. After that, the client sends an authentication request to the AP, but the AP does not send an authentication response right away. It waits for the Controller to respond. Internally, when an AP receives a probe request, it sends a probe indication to Coordinator to report that a new STA wants to join the network. Then the AP waits for the assignment from Coordinator. The assignment matters for the following reason: an AP serves a client beyond the probe request only if it holds the assignment for that client. In other words, until there is an assignment for the client, the AP only responds to its probe request and does not serve the client any further. It does not reply to authentication or association requests, and it does not allow the client to enter the network or transfer data. After Coordinator receives the probe indication, it sends the assignment to one of the APs that can hear the client. The AP that gets the assignment allows the client to join the network by responding to the client's authentication and association requests. When this is done, the client is connected to the network and can start transferring data.
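The gating effect of an assignment can be illustrated with a small model of an AP. This is a simplified sketch, not the AP firmware: the class, method names, and frame-type strings are invented for this example, and the probe indication is recorded in a list instead of being sent to Coordinator.

```python
class AccessPoint:
    def __init__(self, name):
        self.name = name
        self.assigned = set()      # STAs assigned to this AP by Coordinator
        self.indications = []      # probe indications destined for Coordinator

    def on_assignment(self, sta):
        self.assigned.add(sta)

    def on_frame(self, sta, frame_type):
        """Return the response the AP sends, or None if it stays silent."""
        if frame_type == "probe_request":
            self.indications.append(sta)   # report the new STA to Coordinator
            return "probe_response"        # probe requests are always answered
        if sta not in self.assigned:
            return None                    # no assignment: do not serve further
        replies = {"auth_request": "auth_response",
                   "assoc_request": "assoc_response"}
        return replies.get(frame_type)

ap = AccessPoint("ap1")
print(ap.on_frame("sta1", "probe_request"))   # answered
print(ap.on_frame("sta1", "auth_request"))    # ignored until assigned
ap.on_assignment("sta1")
print(ap.on_frame("sta1", "auth_request"))    # answered after assignment
```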

All the APs in the network send frame reports to Coordinator every three seconds. A frame report message contains information about all the STAs that the AP can hear, whether or not they are assigned to that AP. For example, when a client is moving from AP1 to AP2, both APs can hear the client and both send frame reports to Coordinator. Inside the frame report message, there is a Received Signal Strength Indication (RSSI) value for each client (i.e. the signal strength received from that client). Coordinator compares these values to detect which AP receives the stronger signal. If AP2 receives a stronger signal from the client than AP1 does, Coordinator triggers a handoff. In this case, Coordinator tells AP1 to hand off the client to AP2. AP1 then sends all the information about this client to AP2 over the wired connection. AP2 stores the client information and starts talking to the client. The client remains unaware of this change.
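A minimal sketch of the RSSI comparison follows. It assumes that Coordinator has already collected, for one client, the RSSI reported by each AP, and it hands off as soon as another AP reports a strictly stronger signal. Any thresholds or damping the real Coordinator may apply are not described in this section and are left out. The function name is hypothetical.

```python
def pick_handoff_target(assigned_ap, rssi_by_ap):
    """Return the AP to hand off to, or None to keep the current assignment.

    rssi_by_ap maps AP name -> RSSI reported for one client (higher is stronger).
    """
    best_ap = max(rssi_by_ap, key=rssi_by_ap.get)
    current = rssi_by_ap.get(assigned_ap, float("-inf"))
    if best_ap != assigned_ap and rssi_by_ap[best_ap] > current:
        return best_ap
    return None

# The client is assigned to AP1 but AP2 now hears it better.
print(pick_handoff_target("ap1", {"ap1": -70, "ap2": -55}))
```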

Two important functions of Coordinator are:

1) assignment - when a client initially joins,

2) handoff - when Coordinator hands off a client from previous AP to new AP.

Each AP runs a Coordinator thread, which acts like a thin Coordinator client: it just takes messages and makes changes internally. In contrast, the Coordinator running on the Controller is the Coordinator server. Thus the communication between an AP and Coordinator is like a client-server connection that goes through the AP's Coordinator client via the internal messaging system.

1.2.2 Communication Subsystem

All the applications and modules inside a Meru system communicate with each other through the Communication Subsystem, in which each one has a unique mailbox ID defined at coding time. Applications use the u-comm/k-comm commands provided by the Communication Subsystem to send messages to a destination mailbox. Internally, the Communication Subsystem maps the mailbox ID to <IP address, port number> and delivers the messages over UDP.

For example, suppose Coordinator triggers a handoff from AP1 to AP2. It tells AP1 the ID of AP2 (called the APID, which is uniquely allocated when an AP joins the Controller). To send a message to AP2's Coordinator client, AP1 only needs to know AP2's APID and the Coordinator's mailbox ID, which is well known because it is defined at coding time. The command usage is like send(APID, mailbox ID, message). The Communication Subsystem thus knows where to send the message and delivers it. The Communication Subsystem is part of every module in the Meru system that needs to communicate with other modules.
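The addressing idea behind send(APID, mailbox ID, message) can be sketched as a lookup followed by a delivery. This is not the u-comm/k-comm API: the class, its tables, and the split of the lookup into "APID gives the IP address, mailbox ID gives the port" are assumptions for this example, and the UDP send is replaced by a list so the sketch runs without a network.

```python
class CommSubsystem:
    def __init__(self):
        self.ap_address = {}     # APID -> IP address (hypothetical table)
        self.mailbox_port = {}   # mailbox ID -> UDP port (hypothetical table)
        self.sent = []           # stands in for the UDP socket

    def resolve(self, apid, mailbox_id):
        """Map <APID, mailbox ID> to <IP address, port number>."""
        return (self.ap_address[apid], self.mailbox_port[mailbox_id])

    def send(self, apid, mailbox_id, message):
        destination = self.resolve(apid, mailbox_id)
        # A real implementation would send a UDP datagram to the destination.
        self.sent.append((destination, message))
        return destination

comm = CommSubsystem()
comm.ap_address[2] = "192.0.2.12"    # example address for the AP with APID 2
comm.mailbox_port[7] = 5007          # example port for mailbox ID 7
print(comm.send(2, 7, b"handoff client info"))
```

The caller never handles IP addresses or ports, which is why AP1 only needs AP2's APID and the well-known mailbox ID.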

1.2.3 ESS Profile

An ESS profile can be created using Meru Web GUI. First, we add a new entry in the ESS Profile page (as shown in Figure 3).

Figure 3: ESS Profile.

Then select the new profile to go to the Update page (as shown in Figure 4), where we specify all the parameters. The parameter "New AP's Join ESS" indicates whether an AP automatically downloads this profile when it comes up. The default value of this flag is on. It applies to both existing and new APs, which means that when a new profile is created, it is downloaded to all the APs currently up and to all the APs that come up later. After all the parameters are specified, only the profile configuration exists on the Controller. There is no BSSID yet; it is generated when the profile is downloaded to an AP.



Figure 4: ESS Profile Update.

For every profile supported by a standalone AP, there is a separate BSSID. In the normal case, if an AP downloads two ESS profiles from the Controller, it sends two beacons and uses two different BSSIDs, one for each profile.

An AP downloads ESS profiles from the WNC Agent in the Controller. At the same time, Coordinator is notified by the WNC Agent, provided that Coordinator has registered with the WNC Agent to be notified whenever a profile is downloaded. The profile is important to the AP because the AP sends beacons based on its parameters, which tell the AP what kind of services it supports. When an AP downloads a profile, the WNC Agent creates a BSSID on the fly and stores it in the ESS-AP table before sending it to the AP. The AP then uses this BSSID for beaconing and for connecting to clients. Figure 5 shows an example of the ESS-AP table.



Figure 5: ESS-AP Table.

Adding a new AP entry to the ESS-AP table is equivalent to downloading the profile to the new AP. A downloaded profile is called a service or service message. The .tmp and .h files in the /main/include/nms/ directory are generated during compilation. The WNC Agent has all the corresponding tables specified in XML format. In the nms_service_mib.h file, the structure mib_service_t is the service message that is sent to the AP. This structure (i.e. predefined message) contains the ESS profile information and the BSSID. If Coordinator has registered with the WNC Agent for this event, it also gets a copy of this message (i.e. the whole profile). In this case, the WNC Agent sends the same message to both the AP and Coordinator, and Coordinator retrieves only the fields it needs, such as the BSSID.

In the Interface Index column of the ESS-AP table, 1 denotes the b/g radio and 2 denotes the a radio on AP150. If an AP downloads 5 profiles, it has 5 BSSIDs, or 10 BSSIDs if the profiles are downloaded for both radios. Different APs can serve the same ESS profile on the same interface and the same channel, but with different BSSIDs. For example, suppose two APs support the same ESS profile called nsd and a STA wants to connect to the nsd profile. The ESS profile name and the SSID can be the same (nsd in this case), but the BSSID has to be unique for each AP. It is like a destination MAC address, which needs to be unique in the network so that the STA knows which BSSID to talk to. The actual packet exchanges are between the STA (the source) and the BSSID (the destination). There could be hundreds of APs in an enterprise network, each downloading the same ESS profile but with a separate BSSID, so the STA knows which AP to connect to.
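The BSSID count in the paragraph above is simply one ESS-AP row per profile per radio interface. The short sketch below reproduces that arithmetic; the row layout (ESS name, AP name, interface index) and the profile names are assumptions for this example and do not reflect the real table schema.

```python
from itertools import product

def essap_rows(ap, profiles, interfaces):
    """One ESS-AP row, and therefore one BSSID, per profile per interface."""
    return [(ess, ap, ifindex) for ess, ifindex in product(profiles, interfaces)]

profiles = ["ess1", "ess2", "ess3", "ess4", "ess5"]
print(len(essap_rows("ap1", profiles, [1])))      # one radio
print(len(essap_rows("ap1", profiles, [1, 2])))   # both radios
```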

1.2.4 Virtual Cell

According to the 802.11 standard, every frame is acked by only one AP. If two APs have the same BSSID and a STA sends a frame to this BSSID, both APs ack the frame, which results in a collision. This happens because each AP believes it is the only one sending the ack, so both reply immediately without sensing the medium. Normally each AP has a separate BSSID. But this may cause a problem when a client switches from one AP to another during a handoff: the connection is dropped because the destination BSSID has changed, and the client re-initiates the 802.11 packet sequence. It is like unplugging an Ethernet cable from one port and plugging it into another. Because the destination MAC address has changed, the layer 2 connection has to be re-established. A voice call may or may not be dropped, depending on the layer 3 latency period, but clients may notice some jitter in the call.

Meru solves this problem with virtual cell, which provides a single BSSID visible to STAs across the network instead of multiple BSSIDs. All the APs support the same profile and the same BSSID, so each STA perceives only one AP in the network and only one BSSID to connect to. When a STA moves and is handed off, it does not notice, because the destination BSSID never changes. The parameter "Enable Virtual Cell" in the ESS profile (as shown in Figure 4) determines whether a profile is virtual. A virtual profile generates the same BSSID for all APs on the same channel, while a non-virtual profile generates a different BSSID for each AP even if they are on the same channel. Assume there is only one profile and it is virtual. If all the APs download it on both of their radios, there will be only 2 BSSIDs, one for each radio.

In a typical Meru scenario, 4 to 6 APs can hear the same STA. When a STA sends a packet to an AP, all of these APs would try to ack that packet. Meru's virtual cell solves this problem. For most AP models (e.g. AP200), virtual cell is implemented with a hardware FPGA table, called the assignment table, inside each AP. Only the AP that gets the assignment has an entry added to this table. When the STA is talking, all APs can hear it, but only the AP with the entry in its assignment table replies. That is how Meru shares the same BSSID across APs in virtual cell. The STA perceives that it is always talking to the same AP because the BSSID never changes. By keeping the same BSSID, virtual cell provides a faster handoff, called soft handoff. In a traditional handoff, called hard handoff, the STA has to reconnect because the BSSID changes.

In the same network, there can be a mix of virtual and non-virtual profiles served by the same or different APs. If any change is made to a profile, all the APs using that profile are notified to update their information. In some cases, they may need to delete the existing profile and download a new one. For example, if the "Enable Virtual Cell" flag is changed, the profile is deleted and recreated because the BSSID has to change. If only fields that do not affect the BSSID are changed, such as the beacon interval, each AP just receives a notification of the update.

For virtual cell, three parameters need to remain constant: BSSID, channel, and ESS profile. Clients then do not notice when they are handed off to other APs. The WNC Agent generates the same BSSID on the fly whenever an AP downloads a profile on a radio that uses the same channel. The BSSID value is a function of Controller Index, ESS Index, and Channel. Each ESS profile is given a unique ESS Index for easy access; this number is incremented for every newly created ESS profile. Each channel uses a different frequency. The WNC Agent gives the same BSSID for the same Controller, the same ESS profile, and the same channel on which the AP is downloading the profile. For example, if an AP is downloading the profile on radio 1 and radio 1 is on channel 6, the WNC Agent gives the same BSSID to all the APs that download the same profile on radio 1 for channel 6.
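The key property is that the BSSID is a deterministic function of Controller Index, ESS Index, and Channel, so every AP downloading the same virtual profile on the same channel receives the same value. The sketch below shows that property only. The actual function used by the WNC Agent is not given in this document; the byte layout here (a first octet with the local bit set, followed by the three inputs) is invented purely for illustration.

```python
def virtual_bssid(controller_index, ess_index, channel):
    """Hypothetical encoding: NOT the real WNC Agent formula."""
    octets = bytes([
        0x02,                        # local bit set, multicast bit clear
        0x00,
        controller_index & 0xFF,
        (ess_index >> 8) & 0xFF,
        ess_index & 0xFF,
        channel & 0xFF,
    ])
    return ":".join(f"{octet:02x}" for octet in octets)

# Two APs downloading ESS index 3 on channel 6 from Controller 1 get the same BSSID.
print(virtual_bssid(1, 3, 6))
print(virtual_bssid(1, 3, 6) == virtual_bssid(1, 3, 6))
print(virtual_bssid(1, 3, 6) == virtual_bssid(1, 3, 11))
```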

The beacon plays an important role in virtual cell. It is used to advertise which AP can support the network and to keep clients in sync. Because there is no physical connection, a client learns whether it is still connected to the network by receiving the beacon every 100 msec. In addition, the beacon informs a client if it has any pending packets. For example, voice clients usually go into power-save mode and wake up every beacon interval to check whether any packets are pending for them. Those clients sleep for 100 msec and wake up for a 10 msec window to receive the beacon, which also tells them whether they are still connected to the network. Then they go back to sleep for another 100 msec. If a packet is sent to a client while it is sleeping, the AP indicates this by setting a bit in the Traffic Indication Map (TIM) in the beacon. Each beacon has a TIM vector with one bit for every client connected to the AP. When a client wakes up and finds its corresponding bit set in the beacon, it wakes up fully and sends a Power-Save Polling (PS-Poll) frame to the AP, which then sends the client's data packets.

Because of virtual cell, the TIM needs to cover all the clients in a Meru network. For each AP to know the other APs' clients and their status (such as sleeping, awake, or pending packet), every AP sends this information to Coordinator. Coordinator combines the information from all APs and sends it back to all of them. According to the 802.11 standard, a TIM can cover a maximum of 2007 clients, which means each BSSID can support at most 2007 clients.
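The TIM can be pictured as a bit vector indexed by a per-client number (the association ID, or AID), with valid client numbers running from 1 to 2007. The class below is a simplified in-memory model under that assumption; it does not reproduce the on-air encoding of the TIM element, and the class and method names are made up for this example.

```python
MAX_AID = 2007

class Tim:
    def __init__(self):
        # Bits 0..2007, one per AID; 251 bytes in total.
        self.bits = bytearray(MAX_AID // 8 + 1)

    @staticmethod
    def _check(aid):
        if not 1 <= aid <= MAX_AID:
            raise ValueError("AID must be between 1 and 2007")

    def set_pending(self, aid):
        self._check(aid)
        self.bits[aid // 8] |= 1 << (aid % 8)

    def clear(self, aid):
        self._check(aid)
        self.bits[aid // 8] &= ~(1 << (aid % 8)) & 0xFF

    def has_pending(self, aid):
        self._check(aid)
        return bool(self.bits[aid // 8] & (1 << (aid % 8)))

tim = Tim()
tim.set_pending(5)             # a packet is buffered for the client with AID 5
print(tim.has_pending(5))      # the client sees its bit and sends a PS-Poll
tim.clear(5)                   # the buffered packet has been delivered
print(tim.has_pending(5))
```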

Virtual cell for AP150 is implemented differently because of a hardware limitation, and a different implementation of Coordinator is built for this case. There are therefore two Coordinator binary executables stored on the Controller, with a flag that indicates which one to run at startup. This flag is called "AP150 Vcell" and can be set on the Global Controller Parameters page in the Web GUI (as shown in Figure 6).

Figure 6: Controller Parameters.

2 Coordinator Details

This section describes the major components of Coordinator and their data structures.

2.1 Component Threads

When Coordinator starts, it creates the following threads.

Timer Scheduler - maintains all the timers in Coordinator.



Inter-cell Coordinator - processes pending flows every 15 seconds and checks the resources available on APs.

Resource Manager - calculates resource allocation.

Assign Manager - decides if a client can enter, get assigned, or handoff to another AP. Then it gives this information to Handoff Manager.

Handoff Manager - executes the state machine of doing handoff and assignment.

Topology Manager - maintains the entire network as a topology graph, called topograph. It provides an API for external modules to change topograph when an AP or STA is added, an AP is deleted, or a new ESS profile is downloaded.

Time Estimator - maintains which AP can hear which APs or stations, and creates AP-AP edges and AP-STA edges in topograph.

Beacon Manager - calculates the content of a beacon and sends it to AP.

NMS Agent - communicates with WNC Agent, which stores the configurations of the whole system. Any configuration change in a module will be sent to WNC Agent. All the threads in Coordinator rely on NMS Agent to get the configurations from WNC Agent.

Dispatcher is not a thread. It is used by any thread that wants to send a message to another thread, and whenever a message arrives at Coordinator; Dispatcher delivers each message to the right thread. The Communication Subsystem also goes through Dispatcher when passing Coordinator messages. The Communication Subsystem is a separate module. It has its own processes and threads, and provides an API library for other modules to send messages.
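The role of Dispatcher can be sketched as a routing table from thread name to inbox. This is an illustrative model only: the class, the use of one queue per thread, and the message format are assumptions, not the actual Coordinator code.

```python
import queue

class Dispatcher:
    """Routes messages to per-thread inboxes; it does not run as a thread itself."""

    def __init__(self):
        self.inboxes = {}   # thread name -> queue.Queue

    def register(self, thread_name):
        self.inboxes[thread_name] = queue.Queue()
        return self.inboxes[thread_name]

    def dispatch(self, target, message):
        # Raises KeyError if no thread registered under that name.
        self.inboxes[target].put(message)

dispatcher = Dispatcher()
handoff_inbox = dispatcher.register("Handoff Manager")
dispatcher.register("Assign Manager")

# Assign Manager passes a decision to Handoff Manager through Dispatcher.
dispatcher.dispatch("Handoff Manager", {"type": "assign", "sta": "sta1", "ap": "ap2"})
print(handoff_inbox.get_nowait())
```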

Coordinator stores its configurations in two ways. The first configuration can be changed through the CLI, Web interface, or NMS system, and is maintained by the WNC Agent inside the Controller. The second, the “tweaking” configuration, is a copy of the coord.config file in the Coordinator source directory and is passed to Coordinator as a command-line parameter when it starts. This configuration is fixed for every type of Controller and is mainly used by QA engineers and developers.

2.2 Topology Graph

Topograph stores the whole network topology information, which includes a list of AP (or ATS) nodes in AtsList, a list of STA nodes in StaList, and a list of ESS nodes in EssList. Each ESS node maintains a mapping of all its member BSS nodes while all the BSS nodes are stored in BssList.

A set of fixed messages is exchanged between the WNC Agent and an AP when a profile is downloaded, such as the service message and the ESS-ATS message. Coordinator extracts information from those messages and keeps it in its own data structures. For example, AtsBssList (i.e. the ATS-BSSID mapping) contains a map of all the BSSIDs created for a given AP, and is updated every time the AP downloads a profile. Coordinator checks which BSSID was downloaded and adds it to the list.

Each BSS node contains the beacon information related to a particular BSSID. Beacon Manager sends the beacon information to all the APs that have the BSSID, and each AP uses that information to send out the beacon. Coordinator picks up the beacon information and stores it in the associated BSS node when a profile is downloaded to an AP and copied to Coordinator by the WNC Agent. Some of the beacon information is copied directly from the profile, such as VirtualBss, which indicates whether the profile is virtual. Other fields are computed or manipulated by Coordinator itself, such as NewLostStaVector, LostStaAidMin, LostStaAidMax, and LostStaLength. When Coordinator learns from the APs that some STAs are lost, it updates these variables accordingly.



2.3 ATS Node

ATS node contains information regarding an AP, which includes the following data structures.

AtsEdgeList contains all the APs that have an edge with this AP, i.e. all its neighbor APs.

StaEdgeList contains all the STAs that this AP can hear, i.e. the STAs in the coverage area of this AP.

The cell neighborhood is the region formed by the combined coverage areas of multiple APs, from the perspective of a particular AP. For example, suppose AP1 is in the center and five APs surround it. If the coverage area of each of those APs overlaps AP1's coverage area, then all of them are neighbor APs of AP1.

Two APs can be neighbors of each other for two reasons: either they can hear each other directly (i.e. AP1 can directly hear AP2's beacon) or indirectly (i.e. there is a STA in between that can hear both APs, but the APs cannot hear each other directly). For a given AP node, all such other APs, direct or indirect, form the AP's AtsEdgeList.
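The two neighbor rules can be sketched as a function that builds an AP's AtsEdgeList from two kinds of observations. The input layout and names are assumptions for this example, and direct hearing is treated as symmetric (if either AP hears the other's beacon, they are neighbors), which is also an assumption.

```python
def ats_edge_list(ap, ap_hears_ap, sta_hears_ap):
    """Return the sorted neighbor APs of `ap`.

    ap_hears_ap:  dict AP -> set of APs whose beacons it hears directly.
    sta_hears_ap: dict STA -> set of APs that STA can hear.
    """
    neighbors = set(ap_hears_ap.get(ap, ()))
    # Direct neighbors in the other direction: APs that hear this AP's beacon.
    for other, heard in ap_hears_ap.items():
        if ap in heard:
            neighbors.add(other)
    # Indirect neighbors: APs that share a STA with this AP.
    for heard in sta_hears_ap.values():
        if ap in heard:
            neighbors |= heard
    neighbors.discard(ap)
    return sorted(neighbors)

ap_hears_ap = {"ap1": {"ap2"}}
sta_hears_ap = {"sta1": {"ap1", "ap3"}, "sta2": {"ap4"}}
print(ats_edge_list("ap1", ap_hears_ap, sta_hears_ap))
```

Here ap2 is a direct neighbor of ap1, ap3 is an indirect neighbor through sta1, and ap4 is not a neighbor.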

2.4 STA Node

STA node contains station-related information, which includes the following data structures.

EdgeList contains all the APs this STA can hear.

PerBssidList contains a list of AP assignments, where each assignment is given to one STA per BSSID. Each PerBssid node in the list is for one particular BSSID. It contains information about which AP has the assignment right now.

ActiveBss points to the PerBssid node on which the STA is currently associated.

Suppose there is only one virtual profile in the system and it is downloaded by all APs on only one radio. Then there is only one BSSID in the system. When a client comes up and sends a broadcast probe request, all APs can hear this client, so all APs send probe indications to Coordinator. Coordinator assigns the client to only one of the APs because there is only one BSSID. If the profile is not virtual, an assignment goes to every AP because their BSSIDs are different, and the client can then choose one of the APs to connect to. As another example, suppose there is only one profile and one AP in the system. The AP downloads the profile on both radios (a and b/g), each with a different BSSID. If an a/b/g client sends a probe request, it sends it on both the a channel and the b/g channel. In this case, it gets two assignments, one for the radio-1 BSSID and the other for the radio-2 BSSID. Thus two nodes are created in PerBssidList.

Inside Meru STA, a station can be assigned of multiple BSSIDs, but it can only be associated with one, where associated means actually doing data transfer. The pointer to the associated BSSID node is stored in ActiveBss, which is actually points to one of the PerBssid nodes in PerBssidList.

2.5 PerBssid Node

PerBssid node is usually in the context of a given STA node. It includes the following data structures.

Bssid is the BSSID for this node.

AssignedATS is the assigned AP on this BSSID.

OldAssignedATS is the previous assigned AP.

AlternateATS is the best possible AP for assignment.



Aid is the association ID given in the association response message from the AP. This ID is used in further communications, such as the TIM vector, where the bit whose position equals this ID is the bit for that STA. If this bit is set, there is a pending message for this STA. In our case, the AID is given by Coordinator because the AID has to be consistent and unique across the BSSID. It is not allocated per AP, so the AP cannot generate the AID itself.

StaState is the state of the STA. As the STA progresses through 802.11 packet exchanges, the AP reports this information to Coordinator, which updates it here. It changes from probe request state (i.e. STA just being discovered), to authenticated state, and to associated state.

Reassign is used by Assign Manager to reevaluate an assignment. When AP sends a frame report message to Coordinator and indicates another AP has better RSSI value, Coordinator will turn on the reassign flag to tell Assign Manager to reevaluate the assignment, i.e. compare the assigned AP and the alternate AP.
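The fields described in Sections 2.4 and 2.5 can be pictured with a small sketch. This is only an illustration in Python, not the Coordinator's actual definitions: the field names mirror the text, while the types, the default values, the "PROBING" state label, and the BSSID strings are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PerBssid:
    bssid: str                               # Bssid
    assigned_ats: Optional[str] = None       # AssignedATS
    old_assigned_ats: Optional[str] = None   # OldAssignedATS
    alternate_ats: Optional[str] = None      # AlternateATS
    aid: Optional[int] = None                # Aid, given by Coordinator
    sta_state: str = "PROBING"               # StaState (label is assumed)
    reassign: bool = False                   # Reassign flag


@dataclass
class StaNode:
    mac: str
    edge_list: List[str] = field(default_factory=list)            # APs that hear this STA
    per_bssid_list: Dict[str, PerBssid] = field(default_factory=dict)
    active_bss: Optional[PerBssid] = None                         # ActiveBss


# One AP serving the profile on two radios gives the STA two PerBssid nodes.
sta = StaNode(mac="00:00:f0:04:55:5c")
for bssid in ("bssid-radio-1", "bssid-radio-2"):   # hypothetical BSSIDs
    sta.per_bssid_list[bssid] = PerBssid(bssid=bssid)

# The STA is associated on only one of them.
sta.active_bss = sta.per_bssid_list["bssid-radio-1"]
```

ActiveBss is modeled as a reference to one of the existing PerBssid nodes rather than a copy, which matches the statement that it points into PerBssidList.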

2.6 Assign Manager and Handoff Manager

Assign Manager and Handoff Manager work together. Assign Manager makes decisions about whether a STA should be in, kicked out, or handed-off. Then it informs Handoff Manager to carry out these instructions. So Handoff Manager is the one actually sending out messages to APs. The process for a client to get into the system is called assignment.

There are three typical messages for client movement in a network:

Assign Message – used to get a client into the network,

Assign-Remove Message – used to get a client out of the network,

Handoff Message – used to move a client within the network.

Those messages are sent from Handoff Manager inside Coordinator to the AP.

A handoff has two entities, source AP and destination AP, where each AP could be either a null AP or valid AP. For assign message, the source is a null AP and destination is a valid AP. For handoff message, both the source and the destination AP are valid APs. For assign-remove message, the source is a valid AP and the destination is a null AP. So from Handoff Manager's perspective, everything is a handoff.
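The "everything is a handoff" view can be expressed as a small classification over the source and destination AP. The sketch below is illustrative Python; the function name and the use of None for a null AP are assumptions.

```python
from typing import Optional


def classify_handoff(source_ap: Optional[str], dest_ap: Optional[str]) -> str:
    """Map a (source, destination) pair to the message type it represents.

    None stands for a null AP in this sketch.
    """
    if source_ap is None and dest_ap is not None:
        return "assign"          # client enters the network
    if source_ap is not None and dest_ap is None:
        return "assign-remove"   # client leaves the network
    if source_ap is not None and dest_ap is not None:
        return "handoff"         # client moves within the network
    raise ValueError("source and destination cannot both be null")
```

For example, `classify_handoff(None, "AP1")` yields "assign", while `classify_handoff("AP1", "AP2")` yields "handoff".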

When a STA enters the system for the first time, it sends a probe request to APs. The APs that receive the request send a probe indication to Coordinator so that it can assign this STA. Upon receiving the probe indication, Coordinator first creates a STA node if the STA is not yet in the system. Then it creates a PerBssid node in the STA node and adds the AP to the STA-ATS list (i.e. EdgeList). In the meantime, it stores the best alternate AP from EdgeList in AlternateATS inside the PerBssid node. The AP-AP list and STA-AP list are modified and populated based on the messages coming from APs. When Coordinator receives a frame report message from any AP, it triggers Assign Manager to compute the best alternate AP based on the RSSI values and stores the result in AlternateATS.

Now Assign Manager checks if the alternate AP is better than the current assigned AP. If that's the case, the alternate AP will become the assigned AP and Assign Manager will choose a new alternate AP. In the meantime, Assign Manager will trigger a handoff from the assigned AP to the best alternate AP. For the very first time, AlternateATS is a valid AP and both AssignedATS and OldAssignedATS are null, which means this is an assignment. Assign Manager will tell Handoff Manager to start a handoff (i.e. to send an assign message to the assigned AP). Before doing that, Assign Manager copies AssignedATS to OldAssignedATS and copies AlternateATS to AssignedATS. Now AssignedATS becomes a valid AP. Assign Manager will then choose a new alternate AP if one is available (i.e. the second best alternate AP, if not null). It is possible that the second best alternate AP is the old assigned AP, for example when a person talking on the phone walks back and forth in a room.

Up to this point, only Assign Manager has manipulated the data structure; the AP has not received any message yet. Sending the message is Handoff Manager's job. The data structure is updated before, rather than after, Handoff Manager finishes because a better AP may be found while Handoff Manager is performing the handoff. This way, Assign Manager can always record the latest information and trigger a new handoff if needed.

Assign Manager looks between the alternate AP and the assigned AP and decides which one is better, while Handoff Manager looks between the assigned AP and the old assigned AP and decides if a hard handoff is needed.

After a handoff is done, consider the scenario where OldAssignedATS is a null AP and both AssignedATS and AlternateATS are valid APs. The client moves towards the second best alternate AP. Based on the messages Assign Manager receives from different APs, it finds out this alternate AP becomes better than the assigned AP. It then performs the same process, i.e. copies AssignedATS to OldAssignedATS, copies AlternateATS to AssignedATS, and chooses a new alternate AP. Now all three variables are valid APs.
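The rotation of the three variables can be traced with a short sketch. It is written in Python for illustration; the class, the function name, and the way candidate APs are passed in (best first) are assumptions, not the Coordinator's actual interfaces.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PerBssid:
    assigned_ats: Optional[str] = None
    old_assigned_ats: Optional[str] = None
    alternate_ats: Optional[str] = None


def promote_alternate(node: PerBssid, candidates_best_first: Iterable[str]) -> None:
    """Copy AssignedATS to OldAssignedATS, AlternateATS to AssignedATS,
    then pick the best remaining candidate as the new alternate."""
    node.old_assigned_ats = node.assigned_ats
    node.assigned_ats = node.alternate_ats
    node.alternate_ats = next(
        (ap for ap in candidates_best_first if ap != node.assigned_ats), None
    )


# First assignment: only the alternate is valid beforehand.
node = PerBssid(alternate_ats="AP1")
promote_alternate(node, ["AP1", "AP2"])
first = (node.old_assigned_ats, node.assigned_ats, node.alternate_ats)

# The client walks toward AP2; the old assigned AP becomes the new alternate.
promote_alternate(node, ["AP2", "AP1"])
second = (node.old_assigned_ats, node.assigned_ats, node.alternate_ats)
```

After the first call the triple is (null, AP1, AP2); after the second call all three variables are valid APs.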

Currently we have only one alternate AP. Coordinator always keeps the best alternate AP based on the frame reports from all APs and stores it in AlternateATS. When a handoff happens, Coordinator does not search for a better AP at that time; it just hands off to the one stored in AlternateATS.

For example, a STA is moving inside the network. There is one AP where the STA is originally assigned and four other APs who can hear this STA. They are all reporting the signal strength to Coordinator. Out of these four APs, Coordinator will choose the best alternate AP and store it in AlternateATS. If another new AP comes in and has better signal strength, it will replace the AlternateATS. Coordinator will always evaluate them and only keep one best alternate AP ready. When a handoff needs to be triggered, it will be triggered to this best alternate AP. At the time of handoff, Coordinator will not search for the best alternate AP.
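A minimal sketch of keeping a single best alternate AP is shown below. It is illustrative Python and assumes that a higher RSSI value means a better signal; the function name and the RSSI numbers are hypothetical.

```python
from typing import Dict, Optional


def best_alternate(assigned_ap: str, rssi_by_ap: Dict[str, int]) -> Optional[str]:
    """Return the non-assigned AP with the highest reported RSSI, if any."""
    candidates = {ap: rssi for ap, rssi in rssi_by_ap.items() if ap != assigned_ap}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# AP1 is the assigned AP; four other APs also hear the STA.
reports = {"AP1": 30, "AP2": 22, "AP3": 35, "AP4": 18, "AP5": 27}
alternate = best_alternate("AP1", reports)

# A new AP with better signal strength replaces the stored alternate.
reports["AP6"] = 41
alternate_after = best_alternate("AP1", reports)
```

Because the result is recomputed as reports arrive and stored ahead of time, nothing has to be searched at the moment a handoff is triggered.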

3 Coordinator and WNC Agent

When an external event happens, the modules that register for that event will receive an AVC_NOTIF message from WNC Agent. The four primary NMS messages sent to Coordinator are:

CREATE - used when an ESS profile is created,

DELETE - used when an ESS profile is deleted,

AVC_NOTIF - used when any change other than CREATE/DELETE made to existing configurations,

GET_RESPONSE - used when Coordinator requests configuration from WNC Agent.

When Coordinator comes up or restarts itself after a crash, it sends a GET_ALL_REQUEST message to WNC Agent to get all the configurations. WNC Agent then replies with a GET_RESPONSE message. When Coordinator receives this message, it configures itself accordingly. After that, if there is any configuration change, Coordinator receives a CREATE message if a new service is created, a DELETE message if a service is deleted, or an AVC_NOTIF message if something else is modified. These are the major messages between WNC Agent and Coordinator.

When Coordinator receives a GET_ALL_REQUEST message from WNC Agent, it means WNC Agent is asking Coordinator to return all its data. The data that Coordinator provides to WNC Agent are StaEdgeList, StaList, AtsList, and AtsAtsEdgeList. These are the data shown by the CLI when typing "show topostation".

Each NMS message has a class type defined for more specific operation. Class type IF_802_11_ENTRY is the configuration message for radio interface. When receiving class type SERVICE, Coordinator is expecting some update for BSSID from WNC Agent, such as service downloaded on AP. When this happens, Coordinator adds or updates the BSSID in AtsBssidList.

CLI command "show station" displays the station table from NMS database. Whenever a STA connects to the network and gets an IP address, it registers with WNC Agent. WNC Agent has its own STA table, which is the one displayed for CLI "show station". Figure 7 shows an example of “show station” output.

meru-wifi# show station

MAC Address        IP Type     AP Name        L2 Mode   L3 Mode  Authenticated User Name  Tag  Client IP
00:00:f0:04:55:5c  DHCP        76-Sw-OAP-2F   wpa2-psk  clear                             0    192.168.9.56
00:01:3e:10:01:bb  DHCP        72-CEO-208-2F  clear     clear                             0    192.168.9.151
00:04:e2:b8:02:2b  DHCP        1-CS-208-1F    wpa       clear    MERUNETWORKS\apawlovich  0    192.168.9.51
00:0e:35:09:71:96  DHCP        70-VPM-208-1F  wpa-psk   clear                             0    192.168.9.58
00:0e:35:0f:dc:e0  DHCP        69-HW-201-2F   wpa-psk   clear                             0    192.168.9.164
00:0e:9b:9a:05:0a  DHCP        76-Sw-OAP-2F   wpa2-psk  clear                             0    192.168.9.67
00:0e:9b:9a:0e:b3  Discovered  71-CTO-201-2F  wpa-psk   clear                             0    192.168.9.172
00:0e:9b:9a:0f:bf  DHCP        71-CTO-201-2F  wpa-psk   clear                             0    192.168.9.69
00:0e:9b:b3:25:b7  DHCP        1-CS-208-1F    clear     clear                             0    192.168.9.61
00:12:f0:54:a2:56  DHCP        70-VPM-208-1F  wpa-psk   clear                             0    192.168.9.53
00:12:f0:87:e2:2b  DHCP        1-CS-208-1F    clear     clear                             0    192.168.9.60
00:13:ce:7e:a2:58  Discovered  70-VPM-208-1F  clear     clear                             20   192.168.11.58
00:13:ce:83:56:d6  DHCP        70-VPM-208-1F  wpa-psk   clear                             0    192.168.9.161
00:14:6c:53:44:88  DHCP        1-CS-208-1F    clear     clear                             20   192.168.11.56
00:14:a4:4c:62:ca  DHCP        71-CTO-201-2F  clear     clear                             0    192.168.9.64
00:16:6f:24:7f:98  DHCP        76-Sw-OAP-2F   wpa2-psk  clear                             0    192.168.9.65
00:16:6f:c7:3d:b0  DHCP        19-QA-208-2F   wpa-psk   clear                             0    192.168.9.170
00:16:cf:a8:a7:b5  DHCP        70-VPM-208-1F  wpa-psk   clear                             0    192.168.9.59
00:40:96:a8:d4:42  Discovered  1-CS-208-1F    wpa-psk   clear                             0    192.168.9.169
00:40:96:ad:1d:b2  DHCP        1-CS-208-1F    wpa-psk   clear                             0    192.168.9.72
Station Table(20 entries)

Figure 7: Output of "show station".

CLI command "show topostation" displays the station topology. This table reflects exactly the station database inside Coordinator. It is also the same information Coordinator sends to WNC Agent in response to a GET_ALL_REQUEST message with class type TOPO_STA. Figure 8 shows an example of "show topostation" output.



meru-wifi# show topostation

MAC Address        AP  AP Name        Last Handoff Time    State       BSSID
00:00:f0:04:55:5c  76  76-Sw-OAP-2F   2007/05/25 12:13:46  ASSOCIATED  00:0c:e6:e6:4c:05
00:04:e2:b8:02:2b  1   1-CS-208-1F    2007/05/24 14:56:56  ASSOCIATED  00:0c:e6:b7:cb:1f
00:0e:35:09:71:96  70  70-VPM-208-1F  2007/05/25 07:55:50  ASSOCIATED  00:0c:e6:80:7e:32
00:0e:35:0f:dc:e0  69  69-HW-201-2F   2007/05/25 12:36:15  ASSOCIATED  00:0c:e6:80:7e:32
00:0e:9b:9a:05:0a  76  76-Sw-OAP-2F   2007/05/24 11:03:45  ASSOCIATED  00:0c:e6:e6:4c:05
00:0e:9b:9a:0e:b3  71  71-CTO-201-2F  2007/05/25 12:28:31  ASSOCIATED  00:0c:e6:80:7e:32
00:0e:9b:9a:0f:bf  71  71-CTO-201-2F  2007/05/25 12:34:46  ASSOCIATED  00:0c:e6:80:7e:32
00:0e:9b:b3:25:b7  1   1-CS-208-1F    2007/05/25 12:02:28  ASSOCIATED  00:0c:e6:80:7e:32
00:12:f0:54:a2:56  70  70-VPM-208-1F  2007/05/25 11:38:17  ASSOCIATED  00:0c:e6:80:7e:32
00:12:f0:87:e2:2b  1   1-CS-208-1F    2007/05/25 12:29:01  ASSOCIATED  00:0c:e6:f7:8f:7a
00:13:ce:7e:a2:58  70  70-VPM-208-1F  2007/05/25 12:31:59  ASSOCIATED  00:0c:e6:93:6b:c6
00:13:ce:83:56:d6  70  70-VPM-208-1F  2007/05/25 12:32:51  ASSOCIATED  00:0c:e6:80:7e:32
00:14:6c:53:44:88  1   1-CS-208-1F    2007/05/25 12:35:30  ASSOCIATED  00:0c:e6:31:d6:41
00:14:a4:4c:62:ca  19  19-QA-208-2F   2007/05/25 12:36:35  ASSOCIATED  00:0c:e6:80:7e:32
00:16:6f:24:7f:98  76  76-Sw-OAP-2F   2007/05/25 08:41:48  ASSOCIATED  00:0c:e6:e6:4c:05
00:16:6f:c7:3d:b0  19  19-QA-208-2F   2007/05/25 12:17:56  ASSOCIATED  00:0c:e6:80:7e:32
00:16:cf:a8:a7:b5  70  70-VPM-208-1F  2007/05/25 12:24:53  ASSOCIATED  00:0c:e6:f8:22:52
00:40:96:a8:d4:42  1   1-CS-208-1F    2007/05/25 12:36:15  ASSOCIATED  00:0c:e6:80:7e:32
00:40:96:ad:1d:b2  1   1-CS-208-1F    2007/05/25 12:30:02  ASSOCIATED  00:0c:e6:f8:22:52
Stations Topology(19 entries)

Figure 8: Output of "show topostation".

There could be multiple entries for a single STA in "show topostation", but only one entry in "show station". Two methods are used to update the data for "show topostation". The first is event-driven: whenever an event happens, such as a STA joining or exiting, Coordinator proactively sends a message to WNC Agent. The other is periodic polling by WNC Agent, where WNC Agent gathers statistics from Coordinator every 60 seconds (i.e. the "Statistics Polling Period", which is part of the output of CLI command "show controller").

The CLI command always talks to NMS Agent. When entering "show topoXXX" CLI command, the display is done from the cached information in WNC Agent. Those data are returned by Coordinator in CoordNmsCollect::ProcessNmsMsg in response to the GET_ALL_REQUEST message from WNC Agent. Each class type requests specific information. For example,

"show topoap" - returns AtsList via class type TOPO_ATS,

"show topoapap" - returns AtsAtsEdgeList via class type TOPO_ATSATS,

"show topostaap" - returns StaAtsEdgeList via class type TOPO_STAATS,

"show topostation" - returns StaList via class type TOPO_STA.

In reality, any of the above four class types returns all four sets of information. WNC Agent fetches them from Coordinator every polling interval and stores them locally. Therefore those CLI commands only reflect the cached information in WNC Agent.

The difference between "show topoap" and "show ap" is that the former shows the numbers of neighbors and attached/assigned STAs, while the latter shows the dynamic database of APs, which is a completely different database from the one maintained by Coordinator. WNC Agent maintains its own STA and AP databases, which hold different information from what Coordinator stores. The "show topoXXX" commands show Coordinator's databases and states. For example, suppose an AP comes up and gets discovered by Controller. Once it is discovered, it gets registered with WNC Agent, which then sends all the configurations to the AP from its database. At the same time, WNC Agent informs Coordinator about this AP and the configuration sent to the AP. Coordinator then adds that AP to its own database. Figure 9 and Figure 10 show examples of "show ap" and "show topoap" respectively.
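The relationship between the CLI commands, the class types, and the returned tables can be summarized in a short sketch. This is illustrative Python only; the dictionary, the function name, and the shape of the returned data are assumptions, and only the command names, class types, and table names come from the text.

```python
# CLI command -> class type used in the GET_ALL_REQUEST message.
CLASS_TYPE_BY_COMMAND = {
    "show topoap": "TOPO_ATS",
    "show topoapap": "TOPO_ATSATS",
    "show topostaap": "TOPO_STAATS",
    "show topostation": "TOPO_STA",
}

TOPOLOGY_TABLES = ("AtsList", "AtsAtsEdgeList", "StaAtsEdgeList", "StaList")


def collect(class_type: str, coordinator_tables: dict) -> dict:
    """Return all four topology tables regardless of which class type asked."""
    if class_type not in CLASS_TYPE_BY_COMMAND.values():
        raise ValueError(f"unknown class type: {class_type}")
    return {name: coordinator_tables[name] for name in TOPOLOGY_TABLES}


tables = {name: [] for name in TOPOLOGY_TABLES}   # empty placeholder data
response = collect(CLASS_TYPE_BY_COMMAND["show topoap"], tables)
```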



meru-wifi# show ap

AP ID  AP Name      Serial Number      Op State  Availability  Runtime     Connectivity  AP Model  AP Type
19     19-QA-208-2  00:0c:e6:03:45:20  Enabled   Online        3.4.SR2-12  L3            AP208     Local
1      1-CS-208-1F  00:0c:e6:03:27:b7  Enabled   Online        3.4.SR2-12  L2            AP208     Local
23     23-ACCT-208  00:0c:e6:03:27:bc  Disabled  Offline                   None          AP208     Local
69     69-HW-201-2  00:0c:e6:00:3e:63  Enabled   Online        3.4.SR2-12  L3            AP201     Local
70     70-VPM-208-  00:0c:e6:03:27:bf  Enabled   Online        3.4.SR2-12  L2            AP208     Local
71     71-CTO-201-  00:0c:e6:00:3d:d1  Enabled   Online        3.4.SR2-12  L3            AP201     Local
72     72-CEO-208-  00:0c:e6:03:27:8f  Enabled   Online        3.4.SR2-12  L3            AP208     Local
76     76-Sw-OAP-2  00:12:cf:4d:e6:65  Enabled   Online        3.4.SR2-12  L3            OAP180    Local
AP Table(8 entries)

Figure 9: Output of "show ap".

meru-wifi# show topoap

AP ID  AP Name        RsRq  RsAlloc  Neighbor  Attached  Assigned
72     72-CEO-208-2F  0     0        5         16        0
69     69-HW-201-2F   0     0        5         16        1
1      1-CS-208-1F    0     0        5         17        6
71     71-CTO-201-2F  0     0        5         15        3
70     70-VPM-208-1F  0     0        5         13        5
19     19-QA-208-2F   0     0        5         17        1
76     76-Sw-OAP-2F   0     0        0         5         3
AP Wireless Resources(7 entries)

Figure 10: Output of "show topoap".

In WNC Agent's AP table, some APs may be disabled and offline. That means they are either switched off or not connected to the Controller. WNC Agent still maintains this information because users may change the configuration of an AP at any time. WNC Agent remembers all the configurations even though the AP is not connected, so when the AP reconnects, WNC Agent downloads exactly the same configuration to that AP. CLI command "show ess-ap" can be used to verify which profile is downloaded to which AP.

Coordinator does not need to maintain the same information as WNC Agent. This is because whenever an AP comes up or downloads a profile, WNC Agent notifies Coordinator of all these events, and Coordinator then updates its database. So Coordinator only keeps up-to-date information about which APs are up, while WNC Agent keeps all the configuration information regardless of whether APs are up or down.

AP itself does not have a persistent copy of the configuration. Whenever an AP restarts, it has to download the configuration from WNC Agent. The only information stored in AP's flash is whether the AP has a DHCP or static IP address and the IP address of the Controller to connect to when the AP comes up.

4 Process Flow

In this section, we describe the process flow of major components in Coordinator and the main functions they use.

4.1 Topology Manager

The majority of work performed by Topology Manager is to handle three different messages – Probe Indication, Frame Report, and MAC State.



4.1.1 Probe Indication

When a STA comes up, it broadcasts a probe request. All the APs that can hear the request send a probe indication to Controller, where the message is handled by ProcessProbeMsg. This function checks if the STA already exists or is already assigned. If not, it creates a STA node and assigns the STA to an AP.

The AP's MAC address is not part of the probe indication message (i.e. AtsProbeMsg_t in ats_wnc_msg.h). It is contained in a bigger message (i.e. AtsReport_t), which could consist of different kinds of messages that AP sends to Coordinator. Those messages are defined in a union (i.e. AtsMsgUnion_t). ProcessProbeMsg confirms the BSSID in the probe indication really exists on this AP via searching through AtsBssList. When a service gets downloaded by an AP, it comes with a BSSID, which is added to AtsBssList by Coordinator.

Every time a STA is added to some AP on particular BSSID, ProcessProbeMsg checks the Call Admission Control (CAC) limits, which are used for different QoS configurations. If MaxStationsperBss (from coord.config file) is not equal to zero, it means load balancing is enabled. In this case, all the clients are distributed across different APs in one area instead of being assigned to the same single AP.

Now Topology Manager checks if the STA is in a user-configurable denial list. To do this, it sends the STA's MAC address to MAC filtering module (a security process, also called SMM) asking whether the STA is allowed to enter the network or not. If SMM is disabled or the STA is allowed, it continues to AllowStation.

To summarize ProcessProbeMsg, it does:

1) process probe indication message,

2) extract STA MAC address and BSSID from the message,

3) check all the limits,

4) validate all required data structures,

5) call AllowStation.
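The five steps above can be sketched as follows. This is illustrative Python and not the real ProcessProbeMsg: the class, method names, message layout, and return strings are hypothetical, and treating MaxStationsperBss as a per-AP, per-BSSID station count limit is an assumption made only for this example.

```python
class ProbeHandler:
    def __init__(self, ats_bss_list, max_stations_per_bss=0, mac_allowed=None):
        self.ats_bss_list = ats_bss_list              # AP -> set of BSSIDs on that AP
        self.max_stations_per_bss = max_stations_per_bss
        self.mac_allowed = mac_allowed or (lambda mac: True)   # stand-in for SMM
        self.station_count = {}                       # (AP, BSSID) -> STA count
        self.allowed = []

    def allow_station(self, ap, sta, bssid):
        key = (ap, bssid)
        self.station_count[key] = self.station_count.get(key, 0) + 1
        self.allowed.append((ap, sta, bssid))

    def process_probe(self, ap, probe):
        # Steps 1-2: take the STA MAC address and BSSID out of the message.
        sta, bssid = probe["sta_mac"], probe["bssid"]
        # Step 4: confirm the BSSID really exists on the reporting AP.
        if bssid not in self.ats_bss_list.get(ap, set()):
            return "rejected: BSSID not on this AP"
        # Step 3: check limits (zero means the limit is disabled).
        limit = self.max_stations_per_bss
        if limit != 0 and self.station_count.get((ap, bssid), 0) >= limit:
            return "rejected: station limit reached"
        # MAC filtering: ask whether the STA may enter the network.
        if not self.mac_allowed(sta):
            return "rejected: denied by MAC filter"
        # Step 5: hand over to AllowStation.
        self.allow_station(ap, sta, bssid)
        return "allowed"


handler = ProbeHandler({"AP1": {"bssid-1"}}, max_stations_per_bss=1,
                       mac_allowed=lambda mac: mac != "sta-denied")
r1 = handler.process_probe("AP1", {"sta_mac": "sta-a", "bssid": "bssid-1"})
r2 = handler.process_probe("AP1", {"sta_mac": "sta-b", "bssid": "bssid-1"})
r3 = handler.process_probe("AP1", {"sta_mac": "sta-c", "bssid": "bssid-9"})
```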

AllowStation first finds the StaNode and PerBssid node for this STA, and creates them if they do not exist. It then chooses an alternate AP. For all the BSSIDs that are common between the AP and the STA (i.e. the BSSIDs that exist in PerBssidList of the STA and also exist in the global AtsBssList), it checks whether this AP can be an alternate AP for the STA on that BSSID.

For a first-time assignment (i.e. no assigned AP), Topology Manager directly calls AssignSTA, which actually assigns the STA. If the assigned AP is not null, Topology Manager just sets the reassign flag to some value. This flag can also be set when an AP goes down, a new BSSID is added, or a new neighbor comes up. In the meantime, Assign Manager checks the reassign flag every second. If it is not equal to null, Assign Manager evaluates the alternate AP against the assigned AP and decides if the alternate AP should become the assigned AP.

At the end of AllowStation, Topology Manager calls CoordNmsCollect::AddUpdate to send the update to WNC Agent. This updates the NMS cache automatically.

4.1.2 Frame Report

ProcessAtsFrame is the API that processes the frame report message from APs. A frame report actually carries two kinds of information. One is the RSSI report, which contains the latest RSSI values of stations to be used in making handoff decisions. The other is the BSSID report, which contains the latest STA-BSSID states to be used for syncing the state between AP and Coordinator.



The data structure of a frame report message is defined in ats_frame_rtport_t (in ats_wnc_msg.h), where Data contains BssidReportCount BSSID reports. Each BSSID report (i.e. bassid_report_t) contains a BSSID and flags. RssiReportCount and rssi_report_t are no longer in use, so RssiReportCount is always zero. Figure 11 illustrates the structure of the frame report message.

Figure 11: Frame Report Message.

4.1.2.1 RSSI Report

The original design was to send multiple RSSI reports and pick RSSI from there. Now RSSI is picked up from frame report. Frame report is sent every three seconds from an AP, where the RSSI value (stored in MaxRssi) is the average RSSI value for the STA over the past three seconds. When AP receives packets from STAs, it records the RSSI value of each packet, averages them over three seconds, and sends them in the frame report to Coordinator.
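The per-interval averaging done on the AP side can be sketched as below. This is illustrative Python; the class and method names are hypothetical, and the sketch simply assumes that one report is built every three seconds by whoever calls build_report.

```python
class RssiAverager:
    """Collect per-packet RSSI samples and average them per reporting interval."""

    def __init__(self):
        self._samples = {}          # STA MAC -> RSSI samples in this interval

    def record(self, sta, rssi):
        self._samples.setdefault(sta, []).append(rssi)

    def build_report(self):
        """Return {STA: average RSSI} and start a new interval."""
        report = {sta: sum(v) / len(v) for sta, v in self._samples.items()}
        self._samples.clear()
        return report


averager = RssiAverager()
for rssi in (30, 34, 38):           # packets heard during one interval
    averager.record("sta-a", rssi)
report = averager.build_report()    # value that would be carried in MaxRssi
next_report = averager.build_report()
```

A STA with no packets in an interval does not appear in that interval's report in this sketch; how the real AP handles that case is not described here.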

If Coordinator receives a proper RSSI value for a STA from some AP, this AP could become the best alternate AP (i.e. it could be a better alternate AP than the one currently selected). If that's the case, it overrides the current alternate AP and the reassign flag is set to trigger Assign Manager to do a handoff if needed. Here Topology Manager only updates the variables and the flag. It is up to Assign Manager to make the handoff decision.

4.1.2.2 BSSID Report

If an AP and Coordinator go out of sync, such as disagreeing on a STA being assigned or not, Coordinator will send appropriate messages to bring the AP in sync. Coordinator always has the most accurate information because it receives the latest update from everybody.



To understand the BSSID report, we need to introduce the assigned STA and the discovered STA, both of which are maintained by the AP. All the AP radio interfaces are in promiscuous mode, so they can hear all the STAs talking even though those STAs are not connected to a particular AP. In promiscuous mode, the layer 2 interface sends all packets to layer 3 even if the destination MAC address in the packet does not match the MAC address of the layer 2 interface.

Each AP maintains a database of all the STAs it can hear. When AP hears a new STA, it adds the STA to the discovered database, which is actually an in-memory linked-list. Along with the STA, it also maintains other information, such as the average RSSI value of the past three seconds, which are needed for frame report. These databases have no relationship with Coordinator's databases or WNC Agent’s databases.

Every time a STA is assigned by Coordinator, the STA is added in the assigned database. If the STA already exists in the discovered database, it is deleted from there. A STA can only exist in one of the two databases, not both.

When AP receives an assign-remove message, the STA is moved from the assigned database to the discovered database. The discovered database can be timed-out, in which case all the expired STAs are removed from the database. The assigned database is never timed-out. It is completely governed by Coordinator.
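The two AP-side databases and the movement of a STA between them can be sketched as follows. This is illustrative Python; the class, method names, and the 60-second timeout are hypothetical, and the real discovered database is described as an in-memory linked list rather than a dictionary.

```python
class ApStaDatabases:
    def __init__(self, discovered_timeout):
        self.discovered_timeout = discovered_timeout
        self.assigned = set()       # never timed out; governed by Coordinator
        self.discovered = {}        # STA MAC -> time last heard

    def hear(self, sta, now):
        # A STA lives in exactly one database, so assigned STAs are skipped.
        if sta not in self.assigned:
            self.discovered[sta] = now

    def assign(self, sta):
        self.discovered.pop(sta, None)
        self.assigned.add(sta)

    def assign_remove(self, sta, now):
        if sta in self.assigned:
            self.assigned.remove(sta)
            self.discovered[sta] = now

    def expire(self, now):
        stale = [s for s, t in self.discovered.items()
                 if now - t > self.discovered_timeout]
        for s in stale:
            del self.discovered[s]


db = ApStaDatabases(discovered_timeout=60.0)
db.hear("sta-a", now=0.0)
db.hear("sta-b", now=0.0)
db.assign("sta-a")                  # moves sta-a out of the discovered database
db.expire(now=100.0)                # sta-b times out; sta-a is unaffected
after_expire = (set(db.assigned), set(db.discovered))
db.assign_remove("sta-a", now=100.0)
```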

When an AP sends a frame report message to Coordinator, it includes the entire assigned and discovered databases in the message. A frame report message contains a list of frame reports, one for each STA. In each frame report, there is only one average RSSI value and a list of BSSID reports (as shown in Figure 11), which means a STA can be heard on multiple BSSIDs. For example, if the assigned database has 10 STAs and the discovered database has 40 STAs, there are 50 frame reports in total in one frame report message. The flags in the BSSID report are used for syncing the state. They indicate

1) if a STA is in discovered database or assigned database,

2) if there was any traffic observed for this STA during the past 3 seconds or not,

3) the MAC state of the STA on this AP (e.g. associated or authenticated).

When parsing a STA frame report, if the STA does not exist in Coordinator's database, it means the AP knows about the STA but Coordinator does not. For each BSSID report, we check if the BSSID exists on the reporting AP. If that's the case, it means the STA was heard on a BSSID that exists on the AP, so the STA is in the same virtual cell supported by the AP. The following describes the different scenarios and how they are handled.

1) The AP says the STA is not discovered (i.e. the AP thinks it is assigned), but Coordinator does not know about this STA because it does not exist in Coordinator's database. This could happen if the AP missed the assign-remove message from Coordinator. In this case, we bring the AP into the same state as Coordinator by sending an assign-remove to the AP.

2) If the AP says the STA is discovered and there is some traffic heard on the STA, this is as good as sending a probe indication. It means there is a STA which is heard over the air talking to the BSSID that exists on the AP, and the STA is not assigned to anyone, i.e. not in Coordinator’s database. It could be Coordinator missed the probe indication from AP. In this case, we just add this STA into Coordinator’s database.

3) If the assigned AP of this STA is the same as the AP who is sending the frame report and the STA state in Coordinator’s database is ASSOCIATED, then we just update the link information.

4) If the assigned AP of this STA is the same as the AP who is sending the frame report but the MAC state in the frame report does not match the one stored in Coordinator's database, then we request the AP to send a MAC state indication for this STA to reconfirm that our state for the STA matches the AP's. If the AP still reports a different MAC state, then the syncing will happen in MAC state indication handling.

5) If the assigned AP of this STA is different from the AP who is sending the frame report and the AP thinks the STA is assigned on this BSSID, it means Coordinator has some other AP assigned for this STA. In this case, Coordinator sends an assign-remove message to the AP who thinks the STA is assigned.

6) If the assigned AP of this STA is the same as the AP who is sending the frame report but the AP is reporting the STA is discovered, it means the AP thinks the STA is not assigned but Coordinator thinks the STA is assigned. In this case, Coordinator needs to send a new assignment to the AP.
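The six scenarios above can be condensed into one decision function. This is an illustrative Python sketch: the function name, parameters, and returned action strings are hypothetical, and the order in which overlapping scenarios (3, 4, and 6) are checked is an assumption, since the text does not state their precedence.

```python
from typing import Optional


def sync_action(coord_assigned_ap: Optional[str], coord_state: Optional[str],
                reporting_ap: str, ap_says_assigned: bool,
                traffic_seen: bool, reported_state: str) -> str:
    """Decide how Coordinator reacts to one BSSID report.

    coord_state is None when the STA is not in Coordinator's database.
    """
    if coord_state is None:
        if ap_says_assigned:
            return "send assign-remove"            # scenario 1
        if traffic_seen:
            return "add STA to database"           # scenario 2
        return "no action"
    if coord_assigned_ap == reporting_ap:
        if not ap_says_assigned:
            return "send new assignment"           # scenario 6
        if reported_state != coord_state:
            return "request MAC state indication"  # scenario 4
        if coord_state == "ASSOCIATED":
            return "update link information"       # scenario 3
        return "no action"
    if ap_says_assigned:
        return "send assign-remove"                # scenario 5
    return "no action"
```

For instance, a report from AP2 claiming an assignment for a STA that Coordinator has assigned to AP1 results in an assign-remove being sent to AP2.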

4.1.3 MAC State

ChangeStaMacState is the API that processes the MAC state indication message from AP. Whenever STA changes its MAC state (e.g. probing → authenticated → associated or reverse), all those changes will be sent from AP to Coordinator.

The MAC state message contains STA MAC address, BSSID, and MAC state. It indicates the MAC state of this STA on the given BSSID. Each MAC state indication message is sent separately for each MAC state change, not like frame report.

When a STA changes from probing to authenticated or from authenticated to associated, it is called moving up the state. The reverse is called moving down the state. In normal scenario, when a STA moves from authenticated to associated, we just inform WNC Agent (via RequestHandoff) that the STA is now associated. So WNC Agent can update its STA database (those shown in "show station"). There is no real handoff performed in this case.

If the STA is currently associated and MAC state message reports that the STA is associated on the same BSSID, it means the STA was temporarily disassociated then becomes associated again. In that case, Handoff Manager picks up the change and updates WNC Agent only. It does not actually perform any handoff. This is done inside handoff state machine. But if the STA was associated on a BSSID and is now associated on another BSSID without informing the old AP, then an actual hard handoff will be performed.
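The notions of moving up or down the state and of update-only versus hard handoff can be sketched as follows. This is illustrative Python; the state labels, function names, and returned strings are assumptions, and the sketch ignores the handoff state machine itself.

```python
STATE_ORDER = ("PROBING", "AUTHENTICATED", "ASSOCIATED")


def direction(old_state: str, new_state: str) -> str:
    """Classify a MAC state change as moving up, moving down, or unchanged."""
    delta = STATE_ORDER.index(new_state) - STATE_ORDER.index(old_state)
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "same"


def on_associated(active_bssid, reported_bssid) -> str:
    """React to an 'associated' indication for a STA.

    active_bssid is the BSSID the STA is currently associated on, or None.
    """
    if active_bssid is not None and active_bssid != reported_bssid:
        return "hard handoff"
    return "update WNC Agent only"
```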

If a STA changes from associated to authenticated, it may get an assign-remove depending on CAC limits and other variables. When an AP has an assignment for this STA, the STA can just go ahead and get associated. All the checking for CAC limits, station limits, and validation is done while processing the probe indication and before AllowStation. Once AllowStation is done, the assignment is given, and then there is no further communication from Coordinator to the AP. As long as the AP has the assignment, it will allow the STA to get associated. Consider the scenario where all the validations are met and a STA gets associated with the AP. Now if the STA disconnects from the network, such as by disabling/enabling the network on Windows or switching the handset off and on, it sends a disassociation frame to the AP. The AP then sends a MAC state indication to Coordinator to downgrade the STA's state from associated to probing. At this time, we can either remove or keep the assignment. If the assignment is kept, when the STA switches on again, it connects immediately to the network because the assignment is still on the AP.

A problem could happen in the above scenario when the station limit is reduced on Controller before or while the STA is disconnected. The change does not affect the STA when it comes back. For example, an administrator initially allows only four STAs to connect to the AP, so the station limit is set to four for the AP. This limit is checked before the assignment is sent. Now the administrator reduces the station limit from four to two. When two of the four STAs switch off and switch on again, they can still connect to the network. This is because the original assignments are still there. To solve this problem, the assignment is removed when load balancing is enabled and the STA downgrades its state. Thus when the STA boots up again, it has to go through the whole assignment and AllowStation cycle, where the limits and validations are checked again.
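The four-to-two station limit example can be replayed with a small simulation. This is illustrative Python under simplifying assumptions: a single AP, a limit that is checked only when an assignment is first given, and hypothetical class and method names.

```python
class ApAssignments:
    def __init__(self, station_limit, load_balancing):
        self.station_limit = station_limit
        self.load_balancing = load_balancing
        self.assigned = set()

    def probe(self, sta):
        """Return True if the STA may connect; the limit is checked only here."""
        if sta in self.assigned:
            return True                     # existing assignment, no re-check
        if len(self.assigned) >= self.station_limit:
            return False
        self.assigned.add(sta)
        return True

    def downgrade(self, sta):
        """STA dropped from associated; remove the assignment only with load balancing."""
        if self.load_balancing:
            self.assigned.discard(sta)


def replay(load_balancing):
    ap = ApAssignments(station_limit=4, load_balancing=load_balancing)
    for sta in ("s1", "s2", "s3", "s4"):
        ap.probe(sta)
    ap.station_limit = 2                    # administrator lowers the limit
    for sta in ("s1", "s2"):                # two STAs switch off and on again
        ap.downgrade(sta)
    return [ap.probe(sta) for sta in ("s1", "s2")]


kept = replay(load_balancing=False)
removed = replay(load_balancing=True)
```

With assignments kept, both returning STAs reconnect despite the lower limit; with assignments removed on downgrade, both are re-evaluated against the new limit and refused.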



4.2 Assign Manager

In CoordAssignMgrMain, Assign Manager activates its mailbox and creates the timer with callback function CoordAssignMgrReassignCB. The timer value is taken from ReassignInterval (currently one second) in coord.config file. So every second this timer will expire and CoordAssignMgrReassignCB will be called, which sends the ITC_Reassign signal to Assign Manager. Upon receiving this signal, Assign Manager traverses through all the STA nodes as well as their PerBssid nodes, and checks if the reassign flag is not equal to null. If there is a match, it calls AssignSTA.

AssignSTA gets the Association ID (AID) from the AID pool, which is maintained per BSSID. Because of the single BSSID used in virtual cell, the limitation of 2007 clients is not for each AP, but across the whole virtual cell. Similarly, because AID is given from Controller, not generated by AP, it has to be unique in that virtual cell. When Coordinator sends an assignment, AID will be given along with the assignment. If AssignSTA finds there is no AID allocated to the STA, it will allocate one for that BSSID. This AID will be used by AP when the client actually associates with the AP.
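A per-BSSID AID pool can be sketched as below. This is illustrative Python; the class, the helper function, and the allocation order are assumptions. Only the per-BSSID scope and the upper bound of 2007 come from the text.

```python
class AidPool:
    """Hands out association IDs that are unique within one BSSID."""

    MAX_AID = 2007

    def __init__(self):
        self._next = 1
        self._released = []
        self._by_sta = {}

    def allocate(self, sta):
        if sta in self._by_sta:             # already allocated: reuse it
            return self._by_sta[sta]
        if self._released:
            aid = self._released.pop()
        elif self._next <= self.MAX_AID:
            aid = self._next
            self._next += 1
        else:
            raise RuntimeError("AID pool exhausted for this BSSID")
        self._by_sta[sta] = aid
        return aid

    def release(self, sta):
        aid = self._by_sta.pop(sta, None)
        if aid is not None:
            self._released.append(aid)


pools = {}                                  # BSSID -> AidPool


def aid_for(bssid, sta):
    return pools.setdefault(bssid, AidPool()).allocate(sta)
```

Because the pool is keyed by BSSID rather than by AP, a STA keeps the same AID no matter which AP in the virtual cell it is handed off to.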

The algorithm used to compare AssignedATS and AlternateATS is based on their RSSI values in two consecutive frame reports plus some threshold checks to avoid unnecessary handoffs. Once a handoff decision is made, AssignSTA will change AssignedATS to OldAssignedATS and change AlternateATS to AssignedATS.

When a frame report comes in, Coordinator stores current RSSI to PriorRSSI and new RSSI to RSSI. Both RSSI values are stored inside StaAtsEdge under StaNode, where a STA can have edges with multiple APs.

To compare the RSSI values, AssignSTA calls FindOldMetrics to get the RSSI and PriorRSSI values of AssignedATS for a given STA. Then it retrieves the RSSI and PriorRSSI values for AlternateATS from the edge between the STA and AlternateATS.

AP208 and AP200 use different hardware, so they have different RSSI ranges. For example, AP208 reports positive RSSI values and AP200 reports negative RSSI values. In coord.config file, there are two sets of threshold values defined for different AP models. For AP200, there are AP200AdequateRssi, AP200NewSignalGoodness, and AP200GoodnessCutoff. For other models, there are AdequateRssi, NewSignalGoodness, and GoodnessCutoff. The algorithms used to compare those thresholds are the same for all models and are described below.

1) AdequateRssi - If AssignedATS reports an RSSI value higher than AdequateRssi, we will not trigger a handoff even if AlternateATS reports a better RSSI value. AssignedATS can still provide good enough signal strength to service the STA, so there is no need for a handoff.

2) NewSignalGoodness - The RSSI value of AlternateATS should be better than that of AssignedATS, and the difference needs to be higher than NewSignalGoodness. This prevents handing off the STA back and forth because of a small RSSI difference.

3) GoodnessCutoff - If AssignedATS reports an RSSI value lower than GoodnessCutoff, then NewSignalGoodness does not apply. This means that if AssignedATS reports very poor RSSI (i.e. below GoodnessCutoff), we hand off the STA aggressively: even if the RSSI difference between AssignedATS and AlternateATS is very small, we will hand off the STA to AlternateATS.
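The three thresholds above can be combined into a single decision, as in the following sketch. It evaluates one pair of RSSI values only; how Coordinator combines the two consecutive frame reports is not described here, so that part is left out. The names RssiThresholds and should_handoff are assumptions, and the numbers in the usage example are made up rather than taken from coord.config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RssiThresholds:
    adequate_rssi: int
    new_signal_goodness: int
    goodness_cutoff: int


def should_handoff(assigned_rssi: int, alternate_rssi: int,
                   t: RssiThresholds) -> bool:
    """Decide on a handoff from one RSSI sample per AP (higher is better)."""
    # 1) AdequateRssi: the assigned AP is still good enough.
    if assigned_rssi > t.adequate_rssi:
        return False
    # The alternate AP must be strictly better in every remaining case.
    if alternate_rssi <= assigned_rssi:
        return False
    # 3) GoodnessCutoff: very poor signal, any improvement is accepted.
    if assigned_rssi < t.goodness_cutoff:
        return True
    # 2) NewSignalGoodness: require a large enough improvement.
    return (alternate_rssi - assigned_rssi) > t.new_signal_goodness


# Illustrative values only.
example = RssiThresholds(adequate_rssi=40, new_signal_goodness=5, goodness_cutoff=15)
```

With these example values, should_handoff(30, 33, example) is False because the gain of 3 does not exceed NewSignalGoodness, while should_handoff(10, 11, example) is True because the assigned RSSI is below GoodnessCutoff.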

If there is neither an AssignedATS nor an AlternateATS (i.e. AssignedATS crashes and AlternateATS is not set), then AssignSTA calls RemoveAssignmentFromAts to remove the assignment. For a new assignment, AssignSTA calls ChangeAssignmentToNewAts to update AssignedATS and invokes CoordHandoff::RequestHandoff to create and send out a handoff request. In the meantime, Assign Manager adds a retransmit timer. If the initial send fails, Handoff Manager will retransmit the handoff request when the timer expires.

Both Assign Manager and Handoff Manager are threads inside Coordinator. Sometimes RequestHandoff is called only to send an update to WNC Agent without sending any handoff message to the AP, in which case Coordinator's data structures do not change and no handoff happens.

In summary, Assign Manager decides whether an assignment or handoff has to be done and changes all the variables in the BSSID node accordingly. Then Handoff Manager picks up those configurations, generates an assignment or handoff request, and sends it to the AP.

4.3 Handoff Manager

There are two kinds of handoff. One is soft handoff, where the BSSID stays the same. The other is hard handoff, where the BSSID changes. A hard handoff could happen in the following case.

A STA was connected to BSSID1. Then the STA moves on its own and connects to BSSID2 without telling BSSID1 (i.e. without sending deauthentication and disassociation frames to BSSID1). Because assignments exist for this STA on both BSSIDs, the STA can decide which one to connect to. This can happen in several ways. For example, if there is no virtual cell, each AP has a different BSSID for the same profile, so the STA can move across APs thinking that it is the same profile. The STA may decide on its own that another AP is better and switch to it. Or the STA has two profiles configured even though the virtual cell is there. This is like the list of preferred profiles configured on a wireless laptop, ordered by descending priority. The wireless driver starts by scanning for the first profile in the list. If it does not find that one, it tries the second. If the first profile comes up later, it connects back to the first one.

If the STA jumps from BSSID1 to BSSID2 without telling the AP on BSSID1, the AP does not know the STA is gone. Coordinator does know, because when the STA connects to BSSID2 and reaches the associated state, Coordinator can see that the STA was previously associated with BSSID1 and has now moved to BSSID2, where BSSID2 can be on the same or a different AP. In this case, Coordinator will do a hard handoff, which tells the old AP that the STA has moved to a different network and the assignment for that STA has to be removed.

SendHandoff is the state machine of Handoff Manager. To perform a soft handoff, Controller sends HOFF_INIT to the old AP, which is the current AssignedATS. The old AP then sends the handoff state to the new AP. Once the handoff is completed, the new AP sends an ack to Controller. If the handoff timer expires and Controller has not received the ack from the new AP, it resends HOFF_INIT to the old AP. If the soft handoff fails three times, it reverts to a hard handoff via ResendHandoffReq. For a hard handoff, Controller sends assign-remove to the old AP and sends the assignment to the new AP. These two messages can be sent in parallel because no state is transferred between the old AP and the new AP.
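The retry-then-fallback behavior can be sketched as below. This is an illustration under stated assumptions, not the actual state machine: perform_handoff and ack_received are hypothetical names, ack_received stands in for waiting on the handoff timer, and the message labels ASSIGN_REMOVE and ASSIGN are placeholders for the assign-remove and assignment messages.

```python
from typing import Callable, List, Tuple

MAX_SOFT_ATTEMPTS = 3  # soft handoff is abandoned after three failures


def perform_handoff(old_ap: str, new_ap: str,
                    ack_received: Callable[[int], bool]) -> List[Tuple[str, str]]:
    """Return the (message, destination AP) pairs Controller sends.

    ack_received(attempt) returns True if the new AP acked before the
    handoff timer expired on that attempt.
    """
    sent: List[Tuple[str, str]] = []
    for attempt in range(1, MAX_SOFT_ATTEMPTS + 1):
        # Soft handoff: ask the old AP to pass the STA state to the new AP.
        sent.append(("HOFF_INIT", old_ap))
        if ack_received(attempt):
            return sent
    # Hard handoff: the two messages are independent of each other.
    sent.append(("ASSIGN_REMOVE", old_ap))
    sent.append(("ASSIGN", new_ap))
    return sent
```

For example, if the ack arrives on the second attempt, only two HOFF_INIT messages are sent; if it never arrives, three HOFF_INIT messages are followed by the two hard-handoff messages.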

4.4 Timing Report

The edge between two APs can be created directly or indirectly. In Meru system, each beacon has a Meru Information Element (IE), which contains the sending AP’s MAC address. The Meru IE is a proprietary field, which is ignored by STAs. If two Meru APs can hear each other directly (i.e. they can hear each other's beacons), they can understand the Meru IE in the beacons and create an AP-AP link between them. Two APs that cannot hear each other directly can still create a link if there is a STA in between that can hear both APs. The timing report is used to generate the AP-AP link in this case.

In 802.11, each packet contains a sequence number. As both APs can hear this particular STA, they store specific information about every sixteenth packet (i.e. each packet whose sequence number is a multiple of 16) received from the STA. They both send this information in a timing report to Coordinator. Coordinator can see that both APs heard the same packets (i.e. packets sixteen, thirty-two, forty-eight, etc.) from the same STA. Thus there should be an edge between the two APs.

StaAtsSyncMsg_t defines the format of timing report for each STA in the indirect case. The message includes the MAC address of the STA and STA message ID, where STA message ID is an eight-byte field consisting of the sequence number and the CRC value of the packet. All the APs who can hear the packet from a STA will send this information to Coordinator. Coordinator stores this information and uses STA message ID as the key to match all the APs who send the same packet information and creates edges between those APs.
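The matching step can be sketched as follows. The tuple layout of a report, the function names, and the use of a bytes value for the STA message ID are assumptions; the byte layout of the eight-byte ID is not specified here, so the sketch treats it as an opaque key.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (reporting AP, STA MAC address, STA message ID) -- assumed layout
TimingReport = Tuple[str, str, bytes]


def is_sampled_sequence(seq: int) -> bool:
    """APs record only packets whose sequence number is a multiple of 16."""
    return seq % 16 == 0


def edges_from_timing_reports(reports: Iterable[TimingReport]) -> Set[FrozenSet[str]]:
    """Create an AP-AP edge for every pair of APs that reported the same packet."""
    heard_by: Dict[bytes, Set[str]] = defaultdict(set)
    for ap, _sta_mac, msg_id in reports:
        # The STA message ID is the key used to match reports.
        heard_by[msg_id].add(ap)
    edges: Set[FrozenSet[str]] = set()
    for aps in heard_by.values():
        for a, b in combinations(sorted(aps), 2):
            edges.add(frozenset((a, b)))
    return edges
```

Edges are stored as unordered pairs, so two APs that report the same packet several times still produce a single edge.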

5 Load Balancing

When the load balancing feature is enabled, Coordinator will try to dispatch clients across multiple APs, such that the whole system can have a balanced load distribution. In this section, we describe the original load balancing algorithm and its potential problems, followed by the new algorithm.

5.1 Overview

In a conventional client-server environment, load balancing usually means a centralized process that receives client requests, picks up the least loaded server from a server farm, and forwards the request to the selected server. Ideally by doing this, the load can be evenly distributed across multiple servers. In Meru environment, the client is a station, the server is a BSS providing wireless service, the server farm is an ESS consisting of multiple BSSs, and the centralized process is Coordinator.

What makes Meru’s load balancing different from the traditional one is how the load is computed. The load on a traditional server usually is defined as the number of client requests being processed on the server, the CPU or resource usage, or a combination of both. In Meru, it is defined as the number of associated stations on a BSS or BSSID.

In an 802.11 environment, a client needs to get assigned and then get associated before transferring actual data. During this process, Coordinator can only allow or reject a client request at assignment time. It has no control over the association because it is the client that decides which ESS or BSS to associate with. This makes Meru’s load balancing more difficult than the traditional kind, which would be equivalent to distributing load based on the number of assigned stations instead of associated stations.

The purpose of Meru’s load balancing is to distribute the load across all the BSSIDs under the same ESS. Because Coordinator has no full control of BSSID association, our goal is to make the difference between the numbers of associated stations on each BSSID as small as possible.

In the real world, it could happen that either none or all of the stations on a BSSID become associated after getting the assignments. In these two extreme cases, we need to clean up unused assignments or increase the station limit to accept new assignments.

5.2 Feature Descriptions

The load balancing algorithm can be turned on by setting qosvars max-stations-per-bssid to a non-zero number, which is also used as the initial station limit. If qosvars load-balance-overflow is on, the station limit can be increased dynamically by a fixed amount specified by BSSOverflowIncreaseSlot in the coord.config file.



5.2.1 Station Limit

In the original implementation, Coordinator uses the current assigned count to determine if a new station can get assigned when receiving a Probe Indication message for that station. And it uses current associated count to determine if the current limit should be increased.

One problem with this approach is that an associated station continues to probe BSSIDs to which it was not assigned, even though it has no intention of associating with them. This ties up load balancing resources and prevents unassigned stations from getting assigned to an AP. One way to alleviate this is to remove a station's assignments with other BSSIDs in the same ESS once the station is associated with a BSSID. This frees up resources for new clients. Another way is to move the additional assignments of an already associated station to a queue, to be released when the station limit is reached and a new station wants to join.

We should also consider what happens after a new BSSID is added to the ESS: the load limit needs to be adjusted so that the new BSSID takes on load more quickly than the existing BSSIDs. This could be done by decreasing the current limit, which puts the existing BSSIDs above the limit so that they cannot accept new stations.

Currently the station limit can be decremented when a BSSID is inactive for a fixed amount of time. We need to make sure the decrement logic does not allow the current limit to go below the initial limit.

When Coordinator restarts, it should start with previous limit. Otherwise, it will take time through several iterations for the limit to be increased up to the point where it stopped. During this time period, new clients can not join because existing assigned or associated clients have already taken the resources.

5.2.2 Limit Increment

The logic that determines whether to increase the limit can be changed to trigger when the highest-count BSSID and the lowest-count BSSID are within the limit increment (i.e. BSSOverflowIncreaseSlot) of each other, rather than waiting until all BSSID counts are at the limit. Alternatively, the new limit can be set to the lowest count plus the limit increment if that is larger than the current limit. By doing this, we can prepare more resources in advance.

We can also change the current limit increment from a hard-coded value to be a percentage of the current limit with an appropriate minimum and maximum. The idea is to increase the increment amount as the stations scale so we are reducing the number of occasions that all BSSIDs are full at any given time. Using the percentage would still keep a reasonable balance among the BSSIDs and keep the difference among BSSIDs relatively small as the assigned station count scales.

In contrast to the limit increment, the limit decrement should move slower (i.e. the decrement amount should be smaller than the increment amount) in order to prevent oscillations between increment logic and decrement logic. For example, we can increment by 3 and decrement by 1 to adjust the limit faster when the load increases.
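The proposed adjustments can be sketched as two small helpers. Everything numeric here (10 percent, a minimum step of 1, a maximum step of 8, a decrement of 1) is illustrative and not taken from coord.config, and the function names are assumptions.

```python
def limit_increment(current_limit: int, percent: int = 10,
                    min_step: int = 1, max_step: int = 8) -> int:
    """Increment as a percentage of the current limit, clamped to a range."""
    step = (current_limit * percent) // 100
    return max(min_step, min(max_step, step))


def decrease_limit(current_limit: int, initial_limit: int, step: int = 1) -> int:
    """Decrement slowly and never go below the initial limit."""
    return max(initial_limit, current_limit - step)
```

With these example settings a limit of 30 grows by 3 while it shrinks by only 1, which keeps the decrement smaller than the increment as suggested above.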

5.3 New Algorithm

In the new load balancing algorithm, Coordinator has an additional assignable state for each BSSID it supports, where the state is one of the following:

P = Can Participate (i.e. can assign a new station and send a probe response)
NP = Cannot Participate

When a BSSID is NP (i.e. nonassignable), it will not accept any new station.



Coordinator also defines two thresholds:

MinAssociations = minimum number of associations among all BSSIDs supporting the same ESS profile on the same interface
AssignmentLimit = current assignment limit per BSSID

The main load balancing logic can be illustrated as

    MinAssociations = min( #association(BSSID) for all BSSIDs )

    if ( #association(BSSID) > L1 + MinAssociations )
        BSSID is NP
    else
        BSSID is P

    AssignmentLimit = L2 + min( #assignment(BSSID) for all BSSIDs that are P )

where L2 is the limit increment and L1 is the acceptable difference between the minimum and maximum numbers of associations among all BSSIDs in the same ESS. If L1 is infinite, the algorithm will be purely dependent on the number of assignments.
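The logic above translates almost line for line into the following sketch. The function name balance and the dictionary-based inputs are assumptions; the BSSID names and counts in the usage note are made up.

```python
from typing import Dict, Tuple


def balance(associations: Dict[str, int], assignments: Dict[str, int],
            l1: float, l2: int) -> Tuple[Dict[str, str], int]:
    """Return the P/NP state of each BSSID and the new AssignmentLimit.

    associations and assignments map each BSSID of one ESS to its counts.
    l1 may be float("inf") to disable the association check.
    """
    min_assoc = min(associations.values())
    state = {
        bssid: ("NP" if count > l1 + min_assoc else "P")
        for bssid, count in associations.items()
    }
    # For l1 >= 0 the BSSID with the fewest associations is always P,
    # so this list is never empty.
    participating = [assignments[b] for b, s in state.items() if s == "P"]
    assignment_limit = l2 + min(participating)
    return state, assignment_limit
```

For example, with associations {b1: 2, b2: 9, b3: 4}, assignments {b1: 8, b2: 3, b3: 6}, L1 = 3 and L2 = 2, b2 becomes NP and the limit is 2 + 6 = 8. With an infinite L1 every BSSID is P and the limit depends only on assignments: 2 + 3 = 5.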

As soon as the MAC state of a station changes from non-Associate to Associate, we remove the assignments for this station from all other BSSIDs, on the assumption that the station can ask to be assigned again later.
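The cleanup on association can be sketched as below, assuming assignments are tracked as a set of STA MAC addresses per BSSID. That representation and the function name on_associated are hypothetical.

```python
from typing import Dict, Set


def on_associated(assignments: Dict[str, Set[str]],
                  sta_mac: str, bssid: str) -> int:
    """Drop the STA's assignments on every BSSID except the one it joined.

    Returns the number of assignments that were removed.
    """
    removed = 0
    for other_bssid, stations in assignments.items():
        if other_bssid != bssid and sta_mac in stations:
            stations.remove(sta_mac)
            removed += 1
    return removed
```

Each removed entry frees one assignment slot on a BSSID the station is not using, which is what lets new clients get assigned there.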

6 Miscellaneous

6.1 Scanning AP

A scanning AP is an AP put in scanning mode to serve rogue AP mitigation. Connecting an external AP to an enterprise network may affect network service and performance because it may operate on the same channel or create security issues. The external AP in this scenario is called a rogue AP.

To solve this problem, Meru supports the rogue AP mitigation feature, where an AP is put in scanning mode to scan all the channels. For example, suppose there are ten APs on each floor. We put one of the APs near the center of the floor in scanning mode. That AP scans all the channels rather than serving clients. It listens to all the packets on the air, so it can see all the beacons and detect whether a beacon comes from a source outside our network, for example a beacon that does not have the Meru OUI in its MAC address. If such a packet is detected, it means somebody who is not part of the Meru network is sending beacons. The scanning AP captures those packets and sends them to the Rogue AP module in Controller, which analyzes them and determines whether they are from a rogue AP.
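The OUI check mentioned above can be illustrated with the sketch below. The function names are assumptions, and the OUI value used in the usage note is a placeholder, not Meru's actual OUI. The real decision is made by the Rogue AP module; this only shows the first-pass filter on the beacon's source MAC address.

```python
from typing import Set


def oui_of(mac: str) -> str:
    """Return the first three octets of a MAC address, normalized."""
    return mac.lower().replace("-", ":")[:8]


def is_foreign_beacon(source_mac: str, known_ouis: Set[str]) -> bool:
    """True if the beacon's source MAC does not carry a known OUI.

    known_ouis must hold lowercase, colon-separated prefixes.
    """
    return oui_of(source_mac) not in known_ouis
```

For instance, with the placeholder set {"02:00:00"}, a beacon from 02:00:00:AA:BB:CC is treated as local, while one from 00:11:22:33:44:55 is flagged for analysis.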

Once an AP is put in scanning mode, Coordinator treats it like a down AP. It does not take part in station assignment or handoff. A large network may need more than one scanning AP to cover the whole area.

6.2 Channel vs. Radio

An AP can have multiple interfaces, where each interface is a physical radio on the AP. The physical radio operates on one of the channels supported by the interface. Each 802.11 radio has a range of channels, and channels map one-to-one to frequencies.



Using the example in Figure 12, AP-1 has two interfaces, 1 and 2, where interface 1 is an 802.11bg radio using channel 6 and interface 2 is an 802.11a radio using channel 40. When deploying a network, you can choose the channels you want to use. In AP208 and AP150, radio 1 is always the 802.11bg radio and radio 2 is always the 802.11a radio because of a hardware limitation. The 802.11bg radio can be put in b-only, g-only, or b/g-combined mode. You can select any available channel for a radio, but non-overlapping channels are preferred so that the radios interfere with each other as little as possible. In short, each interface on an AP is a physical radio tied to one RF band, and after the RF selection you select the channel for that radio inside its band.

Figure 12: Radio Configuration.

