[IEEE 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) - Dublin,...
12th IFIP/IEEE IM 2011: Application Session
ABSTRACT
Shyyunn Sheran Lin, Gregory S. Thompson, Viren Malaviya
SSTG, Cisco Systems, 170 W Tasman Drive, San Jose, California, U.S.A.
IEEE IM 2011, Cisco Systems
Cisco.com
A major task of managing a computer network is to gather the inventory of the devices, including hardware, software, and the configuration. It is desirable to collect this information efficiently. Traditional Network Management tools require an appliance that resides on the network to collect device information. This paper introduces a distributed, embedded approach to collect network device information without an extra appliance. Utilizing device programmability with add on scripts, the network devices can communicate with each other and perform the inventory collection tasks. The collected inventory information can be sent to network management stations or hosted inventory reporting applications. This mechanism is used on Cisco devices without any device OS upgrade by using a set of scripts to collect CLI and SNMP information from the devices. Utilizing the computing power on multiple networking devices, this distributed mechanism can concurrently collect the whole network inventory efficiently. This paper gives the overview of the approach, then drills into the architecture, mechanism of the scripts, the coding standards, performance, impact to the devices, scalability, benefits, and discusses future work.
978-1-4244-9221-3/11/$26.00 ©2011 IEEE
This solution uses an embedded approach to collect device inventory information, relying on the programmability of the devices themselves. Scripts were developed to program the devices. The scripts are downloaded from a company website and installed on a Gateway device at the customer site; the Gateway then pushes the scripts to several selected devices. These devices communicate with other devices and gather the required SNMP MIB information and configuration information by invoking native OS commands. The collection can be sent to a hosted inventory application over the Internet or to a network management application hosted at the customer's premises.
The devices in the customer network are categorized into three roles: Gateway, Collector, and End device. The Gateway is a router that runs the program that gathers information from the Collectors and sends the whole collection to an application back-end server via the Demilitarized Zone (DMZ). A Collector is a router that runs the scripts to collect its own device information and that of its neighboring devices, and sends the collection back to the Gateway. End devices can be any router or switch. They do not need the scripts installed: as long as an End device can respond to SNMP and CLI commands, its data can be collected. All Gateway, Collector, and End devices are connected in the network.
In the initial phase of the approach, there is only one Gateway device, which is registered with the back-end application. The scripts are downloaded and installed on this device, and the customer provides a seed file that describes the hierarchy of Gateway, Collector, and End devices, including which End devices belong to which Collector. There can be many Collectors, each responsible for several End devices. The Gateway is the master and center of the whole collection; it is responsible for pushing the scripts to the selected Collectors and starting the collection. When a collection starts, the Gateway directs the data collection by parsing the master seed file, identifying the Collectors, and spawning slave collection policies on each Collector. Inside the Collector, a separate thread is spawned for each device and collection type so that it can communicate with its End devices in parallel. A Gateway can also be a Collector, so in a small network one Gateway device can perform the collection tasks and the Collectors can be eliminated. Once the remote Collectors are spawned, the Gateway collects any End devices assigned to it. After collecting its own End devices, the Gateway sleeps, periodically waking up to check for incoming collections from the Collectors. When all data has been sent to the Gateway, the data is archived as a complete inventory and a transport policy is launched to send the data securely to remote applications.
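The control flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the EEM policies themselves: the seed dictionary, the addresses, and the function names collect_device, run_collector, and run_gateway are all hypothetical, and the actual SNMP and CLI collection is replaced by a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical seed hierarchy: collector address -> end devices assigned to it.
SEED = {
    "10.0.0.1": ["10.0.1.1", "10.0.1.2"],   # the Gateway also acts as a Collector
    "10.0.0.2": ["10.0.2.1", "10.0.2.2"],
}

def collect_device(address):
    """Stand-in for the SNMP and CLI collection of one device."""
    return {"device": address, "inventory": f"data-from-{address}"}

def run_collector(collector, end_devices):
    """A Collector gathers its own data and its end devices' data in parallel."""
    targets = [collector] + end_devices
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(collect_device, targets))

def run_gateway(seed):
    """The Gateway starts every Collector, then archives all returned records."""
    with ThreadPoolExecutor(max_workers=len(seed)) as pool:
        futures = [pool.submit(run_collector, c, devs) for c, devs in seed.items()]
        archive = [record for f in futures for record in f.result()]
    return sorted(archive, key=lambda r: r["device"])

inventory = run_gateway(SEED)
print(len(inventory))  # one record per Gateway, Collector, and End device
```

The two nested thread pools mirror the two levels of parallelism in the text: Collectors run concurrently, and each Collector talks to its End devices concurrently.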
To qualify as a Gateway or Collector, a device must have the Embedded Event Manager feature, security PKI support, and some free local storage per device collected. Free local storage is usually the limiting factor on how many devices a given Collector can inventory. The Gateway also needs outside connectivity if it has to send the collection to an application outside the customer network. The collection can also be sent to an application hosted at the customer's premises.
Script Based Embedded Collection Approach
Utilize device programmability, scripts will be downloaded to the device
Once setup, collection works transparently in the background
Using the devices themselves for collection saves administering a separate device, and power/HVAC load
No new device OS image changes required
Secure data transfer among devices and to the hosted application
The Embedded Collector aims to provide a low-touch application that requires little or no support from vendors and is easy to install on network devices for data collection. This approach can replace dedicated collection appliances for smaller networks. By removing the external appliance, the approach satisfies Green criteria: it saves power and reduces Heating, Ventilation, and Air-Conditioning (HVAC) needs. The Embedded Collector design leverages an IOS programmability feature, namely Embedded Event Manager (EEM), where EEM policies written as scripts provide the intelligence to collect the data. Once the policies are installed, collection is automatically scheduled to collect data from the network devices. There are two ways to launch a collection. One is on-demand collection, where users invoke scripts to start the collection. The other is periodic automatic collection, in which a scheduled collection is kicked off by an EEM timer event; users can configure the interval of the automatic collection based on application and business requirements.
There are multiple ways to implement the Embedded Collector approach. One is to integrate the collection capability into the operating system (OS), which can use the operating system's features more coherently. However, integrating a feature into an operating system release usually takes a long time, and customers resist upgrading to a new image because of the significant effort involved in re-certifying and testing. A typical philosophy among network operators is "If it isn't broken, do not fix it". Thus, the OS releases running in real customer networks are often 1-2 years behind current releases. The approach introduced in this paper uses the device's programmable scripting capability. It is quicker to develop and more easily adopted by users, since no OS upgrade is needed. The scripts are easy to install and upgrade. If users want to remove the scripts, an uninstall script removes the feature. For most customers, the security of their data is of paramount importance. As a result, the data transferred among devices and to other applications is encrypted and secured. This collection method offers a complete view of the network inventory without sacrificing the security and safety of the network information.
Device Programmability using Embedded Event Manager
• Event Detectors
"watch for events of interest"
• EEM Server
The "brains" of the system
• Policies (scripts)
Applets
Tcl-based
All of this is internal to Cisco IOS
[Figure: EEM architecture. An event detector (ED) notifies the EEM Server, which triggers the interested policies. The policy engines (event subscribers) are of two types.]
Cisco EEM (Embedded Event Manager) is a Cisco IOS feature that allows end users to specify a condition to watch and to write a policy that is carried out when that condition is met. It consists of various event detector modules that watch for events and report them to the EEM server. The EEM server sends the event to the appropriate policy engine: the CLI applet engine, TCL (Tool Command Language), or the IOS shell (in the newest IOS images). Examples of event detectors are Syslog, SNMP, Timer, Interface Counter, CLI, OIR (On-line Insertion/Removal), Manual, and IOS Watchdog/System Monitor. Of these, the Timer and Manual detectors are used in the Embedded Collector.
In general, EEM can be used in several areas. It can apply workarounds for problems discovered in the field and increase reliability by monitoring system behavior for fault detection, prevention, and recovery. It automates management tasks by bundling multiple tasks and executing them automatically when a specific condition is encountered. In problem diagnosis, its event detection capabilities can help identify issues. If additional features or functionality are needed after an IOS release, new logic can be added on top of IOS using EEM, without an image upgrade. EEM also allows end users to modify the user interface and customize features with external entities, so that it satisfies a wider range of customers.
An EEM policy is a script that is executed to carry out the desired actions. Those actions include: execute an IOS CLI command and receive the result; force a switchover to the standby in an SSO configuration; request system information; send an email; access SNMP data, locally and remotely; send XML-RPC requests; send SNMP traps with custom data; log a message to Syslog; reload the box; cause another EEM policy to be executed; and publish an application-specific EEM event.
EEM is normally used in a reactive mode, where some action needs to be automated in response to an event. The
Embedded Collector is essentially a suite of EEM policies that are replicated on devices that are designated as Collectors. The
policies interact with each other both on the collector devices, and between devices using secure protocols.
EEM policies can share data via EEM's context mechanism. The context mechanism is a global TCL hash table built from namespace variables that match a given regular expression. Master policies launch slave policies on the same device using the "action_policy" extension, and slave policies pick up parameters and adjust counters by retrieving the context. Thus, the Embedded Collector can use this mechanism as a crude semaphore. Each policy sleeps and loops until the context is available for reading, which means another policy has finished updating it.
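The sleep-and-loop pattern can be modeled with a small Python sketch. It assumes a toy ContextStore in which retrieving a context also removes it, so only one policy can hold the context at a time; the class, the key name, and the polling parameters are hypothetical and do not reproduce the EEM API.

```python
import threading
import time

class ContextStore:
    """Toy model of a shared context: saving publishes it, retrieving removes it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._contexts = {}

    def save(self, key, data):
        with self._lock:
            self._contexts[key] = dict(data)

    def try_retrieve(self, key):
        with self._lock:
            return self._contexts.pop(key, None)

def retrieve_blocking(store, key, poll_seconds=0.01, max_polls=500):
    """Sleep and loop until the context is available, as a crude semaphore."""
    for _ in range(max_polls):
        data = store.try_retrieve(key)
        if data is not None:
            return data
        time.sleep(poll_seconds)
    raise TimeoutError(f"context {key!r} never became available")

def slave_policy(store, key):
    """Pick up the counters, adjust one, and publish the context again."""
    ctx = retrieve_blocking(store, key)
    ctx["finished"] += 1
    store.save(key, ctx)

store = ContextStore()
store.save("collection", {"finished": 0})
workers = [threading.Thread(target=slave_policy, args=(store, "collection")) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
final = retrieve_blocking(store, "collection")
print(final["finished"])
```

Because a context that has been retrieved is unavailable until it is saved again, the counter update is serialized without any explicit lock in the policies themselves.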
Embedded Collection Flow
[Figure: Embedded collection flow. Labels: Assistant, Collector, Gateway, Transport, Inside Cisco, Firewall (i.e. DMZ). Show commands are collected via telnet or ssh. Yellow - Control, Blue - Data.]
The figure depicted above is the architecture of the collection in the Gateway and the Collector. The system is kicked off
via EEM timer event or via manual CLI invocation. A master scheduler policy parses a seed file and launches a collector control
policy for each device to be collected. A collector control policy determines if the collection is local or remote, and launches
SNMP and CLI collector policies appropriately.
Collector policies consult a local platform database, called the Inventory Control File (ICF), which returns the device's OS and the MIB OIDs and CLIs to collect. The Collector formats the output in a standard directory structure and archives the tree for delivery. The Forwarder policy watches for collector completion and sends the archive back to the Gateway. The Gateway either aggregates the returned inventories and re-archives the complete inventory for delivery to the back end, or numbers each incoming archive and sends the partial inventory to the back end, which reassembles it once all parts have been received or a timeout occurs.
The ICF is implemented as a collection of TCL lists and arrays. One list contains all CLIs understood by the back end, and another list contains all SNMP OIDs. Three arrays are indexed by platform model number, which the policies retrieve via an SNMP query for the chassis model. The platform type array contains the platform OS. The two other arrays contain lists of indexes into the CLI and SNMP OID lists. List elements may be a discrete index or a range, e.g. 20-37, to save storage.
The following shows excerpts from the lists and arrays used in the ICF
set se_show_commands [list \
    "show startup-config" \
    "show running-config" \
    "show version" \
The elements of the MIB OID list are lists themselves (a list of lists). The first element is the OID of interest. The second element is the MIB name. The third element is the expected prefix string returned by the SNMP Proxy CLI, which lets the policy determine the boundaries of the SNMP get-next or get-bulk operation. The final element is a key for the type of query to be used: S = a single OID is retrieved, T = this OID is a column of an entire table to be retrieved.
array set platform_mib {
    {CE-550-DS3} {0-16 20-37}
    {CE-505} {0-16 20-37}
    {CE-507} {0-16 20-37}
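To illustrate how index ranges in a per-platform array can be resolved against a master command list, here is a minimal Python sketch. The model names, the miniature command list, and the helper names expand_indexes and commands_for are hypothetical; the real ICF is a set of TCL lists and arrays and is far longer.

```python
def expand_indexes(spec):
    """Expand a spec such as "0-2 5" into [0, 1, 2, 5]."""
    indexes = []
    for token in spec.split():
        if "-" in token:
            first, last = token.split("-")
            indexes.extend(range(int(first), int(last) + 1))
        else:
            indexes.append(int(token))
    return indexes

# Hypothetical miniature ICF; the real lists and platform entries are far longer.
SHOW_COMMANDS = ["show startup-config", "show running-config", "show version", "show inventory"]
PLATFORM_CLI = {"MODEL-A": "0-1 3", "MODEL-B": "2"}

def commands_for(model):
    """Look up the show commands to collect for a chassis model."""
    return [SHOW_COMMANDS[i] for i in expand_indexes(PLATFORM_CLI[model])]

print(commands_for("MODEL-A"))
```

Storing ranges rather than every index keeps the per-platform entries short, which matters when the database lives in limited device storage.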
TCL Coding Standards and Techniques in an Embedded Environment
EEM adds extensions to TCL 8.3.4 to query event info, launch policies, send syslog messages, etc.
EEM TCL library support is available for some common functions such as CLI, SMTP, and TCL global variable state check-pointing.
TCL allows all exceptions to be "caught"
Control policies are kept simple- "middle-man"
More complex collection policies are spawned per device, so that exceptions do not kill the entire collection
EEM environment variables provide a way for the end user to tune collector behavior, e.g. set timeouts, retries, debug level, etc.
TCL has been used in Cisco IOS for regression testing since the mid-90s. At the turn of the century, Cisco IOS started shipping with a TCL interpreter to support the Interactive Voice Response (IVR) feature. Shortly after, other TCL-enabled Cisco IOS features were developed, such as tclsh parser mode, Embedded Syslog Manager (ESM), Embedded Menu Manager (EMM), and the main feature supporting the Embedded Collector, Embedded Event Manager (EEM).
Using TCL on Cisco IOS is a little more challenging than on a general purpose OS. On a general purpose OS, an exception usually results only in the death of a single process. On classic Cisco IOS, the entire OS can be treated as a single process. Thus, things such as infinite loops are of great concern. Cisco IOS does ensure that a poorly written script cannot hog the CPU, but the script still has to be killed by the EEM framework using the maximum runtime specification.
Another item to be mindful of is uncontrolled growth of lists and arrays. Since we want our TCL scripts in EEM policies to run on a variety of platforms with varying amounts of available RAM, we must periodically write to persistent storage rather than consume too much RAM. This leads to another area of concern: persistent storage. Unlike a general purpose platform, most Cisco IOS platforms use flash memory rather than hard disks for persistent storage. Since flash memory can wear out after a limited number of write cycles, we must also take care not to write to the file system unnecessarily often. Thus, a balance must be struck between RAM consumption and writing to persistent storage.
With the above limitations in mind, putting the Embedded Collector together using a suite of EEM TCL-based policies
presents additional challenges. Again, every function in the Embedded Collector must handle exceptions in order to prevent the
exception from halting the policy completely. Static analysis tools can help spot unhandled exceptions. In addition to exception
handling, control policies are kept simple in order to act as "middle-men", in that their only purpose is to forward arguments and watch the collection policies. This allows the master policies to be insulated from scripts that hang due to communication faults,
or in case an unhandled exception slips through the cracks. EEM also allows for "environment" variables - which are stored in the
device's running-configuration and automatically passed to TCL-based policies as global TCL variables. This allows the
Embedded Collector to provide tunable parameters for customers to alter the behavior of the collector such as shorten or lengthen
time-outs as needed, adjust debug level, etc. This way, policy scripts can be pre-compiled from Cisco, but the behavior altered
somewhat without having to modify the script itself.
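The two techniques in this paragraph, isolating each device's collection behind an exception handler and reading tunables from the environment, can be sketched in Python as follows. The variable names collector_retries and collector_debug, the addresses, and the simulated failure are assumptions made for illustration; the actual policies are TCL scripts using EEM environment variables.

```python
# Hypothetical tunables, standing in for EEM environment variables.
ENVIRONMENT = {"collector_retries": "2", "collector_debug": "0"}

def tunable(name, default):
    """Read an integer tunable, falling back to a default when unset or invalid."""
    try:
        return int(ENVIRONMENT.get(name, default))
    except ValueError:
        return default

def collect_device(address):
    """Stand-in for a collection policy; one address is made to fail."""
    if address == "10.0.1.2":
        raise ConnectionError("device unreachable")
    return f"inventory-of-{address}"

def control_policy(addresses):
    """Middle-man: run each device separately so one failure cannot stop the rest."""
    results, failures = {}, {}
    retries = tunable("collector_retries", 1)
    for address in addresses:
        for attempt in range(1 + retries):
            try:
                results[address] = collect_device(address)
                failures.pop(address, None)  # a later attempt succeeded
                break
            except Exception as error:  # catch everything, as the policies do
                failures[address] = f"{type(error).__name__} after {attempt + 1} attempts"
    return results, failures

results, failures = control_policy(["10.0.1.1", "10.0.1.2", "10.0.1.3"])
print(sorted(results), failures)
```

Changing the retry count only requires editing the ENVIRONMENT entry, which mirrors how a customer could tune behavior without touching a pre-compiled script.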
Script Obfuscation and Protection
Cisco IOS has a built-in byte-code loader which allows TCL-based features, such as EEM, to treat pre-compiled scripts just as if they were plain text. This allows scripts to be obfuscated to discourage tampering, and protects intellectual property. Embedded Collector policies and library files are pre-compiled using the TclPro compiler.
The Signed TCL Scripts feature introduces security for TCL scripts. This feature allows users to create a certificate to generate a digital signature and sign a TCL script with that digital signature. The script is checked for a digital signature from Cisco. In addition, third parties may also sign a script with a digital signature. If the script contains the correct digital signature, it is believed to be authentic and runs with full access to the TCL interpreter. If the script does not contain the digital signature, the script may be run in a limited mode, known as Safe TCL mode, or not run at all.
After each routing device enrolls in a Public Key Infrastructure (PKI), every peer (also known as an end host) in the PKI is granted a digital certificate that has been issued by a certification authority (CA). When peers negotiate a secured communication session, they exchange digital certificates.
A Rivest, Shamir and Adleman (RSA) key pair consists of a public key and a private key. When setting up a PKI, the public key must be included in the certificate enrollment request. After the certificate has been granted, the public key is included in the certificate so that peers can use it to encrypt data that is sent to the router. The private key is kept on the router and used both to decrypt the data sent by peers and to digitally sign transactions when negotiating with peers. RSA key pairs contain a key modulus value. The modulus determines the size of the RSA key. The larger the modulus, the more secure the RSA key. However, keys with large modulus values take longer to generate, and encryption and decryption operations take longer with larger keys.
A certification authority (CA), also known as a trust point, manages certificate requests and issues certificates to participating network devices. These services (managing certificate requests and issuing certificates) provide centralized key
management for the participating devices and are explicitly trusted by the receiver to validate identities and to create digital
certificates. Before any PKI operations can begin, the CA generates its own public key pair and creates a self-signed CA
certificate; thereafter, the CA can sign certificate requests and begin peer enrollment for the PKI.
You can use a CA provided by a third-party CA vendor, or you can use an internal CA, which is the Cisco IOS Certificate Server.
Collection Time
[Figure: Number of devices vs. collection time per run. X axis: total collection time per run (min), 0 to 35. Y axis: number of devices, 0 to 120.]
The GW and Collectors were C7206 NPE-G1 routers with at least 256 MB DRAM and 128 MB Flash. CPU utilization did not exceed 12%.
Each Collector has 10 End devices. Thus the collection covers: 1 GW + num_of_collectors + (num_of_collectors × 10 end_devices)
The above figure shows an actual collection from a network with 100 devices. A 100-device collection takes approximately 20-30 minutes, a rate of approximately 200-300 devices per hour. The time varies with the device load and with how many collection policies are allowed to run in parallel. The resource usage on the host device can be governed through the EEM parallel thread configuration. The target for inventory collection is a CPU impact below 10%. It is suggested to schedule collections in off-peak hours. Prior to starting any collection, the script-based application first checks the CPU utilization of the device over the last 5 minutes. If it exceeds a threshold, the collection is not performed.
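The arithmetic behind these figures is easy to check. The sketch below uses the device-count formula from the slide (one Gateway, the Collectors, and ten End devices per Collector) and converts the reported run times into an hourly rate; the function names are hypothetical.

```python
def devices_in_run(num_collectors, end_devices_per_collector=10):
    """One Gateway, plus the Collectors, plus each Collector's End devices."""
    return 1 + num_collectors + num_collectors * end_devices_per_collector

def devices_per_hour(devices, minutes):
    """Average collection rate for a run of the given duration."""
    return devices * 60 / minutes

print(devices_in_run(9))            # 1 + 9 + 90
print(devices_per_hour(100, 30), devices_per_hour(100, 20))
```

With nine Collectors the run covers 100 devices, and run times of 30 and 20 minutes correspond to 200 and 300 devices per hour respectively.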
Early runs of the Embedded Collector took on the order of hours, but several enhancements were made during development. First, individual SNMP get-next queries were replaced with smarter logic that uses SNMP get-bulk queries to determine the table size and retrieve the entire table as quickly as possible. Second, the timing of inter-router policy launching was improved with custom policy-spawning TCL procedures in which slave policies notify master policies as soon as they have been launched successfully. Third, CLI and SNMP collection were separated into autonomous, independent policies that can be queued in any order by the EEM policy engine. This allows the number of EEM session scripting threads to be tuned per platform.
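A rough sense of why get-bulk helps comes from counting request round trips. The following Python sketch is an idealized model of walking a single table column, ignoring packet size limits and retransmissions; the function names and the max_repetitions value are illustrative assumptions, not measurements from the paper.

```python
import math

def getnext_requests(rows):
    """One request per row, plus one that walks off the end of the column."""
    return rows + 1

def getbulk_requests(rows, max_repetitions):
    """Each request returns up to max_repetitions entries; the walk ends when a
    response contains an entry beyond the column."""
    return math.ceil((rows + 1) / max_repetitions)

rows = 100
print(getnext_requests(rows), getbulk_requests(rows, 25))
```

Under these assumptions a 100-row column needs 101 get-next requests but only 5 get-bulk requests, so per-request latency is paid far less often.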
The information collected includes:
CLI                    MIB
Show ver               OldCiscoChassis
Show diag              OldCiscoChassisCard
Show module            CiscoStack
Show hardware          CiscoStackModule
Show inventory         System
Show idprom all        OldCiscoSys
Show running-config    CiscoFlash
Show startup-config    CiscoMemoryPool
                       Interface
                       IP Address
                       Entity
Device Performance Impact
Transaction                      CPU utilization (%)
Verify Gateway                   10
Assign Gateway                   10
Verify Collector                 30
Install Collector                20
Collection                       30
Tarring of Collection Files      30
Complete                         10
[Figure: CPU utilization - cat6k. Y axis: CPU utilization (%), 0 to 100.]
One of the primary goals of a low-touch Embedded Collector is transparent operation: after initial setup, it does its job with minimal impact on the target network. The trade-off is that replacing the computing power of a dedicated appliance with a collection of scripts that run on the devices themselves is a challenge. To minimize the performance impact, the number of scripts allowed to run concurrently is controlled and set to a minimum by default. EEM calls these session scripting threads. EEM policies can be marked as members of scheduler classes, and each class can be throttled to a specific number of threads. The diagram above shows a test run with a Catalyst 6000 as a Gateway at 10 concurrent EEM threads. There is a rise of approximately 20% in CPU usage during the heavy operations. This impact will be smaller on newer, higher-powered devices and larger on smaller, low-powered devices. Remember that the EEM policies run only on Collectors, not on End devices, so on average only 1 in 10 devices will be impacted, as the recommended Collector to End device ratio is 1:10.
Since the Embedded Collector (EC) shares resources with devices whose primary function is to route packets, the EC scripts check the average CPU usage prior to launching a discovery or inventory collection. If the CPU usage is over 70%, the Embedded Collector will not start, and a syslog message is generated stating that the router was too busy.
If the CPU usage rises during the collection, the Embedded Collector takes advantage of the fact that EEM processes run at a lower priority than routing processes. This ensures that routing is not interrupted. The Embedded Collector time-outs are set in terms of minutes, so momentary interruptions in collection are acceptable.
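The pre-launch CPU check can be expressed as a small gate function. This Python sketch uses the 70% threshold stated in the text; the sample values, the message wording, and the function name may_start_collection are hypothetical, and how the samples are obtained on a real device is not modeled.

```python
CPU_LIMIT_PERCENT = 70  # threshold stated in the text

def may_start_collection(cpu_samples, limit=CPU_LIMIT_PERCENT):
    """Return (allowed, message) based on the average of recent CPU samples."""
    average = sum(cpu_samples) / len(cpu_samples)
    if average > limit:
        return False, f"router too busy: average CPU {average:.0f}% exceeds {limit}%"
    return True, f"starting collection: average CPU {average:.0f}%"

print(may_start_collection([35, 40, 45]))
print(may_start_collection([80, 85, 90]))
```

The returned message stands in for the syslog entry that tells the operator why a scheduled collection did not run.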
In summary, the performance of an Embedded Collector depends on device characteristics: the type of CPU on the router, the amount of DRAM, and the number of modules and cards it supports. A larger router with a powerful CPU and more than 2 GB of DRAM, designated as the Gateway, can handle the required number of threads that use heap (i.e. DRAM) without any concerns. Additionally, its scripts collect data faster, and such routers also have higher-capacity flash to store the collected data. For good performance, it is recommended that a higher-end router of a given family be used as a Gateway or a Collector.
Seed File and Auto Discovery
Collection starts with a seed file. The seed file is a comma-separated value (CSV) ASCII flat file which contains the IP address, hostname, and access information for each device that needs to be collected.
The downloaded helper application mentioned earlier also acts as a seed file editor if the user wishes to enter devices manually, make corrections, or change login credentials. The helper application can also import seed files in other formats from other network management applications. The first user-defined field contains the IP address of the device that is to collect this device's information. The second user-defined field contains the IP address of the device that acts as its gateway. Any record with the gateway field populated is identified as a Collector.
The advantage of using a seed file is complete control over which devices are collected. The disadvantage is maintaining the seed file, especially if security policies dictate changing credentials periodically, which leads us to auto-discovery.
If auto-discovery is desired, an optional set of EEM policies can be installed which are launched ahead of the scheduled collection to update the seed file. These policies use CDP (Cisco Discovery Protocol) or LLDP (Link Layer Discovery Protocol) to discover neighbors, and use known login credentials to perform a prerequisite check of newly discovered devices. Candidates are forwarded back to the master gateway to be aggregated, and the master seed file is updated. Once discovery is complete, the collection is launched as usual.
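As an illustration of how a seed file could be turned into Collector and End device assignments, here is a minimal Python sketch. The column names and the header row are assumptions: the text names the kinds of fields but not their exact layout, and this sketch reads the first user-defined field as the address of the collecting device.

```python
import csv
import io

# Hypothetical column layout; the paper names the kinds of fields, not a header row.
SEED_TEXT = """ip,hostname,username,collector_ip,gateway_ip
10.0.0.1,gw1,admin,,
10.0.0.2,coll1,admin,,10.0.0.1
10.0.1.1,sw1,admin,10.0.0.2,
10.0.1.2,sw2,admin,10.0.0.2,
"""

def parse_seed(text):
    """Split seed records into Collectors and per-collector End device lists."""
    collectors, assignments = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["gateway_ip"]:
            # A populated gateway field marks the record as a Collector.
            collectors[row["ip"]] = row["gateway_ip"]
        elif row["collector_ip"]:
            assignments.setdefault(row["collector_ip"], []).append(row["ip"])
        # Records with neither field set (here, the Gateway itself) are skipped.
    return collectors, assignments

collectors, assignments = parse_seed(SEED_TEXT)
print(collectors, assignments)
```

Auto-discovery would append new candidate rows to the same structure before the collection is launched.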
Scale - Nested and Segmentation
[Figure: Nested (left) and segmented (right) arrangements of Gateways and Collectors. Labels include Gateway A and Gateway B.]
The major limitation to scaling this solution is free local storage on the smart devices. Many customers have backup images, old core dumps, and other files on local storage, or have upgraded the Cisco IOS image over the years without corresponding upgrades to storage. Since the EEM policies must use Cisco IOS to archive the data, they are limited to the "tar" command in IOS, which allows only archiving, not compression. Also, the tar command itself must have free storage available for temporary files during the archival process.
In order to scale to larger networks, the Embedded Collector can be architected such that any device with the EEM feature can act as a Gateway, a Collector, or both. This allows Collectors and Gateways to be nested. The scenario is depicted on the left side of this slide. Currently, all collections for a given inventory must pass through the registered Gateway, where the inventory is aggregated and sent to the back end as a single archive. In the nested case, inventory archives are numbered so that partial inventories can be reassembled at the back end, which alleviates local storage requirements.
As depicted on the right side of this slide, networks can also be segmented to handle a larger number of devices. Each segment can have its own Gateway and be treated as a mini-network. Each segment can be collected at a different time, and the back-end application will assemble the collections together. Distributing the collection schedule is even more critical in the nested architecture, as the Gateways are the storage bottlenecks. Each Collector has a retry mechanism for the case where Gateway storage has been exhausted.
In either case, intelligence will be required on the destination application where the collection is shipped to for
aggregating the reports.
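The reassembly that the destination application has to perform can be sketched as follows. This Python example assumes each partial inventory arrives tagged with a sequence number starting at 1 and that the expected number of parts is known; both are assumptions for illustration, as is the function name reassemble.

```python
def reassemble(parts, expected_count):
    """Join numbered partial inventories; report gaps instead of guessing."""
    received = {number: payload for number, payload in parts}
    missing = [n for n in range(1, expected_count + 1) if n not in received]
    if missing:
        return None, missing
    return [received[n] for n in range(1, expected_count + 1)], []

print(reassemble([(2, "b"), (1, "a"), (3, "c")], 3))
print(reassemble([(1, "a"), (3, "c")], 3))
```

Returning the list of missing part numbers lets the back end decide whether to keep waiting or to give up when its timeout expires.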
Distributed Computing and Error Recovery
[Figure: A Gateway, its Collectors with their End Devices, and the back-end Applications. End devices for Collector 2 are redistributed to Collector 3.]
Distributed computing refers to multiple autonomous computers that communicate through a network. According to Wikipedia, "A distributed system may have a common goal, such as solving a large computational problem. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users."
To compare this approach with distributed computing, the following examines the properties that characterize a distributed computing system:
1. No single point of failure: In this approach, the central commander is the Gateway, which commands the Collectors to start communicating with the End devices assigned to them and begin the collection; the Collectors then collect concurrently. The End device list is passed to each Collector along with the start collection command. To meet the requirement that the system tolerate failures in individual computers, when the Gateway detects that a Collector does not respond to the start collection command, or is incapable of doing the collection because the device is busy, has a software or hardware failure, or is down for scheduled maintenance, the Gateway can redistribute the collection task to other Collectors. In the figure on the slide, Collector 2 is not able to start the collection, so the End devices belonging to Collector 2 are redistributed to Collector 3. Furthermore, when the application detects that the scheduled collection has failed, the application can start a backup Gateway to perform the Gateway processing.
2. The structure of the network topology, the network latency, and the number of computers are not known in advance. With the auto-discovery approach, neither the number of devices that will be discovered, nor the network topology, nor the network latency is known beforehand. The devices are connected through the Internet, and the Collectors can be chosen randomly as long as they meet the Collector criteria. End devices can be assigned to Collectors by distance, i.e. it is preferable to assign End devices to a nearby Collector, but this is not mandatory: as long as network connectivity exists between the Collector and its End devices, they can be collected. This implies that the network latency is also unknown.
3. Each computer has only a limited, incomplete view of the system. Each Collector knows only the End devices it needs to collect and the Gateway to send the collection to; it does not know how many other Collectors there are or how many devices the system is trying to collect in total. Each End device knows only the Collector that is talking to it, and only passively.
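The redistribution step in item 1 can be illustrated with a short Python sketch. The text only says that a failed Collector's End devices are handed to other Collectors; choosing the least-loaded surviving Collector is an assumption made here for illustration, as are the collector and device names.

```python
def redistribute(assignments, failed):
    """Move the failed collector's end devices to the least-loaded survivor."""
    survivors = {c: list(devs) for c, devs in assignments.items() if c != failed}
    if not survivors:
        raise RuntimeError("no collector left to take over")
    for device in assignments.get(failed, []):
        target = min(survivors, key=lambda c: len(survivors[c]))
        survivors[target].append(device)
    return survivors

assignments = {
    "collector1": ["a", "b", "c"],
    "collector2": ["d", "e"],
    "collector3": ["f"],
}
print(redistribute(assignments, "collector2"))
```

In this example both of Collector 2's End devices end up on Collector 3, matching the scenario shown on the slide.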
Benefit
Requirement               Strategy
No appliance needed       Embedded scripts on the devices perform the collection; no external appliance is required.
Time to Market            The script solution does not require IOS code changes, giving a quicker time to market.
Low Touch                 No IOS upgrade is needed; easier configuration and a utility allow customers to adopt this approach. Low support cost.
Security                  Secure transfer of the collection from the customer site to the application backend. The collection is encrypted.
Performance               Efficient collection from network devices; low impact on the network devices on high-end routers. Distributed collection approach.
Data Driven collection    Profile-based collection; the collection is tailored to different devices in a data-driven way.
Auto collection           The collection can be scheduled and run automatically.
Scale                     Can easily be scaled to larger networks.
There are several benefits to this approach in comparison to other applications:
1) The most significant advantage of this approach is that there is no need for an external appliance, which is a major cost saving for customers. The mechanism is highly distributed, so there is no single point of failure. It is more cost-efficient to obtain the desired level of performance by using several network devices or low-end devices than by using a single high-end appliance. It satisfies the green criterion by virtue of not requiring an external appliance for data collection. Moreover, a distributed system may be easier to expand and manage than a single appliance.
2) The script-based solution is quicker to develop than building the feature into the operating system itself. Customers are usually reluctant to upgrade the image, if they do not resist it outright, because an upgrade involves a huge amount of testing. This approach shortens the time to market, in both development time and the time for customers to adopt it. The same benefit applies to upgrades.
3) Configuration and installation are done by a single installation script with very few configuration commands. It is a low-touch model that does not require system engineers to visit the site for the installation, unlike the appliance model, where a sales engineer typically has to travel on site to help with the installation and the initial collection. The scripts can be downloaded from a central company site. With some instruction, customers can install the scripts on their devices and set up the scheduled collection.
4) Security of the data is a major concern for customers. To keep their data safe, the file transfer from the Gateway devices to the backend application uses secure protocols such as SCP and HTTPS. The data is also encrypted before transmission, and the TCL scripts are delivered in byte code format and signed to prevent intentional or unintentional tampering.
5) The collection uses the computing power of the network devices and can run in parallel: the Gateway devices collect from the collectors and the collectors collect from the end devices concurrently. SNMP bulk collection is used, together with a profile approach that tailors the collected information to each device type, to minimize collection time. The performance impact on the devices is capped at less than 10% of CPU and memory usage.
6) The collection can be profiled to gather only the information pertinent to the type of device, avoiding unneeded information, using a data-driven approach. This data-driven mechanism can serve different applications that use the collection, such as monitoring, inventory, or network management applications.
7) The collection can be started automatically and periodically.
8) The approach can be scaled to larger networks by using multiple gateways or a segmented collection approach.
In contrast to an external device based collection solution, the EC based collection technique not only works in conjunction with the native OS (e.g., IOS) but also uses device resources efficiently without impacting core functions such as routing. The embedded data collection solution scales well as a result of its distributed design, whereas an external device collector is likely to require a memory and possibly a CPU upgrade to handle an increase in network size. One of the challenges of the scaled EC design is that it must have a built-in ability to handle new generation devices that become part of the network as it evolves.
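The profile-based, data-driven collection in benefits 5 and 6 can be pictured as a lookup from device type to the set of SNMP objects and CLI commands to gather. The following Python sketch is illustrative only: the profile contents, the device-type names, the fallback profile, and the function `build_plan` are assumptions, and the actual metadata format used by the scripts is not described in the paper.

```python
from typing import Dict, List

# Hypothetical profile metadata: which MIB objects and CLI commands to collect
# for each device type. The entries are examples, not the shipped profiles.
PROFILES: Dict[str, Dict[str, List[str]]] = {
    "router": {
        "mibs": ["sysDescr", "entPhysicalTable", "ifTable"],
        "cli": ["show version", "show running-config"],
    },
    "switch": {
        "mibs": ["sysDescr", "entPhysicalTable"],
        "cli": ["show version", "show inventory"],
    },
}

# Assumed minimal fallback for device types that have no dedicated profile.
DEFAULT_PROFILE: Dict[str, List[str]] = {"mibs": ["sysDescr"], "cli": ["show version"]}


def build_plan(device_types: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Return, per device, only the items its profile says are pertinent."""
    plan: Dict[str, Dict[str, List[str]]] = {}
    for device, dtype in device_types.items():
        profile = PROFILES.get(dtype, DEFAULT_PROFILE)
        # Copy the lists so later edits to a plan never mutate the profile.
        plan[device] = {"mibs": list(profile["mibs"]), "cli": list(profile["cli"])}
    return plan


plan = build_plan({"edge-1": "router", "access-7": "switch", "legacy-2": "unknown"})
```

Keeping the profiles as plain data rather than code is what makes the approach data driven: changing what is collected for a device type means editing an entry in the table, not the collection logic.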
Status and Future Work
The Embedded Collector has been released to the field for a market trial. The target users are small to medium size
customers with fewer than 100 devices. Many features can be derived from this approach.
Future enhancements include:
• Scale to larger networks: the solution can be scaled to collect from a larger number of devices by using multiple Gateways, or by collecting a different segment of the network at a time and then assembling the collections at the backend.
• Auto discovery of the devices can utilize the Cisco Discovery Protocol, or BGP or OSPF routing tables, to discover all devices in the customer network. Alternatively, customers can specify the number of hops to which the discovery is limited.
• The collection profile can be data driven and not limited to the developers: it can be opened up so that customers change the profile to fit their needs by editing the metadata file of the collection, for example changing the MIB names and CLI commands. Once the devices are discovered, collectors can be assigned to end devices automatically to group them.
• Further performance enhancement: currently, to limit the impact on CPU and memory usage on the gateway and collector network devices, EEM and TCL run at a low priority, and polling of the devices may be limited to xx number/minutes. These limits can be made configurable, or the polling interval can be adjusted intelligently, so that the collection time is reduced.
• The collection can be used to gather diagnostic information, for example certain log and error information. Logging information can be filtered on the devices and collected on demand or during a scheduled collection.
• The current release supports Cisco applications; it can be integrated with other network management applications or with inventory or asset management applications.
• The current release supports only devices that respond to IOS CLI and SNMP commands. It can be enhanced to support other operating systems, such as UCS devices, Linux, or CatOS, or to support third-party devices.
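The idea of intelligently adjusting the polling interval, mentioned in the performance item above, could take many forms. One possible sketch in Python follows; the doubling and halving rule, the interval bounds, and the function name `next_interval` are assumptions made for illustration, and only the 10% cap comes from the text.

```python
def next_interval(
    current_s: float,
    cpu_pct: float,
    cap_pct: float = 10.0,   # cap on CPU impact stated in the paper
    min_s: float = 1.0,      # assumed lower bound on the polling interval
    max_s: float = 60.0,     # assumed upper bound on the polling interval
) -> float:
    """Back off when the device is at or above the CPU cap, speed up otherwise."""
    if cpu_pct >= cap_pct:
        proposed = current_s * 2.0   # poll less often to relieve the device
    else:
        proposed = current_s * 0.5   # poll more often to shorten the collection
    # Keep the interval inside the configured bounds.
    return max(min_s, min(max_s, proposed))


# Example: start at 10 seconds and react to two CPU readings.
slower = next_interval(10.0, cpu_pct=12.0)
faster = next_interval(10.0, cpu_pct=3.0)
```

A multiplicative rule like this reacts quickly when a device is busy, while the bounds keep the interval from collapsing to zero or growing without limit. Making `cap_pct`, `min_s`, and `max_s` configurable corresponds to the configurable option mentioned in the list.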
THANK YOU!
Acknowledgement:
We would like to acknowledge Raja Banerjee, Subrata Dasgupta, Tim Johnson, Jim McDonnell, Sureshbabu Nagarathinam,
Chocks Ramiah, Ammar Rayes and Alex Truong for their contributions to the project and to the concepts and ideas presented in this paper, and the Cisco EEM team for providing the device programmability.
U.S. Patent pending: System and Method for Providing a Script-Based Collection for Devices in a Network Environment, U.S.
Application Serial No. 12/848,146.