Overview of a scalable Network Monitoring Architecture
Transcript of Overview of a scalable Network Monitoring Architecture
Providing communication infrastructure characteristics to grid-aware applications
Augusto Ciuffoletti, Dipartimento di Informatica, Univ. di Pisa
The environment
A Grid is a network of edge services, seamlessly accessible by authorized users in order to perform distributed computations.
Users optimize their operations by selecting edge services that fit two basic criteria:
• offer the appropriate level of service;
• are able to perform efficiently a distributed computation.
The application of the second criterion is based upon the knowledge of the potential performance of the communication infrastructure that supports the distributed computation.
GGF Grid Monitoring Architecture
• controls a complex monitoring system;
• is fault tolerant;
• minimizes overhead (over monitored resources);
• makes available the results of the monitoring activities, also implementing security;
• integrates different monitoring tools;
• scales well with system size and probe frequency.
An ancestor: the Network Weather Service
[Figure: the NWS architecture (nws.jpg)]
The components of a NWS resource monitoring system:
• a nameserver, a centralized controller that keeps a registry of all components and monitoring activities;
• sensors, that produce resource observations;
• memories, that store resource observations;
• forecasters, that process resource observations.
Limits of the NWS architecture, bound to the presence of the nameserver:
• a communication bottleneck;
• a single point of failure;
• no security.
From NWS to LDAP
One limit of NWS is centralized control: in order to obtain data, one has to address (possibly indirectly) the nameserver, specifying the id of the sensor.
NWS was an "all inclusive" architecture, more a proof of concept than a real application. It was not applicable at Grid scale, mostly for the absence of a real database.
The distributed nature of the LDAP architecture is appealing from this point of view: information is organized hierarchically, and the resulting tree may be distributed over different servers.
In a Grid perspective, the hierarchy often reflects the organization of the Grid: the first level under the root is composed of continental networks, below are national networks and next local networks. Leaves are single resources, like clusters of computers, single computers or storage elements.
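As an illustration, the sketch below renders such a hierarchy as LDAP-style distinguished names. All names (networks, resources, the o=grid suffix) are made up for the example; a real Grid information tree would use its own schema.

```python
# Toy Grid resource hierarchy rendered as LDAP-style DNs.
# All names below are hypothetical, chosen only for illustration.
hierarchy = {
    "eu-net": {                                      # continental network
        "it-net": {                                  # national network
            "pisa-lan": ["cluster01", "storage01"],  # local network -> leaves
        },
    },
}

def dns(tree, suffix="o=grid"):
    """Yield one DN per leaf resource, most specific component first."""
    for name, child in tree.items():
        rdn = f"ou={name},{suffix}"
        if isinstance(child, dict):
            yield from dns(child, rdn)
        else:
            for leaf in child:
                yield f"cn={leaf},{rdn}"

for dn in dns(hierarchy):
    print(dn)
# cn=cluster01,ou=pisa-lan,ou=it-net,ou=eu-net,o=grid
# cn=storage01,ou=pisa-lan,ou=it-net,ou=eu-net,o=grid
```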
Globus and LDAP
The well known Globus toolkit is based on LDAP.
However, the LDAP architecture is appropriate to store static information, like the number of processors in a cluster, or the size of a disk partition.
When data are more volatile, the LDAP architecture is the wrong choice: it is unsuitable to support frequent write operations.
Examples of characteristics that induce frequent writes:
• available computing power on a computing resource (e.g. number of idle nodes in a cluster);
• percentage of free space in a partition;
• average round-trip time on a link.
In all these cases the LDAP architecture suffers serious performance and scalability problems.
From Globus LDAP to R−GMA SQL
SQL provides a relational database. There are several excellent implementations of this kind of database, designed for extremely demanding applications.
They offer a good compromise between distribution and the cost of query operations.
In particular, the database can be replicated in order to improve scalability and fault tolerance, as long as the number of query operations dominates the number of writes.
The R-GMA architecture was developed to address this problem: the scalability of the architecture is further improved by introducing components that combine data from the database and cache the results (similar to NWS forecasters).
Scalability of an end−to−end architecture
When applied to network monitoring, the above architectures overlook a problem which becomes evident when Grid size scales up: the number of network resources, intended as paths interconnecting edge resources, grows with the square of Grid size.
The case of resources that are bound to a node is quite different from the case of resources that represent communication between nodes: the former grow with the number of nodes, the latter with its square.
As a consequence, representing communication resources using end-to-end characteristics may limit the scalability of a monitoring architecture.
In addition, the availability of end-to-end measurements implies that:
• nodes that host Grid resources also support network monitoring protocols, and
• each pair of nodes hosting Grid resources should generate network monitoring traffic, with an increment of network load that grows with the square of the number of nodes.
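The quadratic growth is easy to quantify. With N edge nodes, a full mesh of directed end-to-end sessions needs N(N-1) measurements; grouping the nodes into D domains, each probed through a single representative, reduces this to D(D-1). A minimal sketch (the node and domain counts are illustrative):

```python
def full_mesh_sessions(n):
    """Directed end-to-end sessions in a full mesh of n edge nodes."""
    return n * (n - 1)

def domain_sessions(d):
    """Directed sessions when monitoring only d domain representatives."""
    return d * (d - 1)

# Illustrative figures: 1000 edge nodes grouped into 20 domains.
print(full_mesh_sessions(1000))  # 999000
print(domain_sessions(20))       # 380
```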
A user perspective
This fact raises a number of problems:
• the size of a database containing network service characteristics grows rapidly;
• the characteristics hardly track a dynamic infrastructure (routes may change);
• each edge service becomes eligible as an active network monitoring node (which consumes local and network resources).
On the other hand, a network service that does not reflect an end-to-end path is of little help for the user that wants to know, for instance, how fast data will flow from a storage to a computing facility.
In order to bring this problem to a manageable size, we need to take into account a peculiarity of a Grid:
groups of resident services are controlled by distinct administrations, which manage the infrastructure among local services and commit the communication infrastructure with the rest of the Grid to third parties.
A rationale behind Grid partitioning
Groups of resident services are controlled by distinct administrations, which manage the infrastructure among local services and commit the communication infrastructure with the rest of the Grid to third parties.
The above point authorizes a hierarchical view of a Grid, as composed of domains interconnected by an inter-domain infrastructure.
However, there is a good reason for not extending the hierarchical decomposition beyond the first level, thus avoiding the introduction of sub-domains and further levels.
Our point is that network service characteristics are difficult to compose: for instance, from the fact that l(ab) is the loss rate from a to b, and l(bc) is the loss rate from b to c, one can hardly guess the value of l(ac).
In our view, decomposition is useful only if one is able to guess which of the components dominates the others when determining a characteristic of the overall network service.
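Even under the optimistic assumptions that the a-to-c route actually traverses b and that losses on the two segments are independent, the best one can do is l(ac) = 1 - (1 - l(ab))(1 - l(bc)); when either assumption fails (and routing gives no guarantee that it holds), even this estimate is unfounded. A small sketch:

```python
def composed_loss(l_ab, l_bc):
    """Loss rate of the concatenated path a->b->c, assuming the a->c route
    really traverses b and that losses are independent -- two assumptions
    that a real network need not satisfy."""
    return 1 - (1 - l_ab) * (1 - l_bc)

# 1% loss on a-b and 2% loss on b-c compose to roughly 2.98% -- but only
# under the stated assumptions.
print(round(composed_loss(0.01, 0.02), 4))  # 0.0298
```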
Basic issues about Grid partitioning
Groups of resident services are controlled by distinct administrations, which manage the infrastructure among local services and commit the communication infrastructure with the rest of the Grid to third parties.
There are two distinct scenarios that need to be considered:
• if the inter-domain infrastructure is based on a classical packet switching technology, it is exposed to be the bottleneck of an edge-to-edge path.
This conclusion comes from the consideration that single administrations try to exploit as much as possible their share of an expensive infrastructure: there is little sense in a community that leases a 1 Gbps long-haul interconnection while the internal connectivity is based on a 100 Mbps infrastructure.
In a "store and forward" Grid where traffic is reasonably engineered, characteristics should be mainly determined by the inter-domain fabric.
• if the inter-domain infrastructure is based on an optical technology, it is expected not to be the bottleneck of an edge-to-edge path.
In that case the inter-domain infrastructure is over-dimensioned: it is less expensive than the intra-domain one.
The economic point of view is opposite with respect to the previous one: the consequence is that the bottleneck will probably be within the domain.
This reduces the task of monitoring the overall NxN grid to monitoring N domains.
We conclude that, in both scenarios, recognizing the "special role" played by the inter-domain infrastructure helps the task of monitoring the network infrastructure.
This is also consistent with the architectural foundations of the DiffServ architecture.
A network of concepts
We have defined three concepts:
• edge service: a Grid service provided by a host or cluster;
• domain: a set in a partition of the edge services;
• network service: a Grid service provided by the communication infrastructure to two domains.
We need two more key concepts in order to complete our model: theodolite service and multihome.
A theodolite service is an instance of edge service, which consists in representing an entry point of a certain domain. For instance:
• in a best effort based Grid, it may consist in a node that hosts monitoring tools, and is a target for theodolites of other domains;
• in a differentiated services based Grid, it may implement what RFC 2475 calls an ingress or egress point.
A multihome is an entity that collects several aliases of the same edge service, when they must be included into different domains in order to comply with the basic requirement (an example follows).
[Figure: UML diagram of the concepts]
A simple example
More complex situations:
• more theodolites to reflect distinct network service supports;
• more network services to reflect distinct classes of service.
Multihomed storage
In the example, storage S1 will be mapped into a multihome with two members: one in the same domain as Ca, and the other in the same domain as Cb.
GlueDomains: concepts@work
GlueDomains is a prototype implementation of the concepts introduced so far: it is included in the official Italian release of LCG 6.0.
We envision a best effort only Grid, where theodolites run active network monitoring tools.
Globus MDS is used to make network service characteristics available to users (typically, grid-aware applications or Grid monitoring tools like GridICE).
[Figure: the GlueDomains architecture]
The overall architecture is split into four modules (disregarding monitoring tools):
• the GlueDomains database, used to map services to domains and to store theodolite descriptions;
• the GlueDomains theodolite management, which controls the activity of theodolite services;
• the GMA plugin, which is in charge of transferring observations from the monitoring tools to the GMA back end;
• the GMA back-end, which submits the GMA plugin output to the GMA repository.
GlueDomains database
The GlueDomains database implementation is presently centralized, which limits the scalability of the whole structure to a few tens of domains (based on a prospective simulation).
However, the present solution exhibits several interesting features:
• a monitoring host autonomously downloads the description of its activity from the GlueDomains server: once the gluedomains daemon is started, it detects network interfaces and configures the monitoring sessions without human intervention;
• a monitoring host keeps its activity synchronized with the content of the database;
• the inability to carry out a certain monitoring task is harmless, as is the unreachability of the centralized database;
• the update of the database is semi-automatic: a script helps the compilation of its content, starting from XML descriptions of the activity of each monitoring host. In practice this means that the human interface is O(n).
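The XML description format is not shown here, so the sketch below assumes a plausible shape (one session element per monitoring activity) purely to illustrate how such a compilation script might read it; the real GlueDomains format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-host activity description (the real format may differ).
doc = """
<monitoring-host name="theodolite1.example.org">
  <session tool="ping"  target="theodolite2.example.org" period="60"/>
  <session tool="iperf" target="theodolite3.example.org" period="3600"/>
</monitoring-host>
"""

root = ET.fromstring(doc)
host = root.get("name")
# One row per monitoring session, ready to be inserted into the database.
rows = [(host, s.get("tool"), s.get("target"), int(s.get("period")))
        for s in root.findall("session")]
for row in rows:
    print(row)
```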
We plan to shift to a fully distributed database, cached on each monitoring host, using a gossip based technique.
In this frame, the MySQL technology will probably be dropped: a new release, not yet distributed, implements a SOAP interface to the database.
Theodolite management
Theodolite management is structured as a hierarchy of processes, rooted in the GlueDomains daemon.
The GlueDomains daemon spawns theodolite processes, each in charge of controlling the monitoring activity over an interface.
A binding between interface and theodolite has been introduced, which is not implicit in the conceptual framework.
Each theodolite periodically checks its activity against the database content, in order to keep the two synchronized.
Each theodolite spawns session controllers, each controlling a specific monitoring session.
In case a session controller terminates, the theodolite re-spawns it, after reloading the description from the database and waiting an exponentially growing, randomly biased delay.
The session controller invokes a wrapper of the appropriate monitoring tool, which parses and redirects the output of the tool to the GMA plugin.
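The respawn delay policy can be sketched as follows; the base delay, cap and jitter values are illustrative, since they are not specified here:

```python
import random

def respawn_delay(attempt, base=1.0, cap=300.0, jitter=0.5):
    """Exponentially growing, randomly biased delay (in seconds) before a
    dead session controller is re-spawned. Parameter values are illustrative."""
    delay = min(cap, base * 2 ** attempt)                   # exponential growth, capped
    return delay * random.uniform(1 - jitter, 1 + jitter)   # random bias

# Successive failures back off roughly as 1, 2, 4, 8, ... seconds, +/- 50%.
for attempt in range(5):
    print(round(respawn_delay(attempt), 2))
```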
Monitoring tools
To improve modularity, each monitoring tool is packaged in a separate RPM, which includes the appropriate GlueDomains wrapper.
The prototype currently provides three monitoring tools:
• a basic ping facility, built using the Perl library Net::Ping.
♦ It is introduced for basic testing purposes, and is used to check connectivity.
♦ Its operation does not depend on neighbor configuration.
• a more sophisticated delay measurement tool, which provides one-way jitter estimates using a convex hull algorithm.
♦ It provides useful information, especially for data transfer operations.
♦ Its operation depends on neighbor configuration.
• the well known iperf facility, that measures available bandwidth by saturating the line.
♦ It provides low level information, at a high cost.
♦ Its operation depends on neighbor configuration, and a single shot can be requested by authorized users only.
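The ping facility relies on Perl's Net::Ping; a rough Python equivalent of its TCP mode (a connectivity check rather than an ICMP echo) might look like this:

```python
import socket

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.
    Mimics Net::Ping's "tcp" mode: a connectivity check, not a latency probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))              # let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
print(is_reachable("127.0.0.1", port))  # True: something is listening
srv.close()
print(is_reachable("127.0.0.1", port))  # False: connection refused
```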
Prototype set up and deployment
Currently, GlueDomains is deployed on about ten hosts, and is included in the GridICE Grid Management project.
The site in Bologna supports the GlueDomains database server, and tests are performed in a complete mesh.
The first official release came after (only) four testing releases, with no substantial rewriting required so far: mainly, each new test release introduced new features.
The architecture has proven to be reasonably GMA-independent: from the 1st to the 2nd testing release we replaced an R-GMA plugin with an MDS oriented one, which publishes through the GridICE GMA.
The current distribution is RPM oriented and self-installing: to install a new monitoring host, the local administrator simply installs the required packages, and untars an archive containing a few sensitive items, like database passwords.
[Figure: deployment map]
Security issues in GlueDomains
GlueDomains is aware of security problems, but provides only very basic security:
• read access to the database is protected by a password, shared by all monitoring hosts;
• write access is protected by another password;
• the tools use passwords to ensure the identity of the partners, and capabilities to invoke on-demand sessions;
• data transfer between the Information Provider and the MDS is assumed to occur within a protected environment, and is not encrypted.
However, we provided "handles" to implement more rigorous security policies, and more is on the way.
Future work (CoreGRID)
The experience with GlueDomains continues within the European CoreGRID project.
Within this frame, we are re-designing GlueDomains in order to take into account issues that were set aside during the design of the first prototype:
• full awareness of scalability problems:
♦ the scalability premises of the basic idea were not fully implemented in the first prototype (a centralized database was used to keep configuration data);
♦ ping-like, full mesh sessions do not scale (traffic grows with the square);
• full awareness of security issues:
♦ password based security of the database is insufficient;
♦ on-demand sessions need identification tools.
Passive monitoring
Passive monitoring eliminates the problem of polynomial growth of monitoring sessions (and consequently, monitoring traffic).
However, the problem is moved inside the hosts that perform passive monitoring: the number of packets they examine grows with the square of the number of sites. This is probably preferable.
Such hosts might use the content of the domain description database to infer which traffic is representative of the performance of the network infrastructure between two domains.
In this view, active monitoring tools are helpful only when passive monitoring does not find useful traffic patterns.
In such an event, hosts performing passive monitoring might invoke the execution of active monitoring sessions that will either produce observations, or induce significant traffic patterns.
Such requests should be authenticated, for security reasons.
Distributed database
The database that describes the overlay domain topology is stored in a distributed database.
Every Network Monitoring Element (the theodolite, in the previous terminology) hosts a proxy of this abstract entity.
Such a component keeps an updated local image of the database. Updates are propagated using an epidemic, peer-to-peer protocol.
The footprint of such a protocol is kept under control by adaptively changing the number of tokens in the system, using a fully distributed feedback algorithm.
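A minimal sketch of the epidemic push phase: in each round, every node that already holds an update forwards it to a few random peers, so coverage grows exponentially and full propagation takes O(log n) rounds in expectation. The fan-out and node count below are illustrative, and the token-based feedback that bounds traffic is omitted.

```python
import random

def gossip_rounds(n_nodes=1000, fanout=2, seed=42):
    """Simulate epidemic (push) propagation of a single update and return
    the number of rounds until every node holds it. Illustrative parameters."""
    rng = random.Random(seed)
    informed = {0}                     # node 0 injects the update
    rounds = 0
    while len(informed) < n_nodes:
        pushed = set()
        for _ in informed:
            # each informed node pushes the update to `fanout` random peers
            pushed.update(rng.randrange(n_nodes) for _ in range(fanout))
        informed |= pushed
        rounds += 1
    return rounds

print(gossip_rounds())   # full coverage after O(log n) rounds
```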
The simulation of such a protocol in a system of 1000 NMEs gave encouraging results:
• update frequency: 1/5 per second
• overall traffic: 5 MB/sec
• update latency: 40 secs
with a setup which is appropriate for a classical packet switching environment.
The simulation proved that the algorithm is self-stabilizing under severe stress (for instance, instantaneously doubling the number of NMEs).
Layout of the NME
The control plane is in charge of interactions with:
• user applications, that may request the execution of sessions and read/write access to the topology database;
• the GIS, that publishes the results of the monitoring activity;
• the Certification Authority, to retrieve certificates.
Two low level modules are envisioned:
• the network monitoring sensor, which performs passive monitoring;
• the module that implements the database proxy.
Summary
A modular approach to Grid Connectivity Monitoring
• The environment
• GGF Grid Monitoring Architecture
• An ancestor: the Network Weather Service
• From NWS to LDAP
• Globus and LDAP
• From Globus LDAP to R-GMA SQL
• Scalability of an end-to-end architecture
• A user perspective
• A rationale behind Grid partitioning
• Basic issues about Grid partitioning
• A network of concepts
• A simple example
• Multihomed storage
• GlueDomains: concepts@work
• GlueDomains database
• Theodolite management
• Monitoring tools
• Prototype set up and deployment
• Security issues in GlueDomains
• Future work (CoreGRID)
• Passive monitoring
• Distributed database
• Layout of the NME