
ATP Future Directions

Availability of historical information for grid resources: Grid resources are not static, so their history needs to be stored. At present, historical information is not preserved, which frequently leads to misleading and inconsistent displays and wastes time and effort. The plan is to provide historical information that allows the status of grid resources to be tracked over a time window, past and present. This is a vital feature for re-computing service availability and reliability numbers in Gridview.
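One way to picture the planned history feature is an append-only store in which every status change closes the previous interval and opens a new one, so that a past state can be looked up by timestamp. The following is a minimal sketch under that assumption, not the ATP design: the class names, the resource identifier "site-a/CE" and the status values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class StatusRecord:
    """One interval during which a resource had a given status (hypothetical schema)."""
    resource: str
    status: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the record is still current


class HistoryStore:
    """Append-only history: a change closes the open record and opens a new one."""

    def __init__(self) -> None:
        self._records: List[StatusRecord] = []

    def set_status(self, resource: str, status: str, when: datetime) -> None:
        # Assumes changes arrive in chronological order for each resource.
        for i, rec in enumerate(self._records):
            if rec.resource == resource and rec.valid_to is None:
                if rec.status == status:
                    return  # nothing changed, keep the open record
                # Close the open interval instead of overwriting it.
                self._records[i] = StatusRecord(
                    rec.resource, rec.status, rec.valid_from, when
                )
        self._records.append(StatusRecord(resource, status, when))

    def status_at(self, resource: str, when: datetime) -> Optional[str]:
        """Return the status the resource had at the given time, if known."""
        for rec in self._records:
            if rec.resource != resource:
                continue
            if rec.valid_from <= when and (rec.valid_to is None or when < rec.valid_to):
                return rec.status
        return None


store = HistoryStore()
store.set_status("site-a/CE", "production", datetime(2009, 1, 1))
store.set_status("site-a/CE", "decommissioned", datetime(2009, 3, 1))
print(store.status_at("site-a/CE", datetime(2009, 2, 1)))  # production
print(store.status_at("site-a/CE", datetime(2009, 4, 1)))  # decommissioned
```

Because old intervals are never overwritten, an availability figure for February can be re-computed later against the topology as it was in February rather than as it is today.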

Programmatic interface for other tools: Monitoring tools frequently rely on a number of other databases for the information they need, and in most cases they connect to those databases directly, which leads to problems. The ATP will provide standard interfaces that abstract the underlying structure and allow information to be retrieved in a structured way.
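The idea of an interface that hides the underlying structure can be sketched as an abstract class that consumers program against, with the storage behind it free to change. This is an illustration only: the poster does not describe the actual ATP interface, so the class names, method names, site names and service labels below are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TopologyInterface(ABC):
    """What a consumer may ask; says nothing about how the data is stored."""

    @abstractmethod
    def sites_for_vo(self, vo: str) -> List[str]:
        ...

    @abstractmethod
    def services_at_site(self, site: str) -> List[str]:
        ...


class InMemoryTopology(TopologyInterface):
    """Stand-in backend; a real one could sit on a database instead."""

    def __init__(
        self,
        vo_sites: Dict[str, List[str]],
        site_services: Dict[str, List[str]],
    ) -> None:
        self._vo_sites = vo_sites
        self._site_services = site_services

    def sites_for_vo(self, vo: str) -> List[str]:
        return sorted(self._vo_sites.get(vo, []))

    def services_at_site(self, site: str) -> List[str]:
        return sorted(self._site_services.get(site, []))


def services_for_vo(topology: TopologyInterface, vo: str) -> Dict[str, List[str]]:
    """Consumer-side code: depends only on the interface, never on tables."""
    return {site: topology.services_at_site(site) for site in topology.sites_for_vo(vo)}


topology = InMemoryTopology(
    vo_sites={"atlas": ["SITE-B", "SITE-A"]},
    site_services={"SITE-A": ["SE", "CE"], "SITE-B": ["CE"]},
)
print(services_for_vo(topology, "atlas"))
```

A monitoring tool written against `TopologyInterface` would keep working if the backend's schema changed, which is the problem with direct database connections that the paragraph above describes.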

Features and Present State

• New abstract entities such as Project and Infrastructure (Project: WLCG; Infrastructures: EGEE, OSG)

• Support for grouping of Infrastructures, VOs, sites and services

• User-specified site names in VOs

• Synchronization with different information providers

A list of requirements was gathered from various sources, e.g. Gridview, SAM, Dashboards and the LHC experiments. The first prototype of the ATP is currently being tested, including the modules that aggregate, store and present the information in a standardized way.

Conclusions:

The ATP, acting as a single authoritative information aggregator, will simplify the job of assimilating grid resource information. The repository will also be extremely useful for tracking historical information of grid resources in the WLCG infrastructure. The ATP data will also be consumed by other high-level monitoring tools, improving their traceability, accuracy and reliability. The first prototype system is in the testing phase and will soon be deployed in production for 24x7 use in the grid.

References:

WLCG: http://lcg.web.cern.ch/LCG/

OAT Strategy: https://edms.cern.ch/document/927171

GridView: https://twiki.cern.ch/twiki/bin/view/LCG/GridView

SAM: https://twiki.cern.ch/twiki/bin/view/LCG/SamCern

Topology Information Aggregation – Why, What and How?

Today, various WLCG stakeholders such as VOs, sites and ROCs use different monitoring and reporting tools, such as Gridview, SAM, the Experiment Dashboards and Gridmap, for monitoring and visualizing grid resources. These tools rely heavily on the topology of the grid resources, which mainly consists of:

• Sites and associated resources: Sites, Services, Nodes and their groupings

• High-level resources and user communities: Project, Infrastructure, VOs and their groupings

• Associations between the entities defined above: Project-VOs, VO-Sites
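The entities and associations listed above can be sketched as a small data model. This is only an illustration under stated assumptions: the poster does not give the ATP schema, so the field names, the site and node names and the "CE"/"SE" service labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    flavour: str  # hypothetical service type label, e.g. "CE"
    node: str     # host on which the service runs


@dataclass
class Site:
    name: str
    infrastructure: str  # e.g. "EGEE" or "OSG"
    services: List[Service] = field(default_factory=list)


@dataclass
class VO:
    name: str
    sites: List[Site] = field(default_factory=list)  # VO-Site association


@dataclass
class Project:
    name: str
    infrastructures: List[str] = field(default_factory=list)
    vos: List[VO] = field(default_factory=list)  # Project-VO association


def nodes_for_project(project: Project) -> List[str]:
    """Walk Project -> VO -> Site -> Service and collect distinct node names."""
    nodes = {
        svc.node
        for vo in project.vos
        for site in vo.sites
        for svc in site.services
    }
    return sorted(nodes)


site_a = Site("SITE-A", "EGEE", [Service("CE", "ce01.site-a.example"),
                                 Service("SE", "se01.site-a.example")])
site_b = Site("SITE-B", "OSG", [Service("CE", "ce01.site-b.example")])
wlcg = Project("WLCG", ["EGEE", "OSG"],
               [VO("atlas", [site_a, site_b]), VO("cms", [site_a])])
print(nodes_for_project(wlcg))
```

Holding the associations as explicit links lets a consumer answer questions such as "which nodes serve this project" by traversal, without knowing which information provider each piece came from.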

The tools source their data from a number of information providers while they act mostly as information consumers. Although these tools do a great job in visualizing complex monitoring data of the Grid, the flow of information among different producers and consumers needs to be improved for enhanced reliability. In particular, as these tools exchange information about grid resources as needed, there are a number of crisscross database interconnections. The current scenario for various monitoring tools is shown in the upper figure. There is no single authoritative information source that can be queried, thus hampering the effectiveness of aggregation and consumption of data by the applications.

The Aggregated Topology Provider (ATP) is a grid topology repository being developed for the WLCG in order to have a single authoritative source of topology information of grid resources and to streamline the flow of this data across various tools in the WLCG monitoring infrastructure. It gathers, aggregates and stores the topology information by periodically contacting all information providers. The aggregated information is then made available to various high-level monitoring tools. The ATP service will also track changes in the topology of grid resources by keeping history.

Information Providers

• GOC DB: The EGEE authoritative source of topology information such as sites, nodes and services. It is used by many monitoring and accounting tools.

• BDII: Provides information about which services are currently advertised in the Grid, plus the VO mappings to them.

• CIC Operations Portal: Stores the WLCG VO card information.

• OSG Interoperability Monitoring list: Provides information about the OSG resources, such as sites and services, that support the WLCG project.

• Manual inputs: Information about federations and tiers from the WLCG project office. With the ATP this will be automated by sourcing data from a proposed WLCG office portal.
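The gather-and-aggregate step can be sketched as a merge over a list of provider functions. The sketch below makes several assumptions that are not in the poster: each provider is represented by a stub returning per-site fields, the field names are invented, and conflicts are resolved by letting the provider listed first win. It does not call any real GOC DB or BDII interface.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Provider = Callable[[], Dict[str, Record]]


def aggregate(providers: List[Provider]) -> Dict[str, Record]:
    """Merge per-site records; providers listed first take precedence (assumed policy)."""
    merged: Dict[str, Record] = {}
    for fetch in providers:
        for site, fields in fetch().items():
            target = merged.setdefault(site, {})
            for key, value in fields.items():
                # setdefault keeps the value from an earlier provider if present.
                target.setdefault(key, value)
    return merged


def stub_gocdb() -> Dict[str, Record]:
    # Hypothetical data standing in for an authoritative site list.
    return {"SITE-A": {"infrastructure": "EGEE", "tier": "2"}}


def stub_bdii() -> Dict[str, Record]:
    # Hypothetical data standing in for currently advertised services.
    return {
        "SITE-A": {"tier": "unknown", "services": "CE,SE"},
        "SITE-B": {"services": "CE"},
    }


snapshot = aggregate([stub_gocdb, stub_bdii])
print(snapshot)
```

In a running service this merge would be repeated at a regular interval, with each resulting snapshot compared against the previous one so that changes can be recorded as history.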

Information Consumers

• SAM (Service Availability Monitoring): A framework for monitoring production and pre-production grid sites using a set of probes run at regular intervals.

• Gridview: A monitoring and visualization tool that provides a high-level view of various functional aspects of WLCG, such as service status, availability, reliability, and statistics on data transfers, FTS file transfers and running jobs.

• Experiment Dashboards: Provide an overview of the distributed computing activities of the LHC experiments ATLAS, ALICE, CMS and LHCb.

• Gridmap: A top-level visualization tool for monitoring grid services.

• Other user-specific applications: Various applications developed at the user end.

Various information providers and consumers of ATP data are shown in the drawing above. As seen, the introduction of the ATP in the grid monitoring infrastructure streamlines the flow of information across various monitoring tools, thereby improving data consistency and architectural robustness.

[Figure: Current flow of Grid topology data across various monitoring tools. Nodes: GOCDB, OIM list, CIC Operations portal, BDII, Manual inputs, Gridview, SAM, Gridmap, Dashboards. Flow labels: OSG resources; EGEE resources; services, VO mappings; federations, tiers, groups; VO cards info; refined resource list; GOCDB topology; topology information; CPU count per site.]

[Figure: Streamlined grid topology data flow using the ATP. Information Providers: GOCDB, OSG Resources, CIC Portal, BDII, WLCG Office Portal. Centre: ATP (Topology Repository: Information Aggregator). Information Consumers: SAM, Dashboards, Gridview, Gridmap, Other Tools.]

Aggregated Topology Provider (ATP): A Grid Topology Repository for the WLCG

Rajesh Kalmady, Pradyumna Joshi, Digamber Sonvane, Phool Chand, Kislay Bhatt, Kumar Vaibhav, Vinod Boppanna - BARC

James Casey, David Collados, John Shade - CERN