Fabric Management
Massimo Biasotto, Enrico Ferro – INFN LNL
CERN, 5 November 2001
Legnaro CMS Farm Layout

[Diagram: FastEth switches (nodes N1–N24 each) uplinked via GigaEth 1000BT to a central switch; disk servers S1–S16; WAN link 34 Mbps in 2001, 155 Mbps in 2002]

- Nx – Computational Node: dual PIII 1 GHz, 512 MB, 3x75 GB EIDE disks + 1x20 GB for the O.S.
- Sx – Disk Server Node: dual PIII 1 GHz, dual PCI (33/32 – 66/64), 512 MB, 4x75 GB EIDE RAID disks (expandable up to 10), 1x20 GB disk for the O.S.
- 2001: 40 nodes, 4000 SI95, 9 TB; 11 servers, 3.3 TB
- 2001-2-3: up to 190 nodes
Datagrid

Project structured in many "Work Packages":
- WP1: Workload Management
- WP2: Data Management
- WP3: Monitoring Services
- WP4: Fabric Management
- WP5: Mass Storage Management
- WP6: Testbed
- WP7: Network
- WP8-10: Applications

3 year project (2001-2003). Milestones: month 9 (Sept 2001), month 21 (Sept 2002), month 33 (Sept 2003).
Overview

- Datagrid WP4 (Fabric Management) overview
- WP4 software architecture
- WP4 subsystems and components
- Installation and software management
- Current prototype: LCFG
- LCFG architecture
- LCFG configuration and examples
WP4 overview

- Partners: CERN, INFN (Italy), KIP (Germany), NIKHEF (Holland), PPARC (UK), ZIB (Germany)
- WP4 website: http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/
- Aims to deliver a computing fabric comprising all the tools needed to manage a centre providing Grid services on clusters of thousands of nodes
WP4 structure

WP activity divided in 6 main 'tasks':
- Configuration management (CERN + PPARC)
- Resource management (ZIB)
- Installation & node management (CERN + INFN + PPARC)
- Monitoring (CERN + INFN)
- Fault tolerance (KIP)
- Gridification (NIKHEF)

Overall WP4 functionality is structured into units called 'subsystems', corresponding to the above tasks.
Architecture overview

- WP4 architectural design document (draft): http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/architecture/eu/default.htm
- Still work in progress: open issues need further investigation
- Functionalities classified into two main categories:
  - User job control and management: handled by the Gridification and Resource Management subsystems
  - Automated system administration: handled by the Configuration Management, Installation Management, Fabric Monitoring and Fault Tolerance subsystems
Architecture overview

[Diagram: the WP4 subsystems – Fabric Gridification, Resource Management, Configuration Management, Installation & Node Mgmt, Monitoring & Fault Tolerance – serve Grid and local users on Farm A (LSF) and Farm B (PBS), and interface other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5, mass storage and disk pools)]

- Gridification: interface between Grid-wide services and the local fabric; provides local authentication, authorization and mapping of grid credentials.
- Resource Management: provides transparent access to different cluster batch systems, with enhanced capabilities (extended scheduling policies, advanced reservation, local accounting).
- Configuration Management: provides central storage and management of all fabric configuration information; a central DB and a set of protocols and APIs to store and retrieve information.
- Installation & Node Management: provides the tools to install and manage all software running on the fabric nodes; bootstrap services, software repositories, and Node Management to install, upgrade, remove and configure software packages on the nodes.
- Monitoring & Fault Tolerance: provides the tools for gathering and storing performance, functional and environmental data for all fabric elements; a central measurement repository provides a health and status view of services and resources; fault tolerance correlation engines detect failures and trigger recovery actions.
Resource Management diagram

[Diagram of the Resource Management subsystem]

- Stores static and dynamic information describing the states of the RMS and its managed resources
- Accepts job requests, verifies credentials and schedules the jobs
- Assigns resources to incoming job requests, enhancing fabric batch system capabilities (better load balancing, adaptation to resource failures, consideration of maintenance tasks)
- Proxies provide a uniform interface to the underlying batch systems (LSF, Condor, PBS)
Monitoring & Fault Tolerance diagram

[Diagram: Monitoring Sensors (MS) and a Monitoring Sensor Agent (MSA) with a Measurement Repository (MR) cache on each local node; a central repository (MR server + database); an FTCE and Actuator Dispatcher (AD) with Fault Tolerance Actuators (FTA) on the local node, an FTCE on the service master node; a Monitoring User Interface (MUI) on the human operator host. Arrows distinguish control flow and data flow]

- Monitoring Sensor (MS): performs measurement of one or several metrics
- Monitoring Sensor Agent (MSA): collects data from Monitoring Sensors and forwards them to the Measurement Repository
- Measurement Repository (MR): stores timestamped information; it consists of local caches on the nodes and a central repository server
- Monitoring User Interface (MUI): graphical interface to the Measurement Repository
- Fault Tolerance Correlation Engine (FTCE): processes measurements of metrics stored in the MR to detect failures and possibly decide recovery actions
- Actuator Dispatcher (AD): used by the FTCE to dispatch Fault Tolerance Actuators; it consists of an agent controlling all actuators on a local node
- Fault Tolerance Actuator (FTA): executes automatic recovery actions
Configuration Management diagram

[Diagram: a central Configuration Database holds the High Level Description, compiled into per-node Low Level Descriptions; each Client Node runs a Cache Configuration Manager, and local processes access the configuration through an API]

- Configuration Database: stores configuration information and manages modification and retrieval access
- Cache Configuration Manager: downloads node profiles from the CDB and stores them locally
Configuration DataBase

Example: the high-level description "All computing nodes of CMS Farm #3 use cmsserver1 as Application Server" is compiled into low-level, per-node descriptions:

  cmsserver1 /etc/exports
    /app cmsnode1, cmsnode2, ..

  cmsnode1 /etc/fstab
    cmsserver1:/app /app nfs..

  cmsnode2 /etc/fstab
    cmsserver1:/app /app nfs..

  cmsnode3 /etc/fstab
    cmsserver1:/app /app nfs..
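The compilation step above can be sketched as a small script. Node names and target file contents come from the slide; the output layout under /tmp is an illustrative stand-in for the real per-node low-level descriptions:

```shell
# Illustrative sketch only: expand the high-level statement into
# per-node low-level config fragments under a scratch directory.
SERVER=cmsserver1
NODES="cmsnode1 cmsnode2 cmsnode3"
OUT=/tmp/lowlevel    # stand-in for the real per-node targets

mkdir -p "$OUT/$SERVER"
# the server's exports file lists every client node
echo "/app $(echo $NODES | sed 's/ /, /g')" > "$OUT/$SERVER/exports"

# each client mounts the exported directory via NFS
for n in $NODES; do
    mkdir -p "$OUT/$n"
    echo "$SERVER:/app /app nfs defaults 0 0" > "$OUT/$n/fstab"
done
```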
Installation Management diagram

[Diagram: a Node Management Agent (NMA) on each fabric node with local Software Packages (SP's); a Software Repository (SR), Bootstrap Service (BS) and system images on the server side; interactions with the Configuration Management subsystem, the Monitoring & Fault Tolerance subsystem (via its Actuator Dispatcher), and Administrative Scripting Layer applications. Data flow: configuration, SP's, system images, monitoring; control flow: function calls]

- Node Management Agent (NMA): manages installation, upgrade, removal and configuration of software packages
- Software Repository (SR): central fabric store for Software Packages
- Bootstrap Service (BS): service for the initial installation of nodes
Distributed design

Distributed design in the architecture, in order to ensure scalability:
- individual nodes as autonomous as possible
- local instances of almost every subsystem: operations performed locally where possible
- central steering for control and collective operations

[Diagram: each node holds a local Config DB cache and a local Monitoring repository, coordinated by a central Config DB and a central Monitoring repository]
Scripting layer

All subsystems are tied together using a high-level 'scripting layer':
- allows administrators to code and automate complex fabric-wide management operations
- coordinates the execution of user jobs and administrative tasks on the nodes
- scripts can be executed by the Fault Tolerance subsystem to automate corrective actions

All subsystems provide APIs to control their components. Subsystems keep their independence and internal coherence: the scripting layer only aims at connecting them for building high-level operations.
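As a sketch of what such a scripting-layer operation might look like: the rms_* / nm_* calls below are hypothetical stand-ins (the real subsystem APIs were still being defined at this point), stubbed with echo so that only the control flow is shown:

```shell
# Hypothetical sketch of a fabric-wide operation built on subsystem APIs.
# The rms_* / nm_* functions are stand-ins, not real WP4 interfaces.
rms_drain()    { echo "draining job queues on $1"; }
nm_set_state() { echo "$1 -> $2"; }
nm_run_task()  { echo "running $2 on $1"; }

upgrade_node() {
    node=$1
    rms_drain "$node"                   # stop accepting user jobs
    nm_set_state "$node" maintenance    # intrusive tasks now allowed
    nm_run_task "$node" kernel-upgrade
    nm_set_state "$node" production     # hand the node back to users
}

for node in cmsnode1 cmsnode2; do
    upgrade_node "$node"
done
```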
Maintenance tasks

Control function calls to the NMA are known as 'maintenance tasks':
- non-intrusive: can be executed without interfering with user jobs (e.g. cleanup of log files)
- intrusive: for example kernel upgrades or node reboots

Two basic node states from the administration point of view:
- production: node is running user jobs or user services (e.g. NFS server). Only non-intrusive tasks can be executed
- maintenance: no user jobs or services. Both intrusive and non-intrusive tasks can be executed

Usually a node is put into maintenance status only when it is idle, after draining the job queues or switching the services to another node. But there can be exceptions that immediately force the status change.
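The gating rule above can be written down in a few lines (STATE and run_task are illustrative names, not actual NMA interfaces): intrusive tasks are refused unless the node is in the maintenance state.

```shell
# Sketch: gate maintenance tasks on the node's administrative state.
# STATE and run_task are illustrative, not actual NMA interfaces.
STATE=production

run_task() {    # $1 = task name, $2 = intrusive | nonintrusive
    if [ "$2" = intrusive ] && [ "$STATE" != maintenance ]; then
        echo "refused: $1 is intrusive and node is in $STATE"
        return 1
    fi
    echo "running $1"
}

run_task logfile-cleanup nonintrusive      # allowed in production
run_task kernel-upgrade intrusive || true  # refused until maintenance
```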
Installation & Software Mgmt Prototype

The current prototype is based on a software tool originally developed by the Computer Science Department of Edinburgh University: LCFG (Large Scale Linux Configuration)
http://www.dcs.ed.ac.uk/home/paul/publications/ALS2000/

It handles automated installation, configuration and management of machines. Basic features:
- automatic installation of the O.S.
- installation/upgrade/removal of all (rpm-based) software packages
- centralized configuration and management of machines
- extendible to configure and manage custom application software
LCFG diagram

[Diagram: LCFG config files on the server are compiled ("Make XML Profile") into XML profiles published by a web server; on each client node, rdxprof ("Read Profile") fetches the profile via HTTP into a local cache, where ldxprof ("Load Profile") and the Profile Object feed the LCFG objects (e.g. a Generic Component)]

- Abstract configuration parameters for all nodes are stored in a central repository
- A collection of agents read configuration parameters and either generate traditional config files or directly manipulate various services
Config files:

  +inet.services telnet login ftp
  +inet.allow telnet login ftp sshd
  +inet.allow_telnet ALLOWED_NETWORKS
  +inet.allow_login ALLOWED_NETWORKS
  +inet.allow_ftp ALLOWED_NETWORKS
  +inet.allow_sshd ALL
  +inet.daemon_sshd yes
  .....
  +auth.users mickey
  +auth.userhome_mickey /home/mickey
  +auth.usershell_mickey /bin/tcsh
XML profiles:

  <inet>
    <allow cfg:template="allow_$ tag_$ daemon_$">
      <allow_RECORD cfg:name="telnet">
        <allow>192.168., 192.135.30.</allow>
      </allow_RECORD>
      .....
  </inet>
  <auth>
    .....
    <user_RECORD cfg:name="mickey">
      <userhome>/home/MickeyMouseHome</userhome>
      <usershell>/bin/tcsh</usershell>
    </user_RECORD>
  </auth>
Generated files (the inet and auth Profile Objects write /etc/services, /etc/inetd.conf, /etc/hosts.allow, /etc/shadow, /etc/group, /etc/passwd):

  /etc/hosts.allow
    in.telnetd : 192.168., 192.135.30.
    in.rlogind : 192.168., 192.135.30.
    in.ftpd : 192.168., 192.135.30.
    sshd : ALL

  /etc/passwd
    ....
    mickey:x:999:20::/home/Mickey:/bin/tcsh
    ....
LCFG: future development

[Diagram comparing the current prototype with the future evolution. Current prototype: LCFG config files compiled ("Make XML Profile") into an XML profile served over HTTP by a web server; on the client, rdxprof ("Read Profile") fills a local cache read by the Profile Object and the LCFG objects. Future evolution: a Configuration Database with an API replaces the config files, and a Cache Manager replaces rdxprof on the client]
LCFG configuration (I)

Most of the configuration data are common to a category of nodes (e.g. disk servers, computing nodes) and only a few are node-specific (e.g. hostname, IP address).

Using the cpp preprocessor it is possible to build a hierarchical structure of config files containing directives like #define, #include, #ifdef, comments with /* */, etc...

The configuration of a typical LCFG node looks like this:

  #define HOSTNAME pc239   /* Host specific definitions */
  #include "site.h"        /* Site specific definitions */
  #include "linuxdef.h"    /* Common linux resources */
  #include "client.h"      /* LCFG client specific resources */
LCFG configuration (II)

From "site.h":

  #define LCFGSRV grid01
  #define URL_SERVER_CONFIG http://grid01/lcfg
  #define LOCALDOMAIN .lnl.infn.it
  #define DEFAULT_NAMESERVERS 192.135.30.245
  [...]

From "linuxdef.h":

  update.interfaces eth0
  update.hostname_eth0 HOSTNAME
  update.netmask_eth0 NETMASK
  [...]

From "client.h":

  update.disks hda
  update.partitions_hda hda1 hda2
  update.pdetails_hda1 free /
  update.pdetails_hda2 128 swap
  auth.users mickey
  auth.usercomment_mickey Mickey Mouse
  auth.userhome_mickey /home/Mickey
  [...]
LCFG: configuration changes

Server-side: when the config files are modified, a tool (mkxprof) recreates the new XML profile for all the nodes affected by the changes
- this can be done manually, or with a daemon periodically checking for config changes and calling mkxprof
- mkxprof can notify via UDP the nodes affected by the changes

Client-side: another tool (rdxprof) downloads the new profile from the server
- usually activated by an LCFG object at boot
- can be configured to work as:
  - a daemon periodically polling the server
  - a daemon waiting for notifications
  - started by cron at predefined times
LCFG: what's an object?

It's a simple shell script (but in future it will probably be a perl script).

Each object provides a number of "methods" (start, stop, reconfig, query, ...) which are invoked at appropriate times.

A simple and typical object behaviour:
- started by the profile object when notified of a configuration change
- loads its configuration from the cache
- configures the appropriate services, either translating config parameters into a traditional config file or directly controlling the service (e.g. starting a daemon with command-line parameters derived from the configuration)
LCFG: custom objects

LCFG provides the objects to manage all the standard services of a machine: inet, syslog, auth, nfs, cron, ...

Admins can build new custom objects to configure and manage their own applications:
- define your custom "resources" (configuration parameters) to be added to the node profile
- include in your script the object "generic", which contains the definitions of common functions used by all objects (config loading, log, output, ...)
- overwrite the standard methods (start, stop, reconfig, ...) with your custom code
- for simple objects, usually just a few lines of code
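A toy object along these lines might look as follows. Everything here is illustrative: the resource name, the target file, and the inlined helpers stand in for what the real "generic" object would provide.

```shell
#!/bin/sh
# Toy LCFG-style object. Real objects include the "generic" object for
# config loading/log helpers; trivial stand-ins are inlined here.
MYAPP_SERVER=cmsserver1        # would be read from the cached profile
MYAPP_CONF=/tmp/myapp.conf     # hypothetical config file we manage

start() {
    # translate the resource into a traditional config file
    echo "server = $MYAPP_SERVER" > "$MYAPP_CONF"
}
reconfig() { start; }          # for a simple object: just rewrite
stop() { rm -f "$MYAPP_CONF"; }

start   # the profile object would invoke the right method for us
```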
LCFG: Software Packages Management

Currently it is RedHat-specific: heavily dependent on the RPM tool.

The software to install is defined in a file on the server containing a list of RPM packages (currently not yet merged into the XML profile).

Whenever the list is modified, the required RPM packages are automatically installed/upgraded/removed by a specific LCFG object (updaterpms), which is started at boot or when the node is notified of the change.
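Such a list is essentially a flat file of package names with versions; the entries below are invented for illustration (the exact format expected by updaterpms is not shown on the slide):

```
kernel-2.4.9-12
openssh-server-2.9p2-7
cms-application-1.0-1
```

Changing a version in the list (or removing a line) is what drives the corresponding upgrade or removal on the nodes.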
LCFG: node installation procedure

[Diagram: client node interacting with a DHCP server, an NFS server (root image with LCFG environment), the LCFG server / web server (LCFG config files, XML profiles) and a software repository (software packages)]

1. First boot via floppy or via network
2. Load minimal config data via DHCP: IP address, gateway, LCFG config URL
3. Root image, complete with the LCFG environment, mounted via NFS
4. Initialization script starts
5. Load complete configuration via HTTP
6. Start object "install": disk partitioning, network, ..., installation of the required packages, copy of the LCFG configuration, reboot
7. After reboot, LCFG objects complete the node configuration
LCFG: summary

Pros:
- In Edinburgh it has been used for years in a complex environment, managing hundreds of nodes
- Supports the complete installation and management of all the software (both O.S. and applications)
- Extremely flexible and easy to customize

Cons:
- Complex: steep learning curve
- Prototype: the evolution of this tool is not clear yet
- Lack of user-friendly tools for the creation and management of configuration files: errors can be very dangerous!
Future plans

Future evolution not clearly defined: it will also depend on the results of forthcoming tests (1st Datagrid milestone).

Integration of the current prototype with Configuration Management components:
- Config Cache Manager and API released as prototypes but not yet integrated with LCFG

Configuration DataBase:
- complete definition of node profiles
- user-friendly tools to access and modify config information

Development of still-missing objects:
- system services (AFS, PAM, ...)
- fabric software (grid sw, globus, batch systems, ...)
- application software (CMS, Atlas, ...) in collaboration with people from experiments