Setting up of condor scheduler on computing cluster
Raman Sehgal, NPD-BARC

Transcript of the presentation (17 slides).

Page 1:

Setting up of condor scheduler on computing cluster

Raman Sehgal, NPD-BARC

Page 2:

Outline

•Software required for the cluster

•Network topology

•CONDOR and the different roles of machines in a condor pool

•Various Condor universes

•Prerequisites of Condor

•Configuration of Condor on our LAN

•Running jobs using Condor and some commonly used condor commands

•MPI

•Conclusion


Page 3:

Software required for the cluster

•The cluster requires the following software:

•Operating System : Scientific Linux CERN 5.4 – 64 bit version

•Cluster Management : Management is done through IPMI (inbuilt)

•Cluster Usage and Statistics : Using “Ganglia”

•Cluster Middleware : CONDOR

•Parallel Programming Environment : MPI

Page 4:

Network Topology

•One head node containing all users' home directories

•16 worker nodes that provide the computational power

•The head node is connected to both the public and the private network

Public Network : Allows users to log in on the head node

Private Network : Connects all worker nodes to the head node over Gigabit Ethernet and Infiniband; used for job submission and execution

•File System : Network File System (NFS) provides a shared area between the head node and the worker nodes
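A minimal sketch of how such a shared area could be set up, assuming the head node is named head and the home directories live under /home (hostname, subnet, and paths are illustrative, not taken from the slides):

```
# On the head node: export /home to the private subnet (/etc/exports)
/home  192.168.1.0/24(rw,sync,no_subtree_check)

# On each worker node: mount the shared area at boot (/etc/fstab)
head:/home  /home  nfs  defaults  0  0
```

With this in place, a path such as /home/user/job behaves identically on the head node and on every worker node.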

Page 5:

Prototype distributed and parallel computing environment

•A prototype distributed and parallel computing environment for the cluster is set up on a LAN of 4 computers.

•Distributed computing environment : Using CONDOR

•Parallel computing environment : Using MPI

CONDOR

•Condor is an open-source high-throughput computing software package for distributed parallelization of computationally intensive tasks.

•Used to manage the workload on a cluster of computing nodes.

•It can integrate both dedicated resources (rack-mounted clusters) and non-dedicated desktop machines by making use of cycle scavenging.

•Can run both sequential and parallel jobs.

•Provides different universes to run jobs (vanilla, standard, MPI, Java, etc.).

Page 6:

Condor's exceptional features:

•Checkpointing and migration

•Remote system calls

•No changes are necessary to user source code

•Sensitive to the desires of the machine owner (in the case of non-dedicated machines)

Different roles of a machine in a condor pool

•Central Manager : The main administration machine.

•Execute : The machines where jobs execute.

•Submit : The machines used to submit jobs.

Page 7:

Various Condor daemons

The following condor daemons run on different machines in the condor pool:

•condor_master : Takes care of the rest of the daemons running on a machine.

•condor_collector : Responsible for collecting information about the status of the pool.

•condor_negotiator : Responsible for all the match-making within the Condor system.

•condor_schedd : This daemon represents resource requests to the Condor pool.

•condor_startd : This daemon represents a given resource to the Condor pool.
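The role of a machine is determined by which of these daemons its condor_master starts, controlled by the DAEMON_LIST setting in that machine's condor_config. A sketch of the two role profiles used here (values follow the usual condor_config convention; they are not quoted from the slides):

```
# Head node: central manager that also submits jobs
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD

# Worker node: execute-only machine
DAEMON_LIST = MASTER, STARTD
```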

Page 8:

Various condor universes to run different types of jobs

Condor provides several universes to run different types of jobs; some are as follows:

Standard : This universe provides condor's full power. It provides the following features:

1. Checkpointing 2. Migration 3. Remote system calls

•The job needs to be relinked with the condor libraries in order to run in the standard universe.

•This can be easily achieved by putting condor_compile in front of the usual link command.

Eg. Normal linking of a job : gcc -o my_prog my_prog.c

For the standard universe the job is prepared by : condor_compile gcc -o my_prog my_prog.c

Now this job can utilize the power of the standard universe.

Page 9:

Vanilla : This universe is intended for programs which cannot be successfully relinked with the condor libraries.

1. Shell scripts are one example of jobs where the vanilla universe is useful.

2. Jobs that run under the vanilla universe cannot utilize checkpointing or remote system calls.

3. Since the remote system call feature is not available, we need a shared file system such as NFS or AFS.
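For instance, a shell script could be run in the vanilla universe with a submit description file along these lines (file names are illustrative):

```
# Sample submit description file for a vanilla-universe job
Executable = my_script.sh
Universe   = vanilla
Output     = my_script.out
Error      = my_script.err
Log        = my_script.log
Queue
```

Because there are no remote system calls in this universe, my_script.sh and the files it touches must live on the shared (NFS) area so every execute machine sees them.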

Parallel : This universe allows parallel programs, such as MPI jobs, to be run in the condor environment.

Page 10:

Prerequisites of condor configuration

•Setup of a private network of machines in the computing pool

•Passwordless login from the submit machines to all execute machines (rsh or ssh)
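Because the home directories are shared over NFS, one key pair appended to the user's own authorized_keys is visible on every node at once. A sketch of the passwordless-ssh setup under that assumption (OpenSSH assumed; key type and paths are the usual defaults, not taken from the slides):

```shell
# Create ~/.ssh and a key pair with an empty passphrase (skip if one exists)
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# Authorize the key; NFS makes this take effect on all worker nodes at once
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"

# sshd refuses keys with loose permissions, so tighten them
chmod 700 "$HOME/.ssh" && chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, ssh worker01 from the head node should log in without a password prompt.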

Configuration of condor on our small LAN of 4 computers

•On our LAN of four machines we have one head node, and the remaining 3 are worker nodes.

•Condor is installed and configured on our pool, and the role of each machine is mentioned below:

1. Head Node : Central Manager, Submit

2. Worker Nodes : Execute

•The home directories of all the users reside on the head node.

•These home directories reside in a shared area (using NFS) which can be accessed by all the worker nodes (required for the vanilla universe).

•Now users can submit jobs from their home directories.

Page 11:

Running jobs using Condor

Following are the steps to run a condor job:

•Prepare the code

•Choose the Condor universe

•Make the submit description file (submit.ip); a sample file is shown below:

  # # # # # # # # # # # # # # # # # # # # # # # #
  # Sample Submit Description file
  # # # # # # # # # # # # # # # # # # # # # # # #
  Executable = getIp
  Universe   = standard
  Output     = getIp.out
  Error      = getIp.err
  Log        = hello.log
  Queue 15

•Submit the job: this job can now be submitted with the following condor command:

  condor_submit submit.ip

Page 12:

Commonly used Condor commands

condor_submit : used to submit a job

condor_q : displays information about jobs in the condor job queue

Page 13:

condor_status : used to monitor, query and display the status of the Condor pool


Page 14:

condor_history : lets users view a log of the Condor jobs completed to date

condor_rm : removes one or more jobs from the Condor job queue

condor_compile : used to relink a job with the condor libraries, so that it can be executed in the standard universe

Page 15:

MPI

•MPI is a language-independent communication protocol used for parallel programming.

•Different languages provide their own wrapper compilers for MPI programming.

•Here we have installed MPICH, which allows us to do parallel programming in C, C++, Fortran, etc.

•Computation v/s communication.

•SISD, SIMD, MISD, MIMD (Flynn's classification).

•MPI requires the executable to be present on all the machines in the pool.

•This is achieved via the NFS shared area.

•Testing is done through a matrix multiplication program.

•Considerable reduction in execution time.
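The build-and-run cycle for such a test could look as follows (a command sketch only: the program name, shared path, and process count are illustrative; mpicc and mpirun are MPICH's wrapper compiler and launcher and must be on the PATH):

```
# Compile the MPI program with the MPICH wrapper compiler for C
mpicc -o matmul matmul.c

# Place the executable in the NFS-shared area so every node sees it,
# then launch it as 4 cooperating processes
cp matmul /home/shared/
mpirun -np 4 /home/shared/matmul
```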

Page 16:

Conclusion

•CONDOR is installed and configured on a small LAN of 4 computers; it is working properly and giving the expected results.

•Later, this prototype setup will be replicated on a computing cluster with 16 worker nodes, which will provide a processing power of 1.3 TFlops plus a storage of 20 TBytes.

•The setup is also ready to run parallel jobs, so if in future we have a parallel job application, we are ready for it.
