Advanced Condor mechanisms CERN Feb 14 2011

Condor ProjectComputer Sciences DepartmentUniversity of Wisconsin-Madison

Advanced Condor mechanisms

CERN Feb 14 2011

2www.condorproject.org

a better title…“Condor Potpourri”

› Igor feedback “Could be useful to people, but not Monday”

› If not of interest, new topic in 1 minute


Central Manager Failover

› Condor Central Manager has two services

› condor_collector Now a list of collectors is supported

› condor_negotiator (matchmaker) If fails, election process, another takes

over Contributed technology from Technion


Submit node robustness:Job Progress continues if connection is interrupted

› Condor supports reestablishment of the connection between the submitting and executing machines. If network outage between execute and submit

machine If submit machine restarts

› To take advantage of this feature, put the following line into their job’s submit description file:

JobLeaseDuration = <N seconds>For example:

job_lease_duration = 1200


Submit node robustness:Job Progress continues if

submit machine failsAutomatic Schedd FailoverCondor can support a submit

machine “hot spare” If your submit machine A is down for

longer than N minutes, a second machine B can take over

Requires shared filesystem (or just DRBD*?) between machines A and B

*Distributed Replicated Block Device – www.drbd.org


DRBD


Interactive Debugging

› Why is my job still running?Is it stuck accessing a file?Is it in an infinite loop?

› condor_ssh_to_job Interactive debugging in UNIX Use ps, top, gdb, strace, lsof, … Forward ports, X, transfer files, etc.


condor_ssh_to_job Example

% condor_q

-- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> : ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 1.0 einstein 4/15 06:52 1+12:10:05 R 0 10.0 cosmos

1 jobs; 0 idle, 1 running, 0 held

% condor_ssh_to_job 1.0

Welcome to [email protected]!Your condor job is running with pid(s) 15603.

$ gdb –p 15603 …


How it works› ssh keys created for each invocation

› ssh Uses OpenSSH ProxyCommand to use connection

created by ssh_to_job

› sshd runs as same user id as job receives connection in inetd mode

• So nothing new listening on network• Works with CCB and shared_port


What?? Ssh to my worker nodes??

› Why would any sysadmin allow this?

› Because the process tree is managed Cleanup at end of job Cleanup at logout

› Can be disabled by nonbelievers


Concurrency Limits

› Limit job execution based on admin-defined consumable resources E.g. licenses

› Can have many different limits

› Jobs say what resources they need

› Negotiator enforces limits pool-wide

11


Concurrency Example

› Negotiator config file MATLAB_LIMIT = 5 NFS_LIMIT = 20

› Job submit file concurrency_limits = matlab,nfs:3

This requests 1 Matlab token and 3 NFS tokens

12


Green Computing

› The startd has the ability to place a machine into a low power state. (Standby, Hibernate, Soft-Off, etc.) HIBERNATE, HIBERNATE_CHECK_INTERVAL If all slots return non-zero, then the machine

can powered down via condor_power hook A final acked classad is sent to the collector

that contains wake-up information› Machines ads in “Offline State”

Stored persistently to disk Ad updated with “demand” information: if

this machine was around, would it be matched?


Now what?


condor_rooster

› Periodically wake up based on ClassAd expression (Rooster_UnHibernate)

› Throttling controls

› Hook callouts make for interesting possibilities…


Dynamic Slot Partitioning

› Divide slots into chunks sized for matched jobs

› Readvertise remaining resources

› Partitionable resources are cpus, memory, and disk

› See Matt Farrellee’s talk

20


Dynamic Partitioning Caveats

› Cannot preempt original slot or group of sub-slots Potential starvation of jobs with large

resource requirements

› Partitioning happens once per slot each negotiation cycle Scheduling of large slots may be slow

21


High Throughput Parallel Computing

› Parallel jobs that run on a single machine Today 8-16 cores, tomorrow 32+ cores

› Use whatever parallel software you want It ships with the job MPI, OpenMP, your own scripts Optimize for on-board memory access


Configuring Condor for HTPC

› Two strategies: Suspend/drain jobs to open HTPC slots Hold empty cores until HTPC slot is open

› We have a recipe for the former on the Condor Wiki http://condor-wiki.cs.wisc.edu

› User accounting enabled by Condor’s notion of “Slot Weights”


CPU AffinityFour core Machine

running four jobs w/o affinity

j1 j2 j3 j4

j3a j3b j3c j3d

core1 core2 core3 core4


CPU Affinityto the rescue

SLOT1_CPU_AFFINITY = 0SLOT2_CPU_AFFINITY = 1SLOT3_CPU_AFFINITY = 2SLOT4_CPU_AFFINITY = 3


Four core Machinerunning four jobs

w/affinity

j1 j2 j3 j4

j3a

j3b

j3c

j3d

core1 core2 core3 core4


Condor + Hadoop FS (HDFS)

Condor+HDFS = 2 + 2 = 5 !!! A Synergy exists (next slide)

• Hadoop as distributed storage system• Condor as cluster management system

Large number of distributed disks in a compute cluster

Managing disk as a resource


condor_hdfs daemon

› Main integration point of HDFS within Condor

› Configures HDFS cluster based on existing condor_config files

› Runs under condor_master and can be controlled by existing Condor utilities

› Publish interesting parameters to Collector e.g IP address, node type, disk activity

› Currently deployed at UW-Madison


Condor + HDFS : Next Steps?

› Integrate with File Transfer Mechanism

› FileNode Failover› Management of HDFS› What about HDFS in a GlideIn

environment?? › Online transparent access to

HDFS??


Remote I/O Socket› Job can request that the condor_starter process

on the execute machine create a Remote I/O Socket

› Used for online access of file on submit machine – without Standard Universe. Use in Vanilla, Java, …

› Libraries provided for Java and for C, e.g. :Java: FileInputStream -> ChirpInputStream

C : open() -> chirp_open()› Or use Parrot!


Job

Fork

startershadow

HomeFile

System

I/O Library

I/O Server I/O Proxy

Secure Remote I/O

Local System Calls

Local I/O(Chirp)

Execution SiteSubmission Site


DMTCP› Written at Northeastern U. and MIT

› User-level process checkpoint/restart library

› Fewer restrictions than Condor’s Standard Universe Handles threads and multiple processes No re-link of executable

› DMTCP and Condor Vanilla Universe integration exists via a job wrapper script


Questions?

Thank You!

Advanced Condor mechanisms CERN Feb 14 2011

Documents

Transcript of Advanced Condor mechanisms CERN Feb 14 2011