Cluster Configuration Update Including LSF Status Thorsten Kleinwort for CERN IT/PDP-IS HEPiX I/2001...

Page 1: Cluster Configuration Update Including LSF Status Thorsten Kleinwort for CERN IT/PDP-IS HEPiX I/2001 LAL Orsay Tuesday, December 08, 2015.

Cluster Configuration Update Including LSF Status

Thorsten Kleinwort for CERN IT/PDP-IS

HEPiX I/2001, LAL Orsay

Friday, April 21, 2023

Page 2: Cluster Configuration Update and LSF Status

April 21, 2023, Thorsten Kleinwort, IT/PDP/IS

• Function
• Software
• Hardware
• Management

Cluster Configuration

Page 3: Cluster Configuration Update and LSF Status

Page 4: Function

CERN IT/PDP-IS responsible for:
• Central Unix-based batch & interactive platforms:
  • LXPLUS, LXBATCH, RSPLUS, DXPLUS, HPPLUS
  • Installation, maintenance & support
• Dedicated clusters for several experiments (batch & interactive):
  • Different setups, different HW, user mgmt…
  • Individual configurations

Page 5: Function

[Chart: PC systems run by PDP (Tim Smith, IT/PDP). Y axis: #CPUs, 0 to 1600. X axis: Jul-97 to Jan-01. Series: testbed, lxshare, eff, lxbatch, lxplus, tomog, tapes, pcsf, nomad, na49, na48, na45, mta, l3c, ion, cms, ccf, atlas, alice.]

Page 6: Function

• LEP Experiments:
  • ‘Old’ experiments, all kinds of legacy platforms: leave until 2003, freezing earlier not practical
• Non-LEP Experiments:
  • Transition to Linux/Solaris ASAP
  • Merge experiment clusters into LXBATCH/LXPLUS:
    • Reduce diversity
    • More efficient use of shared resources

Page 7: Cluster Configuration Update and LSF Status

Page 8: Software

• In the past: All Unix flavours
• Now: Mainly Linux (RedHat)
• Solaris as 2nd platform:
  • Check software for platform dependencies
  • Enhanced debugging/development tools on Solaris
• AFS for software/homedir/scratch
  • Started recently to investigate OpenAFS
• RFIO for data access: we want to avoid NFS

Page 9: Software: Installation

• Kickstart & Jumpstart (Linux & Solaris): For basic system installation
• SUE: For post installation & configuration
• ASIS: For software installation in /usr/local: now whole ASIS (~3GB) is local
• LSF

Page 10: Software: Batch

• LSF with Multicluster option:
  • Interactive nodes: submission hosts (cluster)
  • Batch nodes: execution hosts (cluster)
  • Some interactive nodes have night/weekend queues
• On public cluster (LXBATCH):
  • Dedicated resources for experiments
  • Some clusters are “cross linked”, e.g. submission from a dedicated cluster to LXBATCH
• Open question of scalability

Page 11: Software: LSF Multicluster

Submit Cluster:        Execution Cluster:

LXPLUS                 LXBATCH
  Queue: 1nd             1nd
         cms_1nd         cms_1nd

CMS_CLUSTER            CMS_BATCH
  cms_queue              cms_queue
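
The pairing of submit and execution queues on this slide can be pictured as a simple lookup. The sketch below is plain Python rather than LSF configuration syntax, and it assumes that each submit-side queue forwards to the same-named queue on the cluster shown beside it; the names FORWARDING and route_job are hypothetical and exist only for this illustration.

```python
# Illustrative model of the slide's queue pairing; this is not LSF syntax.
# FORWARDING and route_job are hypothetical names introduced for this sketch.
# Assumption: each submit queue forwards to the same-named execution queue.
FORWARDING = {
    ("LXPLUS", "1nd"): ("LXBATCH", "1nd"),
    ("LXPLUS", "cms_1nd"): ("LXBATCH", "cms_1nd"),
    ("CMS_CLUSTER", "cms_queue"): ("CMS_BATCH", "cms_queue"),
}


def route_job(submit_cluster, queue):
    """Return the (execution cluster, queue) pair a submitted job is sent to."""
    try:
        return FORWARDING[(submit_cluster, queue)]
    except KeyError:
        raise ValueError(
            f"no forwarding defined for queue {queue!r} on {submit_cluster}"
        ) from None
```

A job submitted to cms_1nd on LXPLUS would, under this assumption, run in cms_1nd on LXBATCH; a queue with no entry raises an error instead of being routed silently.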

Page 12: Software: Batch

• Shared batch facility requirements:
  • If a dedicated resource is unused, it should be available for others
  • On the other hand, allocation of dedicated nodes ASAP, if needed
  • Queues/Resources should be controlled by UNIX groups rather than users, to handle a huge number of frequently changing users
• “Wish list” for LSF in preparation, to send to Platform Computing
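
The first and third requirements above can be made concrete with a small placement rule. This is a minimal sketch under stated assumptions, not a description of how LSF behaves: Node and pick_node are hypothetical names, ownership is keyed by UNIX group, and reclaiming a borrowed node when its owner needs it (the second requirement) is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    owner_group: Optional[str]  # UNIX group the node is dedicated to; None = public
    busy: bool = False


def pick_node(nodes, group):
    """Choose a free node for a job submitted by a member of `group`.

    Preference order: the group's own dedicated nodes, then public nodes,
    then idle nodes dedicated to other groups (borrowed capacity).
    Returns None when every node is busy.
    """
    free = [n for n in nodes if not n.busy]
    preferences = (
        lambda n: n.owner_group == group,  # own dedicated node
        lambda n: n.owner_group is None,   # public node
        lambda n: True,                    # idle node of another group
    )
    for wanted in preferences:
        for node in free:
            if wanted(node):
                return node
    return None
```

Keying the rule on groups rather than user names means that a user joining or leaving an experiment changes only group membership, not the placement rules.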

Page 13: Cluster Configuration Update and LSF Status

Page 14: Hardware

• All kinds of legacy HW in clusters: IBM, SGI, DEC, HP…
• Now concentrating on Intel PCs running Linux (on both client & server side)
• Sun (Solaris) as 2nd HW platform: Building development cluster SUNDEV
• RISC decommissioning in progress

Page 15: Hardware: RISC Decommissioning

[Chart: RISC decommissioning. Y axis: # Processors, 0 to 600. X axis: quarters from 4Q 2000 to 1Q 2004. Series: W/NT, AIX, IRIX, HP-UX, DUX, Solaris.]

Page 16: Hardware: Intel PC

• Still utilize boxes:
  • Financial rules & difficult TCO definition for rack-mounted solutions
• But plans to go to rack-mounted solutions in the future
• Intel PCs: differences on each offer: (1 or 2 disks; 2, 4, 8, 12, 20, 30 GB)
• Experiments buying equipment: Broadens diversity

Page 17: Hardware

Page 18: Hardware

On the server/service side:
• Going from RISC/SCSI to Intel/EIDE:
  • Mirrored 1.5 TB (20 x 75 GB) EIDE disk servers
  • Testing RAID 5
• All Tape Services are now on PCs
• AFS servers are now on SUNs:
  • Experimenting with AFS scratch on Linux

Page 19: Cluster Configuration Update and LSF Status

Page 20: Management

Currently:
• Merging clusters into LXPLUS/LXBATCH
• Aligning individual setups into global ones
• Continue RISC decommissioning:
  • Restrict usage to LEP Experiments
  • Transferring users to public facilities
• Face rapidly growing number of clients
  • Automate & optimise

Page 21: Management

Starting Testbed (Intel/Linux Dual PCs)
• In 2000 ~ 100 machines
• In 2001 ~ 200 machines
• In addition:
  • LHC Test facility
  • Testbed for the DataGrid Project
• It will grow over the next two years to reach a significant fraction of the LHC scale by 2003

Page 22: Testbed Schedule

[Chart: Testbed schedule. Y axis: Number of PCs, 0 to 250. X axis: Week Number, 53 followed by 1 to 51. Series: NA49, NA48, CMS, NA45, COMPASS, ALICE, Available.]

Page 23: Management

Collaboration with DataGrid:
• WP4 (Computing Fabric):
  • Installation Task
  • Configuration Task
  • Monitoring Task
• We contribute to WP4 and want to benefit from it
• Talk by Philippe Defert on DataGrid

Page 24: Management

New internal projects started:
• User account management:
  • “How to manage /etc/passwd, /etc/groups,…”
  • Investigate central service (LDAP)
• Accounting:
  • How to control access & usage of shared facilities by different groups
• Security:
  • Increase the host-based security by checking the integrity of the system
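
One common way to check system integrity on a host is to record checksums of important files and compare them later. The slides do not say which tool or method the project will use, so the following is only a minimal sketch of that general idea using Python's standard hashlib module; snapshot and changed_files are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Record a baseline mapping of path -> digest for the given files."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(baseline, paths):
    """List paths that are missing or whose digest differs from the baseline."""
    changed = []
    for p in paths:
        key = str(p)
        if not Path(p).is_file() or baseline.get(key) != file_digest(p):
            changed.append(key)
    return changed
```

In such a scheme the baseline would be taken right after installation and stored away from the host being checked, so that an intruder cannot rewrite it along with the files.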

Page 25: Outlook

• Reducing diversity of HW/SW
• Continue merging of clusters
• Facing growing number of PCs
• Starting internal projects
• Benefit from DataGrid WP4
• Going for LHC: prepare now to be ready when it starts