Cluster Configuration Update Including LSF Status
Thorsten Kleinwort for CERN IT/PDP-IS
HEPiX I/2001, LAL Orsay
Friday, April 21, 2023
Thorsten Kleinwort, IT/PDP/IS
Cluster Configuration Update and LSF Status
Function
Software
Hardware
Management
Function
CERN IT/PDP-IS responsible for:
• Central Unix-based batch & interactive platforms:
  • LXPLUS, LXBATCH, RSPLUS, DXPLUS, HPPLUS
  • Installation, maintenance & support
• Dedicated clusters for several experiments (batch & interactive):
  • Different setups, different HW, user mgmt…
  • Individual configurations
Figure: PC systems run by PDP. Number of CPUs (axis 0 to 1600) from Jul-97 to Jan-01, broken down by cluster: testbed, lxshare, eff, lxbatch, lxplus, tomog, tapes, pcsf, nomad, na49, na48, na45, mta, l3c, ion, cms, ccf, atlas, alice. Source: Tim Smith, IT/PDP.
• LEP Experiments:
  • ‘Old’ experiments, all kinds of legacy platforms: leave until 2003, freezing earlier not practical
• Non-LEP Experiments:
  • Transition to Linux/Solaris ASAP
  • Merge experiment clusters into LXBATCH/LXPLUS:
    • Reduce diversity
    • More efficient use of shared resources
Software
• In the past: All Unix flavours
• Now: Mainly Linux (RedHat)
• Solaris as 2nd platform:
  • Check software for platform dependencies
  • Enhanced debugging/development tools on Solaris
• AFS for software/homedir/scratch
  • Started recently to investigate OpenAFS
• RFIO for data access: we want to avoid NFS
Software: Installation
• Kickstart & Jumpstart (Linux & Solaris): For basic system installation
• SUE: For post installation & configuration
• ASIS: For software installation in /usr/local: now whole ASIS (~3GB) is local
• LSF
Software: Batch
• LSF with Multicluster option:
  • Interactive nodes: submission hosts (cluster)
  • Batch nodes: execution hosts (cluster)
  • Some interactive nodes have night/weekend queues
• On public cluster (LXBATCH):
  • Dedicated resources for experiments
  • Some clusters are “cross linked”, e.g. submission from a dedicated cluster to LXBATCH
• Open question of scalability
Software: LSF Multicluster
Submit Cluster:    Execution Cluster:
LXPLUS             LXBATCH
Queue: 1nd         1nd
cms_1nd            cms_1nd
CMS_CLUSTER        CMS_BATCH
cms_queue          cms_queue
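The pairing of submit and execution queues on this slide can be modelled as a small lookup table. The sketch below is only an illustration in Python, not LSF configuration syntax. The `Route` type and the `resolve` function are hypothetical names, and the assumption that each submit queue forwards to exactly the queue shown next to it is read off the slide layout rather than stated in the talk.

```python
from typing import List, NamedTuple, Tuple


class Route(NamedTuple):
    """One submit-queue to execution-queue pairing (hypothetical model)."""
    submit_cluster: str
    submit_queue: str
    exec_cluster: str
    exec_queue: str


# Pairings as laid out on the slide; the forwarding rules are an assumption.
ROUTES: List[Route] = [
    Route("LXPLUS", "1nd", "LXBATCH", "1nd"),
    Route("LXPLUS", "cms_1nd", "LXBATCH", "cms_1nd"),
    Route("CMS_CLUSTER", "cms_queue", "CMS_BATCH", "cms_queue"),
]


def resolve(submit_cluster: str, queue: str) -> Tuple[str, str]:
    """Return (execution cluster, execution queue) for a submission."""
    for route in ROUTES:
        if (route.submit_cluster, route.submit_queue) == (submit_cluster, queue):
            return route.exec_cluster, route.exec_queue
    raise LookupError(f"no route for queue {queue!r} on {submit_cluster}")
```

A job submitted on LXPLUS to `cms_1nd` would resolve to the `cms_1nd` queue on LXBATCH, while a queue that is not listed for the submitting cluster is rejected.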
Software: Batch
• Shared batch facility requirements:
  • If a dedicated resource is unused, it should be available for others
  • On the other hand, allocation of dedicated nodes ASAP, if needed
  • Queues/Resources should be controlled by UNIX groups rather than users, to handle the large and frequently changing user population
• “Wish list” for LSF in preparation, to send to Platform Computing
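The three requirements above (lend idle dedicated nodes, reclaim them quickly, control access by UNIX group) can be sketched as a small placement policy. This is a minimal Python illustration under stated assumptions, not how LSF implements it: the `Node` class, `pick_node`, and `nodes_to_reclaim` are hypothetical names, and the preference order (own dedicated node, then public node, then borrowed node) is one possible choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    owner_group: Optional[str] = None    # UNIX group the node is dedicated to; None = public
    running_group: Optional[str] = None  # UNIX group of the job on the node; None = idle


def pick_node(nodes: List[Node], job_group: str) -> Optional[Node]:
    """Choose an idle node for a job submitted under UNIX group `job_group`."""
    idle = [n for n in nodes if n.running_group is None]

    def rank(node: Node) -> int:
        if node.owner_group == job_group:
            return 0  # the group's own dedicated node
        if node.owner_group is None:
            return 1  # public node
        return 2      # idle node dedicated to another group: borrow it

    return min(idle, key=rank, default=None)


def nodes_to_reclaim(nodes: List[Node], group: str) -> List[Node]:
    """Dedicated nodes of `group` that are currently lent to another group."""
    return [
        n for n in nodes
        if n.owner_group == group and n.running_group not in (None, group)
    ]
```

Keying everything on the group name rather than on individual users means the policy does not change when users join or leave an experiment; only group membership does.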
Hardware
• All kinds of legacy HW in clusters: IBM, SGI, DEC, HP…
• Now concentrating on Intel PCs running Linux (on both client & server side)
• Sun (Solaris) as 2nd HW platform: Building development cluster SUNDEV
• RISC decommissioning in progress
Hardware: RISC Decommissioning
Figure: number of processors (axis 0 to 600) per quarter from 4Q 2000 to 1Q 2004, by platform: W/NT, AIX, IRIX, HP-UX, DUX, Solaris.
Hardware: Intel PC
• Still utilize boxes:
  • Financial rules & difficult TCO definition for rack-mounted solutions
• But plans to go to rack-mounted solutions in the future
• Intel PCs: differences on each offer: (1 or 2 disks; 2, 4, 8, 12, 20, 30 GB)
• Experiments buying equipment: Broadens diversity
Hardware
On the server/service side:
• Going from RISC/SCSI to Intel/EIDE:
  • Mirrored 1.5 TB (20 x 75 GB) EIDE disk servers
  • Testing RAID 5
• All Tape Services are now on PCs
• AFS servers are now on SUNs:
  • Experimenting with AFS scratch on Linux
Management
Currently:
• Merging clusters into LXPLUS/LXBATCH
• Aligning individual setups into global ones
• Continue RISC decommissioning:
  • Restrict usage to LEP Experiments
  • Transferring users to public facilities
• Face rapidly growing number of clients
  • Automate & optimise
Starting Testbed (Intel/Linux Dual PCs)
• In 2000 ~ 100 machines
• In 2001 ~ 200 machines
• In addition:
  • LHC Test facility
  • Testbed for the DataGrid Project
• It will grow over the next two years to reach a significant fraction of the LHC scale by 2003
Testbed Schedule
Figure: number of PCs (axis 0 to 250) by week number (53, then 1 to 51), allocated to NA49, NA48, CMS, NA45, COMPASS, ALICE, and Available.
Management
Collaboration with DataGrid:
• WP4 (Computing Fabric):
  • Installation Task
  • Configuration Task
  • Monitoring Task
• We contribute to WP4 and want to benefit from it
• Talk by Philippe Defert on DataGrid
New internal projects started:
• User account management:
  • “How to manage /etc/passwd, /etc/groups,…”
  • Investigate central service (LDAP)
• Accounting:
  • How to control access & usage of shared facilities by different groups
• Security:
  • Increase the host based security by checking the integrity of the system
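The security item above amounts to comparing the current state of system files against a known-good baseline. The slides do not say which tool or method is planned, so the following is only a minimal Python sketch of the general idea. The function names (`sha256_of`, `snapshot`, `changed_files`) and the choice of SHA-256 digests are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path) -> str:
    """Hex digest of a file, read in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Record a baseline: path -> digest for every existing file."""
    return {str(p): sha256_of(p) for p in paths if p.is_file()}


def changed_files(baseline: Dict[str, str], paths: Iterable[Path]) -> List[str]:
    """Paths that were modified, removed, or added since the baseline."""
    current = snapshot(paths)
    return sorted(
        key for key in baseline.keys() | current.keys()
        if baseline.get(key) != current.get(key)
    )
```

In practice the baseline would have to be stored somewhere the checked host cannot modify, otherwise an intruder could update it together with the files.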
Outlook
• Reducing diversity of HW/SW
• Continue merging of clusters
• Facing growing number of PCs
• Starting internal projects
• Benefit from DataGrid WP4
• Going for LHC: prepare now to be ready when it starts