Florida Tech Grid Cluster
Transcript of Florida Tech Grid Cluster
P. Ford² * X. Fave¹ * M. Hohlmann¹
High Energy Physics Group
¹Department of Physics and Space Sciences
²Department of Electrical & Computer Engineering
History
Original conception in 2004 with FIT ACITC grant.
2007 - Received over 30 more low-end systems from UF. Basic cluster software operational.
2008 - Purchased high-end servers and designed new cluster. Established cluster on Open Science Grid.
2009 - Upgraded and added systems. Registered as CMS Tier 3 site.
Current Status
OS: Rocks V (CentOS 5.0)
Job Manager: Condor 7.2.0
Grid Middleware: OSG 1.2, Berkeley Storage Manager (BeStMan) 2.2.1.2.i7.p3, Physics Experiment Data Exports (PhEDEx) 3.2.0
Contributed over 400,000 wall hours to the CMS experiment; over 1.3M wall hours total.
Fully compliant on OSG Resource Service Validation (RSV) and CMS Site Availability Monitoring (SAM) tests.
System Architecture
[Diagram: Compute Element (CE), Storage Element (SE), NAS node nas-0-0, and compute nodes compute-1-X / compute-2-X]
Hardware
CE/Frontend: 8 Intel Xeon E5410, 16GB RAM, RAID5
NAS0: 4 CPUs, 8GB RAM, 9.6TB RAID6 array
SE: 8 CPUs, 64GB RAM, 1TB RAID5
20 Compute Nodes: 8 CPUs & 16GB RAM each; 160 total batch slots.
Gigabit networking, Cisco Express at core.
2x 208V 5kVA UPS for nodes, 1x 120V 3kVA UPS for critical systems.
Rocks OS
Huge software package for clusters (e.g. 411, dev tools, apache, autofs, ganglia).
Allows customization through “Rolls” and appliances. Config stored in MySQL.
Customizable appliances auto-install nodes and run post-install scripts.
Storage
Set up XFS on the NAS partition - mounted on all machines.
NAS stores all user and grid data, streamed over NFS.
The Storage Element acts as the gateway for Grid storage on the NAS array.
Condor Batch Job Manager
Batch job system that distributes workflow jobs to compute nodes.
Distributed computing, NOT parallel.
Users submit jobs to a queue and the system finds places to process them.
Great for Grid Computing; the most-used job manager in OSG/CMS.
Supports “Universes” - Vanilla, Standard, Grid...
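A Condor job is described in a short submit file; a minimal vanilla-universe sketch (the executable and file names here are illustrative, not from the site):

```
# Minimal Condor submit description file (illustrative names).
universe   = vanilla
executable = analyze.sh
arguments  = run001
output     = job.$(Process).out
error      = job.$(Process).err
log        = job.log
queue 1
```

A file like this is submitted with `condor_submit job.sub`, after which the queue can be inspected with `condor_q`.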
[Diagram: a Personal Condor / Central Manager node running the master, collector, negotiator, startd, and schedd daemons]
Master: Manages all daemons.
Negotiator: “Matchmaker” between idle jobs and pool nodes.
Collector: Directory service for all daemons. Daemons send ClassAd updates periodically.
Startd: Runs on each “execute” node.
Schedd: Runs on a “submit” host, creating a “shadow” process on that host. Allows manipulation of the job queue.
[Diagram: Typical Condor setup - a Central Manager (master, collector, negotiator, schedd), workstations (master, schedd, startd), and cluster nodes (master, startd)]
Condor Priority
User priority is managed by a complex algorithm (half-life) with configurable parameters.
The system does not kick off running jobs.
A resource claim is freed as soon as its job finishes.
Enforces fair use AND allows vanilla jobs to finish. Optimized for Grid Computing.
OSG Middleware
OSG middleware is installed and updated via the Virtual Data Toolkit (VDT).
Site configuration was complex before the 1.0 release; it is simpler now.
Provides the Globus framework & security via a Certificate Authority.
Low maintenance: Resource Service Validation (RSV) provides a snapshot of the site.
Grid User Management System (GUMS) handles mapping of grid certificates to local users.
BeStMan Storage
Berkeley Storage Manager: the SE runs a basic gateway configuration - a short config, but hard to get working.
Not nearly as difficult as dCache - BeStMan is a good replacement for small to medium sites.
Allows grid users to transfer data to and from designated storage via LFN, e.g. srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/BeStMan/cms...
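A transfer to such an endpoint can be sketched with the BeStMan SRM client tools (the local path and SFN below are illustrative, and a valid grid proxy is assumed):

```
# Sketch: copy a local file to the SE via SRM (illustrative paths; assumes
# a valid grid proxy, e.g. from grid-proxy-init or voms-proxy-init).
srm-copy file:////tmp/test.dat \
  "srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/test.dat"

# List the remote directory to check the transfer (illustrative path):
srm-ls "srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/"
```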
WLCG
Large Hadron Collider - expected 15PB/year. The Compact Muon Solenoid detector will be a large part of this.
The Worldwide LHC Computing Grid (WLCG) handles the data and interfaces with sites in OSG, EGEE (European), etc.
Tier 0 - CERN; Tier 1 - Fermilab; closest Tier 2 - UFlorida.
Tier 3 - US! Not officially part of the CMS computing group (i.e. no funding), but very important for dataset storage and analysis.
[Map: T2/T3 sites in the US - https://cmsweb.cern.ch/sitedb/sitelist/]
Local Usage Trends
Over 400,000 cumulative hours for CMS
Over 900,000 cumulative hours by local users
Total of 1.3 million CPU hours utilized
Tier-3 Sites
Not yet completely defined. Consensus: T3 sites give scientists a framework for collaboration (via transfer of datasets) and also provide compute resources.
Regular testing by RSV and Site Availability Monitoring (SAM) tests, plus OSG site info publishing to CMS.
FIT is one of the largest Tier 3 sites.
PhEDEx
Physics Experiment Data Exports: the final milestone for our site.
Physics datasets can be downloaded from, or exported to, other sites.
All relevant datasets are catalogued in the CMS Data Bookkeeping System (DBS), which keeps track of the locations of datasets on the grid.
A central web interface allows dataset copy/deletion requests.
Demo
http://myosg.grid.iu.edu
http://uscms1.fltech-grid3.fit.edu
https://cmsweb.cern.ch/dbs_discovery/aSearch?caseSensitive=on&userMode=user&sortOrder=desc&sortName=&grid=0&method=dbsapi&dbsInst=cms_dbs_ph_analysis_02&userInput=find+dataset+where+site+like+*FLTECH*+and+dataset.status+like+VALID*
CMS Remote Analysis Builder (CRAB)
Universal method for experimental data processing.
Automates the analysis workflow, i.e. status tracking and resubmissions.
Datasets can be exported to the Data Discovery Page.
Used extensively locally in our muon tomography simulations.
Network Performance
Changed to a default 64kB block size across NFS.
RAID array change to fix write-caching.
Increased kernel memory allocation for TCP.
Improvements in both network and grid transfer rates.
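The tuning steps above amount to NFS mount options plus kernel TCP buffer settings; a sketch of what that looks like (all paths and values here are assumptions for illustration, not the site's actual configuration):

```
# NFS mount with 64 kB read/write block sizes, e.g. in /etc/fstab
# (server name and export path are illustrative):
nas-0-0:/export/data  /data  nfs  rsize=65536,wsize=65536,hard,intr  0 0

# Larger kernel TCP buffers, e.g. in /etc/sysctl.conf
# (sizes are illustrative; applied with `sysctl -p`):
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
```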
dd copy tests across the network:
Read rate changed from 2.24 to 2.26 GB/s.
Write rate changed from 7.56 to 81.78 MB/s.
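A dd throughput test of the kind summarized above can be sketched as follows (paths and sizes are illustrative, not the site's actual test):

```shell
# Write test: push 100 MB of zeros through a 64 kB block size; conv=fdatasync
# flushes to disk so the reported rate reflects real write throughput.
dd if=/dev/zero of=/tmp/ddtest.bin bs=64k count=1600 conv=fdatasync

# Read test: stream the file back to /dev/null with the same block size.
dd if=/tmp/ddtest.bin of=/dev/null bs=64k

# Clean up the test file.
rm -f /tmp/ddtest.bin
```

On an NFS client, pointing `of=` at the NFS mount instead of /tmp measures the network path rather than the local disk.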
[Tables: dd on the Frontend, before and after tuning (64k block size) - average write rate improved from 7.56 to 81.78 MB/s; average read rate from 2.24 to 2.26 GB/s]
Iperf on the Frontend, Before (TCP throughput in Mbits/sec; UDP run at 1.05 Mbits/sec):
TCP:S  TCP:C  UDP jitter  lost
753    754    0.11        0
912    913    0.022       0
896    897    0.034       0
891    892    0.393       0
888    889    1.751       0
868    869    0.462       0
Iperf on the Frontend, After (TCP throughput in Mbits/sec; UDP run at 1.05 Mbits/sec):
TCP:S  TCP:C  UDP jitter  lost
941    942    0.048       0
939    940    0.025       0
935    937    0.022       0
930    931    0.023       0
941    942    0.025       0
avg:
937.2  938.4  0.0286      0