A Regional Analysis Center at the
University of Florida
Jorge L. Rodriguez
University of Florida
September 27, [email protected]
Jorge L. Rodriguez: A Regional Analysis Center at the University of Florida
Outline
• Facility’s Function and Motivation
• Physical Infrastructure
• Facility Systems Administration
• Future Plans
Facility’s Function & Motivation

Operational support for various organizations:
– Experimental High Energy Physics
• CMS (Compact Muon Solenoid @ the LHC)
• CDF (Collider Detector at Fermilab @ FNAL)
• CLEO (e+e− collider experiment @ Cornell)
– Computer Science
• GriPhyN and iVDGL (US Grid computing projects)
– “Friends and Neighbors”
High Energy Physics Activities

• US-CMS Tier2 Center
– Monte Carlo (MC) production
• Small number of expert, dedicated users
• Primary consumer of computing resources
– Support for the US-CMS regional analysis community
• Larger number (~40) of less expert users
• Large consumer of disk resources
• CLEO & CDF analysis and MC simulation
– Local CDF activities in the recent past
– Expect a ramp-up of local CLEO-c activities
Computer Science Research

• Grid computing research projects
– iVDGL & Grid3 production cluster(s)
• Grid3 production site
• Provides resources to 100s of users via grid “logins”
– GriPhyN & iVDGL Grid development
• Middleware infrastructure development
– grid3dev and other testbeds
• Middleware and Grid application development
– GridCat: http://www.ivdgl.org/gridcat
– Sphinx: http://www.griphyn.org/sphinx
– CAVES: http://caves.phys.ufl.edu
• Need to test on real cluster environments
The Primary Challenge
All of this needs to be supported with minimal staffing, “cheap” hardware, and moderate expertise
Physical Infrastructure
Physical Infrastructure

• Server Room
– Dedicated floor space from the department
– Our feed into the campus backbone; FLR upgrade to 10 Gbps by 2005
• Hardware: currently 75 servers
– Servers: a mix of dual PIII & P4 Xeon
– LAN: a mix of FastE and GigE
– A total of 9.0 TB of storage, 5.4 TB on dCache
• 4U dual-Xeon fileservers with dual 3ware RAID controllers
• Sun Enterprise with FC and RAID enclosures
– More storage and servers on order
• A new dedicated analysis farm
• An additional 4 TB dual-Opteron system, 10 GigE ready (S2io cards)
Video Conferencing

• Two Polycom-equipped conference rooms
– Polycom ViewStations: H.323 and H.263
– Windows XP PCs
• Access Grid
– Broadcast and participate in lectures from our large conference room
Visitors’ Offices and Services

• Visitor work spaces
– 6 Windows & Linux desktops
– Expanding visitor workspace: more workspaces, printers, an LCD projector …
• Espresso machine
– Cuban coffee
– Lighthearted conversation
Facility Systems Administration
Design Overview
Facility Design Considerations

• Centralization of systems management
– A single repository of server, cluster, and meta-cluster configuration and description
• Significantly simplifies maintenance and upgrades
• Allows for easy resource deployment and reassignment
– Organization is a very, very good thing!
• Support multiple versions of the Red Hat distribution
– Keep up with HEP experiments’ expectations
– Currently we support RH7.3 and RHEL 3
– The future is Scientific Linux (based on RHEL)?
Cluster Management Technology

We are pretty much a ROCKS shop!
– ROCKS is an open-source cluster distribution
– Its main function is to deploy an OS on a cluster
– ROCKS is layered on top of the Red Hat distribution
– ROCKS is extensible

ROCKS provides us with the framework and tools necessary to meet our design requirements simply and efficiently

http://www.rocksclusters.org
ROCKS: the 5 Minute Tour

ROCKS builds on top of Red Hat’s kickstart technology
– A quick kickstart digression:
• A kickstart file is a single ASCII script
– Lists individual RPMs and/or groupings of RPMs to be installed
– Provides staged installation sections: %pre, %main, %post …
• Anaconda, the installer
– Parses and processes kickstart commands and installs the system

This is in fact what you interact with when you install Red Hat on your desktop
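To make the digression concrete, here is a minimal kickstart sketch; the install URL, password hash, and package names are hypothetical placeholders, not values from our installation:

```text
# kickstart sketch: command section first, then %packages, %pre, %post stages
install
url --url http://install-server.example/rocks-dist   # hypothetical distribution URL
rootpw --iscrypted PLACEHOLDER_HASH                  # placeholder, not a real hash

%packages
@ Base
openssh-server

%pre
# runs before partitioning and installation
echo "pre-install stage"

%post
# runs chrooted into the freshly installed system
echo "installed via kickstart" > /etc/motd
```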
ROCKS: the 5 Minute Tour (cont.)

ROCKS enhances kickstart by
– Providing machinery to push installations to servers
• DHCP: node identity assigned to a specific MAC address
• HTTPS: protocol used to exchange data (kickstarts, images, RPMs)
• CGI script: generates the kickstart on the fly
– Providing a kickstart management system
• The kickstart generator parses user-defined XML spec files and combines them with node-specific information stored in a MySQL database
• The complete system description is packaged in components grouped into logical object modules
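For illustration, a ROCKS node spec file is an XML fragment of roughly this shape; the package names and post script here are hypothetical examples, not our actual uflorida nodes:

```xml
<kickstart>
  <!-- RPMs pulled in by any appliance that includes this node -->
  <package>ganglia-gmond</package>
  <package>openssh-server</package>
  <!-- shell fragment merged into the generated kickstart %post section -->
  <post>
    echo "configured by the cluster graph" &gt;&gt; /etc/motd
  </post>
</kickstart>
```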
ROCKS: the 5 Minute Tour (cont.)

The standard ROCKS graph describes a single cluster with service nodes. Note the use of “rolls”.
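The graph itself is kept in separate XML files of edges: including functionality in an appliance amounts to drawing an edge to the node that provides it. A sketch with hypothetical edge names:

```xml
<graph>
  <!-- every compute appliance inherits the generic client configuration -->
  <edge from="compute" to="client"/>
  <!-- attach a site-specific services node to a gatekeeper appliance -->
  <edge from="gatekeeper-pg" to="site-grid-services"/>
</graph>
```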
ROCKS’ and MySQL/XML

[Graph diagram] XML graphs: appliances are like the classes (frontend, gatekeeper, compute, gatekeeper-pg, gatekeeper-dgt, compute-pg, compute-dgt, gatekeeper-grid01, compute-grid01 …)

Physical servers are the objects (uflorida-frontend, ufloridaPG, ufloridaDGT, ufgrid01, grinux01 … grinux40, grinuxN, grinuxM …)

ROCKS’ MySQL database holds the global variables, MAC addresses, node names, distribution, membership, appliances …
Facility Systems Administration
Implementation
ROCKS at UFlorida

• Current installation is based on ROCKS 3.2.0
• Basic graph architecture modified to meet our requirements
– A single frontend manages multiple clusters and service nodes
– Supports dual distributions: RH7.3 & RHEL 3
– Direct interaction with the ROCKS MySQL database
– Extensive modifications to the XML trees
• The XML tree is based on an older ROCKS version, 2.3.2
• Our own “uflorida” graphs, one for each distribution
• Many changes & additions to the standard XMLs
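Interacting with the ROCKS database directly is plain SQL; a sketch of the kind of query involved (the table and column names follow our reading of the ROCKS 3.x schema and should be treated as approximate):

```sql
-- list each node with its MAC address and the appliance it belongs to
SELECT n.Name, net.MAC, a.Name AS Appliance
FROM nodes n
JOIN networks net ON net.Node = n.ID
JOIN memberships m ON m.ID = n.Membership
JOIN appliances a ON a.ID = m.Appliance;
```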
The uflorida XML graphs
uflorida.xml for RHEL 3 servers
uflorida.xml for RH7.3 servers
The Servers (RHEL 3 / RH7.3)

[Network diagram: servers on a private LAN behind the WAN: uflorida-frontend, gatoraid1, gatoraid2, nfs-homes, ufdcache, ufloridaPG, ufloridaDGT, worker nodes grinux01–grinux40, grinux41–grinuxN, grinuxN+1–grinuxM, and the GriPhyN servers alachua, archer, micanopy, ufgrid01–ufgrid04. Access tiers: grid users only with limited access from the WAN; all users with limited access from the WAN; no user logins with very limited or no access from the WAN.]
The Services (RHEL 3 / RH7.3)

[Same network diagram as the previous slide, with services annotated]

The big kahuna, our stripped-down version of the ROCKS frontend:
• ROCKS administration node
• RPM server
• ROCKS DB server
• Kickstart generation
• DHCP server
• Admin NFS server
• Primary DNS server
• etc.
• No users, no logins, no home areas
• Strict firewall rules

NFS servers:
• Users’ home areas
• Users’ data
• Grid users

User interface machines:
• Grid user login
• Access to the grid via Condor-G

GriPhyN & analysis servers:
• User login machines
• User environment

Other services:
• CVS pserver
• Webserver
Batch System Services (RHEL 3 / RH7.3)

[Network diagram as on the previous slides, highlighting the gatekeeper / master nodes]

Gatekeeper / master nodes:
• Grid access only
• GRAM, GSIFTP …
• Other services kept to a minimum
dCache Disk Storage Service (RHEL 3 / RH7.3)

[Network diagram as on the previous slides, highlighting ufdcache and the pool nodes]

• dCache with SRM: a virtual file system (pnfs) with caching
• dCache pool nodes: 40 × 50 GB partitions
• dCache administrator “admin door”
• SRM & dCache webserver
• RAID fileserver: the entire 2 TB of disk on dCache
The uflorida XML graphs

[Graph screenshots: the RHEL 3 graph and the RH 7.3 graph]
• The production cluster, with dCache pool compute nodes
• The analysis cluster: NIS server and clients, analysis environment
UFlorida Website

Currently just the default ROCKS website; ours is accessible only from the .ufl.edu domain.

[Screenshot: the Ganglia pages]
Status and Plans
Facility’s Status

• Participated in all major CMS data challenges
– Since fall 2001: a major US contributor of events
– Since the end of 2002: Grid-only MC production
– Contributed to CDF analysis
• UFlorida group: 4 faculty & staff and 2–3 students
• Recently re-deployed the infrastructure
– Motivated mostly by security concerns
– Added new clusters for additional user communities
• Ramping up to support a full US-CMS Tier2 center
Facility’s Future

• Plan to re-deploy with ROCKS 3.3.x
– Recast our own 2.3.2-based XMLs as real 3.3.x graphs
– Provide access to new features
• Recast our tweaked and tuned services in terms of Rolls
• Make use of the new ROCKS-on-the-WAN support
• Improve collaboration with our peers
– With US-CMS FNAL (Tier1)
– Statewide with FIU and FSU (Tier3), and campus-wide with HPC
– With other US CMS & ATLAS Tier2 centers
Facility’s Future (cont.)

• New hardware is always coming
– New dedicated analysis and file servers on order
• 6 new dual-Xeon servers
• 8.0 TB of disk
– New Opteron systems on order
• Participate in the SC ’04 bandwidth challenge, CIT to JAX
• Connect to the new FLR 10 GigE network equipment
– The official Tier2 center will bring new hardware
• Approximately 120 servers
• An additional 200 TB of storage
Summary

We have successfully built, from scratch, a computing facility at the University of Florida
– Supports HEP experiments (LHC, FNAL …)
– Supports Computer Science activities
– Expect a much more active analysis/user community

The infrastructure is designed to support a large increase in hardware, serving a larger community of users, in anticipation of LHC turn-on and beyond