HEP DataGrid in Taiwan (transcript of chep.knu.ac.kr/HEPDGWS04/ihdgw/talks/27_aug/Ping_Yeh.pdf)
HEP DataGrid in Taiwan
Ping Yeh (葉平 )
National Taiwan University
A Minority Report
HEP DataGrid Workshop
August 26 – 28, 2004
Daegu, Korea
There's Something about Taiwan
● Size: 35,980 km², about 13% smaller than Switzerland, ~9.5% of Japan
– 1,566 km of coastline
– Mount Jade: 3,952 m
– Tropical in the southern part of the island
● Population: 23 million
– Taiwanese (~70%), Hakka (~14%), mainland Chinese (14%), aborigines (2%), all speaking different languages
Academic Network Links
in Taiwan
International Links, Summer 2004
[Diagram: international academic links from Taiwan to Japan, US-East, US-West (6.2G now: 2.5G to Seattle, 3.7G to PAIX), Hong Kong (HARNet), NL-Amsterdam (AMSIX) and Singapore (SingAREN); capacities include 155M, 622M, 622M + 1G and 2.4Gbps.]
Domestic Network
● TANET: Taiwan Academic NETwork, the first academic IP network in Taiwan, connecting universities, schools and research organizations.
● TANET2: IP network with reserved bandwidth for research universities and organizations, connected to Internet-2.
● TWAREN: Taiwan Advanced Research and Education Network
– Network equipment installed, being intensively tested
– Connected to all major research universities in Taiwan
– Service will start any time now
– Backbone: 80 Gbps; GigaPoPs: 145 Gbps
– Six dark fibers
TWAREN Topology
[Diagram: TWAREN topology connecting Academia Sinica, Nat'l Taiwan Univ., Nat'l Central Univ., Nat'l Tsinghua U., Nat'l Chiaotung U., Nat'l Chunghsing U., Nat'l Chinan Univ., Nat'l Chungcheng U., Nat'l Chengkong U., Nat'l Sun Yat Sen U. and Nat'l Donghwa U. across GigaPoPs in Taipei, Hsinchu, Taichung and Tainan.]
HEP Data Grid in Taiwan
Taiwanese Players Involved in Grid
● Institute of Physics, Academia Sinica (AMS, ATLAS, CDF, Texono)– ATLAS data challenge
● Department of Physics, National Central University (AMS, Belle, CMS, Phobos):– CMS data production and data challenge– Working with KEK and Melbourne on Belle datagrid
● Department of Physics, National Taiwan University (Belle, CMS, Ashra/NuTel)– CMS data production and data challenge– Ashra data management in the future
Ashra: All-sky Survey High Resolution Air-shower detector, a Cherenkov/fluorescence photon detector with 1 arcmin resolution, to be installed on Hawaii's Big Island in 2005.
... Players Involved (cont.)
● Academia Sinica Computing Centre
– Supports all HEP institutes in Taiwan with technology, manpower, consultation, and sometimes network connections
– The tier-1.5 centre for ATLAS and CMS
– Has helped other disciplines (digital archive, bioinformatics) port their applications to the grid; is gradually expanding to support more disciplines
Strategies
● Even though Taiwan is strong in PC manufacturing, it is very small: limited manpower, limited budget.
● Hard to develop from scratch; we prefer to learn by following / participating in international projects.
● ASCC recognizes HEP as the driver of Grid technology before the Grid is ready for laymen.
● So: ASCC, IPAS and NCU chose to participate in LCG, while NTU chose to test Grid3.
Hardware Resources
Site   CPU     RAM (each)     Storage   Manpower
ASCC   –       –              –         –
IPAS   22      1 GB           0.6 TB    2
NCU    24+60   1 GB           2 TB      1
NTU    24      256 MB – 1 GB  3.2 TB    1
Involvement in the
LHC Computing Grid
(LCG) Project
Brief History of Participation
● First 2 deployment sites in Asia: ASCC and NCU
● ASCC is very active in participation:
– Rotating 2-6 staff stationed at CERN: September 2002
– LCG-0 deployed: March 19, 2003 (3rd after RAL and CNAF)
– EDG testbed deployed: March 2003
– Academia Sinica Grid Computing Certification Authority (ASGCCA) accepted: June 12, 2003
– LCG-1 testbed ready: July 30, 2003
– LCG-2 deployed: February 2, 2004
– Mass Storage Service installed: July 15, 2004
● Current deployment version: LCG-2-1-1
LCG testbeds in Taiwan
● NCU LCG testbed: for CMS, but allows LHCb data challenge jobs to run when CMS doesn't occupy all resources.
● ASCC LCG testbed: used for all 4 LHC experiments.
– 1 CE, 2 SE, 1 RB, 1 BDII, 1 Proxy, 2 UI, 1 LCFGng
– Mass storage: Castor installed on IBM 3854 (225 tapes) with help from CERN experts (Benjamin Couturier and Jean-Philippe Baud); access via GridFTP.
Site   CPU     RAM (each)   Storage
ASCC   138     1 GB         24 TB (disk) + 22.5 TB (tape)
IPAS   22      1 GB         0.6 TB (disk)
NCU    24+60   1 GB         2 TB (disk)
ASGCCA: Academia Sinica Grid Computing Certification Authority
● Online since July 2002 (http://ca.grid.sinica.edu.tw/)
– Approved for LCG on June 12, 2003, the first Asian CA for LCG
– Also recognized by Grid3
● Enrollment, renewal and revocation of certificates– User, host and service certificates– All records are archived
● Serves Taiwan and China for now
Certificates

           User  Host  Service  Total
Issued      43    56      0      99
Effective   30    51      0      81
Revoked     13     5      0      18
[Diagram: certificate issuance flow among the subscriber (applicant), the Registration Authority (RA), the CA and the ASGC Directory Service, with numbered steps 1-7.]
● Requires a registration authority in the applicant's organization to accept an application.
● Uses a directory service to keep track of certificates.
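The enrollment and revocation bookkeeping described above can be sketched as a tiny registry. The class and method names below are invented for illustration (the real ASGCCA is a full CA with an LDAP directory service), but the statistics reproduce the user-certificate column of the table:

```python
from dataclasses import dataclass

# Illustrative sketch of ASGCCA record keeping: every certificate
# is archived, and statistics per kind can be queried at any time.

@dataclass
class Certificate:
    subject: str        # e.g. "/C=TW/O=AS/CN=Some User"
    kind: str           # "user", "host" or "service"
    revoked: bool = False

class CertificateRegistry:
    def __init__(self):
        self._archive = []              # all records are archived

    def enroll(self, subject, kind):
        cert = Certificate(subject, kind)
        self._archive.append(cert)
        return cert

    def revoke(self, cert):
        cert.revoked = True             # record stays in the archive

    def stats(self, kind):
        """Return (issued, effective, revoked) counts for one kind."""
        issued = [c for c in self._archive if c.kind == kind]
        revoked = [c for c in issued if c.revoked]
        return len(issued), len(issued) - len(revoked), len(revoked)

# Reproduce the user-certificate column of the table above.
reg = CertificateRegistry()
for i in range(43):
    reg.enroll(f"/C=TW/O=AS/CN=User {i}", "user")
for cert in reg._archive[:13]:
    reg.revoke(cert)
print(reg.stats("user"))   # (43, 30, 13)
```

Effective certificates are simply issued minus revoked, which is why the table's columns are internally consistent (99 - 18 = 81 in total).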
Job Statistics
● Three types of jobs use the resources: local, ATLAS, and CMS validation
[Plot: jobs run during 2004, annotated "Accounting started", growth from 100 CPUs to 138 CPUs, and "HT enabled".]
ATLAS Data Challenge DC2
● Tier-1 Center (ASCC)
– LCG software installation
– LCG job submission
– Software release coordination
● Tier-2 (IPAS)
● 1135 jobs, 79% success rate (higher than before)
[Plot: ATLAS DC2 job counts from July 1, 2004.]
CMS Production
● Continuous running to generate Monte Carlo samples used for physics studies and data challenges
● NCU: running production with local farms (60 CPUs) and transferring data to CERN with SRB
● ASCC: going through the validation process
● NTU: going through the validation process with production jobs submitted via Grid3
LHCb VO in NCU
Open VO for LHCb when CMS production cannot be run on LCG.
Contributes ~0.5%.
ASCC on LCG
● ASCC has been heavily involved in LCG since 2002
● Started with deploying testbeds, gradually expanding the team to participate in LCG development work
● Operation and support:
– The tier-"1.5" center for HEP in Taiwan
– One of two Grid Operations Centres (GOC) of LCG
– One of two Global Grid User Support (GGUS) Centres of LCG
● Development:
– GIIS Monitor
– EMS
– ARDA
– LCG 3D
Support Teams within LCG
[Diagram: support flow among the teams below; users send problems to GGUS, the single point of contact.]
● CERN Deployment Support (CDS): middleware problems
● Grid Operations Center (GOC): operations problems
● Global Grid User Support (GGUS): single point of contact, coordination of user support
● Regional Centers (RC): hardware problems
● Experiment Specific User Support (ESUS): software problems
● Users: the 4 LHC experiments (ALICE, ATLAS, CMS, LHCb), 4 non-LHC experiments (BaBar, Compass, CDF, D0), and other communities (VOs)
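As a sketch, the problem routing in this support structure can be written as a lookup table. The function and category names here are invented for illustration; the real GGUS is a web-based ticketing portal, not code like this:

```python
# Illustrative sketch of GGUS as the single point of contact:
# a ticket is classified and dispatched to the team responsible
# for that class of problem, as in the diagram above.

ROUTES = {
    "middleware": "CDS",    # CERN Deployment Support
    "operations": "GOC",    # Grid Operations Center
    "hardware":   "RC",     # Regional Centers
    "software":   "ESUS",   # Experiment Specific User Support
}

def route_ticket(problem_class: str) -> str:
    """Return the support team for a problem class; tickets that
    cannot be classified stay with GGUS for coordination."""
    return ROUTES.get(problem_class, "GGUS")

print(route_ticket("hardware"))   # RC
print(route_ticket("unknown"))    # GGUS
```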
LCG Grid Operations Centre (GOC)
● GOCs act as a central point for operational information such as configuration details and contacts.
● The GOC has responsibility for coordinating and monitoring the operation of the Grid infrastructure
– Job submissions, lifetimes of host certificates of computing elements and storage elements, etc.
● GOCs work with Local Support Groups to assist them in providing the best possible service while their equipment is connected to the Grid.
● Two GOCs in LCG now:
– RAL @ UTC+0: http://goc.grid-support.ac.uk/gridsite/gocmain/
– ASCC @ UTC+8: http://goc.grid.sinica.edu.tw/
GIIS Monitor
● ASCC is developing the GIIS monitor
● Monitors and/or checks the following dynamic resources:
– Site information & configuration (based on BDII information)
– GIIS availability
– BDII LDAP schema
– Statistics on general usage
– Extendable with Python plugins
● http://goc.sinica.edu.tw/gstat/
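A site-information check of the kind listed above might look like the following sketch. The LDIF record and the required-attribute list are illustrative stand-ins, not the monitor's actual schema tests:

```python
# Illustrative sketch of a BDII schema check: parse an LDIF-style
# record published by a site and report which expected GLUE
# attributes are missing from it.

REQUIRED = ["GlueSiteName", "GlueSiteLocation", "GlueSiteSysAdminContact"]

LDIF = """\
dn: GlueSiteUniqueID=Example-Site,mds-vo-name=local,o=grid
GlueSiteName: Example-Site
GlueSiteLocation: Taipei, Taiwan
"""

def parse_ldif(text):
    """Parse one LDIF-style entry into an attribute dictionary."""
    entry = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            entry[key.strip()] = value.strip()
    return entry

def missing_attributes(entry):
    """List required GLUE attributes absent from the entry."""
    return [a for a in REQUIRED if a not in entry]

entry = parse_ldif(LDIF)
print(missing_attributes(entry))   # ['GlueSiteSysAdminContact']
```

A real check would query the BDII over LDAP rather than parse a local string, and would validate many more attributes per the GLUE schema.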
GIIS Monitor Screenshot
Issues on Monitoring
● Multiple existing monitoring systems
● Correlation is difficult
● Need experts familiar with all systems to make quick responses and judgement calls
● Lack of automatic notification
EMS (Event Management System)
[Diagram: several monitor applications send EVENTS to the EMS, which sends NOTIFY messages to subscribers.]
Overview of EMS
● Main developer: Min Tsai of IPAS/ASCC
● Provides a centralized event management service
● Archiving of event histories
● Console interface
– facilitates event correlation
● Event notification and policy management
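A minimal sketch of such a centralized event service, with invented class names and a simple severity-threshold policy (the real EMS speaks SOAP between producers and listeners):

```python
from dataclasses import dataclass

# Illustrative sketch of an event management service: monitors
# publish events, every event is archived, and listeners are
# notified according to a severity policy.

@dataclass
class Event:
    source: str      # which monitor produced the event
    severity: int    # higher = more urgent
    message: str

class EventManagementService:
    def __init__(self):
        self.archive = []              # full event history
        self._listeners = []           # (min_severity, callback) pairs

    def subscribe(self, min_severity, callback):
        self._listeners.append((min_severity, callback))

    def publish(self, event):
        self.archive.append(event)     # archive every event
        for min_severity, callback in self._listeners:
            if event.severity >= min_severity:
                callback(event)        # notify according to policy

ems = EventManagementService()
alerts = []
ems.subscribe(2, alerts.append)        # only severity >= 2 notifies
ems.publish(Event("giis-monitor", 1, "site info refreshed"))
ems.publish(Event("giis-monitor", 3, "BDII not responding"))
print(len(ems.archive), len(alerts))   # 2 1
```

Keeping the full archive while filtering notifications is what makes after-the-fact event correlation possible at the console.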
Architecture of EMS
[Diagram: monitor applications publish events through a SOAP producer interface into the EMS service, which contains an event archive and a notification system; consumers receive events through a SOAP listener interface.]
Current Status of EMS
● Initial use case analysis
● Prototype development
– EMS core and API definition
– Integration with GIIS monitor
– Command-line listener
– Released in mid-August 2004
● Future plans
– Integration with other GOC tools
– Notification listener
– Archive listener
– Test R-GMA as transport
LCG Global Grid User Support Centre (GGUS)
● A single point of contact for Grid users for all kinds of problems
● Provides a web portal for problem report submission, status information, user FAQ and news
● Goal: 24x7 support by 3 support teams at different time zones
● Two GGUS centers in LCG now:
– GGUS FZK: Karlsruhe @ UTC+1
– GGUS ASCC: Academia Sinica @ UTC+8
– Would really like to have a 3rd GGUS in the Americas
● http://www.ggus.org/
User Problem Process Model
[Diagram: a user problem enters GGUS Support as a support request; GGUS interacts with ESUS, CUS, local operations and probably the GOC; feedback and the solution flow back to the user. Red lines: interfaces to the GGUS support system and/or supply an interface to the GGUS system. Knowledge accumulates in an FAQ & knowledge base.]
Two Grid Operations Centres (GOC): RAL & ASCC
Two Global Grid User Support Centres (GGUS): GridKA & ASCC
Participation in LCG Development
ARDA
● Started as an LCG project in 2003: "A Roadmap to Distributed Analysis" workgroup
– All 4 LHC experiments involved
● November 2003: ARDA RTAG Report
– Blueprint "Architectural Roadmap for Distributed Analysis"
– Set of collaborating Grid services and their interfaces
– http://www.uscms.org/s&c/lcg/ARDA/ARDA-report-final.pdf
– Recommendations:
● New service decomposition
● Role of experience and existing technology: web service framework
● Interfacing to existing middlewares to enable their use in the experiment software frameworks
● Early deployment of (a series of) prototypes to ensure functionality and coherence
[Diagram: the EGEE middleware and the ARDA project.]
ARDA (cont.)
● January 2004: ARDA Workshop 1
– ARDA Project = A Realisation of Distributed Analysis
● Coordination and early integration between generic middleware (EGEE) and the LHC experiments' software
● May 2004: EGEE prototype
– gLite = new-generation generic middleware
– Very first prototype available internally to the ARDA group
● June 2004: ARDA Workshop 2
– "First 30 days of ARDA prototype"
● Fall 2004: ARDA Workshop 3
– Summary of the first phase of the ARDA prototype
[Diagram, shown in stages: the ARDA Project (collaboration, coordination, integration, specifications, priorities, planning) sits between the generic middleware project (EGEE/VDT/...) and the distributed-analysis efforts of the four experiments (ALICE, ATLAS, CMS, LHCb), together with the resource providers (Regional Centres), the GDB Security Group, application software (SEAL, POOL, GAE, PROOF, ...) and the GAG; specifications and experience flow toward the middleware project, requirements and guidelines come from the GAG.]
Manpower in ARDA Project
● LCG:
– Project leader (Massimo Lamanna/CERN)
– 4 LCG staff (100% at CERN) matching the 4 EGEE staff
– About 2 FTEs from other sources (not always at CERN)
● EGEE:
– 4 NA4 staff
● Stakeholders:
– EGEE MW team, experiment interfaces (ALICE, ATLAS, CMS, LHCb), E2E integration team (4 EGEE + CERN matching), and users from the 4 experiments
ASCC involvement
ASCC in ARDA
● Four people involved: Eric Yen, Claude Wang, WeiLong Wong and Doug Chen.
● Concentrating on software testing and integration:
– gLite (EGEE middleware) testing and integration
– LHCb metadata catalog testing and integration, with local Oracle expertise
– ATLAS analysis tool (DIAL) testing and integration
CERN/ASCC Tests of LHCb Metadata Catalog
[Diagram: a client in Taiwan exercises cloned Oracle-backed bookkeeping servers at ASCC and CERN; sensors measure DB I/O, CPU load, network and process time of the Web & XML-RPC services, driven by virtual users and a network monitor.]
● Clone the Bookkeeping DB in Taiwan
● Install the WS layer
● Performance tests
– Database I/O sensor
– Bookkeeping server performance tests
● ASCC/CERN bookkeeping server: DB, XML-RPC service, CPU load, network send/receive sensor, process time
– Client host performance tests
● Feedback to LHCb
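The shape of these measurements can be sketched with Python's standard XML-RPC modules. The toy lookup service below is an invented stand-in for the Oracle-backed bookkeeping server; only the per-call timing loop mirrors the real tests:

```python
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Toy stand-in for the bookkeeping XML-RPC service: the real tests
# ran a virtual-user client against the cloned bookkeeping server;
# here a local in-process server illustrates the measurement.

def lookup(job_id):
    """Hypothetical bookkeeping query: return job metadata."""
    return {"job": job_id, "status": "done"}

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lookup)
port = server.server_address[1]                 # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy(f"http://127.0.0.1:{port}/")
timings = []
for i in range(20):                             # one virtual user, 20 calls
    t0 = time.perf_counter()
    result = client.lookup(i)
    timings.append(time.perf_counter() - t0)    # per-call process time

server.shutdown()
print(result["status"], f"mean call time {sum(timings) / len(timings):.4f}s")
```

The real harness additionally sampled CPU load, database I/O and network send/receive on both the client and server hosts.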
LCG 3D
● Stands for Distributed Deployment of Databases for LCG, a new project endorsed by LCG in July 2004.
● The goal is an abstract database layer, such as POOL, to give programs database location transparency.
● Based on its Oracle database expertise, ASCC is interested in putting in ~1 FTE initially.
– Mark Ho of ASCC is now in the Service Definition and Implementation Working Group
– He will work on development and deployment
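Database location transparency can be illustrated with a tiny resolver: the application asks for a logical database name and the abstraction layer picks a physical replica. The replica catalogue and connection strings below are invented; the real project builds on POOL and Oracle replication:

```python
# Invented sketch of "database location transparency": applications
# use a logical name; an abstraction layer maps it to a physical
# replica, preferring one at the caller's own site.

REPLICAS = {
    "lhcb_bookkeeping": [
        {"site": "CERN", "dsn": "oracle://cern/bkk"},
        {"site": "ASCC", "dsn": "oracle://ascc/bkk"},
    ],
}

def resolve(logical_name, local_site):
    """Return the connection string for a logical database,
    preferring a replica at the caller's own site."""
    replicas = REPLICAS[logical_name]
    for replica in replicas:
        if replica["site"] == local_site:
            return replica["dsn"]
    return replicas[0]["dsn"]          # fall back to the first replica

print(resolve("lhcb_bookkeeping", "ASCC"))   # oracle://ascc/bkk
print(resolve("lhcb_bookkeeping", "RAL"))    # oracle://cern/bkk
```

The point is that the program never hard-codes a database location, so replicas can be added or moved without touching application code.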
Non-LCG HEP Grid
Activities in Taiwan
DCAF for CDF
● IPAS is responsible for the on-site data production farm of CDF at Fermilab
● CDF is seeking a way to distribute data production tasks to collaboration institutes.
● IPAS/ASCC has sent an FTE to Fermilab to work on the transition for 6 months.
● For more details please see Kihyeon Cho's talk.
SRB for Belle
● ASCC and NCU started SRB tests with KEK in August 2004
● See Sasaki-san's talk for more details.
Grid3 for CMS
● NTU installed Grid3 on a small cluster and ran CMS jobs from the U.S. in February 2004.
● Expanded to 24 CPUs by summer 2004.
● Successfully submitted CMS jobs (CMKIN and CMSIM) with McRunjob and MOP in August 2003.
● Successfully ran MonALISA on the Grid3 testbed of Academia Sinica.
● NTU is now in the process of officially joining the Grid3+ testbeds.
● Would like to continue participating as Grid3 evolves into the Open Science Grid.
Outreach and
From LCG Toward e-Science
Symposia & Tutorials
● Annual International Symposium on Grid Computing (ISGC) at Academia Sinica since 2003.
– Held a one-day LCG-Asia Workshop this year
– Offered hands-on LCG tutorials in the local language
● Plan to have a tutorial tour in Taiwan to bring more people's attention to an operational grid.
A Newly Funded Project
● A new project funded by the National Science Council: US$432K for 1 year, starting September 1, 2004.
● Major participants: ASCC, IPAS, NCU, NTU
● Objectives:
– Construct an e-Science infrastructure, based on LCG/EGEE, by collaborating with multi-discipline institutes in Taiwan:
● Health● Bioinformatics● Digital Archive● Astrophysics
– Cooperate with industry partners to develop enterprise grid applications.
– Formulate a concrete plan for Taiwan National Grid.
Future Plans: Physics Institutions
● IPAS, NCU and NTU:
– Participate in data production and data challenges in 2005 and 2006 for their respective experiments (ATLAS and CMS)
– Be ready for real data in 2007
● NCU: Establish data grid connection with KEK (with support from ASCC)
● NTU: Build data grid for air shower simulation and data management for the Ashra collaboration.
Future Plans: ASCC
● Develop World Wide Grid infrastructure– Taiwan– Asia
● eScience
– Leverage experience gained in LCG to extend Grid technology to other research domains
– Develop required technology for our applications
– Address interoperability of Grid systems when encountered
● Outreach and Collaboration
– Promote LCG/EGEE technology and applications
– Collaborate in technology and application development with other institutes
Thank you!