LCG 3D Project Status and Production Plans

Dirk Duellmann, CERN IT

On behalf of the LCG 3D project

https://lcg3d.cern.ch

CHEP 2006, 15th February, Mumbai


Related Talks

– LHCb conditions database framework [168 M. Clemencic]

– Database access in ATLAS computing model [38 A. Vaniachine]

– Software for a variable ATLAS detector description [67 V. Tsulaia]

– Optimized access to distributed relational database system [331 J. Hrivnac]

– COOL Development and Deployment - Status and Plans [337 A. Valassi]

– COOL performance and distribution tests [338 A. Valassi poster]

– CORAL relational database access software [329 I. Papadopoulos]

– POOL object persistency into relational databases [330 G. Govi poster]


Distributed Deployment of Databases (=3D)

• LCG today provides an infrastructure for distributed access to file based data and file replication

• Physics applications (and grid services) require similar services for data in relational databases

– Physics applications and grid services use RDBMS

– LCG sites already have experience in providing RDBMS services

• Goals for common project as part of LCG

– increase the availability and scalability of LCG and experiment components

– allow applications to access data in a consistent, location independent way

– allow existing db services to be connected via data replication mechanisms

– simplify a shared deployment and administration of this infrastructure during 24 x 7 operation

• Scope set by LCG PEB

– Online - Offline - Tier sites


LCG 3D Service Architecture

[Figure: LCG 3D service architecture]

Online DB - autonomous reliable service

T0 - autonomous reliable service

T1 - db backbone - all data replicated - reliable service

T2 - local db cache - subset data - only local service

Replication technologies shown: Oracle Streams; http cache (SQUID); cross-DB copy and MySQL/SQLite files

R/O access at Tier 1/2 (at least initially)


Building Block for Tier 0/1 - Oracle Database Clusters

• Two+ dual-CPU nodes

• Shared storage (eg FC SAN)

• Scale CPU and I/O ops (independently)

• Transparent failover and s/w patches


How to keep Databases up-to-date? Asynchronous Replication via Streams

[Figure: a change such as insert into emp values (03, 'Joan', ...) passes through capture, propagation and apply, travelling between databases as LCRs (logical change records). Sites shown: CERN, CNAF, RAL, Sinica, FNAL, IN2P3, BNL]

Slide : Eva Dafonte Perez
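
The capture, propagation and apply stages named on this slide can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `LCR` and `Database` classes, their fields, and the three functions are hypothetical names invented for illustration and are not the Oracle Streams API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LCR:
    """Toy stand-in for a logical change record (hypothetical fields)."""
    table: str
    op: str
    row: dict


@dataclass
class Database:
    name: str
    tables: dict = field(default_factory=dict)  # table name -> list of rows
    redo: list = field(default_factory=list)    # committed changes, in order

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(dict(row))
        self.redo.append(LCR(table, "insert", dict(row)))


def capture(source, start):
    """Return the changes logged since position `start`, plus the new position."""
    return source.redo[start:], len(source.redo)


def propagate(lcrs, queues):
    """Copy each captured change into every destination queue."""
    for queue in queues.values():
        queue.extend(lcrs)


def apply_queue(queue, dest):
    """Drain one destination queue and replay its changes locally."""
    while queue:
        lcr = queue.popleft()
        if lcr.op == "insert":
            dest.tables.setdefault(lcr.table, []).append(dict(lcr.row))


source = Database("CERN")
sites = {name: Database(name) for name in ("CNAF", "RAL", "FNAL")}
queues = {name: deque() for name in sites}

source.insert("emp", {"id": 3, "name": "Joan"})
lcrs, position = capture(source, 0)
propagate(lcrs, queues)

# Asynchronous: the source has already committed, while each destination
# still has the change waiting in its queue until apply runs.
pending = {name: len(queue) for name, queue in queues.items()}
for name, queue in queues.items():
    apply_queue(queue, sites[name])
```

The point of the sketch is the ordering: the insert completes at the source first, and the destinations only catch up when their apply step drains the queue.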


Further Decoupling between Databases

[Figure: CERN RAC (source database) -> copy redo log files -> downstream database (one capture process per destination) -> propagation jobs -> destination sites (CNAF, FNAL, CERN)]

Objectives

1. Remove impact of capture from Tier 0 database

2. Isolate destination sites from each other

– pair of capture process + queue for each target site

– big Streams pool size

– redundant events (x number of queues)

Slide : Eva Dafonte Perez
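
A minimal sketch of why one capture process and queue per target site isolates the destinations, and why it costs redundant events. The `Destination` class, the `capture_for_all` function and the change labels are assumptions made up for this illustration; they do not correspond to any Oracle interface.

```python
from collections import deque


class Destination:
    """Hypothetical destination site with its own capture queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # one queue per target site
        self.applied = []
        self.online = True

    def drain(self):
        """Apply everything queued; an offline site just keeps its backlog."""
        if not self.online:
            return 0
        count = len(self.queue)
        while self.queue:
            self.applied.append(self.queue.popleft())
        return count


def capture_for_all(redo_entries, destinations):
    """Each capture process reads the same copied redo log, so every
    event is stored once per queue."""
    for dest in destinations:
        dest.queue.extend(redo_entries)


dests = [Destination("CNAF"), Destination("FNAL"), Destination("CERN")]
dests[1].online = False  # FNAL is unreachable for a while
capture_for_all(["change-1", "change-2"], dests)

drained = {d.name: d.drain() for d in dests}  # the other sites are not blocked
stored_events = 2 * len(dests)                # events x number of queues

dests[1].online = True  # FNAL comes back and catches up from its own queue
caught_up = dests[1].drain()
```

The trade-off matches the slide: the backlog of a slow site stays in its own queue, at the price of holding each event once per destination.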


Offline FroNTier Resources/Deployment

• Tier-0: 2-3 redundant FroNTier servers.

• Tier-1: 2-3 redundant Squid servers.

• Tier-N: 1-2 Squid servers.

• Typical Squid server requirements:

– CPU/MEM/DISK/NIC = 1 GHz / 1 GB / 100 GB / Gbit

– Network: visible to worker LAN (private network) and WAN (internet)

– Firewall: two ports open for URI (FroNTier Launchpad) access and SNMP monitoring (typically 8000 and 3401 respectively)

• Squid non-requirements

– Special hardware (although high-throughput disk I/O is good)

– Cache backup (if disk dies or is corrupted, start from scratch and reload automatically)

• Squid is easy to install and requires little ongoing administration.

[Figure: Tier 0: DB <- JDBC <- Tomcat(s) running the FroNTier Launchpad <- http <- Squid(s); Tier 1: Squid servers; Tier N: Squid servers]

Slide : Lee Lueking
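
The tiered caching on this slide, including the "no cache backup" point, can be sketched as a chain of read-through caches. This is an illustrative toy, not Squid or FroNTier code: the `Launchpad` and `CacheTier` classes and the request URL are hypothetical.

```python
class Launchpad:
    """Stand-in for the Tier 0 FroNTier launchpad (hypothetical interface)."""

    def __init__(self, rows):
        self.rows = rows
        self.db_queries = 0

    def get(self, url):
        self.db_queries += 1  # each call here would cost a database query
        return self.rows[url]


class CacheTier:
    """Read-through cache keyed by request URL."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}
        self.hits = 0

    def get(self, url):
        if url in self.store:
            self.hits += 1
            return self.store[url]
        payload = self.upstream.get(url)  # miss: ask the next tier up
        self.store[url] = payload
        return payload

    def wipe(self):
        """Simulate a dead cache disk: no backup, just start empty."""
        self.store.clear()


url = "/conditions?run=1"  # made-up request
launchpad = Launchpad({url: "payload-1"})
tier1 = CacheTier(launchpad)
tier_n = CacheTier(tier1)

first = tier_n.get(url)   # misses both caches, reaches the database once
second = tier_n.get(url)  # served by the Tier N cache
tier_n.wipe()             # cache disk lost
third = tier_n.get(url)   # refilled from Tier 1, database untouched
```

In this toy the database is queried once for three client requests, and a wiped cache refills itself from the tier above on the next request.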


Test Status : 3D testbed

• Replication test progressing well

– Offline->T1:

• COOL ATLAS: Stefan Stonjek (CERN, RAL, Oxford)

• COOL LHCb: Marco Clemencic (CERN, RAL, GridKA?)

• FroNTier CMS: Lee Lueking (CERN and several T1/T2 sites)

• ARDA AMGA: Birger Koblitz (CERN->CERN)

• AMI: Solveig Albrandt (IN2P3->CERN - setting up)

– Online->offline:

• CMS Conditions: Saima Iqbal (functional testing)

• ATLAS: (Gancho Dimitrov) server setup, pit network

• LHCb: planning with LHCb online

• Coordination during weekly 3D meetings


LCG Database Deployment Plan

• After the October '05 workshop, a database deployment plan was presented to LCG GDB and MB

– http://agenda.cern.ch/fullAgenda.php?ida=a057112

• Two production phases

• March - Sept '06: partial production service

– Production service (parallel to existing testbed)

– H/W requirements defined by experiments/projects

– Based on Oracle 10gR2

– Subset of LCG tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL

• Sept '06 onwards: full production service

– Adjusted h/w requirements (defined at summer '06 workshop)

– Other tier 1 sites joined in: PIC, NIKHEF, NDG, TRIUMF


Proposed Tier 1 Hardware Setup

• Proposed setup for the first 6 months

– 2-3 dual-CPU database nodes with 2GB of memory or more

• Set up as RAC cluster (preferably) per experiment

• ATLAS: 3 nodes with 300GB storage (after mirroring)

• LHCb: 2 nodes with 100GB storage (after mirroring)

• Shared storage (eg FibreChannel) proposed to allow for clustering

– 2-3 dual-CPU Squid nodes with 1GB of memory or more

• Squid s/w packaged by CMS will be provided by 3D

• 100GB storage per node

• Need to clarify service responsibility (DB or admin team?)

• Target s/w release: Oracle 10gR2

– RedHat Enterprise Server to ensure Oracle support


DB Readiness Workshop last week

• Readiness of the production services at T0/T1

– status reports from tier 0 and tier 1 sites

– technical problems with the proposed setup (RAC clusters)?

• Readiness of experiment (and grid) database applications

– Application list, code release, data model and deployment schedule

– Successful validation at T0 and (if required) T1?

• Review site/experiment milestones from the database project plan

– (Re-)align with other work plans - eg experiment challenges, SC4

• Detailed presentations of experiments and sites at

– http://agenda.cern.ch/fullAgenda.php?ida=a058495


CERN Hardware evolution for 2006

                   ALICE    ATLAS            CMS      LHCb     Grid     3D       Non-LHC                    Validation
Current state      -        2-node offline   2-node   2-node   2-node   -        -                          2x2-node
Proposed Q2 2006   2-node   4-node           4-node   4-node   4-node   2-node   2-node (PDB replacement)

Further entries, column assignment unclear:

– Current state: 2-node online test; pilot on disk server

– Proposed Q2 2006: three 2-node valid/test clusters; 2-node pilot; Compass??; Online?

• Linear ramp-up budgeted for hardware resources in 2006-2008

• Planning next major service extension for Q3 this year

Slide : Maria Girone


Frontier Production Configuration at Tier 0

Squid runs in http-accelerator mode (as a reverse proxy server)

Slide : Luis Ramos


Tier 1 Progress

• Sites largely on schedule for a service start end of March

– h/w either installed already (BNL, CNAF, IN2P3) or expect delivery of order shortly (GridKA, RAL)

– Some problems with Oracle Clusters technology encountered and solved!

– Active participation from sites - DBA community building up

• First DBA meeting focusing on RAC installation, setup and monitoring, hosted by Rutherford, scheduled for second half of March

• Need to involve remaining Tier 1 sites now

– Establishing contact with PIC, NIKHEF, NDG, TRIUMF so that they can follow workshops, email and meetings


LCG Application s/w Status

• Finished major step towards distributed deployment:

– added common and configurable handling of server lookup, connection retry, failover and client side monitoring via CORAL

– COOL and POOL have released versions based on new CORAL package [talks by I. Papadopoulos and A. Valassi]

– FroNTier has been added as plug-in into CORAL

• CMS is working on FroNTier caching policy

• FroNTier apps need to implement this policy to avoid stale cached data lookups

• LCG persistency framework s/w expected to be stable by end of February for distributed deployment as part of SC4 or experiment challenges

• Caveat: the experiment conditions data model may stabilize only later -> possible deployment issues
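
The combination of server lookup, connection retry and failover listed above can be sketched generically. This is not the CORAL API: `connect_with_failover`, the `lookup` table, the replica strings and `fake_connect` are hypothetical stand-ins used only to show the control flow.

```python
import time


class ConnectionFailed(Exception):
    """Raised when a replica cannot be reached."""


def connect_with_failover(logical_name, lookup, connect, retries=2, delay=0.0):
    """Try each physical replica of a logical service in order.

    Returns the connection and a log of attempts for client-side monitoring.
    """
    attempts = []
    for replica in lookup[logical_name]:
        for attempt in range(1, retries + 1):
            try:
                conn = connect(replica)
            except ConnectionFailed:
                attempts.append((replica, attempt, "failed"))
                time.sleep(delay)  # wait before retrying or failing over
            else:
                attempts.append((replica, attempt, "ok"))
                return conn, attempts
    raise ConnectionFailed(f"no replica of {logical_name!r} reachable")


def fake_connect(replica):
    """Pretend the database replica is down and the cache route is up."""
    if replica.startswith("oracle://"):
        raise ConnectionFailed(replica)
    return f"connected:{replica}"


# Made-up lookup table: one logical name, two physical replicas.
lookup = {
    "conditions": [
        "oracle://t1-db/conditions",
        "frontier://local-squid/conditions",
    ]
}
conn, log = connect_with_failover("conditions", lookup, fake_connect)
```

Keeping the attempt log alongside the connection is one simple way to feed client-side monitoring; the application only ever names the logical service.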


Open Issues

• Support for X.509 (proxy) certificates by Oracle?

– May need to study possible fallback solutions

• Server and support licenses for Tier 1 sites

• Instant client distribution within LCG

• In discussion with Oracle via commercial contact at CERN


Databases in Middleware & Castor

• Deployment already took place for services used in SC3

– Existing setups at the sites

– Existing experience with SC workloads -> extrapolate to real production

• LFC, FTS - Tier 0 and above

– Low volume, but high availability requirements

– CERN: Run on 2-node Oracle cluster; outside single box Oracle or MySQL

• CASTOR 2 - CERN and some T1 sites

– Need to understand scaling up to LHC production rates

• Currently not driving the requirements for the database service

• Need to consolidate database configs and procedures

– may reduce effort/diversity at CERN and Tier 1 sites


Experiment Applications

• Conditions - Driving the database service size at T0 and T1

– EventTAGs (may become significant - need replication tests and concrete experiment deployment models)

• Framework integration and DB workload generators exist

– successfully tested in various COOL and POOL/FroNTier tests

– T0 performance and replication tests (T0->T1) look ok

• Conditions: Online -> Offline replication only starting now

– May need additional emphasis for online tests to avoid surprises

– CMS and ATLAS are executing online test plans

• Progress in defining concrete conditions data models

– CMS showed most complete picture (for Magnet Test)

– Still quite some uncertainty about volumes, numbers of clients


Summary

• Database Deployment Architecture defined

– Streams connected Database Clusters for Online, Tier 0 (ATLAS, CMS, LHCb)

– Streams connected Database Cluster for Tier 1 (ATLAS, LHCb)

– FroNTier/SQUID distribution for Tier 1/Tier 2 (CMS)

– File snapshots (SQLite/MySQL) via CORAL/Octopus (ATLAS, CMS)

• Database Production Service and Schedule defined

• Setup proceeding well at Tier 0 and 1 sites

– Start at end of March seems achievable for most sites

• Application performance tests progressing

– First larger scale conditions replication tests with promising results for streams and frontier technologies

• Concrete conditions data models still missing for key detectors


Conclusions

• There is little reason to believe that a distributed database service will move into stable production any quicker than any of the other grid services

• We should start now to ramp up to larger scale production operation to resolve the unavoidable deployment issues

• We need the cooperation of experiments and sites to make sure that concrete requests can be quickly validated against a concrete distributed service