OSG Operations

OSG Operations All Hands Meeting Rob Quick (Ops Coordinator) Slides by: Scott Teige and Kyle Gross

Transcript of OSG Operations

Page 1: OSG  Operations

OSG Operations

All Hands Meeting

Rob Quick (Ops Coordinator)

Slides by: Scott Teige and Kyle Gross

Page 2: OSG  Operations

March 2011 2

Support Overview

• Communications Hub
• Coordinate Ticketing & Exchanges
• End-user Support
• OSG RA
• Documentation

Page 3: OSG  Operations

Communications Hub

• 24x7 Telephone – 1-317-278-9699
• 24x7 Email – [email protected]
• 24x7 Ticket Creation
    Leverage the 24 hour coverage of the GRNOC at IU
• Community Notification Tools
• Blogspot postings, Twitter and RSS feed
    http://osggoc.blogspot.com/
    Twitter: OSGGOC (test)
• Weekly Operations Meeting Mondays

Page 4: OSG  Operations

Ticketing & Ticket Exchange

• Central OSG Ticket System
• GOCTicket interface
    http://ticket.grid.iu.edu
• Ticket Exchange – SC, GGUS, GOC-TX
• 10,000 ticket milestone – 2/22/2011

Page 5: OSG  Operations

End User Support

• OIM Registration
    http://oim.grid.iu.edu/
• VOMS (MIS, OSGEDU, CSIU)
• Certificate Requests
• Twiki Support

Page 6: OSG  Operations

OSG RA

• Alain Deximo as new OSG RA
• Updating Procedures/Docs for effective backup
• Other than new POC (Alain), transparent to users

Page 7: OSG  Operations

Documentation

• Work with OSG Documentation Team
    Help them with Twiki setup
    https://twiki.grid.iu.edu/twiki
• Cleaning up Operations Docs

Page 8: OSG  Operations

Service Overview

• Information Services
    Information to people
    Information to machines
• Accounting Services
• Monitoring Services
• Collaborative Services

Page 9: OSG  Operations

MyOSG

• http://myosg.grid.iu.edu

Page 10: OSG  Operations

Display

• http://display.grid.iu.edu/

Page 11: OSG  Operations

OIM

• Open Science Grid Information Management
• https://oim.grid.iu.edu
• Semi-static information to people and machines
• Find contacts, VO information, resources, much more

Page 12: OSG  Operations

BDII

• Berkeley Database Information Index
• Mostly provides information to machines
    Some information to people
• Most critical service for GOC
• Dynamic information, ~2 minute period
• Many services depend on BDII
• http://is.grid.iu.edu

Page 13: OSG  Operations

Ticket

• https://ticket.grid.iu.edu/goc/open
• Don’t get stuck, cut a ticket
• Ticket Exchange
    The GOC ticketing system interacts with other support organizations’ ticket systems via the ticket exchange.
    This allows seamless interaction between multiple ticket systems, so that they seem to behave as one system.

Page 14: OSG  Operations

RSV

• Resource and Service Validation

Page 15: OSG  Operations

WLCG Comparison

• An accounting service
• Some OSG resources are also WLCG resources
• Separate accounting systems

Page 16: OSG  Operations

Software Cache

• http://software.grid.iu.edu
• Pointers to VDT software
• Certificate Authority Distribution
    http://software.grid.iu.edu/pacman/cadist/
• VO package
• Certificate requests

Page 17: OSG  Operations

xxx-ITB

• Ditto above but for testing
• 1st and 3rd Tuesdays: updates to ITB
    You are encouraged to test services, particularly those of interest to you
• 2nd and 4th Tuesdays: updates to Prod.
• 5th Tuesday: the GOC rests.
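The Tuesday schedule on this slide can be turned into a small date check. The sketch below is only an illustration of the rule as stated on the slide; the function name `goc_update_target` and its return strings are made up for this example and are not part of any GOC tool.

```python
import datetime


def goc_update_target(day: datetime.date) -> str:
    """Return which GOC update, if any, the slide's schedule puts on this date."""
    if day.weekday() != 1:            # Monday is 0, so Tuesday is 1
        return "none"
    ordinal = (day.day - 1) // 7 + 1  # 1 for the first Tuesday of the month, 2 for the second, ...
    if ordinal in (1, 3):
        return "ITB"
    if ordinal in (2, 4):
        return "Production"
    return "rest"                     # a 5th Tuesday: the GOC rests


# March 2011 has five Tuesdays, so it exercises every branch.
for d in (1, 8, 15, 22, 29):
    date = datetime.date(2011, 3, d)
    print(date, goc_update_target(date))
```

Computing the ordinal from the day of the month avoids walking the calendar: days 1 to 7 always hold the first occurrence of any weekday, days 8 to 14 the second, and so on.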

Page 18: OSG  Operations

Change Management and Ops Meetings

• Change Management Review Tuesdays

https://twiki.grid.iu.edu/bin/view/Operations/ChangeMgmtMeetingMinutes

Page 19: OSG  Operations

Recap from the Ops Coordinator

• 15 Minutes
• Sustainability
• “Yet, in spite of these spectacular strides in science and technology, and still unlimited ones to come, something basic is missing… We have learned to fly the air like birds and swim the sea like fish, but we have not learned the simple art of living together as brothers.” -MLK

Page 20: OSG  Operations

Three things you’ve just gotta know about the VDT

(And Frank)

Alain Roy

Open Science Grid Software Coordinator

Page 21: OSG  Operations

But first a poem

I have a flower on my head
By Andrea Roy

I have a
Flower on my head
What should I do?
Should I water it? I think so.

Page 22: OSG  Operations

The three things you just gotta know about the VDT

1. RSV is way cooler
2. RPMs for the VDT are on the way
3. CREAM is coming to the VDT soon

Page 23: OSG  Operations

1. RSV is way cooler

As of February 7th, OSG 1.2.17, RSV is just so much cooler for three main reasons:
1. Common RSV tasks are made simple with the new rsv-control command.
2. It is really easy to extend RSV with new probes
    If you can write a script to test something, you can put it into RSV.
    Is there something else you’d like to test?
3. Standalone installations are much easier (with config.ini)
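To make "a script to test something" concrete, here is a minimal sketch of a check that prints its result in the key/value record layout that rsv-control --run displays later in these slides. It is not the official RSV probe interface. The metric name `org.example.local.disk-free`, the 10% threshold, the `CRITICAL` status value, and both function names are assumptions made for illustration.

```python
import shutil
import socket
import time


def format_record(metric_name, status, host, summary, details, timestamp):
    """Render one metric result as 'key: value' lines terminated by EOT."""
    fields = [
        ("metricName", metric_name),
        ("metricType", "status"),
        ("timestamp", timestamp),
        ("metricStatus", status),
        ("serviceType", "OSG-CE"),   # assumed; copied from the sample output in the slides
        ("serviceURI", host),
        ("gatheredAt", host),
        ("summaryData", summary),
        ("detailsData", details),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields) + "\nEOT"


def disk_free_probe(path="/", min_free_fraction=0.10):
    """Hypothetical check: report OK when enough of `path` is free."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    status = "OK" if free_fraction >= min_free_fraction else "CRITICAL"
    details = f"{free_fraction:.0%} of {path} is free"
    return format_record(
        "org.example.local.disk-free",   # hypothetical metric name
        status,
        socket.getfqdn(),
        status,
        details,
        time.strftime("%Y-%m-%d %H:%M:%S %Z"),
    )


if __name__ == "__main__":
    print(disk_free_probe())
```

Keeping the formatting separate from the check means the same `format_record` helper could wrap any other test; how such a script is actually registered with RSV is not covered by these slides.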

Page 24: OSG  Operations

Easy to list your RSV probes!

% rsv-control --list

Metrics enabled for host: osg-edu.cs.wisc.edu:10443       | Service
----------------------------------------------------------+--------
org.osg.srm.srmcp-readwrite                               | OSG-SRM
org.osg.srm.srmping                                       | OSG-SRM

Metrics enabled for host: osg-edu.cs.wisc.edu             | Service
----------------------------------------------------------+---------
org.osg.batch.jobmanager-default-status                   | OSG-CE
org.osg.batch.jobmanagers-available                       | OSG-CE
org.osg.certificates.cacert-expiry                        | OSG-CE
...

Page 25: OSG  Operations

Easy to see the RSV jobs!

% rsv-control --job-list

Hostname: osg-edu.cs.wisc.edu
ID      OWNER  ST  NEXT RUN TIME  METRIC
659.0   rsv    I   03-06 10:08    org.osg.globus.gridftp-simple
660.0   rsv    I   03-06 09:32    org.osg.gip.lastrun
661.0   rsv    R   03-06 18:47    org.osg.general.vdt-version
...

Hostname: osg-edu.cs.wisc.edu:10443
ID      OWNER  ST  NEXT RUN TIME  METRIC
655.0   rsv    I   03-06 09:33    org.osg.srm.srmping
656.0   rsv    R   03-06 09:28    org.osg.srm.srmcp-readwrite

ID      OWNER  ST  CONSUMER
679.0   rsv    R   html-consumer
680.0   rsv    R   gratia-consumer
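If you want a quick summary of the job list, a few lines of Python can count the jobs per ST code. This is a sketch that assumes the whitespace-separated layout shown on this slide; the function name `job_states` is made up, and reading `I` and `R` as idle and running is an assumption based on the usual Condor status codes.

```python
from collections import Counter


def job_states(job_list_text):
    """Count rows per ST code in text shaped like the --job-list output."""
    counts = Counter()
    for line in job_list_text.splitlines():
        fields = line.split()
        # Job and consumer rows start with a numeric ID such as 659.0;
        # "Hostname:" lines, column headers and "..." do not.
        if len(fields) >= 4 and fields[0].replace(".", "", 1).isdigit():
            counts[fields[2]] += 1
    return counts


sample = """\
Hostname: osg-edu.cs.wisc.edu
ID OWNER ST NEXT RUN TIME METRIC
659.0 rsv I 03-06 10:08 org.osg.globus.gridftp-simple
660.0 rsv I 03-06 09:32 org.osg.gip.lastrun
661.0 rsv R 03-06 18:47 org.osg.general.vdt-version
ID OWNER ST CONSUMER
679.0 rsv R html-consumer
680.0 rsv R gratia-consumer
"""

print(dict(job_states(sample)))
```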

Page 26: OSG  Operations

Easy to enable/disable RSV probes!

% rsv-control --enable --host osg-edu.cs.wisc.edu \
      org.osg.ress.ress-classad-exists

Enabling metric 'classad-exists' for host 'osg-edu.cs.wisc.edu'

One or more metrics have been enabled and will be started the next
time RSV is started. To turn them on immediately run 'rsv-control --on'.

Page 27: OSG  Operations

Easy to run a probe right now!

% rsv-control --run --host osg-edu.cs.wisc.edu org.osg.general.osg-version

Running metric org.osg.general.osg-version:

metricName: org.osg.general.osg-version
metricType: status
timestamp: 2011-03-06 09:24:42 CST
metricStatus: OK
serviceType: OSG-CE
serviceURI: osg-edu.cs.wisc.edu
gatheredAt: osg-edu.cs.wisc.edu
summaryData: OK
detailsData: OSG 1.2.18
EOT
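The probe result above is a simple record: one "key: value" pair per line, closed by EOT. A rough sketch of reading such a record into a dictionary follows. It assumes that layout, and it assumes that everything between detailsData and EOT belongs to the details; `parse_metric_record` is a name invented for this example, not an RSV function.

```python
def parse_metric_record(text):
    """Parse 'key: value' lines up to EOT into a dict (detailsData may span lines)."""
    record = {}
    details = []
    in_details = False
    for line in text.splitlines():
        line = line.rstrip()
        if line == "EOT":
            break
        if in_details:
            # Assumption: all lines after detailsData belong to the details.
            details.append(line)
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue                  # skip blank or malformed lines
        if key == "detailsData":
            in_details = True
            details.append(value.strip())
        else:
            record[key] = value.strip()
    record["detailsData"] = "\n".join(details)
    return record


sample = """\
metricName: org.osg.general.osg-version
metricType: status
timestamp: 2011-03-06 09:24:42 CST
metricStatus: OK
serviceType: OSG-CE
serviceURI: osg-edu.cs.wisc.edu
gatheredAt: osg-edu.cs.wisc.edu
summaryData: OK
detailsData: OSG 1.2.18
EOT
"""

record = parse_metric_record(sample)
print(record["metricName"], record["metricStatus"], record["detailsData"])
```

Splitting on the first colon only keeps the timestamp intact, since its value contains colons of its own.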

Page 28: OSG  Operations

Easy to run all probes to refresh

% rsv-control --run --all-enabled

Running metric org.osg.certificates.cacert-expiry (1 of 24)

metricName: org.osg.certificates.cacert-expiry
metricType: status
timestamp: 2011-03-07 07:40:40 CST
metricStatus: OK
serviceType: OSG-CE
serviceURI: osg-edu.cs.wisc.edu
gatheredAt: osg-edu.cs.wisc.edu
summaryData: OK
detailsData: Security Probe Version: 1.1
OK: CAs are in sync with OSG distribution
EOT

Running metric org.osg.general.osg-directories-CE-permissions (2 of 24)
...

Page 29: OSG  Operations

Straightforward to get debugging info

% rsv-control --verify

Testing if Condor-Cron is running...OK

Testing if metrics are running...OK (24 running metrics)

Testing if consumers are running...OK (2 running consumers)

Checking which consumers are configured...
The following consumers are enabled: html-consumer gratia-consumer

% rsv-control --profile
Running the rsv-profiler...
OSG-RSV Profiler
Analyzing...
Making tarball (rsv-profiler.tar.gz)
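The --verify output lends itself to an automated check. The sketch below pulls out each "Testing if ...result" line from text shaped like the output on this slide; the regular expression and the name `summarize_verify` are assumptions for illustration, and the real output may vary between RSV versions.

```python
import re

# Matches lines such as "Testing if metrics are running...OK (24 running metrics)".
CHECK_LINE = re.compile(r"^Testing if (?P<what>.+?)\.\.\.(?P<result>\S+)")


def summarize_verify(output):
    """Map each 'Testing if X...' check to its reported result."""
    results = {}
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results[match.group("what")] = match.group("result")
    return results


sample = """\
Testing if Condor-Cron is running...OK
Testing if metrics are running...OK (24 running metrics)
Testing if consumers are running...OK (2 running consumers)
Checking which consumers are configured...
"""

checks = summarize_verify(sample)
print(checks)
print("all checks passed:", all(result == "OK" for result in checks.values()))
```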

Page 30: OSG  Operations

And now a slight detour: Frank

• Frank [last-name removed]
• Wrote some code for Condor that “worked”.
• But he meant:
    Works == Compiles
• A common mistake for beginners, so we won’t hold it against him.
• But it’s a useful indication of progress:
    A lot has been done, but it requires more before you can test it.

Page 31: OSG  Operations

2. RPMs for the VDT are on the way

• We have franked binary RPMs without configuration for:
    gLexec
      (Actually, they’ve been tested pretty well)
    Xrootd
    95% of the worker node (56/59 RPMs)
      Currently missing: FTS client
• They are in a yum repo, will be available for testing soon.

Page 32: OSG  Operations

3. CREAM is coming to the VDT soon

• Basic CREAM install via Pacman
    Currently franks, but known problems
    End of March
• CREAM install via RPMs
    End of April
    And then a period of testing/finalizing
• Ready for production by September
• Timeline driven by ATLAS needs

Page 33: OSG  Operations

I’m happy if you leave with those three things

1. RSV is way cooler
2. RPMs for the VDT are on the way
3. CREAM is coming to the VDT soon

But I’ll say two more things:

Page 34: OSG  Operations

Two More Things

• Plan for next round of OSG:
    Do RPMs right: source packages, intermix with external dependencies neatly…
    Community-oriented distributions
• We are getting better about collecting accurate requirements and reporting work plans/time lines

Page 35: OSG  Operations

But wait! There’s more!

• The Second Annual OSG Summer School!
    June 26-30, 2011
    Learn about high-throughput computing, OSG, and more!
    Tell anyone that would be interested, spread the word!
    https://twiki.grid.iu.edu/bin/view/Education/OSGSummerSchool2011

Page 36: OSG  Operations

Any Questions?

• I’m here until Thursday—please come and talk to me.

• Or email me: [email protected]
