Sponsored by the National Science Foundation GENI Campus Ops Workflow Chaos Golubitsky San Juan, Puerto Rico Mar 16 2011 www.geni.net


Sponsored by the National Science Foundation

GENI Campus Ops Workflow

Chaos Golubitsky
San Juan, Puerto Rico, Mar 16 2011

www.geni.net


Outline

• Introduction
• Experimenter support
• Resources
• Monitoring


Towards a more “production-like” GENI

• Some Spiral 3 ops goals:
  – Resources are easier for experimenters to find and use
  – Provisioning an experiment doesn’t require picking up the phone (as often)
  – Resources are more reliably available
  – Problems with resources are easier to detect and resolve
• Here are some steps we think will be useful


Campus ops workflow?

• A workflow is a set of steps to achieve a goal:
  – Become a production GENI campus!
• This process will change as more campuses try it
• Proposed workflow steps we think will be useful
• Three categories:
  – Experimenter support
  – Resource deployment
  – Monitoring
• There’s more than one way to do this; input is welcome!


GPO as reference campus

• We try things out, test, and provide guidance and support to campuses deploying similar things
  – And pass along ideas for other reference campuses
• We hope to help:
  – Small testbeds with diverse resources (OpenFlow, MyPLC, ProtoGENI, L2 backbone connectivity)
  – Campuses who want to create testbeds
  – Bigger testbeds (where we can)
• We’re working on:
  – Experimenter support
  – More (and more GENI-like) resources
  – Useful monitoring
  – Templates for transitioning to GENI operations


Workflow Steps for Experimenter Support

• Subscribe to [email protected]: http://lists.geni.net/mailman/listinfo/response-team
  – Report your outages
  – Answer questions from experimenters
• Tell GPO ([email protected]) you’re willing to support some experimenters: http://groups.geni.net/geni/wiki/ProductionResources
• Create a page advertising each of your aggregates: http://groups.geni.net/geni/wiki/GeniAggregate/YourSiteAggregate
  – What resources do you have?
  – Who can use them?
  – How do they use them?
  – Resources don’t need to be fully open to the public to be advertised here
  – Template: http://groups.geni.net/geni/wiki/TemplateAggregatePage


Experimenter Support at GPO

http://groups.geni.net/geni/wiki/GeniAggregate/GpoLabProtoGeni


Workflow Steps for Adding Resources

• Connectivity
• Aggregates:
  – Give local users access to your resources
  – Run software that supports the GENI AM API
  – Give remote users access to your resources (consistent with your site policy)
• Configuration management:
  – Know what you’re running
  – Especially if it’s GENI software (things change fast)
  – Allows you to help experimenters better
  – Allows us (and other campuses) to help you better


Resources at GPO

• GPO can provide templates and help for aggregates we have experience with
• Things we have:
  – Connections to NLR and I2 backbones
  – OpenFlow switches (HP/NEC/Quanta), FlowVisors, controllers, GENI AM API support
  – Reference installation of WiMAX software
  – ProtoGENI cluster
• A simple resource you can deploy:
  – MyPLC plus SFA to support the GENI AM API: http://groups.geni.net/geni/wiki/GpoLab/MyplcReferenceImplementation


Workflow Steps for Monitoring (1)

• Two consumers of monitoring data:
  – Operators and experimenters
• Operators:
  – Goals:
    • Detect and resolve outages quickly
    • Plan for the future
  – Monitoring steps:
    • Polling and trending of local resources
    • Alerting on local resource outages
    • Visibility into status of connected remote resources
    • Visibility into many remote resources in a consistent format


Workflow Steps for Monitoring (2)

• Experimenters:
  – Goals:
    • Identify problems affecting the slice
    • Collect measurement data for their slice
  – Monitoring steps:
    • Status of available resources (how many nodes?)
    • Status of resources I’m using (is my node up?)
    • External characteristics of slice (CPU usage? Network bandwidth?)
    • Internal characteristics of slice (I&M working session Thursday)


Monitoring at GPO

• Strategy:
  – Collect as much data as possible from our site now: http://monitor.gpolab.bbn.com
  – Integrate our data with collectors (GMOC, aggregates)
• Tactics:
  – Trending is more important than alerting:
    • Remote operators and experimenters are casual consumers
    • Don’t want alerts for resources which may not be relevant
    • Do want historical availability information on request
  – Collect numeric trending data in a consistent format:
    • Using ganglia to collect data in rrdtool format for now
  – Generate webpages that format ganglia’s data more meaningfully
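To make "historical availability information on request" concrete, here is a minimal sketch of how an availability figure could be derived from stored up/down samples. It is an illustration only, not GPO's implementation: the `Sample` type and the `availability` function are hypothetical names, and the samples are assumed to have already been read out of whatever trending store is in use (for example, rrdtool files).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One polled observation of a resource (hypothetical type)."""
    timestamp: int  # seconds since the epoch
    up: bool        # True if the resource answered the poll


def availability(samples: List[Sample], start: int, end: int) -> Optional[float]:
    """Return the fraction of samples in [start, end) where the resource was up.

    Returns None when the window holds no samples, so that "no data"
    is never reported as 0% or 100% availability.
    """
    window = [s for s in samples if start <= s.timestamp < end]
    if not window:
        return None
    return sum(1 for s in window if s.up) / len(window)


# Four polls one minute apart, with a single failure.
history = [Sample(0, True), Sample(60, True), Sample(120, False), Sample(180, True)]
print(availability(history, 0, 240))
```

Because the figure is computed on request from stored trend data, a casual consumer can ask about any past window without having subscribed to alerts beforehand.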


Monitoring at GPO: Ganglia’s native UI


Monitoring at GPO: Collecting GENI Data

• Active testing:
  – Use simple scripts to run tests and report results to ganglia
  – Test recent values for freshness and sanity
  – GPO uses this to monitor reachability across the NLR and Internet2 OpenFlow backbone
• Collecting external slice data:
  – Run locally on aggregate manager
  – Query aggregate data: slice names, node counts
  – Query operational data: packet counters, node state, CPU usage
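The active-testing steps above can be sketched as a small script. This is a hedged illustration under stated assumptions, not the scripts GPO runs: the function names, the metric name `reach_ms`, the five-minute freshness limit, and the sanity bounds are all hypothetical. The only external piece is `gmetric`, the command-line tool shipped with Ganglia for publishing a single metric value.

```python
import subprocess
from typing import List


def is_fresh(sample_time: float, now: float, max_age_s: float = 300) -> bool:
    """A value is fresh if it is not from the future and not older than max_age_s."""
    return 0 <= now - sample_time <= max_age_s


def is_sane(value: float, low: float, high: float) -> bool:
    """A value is sane if it falls inside the range the test can plausibly produce."""
    return low <= value <= high


def gmetric_command(name: str, value: float, units: str,
                    metric_type: str = "float") -> List[str]:
    """Build the gmetric invocation that would publish one metric to Ganglia."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", metric_type, "--units", units]


def report(name: str, value: float, units: str) -> None:
    """Publish the metric; requires gmetric to be installed and configured."""
    subprocess.run(gmetric_command(name, value, units), check=True)


# Example: a reachability test measured 12.5 ms at t=100; it is now t=350.
if is_fresh(100, 350) and is_sane(12.5, 0, 1000):
    print(" ".join(gmetric_command("reach_ms", 12.5, "ms")))
```

Keeping command construction separate from `report` lets the freshness and sanity logic be checked on a machine without Ganglia installed.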


Monitoring at GPO: Status of core VLANs


Monitoring at GPO: FlowVisor slice status


Summary

• Spiral 3 ops goals:
  – Test operations across several unaffiliated campuses
  – Ramp up GENI-wide experiment support
• GPO is trying to be an example campus, but there are many others
• If you do only two things, please:
  – Join [email protected]
  – Make sure we ([email protected]) know what you would like to support this year, and what we can do to help