Sanger, upcoming Openstack for Bio-informaticians

Sanger and our upcoming flexible compute platform Peter Clapham - Jan 2017

Transcript of Sanger, upcoming Openstack for Bio-informaticians

Page 1: Sanger, upcoming Openstack for Bio-informaticians

Sanger and our upcoming flexible compute platform

Peter Clapham - Jan 2017

Page 2: Sanger, upcoming Openstack for Bio-informaticians

Why a private cloud ?

Collaboration is hard enough already

HPC is a weak security model

Cat 4 data is a large elephant

We’re reaching the limits of POSIX scalability

Increasing demand for more flexibility regarding operating systems and supplied libraries

Running services at scale should be able to burst to meet demand and collapse when no longer required

We should be able to more readily take advantage of developing technology

Linking up with common standards across the broader community.

Page 3: Sanger, upcoming Openstack for Bio-informaticians

Openstack at Sanger.

July 2015 - Development Juno system.
September 2015 - Limited access POC Kilo system ( using Triple-O ).
January 2016 - Hybrid cloud for commercial entities.
June 2016 - Wider access POC Kilo system ( Triple-O ).
Sep -> Dec 2016 - First production Liberty system ( Triple-O ).

Page 4: Sanger, upcoming Openstack for Bio-informaticians

Production openstack (I)

• 107 Compute nodes (Supermicro) each with:
  • 512 GB of RAM, 2 * 25 Gb/s network interfaces,
  • 1 * 960 GB local SSD, 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz )

• 6 Control nodes (Supermicro) allow 2 openstack instances.
  • 256 GB RAM, 2 * 100 Gb/s network interfaces,
  • 1 * 120 GB local SSD, 1 * Intel P3600 NVMe ( /var )
  • 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz )

• Total of 53 TB of RAM, 2996 cores, 5992 with hyperthreading.
• Red Hat Liberty deployed with Triple-O
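The totals follow directly from the per-node figures for the compute nodes. A minimal sketch of the arithmetic, using only numbers from this slide (the variable names are illustrative, and nothing here queries a live system):

```ruby
# Per-node figures for the compute nodes, as listed on the slide.
compute_nodes    = 107
sockets_per_node = 2
cores_per_socket = 14     # Intel E5-2690 v4
ram_per_node_gb  = 512

physical_cores = compute_nodes * sockets_per_node * cores_per_socket
hyperthreads   = physical_cores * 2          # two hardware threads per core
total_ram_gb   = compute_nodes * ram_per_node_gb
total_ram_tb   = total_ram_gb / 1024.0       # treating 1 TB as 1024 GB

puts "physical cores: #{physical_cores}"
puts "hyperthreads:   #{hyperthreads}"
puts "RAM:            #{total_ram_gb} GB (#{total_ram_tb.round(1)} TB)"
```

The control nodes are left out of the calculation, which matches the quoted 2996 cores and roughly 53 TB of RAM.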

Page 5: Sanger, upcoming Openstack for Bio-informaticians

Production openstack (II)

• 9 Storage nodes (Supermicro) each with:
  • 512 GB of RAM,
  • 2 * 100 Gb/s network interfaces,
  • 60 * 6 TB SAS discs, 2 system SSDs.
  • 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz )
  • 4 TB of Intel P3600 NVMe used for journal.

• Ubuntu Xenial.
• 3 PB of disc space, 1 PB usable.
• Single instance ( 1.3 GBytes/sec write, 200 MBytes/sec read )
• Ceph benchmarks imply 7 GBytes/second
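The raw and usable capacity can be checked with a short calculation. The slide does not state the Ceph replication factor; the sketch below assumes three-way replication, which is consistent with 3 PB raw giving about 1 PB usable:

```ruby
# Disc counts and sizes are from the slide.
storage_nodes  = 9
discs_per_node = 60
disc_size_tb   = 6
replicas       = 3   # ASSUMPTION: three-way replication, not stated on the slide

raw_tb    = storage_nodes * discs_per_node * disc_size_tb
usable_tb = raw_tb / replicas

puts "raw:    #{raw_tb} TB (~#{(raw_tb / 1000.0).round(2)} PB)"
puts "usable: #{usable_tb} TB (~#{(usable_tb / 1000.0).round(2)} PB)"
```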

Page 6: Sanger, upcoming Openstack for Bio-informaticians

Production openstack (III)

• 3 Racks of equipment, 24 kW load per rack.
• 10 Arista 7060CX-32S switches.
  • 1U, 32 * 100 Gb/s -> 128 * 25 Gb/s.
  • Hardware VXLAN support integrated with openstack *.
  • Layer two traffic limited to rack, VXLAN used inter-rack.
  • Layer three between racks and interconnect to legacy systems.
  • All network switch software can be upgraded without disruption.
  • True Linux systems.
  • 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems.

(* VXLAN in the ML2 plugin is not used in the first iteration)

Page 7: Sanger, upcoming Openstack for Bio-informaticians

But what are we providing?

CloudForms: service-driven access

OpenStack Horizon: granular control over instance

Direct API access

Direct https access from anywhere

Accessible only from within Sanger

Ceph Object Storage (used to provide volume and image storage)

S3 Object Storage Layer

Page 8: Sanger, upcoming Openstack for Bio-informaticians

How Does this Fit with Existing Services?

OpenStack “Bubble”
  Compute
  Ceph
  100 Gb/s SDN network infrastructure

Sanger internal systems

Access to secured services, i.e.
  iRODS
  Databases
  CIFS (Windows shares)

S3 API access
OpenStack API and GUI access

80 Gb/s connectivity

No access to:
  NFS
  Lustre

Page 9: Sanger, upcoming Openstack for Bio-informaticians

CloudForms Interface

Page 10: Sanger, upcoming Openstack for Bio-informaticians

Horizon Interface

Page 11: Sanger, upcoming Openstack for Bio-informaticians

Efficient Resource Management

OpenStack resources are managed at a tenant group level.
• Each “tenant” group has an assigned quota for:
  • Disk
  • CPU
  • Memory

Once limits are full, tenant members will either have to wait for resources to become available, or shut down or terminate a running instance.
Initial quotas are agreed with the IC before creation.
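The quota rule above can be sketched as a simple check: a new instance fits only if disk, CPU and memory all stay within the tenant's limits. The `Resources` struct, `can_launch?` and the numbers below are hypothetical and only illustrate the rule; they are not the OpenStack API:

```ruby
# Hypothetical model of a tenant quota; names and values are illustrative.
Resources = Struct.new(:vcpus, :memory_gb, :disk_gb) do
  # Sum two resource sets field by field.
  def +(other)
    Resources.new(vcpus + other.vcpus,
                  memory_gb + other.memory_gb,
                  disk_gb + other.disk_gb)
  end

  # True only if every field is within the given limit.
  def within?(limit)
    vcpus <= limit.vcpus &&
      memory_gb <= limit.memory_gb &&
      disk_gb <= limit.disk_gb
  end
end

# A request fits when current usage plus the request stays inside the quota.
def can_launch?(quota, in_use, request)
  (in_use + request).within?(quota)
end

quota  = Resources.new(64, 256, 2000)
in_use = Resources.new(56, 200, 1500)
small  = Resources.new(4, 16, 100)
large  = Resources.new(16, 64, 100)

puts can_launch?(quota, in_use, small)   # fits in the remaining quota
puts can_launch?(quota, in_use, large)   # would exceed the vCPU limit
```

When the check fails, the options are the ones listed above: wait, or shut down or terminate a running instance to free quota.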

Page 12: Sanger, upcoming Openstack for Bio-informaticians

Quotas, they are not all the same.

Some groups have a requirement that they have an absolute number of spots available for essential services.
Other groups would like to burst to meet demand as required.

These requirements do not fit well with each other.

Page 13: Sanger, upcoming Openstack for Bio-informaticians

The Proposed Workaround

For those projects which require guaranteed access:
• We create a dedicated tenant group that has specific access to a set quota allocation of vCPU, disk and memory.
• This is tied directly against reserved hardware.

This guarantees requested resource will be available when required, whilst providing security, operating system flexibility and instance management.

BUT there is no ability to use more than the requested allocation

Page 14: Sanger, upcoming Openstack for Bio-informaticians

Dynamic Workflows

Dynamic workflows can expand to meet demand and collapse when not required.
So a quota that matches the initial resource request will mean constantly under quota’ing the system.
For the initial release we will start by:
• Overcommitting CPU by 1.5 : 1 (available total vCPU ~9000)
• Overallocating quotas so that 115% of the overcommitted vCPU is available to tenants.

So there is some initial ability to use more of the system than may be available.
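A worked version of these two ratios, as a sketch. It assumes the starting point is the 5992 hardware threads of the compute nodes, which is what makes the "~9000" figure come out; the slide does not spell this out:

```ruby
hyperthreads       = 5992              # ASSUMPTION: compute-node hardware threads
overcommit         = Rational(3, 2)    # 1.5 : 1 CPU overcommit
quota_overallocate = Rational(115, 100)

total_vcpus = (hyperthreads * overcommit).to_i          # vCPUs the scheduler can hand out
quota_vcpus = (total_vcpus * quota_overallocate).floor  # sum of all tenant vCPU quotas
headroom    = quota_vcpus - total_vcpus                 # quota that cannot be met at once

puts "overcommitted vCPUs: #{total_vcpus}"
puts "total tenant quota:  #{quota_vcpus}"
puts "oversubscribed by:   #{headroom}"
```

Rationals are used instead of floats so the rounding is exact.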

Page 15: Sanger, upcoming Openstack for Bio-informaticians

For More Details, see

https://docs.google.com/a/sanger.ac.uk/document/d/17z9urhh3bTLRhQo9b8CcsZW_3O7cxlGY9uiwpAS_GqQ/edit?usp=sharing
Or
http://tinyurl.com/zzurp5s

We are adding monitoring and metrics gathering to the system. This will provide a feedback loop for quota and project management.

Page 16: Sanger, upcoming Openstack for Bio-informaticians

New Opportunities for Application Development

Cloud application development aims to scale out compute and provide:

• Auto scaling of key services
• Making pipelines cost effective on commercial platform providers
• Self-healing of service components that fail
• Creating resilient services with reduced impact when service components fail
• Not tied to any one specific environment
• Enabling sharing of code, images and services with collaborators. This can dramatically reduce the need to copy large data sets around the world and permit running complex pipelines where the data resides.

Page 17: Sanger, upcoming Openstack for Bio-informaticians

How do we see Migration ?

Page 18: Sanger, upcoming Openstack for Bio-informaticians

Initial Early Adopters.

We have some early adopters !

1. Mutational signatures
2. Imputation service
3. Blast service
4. Pan-prostate

We look forward to hearing more from these groups soon !

Page 19: Sanger, upcoming Openstack for Bio-informaticians

Mostly Share a Common Approach

Web Interface

Data upload

Run analysis

Update job status database

Present data

Invoke Analysis

Retain a copy

Page 20: Sanger, upcoming Openstack for Bio-informaticians

Adaptation to cloud-based tools

Stage | Current approach | Cloud approach
User details | Local databases, directory services, OAuth | OAuth, directory services
Data downloads | Globus or https | S3, Globus
Job status | RDBMS: MySQL, Oracle or PostgreSQL | NoSQL: MongoDB, Cassandra or Redis
Invoke job analysis | Hand-crafted request to LSF | AMQP
Run analysis | LSF job submission | AMQP, Heat orchestration or API call to OpenStack
Present data | Make available via sftp, Globus or https web upload | S3 automatically generated URLs
Keep data | No consistent approach | S3, archive as required
Service failure | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
Autoscale options | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
Service discovery | Manual | Cloud-init, Heat templates, dynamic DNS

Page 21: Sanger, upcoming Openstack for Bio-informaticians

New service, New Image ?

Cloud software stacks are based around services (micro-services) and are an exemplar of service-orientated architectures.
Instances are mostly started from pre-created images and these form the building blocks for a given service.

Starting with:
• Ubuntu with Docker support
• RStudio
• An NFS server
• OpenLava cluster

But what if you need something different ? You could ask or you could use the tools provided to create your own. Think /software+++

Page 22: Sanger, upcoming Openstack for Bio-informaticians

Developing machine images.

• Start simple and add complexity later.
• We understand that Biologists are not often software engineers.
• We believe that the process of image creation should be codified and software development best practices followed.
• Openstack images are based on images from a vendor.
• It is possible to import other virtualisation system images to Openstack ( these images could be made with automated tools ).
• Virtualisation allows the possibility of software reproducibility.

Page 23: Sanger, upcoming Openstack for Bio-informaticians

Software development

• Source control ( git ), gitflow
• Infrastructure as code ( Packer )
• Continuous Integration ( gitlab CI )
• Test driven development ( test kitchen )

Page 24: Sanger, upcoming Openstack for Bio-informaticians

Git branches (gitflow)

• Gitflow http://nvie.com/posts/a-successful-git-branching-model/
• We follow the principle but do not use the software

• The master branch is always useable.
• New features are integrated on the development branch.
• Develop on a feature branch created from the development branch.
• When a feature is complete pull feature to development branch.
• When a set of features is ready pull development to master and tag release.
• Develop bug fixes on a branch off development and cherry pick to bug release branch created from tag of release.

Page 25: Sanger, upcoming Openstack for Bio-informaticians

Semantic versioning

MAJOR.MINOR.PATCH
http://semver.org/spec/v2.0.0.html

• MAJOR version when you make incompatible API changes,
• MINOR version when you add functionality in a backwards-compatible manner, and
• PATCH version when you make backwards-compatible bug fixes.

We treat changes in environment variables as a change to the “API”.
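The three rules can be written down as a small function. This is a sketch only: `bump` and the change labels `:breaking`, `:feature` and `:fix` are hypothetical names, not part of any tool mentioned here. Under the convention above, an incompatible change to an image's environment variables would count as `:breaking`:

```ruby
# Return the next version string for a given kind of change.
def bump(version, change)
  major, minor, patch = version.split('.').map { |part| Integer(part, 10) }
  case change
  when :breaking then "#{major + 1}.0.0"                 # incompatible API change
  when :feature  then "#{major}.#{minor + 1}.0"          # backwards-compatible addition
  when :fix      then "#{major}.#{minor}.#{patch + 1}"   # backwards-compatible bug fix
  else raise ArgumentError, "unknown change type: #{change.inspect}"
  end
end

puts bump('6.2.3', :fix)       # 6.2.4
puts bump('6.2.3', :feature)   # 6.3.0
puts bump('6.2.3', :breaking)  # 7.0.0
```

Lower-order fields reset to zero whenever a higher-order field is incremented, as the semver specification requires.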

Page 26: Sanger, upcoming Openstack for Bio-informaticians

Packer

• https://packer.io/
• Machine image configuration as code.
• In use by systems at Sanger since 2014 ( used to build Lustre clients )
• Supports multiple virtualization platforms.
• Supports both Linux and Windows.
• Simple example that can be used without CI:
  • https://github.com/wtsi-ssg/image-creation

Page 27: Sanger, upcoming Openstack for Bio-informaticians

Packer, Provisioners

• Provisioners change the state of the machine.
• Provisioners are bits of code written in various languages.
• Multiple provisioners are allowed in an order.
• Can be restricted to specific builds.

• Shell - simple shell scripts.
• File uploads.
• Ansible
• Chef, Puppet, Salt
• Powershell, Windows-Shell

Page 28: Sanger, upcoming Openstack for Bio-informaticians

Packer, Builders

• Builders are responsible for creating machines and generating images from them for various platforms.
• Amazon, takes an image and applies changes.
• Openstack, takes an image and applies changes.
• VMware, uses an ISO and installs, then applies changes.
• Docker, takes a container and applies changes.
• VirtualBox, uses an ISO and installs, then applies changes.
• Others…
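To show how builders and provisioners fit together, here is a sketch that writes out a Packer template in the JSON format. Every value (image name, flavor, file and script paths) is a hypothetical placeholder, and the field names should be checked against the Packer documentation for the version in use. It illustrates an Openstack builder, provisioners that run in the order listed, and one provisioner restricted to a specific build with `only`:

```ruby
require 'json'

# All values below are hypothetical placeholders, not Sanger's configuration.
template = {
  'builders' => [
    {
      'type'         => 'openstack',                    # start from an existing image
      'image_name'   => 'example-base-image',
      'source_image' => 'REPLACE-WITH-SOURCE-IMAGE-ID',
      'flavor'       => 'm1.small',
      'ssh_username' => 'ubuntu'
    }
  ],
  # Provisioners run top to bottom.
  'provisioners' => [
    { 'type' => 'file', 'source' => 'files/motd', 'destination' => '/tmp/motd' },
    { 'type' => 'shell', 'inline' => ['sudo mv /tmp/motd /etc/motd'] },
    # Restricted to the openstack build only.
    { 'type' => 'shell', 'script' => 'scripts/openstack-only.sh', 'only' => ['openstack'] }
  ]
}

puts JSON.pretty_generate(template)
```

Generating the JSON from a script is just one option; the template can equally be written by hand, as in the example repository linked on the Packer slide.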

Page 29: Sanger, upcoming Openstack for Bio-informaticians

Gitlab CI

• Allows processes to be run in response to a push to a repository.
• Configured by a yaml file ( .gitlab-ci.yml )
• A build consists of multiple stages.
• Each stage is run sequentially.
• Parallel execution of tasks in each stage
  • State needs to be stored in separate files/directories ( $CI_BUILD_ID )
• Tags control which processes execute the stage.
• https://about.gitlab.com/gitlab-ci/
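The execution model (stages in sequence, tasks within a stage in parallel) can be illustrated with a toy simulation. This is not how GitLab CI is implemented or configured; the stage and job names are hypothetical, and threads stand in for CI runners:

```ruby
require 'thread'

# Hypothetical pipeline: stage and job names are illustrative only.
PIPELINE = [
  ['build',   %w[build-trusty build-xenial]],
  ['test',    %w[test-trusty test-xenial]],
  ['publish', %w[upload-image]]
].freeze

# Run each stage in order; inside a stage, run all jobs concurrently
# and wait for every one of them before moving to the next stage.
def run_pipeline(pipeline)
  log = Queue.new                          # thread-safe event log
  pipeline.each do |stage, jobs|
    threads = jobs.map do |job|
      Thread.new { log << [stage, job] }   # a real job would do work here
    end
    threads.each(&:join)
  end
  Array.new(log.size) { log.pop }
end

events = run_pipeline(PIPELINE)
events.each { |stage, job| puts "#{stage}: #{job}" }
```

Because jobs in the same stage run at the same time, each one needs its own working directory, which is why the slide suggests keying state on `$CI_BUILD_ID`.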

Page 30: Sanger, upcoming Openstack for Bio-informaticians

Test Kitchen

• http://kitchen.ci/
• Creates new instances to run tests on.
• Drivers for various systems e.g.
  • Amazon
  • Openstack
  • Docker
  • Windows
• Configured with a single file ( .kitchen.yml ) which is an ERB template.

Page 31: Sanger, upcoming Openstack for Bio-informaticians

Test Kitchen

• Each group should have an openstack tenant for CI.
• Credentials are stored in GitLab's variables section.
• Tenant needs to have an ssh security group.
• Tenant needs a single network.
• Configuration is shared in environment variables.
• Supports multiple test frameworks:
  • ServerSpec
  • Bats

Page 32: Sanger, upcoming Openstack for Bio-informaticians

Testing orchestration

Test kitchen can have multiple servers running at one time. Each test runs from a separate directory, which allows us to test client-server systems:
• In a server directory start a machine and run server tests.
• Extract the internal IP address from the master.
• In a client directory start a machine, inject master location.
• Run client tests.
• Stop client, stop master.

Page 33: Sanger, upcoming Openstack for Bio-informaticians

ServerSpec

• RSpec is a behaviour-driven development framework for unit tests.
• ServerSpec allows RSpec tests to check server status.
• E.g.

require 'serverspec'

# Required by serverspec
set :backend, :exec

describe "file system checks" do
  describe file('/data1') do
    it { should be_mounted }
  end
end

Page 34: Sanger, upcoming Openstack for Bio-informaticians

A CI workflow

Page 35: Sanger, upcoming Openstack for Bio-informaticians

image creation

• Our base image.
• Used to make changes that will affect all the images.
• https://github.com/wtsi-ssg/image-creation-ci
• Multiple tags, each tag is a release e.g.
  • v5.0.0 migration from openstack beta to openstack gamma
  • v6.0.0 adding ansible as a system for configuration
  • v7.0.0 adding support for xenial and centos 7.2 as well as trusty.

Page 36: Sanger, upcoming Openstack for Bio-informaticians

ISG repository.

• https://github.com/wtsi-ssg/simple-image-builder
• Continuous Integration and test infrastructure framework already available, additional tests will need writing.
• Chain of software reproducibility relies on
  • Trust that the vendor built an image consistently.
  • Note that operating system packages will be pulled in at the time of creation.
  • Critical components need to be pulled in from a fixed source.
  • Tests should be written to validate the system.

Page 37: Sanger, upcoming Openstack for Bio-informaticians

Batch scheduling is a bit old...

Page 38: Sanger, upcoming Openstack for Bio-informaticians

Openlava image

• A single image which is used for both master/head node and compute nodes.

• Includes NFS server for home directory.
• Currently based on trusty ( Ubuntu 14.04 ).
  • Development branch for Xenial ( Ubuntu 16.04 ).
  • Development branch for Centos 7.2.

• ServerSpec tests using multiple servers.

Page 40: Sanger, upcoming Openstack for Bio-informaticians

New tools and images are already being created internally

WR from Sendu: https://github.com/VertebrateResequencing/wr

NPG are producing an AMQP service image: https://gitlab.internal.sanger.ac.uk
When completed

Page 41: Sanger, upcoming Openstack for Bio-informaticians

And there’s more !

We are listening to your requests for features and supporting infrastructure:
https://docs.google.com/spreadsheets/d/1_oeBz27beLLj_4xe3yoyZYjpaYhDFTE__F_L6pVNcTE/edit
Or
http://tinyurl.com/z5bh5q5

Page 42: Sanger, upcoming Openstack for Bio-informaticians

But also online tutorials, videos and documentation

Some, hopefully, useful examples have been collated here:

https://ssg-confluence.internal.sanger.ac.uk/display/OPENSTACK/Distributed+applications%3A+links+and+resources
Or
http://tinyurl.com/gwbrtfl

Page 43: Sanger, upcoming Openstack for Bio-informaticians

And still more

10th March OpenStack event here at Sanger

Tim Bell from CERN.
• Head of the CERN OpenStack team
• 200,000+ vCPUs
• Many, many PB of Ceph

Final schedule TBA

Page 44: Sanger, upcoming Openstack for Bio-informaticians

Almost done

Release date: On time for March 1st.

Watch out for the upcoming flyers

Page 45: Sanger, upcoming Openstack for Bio-informaticians
Page 46: Sanger, upcoming Openstack for Bio-informaticians

Acknowledgements

Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon.
Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz.
All our early testers and those who have provided constructive feedback !

Page 47: Sanger, upcoming Openstack for Bio-informaticians

P.S.

11 more days to migrate from Lustre 108, 109, 110 and 111 before the system is made read only.

And only 1 month (1st March) until the old Lustre systems are securely wiped and ready for removal from campus.

Remember, Lustre is not backed up.