pp_nextgencloudcon 120915 FINAL v2

@NexGenCloudCon @TheChannelCo #NGC15 1

Cloud Adoption @ PayPal

10+ Availability Zones
10k+ hypervisors
55k+ VMs
200k+ cores
OpenStack releases: Havana, Kilo
12+ PB storage
100% KVM
100% OVS

• PayPal’s OpenStack-based private cloud serves 160+M customers for payment, website interactions, mobile and more.
• 100% of PayPal web-tier and mid-tier services run on the OpenStack cloud.

Build

Scale

Operate

Learnings

Business Requirements

The journey to cloud began in 2012 in response to specific business asks.

Business Agility, Cost Efficiency, Service Quality

+ Reduce time spent between “code to Live to Site (LTS)”
+ Rapid elasticity (on-demand flex-up)
+ Self-service
+ Standardization
+ Automation
+ Time to operationalize
+ MTTD, MTTR

Goal: Enable the Developer

1. Design/Code
2. Build/Test
3. Test/Integrate
4. Deploy/Monitor

Goal: Enable the Business

1. Cloud as the interface for the data center
2. Improve efficiency in managing apps
3. Lights Out Management (LOM)

Guiding Principles

+ Adopt Open Source Where Possible
+ Avoid Vendor Lock-In
+ Automate, automate

Building the Cloud: Timeline

1. Provide a frictionless end-to-end PDLC experience to developers via a secure, reliable and ubiquitous cloud.
2. Improve operational efficiency via informed capacity planning, optimized infrastructure utilization, granular usage visibility and end-to-end automation.

Build

Scale

Operate

Learnings

2012

+ Dev/Test Cloud
+ Less Than a Rack of Compute
+ Handcrafted by an Engineer
+ Supported by Another Engineer
+ Zero Automation

2015

+ Thousands of Nodes
+ Distributed Across Several AZs
+ Automated
+ Operated 24x7
+ Running the Business

Cloud BackOffice

+ Reclaim
+ Capacity management
+ Remediate
+ Onboard
+ Monitor

1. Automate

1. Fully Automate Deployments
2. Take Automation as a Product Feature
3. Measure Outcomes with KPIs

Well defined and agreed upon.

Road Map for patching, upgrade, Sprints, Bugs, Backlog, Releases

Time to Deploy, Time to Recover, Time to Rollout a Change
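As an illustration of how a KPI such as Time to Recover might be measured, here is a minimal Python sketch. The record layout, the timestamps, and the `mean_duration` helper are hypothetical and not taken from the talk; the same function would apply to Time to Deploy or Time to Rollout a Change given matching start/end pairs.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(events):
    """Average elapsed time across a list of (start, end) datetime pairs."""
    seconds = [(end - start).total_seconds() for start, end in events]
    return timedelta(seconds=mean(seconds))


# Hypothetical incident records: (detected_at, recovered_at).
incidents = [
    (datetime(2015, 12, 1, 10, 0), datetime(2015, 12, 1, 10, 30)),
    (datetime(2015, 12, 3, 14, 0), datetime(2015, 12, 3, 14, 50)),
]

time_to_recover = mean_duration(incidents)
print("Mean time to recover:", time_to_recover)
```

Tracking the same figure release over release is what turns it into a KPI rather than a one-off measurement.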

Scope of Automation

Agility

+ VM + BM on Demand
+ Storage on Demand
+ Network on Demand
+ Platform services on Demand

OE + Efficiency

+ Data Integrity
+ On-Boarding
+ Supply Chain and Capacity
+ Asset Remediation

2. Manage Drift

Drift

+ Automation Gaps
+ Transitional
+ Debugging Habits
+ Incidents

Incidents Waiting to Happen

+ Impacts Time to Recover
+ Impacts Customers
+ System is in an Adverse State

Managing Drift: Mitigation

1. Automated Audits
2. Drift Tracking
3. Mitigation as a Planned Routine Activity
4. Culture – Reward Good Habits
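An automated audit of this kind can be sketched as a comparison between a host's desired configuration and what is actually found on it. The Python below is a minimal illustration only: the `audit_drift` function, the setting names, and the values are all hypothetical and do not describe PayPal's tooling.

```python
def audit_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every mismatch.

    A setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical desired state versus the state observed on one hypervisor.
desired = {"kernel": "3.13.0", "ovs_version": "2.3", "ntp": "enabled"}
actual = {
    "kernel": "3.13.0",
    "ovs_version": "2.1",    # upgrade never rolled out to this host
    "ntp": "enabled",
    "debug_shell": "open",   # left behind after a debugging session
}

report = audit_drift(desired, actual)
for setting, (want, have) in sorted(report.items()):
    print(f"{setting}: expected {want!r}, found {have!r}")
```

Running such a check on a schedule and storing each report gives the drift-tracking history that planned mitigation can then work through.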

3. Multi-tier oversight

‒ Self Service
‒ Self healing systems
‒ Lights Out Management
‒ Operations lifecycle discipline

Build

Scale

Operate

Learnings

Operator’s Dilemma

+ Agility
+ Security
+ Availability
+ Cost
+ Scale

Scaling Cloud (for optimal usage)

IaaS

‒ Horizontal scaling
‒ Flex up capacity
‒ Oversubscribe

PaaS

‒ Flex up application pools
‒ Manage workloads as Mesos clusters and Docker containers
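To make the interplay between oversubscription and flexing up capacity concrete, here is a minimal Python sketch of the arithmetic. The function names, the 2.0 allocation ratio, and the host sizes are assumptions chosen for illustration; the talk does not state PayPal's actual ratios.

```python
import math


def vcpu_capacity(physical_cores, allocation_ratio):
    """vCPUs that can be scheduled on a host at the given oversubscription ratio."""
    return int(physical_cores * allocation_ratio)


def hypervisors_to_add(requested_vcpus, hypervisors, cores_per_hypervisor,
                       allocation_ratio):
    """Extra hosts needed (horizontal scaling) to satisfy a vCPU request."""
    per_host = vcpu_capacity(cores_per_hypervisor, allocation_ratio)
    shortfall = requested_vcpus - hypervisors * per_host
    return max(0, math.ceil(shortfall / per_host))


# Hypothetical pool: 10 hosts with 20 cores each, oversubscribed 2:1.
print(vcpu_capacity(20, 2.0))
print(hypervisors_to_add(1000, 10, 20, 2.0))
```

A higher ratio lowers the number of hosts needed at the cost of more contention per core, which is one face of the operator's trade-off between cost and availability.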

Thank you.