Continuous availability: from the shift paradigm to unmanned operation. Pietro Tiberi 17 January 2018 – TIPS Contact Group


Page 1: Continuous availability: from the shift paradigm to unmanned operation (07 November 2017 – CMG Impact 2017)

Continuous availability: from the shift paradigm to unmanned operation.

Pietro Tiberi

17 January 2018 – TIPS Contact Group

Page 2

Agenda

1 Introduction
2 Continuous Availability
3 Results
4 Conclusions and perspective

Page 3

Introduction: TIPS non-functional requirements - Reliability / Availability

Transactions lost: none (RPO = 0)

Downtime: at most 15 minutes (RTO = 15 minutes)

Availability: 99.9%
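To put the 99.9% figure next to the 15-minute RTO, the downtime it allows can be worked out directly. The sketch below is illustrative only: the measurement periods (a 365-day year and a 30-day month) and the function name are assumptions, since the slide does not say over which period availability is measured.

```python
def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Maximum downtime (in minutes) allowed over a period at a given availability."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60.0


# Assumed measurement periods (not stated in the slides).
HOURS_PER_YEAR = 365 * 24   # ignores leap years
HOURS_PER_MONTH = 30 * 24   # assumes a 30-day month

yearly = downtime_budget_minutes(99.9, HOURS_PER_YEAR)
monthly = downtime_budget_minutes(99.9, HOURS_PER_MONTH)
rto_minutes = 15            # RTO from the slide

print(f"yearly budget:  {yearly:.1f} min")    # 525.6 min
print(f"monthly budget: {monthly:.1f} min")   # 43.2 min
# How many full RTO-length outages fit in the yearly budget.
print(f"RTO-length outages per year: {int(yearly // rto_minutes)}")  # 35
```

Under these assumptions, a 99.9% target leaves room for roughly 35 outages per year if each one is recovered exactly at the 15-minute RTO.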

Page 4

Introduction: Datacenter Operations

Human based (on shifts)

Unmanned

Page 5

CONTINUOUS OPERATION


Page 6

Continuous Availability: From high availability to continuous availability

o Redundancy

o Fault Tolerance

o Clustering

o Active-Active configuration

o Proactive monitoring

o Continuous delivery

o Automatic remediation

o Dynamic capacity management

Page 7

Continuous Availability: Proactive Monitoring

o Infrastructure monitoring

o Application monitoring

o Detect events before failures

o Trigger automatic actions

o Analyze the event
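The detect, trigger, analyze sequence in the bullets above can be sketched as a small monitor that fires an automatic action when a metric crosses an early-warning level and keeps the event for later analysis. This is a minimal sketch, not the tooling used in the talk: the `Monitor` class, the 80% CPU warning level and the "scale out" action are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Monitor:
    """Hypothetical early-warning monitor for a single metric."""
    warn_threshold: float                      # assumed early-warning level
    on_warning: Callable[[float], None]        # automatic action to trigger
    events: List[float] = field(default_factory=list)  # kept for later analysis

    def observe(self, value: float) -> bool:
        """Record a sample; trigger the action if it reaches the warning level."""
        if value >= self.warn_threshold:
            self.events.append(value)   # keep the event so it can be analyzed
            self.on_warning(value)      # trigger the automatic action
            return True
        return False


actions: List[str] = []
monitor = Monitor(
    warn_threshold=80.0,  # assumed value, chosen below the point of failure
    on_warning=lambda v: actions.append(f"scale out at {v:.0f}% CPU"),
)

# Made-up CPU utilisation samples (percent).
for cpu in [35.0, 50.0, 82.0, 91.0, 60.0]:
    monitor.observe(cpu)

print(actions)          # ['scale out at 82% CPU', 'scale out at 91% CPU']
print(monitor.events)   # [82.0, 91.0]
```

Setting the warning level below the failure point is what makes the monitoring proactive: the action runs while the service is still healthy.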

Page 8

Continuous Availability: IT Automation

Page 9

Continuous Availability: From Agile to DevOps

Page 10

Continuous Availability: DevOps - Everything as Code

Code

Virtual Infrastructure

Page 11

Continuous Availability: Dynamic Capacity Management

o Consumption trend analysis

o Resource utilization rate optimization

o What-if scenarios

o Predict future requirements and trends
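Trend analysis and what-if scenarios of the kind listed above can be illustrated with a least-squares line fitted to past consumption and extrapolated to a capacity limit. This is a sketch under stated assumptions: the usage samples are made up, the function names are hypothetical, and a straight line is only one possible trend model.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, s0), (1, s1), ...; returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(capacity: float, samples: Sequence[float]) -> Optional[float]:
    """Periods after the last sample until the trend line reaches capacity."""
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None  # flat or falling consumption never reaches the limit
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


# Made-up consumption per period (for example, percent of storage used).
usage = [40.0, 44.0, 48.0, 52.0, 56.0]

print(fit_linear_trend(usage))      # (4.0, 40.0)
print(periods_until(80.0, usage))   # 6.0
# What-if scenario: capacity raised to 100 units.
print(periods_until(100.0, usage))  # 11.0
```

Changing the capacity argument is the simplest form of a what-if scenario: the same trend, evaluated against a different resource limit.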

Page 12

Page 13

Test Plant Architecture

[Architecture diagram]

Users: User A, User B

Application Layer: Message Router, Message Processor, Message Router

Message Layer: Kafka Broker

Database Layer: Aerospike Database

Arrow labels: write, read, store, put, get

Page 14

Results: Test Architecture

Specific tests to verify the relevant domain functions.

Common simulation layer to reproduce the real operational environment.

executed on

Page 15

Results: Simulation – continuous delivery (1)

Normal traffic condition (500 msg/s), timeout = 10,000 ms (10 s)

Kafka cluster rolling update

0 messages lost

0 timeouts expired

SIMUL.APP.01 : message latency (1 sec average)
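The reason a rolling update can complete with no message loss is that only one node is taken down at a time, so a sender can always fall back to a node that is still online. The toy model below illustrates that idea only. It is not Kafka's replication or client protocol, and the `Broker` class, the three-node cluster and the 500 messages per step are all assumptions made for the example.

```python
from typing import List, Tuple


class Broker:
    """Toy stand-in for a cluster node (not a real Kafka broker)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.log: List[int] = []

    def append(self, msg: int) -> bool:
        """Store the message if the node is online; report success."""
        if not self.online:
            return False
        self.log.append(msg)
        return True


def send(brokers: List[Broker], msg: int) -> bool:
    """Try brokers in order; any() stops at the first one that accepts."""
    return any(b.append(msg) for b in brokers)


def rolling_update(brokers: List[Broker], messages_per_step: int) -> Tuple[int, int]:
    """Take each broker offline in turn while traffic continues; return (sent, lost)."""
    sent = lost = 0
    for target in brokers:
        target.online = False           # node under update
        for _ in range(messages_per_step):
            if not send(brokers, sent):
                lost += 1
            sent += 1
        target.online = True            # update finished, node rejoins
    return sent, lost


cluster = [Broker("b0"), Broker("b1"), Broker("b2")]
sent, lost = rolling_update(cluster, messages_per_step=500)
stored = sum(len(b.log) for b in cluster)
print(sent, lost, stored)   # 1500 0 1500
```

The model says nothing about latency: in the real tests the cost of the update shows up in the message latency plots and, under heavy traffic, in expired timeouts.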

Page 16

Results: Simulation – continuous delivery (2)

Heavy traffic condition (2000 msg/s), timeout = 10,000 ms (10 s)

Kafka cluster rolling update

0 messages lost

some timeouts expired

SIMUL.APP.02 : message latency (1 sec average)

Page 17

Results: Simulation – proactive monitoring

Normal traffic condition (500 msg/s)

Average E2E processing time = 45 ms

High vCPU load added to Message Processor nodes.

T0-T1: below threshold

T2-T3: threshold exceeded

Page 18

Conclusions and perspective

Phased Approach

Bi-modal Data Center

Tool

Page 19

Continuous availability: from the shift paradigm to unmanned operation.

Pietro Tiberi

Thanks for your attention