Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes...

42
Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production [email protected] [email protected] KubeCon - Berlin, 29-30 March 2017

Transcript of Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes...

Page 1: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production

[email protected] [email protected]

KubeCon - Berlin, 29-30 March 2017

Page 2: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

An inspiring travel company ..

Page 3: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

A tech company to the core

Tech department: 300+ people

Applications: ~100

Database: 4 TB of data

Servers: 1400 VMs, 300 physical machines

Locations: Chiasso, Milan, Madrid, London, Bengaluru

Page 4: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Business: "technology is slow"

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

Page 5: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Technology: "the monolith is the problem"

https://www.flickr.com/photos/southtopia/5702790189

Page 6: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

"... let’s break into microservices!"

Page 7: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

A lot of issues

● LONG provisioning time

● LACK OF alignment across environments

● LACK OF alignment across applications

● LACK OF awareness about ops

Page 8: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business● throwing away our whole datacenter

Page 9: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Our infrastructure and our architecture

Page 10: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Virtualization platform

TONS

OF

VIRTUAL MACHINES

Page 11: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Virtualization platform

TONS

OF

VIRTUAL MACHINES

Page 12: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

● CoreOS, the all-in-one choice

○ Cloudconfig configuration

○ Automatable in a shot

○ Really simple patch management

Engage

Page 13: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Our Kubernetes on CoreOS architecture is born

● The stack○ ETCD○ FLANNELD○ DOCKER

● KUBERNETES (Google!)

K8S

DOCKER

FLAN

NE

LD

ET

CD

CoreOS

Po

dP

od

Po

d

Server

Page 14: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

NODE 2

NODE 1

NODE 2

NODE 1

How to talk with pods

NGINX

NGINX

Pod

Pod

Pod

Pod

Pod

np

np

np

Pod

Pod

Pod

Proxy

np

np

Pod

Pod

Podnp

Proxy

Proxy

Proxy

F5 F5

tcp http

NodePort Ingress

Page 15: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

In the name of service

- host: awesomeservice.prd.mykubecluster.intra http: paths: - path: / backend: serviceName: awesomeservice servicePort: 8081

awesomeservice-ingress.yaml

Page 16: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

In the name of service

*.[prd|qa|dev].mykubecluster.intra. IN CNAME kubef5ingress

Page 17: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

The return of NodePort

np

np

Pod

Pod

Pod

np Proxy

NODE n

F5 TLS TLSTLS

tcp

TLS

Page 18: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

The registry brought another question...

?

Page 19: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Seriously?

Page 20: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Rear window on kubernetes

Server

graphite

OScollectd

image

Nagios first Grafana 4 now

icons from http://www.flaticon.com

Kube API

Page 21: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

We were happy!

Page 22: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Not happy anymore

Page 23: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Seriously?

Page 24: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

The change… It’s a kind of magic

KEEP CALM

andTRUST KUBERNETES

Page 25: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

What we learned

Lots of things!

Page 26: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

The final architecture (so far…)

K8S

DOCKER

FLAN

NE

LD

ET

CD

Ubuntu

Po

dP

od

Po

d

F5

OU

TSID

EKU

BERN

ETES INSIDE KUBERNETES:

3 different environments7 MASTERS

2 REGISTRYs+ 70 PHYSICAL NODES

+ 47 ETCDs+ 7 DNS

+ 140 Namespaces+ 1300 PODs

ingress

Page 27: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Our infrastructure and our architecture

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Page 28: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Our core axioms

● same architecture across environments● a common framework to align software● centralized monitoring/logging, with alerts● zero downtime deployment ● automation everywhere

Page 29: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

APP3-PRODUCTION

Kubernetes: our architecture

APP2-PRODUCTIONAPP1-PRODUCTION

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-PREVIEW

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-DEVELOPMENT

APP3-PRODUCTIONAPP2-PRODUCTIONAPP1-QA

nonproductionproduction

Page 30: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Kubernetes: our architecture and choices

APP1-PRODUCTION

deployment

replica-set

app1-production.prd.mykubecluster.intra

secret configmap

POD-3POD-2POD-1

production

Page 31: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

"To ingress or not to ingress? .."

NODE 1

NODE 2

NODE 3

● easier DNS management● customizable proxy server

● 3rd party tool● requires external sync● all requests go through it● reload risks

F5

NGINX

NGINX

Page 32: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

APP1-PRODUCTION

Kubernetes: our architecture and choices

POD

production

applicationfluentdcollectd

carbon

Page 33: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

APP1-PRODUCTION

POD

Monitoring and alerting: grafana/graphitecluster

graphite

applicationcollectd

Grafana 4

icons from http://www.flaticon.com

carbon

Page 34: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Zero downtime (1): graceful shutdown

lifecycle: preStop: exec: command: ["/stop_helper.sh"]

deployment.yaml

#!/bin/bash

wget http://localhost:8002/stop

stop_helper.sh

Page 35: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Zero downtime (2): graceful startup

private CompletableFuture run(Stream<CompletableFuture> startupJobs) {

return allOf(startupJobs.toArray(CompletableFuture[]::new)) .thenAccept(this::raiseReadinessUp) .exceptionally(this::shutdown);

}

JobsExecutor.java

Page 36: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Automate everything: pipeline DSL

microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1",2).fromGitRepo("git.lastminute.com/team/application")

lmn_deployCanaryStrategy(microservice,"qa") lmn_deployCanaryStrategy(microservice,"preview")lmn_deployCanaryStrategy(microservice,"production")

pipeline

Page 37: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Automate everything: pipeline

pulljar

builddocker

(gate)

QAcanary

(gate)

QAstable

(gate)

PREVcanary

(gate)

PREVstable

(gate)

PRODcanary

(gate)

PRODstable

● git push○ continuous integration○ continuous delivery

Page 38: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

https://www.flickr.com/photos/ghost_of_kuji/2763674926

.. failure ..

Page 39: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

nginx ingress controller problem

10.0.0.1

10.0.0.2

10.0.0.3

10.0.0.4

NGINX

NGINX

NGINX

10.0.0.5

10.0.0.6

F5

NGINX

NGINX

NGINX

NGINX

NGINX

Page 40: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

There’s light .. at the end

Page 41: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

● 20K req/sec in the new cluster● 2M metrics/minute flows● 10 minutes to create a new environment ● whole pipeline runs in 16 minutes

○ 4 minutes to release 100 instances of a new version

Give me the numbers .. again!

Page 42: Tales from Lastminute.com machine room: our journey towards a full on-premise kubernetes architecture in production (KubeCon Berlin 2017)

Yes, we’re hiring!

THANKScareers.lastminutegroup.com