The 1 Year and 1 hour Capacity Plan in the Drupal World


About me

● Principal SRE @Acquia (Cloud Data Team)

● Joined in December 2011

● Location: Lisbon, Portugal

● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)

● Founder and Lead of the Portuguese Drupal Association

● Fun Facts:

○ Presented at DevOps events, including DrupalCons.

○ Dedicated father of 2 kids and still manages to study and write.

○ First Linux installation: Slackware in 1994.

○ Former theatre actor.

Agenda

The problem

What is Capacity

Why do Capacity Planning

Relation to Site Reliability Engineering

Budget & Capacity Planning

Load Testing

Performance Tuning vs. Capacity Planning

What to measure

How to measure

How to track capacity

Forecasting

First Easy Steps

Conclusions


The Problem

Site Launch & User Expectations

Falcon Heavy launch, SpaceX

Typical Drupal Site Launch

What about Capacity Planning??

- Disable devel

- Configure cron

- Check the upload sizes & execution time

- Check recipient email addresses

- Set the file permissions

- Protect your root account

- Check permissions

- Turn off error reporting

- Handle 404 errors gracefully

- Check robots.txt

- Combine Pathauto with Global Redirect

- Create a maintenance page

- Configure caching

- CSS and JavaScript optimisation

- Check unpublished content is not visible

- Configure statistics

- Monitor the site

- **Plan for failure**


User Expectations

Drupal click screenshot

● The end goal of capacity planning is a smooth and speedy experience for the users

● Expectations vary depending on the type of application and which portion of it users interact with

No silver bullet

● A website can have plenty of capacity and still be slow or unavailable

● Capacity is only one part of making the end-user experience fast

● We want to measure and track to make forecasts

● An intolerable amount of latency should raise a flag


What is Capacity

resources required to run your services in the context you have chosen to run them

Carbon Fiber Tank, SpaceX

Capacity in Site Reliability Engineering (SRE)

● Capacity: The maximum amount of output a product deployment is capable of completing in a given period of time

● Capacity planning: Process that determines the resources needed, like people, instances, CPU, memory, time and more, for the company to meet changing demands for its services

● In the Drupal world we focus mostly on web-serving capacity


Resource management

The Art of Capacity Planning, Arun Kejariwal and John Allspaw, O'Reilly Media, Inc.

● Ensure proper resources are available to handle load

● Define procurement and an approval process

● Justify capital needs

● Manage resources after deployment

Why do Capacity Planning

Kroger grocery store, Lexington Kentucky, 1947, by Brett Streutket


Quick and Dirty Math

● Only spend as much as you actually need

● Be ahead of sharp growth

● Avoid emergencies

Stay Fast and Reliable

Site Reliability Engineering

Rocket Laboratory, 1952, NASA/William A. Bowles


Ben Treynor - Google

"...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)..."

Demand Forecasting and Capacity Planning

● Ensuring that there is sufficient capacity and redundancy

● Serve projected future demand with the required availability

● Ensure the required capacity is in place by the time it is needed

● Take both organic and inorganic growth into account

https://unsplash.com/photos/mexeVPlTB6k


How SRE advocates for Capacity Planning

● Perform regular load testing

● Incorporate SLOs on Capacity

● Capacity is critical to availability, therefore the SRE team leads capacity planning initiatives and provisioning

https://unsplash.com/photos/DX9X0g0Cg88

Budget & Capacity Planning

Vintage Grow Your Money by Chris Potter, ccPixs.com


Keeping the costs low

● Meet with Finance, Engineering and Product

● Gather Systems and Application metrics

● Use that data to justify the investment

Three forces that impact Capacity Planning

[Diagram: Product, Finance and Engineering around the Plan]

Load Testing

“Hope is not a strategy”

St. Margrethen - Load Test by Kecko


Load testing a Drupal stack

● How to load test? “Hit it until it breaks”

● Include the points of failure in the calculations

● Determining backend limits can be tricky

● Use those resource ceilings as a basis while predicting future growth

https://docs.acquia.com/acquia-cloud/arch/
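A minimal sketch of the "hit it until it breaks" idea, written as an assumption-laden illustration and not as the tooling used in the talk. The names `run_step`, `find_ceiling` and `send_request` are hypothetical, as are the 1% error and 500 ms latency limits. `send_request` is any callable that performs one request against the stack and returns True on success.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, concurrency, requests_per_worker=5):
    """Run one load step; return (error_rate, worst_latency_seconds)."""
    def worker(_worker_id):
        results = []
        for _attempt in range(requests_per_worker):
            start = time.perf_counter()
            try:
                ok = bool(send_request())
            except Exception:
                ok = False  # any exception counts as a failed request
            results.append((ok, time.perf_counter() - start))
        return results

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        batches = list(pool.map(worker, range(concurrency)))
    flat = [item for batch in batches for item in batch]
    errors = sum(1 for ok, _latency in flat if not ok)
    return errors / len(flat), max(latency for _ok, latency in flat)


def find_ceiling(measure, steps, max_error_rate=0.01, max_latency_s=0.5):
    """Return the last concurrency level that stayed within both limits.

    measure(concurrency) must return (error_rate, worst_latency_seconds);
    returns None if even the first step breaks the limits.
    """
    last_good = None
    for concurrency in steps:
        error_rate, worst_latency = measure(concurrency)
        if error_rate > max_error_rate or worst_latency > max_latency_s:
            break
        last_good = concurrency
    return last_good
```

Against a real stack this could be wired together as `find_ceiling(lambda c: run_step(my_request, c), [10, 20, 40, 80])`, where `my_request` is your own function. The level it returns is the resource ceiling to carry into the forecasts.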

A Few Load Testing Tools

Simulate

● Loadrunner

○ http://bit.ly/microfocus-loadrunner

● Iago

○ https://github.com/twitter/iago

● JMeter

○ http://jmeter.apache.org/

Collect

● Prometheus

○ http://www.prometheus.io/

● Signalfx

○ http://www.signalfx.com/

● Cacti

○ http://cacti.net

● Ganglia

○ http://ganglia.info

● Nagios

○ http://nagios.org/

https://www.gocomics.com/calvinandhobbes/1986/11/26


Performance Tuning vs. Capacity planning

(different goals)

Top Speed by Alexander Nie

What to measure (defining the metrics)

End-of-life by Dennis van Zuijlekom


Divide & Conquer

● Splitting nodes

● Understand capacity demands of each node

● Measure more distinctly

● How requests or queries per second affect resources

Identifying the key resources to measure

● Disk space (MB)

● Disk throughput (IOPS)

● CPU performance (FLOPS)

● RAM memory (MB)

● Network bandwidth (Mbps)

● Network IP pool (Netmask)

● Others


How to measure

Living Computer Museum, Seattle

Tools to measure on Linux servers: http://www.brendangregg.com/Perf/linux_perf_tools_full.png


Collecting resources on web servers


● Example script that sends metrics to statsd

● Low footprint using /proc, df and ps

● For a continuous, reliable monitoring service, use collectd (https://collectd.org) or Telegraf (https://www.influxdata.com/time-series-platform/telegraf/)
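The script shown on the slide is not part of this transcript, so the following is only a sketch of what a low-footprint collector along those lines might look like. It reads `/proc/loadavg` and `/proc/meminfo`, uses Python's `shutil.disk_usage` in place of `df`, and sends plain statsd gauges (`name:value|g`) over UDP. The `web01` prefix, the metric names, and the statsd address `127.0.0.1:8125` are assumptions.

```python
import shutil
import socket


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into {field_name: integer_value}."""
    values = {}
    for line in text.splitlines():
        name, _sep, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def statsd_gauge(name, value):
    """Format one metric as a statsd gauge datagram: <name>:<value>|g"""
    return f"{name}:{value}|g".encode("ascii")


def collect_metrics(prefix="web01"):  # "web01" is a made-up host label
    """Read load, available memory and root disk usage (Linux only)."""
    with open("/proc/loadavg") as handle:
        load1 = float(handle.read().split()[0])
    with open("/proc/meminfo") as handle:
        mem = parse_meminfo(handle.read())
    disk = shutil.disk_usage("/")
    return {
        f"{prefix}.load1": load1,
        f"{prefix}.mem_available_kb": mem.get("MemAvailable", 0),
        f"{prefix}.disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


def send_metrics(metrics, host="127.0.0.1", port=8125):
    """Send every metric as its own UDP datagram to a statsd listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name, value in metrics.items():
            sock.sendto(statsd_gauge(name, value), (host, port))
```

Calling `send_metrics(collect_metrics())` once a minute from cron would be enough for a first trend line. Per-process figures (the `ps` part) are left out to keep the sketch short.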

How to track Capacity


Store and display time-series

● Signalfx

● Cacti

● Ganglia

● Graphite


● Datadog

● Ruxit

● LogicMonitor

● Sematext

● CoScale

● Riemann

● Prometheus

● Sensu

● Idera

● Bijk

● X-Pack

● vRealize Hyperic HQ

A couple of load testing tips

Load testing tutorials:

https://www.tutorialspoint.com/jmeter

https://www.blazemeter.com/load-testing

Docker app for Grafana: https://github.com/kamon-io/docker-grafana-graphite


Forecasting (predicting trends)

Numbers And Finance by SeniorLiving.org

Predict the future?

● Use Context & Math

● Make educated guesses

● Long-term view is generally steady

● Generate estimates to sustain growth

● Use an adjustable process

● Forecasts guide autoscaling policies


Ceilings and Historical data

● Daily storage consumption example

● Metric: total available disk space

● Cumulative total provides an historical perspective

● We can predict future needs

● Storage will probably be exhausted at the point where the trend line meets the ceiling

Curve fitting

● Curve fitting

● Creative & Scientific

● Stay ahead of growth

● Use time-series data

● Forecast by constructing new data points beyond the known

● Reconciliation of what we know and the best fit equation

● Consider context before math

y = mx+b
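As a rough illustration of the linear case, the sketch below fits y = mx + b to daily storage readings by ordinary least squares and solves for the day on which the line reaches the ceiling. The sample readings and the 30-unit ceiling are invented for the example, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def day_ceiling_reached(m, b, ceiling):
    """Solve m*x + b = ceiling for x; None if usage is flat or shrinking."""
    if m <= 0:
        return None
    return (ceiling - b) / m


# Hypothetical daily readings: day number and disk used.
days = [0, 1, 2, 3, 4]
used = [10, 12, 14, 16, 18]
slope, intercept = fit_line(days, used)
exhaustion_day = day_ceiling_reached(slope, intercept, ceiling=30)
```

A straight line is only the simplest candidate. Per "consider context before math", check that steady growth is plausible for the metric before trusting the extrapolated date.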


Forecasting Peak-Driven Resource Usage

● Track how the peaks change over time

● Extrapolate from that data to predict future needs

● Identify the server resource ceilings

● Find a relation between resources and application-level work

● Decide if we should scale vertically or horizontally

● Perform proactive autoscaling
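One possible way to track how the peaks change over time is to reduce raw samples to one peak per day and look at how that peak moves. The sketch below assumes samples arrive as `(day, value)` pairs. The sample values, the ceiling of 80, and the function names are made up for illustration.

```python
def daily_peaks(samples):
    """Reduce (day, value) samples to a sorted list of (day, peak_value)."""
    peaks = {}
    for day, value in samples:
        peaks[day] = max(value, peaks.get(day, value))
    return sorted(peaks.items())


def average_peak_growth(peaks):
    """Average change in peak per day between the first and last day.

    Needs peaks from at least two different days.
    """
    (first_day, first_peak), (last_day, last_peak) = peaks[0], peaks[-1]
    return (last_peak - first_peak) / (last_day - first_day)


# Hypothetical busy-worker samples over three days.
samples = [(1, 40), (1, 55), (1, 48), (2, 60), (2, 52), (3, 58), (3, 65)]
peaks = daily_peaks(samples)
growth = average_peak_growth(peaks)

# Days until the latest peak would reach an assumed ceiling of 80.
days_to_ceiling = (80 - peaks[-1][1]) / growth
```

Averages hide exactly the moments that hurt, which is why the extrapolation uses the peaks and not the daily means.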

Automating Forecasts with fityk & cfityk

● Fityk is open source software for nonlinear fitting of analytical functions to data.

● Incorporate cfityk scripts into automated curve fitting, like:

cfityk ricardo-disk.fit

@0 < ricardo-disk.csv

guess Quadratic

fit

info formula

quit

Returns the formula: 4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2

Homepage: https://fityk.nieto.pl/

Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
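Once cfityk has printed a formula, it can be evaluated outside fityk to ask when a ceiling will be hit. The sketch below evaluates the expression exactly as printed above and solves for x with the quadratic formula. The ceiling of 20000 is an invented value, and the units of x and y are whatever `ricardo-disk.csv` used, which the transcript does not say.

```python
import math

# Coefficients copied from the formula printed by cfityk above:
# 4888.18 + 363.063*x  +  8.91132 + -1.55119*x + 0.0660771*x^2
FIRST_TERM = (4888.18, 363.063)
SECOND_TERM = (8.91132, -1.55119, 0.0660771)


def forecast(x):
    """Evaluate the fitted formula at x."""
    a0, a1 = FIRST_TERM
    c0, c1, c2 = SECOND_TERM
    return a0 + a1 * x + c0 + c1 * x + c2 * x ** 2


def x_at_ceiling(ceiling):
    """Larger root of forecast(x) = ceiling, or None if never reached.

    A negative result means the curve passed the ceiling before x = 0.
    """
    a = SECOND_TERM[2]
    b = FIRST_TERM[1] + SECOND_TERM[1]
    c = FIRST_TERM[0] + SECOND_TERM[0] - ceiling
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return None
    return (-b + math.sqrt(discriminant)) / (2 * a)


# Hypothetical ceiling, in the same units as the CSV's y column.
crossing = x_at_ceiling(20000)
```

Running this after each refit turns the curve into a date that can be put on a dashboard or used as an alert threshold.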


Forecasting with Machine Learning

Seeking SRE: Conversations About Running Production Systems at Scale, O'Reilly Media

● Most popular method for curve-fitting in fityk is Levenberg-Marquardt

● ML is also an option for forecasting (book I co-authored)

● Code examples and guides: https://github.com/ricardoamaro/MachineLearning4SRE

Start with Easy Steps


Get Started

1. Select a process owner.

2. Identify the resources to be measured.

3. Measure these resources.

4. Compare to maximum capacity.

5. Collect workload forecasts.

6. Use forecasts for IT resource requirements.

7. Map requirements onto existing utilizations.

8. Predict when the system will be out of capacity.

9. Update forecasts and utilizations.

Set a Goal!

● Two Classes:

○ Load: usually expressed as the arrival rate or peak rate of requests hitting the service, e.g. a target of 10,000 concurrent authenticated Drupal users

○ Performance: usually expressed in the form of Service Level Objectives, e.g. the 99th percentile of all requests should return in less than 500 ms
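A performance goal like the one above can be checked mechanically. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions, and compares the 99th percentile against the 500 ms target. The function names are hypothetical and the latencies would come from your own access logs or metrics store.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(latencies_ms, pct=99, threshold_ms=500):
    """True if the pct-th percentile latency is below the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms
```

With 100 samples, one slow outlier still passes a p99 objective but two do not, which is the behaviour a percentile-based objective is meant to have.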


Be proactive (plan & document ahead)

Picasso drawing with Paloma and Claude at Villa la Galloise, 1953. By Edward Quinn, EdwardQuinn.com.

Capacity Planning Dashboard

● Support your conclusions with metrics in a dashboard

● Both manual scaling and autoscaling decisions should be based on real data

● When to scale?

○ date and time (be alerted if needed)

● How to scale?

○ vertical, horizontal or diagonal scaling

(Example) Drupal Cluster Dashboard

| Type | Value | Limit/node | Ceiling units | Limit (total) | Current (peak) | Peak % | Estimated days left |
|---|---|---|---|---|---|---|---|
| Varnish cache | 28 | 1024 | req/sec | 2048 | 600 | 29% | 830 |
| Web | 31 | 80 | busy calls | 160 | 145 | 90% | 12 |
| Database | 15 | 60 | connections | 120 | 96 | 80% | 36 |
| Storage | 14 | 30 | TB | 30 | 14 | 46% | 21 |
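The "peak %" column in the example dashboard is simply the current peak divided by the total limit, and a days-left figure follows once a growth rate is known. The sketch below recomputes the percentages from the dashboard's own numbers. The slides do not say how "estimated days left" was derived, so `days_left` with a constant daily growth is only one assumed method and the 1.25 busy calls per day is an invented rate.

```python
# (type, ceiling units, total limit, current peak) taken from the example dashboard.
ROWS = [
    ("Varnish cache", "req/sec", 2048, 600),
    ("Web", "busy calls", 160, 145),
    ("Database", "connections", 120, 96),
    ("Storage", "TB", 30, 14),
]


def peak_percent(limit_total, current_peak):
    """Current peak as a whole-number percentage of the total limit (truncated)."""
    return int(100 * current_peak / limit_total)


def days_left(limit_total, current_peak, growth_per_day):
    """Days until the peak reaches the limit, assuming constant daily growth."""
    if growth_per_day <= 0:
        return None
    return (limit_total - current_peak) / growth_per_day


percentages = {name: peak_percent(limit, peak) for name, _units, limit, peak in ROWS}

# Hypothetical growth of 1.25 busy calls per day on the web tier.
web_days_left = days_left(160, 145, 1.25)
```

Sorting the rows by days left instead of by peak % puts the tier that needs attention first at the top.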


Conclusions

Drive the system to the appropriate level of risk for the lowest cost.

Questions?

The 1 Year and 1 hour Capacity Plan in the Drupal World


Join us for contribution opportunities

Mentored Contribution

First Time Contributor Workshop

General Contribution

#DrupalContributions

What did you think?

https://events.drupal.org/node/22330

https://www.surveymonkey.com/r/DrupalConSeattle