Scale like an ant, distribute the workload - DPC, Amsterdam, 2011

Post on 07-May-2015

description

Many services and applications nowadays are ill-equipped to handle a sudden rush of popularity, as is often the case on the internet, to the point where they either become unavailable or unbearably slow.

By taking a chapter from ant colonies in the wild, whose strength lies in their numbers and in the fact that everyone works together towards the same goal, we can apply the same principle to our services. By using systems such as

- gearman
- memcache
- daemons
- message queues
- load balancers

and many more, you can achieve greater performance, more redundancy and higher availability, and gain the ability to scale your services up and down easily as required.

During this talk attendees will be led through the world of distributed systems and scalability, and shown the how, where and what of taking the average application and splitting it into smaller, more manageable pieces.

Transcript of Scale like an ant, distribute the workload - DPC, Amsterdam, 2011

Distribute the workload

Helgi Þormar Þorbjörnsson

Dutch PHP Conference, Amsterdam, 19th May 2011

Thursday, 19 May 2011

Who am I?

Helgi

VP of Engineering at Orchestra.io

Developer at PEAR

From Iceland

@h on Twitter

Why Distribute?

Budget

Efficiency

Perception

Budget

Spend wisely

Commodity servers

Low overhead, high yield

Cloud Computing (EC2)

Efficiency

10 small servers > 1 big

Venue Security

1000 people can exit quicker through 10 small doors than 1 big

Perception

Defer intensive processes

Give instant feedback

Users keep on browsing
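
The idea of deferring intensive work while giving instant feedback can be sketched in a few lines. This is only an illustration using Python's standard library: the names `handle_upload` and `resize_image` are hypothetical, and an in-process queue stands in for whatever queue system a real application would use.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for a real queue system
results = {}           # where the background worker stores finished work


def resize_image(name):
    # Placeholder for an intensive process; a real one might take seconds.
    return f"{name}.thumb"


def worker():
    # Runs in the background and processes jobs one at a time.
    while True:
        name = jobs.get()
        results[name] = resize_image(name)
        jobs.task_done()


def handle_upload(name):
    # Queue the heavy work and answer the user straight away.
    jobs.put(name)
    return f"Thanks, {name} is being processed"


threading.Thread(target=worker, daemon=True).start()

message = handle_upload("holiday.jpg")
jobs.join()  # only here so the example can inspect the result
```

The request handler returns as soon as the job is queued, so the user can keep browsing while the worker catches up.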

“It all depends on how we look at things, and not how they are in themselves.”

- Carl G. Jung

Chapter from Nature

Ant Colonies

Algorithms

Scheduling

Vehicle Routing

Assignment

Sets

Other

How do ants fit?

Strength in numbers

Work together

Size benefits them

Teamwork

When faced with a problem, they will solve the problem as one.

What if they were bigger?

Types of Ants

Military

Maids

Tunnel diggers

Food gatherers

How does this map to my application?

Colony = Application

Ants = Components

Ants do many different types of work to keep their colony running

Architect for Distribution

Characteristics

Decoupling

Elasticity

High Availability

Concurrency

Decoupling

[Diagram: Application made up of DB, API, Cache and FE components; further Cache and API instances are added in later builds]

Elasticity

Cloud Computing

Load Balancing

HAProxy

Nginx

My Favourite

Monitoring

When do I need more servers?

Needs to be around from the start!

Keep records

Spot trends

Different types

Hardware Performance

Software Performance

Availability

Resourcing

Applications

New Relic

CloudKick

ScoutApp

Nagios

Cacti

Circonus

Automation

Want to sleep easy at night?

Go out partying without worrying about getting a phone call?

Plug into your monitoring

Bringing together Monitoring and Elastic behaviour into one beautiful whole!

Add some intelligence to add / remove servers as needed based on current information.

This is why good monitoring is essential or this wouldn’t be possible
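
A minimal sketch of what "some intelligence" could look like, assuming the monitoring system can hand over recent CPU load samples between 0.0 and 1.0. The function name, thresholds and limits are all made up for illustration; real values would come from the records and trends the monitoring has collected.

```python
def desired_servers(current, cpu_samples, low=0.30, high=0.75,
                    minimum=2, maximum=20):
    """Return the server count to run next, given recent CPU load samples."""
    if not cpu_samples:
        return current  # no monitoring data, so make no change

    average = sum(cpu_samples) / len(cpu_samples)
    if average > high:
        target = current + 1   # busy: add a server
    elif average < low:
        target = current - 1   # idle: remove a server
    else:
        target = current       # within the comfortable band

    # Never drop below the minimum or grow past the maximum.
    return max(minimum, min(maximum, target))
```

Changing by one server at a time and clamping to a minimum and maximum is a deliberately cautious choice, so a bad reading cannot add or remove a whole fleet in one step.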

Just make sure it doesn’t turn into...

Skynet!!

High Availability

Get a highly available and resilient setup by following a few of those recommendations

Remember, even Google has outages

Benefits

Easy management

Ability to stop / start servers quickly

Responsibilities are separate

Quickly move to a new cluster

Reduced risk

What to avoid

Local Sessions

Solution: Store sessions in DB / Memcache

Local Memory

Solution: Networked Memcache

Local Files

Local Uploads

Writing to /tmp

Solution: Store on S3 or a networked FS

Solution: Serve up static files from CDNs

Servers can vanish at any given time

Internal APIs

[Diagram: Application, Internal Storage API, backends S3 / GFS / FS]

[Diagram: Application, Internal DB API, backends MySQL / Mongo / Cache]
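
One way to picture an internal storage API is as a small interface that the application codes against, with the actual backend swapped in behind it. The sketch below is an assumption about how that could look, not the setup from the talk: `Storage`, `MemoryStorage`, `DirectoryStorage` and `save_avatar` are hypothetical names, and only standard-library backends are shown.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class Storage(ABC):
    """The only storage interface the application is allowed to see."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryStorage(Storage):
    """Keeps everything in a dict; handy for tests."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        self._items[key] = data

    def get(self, key):
        return self._items[key]


class DirectoryStorage(Storage):
    """Writes to a directory, which could be a networked FS mount."""

    def __init__(self, root):
        self._root = Path(root)

    def put(self, key, data):
        (self._root / key).write_bytes(data)

    def get(self, key):
        return (self._root / key).read_bytes()


def save_avatar(storage: Storage, user_id: int, image: bytes) -> str:
    # Application code: no knowledge of where the bytes end up.
    key = f"avatar-{user_id}.png"
    storage.put(key, image)
    return key


memory = MemoryStorage()
key = save_avatar(memory, 7, b"png-bytes")

with tempfile.TemporaryDirectory() as root:
    disk = DirectoryStorage(root)
    save_avatar(disk, 7, b"png-bytes")
    same = disk.get(key) == memory.get(key)
```

A backend for S3 would implement the same two methods (not shown here), so moving storage would not touch `save_avatar` at all.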

SOA

Service Oriented Architecture

Sort of :-)

Eventually Consistent

CAP Theorem

Consistency

All nodes see the same data at the same time

Availability

Node failures do not prevent survivors from continuing to operate

Partition Tolerance

The system continues to operate despite arbitrary message loss

Queue Systems

Good for

Image Processing

Distributed Logs

Data Mining

Mass Emails

Intensive transformation

Search

Common Tools

Gearman

Hadoop

Zero MQ (0MQ)

RabbitMQ

And many others!

Gearman

Your Client Code

Gearman Client API (C, PHP, Perl, MySQL UDF, ...)

Gearman Job Server (gearmand)

Gearman Worker API (C, PHP, Perl, Python, ...)

Your Worker Code

[Diagram legend: Your App, Gearman]
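
The layering can be illustrated with a tiny in-process stand-in. This is not the Gearman client or worker API and it does not talk to gearmand; the `JobServer` class is a hypothetical toy that only mirrors the roles: workers register a function under a name, clients submit jobs by that name, and the job server sits in between.

```python
from collections import deque


class JobServer:
    """In-process stand-in for the job server role (not gearmand)."""

    def __init__(self):
        self._workers = {}       # function name -> worker callable
        self._pending = deque()  # queued (function name, payload) pairs

    def register(self, name, func):
        # Worker side: announce which function this worker can run.
        self._workers[name] = func

    def submit(self, name, payload):
        # Client side: hand over a job and move on.
        self._pending.append((name, payload))

    def run_pending(self):
        # Dispatch each queued job to the worker registered for it.
        done = []
        while self._pending:
            name, payload = self._pending.popleft()
            done.append(self._workers[name](payload))
        return done


server = JobServer()
server.register("reverse", lambda text: text[::-1])  # your worker code
server.submit("reverse", "hello")                     # your client code
server.submit("reverse", "ant")
output = server.run_pending()
```

The client never calls the worker directly, which is what lets workers live on other machines and be added or removed without changing client code.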

A Story!

Financial Software

3000+ Clients

Each one has 5 external data sources

Each data source is a web service

Ran every 6 hours every day

[Diagram: Cron, Job 1, Gearman, Web Services 1 to 5, Processing]
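
To get a feel for the numbers in this story, the fan-out can be sketched as one small job per client and data source. The figures 3000, 5 and 6 hours come from the slides (3000 is a lower bound); the function names and the thread pool standing in for the workers are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CLIENTS = 3000            # "3000+" on the slide, so a lower bound
SOURCES_PER_CLIENT = 5
HOURS_BETWEEN_RUNS = 6


def build_jobs(clients, sources):
    # One independent job per (client, data source) pair.
    return [(c, s) for c in range(1, clients + 1)
            for s in range(1, sources + 1)]


def fetch(job):
    client, source = job
    # Placeholder for the real web service call.
    return f"client {client} / source {source}: ok"


jobs = build_jobs(CLIENTS, SOURCES_PER_CLIENT)
jobs_per_run = len(jobs)
runs_per_day = 24 // HOURS_BETWEEN_RUNS

# A small pool of workers handles a sample of the jobs in parallel.
with ThreadPoolExecutor(max_workers=10) as pool:
    sample = list(pool.map(fetch, jobs[:5]))
```

That works out to at least 15,000 web service calls per run and 60,000 per day, which is why splitting the work into many small jobs matters more than making a single cron script faster.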

But! That wasn’t enough

Job kicked off on login

Another Story!

CloudSplit

Near Real Time Cloud Analytics

Clients install logging agent locally

syslogd

Public API

Multiple Persistent Gearman Servers

Internal DB API

[Diagram, built up step by step: Agent, syslogd, API (load balanced), two Gearman servers (persistent), three Workers, Internal API (load balanced), CouchDB]

CouchDB Setup

Write vs Read

Writes

Multi Master setup

Replicated

Deals with writes only

Reads

Multi Master setup

Replicated from write cluster

Slaves handle website requests

Heavy Map/Reduce usage for data

Supervisord

Map/Reduce

Map

Master gets a problem to solve

Breaks into multiple sub-problems

Distributed to multiple workers

A worker can take the same steps

Answer passed back to Master

Reduce

Takes in answers from the map workers

Combines together to get an answer

There can be multiple reducers
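
The map and reduce steps can be sketched with a word count, the usual toy problem. Everything runs in one process here, so the "workers" are plain function calls; the function names and sample lines are made up for illustration.

```python
from collections import Counter
from functools import reduce


def split_problem(lines, parts):
    # Master: break the input into roughly equal sub-problems.
    return [lines[i::parts] for i in range(parts)]


def map_worker(chunk):
    # Worker: count the words in its own chunk only.
    return Counter(word for line in chunk for word in line.lower().split())


def reduce_counts(partials):
    # Reducer: combine the partial answers into one.
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["ants work together", "ants scale", "work scales"]
partials = [map_worker(chunk) for chunk in split_problem(lines, 2)]
totals = reduce_counts(partials)
```

Because each worker only needs its own chunk, the map calls could run on separate machines, and only the small partial counts have to travel back to the reducer.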

Process petabytes of data in a few hours on a commodity server farm

CouchDB

Highly Concurrent

Schema free, document based

RESTful API

Map/Reduce Views

Easy Replication

Hadoop

Hadoop is a framework for running applications on large clusters of commodity hardware.

The Hadoop framework transparently provides applications both reliability and data motion

Uses Map/Reduce concept to farm out work

Distributed FS to handle node failure automagically

Join 2 datasets of a significant size together

500 GB worth of log files with a large location dataset
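
Joining two datasets with Map/Reduce is commonly done by tagging each record with its source, grouping by the join key, and pairing the records up in the reducer. The sketch below shows that idea in plain Python. It does not use the Hadoop API, and the sample log lines, IP addresses and cities are invented for illustration.

```python
from collections import defaultdict

# Tiny made-up stand-ins for the log files and the location dataset.
log_lines = [("10.0.0.1", "GET /"), ("10.0.0.2", "GET /about"),
             ("10.0.0.1", "POST /login")]
locations = [("10.0.0.1", "Amsterdam"), ("10.0.0.2", "Reykjavik")]


def map_phase(logs, places):
    # Tag every record with its source so the reducer can tell them apart.
    for ip, request in logs:
        yield ip, ("log", request)
    for ip, city in places:
        yield ip, ("loc", city)


def shuffle(pairs):
    # Group all values that share a join key (the IP address).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # For each key, pair every log record with every location record.
    joined = []
    for ip, values in sorted(groups.items()):
        cities = [v for tag, v in values if tag == "loc"]
        requests = [v for tag, v in values if tag == "log"]
        for city in cities:
            for request in requests:
                joined.append((ip, request, city))
    return joined


joined = reduce_phase(shuffle(map_phase(log_lines, locations)))
```

Each key can be reduced independently of the others, which is what allows a join over hundreds of gigabytes to be spread across many machines.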

ØMQ

Async Message System

Thin and lightweight

High Performance

Simple

Scalable

One socket can load balance to multiple end points

Multiple end points can be funnelled into a single socket

Handle deployments to multiple servers

Scale is an example of that

Mongrel2 is a web server that uses it in a similar way to FastCGI

Move around text (JSON) and binary data for real time communication

Could have replaced syslogd and the external API in my previous example

Code time? :-)

Questions?

helgi@orchestra.io

Twitter: @h

Joind.in: http://joind.in/3212