Top 10 Scalability Mistakes

John Coggeshall
Copyright © 2006, Zend Technologies Inc.

Oct. 18, 2005 # 2

Welcome!

• Who am I: John Coggeshall
  • Sr. Technical Consultant, Zend Technologies
  • Author, PHP 5 Unleashed
  • Zend Educational Advisory Board
  • Speaker on PHP-related topics worldwide
  • Geek

What is Scalability?

• Define: Scalability
  • The ability and flexibility of an application to meet the growth requirements of an organization
  • More than making a site go fast(er)
    • Scalability in human resources, for example
• The “fastest” approach isn’t always the most scalable
  • OO is slower, but more scalable from a code maintenance and reuse standpoint
  • Failure to consider future needs during the architectural stages leads to failure of the application’s API to scale

The secret to scalability is the ability to design, code, and maintain your applications using the same process again and again regardless of size

Mistake #1: Network file systems

• Problem: We have a server farm of 10 servers and we need to deploy our codebase
  • Very common problem
  • Many people look to a technology like NFS
    • Share one codebase
• At least 90% of the time, this is a bad idea
  • NFS/GFS is really slow
  • NFS/GFS has tons of locking issues

Mistake #1: Network file systems

• So how do we deploy our codebase?
  • You should always deploy your codebase locally on the machine serving it
  • Rsync is your friend
• What about run-time updates?
  • Accepting file uploads
    • Need to be available to all servers simultaneously
  • Solutions vary depending on needs
    • NFS may be an option for this small portion of the site
    • Database is also an option
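The "deploy locally with rsync" advice can be sketched as a small script that builds one rsync command per web server. This is only an illustration: the host names and directory paths are hypothetical, and the script prints the commands rather than running them. Python is used here as a neutral scripting language.

```python
import shlex

# Hypothetical host names and paths, chosen only for illustration.
WEB_SERVERS = ["web01.example.com", "web02.example.com"]
SOURCE_DIR = "/var/release/current/"
TARGET_DIR = "/var/www/app/"


def rsync_command(host, source=SOURCE_DIR, target=TARGET_DIR):
    """Build the rsync argument list that copies the codebase to one host."""
    return [
        "rsync",
        "-az",        # archive mode, compress data in transit
        "--delete",   # remove files on the target that are gone from the source
        source,
        f"{host}:{target}",
    ]


def deployment_plan(hosts=WEB_SERVERS):
    """Return one printable rsync command line per web server."""
    return [shlex.join(rsync_command(host)) for host in hosts]


for line in deployment_plan():
    print(line)
```

Each server ends up with its own local copy of the code, so no request ever waits on a network file system.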

Mistake #2: Blocking calls

• Blocking I/O can always be a problem in an application
  • E.g. attempting to open a remote URL from within your PHP scripts
• If the resource is locked / slow / unavailable, your script hangs while it waits for a timeout
  • Might as well try to scale an application that has a sleep(30) in it
  • Very bad

Mistake #2: Blocking calls

• Solutions
  • Don’t use blocking calls in your application
  • At the very least, don’t use blocking calls in the heavy-load aspects of your application
  • Have out-of-process scripts, which aren’t connected to the web server, responsible for pulling down data
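One way to read the "out-of-process scripts" advice is sketched below: a scheduled job does the slow remote fetch and stores the result locally, while the code serving a page only reads the local copy. The cache file location and function names are assumptions made for this sketch, and Python stands in for whatever language the job is written in.

```python
import json
import os
import tempfile
import time
import urllib.request

# Hypothetical cache location; the URL passed to refresh_cache() is the caller's choice.
CACHE_FILE = os.path.join(tempfile.gettempdir(), "remote_feed_cache.json")


def fetch_remote(url, timeout=5.0):
    """Blocking fetch with an explicit timeout. Meant for a cron job, not a page request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def refresh_cache(url, fetch=fetch_remote, cache_file=CACHE_FILE):
    """Run out of process: pull the remote data and store it locally."""
    payload = {"fetched_at": time.time(), "body": fetch(url)}
    # Write to a temporary file and rename it, so readers never see a partial file.
    tmp_path = cache_file + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, cache_file)


def read_cached(cache_file=CACHE_FILE, default=""):
    """Called while serving a page: a local file read only, never a network call."""
    try:
        with open(cache_file, encoding="utf-8") as handle:
            return json.load(handle)["body"]
    except (OSError, ValueError, KeyError):
        return default
```

If the remote resource is slow or down, only the background job waits; page requests keep serving the last good copy.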

Mistake #3: Poor database design

• Database design is almost always the most important thing in your application
  • PHP can be used completely properly, but if you mess up the database you’re hosed anyway
• Take the time to really think about your design
  • Read books on designing relational databases
  • Understand how indexes work, and use them
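To make "understand how indexes work" concrete, here is a small sketch that asks a database how it plans to run the same query before and after an index exists. It uses SQLite from Python's standard library purely as a stand-in (the slides talk about MySQL), and the table and index names are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (author_id, title) VALUES (?, ?)",
    [(n % 50, f"Article {n}") for n in range(1000)],
)

LOOKUP = "SELECT title FROM articles WHERE author_id = ?"

# Without an index the engine has to scan every row of the table.
plan_without_index = query_plan(conn, LOOKUP, (7,))

# With an index on the filtered column it can jump straight to the matches.
conn.execute("CREATE INDEX idx_articles_author ON articles (author_id)")
plan_with_index = query_plan(conn, LOOKUP, (7,))

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

MySQL offers the same kind of inspection through its own EXPLAIN statement; the exact wording of the plan differs between engines.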

Mistake #3: Poor database design

• For example...
  • Using MySQL MyISAM tables all the time
    • Use InnoDB instead if you can
  • Use MyISAM tables only if you plan on doing full-text searching
    • Even then, they shouldn’t be primary tables

Mistake #4: Failure to understand the web server

• When designing an application, it’s very important that you understand how PHP works in the bigger picture
  • Know how PHP interacts with and responds to your web server
  • For instance: how does PHP really work with Apache 1.3.x?

Mistake #4: Failure to understand the web server

• Apache 1.3.x works on a pre-fork model
  • One parent process spawns a whole lot of child processes
  • Each process handles a single HTTP request at a time
    • May handle a finite or infinite number of requests before being destroyed
  • PHP exists in the context of an Apache child process
    • This means things like “persistent” resources are only persistent to the individual child process
    • Database connections total = process total
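The "database connections total = process total" rule is easy to turn into arithmetic. The sketch below assumes one persistent connection per Apache child; the farm size and connection limit are hypothetical numbers, not figures from the talk.

```python
def persistent_connections(web_servers, children_per_server):
    """Upper bound on open persistent connections: one per Apache child process."""
    return web_servers * children_per_server


def fits_database_limit(web_servers, children_per_server, max_connections):
    """True if the database accepts that many simultaneous connections."""
    return persistent_connections(web_servers, children_per_server) <= max_connections


# Hypothetical farm: 10 web servers, each allowed 150 Apache children,
# talking to a database configured to accept 1000 connections.
print(persistent_connections(10, 150))      # 1500
print(fits_database_limit(10, 150, 1000))   # False
```

In this made-up case the web tier could open more connections than the database is willing to accept, which is exactly the surprise the slide warns about.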

Mistake #5: Hanging up Apache

• When scaling an application, requests per second is key
  • You should have an idea how long a single request will take
  • You should know how many of those requests your server farm can handle at once without dying
  • You should know your requests-per-second figures
• Too often, people let Apache handle things that it really shouldn’t
  • E.g. large file downloads, streamed media, etc.

Mistake #5: Hanging up Apache

• When Apache is sending a 10 megabyte file, that means that one of your HTTP children is wasting its time shuffling data down the pipe
  • This is definitely something that can be handled by something else
    • A different HTTP server (thttpd)
    • Zend Download Server

At any given point in time, you should try to design things so that your primary server function (serving PHP scripts) is the only thing being done by Apache
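A rough back-of-the-envelope model shows why large downloads hurt. It assumes each Apache child serves one request at a time, and every number below is hypothetical.

```python
def max_requests_per_second(children, seconds_per_request):
    """Ceiling on throughput when each child serves one request at a time."""
    return children / seconds_per_request


def children_busy_with_downloads(downloads_per_second, file_megabytes, megabytes_per_second):
    """Average number of children occupied by file transfers at any moment."""
    seconds_per_download = file_megabytes / megabytes_per_second
    return downloads_per_second * seconds_per_download


# Hypothetical figures: 150 children, 0.25 s per PHP request, and two 10 MB
# downloads started per second by clients that receive 0.5 MB/s.
print(max_requests_per_second(150, 0.25))         # 600.0
print(children_busy_with_downloads(2, 10, 0.5))   # 40.0
```

With these made-up figures, 40 of the 150 children would be tied up pushing bytes instead of running PHP, which is the capacity a separate download server would give back.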

Mistake 5a: Letting Apache do any static handling

• On the same note, you can use something like thttpd to serve all static content
  • Set up a subdomain, static.example.com
  • Put all of your images, Flash files, JavaScript libs, stylesheets, etc. on that server

Tricks of the Trade

• If your web application has a lot of semi-static content
  • Content that could change, so it has to be stored in the DB, but almost never does
• ...and you’re running on Apache
• This design pattern is killer!

Tricks of the Trade

• Most people in PHP would implement a page like this:
  http://www.example.com/show_article.php?id=5
• This would be responsible for generating the semi-static page HTML for the browser

Tricks of the Trade

• Instead of generating the HTML for the browser, make this script generate another PHP script that contains mostly static content
  • Keep things like personalization code, but make the actual article itself static in the file
  • Write the file to disk in a public folder under document root

Tricks of the Trade

• If you put them in this directory:
  http://www.example.com/articles/5.php
• You can create a mod_rewrite rule such that, when the file does not exist yet,
  http://www.example.com/articles/5.php
  maps to
  http://www.example.com/show_article.php?id=5
• Since show_article.php writes files to articles, once a page has been generated there are no more DB reads!

Tricks of the Trade

• Simple and Elegant Solution

• Allows you to keep pages “personalized”

• Very easy to Maintain

Mistake #6: Designing without Scalability

• When designing your application, you should assume it needs to scale
  • Quick and dirty prototypes often are exactly what gets to production
• It’s easy to make sure your applications have a decent chance of scaling
  • MySQL: design assuming someday you’ll need master/slave replication
• Don’t write an application you’ll need three years from now, write an application you need today
  • Just think about what you might need in three years

Mistake #7: Improperly dealing with database connections

• Improperly using persistent database connections
  • Know your database: MySQL has a relatively light handshake process compared to Oracle
• Using PHP to deal with database failover
  • It’s not PHP’s job, don’t do it
  • Design your PHP applications to work with hostname aliases instead of real addresses
    • e.g. mysql-r, mysql-w
  • Have external processes responsible for switching the /etc/hosts file in the event something blows up
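A minimal sketch of the alias idea follows: the application only ever names mysql-r or mysql-w and leaves it to the operating system (and whatever process maintains /etc/hosts) to decide which machine that is. The rule used here to tell reads from writes is deliberately naive and is an assumption of this sketch, not something the slides prescribe.

```python
# Alias names taken from the slide; which machines they point at is decided
# outside the application.
READ_HOST = "mysql-r"
WRITE_HOST = "mysql-w"

# Naive classification by first keyword; real applications usually know
# at the call site whether they are reading or writing.
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def host_for(sql):
    """Pick the hostname alias for a statement based on its leading keyword."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    return READ_HOST if verb in READ_VERBS else WRITE_HOST


print(host_for("SELECT title FROM articles WHERE id = 5"))      # mysql-r
print(host_for("UPDATE articles SET title = 'x' WHERE id = 5"))  # mysql-w
```

Because the code contains no real addresses, a failover only requires repointing the alias; nothing in the application has to change or be redeployed.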

Tricks of the Trade

• For those of us using MySQL, here’s a great replication trick from our friends at Flickr
  • InnoDB is under most circumstances considerably faster than MyISAM
  • MyISAM is considerably better suited for full-text searches
  • Trick: in master/slave replication, the slave’s table type can differ from the master’s
    • Set up a separate MyISAM full-text search farm
    • Connect to those servers when performing full-text searches

Mistake #8: Development Infrastructure

• Every time a client has been in real trouble, they have consistently lacked a development infrastructure
  • More than just CVS (although that’s a good start)
  • Establishing a development release process early on is critical to the overall stability of your apps
    • Things will go wrong at 3am in production
    • You need a process to release code to prevent the very tempting cowboy coding

Development Infrastructure

• Maintaining an existing code base is often the most costly endeavor of any application
  • As an application grows, the complexity of its release process must scale
  • Testing becomes more and more important
  • Your release process must be able to scale with your application!
    • Staging environments
    • Coding standards

“Scalability marginally impacts procedure, procedure grossly impacts scalability”

- Theo Schlossnagle

Mistake #9: Failing to Cache

• Caching is one of the most important things you can do when writing a scalable application
  • A lot of people don’t realize how much they can cache

• You’ve already seen one form of caching in a previous trick of the trade

• What about other techniques?
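One other technique, offered here as an illustration rather than something taken from the slides, is keeping the result of expensive work in memory for a fixed time. The class and parameter names below are invented for the sketch.

```python
import time


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}      # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only when needed."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # still fresh: skip the expensive work
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A page handler could wrap a slow query as `cache.get_or_compute("front_page", run_query)`, where `run_query` is whatever function does the real work; the query then runs at most once per time window instead of once per request.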

Mistake #9: Failing to Cache

• Improving the speed of PHP can be done very easily using an opcode cache

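• At the time of this talk, PHP 6 was expected to have this ability built in to the engine (an opcode cache, OPcache, was eventually bundled with PHP starting in version 5.5)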

Mistake 10: Not Knowing where to optimize

• Sooner or later, people worry about scalability

• When trying to make scalability decisions, knowledge is the most important thing you can have

• PHP has both closed source and open source profilers which do an excellent job of identifying the bottlenecks in your application
  • Optimize where it counts
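What a profiler gives you can be shown with any language's tooling. The sketch below uses Python's standard cProfile and pstats modules as a stand-in for the PHP profilers the slide refers to; the three functions are made up so that one of them is obviously the bottleneck.

```python
import cProfile
import io
import pstats


def slow_step():
    """Deliberately expensive work standing in for a real bottleneck."""
    return sum(i * i for i in range(200_000))


def fast_step():
    return sum(range(100))


def handle_request():
    slow_step()
    fast_step()


def profile_report(func, limit=5):
    """Run func under the profiler and return the most expensive calls as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


print(profile_report(handle_request))
```

The report lists calls by cumulative time, so `slow_step` shows up near the top and `fast_step` barely registers. That ranking is the information you need before deciding what to optimize.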

Mistake 10: Not Knowing where to optimize

• Instrumentation of your applications is key to determining what matters most when optimizing
  • If you’re not logging, you’re shooting in the dark
  • White-box monitoring of your applications via tools like Zend Platform is enormously helpful in understanding what is going on

You can’t make good process (or business) decisions unless you understand how your web site is being used and by whom.

Mistake 10: Not Knowing where to optimize

• Amdahl’s Law:
  • Improving code execution time by 50% when the code executes only 2% of the time will result in a 1% overall improvement
  • Improving code execution time by 10% when the code executes 80% of the time will result in an 8% overall improvement
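The two Amdahl's Law figures can be reproduced with one multiplication: the share of total run time spent in the code, times the fraction by which that code gets faster. The function names below are invented for this sketch.

```python
def overall_improvement(time_share, local_improvement):
    """Fraction of total run time saved by speeding up one part of the code.

    time_share: fraction of total time spent in that part (0.02 means 2%).
    local_improvement: fraction by which that part's run time shrinks (0.5 means 50%).
    """
    return time_share * local_improvement


def overall_speedup(time_share, local_improvement):
    """The same result expressed as a speedup factor for the whole program."""
    return 1.0 / (1.0 - overall_improvement(time_share, local_improvement))


print(f"{overall_improvement(0.02, 0.50):.0%}")   # 1%
print(f"{overall_improvement(0.80, 0.10):.0%}")   # 8%
```

A modest gain in code that runs most of the time beats a dramatic gain in code that rarely runs, which is why measuring comes before optimizing.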

Mistake 11: Because I give 110%

• Let’s imagine that each request sent over the wire is like a car driving from point A (the client) to point B (the server)
• Roads are networks

One of the biggest problems with AJAX

• Simple requests seem to work just fine…

• The problem with AJAX has to do with multiple dependent asynchronous requests
  • You can’t rely on any order of operations in classical AJAX models

Some requests will happen faster

• When working with AJAX, always know you cannot rely on one request finishing before the next is triggered

• Requests can take different lengths of time based on a huge array of factors
  • Server load and network load come to mind

• Can really mess up your application

• Bad news: None of the current AJAX toolkits account for this latency
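One common guard against out-of-order responses, shown here as an assumption rather than anything the slides propose, is to number each request and ignore any response older than the one already applied. The logic is language-neutral; it is sketched in Python with invented names, whereas in a browser it would live in the client-side script.

```python
class LatestResponseOnly:
    """Apply a response only if no newer request's response was applied first."""

    def __init__(self):
        self._last_issued = 0    # ticket number of the most recent request
        self._last_applied = 0   # ticket number of the response currently shown
        self.state = None        # whatever the page currently displays

    def start_request(self):
        """Call when sending a request; returns the ticket to attach to it."""
        self._last_issued += 1
        return self._last_issued

    def handle_response(self, ticket, payload):
        """Call when a response arrives; returns True if it was applied."""
        if ticket <= self._last_applied:
            return False         # stale: a newer response is already shown
        self._last_applied = ticket
        self.state = payload
        return True


page = LatestResponseOnly()
first = page.start_request()
second = page.start_request()
page.handle_response(second, "results for the second request")  # arrives first
page.handle_response(first, "results for the first request")    # arrives late, ignored
print(page.state)   # results for the second request
```

This does not make requests arrive in order; it only stops a slow, outdated response from overwriting newer data.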

Developing with Latency in mind

• A number of tools exist for developing AJAX applications with latency in mind
  • AJAX Proxy is a good example
    • http://ajaxblog.com/archives/2005/08/08/ajax-proxy-02
    • Allows you to simulate latency in your requests
  • You can use it in conjunction with “SwitchProxy” to point your browser at a different proxy server
    • http://www.roundtwo.com/product/switchproxy
• Not a true solution, but at least it lets you test for the problem.

Final Thoughts

• Ultimately the secret of scalability is developing applications and procedures which scale both UP AND DOWN

• You have to be able to afford to make the application to begin with

• You have to be able to afford to make the application ten times bigger than it is

• Without process, you will fail.

Questions?