Web Speed And Scalability

Jason Ragsdale – 01/08/2008


Page 2: Web Speed And Scalability

How to build a bigger, faster, and more reliable website

You will learn the concepts of Speed and Scalability

Specific examples of caching, load balancing, and testing tools.

Page 3: Web Speed And Scalability

What is Scalability?
Avoiding Failure
High Availability?!?!?
Monitoring
Release Cycles
Fault Tolerance
Load Balancing
Static Content
Caching
YSlow (Let it be your friend)

Page 4: Web Speed And Scalability

Horizontal Scalability
Capacity can be increased just by adding more hardware/software
Best solution
Does not guarantee that you are safe

Vertical Scalability (Scaling Up)
Capacity can be increased by adding more disk storage, RAM, or processors
Expensive
Should only be used if horizontal scaling will not work for you
Difficult to move to horizontal scaling if you run out of capacity in your hardware
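To make "just add more hardware" concrete, here is a minimal Python sketch of a server-count estimate for a horizontally scaled tier. It assumes, loosely, that throughput grows linearly with the number of identical servers and that you keep spare machines so one can fail; the function name `servers_needed` and its parameters are hypothetical, and real clusters rarely scale perfectly linearly.

```python
import math

def servers_needed(peak_requests_per_sec: float,
                   requests_per_server: float,
                   spare_servers: int = 1) -> int:
    """Estimate how many identical servers a horizontally scaled tier needs.

    Assumption: linear scaling, i.e. N servers handle N * requests_per_server.
    spare_servers adds headroom so a machine can fail without overload.
    """
    if requests_per_server <= 0:
        raise ValueError("requests_per_server must be positive")
    base = math.ceil(peak_requests_per_sec / requests_per_server)
    return base + spare_servers

# Hypothetical numbers: 1,800 req/s at peak, ~250 req/s per box, one spare.
print(servers_needed(1800, 250))  # 9  (7.2 rounds up to 8, plus 1 spare)
```

Scaling vertically would instead raise `requests_per_server`, which is exactly the number that eventually stops growing when the hardware runs out of room.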

Page 5: Web Speed And Scalability

Capital investment will be made
The system will be more complex
Maintenance costs will increase
Time will be required to act

Page 6: Web Speed And Scalability

Good Planning
Have a plan for whatever you are about to do to your system, and most importantly, have a roll-back plan if and when things do not work the way you expected.

Functional and Unit Testing
Automated tests do not catch everything that can go wrong, but they are very good at catching bugs introduced by changes elsewhere in your code base.
Unit testing (PHPUnit, SimpleTest)
Functional testing (Selenium)

Change Control (Version Control)
USE IT!!!! There is no better way, even as a single developer, to keep your codebase safe from bad changes.
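The tools named on this slide are PHP-specific. As a language-neutral illustration of how an automated test catches a bug introduced elsewhere, here is a small sketch using Python's standard `unittest` module; the `format_price` helper is hypothetical.

```python
import unittest

def format_price(cents: int) -> str:
    """Hypothetical helper used across a site: 1999 -> '$19.99'."""
    if cents < 0:
        raise ValueError("cents must be non-negative")
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"  # zero-pad cents to two digits

class FormatPriceTest(unittest.TestCase):
    # If someone later "simplifies" format_price and drops the zero padding,
    # this test fails at once instead of a customer seeing "$5.5".
    def test_pads_cents(self):
        self.assertEqual(format_price(550), "$5.50")

    def test_whole_dollars(self):
        self.assertEqual(format_price(1000), "$10.00")

if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["format_price_test"], exit=False)
```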

Page 7: Web Speed And Scalability

Version Control in Action
/trunk/
Used for all mainline development
/production/
Only stable, production-ready code from trunk is contained here. Only make severe bug fixes in this branch.
/tags/
Holds copies of production-ready code

Do not use version control as a backup solution; back up your VCS separately.

Page 8: Web Speed And Scalability

High Availability?!?!?! What is “five nines” (99.999%)?

Do the math: 60 seconds * 60 minutes * 24 hours * 365 days = 31,536,000 seconds in a year

(1 - 0.99999) * 31,536,000 = 315.36 seconds (about 5.3 minutes) of downtime a year

Understand the goodness of “planned maintenance periods”. There are things you will need to do to your systems on a periodic basis, e.g. database cleanup, disk defrag, software/hardware upgrades.

You can stagger your maintenance periods if you have enough servers, so you have no customer downtime, just a reduction in capacity.
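The same arithmetic works for any availability target. A minimal Python sketch (the function name is illustrative) that converts an availability percentage into an annual downtime budget, using the same 365-day year as the slide:

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000

def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a target such as 99.999 (percent)."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR

for target in (99.0, 99.9, 99.99, 99.999):
    secs = downtime_seconds_per_year(target)
    print(f"{target}%: {secs:,.2f} s/year (~{secs / 60:,.1f} min)")
```

Each extra nine cuts the budget by a factor of ten, which is why planned, staggered maintenance matters: at five nines, a single unplanned reboot can use up most of the year's allowance.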

Page 9: Web Speed And Scalability

Monitoring
No matter how stable your code is or how reliable your hardware, you will have failures.

Monitoring Methods
Top Down (Business Monitors): monitor the application as the customer interacts with it
Bottom Up (System Monitors): the most commonly used; monitor the base components of your application, such as disk space, network speed, and database statistics

Bottom-up monitoring is by no means bad, but without business monitoring you will not be able to catch all failures.
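A top-down check exercises the site the way a customer would: fetch a page, confirm the expected content is there, and check that it came back fast enough. Below is a minimal Python sketch using only the standard library. The function names, the expected text, and the 2-second threshold are illustrative assumptions, and the `fetch` parameter exists only so the check can be tested without a network.

```python
import time
import urllib.request
from typing import Callable, List, Tuple

def http_fetch(url: str, timeout: float) -> Tuple[int, str]:
    """Default fetcher: returns (HTTP status, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")

def business_check(url: str, expected_text: str, max_seconds: float = 2.0,
                   fetch: Callable[[str, float], Tuple[int, str]] = http_fetch
                   ) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    start = time.monotonic()
    try:
        status, body = fetch(url, max_seconds)
    except Exception as exc:  # DNS failure, refused connection, timeout, HTTP error
        return [f"request failed: {exc}"]
    elapsed = time.monotonic() - start

    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if expected_text not in body:
        problems.append(f"missing expected text {expected_text!r}")
    if elapsed > max_seconds:
        problems.append(f"slow response: {elapsed:.2f}s")
    return problems

# Usage (hypothetical URL and text):
# print(business_check("http://www.example.com/", "Add to cart"))
```

A bottom-up monitor would report healthy disks and databases even if a template bug removed the "Add to cart" button; this kind of check is what notices.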

Page 10: Web Speed And Scalability

Criteria For A Monitoring System
SNMP Support: can support most systems out there
Extensibility: ability to plug in custom monitoring packages
Flexible notifications: handle notifying operators and escalating issues if they are not looked into
Custom reaction: in the event of errors that cannot be diagnosed by computers, be able to notify a human to do further investigation
Complex scheduling: ability to set the monitoring frequency and timing per monitoring item
Maintenance scheduling: monitors should never be taken offline; they need to be smart enough to know when a maintenance period is in effect
Event acknowledgement: ability to understand when an event needs to be paged to a human at 2am, and when it shouldn't
Service dependencies: you need to monitor all points between your monitoring system and the client, including firewalls, routers, and switches
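Maintenance scheduling and event acknowledgement both come down to one decision: should this alert wake someone up right now? A minimal Python sketch of that decision, assuming hypothetical severity levels, maintenance windows expressed as start/end datetimes, and a policy (also an assumption) that only critical, unacknowledged events outside a maintenance window get paged:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

@dataclass
class Event:
    service: str
    severity: str          # hypothetical levels: "info", "warning", "critical"
    at: datetime
    acknowledged: bool = False

def in_maintenance(at: datetime, windows: List[Tuple[datetime, datetime]]) -> bool:
    """True if the event time falls inside any scheduled maintenance window."""
    return any(start <= at < end for start, end in windows)

def should_page(event: Event, windows: List[Tuple[datetime, datetime]]) -> bool:
    """Page a human only for critical, unacknowledged events outside maintenance."""
    if event.acknowledged:
        return False       # someone is already working on it
    if in_maintenance(event.at, windows):
        return False       # the monitor keeps running; the failure is expected
    return event.severity == "critical"
```

The monitors themselves stay online the whole time; only the notification step consults the schedule, which is the behaviour the slide asks for.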

Page 11: Web Speed And Scalability

Release Cycles
Basic Release Cycle
Development: things are expected to break
Staging: QA and bug fixing of a build before release
Production: only serious bug fixes are pushed

Keep in mind that reality has priority over “Best Practice”. You can and will have to release from development… it happens.

Page 12: Web Speed And Scalability

Fault Tolerance

[Diagram: network path from the Intertubes through routers and switches to web servers www-1-1 and www-1-2]
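The point of having more than one web server is that losing one should not take the site down. A minimal Python sketch of that idea: route each request to the first server that passes a health check. The server names come from the diagram; the `is_healthy` callback is a hypothetical stand-in for a real health check, such as an HTTP request to each server.

```python
from typing import Callable, Sequence

class NoHealthyServer(Exception):
    pass

def pick_server(servers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy server, failing over past dead ones."""
    for server in servers:
        if is_healthy(server):
            return server
    raise NoHealthyServer("all servers failed their health check")

servers = ["www-1-1", "www-1-2"]
down = {"www-1-1"}                                     # simulate a failed box
print(pick_server(servers, lambda s: s not in down))   # www-1-2
```

This is failover, not load balancing: every request goes to www-1-1 while it is healthy, which matches the next slide's warning that the two are different things.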

Page 13: Web Speed And Scalability

Load Balancing
Load Balancing is NOT HA. Balancing is meant to spread the workload of requests across the cluster.

Balancing Approaches
Round robin: one request per server in a uniform rotation
Least connections: the faster the machine processes requests, the more it will receive
Predictive: usually based on round robin or least connections with some custom code
Available resources: not a good choice, bad performance
Random: pure random distribution of requests
Weighted random: random with a preference for specific machines
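Three of these approaches are easy to sketch. The Python below is a minimal, single-process illustration with hypothetical class names; a real balancer tracks connections across many concurrent requests and removes unhealthy servers, which this sketch does not do.

```python
import itertools
import random
from typing import Dict, List, Optional

class RoundRobin:
    """One request per server in a uniform rotation."""
    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnections:
    """Send each request to the server with the fewest open connections.
    Faster servers close connections sooner, so they end up receiving more."""
    def __init__(self, servers: List[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)  # ties: first listed
        self.active[server] += 1                         # connection opened
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1                         # connection closed

class WeightedRandom:
    """Random choice with a preference for higher-weight machines."""
    def __init__(self, weights: Dict[str, float],
                 rng: Optional[random.Random] = None):
        self.servers = list(weights)
        self.weights = [weights[s] for s in self.servers]
        self.rng = rng or random.Random()

    def pick(self) -> str:
        return self.rng.choices(self.servers, weights=self.weights, k=1)[0]

rr = RoundRobin(["www-1-1", "www-1-2", "www-1-3"])
print([rr.pick() for _ in range(4)])  # ['www-1-1', 'www-1-2', 'www-1-3', 'www-1-1']
```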

Page 14: Web Speed And Scalability

Static Content
Static content is images, CSS, JS, and any other non-dynamic element.

Serving these items from a dedicated server frees up your web processes for actual dynamic code, in turn increasing your capacity and response speed.

On your static server you can use lighttpd, which is very quick at serving static content compared to Apache (although Apache 2.2.x is much better than 1.3.x).
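On the application side, serving static files from a dedicated server usually just means generating their URLs against a different host. A minimal Python sketch, assuming a hypothetical static host name `static.example.com` and routing by file extension:

```python
import posixpath

STATIC_HOST = "http://static.example.com"   # hypothetical dedicated static server
STATIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js"}

def asset_url(path: str) -> str:
    """Point static files at the static server; leave dynamic pages alone."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        return STATIC_HOST + "/" + path.lstrip("/")
    return path

print(asset_url("/img/logo.png"))   # http://static.example.com/img/logo.png
print(asset_url("/cart.php"))       # /cart.php
```

Keeping the host in one constant means the static tier can later move to a CDN without touching templates.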

Page 15: Web Speed And Scalability

Types of Caching

Layered / Transport Cache
“Transparent”: placed in front of your hardware and caches requests before they hit your webserver

Integrated (Look-Aside) Cache
Computational reuse technique, used where the cost of storing the results of a computation and later finding them again is less than the cost of performing the computation again

Write-Thru Caches
The application is responsible for updating both the cache and the datastore when changes are made

Write-Back Caches
All data changes are made to the cache; the cache layer is responsible for modifying the backend datastore

Distributed Cache
Using several machines to cache data, distributing the data and load. Memcached can do this very simply.
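The look-aside, write-thru, and write-back patterns differ mainly in who updates what, and when. A minimal Python sketch, with plain dictionaries standing in for the cache (for example memcached) and the datastore (for example a database); the class and method names are hypothetical.

```python
from typing import Any, Callable, Dict, Set

class LookAsideCache:
    """Application checks the cache first and computes/loads on a miss."""
    def __init__(self, load: Callable[[str], Any]):
        self.cache: Dict[str, Any] = {}
        self.load = load                  # expensive computation or DB query

    def get(self, key: str) -> Any:
        if key in self.cache:             # hit: reuse the stored result
            return self.cache[key]
        value = self.load(key)            # miss: do the expensive work once
        self.cache[key] = value
        return value

class WriteThru:
    """Application writes to both the datastore and the cache on every change."""
    def __init__(self, store: Dict[str, Any]):
        self.store: Dict[str, Any] = store
        self.cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.store[key] = value           # datastore stays authoritative
        self.cache[key] = value

class WriteBack:
    """Changes go to the cache only; the cache layer flushes them later."""
    def __init__(self, store: Dict[str, Any]):
        self.store: Dict[str, Any] = store
        self.cache: Dict[str, Any] = {}
        self.dirty: Set[str] = set()

    def write(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.dirty.add(key)               # not yet in the datastore

    def flush(self) -> None:
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off shows up in `WriteBack`: writes are cheap, but anything still in `dirty` is lost if the cache dies before `flush()` runs.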

Page 16: Web Speed And Scalability

Memcached
It is a high-performance, distributed object caching system
It is simple to set up and use:
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
(-d runs it as a daemon, -m 2048 gives it 2048 MB of memory, -l sets the listen address, -p the port)
It is not designed to be redundant
If you lose data, your application repopulates the cache as the data is accessed
It provides no security to your cache:
“Memcached is the soft, doughy underbelly of your application. Part of what makes the clients and server lightweight is the complete lack of authentication. New connections are fast, and server configuration is nonexistent. If you wish to restrict access, you may use a firewall, or have memcached listen via unix domain sockets.”

Limitations
Key size limited to 250 bytes
Data size limited to 1MB
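Memcached servers do not talk to each other; the client decides which server holds a key, typically by hashing the key. A minimal Python sketch of that client-side behaviour plus a check against the key limit above. Only 10.0.0.40:11211 comes from the slide; the other addresses are hypothetical. Simple modulo hashing is used for clarity, whereas many production clients use consistent hashing so that adding a server remaps fewer keys.

```python
import zlib
from typing import List

MAX_KEY_BYTES = 250  # memcached key length limit

# 10.0.0.40 is from the slide; .41 and .42 are hypothetical extra servers.
SERVERS: List[str] = ["10.0.0.40:11211", "10.0.0.41:11211", "10.0.0.42:11211"]

def validate_key(key: str) -> bytes:
    """Reject keys the memcached text protocol would not accept."""
    raw = key.encode("utf-8")
    if len(raw) > MAX_KEY_BYTES:
        raise ValueError(f"key is {len(raw)} bytes; limit is {MAX_KEY_BYTES}")
    if any(b <= 32 or b == 127 for b in raw):   # space or control characters
        raise ValueError("key must not contain spaces or control characters")
    return raw

def server_for(key: str, servers: List[str] = SERVERS) -> str:
    """Pick a server by hashing the key (simple modulo distribution)."""
    raw = validate_key(key)
    return servers[zlib.crc32(raw) % len(servers)]
```

Because the mapping depends on `len(servers)`, adding or removing a server sends many keys to a different machine; those lookups simply miss and get repopulated, consistent with memcached not being designed for redundancy.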

Page 17: Web Speed And Scalability

APC and why it’s your friend
Alternative PHP Cache

The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.

Just enabling APC will transparently cache your compiled code as you use it; no code changes are required on your side.

Provides a cheap caching layer that can be shared between all Apache processes on one machine.

Page 18: Web Speed And Scalability

YSlow?
Based on 13 principles from http://developer.yahoo.com/performance/rules.html

1.) Make fewer HTTP requests
80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.

2.) Use a CDN
The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?

3.) Add an Expires header
Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.

4.) Gzip components
The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.
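Rules 3 and 4 are decisions about response headers and encoding. Outside of Apache, the same effect can be sketched in Python: set a far-future `Expires` header (plus the HTTP/1.1 `Cache-Control: max-age` equivalent) and gzip the body only when the client sent `Accept-Encoding: gzip`. The function name and the 30-day lifetime are illustrative assumptions, and the encoding check is deliberately simplified.

```python
import gzip
import time
from email.utils import formatdate
from typing import Dict, Tuple

ONE_MONTH = 30 * 24 * 60 * 60  # 2592000 seconds

def static_response(body: bytes, accept_encoding: str,
                    max_age: int = ONE_MONTH) -> Tuple[Dict[str, str], bytes]:
    """Build cacheable (and, if allowed, gzipped) response headers and body."""
    headers = {
        # Rule 3: far-future expiry so repeat views skip the HTTP request.
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
        # Tell proxies the body depends on the request's Accept-Encoding.
        "Vary": "Accept-Encoding",
    }
    # Rule 4: compress only if the client can decode gzip.
    # Simplified: ignores q-values such as "gzip;q=0".
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```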

Page 19: Web Speed And Scalability

YSlow?

5.) Put CSS at the top
While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively.

6.) Put JS at the bottom
Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it's better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization.

7.) Avoid CSS expressions
CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They're supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions.

8.) Make JS and CSS external
Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?

Page 20: Web Speed And Scalability

YSlow?

9.) Reduce DNS lookups
The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server's IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can't download anything from this hostname until the DNS lookup is completed.

10.) Minify JS
Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor.

11.) Avoid redirects
Redirects are accomplished using the 301 and 302 status codes. Each redirect adds an extra HTTP round trip before the user gets the page.

12.) Remove duplicate scripts
It hurts performance to include the same JavaScript file twice in one page. This isn't as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.

13.) Configure ETags
Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraints are that the string be quoted. The origin server specifies the component's ETag using the ETag response header.

14.) Make AJAX cacheable
People ask whether these performance rules apply to Web 2.0 applications. They definitely do! This rule is the first rule that resulted from working with Web 2.0 applications at Yahoo!.
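Rule 13 is about validation: the browser sends back the ETag it holds in an `If-None-Match` header, and the server answers `304 Not Modified` with no body if it still matches. A minimal Python sketch; deriving the ETag from a hash of the content is an assumption of this example, chosen because a content hash comes out the same on every server in a cluster, unlike server-generated ETags that include a file's inode. The Apache example that follows takes the other route and turns ETags off with `FileETag None`.

```python
import hashlib
from typing import Dict, Optional, Tuple

def make_etag(body: bytes) -> str:
    """Quoted strong ETag derived from the content itself."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def conditional_get(body: bytes, if_none_match: Optional[str]
                    ) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several comma-separated tags, or "*".
        # Weak tags (W/"...") are not handled in this sketch.
        client_tags = {t.strip() for t in if_none_match.split(",")}
        if etag in client_tags or "*" in client_tags:
            return 304, headers, b""
    return 200, headers, body
```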

Page 21: Web Speed And Scalability

Example Apache 2.x performance config

# enable expirations (requires mod_expires)
ExpiresActive On
# expire images, CSS, and JavaScript a month (2592000 seconds) after the client accesses them
ExpiresByType image/gif A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType text/css A2592000
ExpiresByType application/x-javascript A2592000

# disable ETags
FileETag None

Page 22: Web Speed And Scalability

Example Apache 2.x performance config

# Gzip Compression (requires mod_deflate; the Vary line requires mod_headers)

# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images or MP3s (they are already compressed)
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|mp3)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

Page 24: Web Speed And Scalability

YSlow: http://developer.yahoo.com/yslow/
Rules: http://developer.yahoo.com/performance/rules.html
Scalable Internet Architectures, by Theo Schlossnagle
APC: http://us3.php.net/apc
Memcached: http://www.danga.com/memcached/
Selenium: http://www.openqa.org/selenium/
SimpleTest: http://simpletest.org/
PHPUnit: http://www.phpunit.de/