
Accelerated Web Content Delivery

Sanjeet Joshi

Architecture Technology Services

HCL Technologies Ltd.


© 2010, HCL Technologies Ltd.

November 2010

The author would like to thank Dr. Usha Thakur of ATS for her valuable help in content formatting and content enhancement.

NON-DISCLOSURE OBLIGATIONS AND DISCLAIMER

The data, information, or material provided herein is confidential and proprietary to HCL and shall not be disclosed, duplicated, or used in whole or in part for any purpose other than as approved by an authorized official of HCL in writing. The recipient agrees to maintain complete confidentiality of the information/data received and shall take all reasonable precautions/steps in maintaining confidentiality of the same, however in any event not less than the precautions/steps taken for its own confidential material. If you are not the intended recipient of this information, you are not authorized to read, forward, print, retain, copy, or disseminate this document or any part of it.

Any statements in this presentation that are not historical facts may include forward-looking statements that involve risks and uncertainties; actual results may differ from the forward-looking statements.


Background

In 1995, when the Internet was still in its infancy, there were about 16 million users worldwide, compared to about 2 billion today [1]. Over the last fifteen years not only has the number of people using the Internet grown exponentially, but we have also witnessed an evolution of technology standards, protocols, and information consumption patterns. The Internet is no longer limited to desktop/laptop computers. An increasing number of people on the go are using handheld devices to access their preferred websites. This easy access to websites has resulted in a significant increase in Web traffic.

Today, while designing a Web application or a website that is expected to generate a lot of interest, one has to ensure that it has the right design and infrastructure to handle the extra load, failing which the website is likely to experience difficulties. For instance, the highly popular micro-blogging website twitter.com faced stability issues for a long time after its launch, since it was not designed to handle a large amount of traffic.

The performance of a Web application is determined by multiple factors such as design and application architecture, quality of code, and hardware infrastructure. Performance needs to be built into every layer of the technology stack to get a solid finished product.

This paper focuses on the Web content caching aspect of website performance.

Purpose

Web caching is not a new idea. It has been in use for quite some time, and current browsers, caching proxies, and Web servers all provide support for it. However, Web caching is often an overlooked aspect when designing the technology stack of a Web application.

Web content caching can be implemented by content consumers (end users) to improve their Internet browsing experience, or by content providers to reduce the load on their origin infrastructure as well as to give their customers a better Web surfing experience.

Caching at the content consumer's end is handled by Web browsers such as Internet Explorer and Firefox. This is done automatically, and end users have limited control over how and what will be cached. Some organizations also install caching proxies to cache incoming Web content and to apply security policies.

This paper will focus on caching solutions from a content provider's point of view and the various ways in which content caching can enhance a website's performance. The paper assumes that the reader is familiar with Web standards such as HTTP and HTML, and is technical in nature. It is targeted at technology architects and solution designers.

[1] See http://www.internetworldstats.com/stats.htm (accessed November 2010).

Web Caching Concepts

The concept of caching has been widely used since the early days of computing and has been implemented at various layers of the technology stack. For example, the processor chip layer has a hardware cache that stores the most frequently accessed instructions. Irrespective of where a cache is used, its main function is to store the most frequently accessed data (information or instructions), and its main goal is to improve performance by reducing read/computation cycle times.

It is common knowledge that application-level caching can be extremely beneficial in saving multiple expensive database reads or expensive repetitive computations, thereby improving overall application performance.

HTTP caching, or Web caching, goes one layer above and caches entire static Web resources (e.g., HTML pages, CSS files) either on the client side (browser cache) or on the server side (origin cache infrastructure).

Let us take a quick look at some of the common terms used with respect to caching in general and Web caching in particular.

Origin server (or origin infrastructure): The server infrastructure where the Web servers or application servers are hosted. These servers are responsible for serving fresh content upon request.

Time to live (TTL): Cacheable data has a validity period beyond which it is considered stale; this period is referred to as the TTL. It is a critical parameter because a very low TTL makes caching ineffective, while a very high TTL results in stale data being served to clients.

Cache hit: Occurs each time an HTTP request is served from the cache.

Cache hit ratio: The percentage of all requests that result in cache hits.

Cache miss: Occurs when a request cannot be served from the cache.
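To make these terms concrete, the following minimal sketch (in Python; our illustration, not from the paper, and not how any particular cache product is implemented) shows a TTL-bounded cache that tracks hits, misses, and the hit ratio:

    import time

    class TTLCache:
        # Minimal cache illustrating TTL, cache hit, cache miss, and hit ratio.
        def __init__(self, ttl_seconds):
            self.ttl = ttl_seconds
            self.store = {}           # URL -> (content, time fetched)
            self.hits = 0
            self.misses = 0

        def get(self, url, fetch_from_origin):
            entry = self.store.get(url)
            if entry is not None:
                content, fetched_at = entry
                if time.time() - fetched_at < self.ttl:  # still fresh?
                    self.hits += 1                       # cache hit
                    return content
            self.misses += 1                             # cache miss (absent or stale)
            content = fetch_from_origin(url)             # fresh copy from the origin server
            self.store[url] = (content, time.time())
            return content

        def hit_ratio(self):
            total = self.hits + self.misses
            return self.hits / total if total else 0.0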

Page 5: HCLT Whitepaper: Accelerated Web Content Delivery

Page 5 of 12

Controlling Caching Behavior of Your Content

Web browsers and caching proxies depend on the HTML and HTTP headers of the delivered content to determine whether the content can be cached and, if so, for how long. These cache headers can be tuned to define the cache behavior of a Web application/website.

Cache Headers

HTML authors can use tags in the <HEAD> section of an HTML page to dictate the caching behavior of that page. However, these header tags do not have defined standards for caching, and hence not all browsers or caches honor them. For example, using <meta http-equiv="Pragma" content="no-cache"> does not guarantee that the content will never be cached. Hence it is not advisable to rely on HTML cache headers.

A more reliable approach is to use HTTP headers. HTTP headers are created by the Web server and sent in the response to a request. These headers help the caching layer decide whether the content can be cached, for how long it can be served, and when it needs to be refreshed from the origin server. Some important HTTP headers that control caching are as follows:

Expires: Gives the date and time after which the response is considered stale. For example:

    Expires: Sat, 06 Aug 2011 10:00:00 GMT

Cache-Control: Provides multiple directives for controlling the caching mechanism. They are as follows:

- max-age=[seconds] — specifies the maximum time for which a resource will be considered fresh. Unlike Expires, this directive is relative to the time of the request rather than absolute.

- s-maxage=[seconds] — similar to max-age, except that it only applies to shared (e.g., proxy) caches.

- public — marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.

- private — allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.

- no-cache — forces the cache to submit each request back to the origin server for validation before releasing a cached copy. This is useful for ensuring that authentication has been respected (in combination with public) and for maintaining freshness without sacrificing all of the benefits of caching.

- no-store — instructs caches not to keep a copy of the representation under any conditions.

- must-revalidate — tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; this directive instructs them to follow your freshness rules strictly.

- proxy-revalidate — similar to must-revalidate, except that it only applies to proxy caches.

Note: One important point to remember here is that not all types of content can be cached. For instance, dynamic content generated using server-side scripting cannot be cached under normal conditions. However, dynamically assembled content that does not change frequently can be cached by making those scripts return valid cache headers.
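To illustrate how these headers appear on the wire, here is a hypothetical response (our example, not from the paper) for a stylesheet that any cache, shared or private, may serve for one hour:

    HTTP/1.1 200 OK
    Date: Thu, 04 Nov 2010 10:00:00 GMT
    Content-Type: text/css
    Cache-Control: public, max-age=3600, s-maxage=3600
    Expires: Thu, 04 Nov 2010 11:00:00 GMT

The Expires value is the absolute counterpart of max-age=3600: both tell caches that the representation stays fresh until one hour after the request.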

Content Delivery Networks

Content Delivery Networks (CDNs) are established commercial solutions that provide a Web content caching layer. These networks provide a transparent caching layer between Web clients and the origin infrastructure, and intercept every request going to the origin server. Typically, CDNs have their cache servers distributed around the world and have smart algorithms for delivering cached content from the nearest (in terms of network hops) cache location. CDNs take a major chunk of the content-serving load away from the origin infrastructure. They are also used for delivering rich multimedia content such as audio and video files.

Figure 1 illustrates where a CDN fits in the overall workflow.

[Figure 1: Positioning a CDN. Web clients send HTTP requests to the CDN, which forwards cache misses over HTTP to the origin server infrastructure.]

Although CDNs deliver huge value, they may not be suitable for small organizations with limited budgets because they are expensive to hire. For organizations that want more control over the caching behavior of their content, a custom CDN not only works out to be cheaper to implement but also gives immense control over caching.


Squid Proxy in Server Acceleration Mode

Squid is an open source caching proxy licensed under the GNU GPL. It is one of the most widely used, robust, and feature-rich open source products available on the market. Squid is used by websites, such as Wikipedia.org, that witness very high traffic volumes.

Squid can be installed as a forward proxy to improve client-side Web surfing performance, apply security and filtering mechanisms, and enforce organizational policies by monitoring outgoing requests. Squid can also be installed in reverse proxy mode, also known as server accelerator mode, to improve server-side content delivery performance. A reverse proxy is set up close to the origin Web servers to serve incoming requests rather than outgoing requests.

[Figure 2: Squid as Reverse Proxy. Web clients send HTTP requests to the Squid reverse proxies, which forward cache misses over HTTP to the origin server infrastructure.]

A reverse proxy acts as an intermediary between a Web client and the origin Web server(s). It receives all content requests and delivers valid content available in its cache. If the requested content is not available, the reverse proxy requests it from the origin server. This reduces the TCP connection and content rendering load on the origin servers, making them available for other important tasks.
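As a rough illustration, a minimal squid.conf for this accelerator setup could look like the sketch below (the host names, IP address, and cache size are hypothetical; consult the Squid documentation for your version before use):

    # Listen for client traffic in accelerator (reverse proxy) mode
    http_port 80 accel defaultsite=www.example.com

    # Forward cache misses to the origin Web server
    cache_peer 10.0.0.10 parent 80 0 no-query originserver name=origin

    # Accelerate only requests for our own site
    acl our_site dstdomain www.example.com
    http_access allow our_site
    http_access deny all

    # 10 GB of disk cache
    cache_dir ufs /var/spool/squid 10240 16 256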

Some key benefits of the aforementioned architecture are as follows.

1. LOAD BALANCING: If the Web server infrastructure requires expensive server hardware, Squid can be installed on a number of inexpensive commodity hardware boxes, thereby reducing the number of expensive origin servers.

2. SECURITY: A reverse proxy can also provide an effective security solution because the origin server infrastructure is hidden behind the Squid infrastructure layer. Hence any attack on the website is limited to the Squid infrastructure, and any damage is limited to the cached content.

3. PERFORMANCE: A correctly tuned Squid installation can provide significant performance gains, as the proxy is designed to serve cached content at very high speeds. It uses in-memory caching for better performance. Squid also provides various cache replacement policies that play a major role in determining the performance of a Squid server.


Squid Cache Replacement Policies

A cache replacement policy determines which objects in the cache can be replaced by new objects that are more likely to be served, thereby improving the cache hit ratio. This is an important choice because it helps optimize disk and memory usage. For example, the most popular objects should not be removed from the cache, while the least accessed cached objects should be replaced by more popular ones.

Squid offers various replacement policies; below we provide a brief introduction to each of them. There is no single recommended or best policy. The right policy is chosen after studying the content and how it is accessed.

LRU (Least Recently Used)

LRU is a common and effective choice for most cache implementations. It removes the objects with the oldest last-accessed timestamp, i.e., cached objects that have not been accessed for a long time are the prime candidates for replacement. LRU works well when objects that were most recently accessed have a greater likelihood of being accessed again in the near future.
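As a minimal sketch (ours, not Squid's actual implementation; in Squid the policy is selected with the cache_replacement_policy directive), an LRU cache can be built on an ordered dictionary:

    from collections import OrderedDict

    class LRUCache:
        # Evicts the least recently used object once capacity is exceeded.
        def __init__(self, capacity):
            self.capacity = capacity
            self.objects = OrderedDict()    # ordered oldest -> newest access

        def get(self, key):
            if key not in self.objects:
                return None                 # cache miss
            self.objects.move_to_end(key)   # mark as most recently used
            return self.objects[key]

        def put(self, key, value):
            if key in self.objects:
                self.objects.move_to_end(key)
            self.objects[key] = value
            if len(self.objects) > self.capacity:
                self.objects.popitem(last=False)  # evict the least recently used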

LFUDA (Least Frequently Used with Dynamic Aging)

LFU is another commonly used policy; it keeps a count of object references and removes the least used objects.

LFUDA is a variant of LFU that uses a dynamic aging policy to accommodate shifts in the set of popular objects. In the dynamic aging policy, the cache age factor is added to the reference count when an object is added to the cache or an existing object is modified. This prevents previously popular documents from polluting the cache.

GDSF (Greedy Dual-Size Frequency)

GDSF is an enhancement of GDS (Greedy Dual-Size) that takes into account the size of a cached object and the cost associated with retrieving it, as well as its frequency of reference. This policy is optimized for more popular, smaller objects in order to maximize the object hit rate.

Squid Deployment Topologies

Multiple Squid servers can be configured to work together to improve cache hit ratios or to handle additional load. Squid caches installed in such a group share either a sibling relationship or a parent relationship. A Squid server running as a parent can have multiple child caches (siblings of one another) communicating with it, essentially forming a hierarchy. A flat topology may include Squid servers with only sibling relationships.

If a request results in a cache miss on a sibling node, it is transferred to the parent node. If the parent also returns a cache miss, then the parent contacts the origin server for fresh content.
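In squid.conf these relationships are declared with the cache_peer directive. A sketch for one node in such a hierarchy (host names and ports are hypothetical):

    # Format: cache_peer <host> <type> <http-port> <icp-port> [options]
    cache_peer parent-cache.example.com parent 3128 3130 default
    cache_peer sibling-cache.example.com sibling 3128 3130 proxy-only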


Squid Capacity Planning

Squid's hardware requirements are generally modest. Memory is often the most important resource: a memory shortage significantly reduces performance. Higher hit ratios are obtained by caching more objects, and caching more objects requires more disk space, so disk space is also an important factor to consider. Fast disks and interfaces are also beneficial in improving disk access time; SCSI performs better than ATA and may be chosen if the higher cost can be justified. While fast CPUs are nice, they are not critical to good performance.

Squid allocates a small amount of memory for each cached resource (up to 24 bytes per resource). As a rule of thumb, it requires 32 MB of RAM for each GB of disk cache. So a server with 512 MB of RAM can serve a disk cache of 16 GB, while a 300 GB disk cache needs approximately 10 GB of RAM.
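The rule of thumb is easy to sanity-check (the function below is our illustration, not from the paper):

    def squid_ram_estimate_mb(disk_cache_gb, mb_ram_per_gb_disk=32):
        # Rule of thumb from the paper: ~32 MB RAM per GB of disk cache.
        return disk_cache_gb * mb_ram_per_gb_disk

    print(squid_ram_estimate_mb(16))   # 512 MB, matching the 512 MB / 16 GB example
    print(squid_ram_estimate_mb(300))  # 9600 MB, roughly the 10 GB figure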

Conclusion

- Using reverse proxies for Web caching is a non-intrusive way of improving content delivery performance.

- Reverse proxy based Web caching can be implemented as a cost-effective replacement for commercial CDNs.

- A customized CDN gives better control over the caching infrastructure and helps meet the specific performance needs of an enterprise, as compared to an expensive commercial CDN, which may provide limited configuration options.

- CDNs can take considerable load off the origin servers, thus freeing up origin server resources for other tasks.


Appendix A – Case Study

“Squid Implementation for a Leading Global Entertainment Content Company”

The customer uses the Akamai Edge Server Platform for improved content delivery. The Edge Server Platform's design helps improve content availability and reduce request response times. Ideally, this translates into less Web traffic coming directly to the Web servers (origin servers), thus improving the overall efficiency of the infrastructure and reducing infrastructure costs. Ironically, though, it was observed that the origin servers were receiving increased Web traffic from the Akamai Edge servers themselves. A solution had to be put in place to tackle that problem with minimal impact on existing applications and content.

Problem Context

The Akamai Edge Platform offers a robust design for highly efficient content delivery across the globe. This is achieved by deploying several thousand servers at data centers all over the world (edge servers) and then replicating the content to be delivered onto the appropriate servers. The key then is to route every content request from clients to the nearest (in terms of network hops) available server, resulting in minimal response time and higher availability. Here the edge servers act as caching proxies that request content from the origin server and then serve the cached copy until its expiry, at which point a fresh copy is again requested from the origin server. Akamai uses a hierarchical architecture for its edge platform to avoid thousands of edge servers making multiple refresh requests to the origin server.

The problem is that the ‘innermost’ edge servers still need to make refresh requests to get new content from the origin server, and the origin server has to serve each of these requests separately. This was the root cause of the problem.

[Figure 3: High-level problem representation. Multiple edge servers in the Akamai CDN each separately request the same document (Foo.htm) from the origin server infrastructure.]

Page 11: HCLT Whitepaper: Accelerated Web Content Delivery

Page 11 of 12

The customer summarized the problem at hand thus:

- High-traffic documents such as home pages were being requested from the origin servers as many as 70 times within a single TTL interval, which meant that there were that many innermost Akamai servers in the hierarchy.

- Far too many requests were being received for pages, XML documents, dynamically generated JS, CSS, etc.

The customer felt that if the above-mentioned problems were addressed, the availability of the origin servers would rise close to 99.99%.

Solution Approaches Considered by HCL

Below is a brief summary of the approaches evaluated by the HCL team and its assessment of those approaches.

Approach 1: Custom Solution - Application Server Side

The first approach called for intercepting incoming content refresh requests from the Akamai servers to the origin servers, queuing and prioritizing them, and then rendering the highest priority content.

HCL Assessment of Approach 1

- The solution was workable but complex, and many race conditions would have to be considered before its effectiveness became known.

- The robustness and performance of such a solution were not obvious.

- The solution mandated changes to the application layer, which could have had a cascading effect on the underlying layers.

Approach 2: Using Pre-fetch Settings Provided by Akamai

The second approach called for asynchronous content refresh. When this feature is enabled in Akamai, content refresh requests are sent even before the content becomes stale. The Akamai servers continue to serve the existing content even after sending the refresh requests, thereby refreshing the content asynchronously.

HCL Assessment of Approach 2

- The solution seemed like a perfect fit for the problem at hand, but it would not provide a complete solution.

- It would work well only when content was requested within the threshold set by the pre-fetch settings. For example, if pre-fetch was set to 90%, Akamai servers would send refresh requests to the origin only after 90% of the TTL was over.

- The core problem of receiving multiple requests for the same content would remain unaddressed.


HCL’s Squid Reverse Proxy-Based Solution

The HCL solution was based on the following design principles:

1. Minimal or no changes to the application layer

2. No rework for content producers or brand owners

3. Once installed, the solution should work transparently (without any other layers being aware of its existence)

4. The solution should be repeatable/reusable

Using Squid as a Reverse Proxy

The goal of HCL's solution was to minimize the number of requests going to the origin servers while still serving content that was as fresh as possible.

As a first step, the HCL team proposed installing Squid in reverse proxy mode on separate infrastructure. This introduced an additional caching layer between the Akamai servers and the origin servers. Upon setup, it cached all the relevant content and served it whenever requested by Akamai. The team used the advanced cache control settings provided by Squid (v2.7) to control the number of redundant requests for a single resource and to support asynchronous refresh.
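The paper does not name the exact settings used. In Squid 2.7, the likely candidates are collapsed_forwarding, which merges concurrent requests for the same URI into a single origin request, and refresh_stale_hit, which briefly keeps serving a stale object while a fresh copy is fetched in the background. A sketch, assuming those directives were the ones used:

    # Merge concurrent cache misses for the same URI into one origin request
    collapsed_forwarding on

    # Keep serving a just-expired object for up to 30 seconds
    # while it is refreshed asynchronously from the origin
    refresh_stale_hit 30 seconds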

Goals Achieved

The solution proposed by the HCL team passed rigorous performance checks, delivering over 90% load reduction on the origin servers.