Download - 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin ApacheCon 2004 November 17, 2004.

Transcript
Page 1: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

1

HTTP Caching & Cache-Bustingfor Content Publishers

Michael J. Radwinhttp://public.yahoo.com/~radwin/

ApacheCon 2004November 17, 2004

Page 2: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

2

Hi, I’m Michael J. Radwin

• Engineering Manager, Yahoo! Inc.– Internal FAMP dev & support

• FreeBSD, Apache, MySQL, PHP• Also, some C/C++ libs (networking, data storage)

– Web Services infrastructure– Developer Tools

• CVS, Bugzilla, package mgmt, i18n workflow

• Slides online– http://public.yahoo.com/~radwin/

Page 3: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

3

Why you’re here today

• Publishers have a lot of web content– HTML, images, Flash, movies

• Speed is important part of user experience• Bandwidth is expensive

– Use what you need, but avoid unnecessary extra

• Personalization differentiates– Show timely data (stock quotes, news stories)– Get accurate advertising statistics– Protect sensitive info (e-mail, account balances)

Page 4: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

4

Not covered in this talk

• Proxy deployment– Configuring proxy cache servers (i.e. Squid)– Configuring browsers to use proxy caches– Transparent/interception proxy caching– Intercache protocols (ICP, HTCP)

• HTTP acceleration (a k a reverse proxies)• Database query results caching

Page 5: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

5

HTTP Review

Server

ClientInterne

t

(1) Client connects to www.example.com port 80

Server

Client

(2) Client sends HTTP GET request

Internet

Page 6: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

6

HTTP Review (cont’d)

Server

ClientInterne

t

(3) Client reads HTTP response from server

Server

ClientInterne

t

(4) Client and Server close connection

Page 7: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

7

HTTP Example

mradwin@machshav:~$ telnet www.example.com 80Trying 192.168.37.203...Connected to w6.example.com.Escape character is '^]'.GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKDate: Wed, 28 Jul 2004 23:36:12 GMTLast-Modified: Fri, 23 Jul 2004 01:52:37 GMTContent-Length: 3688Connection: closeContent-Type: text/html

<html><head><title>Hello World</title>...

Page 8: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

8

Browsers use private caches

Server

ClientInterne

t

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul 2004 01:52:37 GMTContent-Length: 3688Content-Type: text/html

Client stores copy of http://www.example.com/foo/index.html on its hard disk with timestamp.

Page 9: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

9

Revalidation (Conditional GET)

Server

ClientInterne

t

GET /foo/index.html HTTP/1.1Host: www.example.comIf-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT

HTTP/1.1 304 Not Modified

Page 10: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

10

Non-Caching Proxy

Server

ClientInternet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 11: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

11

Proxy Cache Miss

Server

ClientInternet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 12: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

12

Proxy Cache Hit

Server

ClientInternet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 13: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

13

Proxy Cache Revalidation Hit

Server

ClientInternet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.comIf-Modified-Since: Fri, 23 Jul ...

HTTP/1.1 304 Not Modified

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 14: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

14

Assumptions about content types

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 15: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

15

Top 5 techniques for publishers

1. Use “Cache-Control: private” for personalized content

2. Implement “Images Never Expire” policy

3. Use a cookie-free TLD for static content

4. Use Apache defaults for CSS & JavaScript

5. Use random strings in URL for accurate hit metering or very sensitive content

Page 16: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

16

1. Use “Cache-Control: private”for personalized content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 17: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

17

Shared caching gone awry (1)

WebmailServer

Client 1Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

Page 18: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

18

Shared caching gone awry (2)

WebmailServer

Client 1

msg3.html

Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

Page 19: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

19

Shared caching gone awry (3)

WebmailServer

Client 2 msg3.html

Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=mary

Jane’s

e-mail

message

Page 20: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

20

What’s cacheable?

• HTTP/1.1 allows caching anything by default– Unless explicit Cache-Control header

• In practice, most caches avoid anything with– Cache-Control/Pragma header– Cookie/Set-Cookie headers– WWW-Authenticate/Authorization

header– POST/PUT method– 302/307 status code

Page 21: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

21

Cache-Control: private

• Shared caches bad for shared content– Mary shouldn’t be able to read Jane’s mail

• Private caches perfectly OK– Speed up web browsing experience

• Avoid personalization leakage with single line in httpd.conf or .htaccessHeader set Cache-Control private

Page 22: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

22

2. “Images Never Expire” policy

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 23: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

23

“Images Never Expire” Policy

• Encourage caching of icons & logos– Forever ≈ 10 years in Internet biz

• Must change URL when you change img– http://us.yimg.com/i/new.gif– http://us.yimg.com/i/new2.gif

• Tradeoff– More difficult for designers– Bandwidth savings, faster user experience

Page 24: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

24

Imgs Never Expire: mod_expires

# Works with both HTTP/1.0 and HTTP/1.1ExpiresActive OnExpiresByType image/gif A315360000ExpiresByType image/jpeg A315360000ExpiresByType image/png A315360000

Page 25: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

25

Imgs Never Expire: mod_headers

# Works with HTTP/1.1 only<FilesMatch "\.(gif|jpe?g|png)$"> Header set Cache-Control \ "max-age=315360000"

</FilesMatch># Works with both HTTP/1.0 and HTTP/1.1<FilesMatch "\.(gif|jpe?g|png)$"> Header set Expires \ "Mon, 28 Jul 2014 23:30:00 GMT"

</FilesMatch>

Page 26: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

26

mod_images_never_expire

/* Enforce policy with module that runs at URI translation hook */static int translate_imgexpire(request_rec *r) { const char *ext; if ((ext = strrchr(r->uri, '.')) != NULL) { if (strcasecmp(ext,".gif") == 0 || strcasecmp(ext,".jpg") == 0 || strcasecmp(ext,".png") == 0 || strcasecmp(ext,".jpeg") == 0) { if (ap_table_get(r->headers_in,"If-Modified-Since") != NULL || ap_table_get(r->headers_in,"If-None-Match") != NULL) { /* Don't bother checking filesystem, just hand back a 304 */ return HTTP_NOT_MODIFIED; } } } return DECLINED;}

Page 27: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

27

3. Cookie-free static content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 28: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

28

Use a cookie-free Top Level Domain for static content

• For maximum efficiency use two domains– www.example.com for HTML– img.example.net for images

• Some proxies won’t cache Cookie reqs– But: multimedia is never personalized– Cookies irrelevant for images

Page 29: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

29

Typical GET request w/Cookies

GET /i/foo/bar/quux.gif HTTP/1.1Host: www.example.comUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707

Firefox/0.8Accept: application/x-shockwave-flash,text/xml,application/xml,application/

xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1

Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v; brandflash=1; B=amfco1503sgp8&b=2; F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7&b=K1It; LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo &l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0; GTSessionID835990899023=83599089902340645635; Y=v=1&n=6eecgejj7012f &l=h03m8d50c8bo/o&p=m012o33013000007&jb=16|47|&r=a8&lg=us&intl=us&np=1; PROMO=SOURCE=fp5; YGCV=d=; T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-&a=YAE&sk=DAAwRz5HlDUN2T&d=c2wBT0RBekFURXdPRFV3TWpFek5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-; LYS=l_fh=0&l_vo=myla; PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB; YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA%0190035-4231%014480%0134.051590%01-118.384342%019%01a%0190035

Referer: http://www.example.com/foo/bar.php?abc=123&def=456Accept-Language: en-us,en;q=0.7,he;q=0.3Accept-Encoding: gzip,deflateAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7Keep-Alive: 300Connection: keep-alive

Page 30: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

30

Same request, no Cookies

GET /i/foo/bar/quux.gif HTTP/1.1Host: img.example.netUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707

Firefox/0.8Accept: application/x-shockwave-flash,text/xml,application/xml,application/

xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1

Referer: http://www.example.com/foo/bar.php?abc=123&def=456Accept-Language: en-us,en;q=0.7,he;q=0.3Accept-Encoding: gzip,deflateAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7Keep-Alive: 300Connection: keep-alive

• Added bonus: much smaller GET request– Dial-up MTU size 576 bytes, PPPoE 1492– 1450 bytes reduced to 550

Page 31: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

31

4. Apache defaults for static, occasionally-changing content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 32: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

32

Revalidation works pretty well

• Revalidation default behavior for static content– Browser sends If-Modified-Since request– Server replies with short 304 Not Modified– No fancy Apache config needed

• Use if you can’t predict when content will change– Page designers can change immediately– No renaming necessary

• Cost: extra HTTP transaction for 304– Small with Keep-Alive, but large sites disable

Page 33: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

33

5. Random URL strings for hit metering, sensitive content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static ContentPersonalized Same for everyone

Page 34: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

34

Accurate advertising statistics

• If you trust proxies– Send Cache-Control: must-revalidate– Count 304 Not Modified log entries as hits

• If you don’t– Ask client to fetch uncacheable image URL– Return 302 to highly cacheable image file– Count 302s as hits– Don’t bother to look at cacheable server log

Page 35: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

35

Hit-metering for ads (1)

<script type="text/javascript">var r = Math.random();var t = new Date();document.write("<img width='109' height='52'

src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + ";r=" + r + "'>");

</script><noscript><img width="109" height="52" src=

"http://ads.example.com/ad/foo/bar.gif?js=0"></noscript>

Page 36: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

36

Hit-metering for ads (2)

GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1Host: ads.example.comUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7)

Gecko/20040707 Firefox/0.8Referer: http://www.example.com/foo/bar.php?abc=123&def=456Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B

HTTP/1.1 302 Moved TemporarilyDate: Wed, 28 Jul 2004 23:45:06 GMTCache-Control: max-age=0,no-cache,no-storeExpires: Tue, 11 Oct 1977, 01:23:45 GMTPragma: no-cacheLocation: http://img.example.net/i/foo/bar.gifContent-Type: text/html

<a href="http://img.example.net/i/foo/bar.gif">Moved</a>

Page 37: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

37

Hit-metering for ads (3)

GET /i/foo/bar.gif HTTP/1.1Host: img.example.netUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7)

Gecko/20040707 Firefox/0.8Referer: http://www.example.com/foo/bar.php?abc=123&def=456

HTTP/1.1 200 OKDate: Wed, 28 Jul 2004 23:45:07 GMTLast-Modified: Mon, 05 Oct 1998 18:32:51 GMTETag: "69079e-ad91-40212cc8"Cache-Control: public,max-age=315360000Expires: Mon, 28 Jul 2014 23:45:07 GMTContent-Length: 6096Content-Type: image/gif

GIF89a...

Page 38: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

38

Defeating proxies: turning public caches into private caches

• Use distinct tokens in URL– No two users use same token– Defeats shared proxy caches– Works well with private caches

• Doesn’t break the back button• May break visited-link highlighting

– e.g. JavaScript timestamps/random numbers– Every link is blue, no purple

Page 39: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

39

Breaking the Back button

• When users click browser Back button– Expect to go back one page instantly– Private cache enables this behavior

• Aggressive cache-busting breaks Back button– Server sends Pragma: no-cache or Expires in past– Browser must re-visit server to re-fetch page– Hitting network much slower than hitting disk

• Use very sparingly– Compromising user experience is A Bad Thing

Page 40: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

40

Review: Top 5 techniques

1. Use “Cache-Control: private” for personalized content

2. Implement “Images Never Expire” policy

3. Use a cookie-free TLD for static content

4. Use Apache defaults for CSS & JavaScript

5. Use random strings in URL for accurate hit metering or very sensitive content

Page 41: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

41

Review: encouraging caching

• Send explicit Cache-Control or Expires

• Generate “static content” headers– Last-Modified, ETag– Content-Length

• Avoid “cgi-bin”, “.cgi” or “?” in URLs– Some proxies (e.g. Squid) won’t cache– Workaround: use PATH_INFO instead

Page 42: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

42

Review: discouraging caching

• Use POST instead of GET• Use random strings and “?” char in URL• Omit Content-Length & Last-Modified• Send explicit headers on response

– Breaks the back button– Only as a last resort

Cache-Control: max-age=0,no-cache,no-storeExpires: Thu, 01 Jan 1970 12:34:56 GMTPragma: no-cache

Page 43: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

43

Recommended Reading

• Web Caching and Replication– Michael Rabinovich &

Oliver Spatscheck– Addison-Wesley, 2001

• Web Caching– Duane Wessels– O'Reilly, 2001

Page 44: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

44

Wrapping Up

• Please fill out Session Eval Form– Session: WE16– Title: HTTP Caching & Cache-busting– Speaker: Michael Radwin

• Slides online– http://public.yahoo.com/~radwin/

Page 45: 1 HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin  ApacheCon 2004 November 17, 2004.

45