Advanced HTTP Caching
Transcript of Advanced HTTP Caching
Spreadshirt
Advanced HTTP Caching
Martin Breest
Spreadshirt 2
Agenda
• Recap HTTP Caching Basics: Expiration, Revalidation, Variation
• Advanced HTTP Caching: Decomposition, Stale Content Delivery, Purging, Caching User Data
Spreadshirt 3
Recap HTTP Caching Basics
Spreadshirt 4
Expiration with Cache-Control
Browser Origin Server
Resource /index.html
Cached Representation
Request:
GET /index.html HTTP/1.1
Response:
HTTP/1.1 200 OK
Date: Fri, 16 Sep 2016 12:15:00 GMT
Cache-Control: max-age=2700
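The expiration rule on this slide can be sketched in JavaScript. The function name `isFresh` and its parameters are illustrative assumptions, not part of the talk. The sketch only looks at `Date` and `max-age` and ignores `Age`, `Expires` and `s-maxage`.

```javascript
// Returns true while a cached response is still fresh.
// dateHeader:   value of the Date response header
// cacheControl: value of the Cache-Control response header
// now:          Date object for the moment to evaluate
function isFresh(dateHeader, cacheControl, now) {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (!match) return false; // no max-age: treated as not fresh in this sketch
  const maxAgeMs = Number(match[1]) * 1000;
  const ageMs = now.getTime() - Date.parse(dateHeader);
  return ageMs < maxAgeMs;
}
```

With the response above, the representation stays fresh until 13:00:00 GMT, which is 12:15:00 plus 2700 seconds.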
Spreadshirt 5
Revalidation with Last-Modified and/or ETag
Browser Origin Server
Resource /index.html
Cached Representation
Request:
GET /index.html HTTP/1.1
Response:
HTTP/1.1 200 OK
Date: Fri, 16 Sep 2016 12:15:00 GMT
Cache-Control: max-age=2700
Last-Modified: Thu, 15 Sep 2016 12:00:00 GMT
45 minutes later …
Browser Origin Server
Resource /index.html
Cached Representation
Request:
GET /index.html HTTP/1.1
If-Modified-Since: Thu, 15 Sep 2016 12:00:00 GMT
Response:
HTTP/1.1 304 Not Modified
Date: Fri, 16 Sep 2016 13:00:00 GMT
Cache-Control: max-age=2700
Last-Modified: Thu, 15 Sep 2016 12:00:00 GMT
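The origin side of this exchange can be sketched as a small decision function. The name `revalidate`, the shape of the returned object and the placeholder body are assumptions made for illustration. A real server would usually also handle `ETag` and `If-None-Match`.

```javascript
// Decides between 304 Not Modified and a full 200 response.
// requestHeaders: object with lower-cased header names
// lastModified:   Last-Modified value of the current representation
function revalidate(requestHeaders, lastModified) {
  const since = requestHeaders['if-modified-since'];
  if (since !== undefined) {
    const sinceMs = Date.parse(since);
    // Unchanged since the client's copy: send headers only, no body.
    if (!Number.isNaN(sinceMs) && Date.parse(lastModified) <= sinceMs) {
      return { status: 304, headers: { 'Last-Modified': lastModified }, body: null };
    }
  }
  // No validator or representation changed: send the full representation.
  return { status: 200, headers: { 'Last-Modified': lastModified }, body: '<html>...</html>' };
}
```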
Spreadshirt 6
Variation with Vary
Browser, Intermediate Cache, Origin Server
Resource /index.html
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Accept-Encoding: gzip
Response (origin to cache, cache to browser):
HTTP/1.1 200 OK
Vary: Accept-Encoding
Cached representations in the intermediate cache, one per variant: Accept-Encoding: gzip, empty Accept-Encoding
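How a cache keeps one representation per variant can be sketched as a cache-key function. The name `variantKey` and the key layout are assumptions for illustration; real caches normalize header values in more elaborate ways.

```javascript
// Builds a cache key from the URL plus the request headers named in Vary.
// requestHeaders: object with lower-cased header names
function variantKey(url, varyHeader, requestHeaders) {
  const names = varyHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort();
  // A missing request header counts as an empty value.
  const parts = names.map((name) => `${name}=${requestHeaders[name] ?? ''}`);
  return [url, ...parts].join('|');
}
```

A request with `Accept-Encoding: gzip` and a request without that header produce two different keys, which matches the two cached representations shown on the slide.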
Spreadshirt 7
Advanced HTTP Caching
Spreadshirt 8
Can this page be cached?
Spreadshirt 9
Hard to say, right?
Spreadshirt 10
Problem 1 - User data
Login, basket and wish list are user-specific and make it difficult to cache the page
Spreadshirt 11
Problem 2 - Many moving parts
• CMS Header
• Breadcrumb
• Article Detail
• Related Articles
• ProductType Details
• Design Details
• Related Designs
• Related Tags
• CMS Footer
The page contains many different parts from different sources with different TTLs
Spreadshirt 12
Decomposition
Spreadshirt 13
Problem description
• Hard to determine a good cache time
• No reusable, cacheable parts on the edge node
• The full page always needs to be fetched from the source (latency)
• The page gets delivered by one service that glues everything together (which might be a bottleneck)
• JavaScript is costly on certain mobile devices
• Use of JavaScript can create SEO problems
Request:
GET /regenbogenbaer-A103766209 HTTP/1.1
Spreadshirt 14
Solution: Divide and conquer
Decompose the page into more manageable parts without using JavaScript
Spreadshirt 15
Decomposing a page
Request (full page): GET /regenbogenbaer-A103766209 HTTP/1.1
Template: GET /regenbogenbaer-A103766209 HTTP/1.1 (… prefix ... … suffix ...)
Part: GET /cms/header HTTP/1.1
Part: GET /breadcrumb/t-shirts HTTP/1.1
Part: GET /relatedArticles/103766209 HTTP/1.1
Spreadshirt 16
Glueing it together again - Edge Side Includes (ESI)
<html>
  <head> … </head>
  <body>
    <esi:include src="/cms/header"/>
    <esi:include src="/breadcrumb/t-shirts"/>
    <div> ... the article html ... </div>
    <esi:include src="/relatedArticles/103766209"/>
    ...
  </body>
</html>
Request (full page): GET /regenbogenbaer-A103766209 HTTP/1.1
Load template: GET /regenbogenbaer-A103766209 HTTP/1.1 (… prefix ... … suffix ...)
Include part: GET /cms/header HTTP/1.1
Include part: GET /breadcrumb/t-shirts HTTP/1.1
Include part: GET /relatedArticles/103766209 HTTP/1.1
see https://www.w3.org/TR/esi-lang
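What an edge node does with such a template can be sketched roughly as follows. The function `assemble` and the `fetchPart` callback are hypothetical names. The sketch handles only self-closing `esi:include` tags with a `src` attribute and resolves them one after another, which mirrors the sequential execution found in many implementations.

```javascript
// Replaces every <esi:include src="..."/> tag with the fetched part.
// fetchPart: function that maps a part URL to its HTML (synchronous here
// to keep the sketch short; a real edge node would issue HTTP requests).
function assemble(template, fetchPart) {
  const includeTag = /<esi:include\s+src="([^"]+)"\s*\/>/g;
  // replace() invokes the callback left to right, one include at a time.
  return template.replace(includeTag, (_tag, src) => fetchPart(src));
}
```

Each part has its own URL, so `fetchPart` could consult the cache per part and apply a separate TTL to the header, the breadcrumb and the related articles.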
Spreadshirt 17
Many cache/CDN providers implement ESI
• Varnish Cache supports a minimal ESI set (include only)
• Fastly CDN supports ESI (include, comment, remove)
• AKAMAI CDN supports full ESI set
Spreadshirt 18
Pros & Cons
Pros
• Template and parts are resources of their own, each with its own URL and response headers
• Parts can be reused
• Parts can be purged individually
• Cache times can be configured separately
• Template and parts get cached at the edge (low latency)
Cons
• ESI includes are usually executed sequentially
• Error handling is a problem
• JavaScript and CSS for parts are not combined
Spreadshirt 19
Stale Content Delivery
Spreadshirt 20
Problem description
Problem 1: Full server processing time on revalidation
Browser, Intermediate Cache, Origin Server
Resource /index.html
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Response (origin to cache, cache to browser):
HTTP/1.1 200 OK
Origin server response time determines cache response time
Problem 2: Error on revalidation if origin is down
Browser, Intermediate Cache, Origin Server
Resource /index.html
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Response (origin to cache, cache to browser):
HTTP/1.1 503 Service Unavailable
If the origin is down, the cache delivers errors as well
Spreadshirt 21
Fresh vs. stale
Timeline: T_Origin, then TTL (fresh), then Grace (stale), then Keep
• T_Origin: Received response and added response representation to cache
• TTL: Time until response representation can be served from cache
• Grace: Time until response representation that requires revalidation can be served from cache
• Keep: Time representation might stay in cache
Spreadshirt 22
Solution: Stale now is better than fresh later or temporarily down
Deliver stale content temporarily to bridge cache refresh and origin outages
Spreadshirt 23
Deliver stale content on revalidation with stale-while-revalidate
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Resource /index.html
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: max-age=60, s-maxage=600, stale-while-revalidate=600
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Cache for 1 minute in the browser and 10 minutes in the intermediate cache, and allow stale content to be delivered for a further 10 minutes
15 minutes later
Request (browser to cache):
GET /index.html HTTP/1.1
Response (cache to browser, stale content):
HTTP/1.1 200 OK
Cache-Control: max-age=60
The cache spawns an asynchronous request to the origin and returns stale content for the request that triggered it
Request (cache to origin, asynchronous):
GET /index.html HTTP/1.1
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: max-age=60, s-maxage=600, stale-while-revalidate=600
see https://tools.ietf.org/html/rfc5861
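The decision an intermediate cache takes for this header can be sketched as below. The names `parseCacheControl` and `decide` and the returned strings are assumptions for illustration, and the parser only understands plain `name` or `name=number` directives.

```javascript
// Parses a Cache-Control value into an object, e.g. { 's-maxage': 600 }.
function parseCacheControl(value) {
  const directives = {};
  for (const item of value.split(',')) {
    const [name, arg] = item.trim().split('=');
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : Number(arg);
  }
  return directives;
}

// Decides how a shared cache answers a request for an object of a given age.
function decide(cacheControl, ageSeconds) {
  const cc = parseCacheControl(cacheControl);
  const ttl = cc['s-maxage'] ?? cc['max-age'] ?? 0; // shared caches prefer s-maxage
  const staleWindow = cc['stale-while-revalidate'] ?? 0;
  if (ageSeconds < ttl) return 'serve-fresh';
  if (ageSeconds < ttl + staleWindow) return 'serve-stale-and-revalidate-in-background';
  return 'fetch-from-origin';
}
```

For the slide's example, an object that is 15 minutes (900 seconds) old is past its 600-second TTL but still inside the additional 600-second window, so the cache answers immediately with the stale copy and refreshes it in the background.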
Spreadshirt 24
Deliver stale content on origin problems with stale-if-error
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Resource /index.html
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: max-age=60, s-maxage=600, stale-if-error=600
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Cache for 1 minute in the browser and 10 minutes in the intermediate cache, and allow stale content to be delivered on origin errors for a further 10 minutes
see https://tools.ietf.org/html/rfc5861
15 minutes later
Request (browser to cache, cache to origin):
GET /index.html HTTP/1.1
Response (origin to cache):
HTTP/1.1 503 Service Unavailable
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Because of the stale-if-error configuration, the cache returns stale content instead of the error
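The stale-if-error behavior can be sketched as a fallback around the origin response. The function `respond` and the shape of the `cached` object are hypothetical. The set of status codes treated as errors follows RFC 5861 (500, 502, 503, 504).

```javascript
const ORIGIN_ERRORS = new Set([500, 502, 503, 504]);

// cached: null, or { ageSeconds, ttlSeconds, staleIfErrorSeconds, response }
// originResponse: { status, body } as received from the origin
function respond(cached, originResponse) {
  if (!ORIGIN_ERRORS.has(originResponse.status)) return originResponse;
  if (cached === null) return originResponse; // nothing to fall back to
  // How long the cached copy has already been stale.
  const staleSeconds = cached.ageSeconds - cached.ttlSeconds;
  return staleSeconds < cached.staleIfErrorSeconds ? cached.response : originResponse;
}
```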
Spreadshirt 25
Different implementations per cache/CDN
• Varnish Cache: supports stale-while-revalidate and stale-if-error
• Fastly CDN: uses Varnish; supports stale-while-revalidate and stale-if-error
• AKAMAI CDN: supports behavior similar to stale-while-revalidate via the “Cache Prefreshing” feature, although this is an active refresh; supports behavior similar to stale-if-error by setting the “Force Revalidation of Stale Objects” configuration to “Serve stale if unable to validate”
Spreadshirt 26
Pros & Cons
Pros
• Decouple browser requests from the actual revalidation with the origin
• Improve response times in general
• Bridge origin outages
• Improve overall resilience
Cons
• Might deliver stale content to the browser (clients)
• Might not notice errors when they occur
Spreadshirt 27
Purging
Spreadshirt 28
Problem description
• We choose a short cache time because we do not know when a modification occurs
• In most cases the requested page does not change
• Revalidating unchanged pages creates useless base load on our system
• It would be better to inform the cache about page changes proactively
Start
Request:
GET /regenbogenbaer-A103766209 HTTP/1.1
Response:
HTTP/1.1 200 OK
Cache-Control: max-age=60, s-maxage=600
ETag: "6e35-240-2672fbbc"
Short browser cache time (max-age=60), short intermediate cache time (s-maxage=600)
10 minutes later
Request:
GET /regenbogenbaer-A103766209 HTTP/1.1
If-None-Match: "6e35-240-2672fbbc"
Response:
HTTP/1.1 304 Not Modified
Cache-Control: max-age=60, s-maxage=600
ETag: "6e35-240-2672fbbc"
10 minutes later
Request:
GET /regenbogenbaer-A103766209 HTTP/1.1
If-None-Match: "6e35-240-2672fbbc"
Response:
HTTP/1.1 304 Not Modified
Cache-Control: max-age=60, s-maxage=600
ETag: "6e35-240-2672fbbc"
…
Spreadshirt 29
Solution: Don’t call me, I call you
Invert the expiration mechanism by replacing pull with push
Spreadshirt 30
Tag content and purge individually if required
Browser, Intermediate Cache (e.g. Varnish with XKEY), Origin Server
Request (browser to cache, cache to origin):
GET /regenbogenbaer-A103766209 HTTP/1.1
Response (origin to cache, with long intermediate cache time and content tags):
HTTP/1.1 200 OK
Cache-Control: max-age=60, s-maxage=86400
XKey: a103766209;
XKey: articlePage;
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: max-age=60
2 hours later
Request (browser to cache):
GET /regenbogenbaer-A103766209 HTTP/1.1
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Purge content on actual content modification …
Request (origin to cache, with purge tag):
XKEY / HTTP/1.1
XKey-Purge: a103766209;
Response:
HTTP/1.1 200 OK
…
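The bookkeeping behind tag-based purging can be sketched as a small in-memory cache. The class `TaggedCache` and its methods are hypothetical and only illustrate the idea; they are not the Varnish XKey implementation.

```javascript
class TaggedCache {
  constructor() {
    this.entries = new Map(); // url -> cached body
    this.urlsByTag = new Map(); // tag -> Set of urls carrying that tag
  }

  // Stores a response body together with its content tags.
  store(url, body, tags) {
    this.entries.set(url, body);
    for (const tag of tags) {
      if (!this.urlsByTag.has(tag)) this.urlsByTag.set(tag, new Set());
      this.urlsByTag.get(tag).add(url);
    }
  }

  get(url) {
    return this.entries.get(url);
  }

  // Removes every entry carrying the tag and returns how many urls it listed.
  // References left under other tags are harmless: deleting twice is a no-op.
  purgeTag(tag) {
    const urls = this.urlsByTag.get(tag) ?? new Set();
    for (const url of urls) this.entries.delete(url);
    this.urlsByTag.delete(tag);
    return urls.size;
  }
}
```

Purging the tag `a103766209` would drop only the pages for that article, while a purge of `articlePage` would drop every article page at once.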
Spreadshirt 31
Different implementations per cache/CDN
• Varnish Cache: supports purging on content tags via the XKey module and the XKey header; instant purge time
• Fastly CDN: uses Varnish and supports it as well via the Surrogate-Key header; ~500 ms purge time
• AKAMAI CDN: has announced support for content tags via the Edge-Content-Tag header, and purging based on them via FastPurge, for Q1/2017; ~5 s purge time
Spreadshirt 32
Invalidation is better than removal
• Purge usually has two modes: invalidation and removal
  Varnish XKey supports that with purge and softpurge
  Fastly CDN supports it with purge and softpurge as well
  AKAMAI CDN’s FastPurge supports removal and invalidation mode on staging and production environments
• Removal physically removes content from the cache (one useful use case is removal for legal reasons)
• Invalidation sets the TTL to 0 and marks content for revalidation
• Invalidation is usually the preferred solution
  A removal request might lead to overload on the origin
  After invalidation the cache can still serve stale content even if the origin is down
Spreadshirt 33
Pros & Cons
Pros
• Cache TTLs can be increased considerably
• Cache hit rates improve
• Response times improve as most responses can be served from cache
• Scalability improves as most content can be served from cache and traffic peaks can be handled by the cache
• Base load on the service due to continuous “polling” is reduced
• More control over cache state
Cons
• Need to implement a scalable purge service
• Complexity might increase
Spreadshirt 34
Caching User Data
Spreadshirt 35
Problem description
Request:
GET /regenbogenbaer-A103766209 HTTP/1.1
• User-specific data usually makes pages uncacheable
• A common workaround is to use JavaScript to include user-specific parts on the client side
• The problem is that this requires JavaScript
• Most user-specific parts would actually be cacheable with a high hit rate
• ESI allows user-specific parts to be included
Spreadshirt 36
Solution: Make user data cacheable through smart variation
Use Vary header and versioning mechanism to make user-specific data cacheable
Spreadshirt 37
Login and logout create or remove the security session and cookie
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Login creates cookie
Resource /auth/login
Request (browser to cache, cache to origin):
POST /auth/login HTTP/1.1
Response (origin to cache, cache to browser):
HTTP/1.1 200 OK
Set-Cookie: sprd_auth_token=12345678;
Logout removes cookie
Resource /auth/logout
Request (browser to cache, cache to origin):
POST /auth/logout HTTP/1.1
Response (origin to cache):
HTTP/1.1 200 OK
Set-Cookie: sprd_auth_token=; Expires=Tuesday, 13-Dec-16 09:00:00 GMT
Response (cache to browser):
HTTP/1.1 200 OK
Set-Cookie: sprd_auth_token=; Expires=Tuesday, 13-Dec-16 09:14:57 GMT
Spreadshirt 38
Making login state cacheable with Varnish
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Resource /auth/loginstate
Request (browser to cache):
GET /auth/loginstate HTTP/1.1
Cookie: sprd_auth_token=123; …
Request (cache to origin):
GET /auth/loginstate HTTP/1.1
Cookie: sprd_auth_token=123;
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: no-cache, s-maxage=600
Vary: Cookie
XKey: session123;
ETag: "123"
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: private, no-cache
ETag: "123"
Cached representations in the intermediate cache, one per variant: Cookie: sprd_auth_token=123, empty Cookie
Does the cookie sprd_auth_token exist?
• yes: login state for user
• no: no login
Spreadshirt 39
Node.js implementation
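// Assumes an Express router; authService and sessionTag are application
// helpers defined elsewhere, and req.cookies needs the cookie-parser middleware.
const express = require('express');
const router = express.Router();

router.get('/auth/loginstate.html', function (req, res, next) {
  // Not cacheable in the browser, 10 minutes in the intermediate cache.
  res.setHeader('Cache-Control', 'no-cache, s-maxage=600');
  // One cached variant per Cookie header value.
  res.setHeader('Vary', 'Cookie');
  var session = authService.getSession(req.cookies.sprd_auth_token);
  if (session) {
    // Tag the response so it can be purged when the session changes.
    res.setHeader('XKey', sessionTag(session.sessionId));
    res.render('loginstate', session);
  } else {
    res.render('nologin');
  }
});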
Spreadshirt 40
Varnish VCL configuration
// remove cookie for everything but auth context
if (req.url !~ "/auth/") {
  unset req.http.Cookie;
}
// filter sprd_auth_token
else {
  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(sprd_auth_token)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
    if (req.http.Cookie == "") {
      unset req.http.Cookie;
    }
  }
}
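The effect of this VCL can be restated in JavaScript for readers less familiar with `regsuball`. The functions `filterCookies` and `normalizeCookie` are hypothetical illustrations of the same idea, not a translation that Varnish could run: outside `/auth/` the Cookie header is dropped, and inside it only `sprd_auth_token` survives.

```javascript
// Keeps only the allowed cookies; returns undefined when none remain.
function filterCookies(cookieHeader, allowedNames) {
  if (!cookieHeader) return undefined;
  const kept = cookieHeader
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => allowedNames.includes(pair.split('=')[0]));
  return kept.length > 0 ? kept.join('; ') : undefined;
}

// Returns the Cookie header value the cache should use for lookup.
function normalizeCookie(url, cookieHeader) {
  if (!url.includes('/auth/')) return undefined; // everything else is cookie-free
  return filterCookies(cookieHeader, ['sprd_auth_token']);
}
```

Normalizing the header this way keeps the number of `Vary: Cookie` variants small, because unrelated cookies such as tracking IDs no longer split the cache.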
Spreadshirt 41
Caching baskets works in a similar way using versioning
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Resource /basket
Create basket
Request (browser to cache, cache to origin):
POST /basket HTTP/1.1
Response (origin to cache, cache to browser):
HTTP/1.1 200 OK
Set-Cookie: basket_id=12345678/v1;
Start with version 1 on basket creation
Update basket
Request (browser to cache, cache to origin):
POST /basket HTTP/1.1
Cookie: basket_id=12345678/v1;
Response (origin to cache, cache to browser):
HTTP/1.1 200 OK
Set-Cookie: basket_id=12345678/v2;
Update to the next version on basket modification
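The versioning scheme can be sketched with two helpers that produce the `basket_id` cookie value. The function names and the strict `<id>/v<number>` format check are assumptions based on the values shown on the slide.

```javascript
// Cookie value for a freshly created basket: always version 1.
function newBasketCookie(basketId) {
  return `${basketId}/v1`;
}

// Cookie value after a modification: same basket, next version.
function nextBasketCookie(value) {
  const match = /^(\w+)\/v(\d+)$/.exec(value);
  if (!match) throw new Error(`unexpected basket_id value: ${value}`);
  return `${match[1]}/v${Number(match[2]) + 1}`;
}
```

Because every modification changes the cookie value, a cache that varies on the cookie never serves an outdated basket for the new version, and old versions simply expire.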
Spreadshirt 42
Fetch different cached baskets
Browser, Intermediate Cache (e.g. Varnish), Origin Server
Resource /basket
Get first version
Request (browser to cache):
GET /basket HTTP/1.1
Cookie: basket_id=12345678/v1; …
Request (cache to origin):
GET /basket HTTP/1.1
Cookie: basket_id=12345678/v1;
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: no-cache, s-maxage=600
Vary: Cookie
ETag: "12345678/v1"
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: private, no-cache
ETag: "12345678/v1"
Cache the basket version for 10 minutes in Varnish but not in the browser
Get second version
Request (browser to cache):
GET /basket HTTP/1.1
Cookie: basket_id=12345678/v2; …
Request (cache to origin):
GET /basket HTTP/1.1
Cookie: basket_id=12345678/v2;
Response (origin to cache):
HTTP/1.1 200 OK
Cache-Control: no-cache, s-maxage=600
Vary: Cookie
ETag: "12345678/v2"
Response (cache to browser):
HTTP/1.1 200 OK
Cache-Control: private, no-cache
ETag: "12345678/v2"
Cache the new basket version for 10 minutes in Varnish but not in the browser
Spreadshirt 43
Different implementations per cache/CDN
• Varnish Cache: supports the Vary header; implementation will require custom VCL
• Fastly CDN: uses Varnish; supports the Vary header; implementation will require custom VCL
• AKAMAI CDN: does not support the Vary header, but allows custom cache ID modifications (cid) to be configured
Spreadshirt 44
Pros & Cons
Pros
• Allows caching of pages that contain user-specific data without using JavaScript
Cons
• Might cache confidential data and make it available to the public via the CDN
Spreadshirt 45
Conclusion
Spreadshirt 46
Conclusion
• Everything can be made cacheable
• Edge Side Includes (ESI) make it possible to decompose pages into templates and reusable, cacheable parts
• Stale content delivery improves response times and helps to handle origin outages
• Purging makes it possible to further increase cache times and to proactively remove items from the cache
• Caching user data makes it possible to cache even pages with user-specific data without using JavaScript
Spreadshirt 47
Q&A