Post on 17-May-2015
Mike Willbanks | Barnes & Noble
Varnish Cache
Housekeeping…
• Talk
  – Slides will be posted after the talk.
• Me
  – Sr. Web Architect Manager at NOOK Developer
  – Prior MNPHP Organizer
  – Open Source Contributor
  – Where you can find me:
    • Twitter: mwillbanks | G+: Mike Willbanks
    • IRC (freenode): mwillbanks | Blog: http://blog.digitalstruct.com
    • GitHub: https://github.com/mwillbanks
Agenda
• Varnish?
• The Good: Getting Started
• The Awesome: General Usage
• The Crazy: Advanced Usage
• Gotchas
WHAT IS VARNISH?
• Official Statement
• What it does
• General use case
Official Statement
“Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.”
You can cache… both dynamic and static content.
A Scenario: System Status Server
• Mobile apps check current status.
• If the system is down, do we communicate?
• If there are problems, do we communicate?
• The apps and mobile site rely on an API.
• Trouble in paradise? Few and far between.
The Graph - AWS
[Figure removed in extraction: four bar charts comparing a Small instance, an X-Large instance, and a Small instance with Varnish on Requests, Time, Peak Load, and Req/s. The raw data follows on the next slide.]
The Raw Data

               Small    X-Large   Small + Varnish
  Concurrency  10       150       150
  Requests     5000     55558     75000
  Time (s)     438      347       36
  Req/s        11.42    58        585
  Peak Load    11.91    8.44      0.35

Comments: 19,442 failed requests
Traditional LAMP Stack
[Diagram: Load Balancer → HTTP Server Cluster → Database]
LAMP + Varnish
* Varnish can act as a load balancer.
[Diagram: Load Balancer → Varnish Cache → cache hit? Yes: respond from cache / No: HTTP Server Cluster → Database]
THE GOOD – JUMP START
• Installation
• General Information
• Default VCL
Installation

RHEL / CentOS:
  rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm
  yum install varnish

Debian / Ubuntu:
  curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
  echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" | sudo tee -a /etc/apt/sources.list
  sudo apt-get update
  sudo apt-get install varnish

From source:
  git clone git://git.varnish-cache.org/varnish-cache
  cd varnish-cache
  sh autogen.sh
  ./configure
  make && make install
Varnish Daemon
• varnishd
  – -a address[:port]  listen address for client requests
  – -b address[:port]  backend address
  – -T address[:port]  administration HTTP interface
  – -s type[,options]  storage type (malloc, file, persistent)
  – -P /path/to/file   PID file
  – Many others; these are generally the most important. Generally the defaults will do with just modification of the default VCL (more on that later).
• Example (Varnish listens on port 80; configure your web server to listen on port 8080):
  varnishd -a :80 \
    -T localhost:6082 \
    -f /path/to/default.vcl \
    -s malloc,512mb
General Configuration
Setup a backend!

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
So what’s actually caching?
• Any request with:
  – GET / HEAD method
  – TTL > 0
• What causes a miss?
  – Cookies
  – Authentication headers
  – Vary: *
  – Cache-Control: private
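To see which of those cases you are hitting, a common trick is to expose hit/miss status in a debug header from vcl_deliver. A minimal sketch (the X-Cache header name is just a convention, not part of Varnish):

```vcl
sub vcl_deliver {
  # obj.hits counts how many times this cached object has been served.
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}
```

Then `curl -I` the same URL twice: the second response should say HIT; if it never does, one of the miss conditions above is in play.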
Request/Response Flow
  Request → vcl_recv → vcl_hash → vcl_hit  → vcl_deliver → Response
  Request → vcl_recv → vcl_hash → vcl_miss → vcl_fetch → vcl_deliver → Response
  vcl_recv can also short-circuit to vcl_pipe or vcl_pass (pass continues on to vcl_fetch).
  Variables in scope: req (recv, hash); req, obj (hit); req, bereq, beresp (miss, fetch); resp (deliver); req, bereq (pipe, pass).
HTTP Caching
• RFC 2616 HTTP/1.1 headers
  – Expiration
    • Cache-Control
    • Expires
  – Validation
    • Last-Modified / If-Modified-Since
    • ETag / If-None-Match
TTL Priority
• VCL
  – beresp.ttl
• Headers
  – Cache-Control: s-maxage
  – Cache-Control: max-age
  – Expires
  – Validation
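Because beresp.ttl sits at the top of that priority list, vcl_fetch is the place to override whatever the backend's headers said. A sketch that enforces a minimum TTL (the 120-second floor is an arbitrary example value):

```vcl
sub vcl_fetch {
  # Backend sent no caching headers, or a tiny TTL? Enforce a floor
  # so the object is cacheable anyway.
  if (beresp.ttl < 120s) {
    set beresp.ttl = 120s;
  }
}
```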
Use WordPress?

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}

sub vcl_recv {
  if (!(req.url ~ "wp-(login|admin)")) {
    unset req.http.Cookie;
  }
}

sub vcl_fetch {
  if (!(req.url ~ "wp-(login|admin)")) {
    unset beresp.http.Set-Cookie;
  }
}
THE AWESOME – VCL, DIRECTORS AND MORE
• VCL
• Directors
• Verifying VCL
Varnish Configuration Language
• VCL State Engine
  – Each request is processed separately and independently.
  – States are isolated but related.
  – Return statements exit one state and start another.
  – VCL defaults are ALWAYS appended below your own VCL.
• VCL can be complex, but…
  – Two main subroutines: vcl_recv and vcl_fetch.
  – Common actions: pass, hit_for_pass, lookup, pipe, deliver.
  – Common variables: req, beresp and obj.
  – More subroutines, functions and complexity can arise depending on conditions.
VCL - Process

  Process      Description
  vcl_init     Startup routine (VCL loaded, VMOD init)
  vcl_recv     Beginning of request; req is in scope
  vcl_pipe     Client & backend data passed unaltered
  vcl_pass     Request goes to backend and is not cached
  vcl_hash     Creates cache hash; call hash_data for custom hashes
  vcl_hit      Called when hash found in cache
  vcl_miss     Called when hash not found in cache
  vcl_fetch    Called to fetch data from backend
  vcl_deliver  Called prior to delivery of response (excluding pipe)
  vcl_error    Called when an error occurs
  vcl_fini     Shutdown routine (VCL unload, VMOD cleanup)
VCL – Variables
• Always available
  – now – epoch time
• Backend declarations
  – .host – hostname / IP
  – .port – port number
• Request processing
  – client – IP & identity
  – server – IP & port
  – req – request information
• Backend
  – bereq – backend request
  – beresp – backend response
• Cached object
  – obj – cached object; can only change .ttl
• Response
  – resp – response information
VCL - Functions

  Function                       Description
  hash_data(string)              Adds a string to the hash input
  regsub(string, regex, sub)     Substitution on first occurrence
  regsuball(string, regex, sub)  Substitution on all occurrences
  ban(expression)                Ban all items that match expression
  ban(regex)                     Ban all items that match regular expression
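As one example, regsuball is handy for normalizing request URLs so equivalent pages share a single cache object. A sketch (the utm_ query-parameter pattern is an assumption about your traffic, and the regex does not handle every parameter-ordering edge case):

```vcl
sub vcl_recv {
  # Strip analytics query parameters so they don't fragment the cache.
  set req.url = regsuball(req.url, "(\?|&)utm_[a-z]+=[^&]*", "");
  # Clean up a dangling "?" or "&" left at the end of the URL.
  set req.url = regsub(req.url, "(\?|&)$", "");
}
```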
DEFAULT VCL Walking through the noteworthy items.
vcl_recv
• Received request.
• Only GET & HEAD are cached by default — the safest way to cache!
• Will use HTTP cache headers.
• Cookies or authentication headers will bust out of the cache.
vcl_hash
• The hash is what we look for in the cache.
• Default is URL + Host.
  – The server IP is used if the Host header was not set; in a load-balanced environment, ensure you set this header!
vcl_fetch
• Fetch retrieves the response from the backend.
• No cache if…
  – TTL is not set or not greater than 0.
  – Vary headers exist.
• Hit-For-Pass means we will cache a pass-through.
GENERAL ADJUSTMENTS Common adjustments to make.
Cache Static Content No reason that static content should not be cached.
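One way to make sure of that in Varnish 3 VCL is to strip cookies from asset requests on both sides, since cookies are what usually keeps them out of the cache. A sketch (the extension list is illustrative, not exhaustive):

```vcl
sub vcl_recv {
  # Browsers send cookies with static asset requests; drop them so
  # these requests can be looked up in (and stored to) the cache.
  if (req.url ~ "\.(css|js|png|gif|jpg|jpeg|ico|woff)$") {
    unset req.http.Cookie;
  }
}

sub vcl_fetch {
  # Likewise, don't let a Set-Cookie on a static asset block caching.
  if (req.url ~ "\.(css|js|png|gif|jpg|jpeg|ico|woff)$") {
    unset beresp.http.Set-Cookie;
  }
}
```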
Remove GA Cookies GA cookies will cause a miss; remove them prior to going to the backend.
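A common recipe for this uses regsuball in vcl_recv (a sketch; the regex assumes the standard __utma/__utmb/… cookie names):

```vcl
sub vcl_recv {
  if (req.http.Cookie) {
    # Remove Google Analytics cookies from the request.
    set req.http.Cookie = regsuball(req.http.Cookie, "__utm[a-z]+=[^;]+(; )?", "");
    # If nothing meaningful is left, drop the header entirely so the
    # request can be served from cache.
    if (req.http.Cookie ~ "^\s*$") {
      unset req.http.Cookie;
    }
  }
}
```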
Allow Purging Only allow from localhost or trusted server network.
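In Varnish 3 VCL that restriction is typically expressed with an ACL; a sketch (the addresses are placeholders for your trusted server network):

```vcl
acl purgers {
  "localhost";
  "192.168.0.0"/24;   # hypothetical trusted network
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
      error 405 "Not allowed.";
    }
    return (lookup);
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}
```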
DIRECTORS Leveraging backend servers
Directors – The Types

  Type         Description
  Random       Picks based on random and weight.
  Client       Picks based on client identity.
  Hash         Picks based on hash value.
  Round Robin  Goes in order and starts over.
  DNS          Picks based on incoming DNS host, random OR round robin.
  Fallback     Picks the first “healthy” server.
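A round-robin director over two backends might look like this in Varnish 3 (backend names and addresses are placeholders):

```vcl
backend web1 { .host = "10.0.0.1"; .port = "8080"; }
backend web2 { .host = "10.0.0.2"; .port = "8080"; }

director www round-robin {
  { .backend = web1; }
  { .backend = web2; }
}

sub vcl_recv {
  # Route requests through the director instead of a single backend.
  set req.backend = www;
}
```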
Director - Probing
• Backend probing variables:
  – .url
  – .request
  – .window
  – .threshold
  – .initial
  – .expected_response
  – .interval
  – .timeout
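Wired into a backend declaration, a probe might look like this (the /health URL and the thresholds are assumptions about your application):

```vcl
backend web1 {
  .host = "10.0.0.1";
  .port = "8080";
  .probe = {
    .url = "/health";   # hypothetical health-check endpoint
    .interval = 5s;     # probe every 5 seconds
    .timeout = 1s;      # a probe counts as failed after 1 second
    .window = 5;        # consider the last 5 probes...
    .threshold = 3;     # ...and require 3 successes to be "healthy"
  };
}
```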
Load Balancing Implementing a simple varnish load balancer. Varnish does not handle SSL termination.
Grace Mode
• If a request is already pending for an update, serve grace (stale) content.
• Also applies when the backend is unhealthy — probes, as seen earlier, must be implemented.
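Enabling grace takes two settings: how stale an object a client will accept, and how long past its TTL Varnish keeps objects around at all. A sketch with arbitrary 30-minute values:

```vcl
sub vcl_recv {
  # Accept objects up to 30 minutes past their TTL when needed.
  set req.grace = 30m;
}

sub vcl_fetch {
  # Keep objects 30 minutes beyond their TTL so grace can serve them.
  set beresp.grace = 30m;
}
```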
Saint Mode
• A backend may be sick for a particular piece of content.
• Saint mode makes sure the backend will not be asked for that object again for a specific period of time.
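In Varnish 3 this is a vcl_fetch rule; a sketch that benches a backend for an object when it returns a server error (the 10-second window is an arbitrary example):

```vcl
sub vcl_fetch {
  if (beresp.status == 500 || beresp.status == 503) {
    # Don't ask this backend for this object again for 10 seconds;
    # restart the request so another backend (or grace) can serve it.
    set beresp.saintmode = 10s;
    return (restart);
  }
}
```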
Purging
• The various ways of purging:
  – varnishadm – command line utility
  – Sockets (port 6082)
  – HTTP – now that is the sexiness
Purging Examples

varnishadm:
  varnishadm -T 127.0.0.1:6082 purge req.url == "/foo/bar"

Socket:
  telnet localhost 6082
  purge req.url == "/foo/bar"

HTTP:
  telnet localhost 80
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
  PURGE /foo/bar HTTP/1.0
  Host: bacon.org

  curl -X PURGE http://bacon.org/foo/bar
Distributed Purging
• curl multi-request (in PHP)
• Use a message queue
  – Use workers to do the leg work for you
• You will need to store a list of servers “somewhere”
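Whatever transports the purge, each cache node has to receive its own PURGE request. A sketch of the fan-out in Python (server names are placeholders, and the sender is injectable so the logic can be exercised without a live Varnish):

```python
import http.client

def purge(server, path, host_header, port=80):
    """Send an HTTP PURGE for `path` to one Varnish server;
    return the response status code."""
    conn = http.client.HTTPConnection(server, port, timeout=5)
    try:
        conn.request("PURGE", path, headers={"Host": host_header})
        return conn.getresponse().status
    finally:
        conn.close()

def purge_all(servers, path, host_header, send=purge):
    """Fan the purge out to every cache node from your stored server
    list; returns a dict of server -> status code."""
    return {server: send(server, path, host_header) for server in servers}
```

In production you would call `purge_all(server_list, "/foo/bar", "bacon.org")`; a worker consuming purge messages from a queue would do the same per message.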
Logging
• Many times people want to log requests to a file.
  – By default, Varnish only stores these in shared memory.
• Apache-style logs:
  – varnishncsa -D -a -w log.txt
  – This will run as a daemon and log all of your requests on a separate thread.
VERIFY YOUR VCL
You likely want to ensure that your cache is:
1. Working properly
2. Caching effectively
What is Varnish doing… Varnishtop will show you real time information on your system. • Use -i to filter on specific tags. • Use -x to exclude specific tags.
Checking Statistics… Varnishstat will give you statistics you need to know how you’re doing.
THE CRAZY
• ESI – Edge-Side Includes
• Varnish Administration
• VMOD
ESI – Edge Side Includes
• ESI is a small markup language, much like SSI (server-side includes), for including fragments (or dynamic content, for that matter).
• Think of it as replacing regions inside a page, as if you were using XHR (AJAX), but single threaded.
• Three statements can be utilized:
  – esi:include – include a page
  – esi:remove – remove content
  – <!--esi ...--> – ESI disabled, execute normally
ESI Diagram
[Diagram: page content contains <esi:include src="header.php" />; Varnish detects the ESI tag and either requests the fragment from the backend or serves its cached state.]
Using ESI
• In vcl_fetch, you must turn ESI on:
  – set beresp.do_esi = true;
• Varnish refuses to parse content for ESI if it does not look like XML.
  – This is the default behavior, so check varnishstat and varnishlog to ensure it is functioning normally.
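Since parsing every response for ESI tags costs CPU, a common refinement is to enable do_esi only for responses that can actually contain includes. A sketch (the URL patterns are assumptions about your site):

```vcl
sub vcl_fetch {
  # Only parse likely-HTML responses for ESI tags; skip static assets.
  if (req.url == "/" || req.url ~ "\.(php|html)$") {
    set beresp.do_esi = true;
  }
}
```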
ESI Usage

<html>
  <head><title>Rock it with ESI</title></head>
  <body>
    <header>
      <esi:include src="header.php" />
    </header>
    <section id="main">...</section>
    <footer></footer>
  </body>
</html>
Embedding C in VCL
• Before getting into VMODs: did you know you can embed C into the VCL for Varnish?
• Want to do something crazy fast, or leverage a C library for pre- or post-processing?
• I know… you’re thinking that’s useless…
  – On to the example; and a good one from the Varnish wiki!
Embedded C for syslog

C{
  #include <syslog.h>
}C

sub vcl_something {
  C{
    syslog(LOG_INFO, "Something happened at VCL line XX.");
  }C
}

# Example using Varnish variables
C{
  syslog(LOG_ERR, "Spurious response from backend: xid %s request %s %s \"%s\" %d \"%s\" \"%s\"",
         VRT_r_req_xid(sp), VRT_r_req_request(sp),
         VRT_GetHdr(sp, HDR_REQ, "\005host:"), VRT_r_req_url(sp),
         VRT_r_obj_status(sp), VRT_r_obj_response(sp),
         VRT_GetHdr(sp, HDR_OBJ, "\011Location:"));
}C
Varnish Modules / Extensions
• Taking VCL embedded C to the next level.
• Allows you to extend Varnish and create new functions.
• You can link to libraries to provide additional functionality.
VMOD - std
• toupper • tolower • set_ip_tos • random • log
• syslog • fileread • duration • integer • collect
ADMINISTERING VARNISH
• Management Console
• Cache Warmup
Management Console
• varnishadm -T localhost:6082
  – vcl.list – see all loaded configurations
  – vcl.load – load new configuration
  – vcl.use – select configuration to use
  – vcl.discard – remove configuration
Cache Warmup
• Need to warm up your cache before putting a server into rotation, or load test an environment?
  – varnishreplay -r log.txt
GOTCHAS
• Having Keep-Alive off
• No SSL termination
• No persistent cache
• ESI multiple fragments
• Cookies*
QUESTIONS?
These slides will be posted to SlideShare & SpeakerDeck.
• SpeakerDeck: http://speakerdeck.com/u/mwillbanks
• Slideshare: http://www.slideshare.net/mwillbanks
• Twitter: mwillbanks | G+: Mike Willbanks
• IRC (freenode): mwillbanks
• Blog: http://blog.digitalstruct.com
• GitHub: https://github.com/mwillbanks