Take My Logs. Please!

Take my logs. Please. Mike Brittain, Director of Engineering, Infrastructure, Etsy.com. @mikebrittain

Details on how we capture application data in our access and error logs, as well as how to generate quick reports and graphs from these logs. This talk was presented at O'Reilly's Velocity Online Conference on October 26, 2011.

Transcript of Take My Logs. Please!

Page 1: Take My Logs. Please!

Take my logs. Please.

Mike Brittain
Director of Engineering, Infrastructure
Etsy.com

@mikebrittain

Page 2: Take My Logs. Please!

(hello?)

Page 3: Take My Logs. Please!

This sounds boooooorrrrring...
No, no... hang in there!

Page 4: Take My Logs. Please!
Page 5: Take My Logs. Please!

25 MM uniques/month
150 Countries
$300 MM+ sales last year

Page 6: Take My Logs. Please!

Apache, PHP, MySQL, PostgreSQL, Memcache, Gearman, Solr, etc.

Page 7: Take My Logs. Please!

What’s working?

Page 8: Take My Logs. Please!

What’s working?
Performance

Page 9: Take My Logs. Please!

What’s working?
Performance
Operability

Page 10: Take My Logs. Please!

What’s working?
Performance
Operability
Simplicity

Page 11: Take My Logs. Please!

Logging + Trending

Page 12: Take My Logs. Please!

App logging
(Apache access and error logs)

Page 13: Take My Logs. Please!

LogFormat "%h %l %u %t \"%r\" %>s %b"

“Common”

Page 14: Take My Logs. Please!

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

“Combined”

Page 15: Take My Logs. Please!

mod_log_config

%f Filename requested

%k # of keepalive requests served on this connection

%T Time taken to serve the request, in seconds

Page 16: Take My Logs. Please!

mod_log_config

%f Filename requested

%k # of keepalive requests served on this connection

%D Time taken to serve the request, in microseconds

Page 17: Take My Logs. Please!

mod_log_config

%f Filename requested

%k # of keepalive requests served on this connection

%D Time taken to serve the request, in microseconds

%{foobar}n Contents of “note” foobar from another module

Page 18: Take My Logs. Please!

apache_note()

apache_note("foobar", $whatever);

Page 19: Take My Logs. Please!

LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"

“Steroids”

Page 21: Take My Logs. Please!

$GLOBALS['timer'] = microtime(true) * 1000000;

Page 22: Take My Logs. Please!

$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function('pageStats');

function pageStats() {
}

Page 23: Take My Logs. Please!

$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function('pageStats');

function pageStats() {
    $timer_end = microtime(true) * 1000000;
    $diff = $timer_end - $GLOBALS['timer'];
}

Page 24: Take My Logs. Please!

// Record the request start time in microseconds.
$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function('pageStats');

// Runs at the end of the request and hands the stats to Apache as notes.
function pageStats() {
    $timer_end = microtime(true) * 1000000;
    $diff = $timer_end - $GLOBALS['timer'];
    apache_note('php_microsec', $diff);
    apache_note('php_bytes', memory_get_peak_usage());
}

Page 25: Take My Logs. Please!

What about “%D”?

Page 26: Take My Logs. Please!

LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n %{request_uid}n %{api_consumer_key}n %{api_method_name}n %{php_bytes}n %{php_microsec}n %D"

“Steroids”
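One way to read the "%D" question: %D is Apache's own measurement of the whole request, while php_microsec only covers the span between the PHP timer start and the shutdown function. The sketch below compares the two. It assumes both values are the last two whitespace-separated fields of each line, as in the format above; the function name and the sample values are made up for illustration.

```python
def timing_fields(line):
    """Return (php_microsec, apache_microsec) from the end of a log line.

    Assumes the line ends with "... %{php_microsec}n %D". Returns None
    when either field is not a number (Apache logs "-" for an unset note).
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        # php_microsec may be logged as a float, %D is always an integer.
        return int(float(fields[-2])), int(fields[-1])
    except ValueError:
        return None


# Hypothetical tail of an access log line: php_bytes, php_microsec, %D
sample = '... 786432 251018 263504'
php_us, apache_us = timing_fields(sample)

# Time Apache spent on the request outside of the PHP timer.
overhead_us = apache_us - php_us
print(overhead_us)
```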

Page 29: Take My Logs. Please!

LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %V %{user_id}n %{shop_id}n %{uaid}n %{ab_selections}n ...

easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;

“Steroids”
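A logged ab_selections value like the one above can be split into test/variant pairs when segmenting requests. A minimal sketch, assuming the value always has the "name=variant;" shape shown on the slide; the function name is hypothetical.

```python
def parse_ab_selections(value):
    """Parse 'name=variant; name=variant;' into a dict of strings."""
    selections = {}
    for pair in value.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # empty piece after the trailing ';'
        name, _, variant = pair.partition('=')
        selections[name] = variant
    return selections


sample = 'easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;'
selections = parse_ab_selections(sample)
print(selections)
```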

Page 30: Take My Logs. Please!

Coming soon...

%{locale}n (i18n)

%{platform}n (desktop vs. mobile)

Page 31: Take My Logs. Please!

Coming soon...

%{locale}n (i18n)

%{platform}n (desktop vs. mobile)

OPS-1805, OPS-1827

etsy.com/careers

Page 32: Take My Logs. Please!

Using something else?

time, http method, request uri, response code, referer, user-agent, response time, response memory, custom segmentation fields...

Page 33: Take My Logs. Please!

Quick averages

grep "GET /listing/" access.log | \
  awk '{sum=sum+$(NF-1)} END {print sum/NR}'
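The same average can be sketched in Python for cases where the one-liner gets unwieldy. This assumes, like the awk command, that php_microsec is the second-to-last field. Unlike awk, it skips lines whose field is not numeric instead of counting them as zero. The function name and the sample lines are illustrative.

```python
def average_php_microsec(lines, needle='GET /listing/'):
    """Average the second-to-last field over lines containing needle."""
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        try:
            total += float(fields[-2])
        except (IndexError, ValueError):
            continue  # malformed line or "-" placeholder
        count += 1
    return total / count if count else None


# Hypothetical, shortened access log lines.
sample_lines = [
    '10.0.0.1 - [04/Mar/2011:18:37:05 +0000] "GET /listing/1 HTTP/1.1" 200 512 1000 250000 260000',
    '10.0.0.2 - [04/Mar/2011:18:37:06 +0000] "GET /listing/2 HTTP/1.1" 200 512 1000 350000 360000',
    '10.0.0.3 - [04/Mar/2011:18:37:07 +0000] "GET /shop/3 HTTP/1.1" 200 512 1000 999999 999999',
]
print(average_php_microsec(sample_lines))

# With a real file:
#   with open('access.log') as log:
#       print(average_php_microsec(log))
```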

Page 34: Take My Logs. Please!

Quick graphs

grep "GET /listing/" access.log | \
  perl -pe "s/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/" | \
  awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat

gives you...

Page 35: Take My Logs. Please!

Quick graphs

# /tmp/pagetimes.dat
18:37 251.0
18:38 252.1
18:39 253.5
18:40 251.0
18:45 250.0

and then...

Page 36: Take My Logs. Please!

Quick graphs

# GNUPLOT
set terminal png
set output 'listings.png'
set yrange [0:2000]
set xdata time
# Input time format: matches the HH:MM values in column 1 of pagetimes.dat
set timefmt "%H:%M"
set format x "%H:%M"
plot '/tmp/pagetimes.dat' using 1:2 with points
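The grep, perl and awk steps that produce the data file could also be written as one small Python generator. This is a sketch under the same assumptions as the shell pipeline: an Apache %t timestamp in square brackets, and the timing value in the second-to-last field. The names and the sample line are hypothetical.

```python
import re

# Hour and minute inside an Apache %t timestamp,
# e.g. [04/Mar/2011:18:37:05 +0000]
TIMESTAMP = re.compile(r'\[[^\]]*\d{4}:(\d{2}):(\d{2}):\d{2}[^\]]*\]')


def page_times(lines, needle='GET /listing/'):
    """Yield ('HH:MM', value) pairs for lines containing needle."""
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.search(line)
        fields = line.split()
        if match is None or len(fields) < 2:
            continue
        hour, minute = match.groups()
        yield hour + ':' + minute, fields[-2]


# Hypothetical access log line in the extended format.
sample = ('10.0.0.1 - [04/Mar/2011:18:37:05 +0000] "GET /listing/123 HTTP/1.1" '
          '200 512 "-" "-" www.example.com 1 2 3 - abc - - 786432 251018 263504')

for minute, value in page_times([sample]):
    print(minute, value)
```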

Page 37: Take My Logs. Please!

Quick graphs

Page 38: Take My Logs. Please!

Error logs

PHP + Apache errors in one file
Simple logging interface

Page 39: Take My Logs. Please!

Error logs

Levels: error, info, debug
Namespace: perf, sql, __class__

Page 40: Take My Logs. Please!

Logger::error("Query exceeded 5 sec: $query",
              "sql_long_query");

Page 41: Take My Logs. Please!

web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...
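The layout of that line can be reproduced with a few lines of code. The sketch below is not Etsy's Logger; it only illustrates the field order visible on the slide. Reading the fields as host, timestamp, level, namespace and request UID is an inference from the surrounding slides, and the function name is hypothetical.

```python
from datetime import datetime


def format_log_line(host, when, level, namespace, request_uid, message):
    """Build 'host [timestamp] [level] [namespace] [request_uid] message'."""
    timestamp = when.strftime('%a %b %d %H:%M:%S %Y')
    return '%s [%s] [%s] [%s] [%s] %s' % (
        host, timestamp, level, namespace, request_uid, message)


line = format_log_line(
    'web0054',
    datetime(2011, 3, 4, 16, 27, 48),
    'error',
    'sql_long_query',
    'mk04gw1p71',
    'Query exceeded 5 sec: SELECT * FROM ...',
)
print(line)
```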

Page 43: Take My Logs. Please!

$ grep "16:27:48" access.log | wc -l

1527

Page 44: Take My Logs. Please!

web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] [mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...

Page 45: Take My Logs. Please!

error.log -> request_uid -> access.log

request uri, ab selections, user id, locale, platform, api key, etc.
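The lookup from an error line to its request can be sketched as two small functions. This assumes the request UID is the fourth bracketed field of an error line (as in "[error] [sql_long_query] [mk04gw1p71]") and appears as its own whitespace-separated field in access.log. Function names and the sample access lines are made up.

```python
import re

# host [timestamp] [level] [namespace] [request_uid] message
ERROR_LINE = re.compile(
    r'^\S+ \[[^\]]+\] \[[^\]]+\] \[[^\]]+\] \[(?P<request_uid>[^\]]+)\]')


def request_uid_of(error_line):
    """Return the request UID of an error line, or None if it has none."""
    match = ERROR_LINE.match(error_line)
    return match.group('request_uid') if match else None


def matching_requests(request_uid, access_lines):
    """Return the access log lines that carry the given request UID."""
    return [line for line in access_lines if request_uid in line.split()]


error_line = ('web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] '
              '[mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...')

# Hypothetical, shortened access log lines.
access_lines = [
    '... 42 7 u1 - mk04gw1p71 - - 786432 251018 263504',
    '... 43 8 u2 - zz99aa00bb - - 524288 90211 95003',
]

uid = request_uid_of(error_line)
print(matching_requests(uid, access_lines))
```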

Page 46: Take My Logs. Please!

Filtering

tail -f error.log | grep -v "sql_long_query" | ...

Page 47: Take My Logs. Please!

web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.

Page 48: Take My Logs. Please!

Trending

fatals errors warnings

Page 49: Take My Logs. Please!

Logster

Run by cron
Maintains a cursor on log files
Simple parsing & aggregation
Output to Ganglia or Graphite

github.com/etsy
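The idea of keeping a cursor on a log file between cron runs can be sketched with a saved byte offset. This is an illustration only, not Logster's code; the function name and the cursor file are assumptions.

```python
import os


def read_new_lines(log_path, cursor_path):
    """Return the complete lines appended to log_path since the last call."""
    offset = 0
    if os.path.exists(cursor_path):
        with open(cursor_path) as cursor_file:
            offset = int(cursor_file.read().strip() or 0)

    # A log shorter than the saved offset was probably rotated: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, 'rb') as log_file:
        log_file.seek(offset)
        data = log_file.read()

    # Consume only complete lines; a partial last line waits for the next run.
    end = data.rfind(b'\n') + 1
    with open(cursor_path, 'w') as cursor_file:
        cursor_file.write(str(offset + end))

    return data[:end].decode('utf-8', errors='replace').splitlines()


# Typical use from a cron job (paths are placeholders):
#   for line in read_new_lines('/var/log/httpd/error.log', '/tmp/error.cursor'):
#       handle(line)
```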

Page 50: Take My Logs. Please!

web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...

Page 51: Take My Logs. Please!

^.+? \[.+?\] \[(?P<log_level>.+?)\]
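A sketch of running a pattern of this shape against the sample login line. The quantifiers are written non-greedy here so that the capture stops at the first bracketed field after the timestamp; with a greedy `.+` the match slides right to a later bracket pair and captures the wrong field.

```python
import re

# host, then [timestamp], then [log_level]
LEVEL = re.compile(r'^.+? \[.+?\] \[(?P<log_level>.+?)\]')

line = ('web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] '
        'User login failed.')

match = LEVEL.match(line)
fields = match.groupdict() if match else {}
print(fields.get('log_level'))
```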

Page 52: Take My Logs. Please!

if fields['log_level'] == "fatal":
    self.fatals += 1
elif fields['log_level'] == "error":
    self.errors += 1
elif fields['log_level'] == "warning":
    self.warnings += 1
...

Page 53: Take My Logs. Please!

MetricObject("fatals", (self.fatals / self.duration), "per sec")

MetricObject("errors", (self.errors / self.duration), "per sec")

MetricObject("warning", (self.warnings / self.duration), "per sec")
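The counting and rate steps can be tried without Logster installed. In the sketch below, `Metric` is a stand-in for Logster's metric container (its field names are assumptions), and the input is a plain list of already-extracted log levels.

```python
from collections import Counter, namedtuple

# Stand-in for Logster's metric object; field names are assumptions.
Metric = namedtuple('Metric', ['name', 'value', 'units'])


def rates(levels, duration_sec):
    """Turn a sequence of log levels into per-second rates."""
    counts = Counter(levels)
    return [
        Metric(level + 's', counts[level] / float(duration_sec), 'per sec')
        for level in ('fatal', 'error', 'warning')
    ]


# Levels seen during a hypothetical 60-second window.
levels = ['error', 'warning', 'error', 'fatal', 'error', 'info']
metrics = rates(levels, 60)
for metric in metrics:
    print(metric.name, metric.value, metric.units)
```

Dividing by the window length is what lets runs of different durations land on the same graph.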

Page 54: Take My Logs. Please!

fatals errors warnings

Page 55: Take My Logs. Please!

Logster

Signed-in vs. Signed-out

Page 56: Take My Logs. Please!

github.com/etsy

Page 57: Take My Logs. Please!

Log a plethora of data.
Don’t be afraid to use one file.

Page 58: Take My Logs. Please!

Use custom fields to segment data.

Page 59: Take My Logs. Please!

Correlate errors to specific requests.

Page 60: Take My Logs. Please!

Make f#@k!ng graphs.

Page 61: Take My Logs. Please!

Convert rates to trend lines.

Page 62: Take My Logs. Please!

Take my logs. Please!

Page 63: Take My Logs. Please!

Mike Brittain
Director of Engineering, Infrastructure
Etsy.com

@mikebrittain

codeascraft.etsy.com
github.com/etsy

Thank you.