(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Chris Munns - @chrismunns
https://secure.flickr.com/photos/psd/4389135567/
https://secure.flickr.com/photos/iangbl/338035861
[Architecture diagram: client and mobile client → CloudFront → Elastic Load Balancing → Web instances → Elastic Load Balancing → App instances → MySQL DB instance and Amazon S3, all inside a VPC within a region]
https://secure.flickr.com/photos/hk_brian/5753530941
Is this one important?
What about this one?
Let’s go back to the
beginning
https://secure.flickr.com/photos/paukrus/9826882836
Numerical code  Facility
0               kernel messages
1               user-level messages
2               mail system
3               system daemons
4               security/authorization messages
…
23              local use 7 (local7)

Numerical code  Severity
0               Emergency
1               Alert
2               Critical
3               Error
4               Warning
…
7               Debug
<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47
- BOM'su root' failed for lonvick on /dev/pts/8
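The <34> at the front of that message is the syslog priority value, computed as facility × 8 + severity from the two tables above. A quick sketch of decoding it:

```python
# Decode the priority value from the RFC 5424 example message above.
# PRI = facility * 8 + severity, so divmod recovers both fields.
pri = 34
facility, severity = divmod(pri, 8)
print(facility)  # 4 -> security/authorization messages
print(severity)  # 2 -> Critical
```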
Easy, right?
https://secure.flickr.com/photos/21734563@N04/2225069096
66.249.64.XXX - - [07/Sep/2014:08:33:43 +0000] "GET / HTTP/1.1"
200 819 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)"
Thanks for the history lesson, Chris!
Now what do I do?
https://secure.flickr.com/photos/decade_null/142235888
* Each step has several moving pieces
https://secure.flickr.com/photos/james_wheeler/9619984584
Apache LogFormat:
173.248.147.XXX - - [16/Sep/2014:15:36:31 +0000] "GET / HTTP/1.1"
200 819 "-"
"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
Customize log data
%D = The time taken to serve the request, in microseconds
%T = The time taken to serve the request, in seconds
%v = The canonical ServerName of the server serving the request
%{Foobar}C = The contents of cookie Foobar in the request sent to
the server
%{Foobar}n = The contents of note Foobar from another module
Source: https://httpd.apache.org/docs/2.2/mod/mod_log_config.html
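A minimal httpd.conf sketch of putting these directives to use (the format nickname and log path are hypothetical), swapping the rarely useful %l/%u fields for the server name and adding microsecond serve time:

```apache
# Hypothetical httpd.conf snippet: log the canonical server name (%v)
# and the time taken to serve the request in microseconds (%D).
LogFormat "%h %v %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-agent}i\"" vhost_timed
CustomLog "logs/access_log" vhost_timed
```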
Apache LogFormat:
Default:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
173.248.147.XXX - - [16/Sep/2014:15:36:31 +0000] "GET / HTTP/1.1" 200 819 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
Customized:
"%h %v %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-agent}i\""
64.237.55.3 php-app1 [16/Sep/2014:16:21:31 +0000] "GET / HTTP/1.1" 200 819 23765 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
Apache vs. Nginx CLF patterns:
Nginx: '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"'
Apache: "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
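A sketch of wiring the nginx pattern into nginx.conf as a named log_format; appending $request_time (which nginx reports in seconds, unlike Apache's microsecond %D) is an assumption added for illustration:

```nginx
# Hypothetical nginx.conf snippet: combined-style format plus request timing.
log_format timed '$remote_addr - $remote_user [$time_local] '
                 '"$request" $status $body_bytes_sent '
                 '"$http_referer" "$http_user_agent" $request_time';
access_log /var/log/nginx/access.log timed;
```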
Either way!!!
Get that data off your
host ASAP!!
https://secure.flickr.com/photos/foresthistory/3663382060
Why?
Instance failure.
Filled disks.
Auto Scaling actions.
https://secure.flickr.com/photos/eurleif/186807023
syslog-ng, rsyslog, nxlog
Pros:
• Open source
  – Linux, Windows, and almost everything else!
• syslog-ng and rsyslog are variants of syslogd
  – Add filtering, flexible configuration, TCP as a transport
• Runs as an OS process
• Typically take the centralized data and feed it into another analytics tool
• Can often accept logs from third-party sources like network devices
[Diagram: Web, App, and Etc. instances in a virtual private cloud forwarding logs to a central logging instance]
syslog-ng, rsyslog, nxlog
Cons:
• No built-in analytics/dashboard abilities
• Typically the centralized host can become a single point of failure
• Potentially more difficult to scale
  – Federate logs to different centralized hosts?
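A minimal sketch of the client side of this setup with rsyslog, assuming a hypothetical central host name; the @@ prefix selects TCP rather than UDP:

```
# /etc/rsyslog.d/10-forward.conf  (hypothetical path and host name)
# Forward all facilities/severities to the central logging instance.
# @@host:port = TCP transport, @host:port = UDP.
*.* @@central-logging.internal:514
```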
Splunk
Pros:
• Enterprise grade
• Extremely scalable
• Fault tolerance and load balancing
built in
• Security of data built in
• Can technically accept data from
other third-party sources as well
• Full log forwarding, analyzing,
dashboarding stack + third-party apps
[Diagram: Web, App, and Etc. instances in a virtual private cloud sending logs to a pair of Splunk indexers]
Splunk
Cons:
• Enterprise-grade pricing
• Enterprise-grade licensing
• Indexer resources become an important part of capacity planning
A great option for enterprises and large shops!
Logstash
Pros:
• Open source
• Extremely scalable
• Fault tolerance built in
• Support offerings from Elasticsearch!
• Active code base and ecosystem
• Pluggable
• Ties in with other tools for
dashboarding/analytics
[Diagram: Web, App, and Etc. instances in a virtual private cloud shipping logs into Redis, with Logstash indexers pulling from Redis and feeding an Elasticsearch cluster]
Logstash
Cons:
• "ELK Stack" has many moving pieces
• Lots of DIY to get it set up
• Very quickly changing/improving technology stack
Most popular open source option today!
SaaS options
Pros:
• Hosted
• Very easy to get started with
• No concerns about scaling yourself
• Flexible pricing methods
• Support
• Either their agents or syslog to them
• Built-in dashboards/analytics tools
• Constantly adding features/capabilities
[Diagram: Web, App, and Etc. instances in a virtual private cloud sending logs out through a NAT instance to the SaaS provider]
SaaS options
Cons:
• Data leaving your control/infrastructure
• Some restrictions on the flexibility of the dashboards, collection agents, and archive limits
SaaS makes a lot of sense if you are small, trying to move fast, and should be focusing on product first!
CloudWatch Logs
Part of the Amazon CloudWatch service
# yum install awslogs
# tail -n 7 /etc/awslogs/awslogs.conf

# aws logs put-metric-filter \
    --log-group-name \
    --filter-name \
    --filter-pattern \
    --metric-transformations
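A filled-in sketch of that command; the log group, filter name, and metric names are all hypothetical, and the space-delimited filter pattern counts HTTP 5xx responses from an Apache-style access log:

```shell
# Hypothetical example: emit a count metric for every 5xx response.
aws logs put-metric-filter \
    --log-group-name "apache/access_log" \
    --filter-name "Count5xx" \
    --filter-pattern '[ip, ident, user, timestamp, request, status=5*, bytes]' \
    --metric-transformations metricName=5xxResponses,metricNamespace=WebLogs,metricValue=1
```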
CloudWatch
https://github.com/etsy/logster
[root@php-app1 logster-master]# /usr/bin/logster --dry-run --output=ganglia SampleLogster /var/log/httpd/access_log
...
/usr/bin/gmetric -d 180 -c /etc/ganglia/gmond.conf --name http_2xx --value 0.533333333333 --type float --units "Responses per sec"
...
• Can process log files on the fly, outputting metric data to numerous services:
  – CloudWatch
  – Ganglia
  – Graphite via statsd
  – Boundary
  – DataDog
  – many others!
• Runs as a constantly running daemon
• A little bit easier than Logster
• Can do metric output and full log centralization at the same time!
input {
  file {
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
}
filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
}
output {
  statsd {
    # Count one hit every event by response
    increment => "apache.response.%{response}"
  }
}
(from: http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs)
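As a rough illustration of what grok's COMBINEDAPACHELOG pattern pulls out of each line, here is a minimal Python sketch of the same extraction (the regex is a simplified stand-in, not the actual grok pattern), run against the Googlebot line from earlier:

```python
import re

# Simplified stand-in for the fields COMBINEDAPACHELOG extracts.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('66.249.64.XXX - - [07/Sep/2014:08:33:43 +0000] "GET / HTTP/1.1" '
        '200 819 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')

m = LOG_RE.match(line)
print(m.group('status'), m.group('request'))  # fields a metric filter keys on
```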
Logstash
Dashboards
https://secure.flickr.com/photos/joeross/6544781203
Dashboard for Logstash
Each of these examples
took less than an hour
to set up!
Focus first on what affects your customers:
Then on important technical issues:
Alarming:
Send alarms with:
Log backup
& archiving
https://secure.flickr.com/photos/ant-ti/6016877003
How to do it right:
1. Get data into Amazon S3
2. Get data into Amazon Glacier
[Diagram: Amazon S3 → Amazon Glacier]
Sounds easy!
"MyLoggingBucket": {
  "Type": "AWS::S3::Bucket",
  "Properties": {
    "BucketName": "MyLoggingBucket",
    "LifecycleConfiguration": {
      "Rules": [
        {
          "Id": "GlacierRule",
          "Prefix": "logs",
          "Status": "Enabled",
          "ExpirationInDays": "365",
          "Transition": {
            "TransitionInDays": "30",
            "StorageClass": "Glacier"
          }
        }
      ]
    }
  }
}
Given the importance of log data, securing it properly is also important: IAM
Don’t do this by hand! Make
use of tools:
Build basic log centralization
into every AMI!
directory "/opt/aws/cloudwatch" do
  recursive true
end

remote_file "/opt/aws/cloudwatch/awslogs-agent-setup.py" do
  source "https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py"
  mode "0755"
end

execute "Install CloudWatch Logs agent" do
  command "/opt/aws/cloudwatch/awslogs-agent-setup.py -n -r us-west-2 -c /etc/cwlogs.cfg"
  not_if { system "pgrep -f aws-logs-agent-setup" }
end
https://secure.flickr.com/photos/pfly/1537122018
In Closing
Logs ARE important!
https://secure.flickr.com/photos/ocarchives/5333790414
Logs are fun!
Spend the time to do
log analysis right!
https://secure.flickr.com/photos/dullhunk/202872717/
Please give us your feedback on this session.
Complete session evaluations and earn re:Invent swag.
http://bit.ly/awsevals