Post on 21-Jan-2018
Advancing the Elastic Stack - It's more than just log aggregation!
Introduction
Mike Clarke, DevOps Engineer/SA
Mike Keith, Senior Software Engineer
Agenda
● Project/Problem Overview
  ○ Our environment and the problem we were solving
  ○ Initially, to solve the distributed log problem
● Elastic Stack Overview
● Kibana and Elasticsearch Demo
Architecture Overview
● Our Environment
  ○ Multiple Geographical Regions/Zones
  ○ Ingest processing application
  ○ Webservice application
    ■ Our webservice application logs tell us a lot about what is going on with customers sending us information.
  ○ Access logs for JBoss
  ○ Data archive application
Architecture Overview (diagram): JBoss Webservice (x4), JBoss UI (x4), RDBMS, NoSQL DB
Project / Problem
● Log aggregation is hard
● No historical reference, as logs age off
● Obtaining stats was painful
  ○ Realistically, when all your service stats are in your logs, what do you do?
● Cluster SSH only helps so much
Obtaining stats was painful ?!?!?!
cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
5234
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
...………
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
...………
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
20
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
1240
host@me$: cat log | grep "someword" | awk '{print $8}' | paste -sd+ | bc
650
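For readers less familiar with the shell pipeline, here is a rough Python sketch of what each of those commands computes on a single host: keep the lines containing a word, take the eighth whitespace-separated field, and add the values up. The function name `sum_field` and the sample lines are hypothetical, and the sketch assumes the field holds integers and that a plain substring match is enough (grep matches a regular expression).

```python
from typing import Iterable


def sum_field(lines: Iterable[str], word: str, field: int = 8) -> int:
    """Sum a 1-based whitespace-separated field over the lines containing word.

    Mirrors: grep "someword" | awk '{print $8}' | paste -sd+ | bc
    """
    total = 0
    for line in lines:
        if word not in line:
            continue
        parts = line.split()
        # Lines with fewer fields than requested contribute nothing.
        if len(parts) < field:
            continue
        total += int(parts[field - 1])
    return total


if __name__ == "__main__":
    # Hypothetical log lines; the 8th field is the number being added up.
    sample = [
        "2018-01-21 10:00:01 INFO ingest someword received count 120",
        "2018-01-21 10:00:02 INFO ingest otherword received count 999",
        "2018-01-21 10:00:03 INFO ingest someword received count 30",
    ]
    print(sum_field(sample, "someword"))
```

Even in this form the result covers only one log file on one host, and the per-host totals still have to be combined by hand.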
Technical Overview
● For the most part restricted to FOSS products
● Needed to be easily obtainable
● Available options
  ○ Graylog
  ○ Grafana
  ○ Airbrake
  ○ Splunk
  ○ Elastic Stack
Elastic Stack (formerly ELK) Overview
Elasticsearch - Distributed, RESTful search and analytics engine
Logstash - Server-side data processing pipeline
Kibana - Powerful visualization UI
Beats - Single-purpose, lightweight data shippers
X-Pack - Powerful features which enhance the Elastic Stack
Initial Solution - Log Aggregation
● Single node servers
● Installed Elastic Stack and began shipping all application server logs to a centralized server.
● Near real time
● Raw log message transitioned into a fielded log message
● Grok parsing (text pattern matching)
● Filters etc.
Architecture Overview (diagram): Filebeat (x8), Logstash, Elasticsearch, Kibana
Filebeat

filebeat.prospectors:
- input_type: log
  paths:
    - /data/logs/apache/*.log
  fields:
    type: apache
  fields_under_root: true

#----------------------------- Logstash output --------------------------------
output.logstash:
  hosts: ["localhost:5443"]
  bulk_max_size: 1024
Logstash - Input & Output

input {
  beats {
    port => 5443
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{type}"
    user => "elastic"
    password => "*******"
  }
}
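The index setting in the output above produces one index per beat, type, and day. As a minimal sketch of how that pattern resolves, the Python below builds the same kind of name; the helper `daily_index` and the sample values are hypothetical. It relies on the fact that Logstash formats `%{+YYYY.MM.dd}` from the event's `@timestamp`, which is kept in UTC.

```python
from datetime import datetime, timedelta, timezone


def daily_index(beat: str, doc_type: str, timestamp: datetime) -> str:
    """Build an index name shaped like <beat>-<type>-<YYYY.MM.dd>."""
    # The date part comes from the event timestamp converted to UTC,
    # so events roll over to a new index at UTC midnight.
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{beat}-{doc_type}-{day}"


if __name__ == "__main__":
    # Hypothetical event read late in the evening, five hours behind UTC.
    local_evening = datetime(2018, 1, 21, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
    print(daily_index("filebeat", "apache", local_evening))
```

Daily indices make it straightforward to age data off by dropping whole indices, which fits the "no historical reference" problem described earlier.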
Kibana - Discover
Logstash - Filters

filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] \"%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code:int} %{NUMBER:bytes:int} "}
  }
  mutate {
    add_field => { "read_timestamp" => "%{@timestamp}" }
  }
  date {
    match => [ "time", "dd/MMM/YYYY:H:m:s Z" ]
    remove_field => "time"
  }
}
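To make the "raw message to fielded message" step concrete, here is a rough Python approximation of what the grok and date filters above do to one access-log line. It is a sketch, not Logstash's implementation: the regular expression only approximates the grok patterns, the `mutate` step is not reproduced, and the names `ACCESS_LINE`, `parse_access_line`, and the sample line are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Approximation of the grok pattern: named groups play the role of grok field names.
ACCESS_LINE = re.compile(
    r'(?P<remote_ip>\S+) - (?P<user_name>.*?) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<url>.*?) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d+) (?P<bytes>\d+) '
)


def parse_access_line(line: str) -> Optional[dict]:
    """Turn one raw access-log line into a dict of fields, or None if it does not match."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Equivalent of the ":int" suffixes in the grok pattern.
    fields["response_code"] = int(fields["response_code"])
    fields["bytes"] = int(fields["bytes"])
    # Equivalent of the date filter: parse "time" into the event timestamp and drop it.
    fields["@timestamp"] = datetime.strptime(fields.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    return fields


if __name__ == "__main__":
    # Hypothetical access-log line in combined format.
    sample = '203.0.113.7 - alice [21/Jan/2018:10:15:32 +0000] "GET /api/status HTTP/1.1" 200 512 "-" "curl/7.58.0"'
    print(parse_access_line(sample))
```

Once each line is a set of typed fields, questions like "how many bytes per response code" become aggregations rather than text processing.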
Kibana
Evolution of the solution
● We changed from looking at who is talking to us to what they are talking to us about.
  ○ We kept adding more to our logs just so we could see it in Kibana.
  ○ Our data was already in Avro format, which made it easy to convert to JSON.
  ○ Then we used the JSON codec for Logstash to input directly into Elasticsearch.
● Considered Accumulo
  ○ But there was just too much we had to build to get it to a usable state.
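The move from log lines to JSON documents can be sketched as follows. This assumes the Avro records have already been decoded into Python dicts by an Avro library (not shown), and it writes one JSON document per line, a shape that a JSON codec on a Logstash input can decode without any grok parsing. The function name `to_json_lines` and the record fields are hypothetical.

```python
import json
from typing import Iterable, Iterator


def to_json_lines(records: Iterable[dict]) -> Iterator[str]:
    """Yield one compact JSON document per record, suitable for writing one per line."""
    for record in records:
        # json.dumps escapes embedded newlines, so each document stays on one line.
        yield json.dumps(record, separators=(",", ":"), sort_keys=True)


if __name__ == "__main__":
    # Hypothetical records, standing in for decoded Avro data.
    records = [
        {"customer": "acme", "message_type": "status", "size_bytes": 2048},
        {"customer": "globex", "message_type": "report", "size_bytes": 512},
    ]
    for line in to_json_lines(records):
        print(line)
```

Because the fields arrive already named and typed, the data itself, not just the log messages about it, becomes searchable in Kibana.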
Kibana Twitter Demo
● Let's take a look at some interesting things you can see in Kibana.
● Counting very easily across different fields in your data (makes aggregations and histograms very easy).
● Data changes over time. Sometimes you need to go back and update something you already stored.
  ○ State changes or updates of some kind to the original document.
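Both demo points map onto standard Elasticsearch request bodies, sketched here in Python as plain dicts. Counting across a field is a terms aggregation, and changing a stored document is a partial update through the update API. The helper names, the aggregation name `counts`, and the sample values are hypothetical, and the sketch assumes the field is mapped so that it can be aggregated on (for example as a keyword).

```python
import json


def terms_count_query(field: str, size: int = 10) -> dict:
    """Search body for a terms aggregation: document counts per distinct value of field."""
    return {
        "size": 0,  # return only the aggregation buckets, not the matching documents
        "aggs": {"counts": {"terms": {"field": field, "size": size}}},
    }


def partial_update_body(changes: dict) -> dict:
    """Update API body that merges changes into an already stored document."""
    return {"doc": changes}


if __name__ == "__main__":
    # Field names taken from the Twitter JSON used in the demo; values are made up.
    print(json.dumps(terms_count_query("retweeted_status.user.screen_name"), indent=2))
    print(json.dumps(partial_update_body({"retweeted_status": {"retweet_count": 42}}), indent=2))
```

Kibana builds aggregation requests of this kind behind its visualizations, which is why counting and histograms need no custom code.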
Twitter Data Demo
Basic Twitter JSON:
{ screen_name, text, retweeted_status.user.screen_name, retweeted_status.retweet_count, retweeted_status.text, ... }
Data Storage Elastic Stack Architecture (diagram): Filebeat (x2); Logstash Nodes 1 ... 4; Elasticsearch Master Nodes 1, 2 ...; Elasticsearch Client Node; Elasticsearch Data Nodes 1 ... 20; Kibana
Conclusion & Takeaways
● Low Barrier to Entry
● Quickly Search Across Data
● Horizontally Scalable
● Easily Visualize Data
About Clarity Business Solutions
● We are a team of Software and System Engineers
● Customer focused and mission driven
● For more about us, please visit: www.claritybizsol.com
● Follow us: @claritybizsol