logentries - content.ebulletins.comlogentries.com 4 Introduction The ELK Stack is the current...

logentries.com 1


Table of Contents

Introduction

What is the ELK Stack?

The ELKeBMWS Stack

Beats

Marvel

Watcher

Shield

Cluster/Tribe Nodes

The High Cost Of Low Cost Solutions

Hardware Costs

Scaling Costs

Cloud Hosting Costs

Data Storage Costs

Resource Costs

Conclusion


About the author, David Posin

David has been involved in the Information Technology Industry for 2 decades. Fifteen years of that time was spent consulting with many companies in a wide range of industries to build solid technology stacks and robust application architectures. David has watched the Cloud and the World Wide Web grow from their infancy, and now spends every day fully entrenched in those worlds. Currently, David builds high-performance web applications and offers professional technical writing services.

About Logentries

Logentries is a leading SaaS-based log management tool used for real-time log centralization, search and analysis. DevOps, Security & IT professionals use Logentries to manage both logs and unstructured machine data for immediate visibility into their IT environments. Logentries makes it easy to get insights from your log data without building, maintaining or supporting your own log management stack.


Introduction

The ELK Stack is the current preferred stack for do-it-yourself (DIY) logging. It is generally thought to be composed of three software packages: Elasticsearch, Logstash, and Kibana. The truth is that a successful ELK Stack implementation requires a great deal more than those three technologies. Even with the best of community support, DIY logging with the ELK Stack will have surprises and unexpected costs. This paper will point out some of the less well understood requirements of a robust DIY ELK Stack.


What is the ELK Stack?

The ELK Stack, also called the Elastic Stack, starts with a combination of three separate technologies configured to work together. Each piece in the ELK Stack handles one part of the general logging equation:

• Elasticsearch - Data storage and searching

• Logstash - Gathering and formatting

• Kibana - Reporting and analyzing

These three technologies are a good start but do not encompass the full set of services required for a robust, production-ready logging stack. Additional technologies are needed to maintain the health and security of the stack, as well as mechanisms to collect and disseminate information.

Adding Context

This white paper focuses on the production environment. It explores the costs and requirements for a reliable, robust, and scalable Stack. Outside of production, Elasticsearch, Kibana, and Logstash can run on the same machine. While that works for a development environment, running a production-grade stack on only one server is not advisable.


The ELKeBMWS Stack

Production environments need to be reliable and fault tolerant. Elasticsearch, Logstash, and Kibana will need to be supported by other packages. They will need monitoring and redundancy like all software packages used in production.

Beats

Running Logstash decentralized, with installations on separate machines, may not be ideal. In an enterprise network, it might be preferable to have a central point to process and filter log data. It is possible to install Logstash on one server and have data shipped to it.

Centralizing log information in this way requires software called Beats on every machine being logged. Beats defines and controls the process of sending data from different log types to Logstash. Some example Beats are Packetbeat, Filebeat, and Winlogbeat. Each is designed to ship the specific log type it is built for.

Marvel

Like all services in a production environment, the Elastic Stack services need to be monitored. This responsibility is handled by a software package called Marvel. Marvel is designed to monitor and report the health of all of your Elastic Stack components. The importance of Marvel only increases as the Stack grows. Clusters and Tribes (discussed below) can mean there are many independent components that all need to be monitored.

Watcher

One of the core responsibilities of any logging solution is to make people aware of critical events. The Elastic Stack has a tool called Watcher to provide this essential function. Watcher observes incoming log entries and sends notifications when certain events occur. Kibana can report on the event and Logstash can disperse it, but notifying your support staff of problems immediately, before they grow, requires Watcher. Notifications can be sent via email and through other services based on the configuration.

Shield

Security is always going to be a consideration when installing a service. Shield was created to meet this need for the Elastic Stack and to centralize security amongst the different Stack components. It is recommended to use this product over non-Elastic security methods. Nginx is sometimes suggested to help limit access, but as the updated blog post “Restricting Users for Kibana with Filtered Aliases” shows, using technology outside the Elastic Stack can have unexpected consequences.


Another important security consideration is the use of HTTP communication by default. This should be changed when moving to production. Updating the stack for HTTPS can be done by using Shield. It will require some additional configuration and the appropriate certificates.

Cluster/Tribe Nodes

Finally, there are scaling issues to plan for. Elasticsearch is built using clusters to help handle and distribute Elasticsearch queries around the network. Clusters are composed of master and data nodes, and potentially, clients. Clusters can fill up with data over time, and it will be necessary to scale. As the Elasticsearch documentation states on the “Scale is Not Infinite” page,

“Most scaling problems can be solved by adding more nodes [servers].”

It’s important to prepare for adding nodes (servers) to your network as the Elasticsearch index grows.

Eventually, even clusters won’t be sufficient to store all the data Elasticsearch encompasses.

Every Elasticsearch node and client (master nodes, data nodes, and clients) stores information about its Elasticsearch cluster, called the cluster state, which it uses for proper routing.

Eventually, the cluster state will be large enough to slow down performance. When that occurs, it will be time to introduce Tribe Nodes to the network. Tribe nodes allow searching across Elasticsearch clusters.

Installing a robust and scalable production-ready Elastic Stack takes more than Elasticsearch, Logstash, and Kibana. A full accounting of the services required is:

• Elasticsearch

• Logstash

• Kibana (with an Elasticsearch client)

• Beats (per server and data-type being logged)

• Marvel

• Watcher

• Shield

• Clusters

• Tribe nodes (not initially, but eventually over time)

A well-assembled Elastic Stack requires all of these pieces before it can come close to the full functionality provided by a SaaS logging service (like Logentries).

The ELK Stack is really the ELKeBMWSC(Tn) Stack.


The High Cost Of Free Solutions

Logstash, Kibana, and Elasticsearch are free open source solutions. There is no cost to using the software in a self-hosted environment. Being open source is one of the biggest attractions of the Elastic Stack. Free is a compelling price point. Although the software is free, running it is not. There are several costs to be aware of.

Hardware Costs

The Elastic Stack is not an “install and go” solution. The number of servers required will depend on your needs. At a minimum, for any production environment, you will have to install software on three servers. Elasticsearch and Kibana will each have their own server, and Logstash must be added to at least one host server.

There are performance reasons to consider having users connect to Kibana on a machine separate from Elasticsearch. Elasticsearch can require a lot of CPU and memory depending on the operations being run. If Kibana is sharing those resources, the result is added latency and slow performance for the user. Running Kibana on its own server is recommended by the Elastic documentation: “While Kibana isn’t terribly resource intensive, we still recommend running Kibana separate from your Elasticsearch data or master nodes.” *

Maintaining Elasticsearch performance is a careful balance between the number of servers and the amount of data. To realize the full capabilities of Elasticsearch, it is necessary to distribute pieces of the searchable data amongst its servers. Expect to add servers over time to keep search performant.

Like all services, Elasticsearch will fail on occasion, so planning for failover is important. The recommendation is to have a one-to-one ratio between each server and a replicated backup. Each primary server should keep a complete copy of its data on at least one replica server. In the event of a hardware failure, Elasticsearch will automatically switch to the replica. The Elasticsearch documentation makes the same recommendation: “It provides high availability in case a shard/node [server] fails. For this reason, it is important to note that a replica shard is never allocated on the same node [server] as the original/primary shard that it was copied from.”

*(https://www.elastic.co/guide/en/kibana/current/production.html).


This is especially important for not losing data in Kibana. If data is unreachable, Kibana can’t indicate its absence. Reports and graphs will simply be incorrect until the problem is realized and fixed.
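The replica rule quoted above, that a replica is never placed on the same server as its primary, can be illustrated with a small model. The sketch below is not Elasticsearch's actual allocator; `place_shards`, `survives_loss_of`, and the server names are hypothetical and exist only to show why one replica per primary lets the data survive the loss of a single server.

```python
def place_shards(nodes, num_primaries, num_replicas=1):
    """Toy placement: give each primary and its replicas distinct nodes."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes")
    placement = {}
    for shard in range(num_primaries):
        # Consecutive nodes (wrapping around) guarantee the copies never share a node.
        copies = [nodes[(shard + offset) % len(nodes)]
                  for offset in range(num_replicas + 1)]
        placement[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return placement


def survives_loss_of(placement, failed_node):
    """True if every shard still has a copy on some node other than failed_node."""
    for copies in placement.values():
        holders = [copies["primary"], *copies["replicas"]]
        if all(node == failed_node for node in holders):
            return False
    return True


# Hypothetical server names for a one-to-one primary/replica setup.
nodes = ["es-primary", "es-replica"]
placement = place_shards(nodes, num_primaries=5, num_replicas=1)
print(survives_loss_of(placement, "es-primary"))
```

With `num_replicas=0`, the same check fails for any server that holds a shard, which is the situation the one-to-one recommendation is meant to avoid.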

Servers may also be required to support the various tools mentioned above. Marvel and Watcher may require their own hardware for performance and logic reasons. Centralizing Logstash is also recommended, so at least one Logstash server is needed to receive log data and send it to Elasticsearch. Therefore, the absolute minimum number of servers is 5:

• Elasticsearch primary server

• Elasticsearch replica server

• Kibana

• Logstash

• Marvel, Watcher, etc.
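The tally above can be written as simple arithmetic, which makes it easier to see how the count grows as Elasticsearch servers are added. This is only a sketch: `minimum_servers` and its parameters are hypothetical names, and it assumes the paper's one-to-one primary/replica ratio plus one server each for Kibana, Logstash, and the supporting tools.

```python
def minimum_servers(es_primaries=1, replicas_per_primary=1,
                    kibana=1, logstash=1, support_tools=1):
    """Count servers: every Elasticsearch primary plus its replicas, plus the rest."""
    elasticsearch_total = es_primaries * (1 + replicas_per_primary)
    return elasticsearch_total + kibana + logstash + support_tools


print(minimum_servers())                # the baseline described above: 5
print(minimum_servers(es_primaries=3))  # three primaries with one replica each: 9
```

Because each primary brings its replica with it, every Elasticsearch server added for capacity costs two machines under this assumption.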

Scaling Costs

Installing the Elastic Stack is only the beginning. It will grow over time and will need monitoring and scaling to keep it healthy. New indexes will require new servers. Growing logs will require more disk space. Changes in your data or logging structure will require reindexing your data.

Unfortunately, there will be problems that can’t be solved by throwing more disk space or servers at Elasticsearch. Scrunch.com’s blog post, “Lessons Learned From A Year Of Elasticsearch In Production”, mentions several potential performance-affecting issues to watch. The authors suggest monitoring thread pools and heap memory, both of which can cause significant performance problems if left unchecked. Marvel can help with this, as can a regular schedule of pruning and archiving.
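As a rough illustration of the kind of check involved, the sketch below scans a dictionary shaped like the response of Elasticsearch's nodes stats API for high heap usage and rejected thread pool requests. The sample data, the `find_pressure` helper, and the 75% heap threshold are all assumptions made for illustration, and the field names should be verified against the Elasticsearch version in use.

```python
def find_pressure(stats, heap_limit_percent=75):
    """Return warnings for nodes with high heap usage or rejected pool requests."""
    warnings = []
    for node_id, node in stats["nodes"].items():
        name = node.get("name", node_id)
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap >= heap_limit_percent:
            warnings.append(f"{name}: heap at {heap}%")
        for pool, counters in node.get("thread_pool", {}).items():
            rejected = counters.get("rejected", 0)
            if rejected > 0:
                warnings.append(f"{name}: {pool} pool rejected {rejected} requests")
    return warnings


# Hypothetical sample, standing in for a real nodes stats response.
sample_stats = {
    "nodes": {
        "a1": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 82}},
            "thread_pool": {"search": {"rejected": 12}, "bulk": {"rejected": 0}},
        },
        "b2": {
            "name": "node-2",
            "jvm": {"mem": {"heap_used_percent": 40}},
            "thread_pool": {"search": {"rejected": 0}},
        },
    }
}

for warning in find_pressure(sample_stats):
    print(warning)
```

In practice a check like this would run on a schedule against live statistics, which is the role the paper assigns to Marvel.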

Costs in Lost Opportunity

Setting up indexes is as much an art as it is a science. There will be a definite learning curve that will affect the quality of the data gathered. Indexes that are not configured correctly can mean important data is lost until the issue is rectified. It is important to be vigilant about what is being logged and to compare it to what should be logged.

Cloud Hosting Costs

Self-hosting will help limit cost but may not be practical or desirable. In that case, cloud hosting is the most likely option. Cloud computing will incur costs for:

• Hardware

• Data stored

• Data transferred between servers

Data transfer costs in particular can vary wildly. A major event or issue could cause a burst of activity that results in much higher than usual costs. Major bursts of activity could result in overage fees at best, and data loss at worst.


Data Storage Costs

The amount of space needed for data storage requires careful consideration. Elasticsearch works by storing independent indexes of data. Data can be indexed more than once depending on how the indexes are configured. Additional fields added to documents for indexing purposes can also add to data size. Furthermore, storage needs will increase over time as both the data being indexed and the indexes themselves grow. It is best to prepare for storage requirements to increase.
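A back-of-the-envelope estimate shows how these factors multiply. The formula below is a sketch under stated assumptions, not a sizing rule from the Elasticsearch documentation: `estimate_storage_gb` is a hypothetical helper, and the `index_overhead` factor of 1.2 is a placeholder that should be replaced with a ratio measured on your own data.

```python
def estimate_storage_gb(daily_raw_gb, retention_days, replicas=1, index_overhead=1.2):
    """Estimate disk needed: raw volume x indexing overhead x copies x days kept."""
    copies = 1 + replicas  # the primary plus each replica holds a full copy
    return daily_raw_gb * index_overhead * copies * retention_days


# Assumed inputs: 10 GB of logs per day, kept for 30 days, one replica.
print(round(estimate_storage_gb(daily_raw_gb=10, retention_days=30)))
```

Under these assumed inputs the stack needs roughly 720 GB, more than double the 300 GB of raw logs, and doubling either the retention period or the daily volume doubles the estimate again.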

“Data storage is probably the biggest cost you will experience over time.”

Documentation Costs

Meticulous documentation is essential for every Elastic Stack implementation. The institutional knowledge gained from building an Elastic Stack can’t be recreated.

This information will be extremely valuable for long-term maintenance and support. As your Stack matures and ages over time, it is important to keep the documentation current.


Conclusion

The ELK Stack is most useful when having full control over the environment is important and the needed resources are available. As illustrated here, the Elastic Stack is not a shortcut to avoid the costs of proper logging. The Elastic Stack may not have a monthly fee and may not have software licenses, but that cost is still there in the form of rigor, resources, and scaling. The decision of whether or not to use the Elastic Stack for DIY logging is not about how much it costs compared to managed services, but rather where you want to allocate your resources and funds.


Start your 30-Day Logentries Free Trial Today. Save yourself and your team from the headache of standing up and maintaining the ELKeBMWSC(Tn) stack. Logentries makes it easy to manage all of your machine data.

Get started for free at logentries.com

• Unlimited log centralization

• Secure data transmission

• Protection from log manipulation

• Easy search for known events & patterns

• Full RegEx Support

• Affordable plans

• Real-time Alerts

• Inactivity Alerts

• Anomaly Detection

• Data filtering & obfuscation

• Custom tagging of known events

• Custom retention policies

Figure: Customizable Dashboard view