redhat.com

facebook.com/redhatinc @redhatnews

linkedin.com/company/red-hat

INTRODUCTION

Savvy enterprises are investing in operational analytics to help manage increasing business

and technological complexity. In doing so, they are able to drive greater efficiency, enhanced

customer satisfaction, increased transparency, and superior resilience.

Deploying and managing operational analytics at scale is not without challenges, however,

particularly with regard to data storage. In order to deliver the insights that operational

analytics users demand, large amounts of information must be collected and stored. As a

result, IT organizations commonly find that storage for enterprise-scale operational analytics

is either too difficult to manage, or too expensive, or both.

To help businesses easily and cost-effectively realize the benefits of operational analytics,

Red Hat has integrated Splunk® Enterprise, an industry-leading platform for delivering

real-time operational intelligence, with Red Hat® Gluster Storage, a software-defined storage

platform for files, objects, and machine-to-machine data. Using these products together

helps enterprises solve the cost and scale problems of explosive analytic data growth.

The use of Red Hat software-defined storage with Splunk creates an important new opportunity

for enterprises deploying Splunk Enterprise: as opposed to using inexpensive but difficult

to manage direct-attached storage (DAS), or expensive and high-latency network-attached

storage (NAS), Red Hat is pioneering a hybrid storage model that allows Splunk Enterprise to

use a combination of DAS and software-defined storage to achieve a high-performance, highly

manageable, and cost-effective system for operational analytics.

WHITEPAPER

CHOOSING THE RIGHT STORAGE PLATFORM FOR SPLUNK ENTERPRISE

redhat.com WHITEPAPER Choosing the right storage platform for Splunk Enterprise

DATA REQUIREMENTS FOR OPERATIONAL ANALYTICS

To more rapidly identify trends, patterns, and behaviors in operational data, or to facilitate

regulatory compliance, enterprises retain the data indexed by Splunk Enterprise for extended periods of

time. This is because Splunk’s data-hungry analytical algorithms produce more insightful results

when fed more data, both in terms of the number of unique data sources and the number of

retained data points from each source.

According to Splunk documentation, daily indexing volumes for medium and large enterprises

are typically:

• 100–300GB per day for a medium enterprise with tens to low hundreds of users.

• 300GB–1TB per day for a large enterprise with up to five hundred or more users.

Figure 2 illustrates the aggregate amount of storage required as the ingest rate and data retention

period vary. As can be seen in the figure, a large enterprise ingesting a moderate 500GB of data

per day will accumulate approximately 1PB of data if that data is to be retained for four years, while

an enterprise indexing 1TB of data per day will require the same 1PB of storage when retaining that data for

only two years.
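The relationship described here is ingest rate multiplied by retention time. The following is a minimal Python sketch of that arithmetic, assuming decimal units (1TB = 1000GB) and 365-day years. It counts raw ingested volume only and does not model compression, replication, or index overhead, any of which may be reflected in the totals plotted in Figure 2. The function name is hypothetical and chosen for illustration.

```python
def retained_volume_tb(ingest_gb_per_day: float,
                       retention_years: float,
                       days_per_year: int = 365) -> float:
    """Return the raw volume, in TB, accumulated over the retention period.

    Assumptions (not from the whitepaper): decimal units, a fixed daily
    ingest rate, and no compression, replication, or index overhead.
    """
    if ingest_gb_per_day < 0 or retention_years < 0:
        raise ValueError("ingest rate and retention period must be non-negative")
    total_gb = ingest_gb_per_day * days_per_year * retention_years
    return total_gb / 1000  # GB -> TB (decimal)


if __name__ == "__main__":
    # The two scenarios from the text: 500GB/day for four years
    # and 1TB/day (1000GB/day) for two years.
    for ingest_gb, years in [(500, 4), (1000, 2)]:
        volume = retained_volume_tb(ingest_gb, years)
        print(f"{ingest_gb} GB/day for {years} years: {volume:.0f} TB raw")
```

Both scenarios yield the same raw volume (about 730TB, on the order of the 1PB cited), which shows why halving the retention period offsets a doubling of the ingest rate.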

Figure 1. Data retention aids pattern recognition in operational analytics. © 2013 Richard Candy

With such large data storage requirements, enterprises are faced with selecting a storage architecture

offering scalability, manageability, and low cost, while not compromising on performance.

STORAGE OPTIONS FOR SPLUNK ENTERPRISE

Enterprises have several options when it comes to architecting storage for Splunk Enterprise,

each offering a unique combination of operating characteristics at a given price point.

LOCAL DIRECT-ATTACHED STORAGE

The default storage option for Splunk is local, direct-attached storage. Local DAS has the advantage

of simplicity, allowing enterprises to get started quickly, using storage already available in

their Splunk servers.

In addition to simplicity, DAS also offers high performance. Because DAS is connected locally to

the Splunk Indexer via a high-bandwidth and low-latency SATA bus, operations such as indexing

and search can be very fast. Local storage is also extremely cost-effective, since the drives themselves

are commoditized.

In spite of its short-term advantages, local storage presents significant manageability challenges

in the long term, as storage requirements grow. These include:

• Poor expandability. Upgrading local disks is time consuming and generally requires that nodes

be taken out of service for the duration of the upgrade.

• Reduced efficiency. Because compute and storage must be scaled together, direct-attached

storage results in lower overall resource utilization.

• Lower availability. With direct-attached storage, disk failures can result in data loss and system

downtime.

Thus, while all-local storage can be effective for some Splunk deployments, enterprises with large

data sets can quickly grow out of DAS.

Figure 2. Splunk storage required vs retention period and ingestion rate

SHARED ENTERPRISE STORAGE

With the manageability limitations of local storage providing the motivation, traditional enterprise

storage vendors suggest that Splunk Enterprise users deploy a NAS cluster in an all-shared

storage architecture.

Deploying a NAS cluster in a shared manner means that all Splunk Indexers store indexed data

on the NAS devices, where it may be accessed directly via Splunk Search Heads.

The shared nature of NAS storage does improve upon the manageability challenges presented by

large amounts of DAS. However, because shared storage is accessed over a network, it imposes

performance and latency penalties not present with DAS, resulting in reduced indexer ingest

throughput and longer search times.

In addition to diminished performance, cost is a significant challenge with a NAS cluster. Due to their

closed, proprietary nature, traditional NAS devices can cost many times as much as the equivalent

amount of storage obtained via commodity disk drives.

Beyond performance and cost, traditional NAS is also:

• Hardware-based. Traditional NAS seeks to deliver reliability through expensive hardware

redundancy and requires additional software or hardware to deliver the disaster recovery

required for operational analytics projects.

• Monolithic. The monolithic nature of traditional NAS makes it difficult to expand incrementally.

This presents challenges for operational analytics projects, which typically start small, but

expand broadly across the enterprise as they mature.

• Proprietary. Traditional NAS locks customers in and dramatically adds to the total cost of

ownership (TCO) of operational analytics projects, especially at scale.

• Rigid. NAS supports a single, on-premises deployment model and makes it difficult to deploy

a cloud-based operational analytics system.

These characteristics make traditional enterprise NAS devices a weak fit for operational analytics

projects.

HYBRID SOFTWARE-DEFINED STORAGE

To help enterprises overcome the manageability and scalability challenges of local storage,

while avoiding the performance and cost shortcomings of NAS, Red Hat has integrated Red Hat

Gluster Storage with Splunk Enterprise using a hybrid storage model.

Splunk Enterprise enables the hybrid storage model by segmenting indexed data into “hot,” “warm,”

“cold,” and “frozen” repositories called “buckets.” Splunk’s data placement policies control the

distribution of data across buckets, based on the size of the indexes or the age of the data they

contain. Buckets allow enterprises to maximize efficiency, performance, and value by utilizing a

tiered approach to managing the lifecycle of ingested data.
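To make the tiered lifecycle concrete, the following Python sketch assigns data to a tier by age alone. It is a deliberate simplification: as noted above, Splunk's placement policies also take index size into account, and the threshold values and names used here (`TierPolicy`, `tier_for_age`) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Hypothetical age limits, in days, for each tier (illustrative values)."""
    hot_days: float = 1
    warm_days: float = 30
    cold_days: float = 365


def tier_for_age(age_days: float, policy: TierPolicy = TierPolicy()) -> str:
    """Return the tier name for data of the given age.

    Age-only model: data younger than each limit stays in that tier,
    and anything at or beyond the cold limit is treated as frozen.
    """
    if age_days < 0:
        raise ValueError("age must be non-negative")
    if age_days < policy.hot_days:
        return "hot"
    if age_days < policy.warm_days:
        return "warm"
    if age_days < policy.cold_days:
        return "cold"
    return "frozen"


if __name__ == "__main__":
    for age in (0.5, 10, 100, 400):
        print(f"{age} days old -> {tier_for_age(age)}")
```

Under this model, only the hot and warm tiers need the fastest storage, while the bulk of retained data sits in the cold and frozen tiers.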

In the hybrid storage model, Splunk Enterprise stores recently indexed data on DAS, maximizing

performance, and moves older data to a storage system selected to ensure scalability and

manageability at a low total cost of ownership (TCO).
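The whitepaper does not give the configuration used for this split, so the following Python sketch shows only one plausible way to express it. It renders a stanza for Splunk's `indexes.conf`, pointing `homePath` (hot and warm buckets) at local DAS and `coldPath`, `thawedPath`, and `coldToFrozenDir` at a mounted Gluster volume. The setting names are standard `indexes.conf` settings; the mount points, index name, retention value, and the helper `render_index_stanza` are assumptions made for illustration.

```python
def render_index_stanza(index_name: str,
                        das_root: str,
                        gluster_root: str,
                        retention_days: int,
                        max_warm_buckets: int = 300) -> str:
    """Render an indexes.conf stanza for a hybrid DAS/Gluster layout.

    das_root and gluster_root are hypothetical mount points: hot and
    warm buckets live under das_root, cold and frozen data under
    gluster_root.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    frozen_secs = retention_days * 24 * 60 * 60  # days -> seconds
    lines = [
        f"[{index_name}]",
        f"homePath = {das_root}/{index_name}/db",              # hot + warm
        f"coldPath = {gluster_root}/{index_name}/colddb",      # cold
        f"thawedPath = {gluster_root}/{index_name}/thaweddb",  # restored data
        f"coldToFrozenDir = {gluster_root}/{index_name}/frozen",
        f"maxWarmDBCount = {max_warm_buckets}",
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths: local disks at /data/das, Gluster at /mnt/gluster.
    print(render_index_stanza("main", "/data/das", "/mnt/gluster", 365))
```

Keeping the two storage roots as separate parameters mirrors the hybrid model: the fast tier and the capacity tier can be resized or relocated independently.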

Red Hat Gluster Storage is particularly well suited for housing Splunk Enterprise data in cold

and frozen buckets because it is:

• Software-defined. Red Hat Gluster Storage provides reliability inexpensively, via software,

and requires no additional hardware or software to ensure data protection and disaster

recovery for operational analytics or other workloads.

• Cost-effective. Red Hat Gluster Storage environments are based on open-source software

(the proven GlusterFS file system and Red Hat Enterprise Linux®) running across industry-standard

servers and disk drives, eliminating storage vendor lock-in and delivering low

TCO for operational analytics projects.

• Expandable. Red Hat Gluster Storage is easily expanded, with no downtime, allowing

operational analytics projects to start small and grow as needed without disruption.

• Flexible. Red Hat Gluster Storage is easily deployed wherever Linux runs, facilitating

operational analytics both on-premises and in the cloud.

The open, software-defined nature of Red Hat Gluster Storage and its tight integration with

Splunk Enterprise make it an ideal choice for supporting enterprise operational analytics.

Hybrid storage using DAS and Red Hat Gluster Storage addresses the shortcomings of both

DAS- and NAS-based approaches.

HYBRID STORAGE IS THE BEST OF BOTH WORLDS

      DAS-ONLY                    NAS-ONLY                HYBRID
      Hot/Warm and Cold Data      Hot/Warm and Cold Data  Hot/Warm on DAS, Cold on Red Hat Storage
Pros  Performance, Cost           Manageability           Cost, Manageability, Performance, Scalability
Cons  Scalability, Manageability  Cost, Performance

Figure 3. Buckets in Splunk Enterprise allow indexed data to be segmented.

Copyright © 2015 Red Hat, Inc. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, and JBoss are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

NORTH AMERICA 1 888 REDHAT1

ABOUT RED HAT

Red Hat is the world’s leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux, and middleware technologies. Red Hat also offers award-winning support, training, and consulting services. Red Hat is an S&P company with more than 80 offices spanning the globe, empowering its customers’ businesses.

EUROPE, MIDDLE EAST, AND AFRICA 00800 7334 2835 [email protected]

ASIA PACIFIC +65 6490 4200 [email protected]

LATIN AMERICA +54 11 4329 7300 [email protected]

redhat.com #12350037_INC0210625_v2_0215

CONCLUSION

The Red Hat and Splunk partnership has resulted in an important new deployment alternative

for enterprises deploying Splunk Enterprise for operational analytics. With the hybrid deployment

model, enterprises can deploy Splunk Enterprise using direct-attached storage for hot

and warm data, and Red Hat Gluster Storage for cold and frozen data.

The hybrid storage configuration offers the highest-performance ingest

and search on the most recent data, and strong search performance on older data, while

minimizing overall cost and complexity.

For more information on Red Hat Gluster Storage and Splunk, visit redhat.com/storage/.
