Building Data Pipelines with SMACK: Designing Storage Strategies for Scale and Performance

Building Data Pipelines with SMACK: Storage Strategies for Scale & Performance
June 8, 2016
Jonathan Shook, Solution Architect, DataStax


Spark

Mesos

Akka

Cassandra

Kafka

1 Essential Storage Concepts

2 Design Strategies

3 Storage Selection

4 Q & A

© DataStax, All Rights Reserved.

Essential Storage Concepts: The Basics

Important Terms

• Topology

• Bandwidth, Throughput, Headroom

• Latency, Minimum Latency

• Concurrency, Parallelism, Contention


Basic System Topology


Every modern system is essentially a network of components.

The language of message delivery applies at every level of design.

System Topology Example (high level)

[Diagram: high-level system topology including HDD and SSD devices]

Term: Bandwidth, Throughput, Headroom

• Bandwidth - Maximum rated transfer speed of a device

• Throughput - Measurement of achievable transfer speed

• Headroom - Safety margin above normal usage (“reserve capacity”)


Throughput Example: SATA3

Using a popular SSD and an online benchmark...


• Bandwidth: 6 Gb/s (750 MB/s raw; about 600 MB/s usable after SATA's 8b/10b line encoding)

• Throughput: 40 MB/s to 500 MB/s as tested, depending on operation type

• Headroom: 30%, for example. This is a design parameter.

In this case, if you can achieve 200 MB/s of throughput on the drive for your operational patterns, 30% headroom means you should scale out before your metrics show 140 MB/s (70% of 200 MB/s).
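The headroom arithmetic is simple enough to encode directly in a capacity check. A minimal Python sketch, assuming headroom is expressed as a fraction of measured throughput; the function name `scale_out_threshold` is hypothetical:

```python
def scale_out_threshold(measured_mb_s: float, headroom: float) -> float:
    """Return the throughput (MB/s) at which you should start scaling out.

    measured_mb_s: throughput your drive achieves for your operational
        patterns, measured with a benchmark tool.
    headroom: reserve capacity as a fraction, e.g. 0.30 for 30%.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in the range [0, 1)")
    # Only (1 - headroom) of the measured capacity is planned for normal use.
    return measured_mb_s * (1.0 - headroom)


if __name__ == "__main__":
    # The example from the text: 200 MB/s achievable, 30% headroom.
    print(f"Scale out before {scale_out_threshold(200.0, 0.30):.1f} MB/s")
    # prints "Scale out before 140.0 MB/s"
```

The measured figure should come from your own workload patterns, not the drive's rated bandwidth.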

Term: Latency and Minimum Latency

• Latency - How long it takes to receive a response, once a request is submitted

• Minimum Latency - Latency which is possible on a single node when there is no resource contention


Minimum latency: single node vs. a replica set of 3 nodes at LOCAL_QUORUM

• Single node: however fast that node can service the request, uncontended.

• Replica set of 3, writes: the fastest 2 of 3 nodes in the replica set to respond.

• Replica set of 3, reads: usually the fastest 2 of 3, based on latency trends.
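"Fastest 2 of 3" means a LOCAL_QUORUM request completes at the second-fastest replica's response, so one slow replica does not set the latency. A simplified Python model of that idea, ignoring coordinator and network overhead and assuming all replicas are contacted; the function names and latencies are hypothetical:

```python
from typing import Sequence


def local_quorum_size(replication_factor: int) -> int:
    """Majority of replicas in the local datacenter: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_latency_ms(replica_latencies_ms: Sequence[float], quorum: int) -> float:
    """Latency of a request that completes once `quorum` replicas respond.

    The coordinator only waits for the fastest `quorum` responses, so the
    request latency is the quorum-th smallest replica latency.
    """
    if not 1 <= quorum <= len(replica_latencies_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    return sorted(replica_latencies_ms)[quorum - 1]


if __name__ == "__main__":
    replicas = [2.0, 0.8, 5.0]  # hypothetical per-replica latencies in ms
    q = local_quorum_size(len(replicas))  # 2 of 3
    print(q, quorum_latency_ms(replicas, q))  # prints "2 2.0"
```

For reads, the coordinator chooses which replicas to ask based on observed latency, so in practice this is the idealized case.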

Latency and Throughput Example: Random reads at different block sizes


SATA HDD has an unavoidable seek-time penalty for every operation size. Throughput tops out at about 180 MB/s at 16 MB read sizes, with over 1.5 seconds of latency.

SATA SSD performs well. 550 MB/s is possible, but desirable latencies are found below a 1 MB read size.

The NVMe drive can push more than 2 CDs' worth of data per second (about 1.7 GB/s) at 128 KB read sizes. At 16 MB, latency is only about 0.25 seconds.


Latency and Throughput Example: Compared by Drive Type

This shows the same measurements compared between drive types.

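Latency & Throughput Example: Comparative Numbers

1 block read (512 bytes)

• NVMe: 62006 KB/s, 177 µs latency, 124013 IOPS
• SATA SSD: 38700 KB/s, 306 µs latency, 77400 IOPS
• SATA HDD: 215 KB/s, 119000 µs latency, 430 IOPS

256 block read (128 KB)

• NVMe: 1707520 KB/s, 1160 µs latency, 13339 IOPS
• SATA SSD: 549133 KB/s, 2320 µs latency, 4290 IOPS
• SATA HDD: 41198 KB/s, 157000 µs latency, 321 IOPS

32K block read (16 MB)

• NVMe: 1339596.8 KB/s, 235000 µs latency, 81 IOPS
• SATA SSD: 554920 KB/s, 594000 µs latency, 33 IOPS
• SATA HDD: 179063 KB/s, 1647000 µs latency, 10 IOPS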
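Throughput, IOPS, and request size are linked: throughput = IOPS × request size. A short Python check that recomputes IOPS from the KB/s column of the comparative table, assuming 1 KB = 1024 bytes (so a 512-byte block is 0.5 KB and 16 MB is 16384 KB); the helper name `implied_iops` is hypothetical:

```python
# Rows from the comparative table: (drive, request size in KB, KB/s, IOPS).
MEASUREMENTS = [
    ("NVMe", 0.5, 62006, 124013),
    ("SATA SSD", 0.5, 38700, 77400),
    ("SATA HDD", 0.5, 215, 430),
    ("NVMe", 128, 1707520, 13339),
    ("SATA SSD", 128, 549133, 4290),
    ("SATA HDD", 128, 41198, 321),
    ("NVMe", 16384, 1339596.8, 81),
    ("SATA SSD", 16384, 554920, 33),
    ("SATA HDD", 16384, 179063, 10),
]


def implied_iops(kb_per_s: float, request_kb: float) -> float:
    """Operations per second implied by throughput and request size."""
    return kb_per_s / request_kb


if __name__ == "__main__":
    for drive, size_kb, kb_s, iops in MEASUREMENTS:
        implied = implied_iops(kb_s, size_kb)
        # Reported IOPS are whole numbers, so allow a difference of 1 op/s.
        status = "ok" if abs(implied - iops) <= 1.0 else "MISMATCH"
        print(f"{drive:9} {size_kb:>8} KB  reported {iops:>7}  "
              f"implied {implied:>10.1f}  {status}")
```

Every row agrees to within one operation per second, which shows the trade-off in the table: larger requests give up IOPS in exchange for throughput.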

Term: Concurrency, Parallelism, Contention

• Concurrency - Multiple requests in flight

• Parallelism - Simultaneous processing of requests

• Resource Contention - When work is blocked awaiting access to a shared resource

Concurrency without parallelism causes resource contention, queueing, latency increases, and unhappy users.
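To make the queueing effect concrete, here is a deliberately simplified Python model: all requests arrive at once, each needs the same service time, and each parallel worker handles one request at a time in FIFO order. The function name and the numbers are hypothetical:

```python
def mean_latency_ms(concurrent_requests: int, parallel_workers: int,
                    service_time_ms: float) -> float:
    """Mean completion time when all requests arrive at once.

    Each worker serves one request at a time in FIFO order, so request i
    (0-based) runs in "wave" i // parallel_workers and finishes after
    (wave + 1) service times. Time spent waiting in earlier waves is the
    queueing caused by contention.
    """
    if concurrent_requests < 1 or parallel_workers < 1:
        raise ValueError("need at least one request and one worker")
    total = 0.0
    for i in range(concurrent_requests):
        wave = i // parallel_workers
        total += (wave + 1) * service_time_ms
    return total / concurrent_requests


if __name__ == "__main__":
    # Hypothetical: 8 concurrent requests, 1 ms of service time each.
    for workers in (1, 2, 8):
        print(workers, mean_latency_ms(8, workers, 1.0))
    # prints "1 4.5", "2 2.5", "8 1.0" on separate lines
```

The per-request work never changes; only the lack of parallelism turns 1 ms of service into 4.5 ms of average latency.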


(Storage) Design Strategies: Core Strategies for Going Fast and Staying Fast

Key Design Strategies

1. Design to the Workload

2. Simplify the Storage Path

3. Maintain Headroom

4. Balance Compute and I/O

5. Balance I/O Caching


Strategy #1: Design to the Workload

• Estimate your workloads. Focus on the read patterns.

• Can your users endure the effects of resource contention?

• Can they endure disruptive outliers?

• How do you know?
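One way to answer "How do you know?" is to look at latency percentiles rather than averages: a mean hides outliers, while the tail exposes them. A minimal Python sketch using the nearest-rank percentile definition; the sample latencies are hypothetical:

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies (ms): mostly fast, with one disruptive outlier.
    latencies = [2.0] * 95 + [3.0] * 4 + [300.0]
    mean = sum(latencies) / len(latencies)
    print(f"mean={mean:.2f} p50={percentile(latencies, 50)} "
          f"p99={percentile(latencies, 99)} max={percentile(latencies, 100)}")
    # prints "mean=5.02 p50=2.0 p99=3.0 max=300.0"
```

Here the median and p99 look healthy, yet 1 request in 100 takes 300 ms. Whether users can endure that is a requirement you have to set explicitly.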

Strategy #2: Simplify the Storage Path


• Avoid unnecessary hardware layers. Go directly from your system chipset to the drive when possible.

• Favor JBOD over storage aggregation.

• Only use RAID for:

– Datacenter or Operator Standards with HDDs. (Try to avoid RAID with SSDs if possible.)

– Aggregating smaller disks. (Why not just get larger drives for JBOD?)

Strategy #3: Maintain Headroom

• Build in headroom according to your loading patterns.

• Measure your system with benchmarking tools.

• Saturate the system during non-production testing, and use that as a reference point in production.


Strategy #4: Balance Compute and I/O


• Databases are not just storage APIs.

• You need to keep your CPU and I/O throughput in relative balance.

• Perfection is not required, but extreme imbalances are no fun.

• There will always be a bottleneck.
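A rough way to reason about this balance: if every request needs both CPU and I/O, a node can only sustain the throughput of whichever resource saturates first. A deliberately simplified Python sketch that treats CPU and I/O as independent serial stages (real databases overlap them); the function name and capacities are hypothetical:

```python
from typing import Dict, Tuple


def bottleneck(stage_capacity_ops_s: Dict[str, float]) -> Tuple[str, float]:
    """For stages that every request passes through in series, sustained
    throughput is capped by the stage with the lowest capacity."""
    if not stage_capacity_ops_s:
        raise ValueError("need at least one stage")
    name = min(stage_capacity_ops_s, key=stage_capacity_ops_s.__getitem__)
    return name, stage_capacity_ops_s[name]


if __name__ == "__main__":
    # Hypothetical node capacities in operations per second.
    node = {"cpu": 40_000.0, "io": 25_000.0}
    print(bottleneck(node))  # ('io', 25000.0)
    # Doubling CPU alone does not raise throughput: I/O is still the limit.
    print(bottleneck({**node, "cpu": 80_000.0}))  # ('io', 25000.0)
```

The same reasoning explains "there will always be a bottleneck": improving one resource simply moves the limit to another.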

Strategy #5: Balance I/O Caching


• Understand the potential benefits of caching: best and worst cases.

• “Unused” memory in Linux is available for caching.

• Don’t depend on cache to solve cold read latencies.

• Design around cold-read performance first.

https://www.quora.com/What-is-the-major-difference-between-the-buffer-cache-and-the-page-cache
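The best and worst cases of caching can be bracketed with a weighted average of hit and miss latency. A minimal Python sketch; the cold-read latencies are the 512-byte rows from the comparative table, while the 1 µs cache-hit latency and the function name are hypothetical:

```python
def expected_read_latency_us(hit_ratio: float, cache_hit_us: float,
                             cold_read_us: float) -> float:
    """Average read latency for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_hit_us + (1.0 - hit_ratio) * cold_read_us


if __name__ == "__main__":
    CACHE_HIT_US = 1.0  # hypothetical in-memory hit latency
    # Cold 512-byte random-read latencies from the comparative table.
    for drive, cold_us in (("SATA HDD", 119000.0), ("NVMe", 177.0)):
        for hit_ratio in (1.0, 0.9, 0.0):
            avg = expected_read_latency_us(hit_ratio, CACHE_HIT_US, cold_us)
            print(f"{drive:8} hit={hit_ratio:.0%}  avg={avg:,.1f} us")
```

Even at a 90% hit ratio the HDD average is dominated by misses, and every miss still pays the full cold-read latency. That is why cold-read performance is the baseline to design around.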

Storage Selection: Build for Effect

SANs for distributed databases...

It’s a bad idea.

Be strongly skeptical when anybody tells you otherwise. Perhaps they haven’t tried it yet, or are ignoring the obvious.

You don’t have to suffer the pains of others in order to learn from their experiences. Still, some insist on trying.

HDD vs. SSD

HDD

Pro:
● Cheap?

Con:
● All concurrent operations are contended
● Random access is slow (drive seek)
● Power usage
● Lower latencies come with much higher costs
● Little room for further improvement

SSD

Pro:
● Cheap? (1 TB ~ $300)
● Fast
● Low internal contention
● Runs cooler / lower wattage
● Faster transport technology available

Con:
● Limited initial capacities encouraged RAID shenanigans → No longer an issue for reasonable data densities with Cassandra/DSE.
● MTBF of earlier designs → No longer an issue, as SSDs have made huge strides in reliability and DWPD limits.
● Initial cost → No longer an issue.

Workload Concurrency & Storage Parallelism


Selecting SSD vs. HDD

Favor modern SSDs by default.

Use HDDs only if you must, for:
● High-write applications with low read concurrency
● Archival or logging systems with low read concurrency
● Commit log storage, if you have the option
● Persistent messaging systems
● Non-latency-sensitive batch/analytics workloads

Storage Path


A) Direct SSD
B) Direct HDD
C) NVMe
D) SSDs via HBA
E) HDDs via HBA
F) Combo via HBA

We’ll come back to this slide if we have time.


Data Density

• Keep data density in reasonable bounds.

• Every database must deal with the realities of storage traversal.

• Avoid trying to store too much data on a node.
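One way to see why density matters: any operation that has to traverse all of a node's data (compaction or streaming data to a replacement node, for example) is bounded below by data size divided by throughput. A minimal Python sketch of that lower bound; the per-node densities are hypothetical, and the throughputs roughly echo the SATA SSD and SATA HDD figures above:

```python
def traversal_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Lower bound on the hours needed to read `data_tb` of data once.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the drive sustains
    `throughput_mb_s` for the whole pass with no competing traffic.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return data_tb * 1_000_000 / throughput_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical per-node data densities and approximate drive throughputs.
    for label, mb_s in (("SATA SSD", 500.0), ("SATA HDD", 180.0)):
        for tb in (1.0, 4.0):
            print(f"{label}: {tb} TB -> {traversal_hours(tb, mb_s):.1f} h")
```

Real traversals also compete with live traffic, CPU, and writes, so actual times will be longer than this bound.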


In Conclusion...

• Provision with headroom to avoid unnecessary contention.

• Select hardware to support user and workload requirements.

• Keep the storage path as simple as possible.

• Consider SSDs by default for your data directories.


Coming Soon!

● June 23: Top 5 Reasons Why DSE is Game Changing

● July 7: Proofpoint & DataStax Webinar

● For the latest schedule of webinars, check out our Webinars page: http://www.datastax.com/resources/webinars


Get your SMACK on!

Thank You!

Follow me on Twitter: @Shookinator




Q & A


Additional Resources

Latency Spectrum for small ops


Math relating to Scale & Performance

Little’s Law: relates latency, concurrency and throughput as averages.

Amdahl’s Law: relates overall latency to improvements in a portion of the working resources.

Pigeonhole principle: statistics of the pigeonhole principle come up again and again in distributed computing.

Latency numbers every programmer should know.


https://en.wikipedia.org/wiki/Little's_law

https://en.wikipedia.org/wiki/Amdahl's_law

https://en.wikipedia.org/wiki/Pigeonhole_principle

http://i.imgur.com/k0t1e.png
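Little’s Law and Amdahl’s Law are easy to apply to numbers from this deck. A small Python sketch; the Little’s Law example uses the NVMe 512-byte row from the comparative table and assumes the reported latency is a steady-state mean, while the Amdahl inputs and both function names are hypothetical:

```python
def littles_law_in_flight(throughput_ops_s: float, mean_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return throughput_ops_s * mean_latency_s


def amdahl_speedup(fraction_improved: float, part_speedup: float) -> float:
    """Amdahl's Law: overall speedup when `fraction_improved` of the work
    is made `part_speedup` times faster."""
    if not 0.0 <= fraction_improved <= 1.0 or part_speedup <= 0:
        raise ValueError("invalid inputs")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / part_speedup)


if __name__ == "__main__":
    # NVMe, 512-byte reads from the comparative table: 124013 IOPS at 177 us.
    print(f"{littles_law_in_flight(124013, 177e-6):.1f} requests in flight")
    # Hypothetical: make 90% of the work 10x faster.
    print(f"{amdahl_speedup(0.9, 10.0):.2f}x overall")
```

The first result suggests the benchmark kept roughly 22 reads in flight on average; the second shows that a 10x improvement to 90% of the work yields only about a 5.3x overall speedup.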

Online Resources

C* Microbench scripts: fio scripts to measure a disk subsystem across many C*-style workloads. https://github.com/jshook/perfscripts

Al’s Tuning Guide: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html



Terms: Concurrency, Parallelism, visually

[Diagram panels: concurrency only; concurrency with parallelism]

Addendum: What about RAID?

See IBM Patent 4092732 for a 1978 solution to a 1978 problem: drives were very unreliable, and systems were not resilient to failure. In 1978, parallelism was pronounced “mainframe”. Times have changed.

System topologies of today expose storage parallelism all the way to the drive. Cassandra allows drive failure without cluster failure. Cassandra can make direct use of the parallelism exposed at the storage layer.


http://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US4092732
