Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves...

22
1 | ©2020 Storage Networking Industry Association. All Rights Reserved. Compression: Putting the Squeeze on Storage Live Webcast September 2, 2020 11:00 am PT

Transcript of Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves...

Page 1: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

1 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Compression: Putting the Squeeze on Storage

Live WebcastSeptember 2, 202011:00 am PT

Page 2: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

2 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Today’s Presenters

John KimChair, SNIA Networking Storage Forum

NVIDIA

Brian WillIntel® QuickAssist Technology

Software ArchitectIntel

Ilker CebeliModeratorSamsung

Page 3: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

3 | ©2020 Storage Networking Industry Association. All Rights Reserved.

SNIA-At-A-Glance

3

Page 4: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

4 | ©2020 Storage Networking Industry Association. All Rights Reserved.

NSF Technologies

4

Page 5: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

5 | ©2020 Storage Networking Industry Association. All Rights Reserved.

SNIA Legal Notice§ The material contained in this presentation is copyrighted by the SNIA unless otherwise

noted. § Member companies and individual members may use this material in presentations and

literature under the following conditions:§ Any slide or slides used must be reproduced in their entirety without modification§ The SNIA must be acknowledged as the source of any material used in the body of any document containing

material from these presentations.§ This presentation is a project of the SNIA.§ Neither the author nor the presenter is an attorney and nothing in this presentation is

intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.

§ The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

Page 6: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

6 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Agenda

§Why, Where and What to Compress§Data Compression Techniques

Page 7: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

7 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Click to edit Master title style

Why, Where and What to CompressJohn Kim

Page 8: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

8 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Yes, Compress!§Save space§Reduce network bandwidth §Doesn’t lose data (usually)

No, Work Less!§More time to write/read files§Consumes processing power§ Long-term accessibility risk

Why Compress Data?

Page 9: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

9 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Where to Compress the Data

§ In application – e.g. Photos, videos§At client – e.g. Zip files§On network – e.g. WAN optimization appliance§ In storage – Storage controller, SmartNIC or computational storage§During backup – Backup software or hardware

§Can compress in more than one place§ Especially if data is decompressed whenever it’s received/read

Page 10: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

10 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Where Data Can be Compressed

§Client, server, storage, and/or backup compress data§Can accelerate with software library, FPGA, SmartNIC or DPU

Client Utility or

OS

Storage controller

Storage device

Storage network

Compress/Decompress whenever data

enters/exits network

Backup server,

device, or service

Client App

Server OS

Server App

WAN optimization or SD-WAN devices

Page 11: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

11 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Yes, Compress!§Contains empty space§ Local repetitive sequences

§ At bit/byte level§ Within each file/object§ Locally-repeated blocks

No, Don’t Stress§Already compressed§Non-local repetition

§ At file/object level§ Identical files/objects§ Dispersed block repetition

What Types of Data Should be Compressed

Page 12: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

12 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Easy to compress File Hard to compress files

Good and Bad Compression Candidates

Many repeated characters, large blocks of empty space. Many repeated characters.

Many repeated characters.ABCD ABCD ABCD ABCD

00000000000000000000000000000000005678 5678 5678 5678 5678 5678 5678

My dog has fleas My dog has fleasMy dog has fleas

My dog has fleas

My cat has NVMe SSDs

My cat has NVMe SSDs

My dog has fleas

The skunk smells nice

My dog has fleas

The skunk smells nice

My cat has NVMe SSDs

My dog has fleas

Page 13: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

13 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Click to edit Master title style

Data Compression TechniquesBrian Will

Page 14: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

14 | ©2020 Storage Networking Industry Association. All Rights Reserved.

What is Data Compression

§Definition: “The process of encoding data to reduce its size.”*§What is lossless vs lossy?

§ Lossless: “compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly”*

§ Lossy: “compression using a technique in which a portion of the original information is lost.”*

*SNIA: https://www.snia.org/education/online-dictionary/term/data-compression

Page 15: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

15 | ©2020 Storage Networking Industry Association. All Rights Reserved.

LZ Lossless Compression Introduction Dictionary Coding (example LZ77)

Min Match: 3Window: > 6T I C T A C T O E T I C T A C T A C

T I C T A C T O E 9 6 3 3

Char Freq Code (bits)

T 6 00

C 5 01

A 3 100

I 2 101

E 1 110

O 1 111

Distance-Length Pair

T I C T A C T O E T I C T A C T A C

00 101 01 00 100 01 00 111 110 00 101 01 00 100 01 00 100 01

Entropy Encoding (examples Huffman and Arithmetic coding etc.)

Page 16: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

16 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Different Compression Algorithms

Zstandard (2015)

Brotli (2015)

Snappy (2011)

LZ4 (2013)

Xpress (2014)

Facebook

DEFLATE (1996)

Speed

Balanced with High compression options

Google

Open

Microsoft

Google

Open

• To the left are some of the predominant lossless compression algorithms utilized today.

• They are integrated into many workloads/software stacks:• Linux Kernel Crypto Framework(LKCF)• ZFS, BTRFS, SquashFS• MongoDB, RocksDB, Hadoop, MySQL• WinZip, gzip• HTTP encoding (NGINX/HAProxy/Apache

Traffic Server etc.)

lzf lzo lzma Blosclz zopfli crush lzg

lzham lizard rle bzip2 lzw lzss ans

lzr lzj Lzc DEFLATE64 PAQ lzmw lzrw

Techniques employed: • LZ77/LZ78• Huffman Encoding• Finite State Entropy Encoding

*Other names and brands may be claimed as the property of others.

This is only a small subset of the entire landscape..

• Burrows-Wheeler transform• Pre-built dictionaries • etc.

Page 17: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

17 | ©2020 Storage Networking Industry Association. All Rights Reserved.

How Compression Affects Performance

zlib-1

zlib-9

lz4fast-3

lz4hc-12zstd-1

zstd-18 zstd-22

brotli-0

lzma-0 lzma-9

0%

10%

20%

30%

40%

50%

60%

0 500 1000 1500 2000 2500

Ratio

(Low

er is

Bet

ter)

Cycles/Byte

Compression performance vs ratio*

zlib-1

zlib-9

lz4fast-3

lz4hc-12zstd-1

zstd-18

zstd-22

brotli-0

brotli-11

lzma-0

lzma-9

0%

10%

20%

30%

40%

50%

60%

0 10 20 30 40 50 60 70

Ratio

(Low

er is

Bet

ter)

Cycles/Byte

Decompression performance vs ratio*

* Data for these graphs taken directly from lzbench: https://github.com/inikep/lzbench

• Compression performance varies significantly based on the ratio level targeted. • 1-2% at higher ratios can have a significant cycle cost

• Decompression performance is reasonably clustered based on format rather then compression ratio• Much lower cost then compression.

Page 18: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

18 | ©2020 Storage Networking Industry Association. All Rights Reserved.

When to Compress: real-time vs post processing

§ The choice between real-time and post processing comes down to one of performance (compression ratio & throughput) vs cost§ Real Time:

§ File Systems reads (decompression)§ HTTP encoding/decoding

§ Post-Processing/Offline:§ Log files§ Cold Storage Tiers

Page 19: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

19 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Summary

§Compression saves space, bandwidth, and improves total cost of ownership

§Some types of data can be compressed more than others§Data can be compressed while transmitted, stored, and backed up§ There are several propriety and open source compression algorithms

§ Compression performance varies with compression ratio§Choosing the right compression algorithm for workload is critical for

performance and cost

Page 20: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

20 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Related Webcasts in this Series

§Everything You Wanted to Know About Storage But Were Too Proud to Ask: Data Reduction§ BrightTALK on-demand: https://www.brighttalk.com/webcast/663/428420§ SNIAVideo YouTube channel: https://youtu.be/N_y_1frsYtQ

§Not Again! Data Deduplication for Storage Systems – October 2020§ Follow us for date and time @SNIANSF

Page 21: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

21 | ©2020 Storage Networking Industry Association. All Rights Reserved.

After this Webcast

§Please rate this webcast and provide us with your feedback§ This webcast and a copy of the slides will be available at the SNIA

Educational Library https://www.snia.org/educational-library§A Q&A from this webcast, including answers to questions we couldn’t

get to today, will be posted on our blog at https://sniansfblog.org/§ Follow us on Twitter @SNIANSF

Page 22: Compression: Putting the Squeeze on Storage...§Compression saves space, bandwidth, and improves total cost of ownership §Some types of data can be compressed more than others §Data

22 | ©2020 Storage Networking Industry Association. All Rights Reserved.

Thank You