Design Decisions and Repercussions of Compression and Data ...

22
Design Decisions and Repercussions of Compression and Data Reduction in a Storage Array Chris Golden Pure Storage

Transcript of Design Decisions and Repercussions of Compression and Data ...

Page 1: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Design Decisions and Repercussions of Compression and Data Reduction in

a Storage Array

Chris Golden Pure Storage

Page 2: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Data Reduction

Huge performance impact on disk Very little performance impact on flash Table stakes feature for all-flash storage arrays Goals Increase usable capacity Extend media lifetime

Many different ways to implement These design decisions have ripple effects

2

Page 3: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Dedup + Compression – Better Together

3

Page 4: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Dedup vs Compression

Compression – A local process reducing the number of bits required to represent data

Deduplication – A global process ensuring that identical data is only written once

4

Page 5: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Block Size + Alignment

All arrays have 3 block sizes and alignments Client, Array Device

Size mismatches and/or unaligned operations create read / write amplification in the underlying layer

Client > Array > Device

5

Page 6: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Data Protection

Our choice of data protection matters Mirroring, RAID5, RAID6, etc.

Should be designed with compression and deduplication in mind

6

Page 7: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Compression

Reduces the number of bits required to represent the original data

Final representation is variable sized We can apply more CPU for better results

7

Page 8: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Compression – How to Choose

Variables to optimize Compression Speed Decompression Speed Size of representation

Variables that can be controlled Choice of algorithm, level

Variables that change with workload Compressibility of the data

8

Page 9: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Compression – How to Store

Full Precision Nearest Size Bins Large Bins with Many Blocks

9

Page 10: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Full Precision

Find free space on the underlying devices exactly matching the compressed size

Global fragmentation problem Overwrites have 3 outcomes, 2 are bad New data is larger, must find a new place Data is smaller, extra space may be wasted

10

Page 11: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Details – Nearest Sized Bin

11

Page 12: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Details – Large Bins

12

Page 13: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Deduplication

Identify duplicate bits and store them a single time

Cryptographic Hash vs Weak Hash and Verify Fixed Size vs Variable Sized

13

Page 14: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Details – Cryptographic vs Weak Hash

Cryptographic hash More CPU required More metadata required

Weak Hash Collisions must be resolved via device read

14

Page 15: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Details – Fixed vs Variable Size

Fixed size Fewer hashes to calculate Alignment of user writes matters

Variable size More metadata required Choice of compression becomes important

15

Page 16: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Variable Length Dedupe + Compression

Creates read amplification and CPU overhead Must read and decompress entire block

Overwrites also get more complicated

TODO: Add diagram

16

Page 17: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Compression + Dedupe

Order of operations matters Compress first Might require less CPU Works well with fixed sized compression and

dedupe Dedupe first Can find duplicates in the middle of

compressed blocks Works well for variable dedup 17

Page 18: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Space Reporting

Compression and dedup make space reporting complicated

Hard problems Identifying shared space and reporting Predicting future capacity

How much space will I reclaim if I remove this data?

How much additional data can I store?

18

Page 19: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Performance

We’re skimming some performance off the top to enable greater space utilization

Compression eats CPU cycles Dedup eats CPU cycles and may create

hotspots Read / Write amplification

19

Page 20: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

PLACEHOLDER - Analysis of a Sample Data Set

20

Page 21: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Wrap Up

Different ways to implement data reduction Each implementation has implications on the

rest of your system design Performance Economics Ease of implementation

21

Page 22: Design Decisions and Repercussions of Compression and Data ...

2015 Storage Developer Conference. © Pure Storage. All Rights Reserved.

Thank You!

Questions?

22