How Shit Works: Storage
![Page 1: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/1.jpg)
How Shit Works:
Storage
Tomer Gabel, Wix
@ GeeCON Kraków 2016
![Page 2: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/2.jpg)
Like all good stories…
• We’ll start with a question.
• “What’s wrong with this picture?”
![Page 4: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/4.jpg)
MY, OH, MY. WHAT COULD IT BE?
![Page 5: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/5.jpg)
Axioms
• Not a trick question
– Servers are properly configured
– System architecture makes sense
– No obvious bugs
– No scheduled jobs
• So what else goes bump in the night?
![Page 6: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/6.jpg)
PROLOGUE
“A LAUGHABLE CLAIM”
![Page 7: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/7.jpg)
I/O is simple
• Just open a file, write, flush, close
• Nothing to it, right?
Application → File → HDD
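The "open, write, flush, close" sequence above can be sketched in a few lines of Python. This is only an illustration: the helper name `write_durably` and the temp-file path are made up for the example, and the `os.fsync` call is an addition that the slide does not mention. It is there because `flush` alone only hands the data to the kernel.

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Open, write, flush and close a file (hypothetical helper)."""
    with open(path, "wb") as f:   # open
        f.write(data)             # write into the userspace buffer
        f.flush()                 # flush the userspace buffer to the kernel
        os.fsync(f.fileno())      # assumption: also ask the kernel to push data to the device
    # leaving the with-block closes the file

path = os.path.join(tempfile.gettempdir(), "io_is_simple.bin")
write_durably(path, b"hello, disk")
```

Even this tiny example touches two buffering layers before anything reaches the drive, which is where the rest of the talk is headed.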
![Page 8: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/8.jpg)
I/O is simple
• A little closer…
Application → File
Kernel:
– Virtual File System
– File system (ext4)
– Logical Volume Manager
– I/O scheduler
– SCSI driver stack
HDD
![Page 9: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/9.jpg)
I/O is simple
• But really…
Application → File
Kernel: Storage Subsystem, System Bus Drivers
Hardware: PCI Express Bus, SATA Controller
HDD
![Page 10: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/10.jpg)
THE ONION OF ABSTRACTION
![Page 11: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/11.jpg)
ACT I: THESE BOOTS ARE MADE FOR WALKIN’
![Page 12: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/12.jpg)
Everybody knows...
• Sequential access is fast
• Random access is slow
• … so what?
![Page 13: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/13.jpg)
Everybody knows…
“Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.”
-- MySQL Reference Manual (8.12.3)
![Page 15: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/15.jpg)
But why?
![Page 16: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/16.jpg)
Rotational Latency
![Page 20: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/20.jpg)
Throughput
• So you understand latency…
• What about throughput?
• Depends on two factors:
– Areal density
– Newtonian physics
![Page 21: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/21.jpg)
Areal Density
![Page 22: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/22.jpg)
Interlude: Math
• Rotation is fixed
– Constant angular velocity (CAV)
• Newton tells us that…
v = ω ∙ r
• Throughput increases with radius!
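A quick sketch of v = ω ∙ r in Python. The two track radii are hypothetical round numbers chosen for illustration, not measurements of any particular drive. The point is only that, at a fixed RPM, the platter surface passes under the head faster at the outer edge.

```python
import math

RPM = 7200
omega = RPM / 60 * 2 * math.pi      # angular velocity in radians/second

def linear_velocity(radius_m: float) -> float:
    """v = omega * r, in metres/second."""
    return omega * radius_m

inner = linear_velocity(0.020)      # hypothetical innermost track, 20 mm
outer = linear_velocity(0.045)      # hypothetical outermost track, 45 mm
ratio = outer / inner               # depends only on the ratio of the radii

print(f"inner {inner:.1f} m/s, outer {outer:.1f} m/s, ratio {ratio:.2f}x")
```

With similar linear bit density on every track, more bits pass under the head per second on the outer tracks, so sequential throughput is higher there.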
![Page 23: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/23.jpg)
Interlude: Math
• Commodity drives are available at:
– 5400-15000 RPM
– Usually 7200 RPM
• What does it mean for latency?
7200 / 60 = 120 revolutions/second
1 / 120 = 0.00833 s ≈ 8.33 ms per revolution!
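The same arithmetic as a small Python sketch (the function names are made up for the example). It also shows the average rotational latency, which is half a revolution, since on average the sector you want is half a turn away.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm

def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2

for rpm in (5400, 7200, 15000):
    print(rpm, round(rotation_time_ms(rpm), 2), round(average_rotational_latency_ms(rpm), 2))
```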
![Page 24: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/24.jpg)
In practice?
• Modern drives give you:
200+ MB/s
300 IOPS
• Pure random access nets only 1.2 MB/s (300 IOPS × 4 KB per operation)!
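Where the 1.2 MB/s comes from, as a sketch. The 4 KB request size is an assumption (it is the size that makes the slide's numbers line up), and the function name is invented for the example.

```python
def random_throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput when every operation pays a seek: IOPS times request size."""
    return iops * io_size_kb / 1000   # decimal units: 1 MB = 1000 KB

random_mb_s = random_throughput_mb_s(iops=300, io_size_kb=4)   # assumed 4 KB requests
sequential_mb_s = 200                                          # figure from the slide
slowdown = sequential_mb_s / random_mb_s

print(f"random: {random_mb_s:.1f} MB/s, about {slowdown:.0f}x slower than sequential")
```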
![Page 25: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/25.jpg)
RIGHT. WHAT CAN WE DO ABOUT IT?
![Page 26: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/26.jpg)
Fine-tuning
• Provision more RAM
• Careful index structure
– Represent IPs as UNSIGNED INT for 75% reduction
– Implement better UUIDs¹ for 30% reduction
¹ Store UUID in an optimized way, Percona blog
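To illustrate the IP address point: a dotted-quad string takes up to 15 characters, while the same address fits in a 4-byte unsigned integer. The sketch below uses Python's standard `ipaddress` module; the helper names are made up, and on the database side MySQL's `INET_ATON()` / `INET_NTOA()` perform the equivalent conversion.

```python
import ipaddress
import struct

def ip_to_uint32(ip: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))

def uint32_to_ip(n: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(n))

text = "192.168.100.200"                          # 15 characters as text
packed = struct.pack("!I", ip_to_uint32(text))    # 4 bytes as an unsigned int
saving = 1 - len(packed) / len(text)

print(f"{len(text)} bytes -> {len(packed)} bytes, {saving:.0%} smaller")
```

A smaller key means more index entries per page, so more of the index stays in RAM and fewer lookups end in a disk seek.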
![Page 27: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/27.jpg)
… or use a sledgehammer!
• RAID 0 (and variants) employ striping
• Data is distributed to multiple spindles
• If it sounds familiar…
– It is!
– We call it “sharding”
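Striping, sketched as a mapping function. This is a simplified model of RAID 0 addressing, not any specific controller's layout; `locate` and its parameters are names invented for the example.

```python
def locate(logical_block: int, num_disks: int, stripe_blocks: int = 1) -> tuple[int, int]:
    """Map a logical block number to (disk index, block number on that disk)."""
    stripe_index, offset_in_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks              # stripes rotate across the disks
    stripe_on_disk = stripe_index // num_disks   # how many stripes precede it on that disk
    return disk, stripe_on_disk * stripe_blocks + offset_in_stripe

# Eight consecutive logical blocks spread over four spindles:
layout = [locate(block, num_disks=4) for block in range(8)]
print(layout)
```

Consecutive blocks land on different spindles, so a large read keeps all of them busy at once. Replace "disk" with "database node" and "block number" with "key" and you have sharding.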
![Page 28: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/28.jpg)
It’s turtles all the way down
• Don’t jump to conclusions!
– RAID 0 is impractical
– RAID 5 may be slow
– RAID 10 is expensive
– etc.
• Do your homework
• Benchmark!
![Page 29: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/29.jpg)
ACT II: I’LL USE MY CREDIT CARD
![Page 30: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/30.jpg)
Let’s talk SSDs
• Non-volatile RAM
• Lots of IOPS
• Expensive :-)
• Same caveats apply…
![Page 31: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/31.jpg)
Let’s talk SSDs
• Value starts at “1”
• Electrons accrue in the floating gate
• After programming, value becomes “0”
• Electrons are drained to reset value to “1”
![Page 32: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/32.jpg)
Surprise and Terror
• “Draining” is destructive!
• Limited erases
• Limited lifespan!
![Page 33: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/33.jpg)
Wear Leveling
![Page 37: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/37.jpg)
Caveats, remember?
• Addressing
– Cells (1 bit) – not addressable
– Pages (0.5-8KB)
– Blocks (32-64 pages)
• Why do you care?
– Reads/writes on a page
– But erasure on a block
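A toy model of the page/block asymmetry, assuming a deliberately tiny geometry (4 pages of 8 bits) so it is easy to follow. `FlashBlock` is a made-up class for illustration, not a real driver API. Programming can only flip bits from 1 to 0; getting a 1 back requires erasing the whole block.

```python
class FlashBlock:
    """Toy NAND block: program per page, erase per block (illustrative only)."""

    def __init__(self, pages_per_block: int = 4, bits_per_page: int = 8):
        self.full = (1 << bits_per_page) - 1          # erased state: all bits are 1
        self.pages = [self.full] * pages_per_block
        self.erase_count = 0

    def program(self, page: int, value: int) -> None:
        if not 0 <= value <= self.full:
            raise ValueError("value does not fit in a page")
        if value & ~self.pages[page]:
            # the new value wants a 1 where the page already holds a 0
            raise ValueError("cannot turn 0 back into 1 without erasing the block")
        self.pages[page] = value

    def erase(self) -> None:
        self.pages = [self.full] * len(self.pages)    # every page in the block is reset
        self.erase_count += 1

block = FlashBlock()
block.program(0, 0b10101010)        # fine: only clears bits
block.program(0, 0b10100000)        # fine: clears two more bits
try:
    block.program(0, 0b11111111)    # would need to set bits again
    needs_erase = False
except ValueError:
    needs_erase = True
block.erase()                       # the only way back to all ones
```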
![Page 38: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/38.jpg)
Write Amplification
[Diagram: changing a single bit (Δ = 1 bit) forces a rewrite of the whole block (Δ = 1 block!)]
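A back-of-the-envelope sketch of the worst case, assuming 4 KB pages and 64 pages per block (both within the ranges quoted earlier) and a naive drive that rewrites every block it touches in place. Real controllers remap pages to soften this, so treat the number as an upper bound. The function name is invented for the example.

```python
def worst_case_write_amplification(host_bytes: int,
                                   page_bytes: int = 4096,
                                   pages_per_block: int = 64) -> float:
    """Bytes physically written to flash divided by bytes the host asked to write.

    Assumes a block-aligned write and a naive erase-and-rewrite of each touched block.
    """
    block_bytes = page_bytes * pages_per_block
    blocks_touched = -(-host_bytes // block_bytes)     # ceiling division
    return blocks_touched * block_bytes / host_bytes

print(worst_case_write_amplification(1))          # a 1-byte change
print(worst_case_write_amplification(262_144))    # exactly one full block
```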
![Page 39: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/39.jpg)
Surprising Results
• Defragmentation
– Relocates blocks
– Contiguous files
– Lower LBAs
– Background job
• Bad, bad, bad!
– No benefit with SSDs
– Major write load!
![Page 40: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/40.jpg)
Background GC
[Diagram: before, valid pages 1, 2, 5, 6 and 7 are scattered across Blocks A, B, C and D; after, they are packed together as 1 2 5 and 6 7, leaving the remaining blocks free]
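A rough sketch of the compaction idea behind background GC. The starting placement of pages in blocks and the 3-pages-per-block geometry are hypothetical, and a real controller copies valid pages into a spare block before erasing the sources rather than packing in place. The `collect` function is invented for the example.

```python
def collect(blocks: dict[str, list[int]], pages_per_block: int) -> dict[str, list[int]]:
    """Pack all valid pages into as few blocks as possible; the rest can be erased."""
    valid = sorted(page for pages in blocks.values() for page in pages)
    names = sorted(blocks)
    if len(valid) > len(names) * pages_per_block:
        raise ValueError("more valid pages than the blocks can hold")
    compacted: dict[str, list[int]] = {name: [] for name in names}
    for i, page in enumerate(valid):
        compacted[names[i // pages_per_block]].append(page)
    return compacted

before = {"A": [7, 5], "B": [6], "C": [1], "D": [2]}   # hypothetical placement
after = collect(before, pages_per_block=3)
free_blocks = [name for name, pages in after.items() if not pages]

print(after, free_blocks)
```

Doing this in the background means a later write finds pre-erased blocks waiting instead of paying for an erase on the spot.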
![Page 41: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/41.jpg)
Surprising Results
• What happens when you delete a file?
– Not much
– Bit flip on file table
– Space is not reclaimed
• Result?
– SATA TRIM command
[Diagram: pages 1, 2, 5, 6 and 7 spread across Blocks A, B, C and D]
![Page 42: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/42.jpg)
SSD Takeaways
• A moving target
– File systems
– Data structures
– Longevity
• As usual:
– Benchmark
– Monitor
![Page 43: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/43.jpg)
EPILOGUE
“LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”
![Page 44: How Shit Works: Storage](https://reader031.fdocuments.us/reader031/viewer/2022022203/587259e51a28ab31498b4bc9/html5/thumbnails/44.jpg)
WE’RE DONE HERE! … AND YES, WE’RE HIRING :-)
Thank you for listening
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog:
http://engineering.wix.com