Hybrid Storage Pools (Now with the benefit of hindsight!)
Adam Leventhal @ahl
Hybrid Storage Pools Using Disk and Flash with ZFS
(Now with the benefit of hindsight!)
Flash Emerges
• Storage medium invented in 1980
– Very fast reads (~50us)
– Fast writes (~300us)
– High IOPS / low latency
– Limited number of write cycles
• 2004: flash cost as much as DRAM
• 2007: flash cost was right between DRAM and disk
Disk is dead… just like tape
• Many predicted the death of disk or relegation of disk to backup
• Didn't happen
• All-flash solutions still trying to gain mass adoption
ZFS circa 2007
• Sun was developing a ZFS-based storage appliance (Fishworks)
• ZFS: enterprise class storage on commodity hardware
• Problem: enterprise storage was a lot faster
• Looked at traditional solutions
– NV-DRAM to accelerate writes
– Massive DRAM to cache reads
• But it was just the right time for flash…
Hybrid Storage Pool (HSP)
• Use flash as a storage tier
• Between DRAM and disk in cost, capacity, latency, throughput
• Use commodity disks
– 7200 RPM
– Good throughput
– Great $/GB and watts/GB
• Combine disk, flash, DRAM into a hybrid pool
• In ZFS:
– ZFS intent log (ZIL) for write acceleration
– L2ARC to extend the reach of the ZFS cache
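A back-of-the-envelope latency model shows why a flash tier between DRAM and disk pays off. The sketch below is an illustration, not ZFS code: the function name `expected_read_latency_us` is hypothetical, the ~50us flash read figure comes from the "Flash Emerges" slide, and the DRAM latency, the 7200 RPM disk latency, and the hit rates are assumed round numbers.

```python
# Rough model of average read latency in a three-tier pool (DRAM -> flash -> disk).
FLASH_READ_US = 50.0    # from the "Flash Emerges" slide (~50us reads)
DRAM_READ_US = 0.1      # assumption: roughly 100ns per DRAM access
DISK_READ_US = 8000.0   # assumption: roughly 8ms seek + rotation at 7200 RPM


def expected_read_latency_us(dram_hit_rate, flash_hit_rate):
    """Return the average read latency in microseconds.

    dram_hit_rate:  fraction of all reads served from DRAM (the ARC).
    flash_hit_rate: fraction of DRAM misses served from flash (the L2ARC).
    """
    for rate in (dram_hit_rate, flash_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must be between 0 and 1")
    dram_miss = 1.0 - dram_hit_rate
    return (
        dram_hit_rate * DRAM_READ_US
        + dram_miss * flash_hit_rate * FLASH_READ_US
        + dram_miss * (1.0 - flash_hit_rate) * DISK_READ_US
    )


if __name__ == "__main__":
    # Assumed workload: 80% of reads hit DRAM; flash catches 75% of the rest.
    without_flash = expected_read_latency_us(0.80, 0.0)
    with_flash = expected_read_latency_us(0.80, 0.75)
    print(f"no flash tier:   {without_flash:.2f} us average")
    print(f"with flash tier: {with_flash:.2f} us average")
```

Under these assumed numbers the average is dominated by the reads that still reach disk, which is why the flash hit rate matters more than the exact flash latency.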
Hybrid Storage Pool Example
ZFS Caching
• Adaptive Replacement Cache (ARC) as the primary DRAM cache
• L2ARC developed by Brendan Gregg to use external (flash) devices
• Takes into account optimal IO patterns for flash
– Random, small writes = hastened failure
– Sequential, large writes = happy SSDs
– Throttles writes to preserve longevity
• Uses predictive eviction to identify blocks to cache
L2ARC Problems
• Non-persistent
– After a reboot or fatal system failure, the cache is empty
• Slow to warm up
– Will only write to one device at a time -> best case 1TB / hour
– Real world example 2TB in 24 hours
• Conceptually most of the way there
• No real way to tune it to a workload
• Not much real-world testing and tuning done
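The two warm-up figures above are easier to compare as fill rates. The following is a small arithmetic sketch, assuming decimal terabytes (the slide does not say TB or TiB); `fill_rate_mb_per_s` and `warmup_hours` are hypothetical helper names, not part of any ZFS tool.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # decimal megabytes


def fill_rate_mb_per_s(bytes_written, hours):
    """Average cache fill rate in MB/s for a given amount written over `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return bytes_written / (hours * 3600) / MB


def warmup_hours(cache_bytes, rate_mb_per_s):
    """Hours needed to fill a cache of `cache_bytes` at a steady fill rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return cache_bytes / (rate_mb_per_s * MB) / 3600


if __name__ == "__main__":
    best_case = fill_rate_mb_per_s(1 * TB, 1)     # slide: 1TB / hour
    real_world = fill_rate_mb_per_s(2 * TB, 24)   # slide: 2TB in 24 hours
    print(f"best case:  {best_case:.1f} MB/s")
    print(f"real world: {real_world:.1f} MB/s")
    print(f"slowdown:   {best_case / real_world:.0f}x")
```

In these terms the best case is about 278 MB/s and the real-world example about 23 MB/s, a factor of 12 apart.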
Changing Landscape
• DRAM prices have dropped dramatically
• Large memory systems available (3TB+)
• NAND flash is getting trickier to build around
• Endurance and performance decrease as lithography and price decrease
– MLC and “TLC” (volume flash) have particularly short lives
• Running into size limitations
– 32nm in 2008
– 19nm today
– Supposed floor around 11nm
• SSDs are becoming increasingly complex
What to do today?
• The L2ARC can help
– The SSD space is large and highly varied
– Generally cheap, laptop SSDs suffice for the L2ARC
– Give it enough time to warm up (hours or days)
– Measure the impact on your actual workload
• The ARC is great and relatively simple
– Load up on DRAM
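Measuring the impact usually starts with cache hit and miss counters. The sketch below assumes counters exposed as whitespace-separated `name type value` lines, the layout used by the `arcstats` kstat on OpenZFS for Linux; other platforms expose the same kind of counters differently. The sample text and its values are made up for illustration, and `parse_counters` and `hit_ratio` are hypothetical helper names.

```python
def parse_counters(text):
    """Parse 'name type value' lines into a dict of integer counters.

    Lines that do not have exactly three fields, or whose last field is not
    a non-negative integer (such as a header row), are skipped.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            counters[fields[0]] = int(fields[2])
    return counters


def hit_ratio(hits, misses):
    """Fraction of lookups that were hits; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Made-up sample values, for illustration only.
SAMPLE = """\
name type data
hits 4 900
misses 4 100
l2_hits 4 60
l2_misses 4 40
"""

if __name__ == "__main__":
    stats = parse_counters(SAMPLE)
    print(f"ARC hit ratio:   {hit_ratio(stats['hits'], stats['misses']):.2f}")
    print(f"L2ARC hit ratio: {hit_ratio(stats['l2_hits'], stats['l2_misses']):.2f}")
```

Sampling the counters before and after a representative run, and comparing the difference rather than the totals since boot, gives a ratio that reflects the workload being tested.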
Next for ZFS
• For the L2ARC to be viable, it needs to be persistent
• Lots of performance work needed
– Run it through a bunch of real-world use cases
– Make it easy to collect coherent, relevant data
– Create the right knobs for users to turn
• There are a few companies using the L2ARC
• Hopefully they will take up the mantle
Questions?