Managing 100TB of small files…
![Page 1: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/1.jpg)
Managing 100TB of small files…
Prospero Media Storage
IGT – July 2011 Event
![Page 2: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/2.jpg)
Numbers
• 70TB used space
• 700 million files
• 200GB and 250,000 files uploaded every day
• 1200Mbps bandwidth throughput in peak
• 180TB of data is being served out monthly
• 3700 hits per second in peak
• 40 storage node servers – 300TB raw space
• $0.13 per GB
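A few derived figures help put these numbers in context. The sketch below is simple arithmetic on the slide's values, assuming decimal units (1TB = 10^12 bytes) and a 30-day month; the variable names are illustrative only.

```python
# Back-of-the-envelope figures derived from the slide.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a 30-day month.
TB = 10**12
GB = 10**9

used_bytes = 70 * TB
file_count = 700 * 10**6
daily_upload_bytes = 200 * GB
daily_upload_files = 250_000
monthly_served_bytes = 180 * TB

# Average size of a stored file and of a newly uploaded file.
avg_stored_file_bytes = used_bytes / file_count
avg_uploaded_file_bytes = daily_upload_bytes / daily_upload_files

# Average outbound bandwidth implied by the monthly volume, in Mbps.
seconds_per_month = 30 * 24 * 3600
avg_served_mbps = monthly_served_bytes * 8 / seconds_per_month / 10**6

print(f"average stored file:   {avg_stored_file_bytes / 1000:.0f} KB")
print(f"average uploaded file: {avg_uploaded_file_bytes / 1000:.0f} KB")
print(f"average served rate:   {avg_served_mbps:.0f} Mbps (peak is 1200 Mbps)")
```

Under these assumptions the average stored file is roughly 100KB, which is what makes this a "small files" problem: the load is dominated by file count and connection count rather than by raw volume.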
![Page 3: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/3.jpg)
Motivation
• Web 2.0 content serving paradigm shift
– Too many files
   • 12M users x 1 file = very long tail
– Too many connections
   • 1M users + keepalive = 1M connections
– Living with modern content in web 2.0
   • 1 file x (thumbnail + iPhone + Mac) = 3 file copies
![Page 4: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/4.jpg)
Traditional Architecture
Centralized Storage (NAS, SAN, DAS etc.)
HTTP
IO IO IO IO
![Page 5: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/5.jpg)
Traditional Architecture
Centralized Storage (NAS, SAN, DAS etc.)
HTTP – TOO MANY CONNECTIONS
IO IO IO IO
![Page 6: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/6.jpg)
Traditional Architecture
Centralized Storage (NAS, SAN, DAS etc.)
HTTP
IO IO IO IO IO IO IO
![Page 7: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/7.jpg)
Traditional Architecture
Too much IO
HTTP
IO IO IO IO IO IO IO
![Page 8: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/8.jpg)
Traditional Architecture
Centralized Storage (NAS, SAN, DAS etc.)
HTTP
IO IO IO IO IO IO IO
Cache
![Page 9: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/9.jpg)
“There are only two hard things in Computer Science: cache invalidation and naming things.”
-- Tim Bray quoting Phil Karlton
![Page 10: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/10.jpg)
Architecture goals
• Symmetric identical server nodes
– Simplified management and scaling
– Linear scaling out
• No functional / role servers
– No single point of failure
– No performance bottlenecks
• Multiple datacenters support
– DRP support
– Geo load distribution
![Page 11: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/11.jpg)
Meet Prospero
• Distributed Web content storage system
• Full-blown HTTP support
• Runs on low-cost commodity hardware
• Adjustable file-level replication controls the redundancy policy for every content type
• Provides dynamic image manipulation
![Page 12: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/12.jpg)
How do we do it?
![Page 13: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/13.jpg)
Designed to fail
• Fallback for every operation
– Geographical, machine, storage medium
• Write never fails
– All files will reach their destination
• Journaling
– Tracking all uploaded files
• Pending jobs
– Guaranteed file distribution
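One way to picture how journaling and pending jobs combine into "write never fails" is the sketch below. It is not Prospero's implementation: the `Journal` class, the JSON-lines record format, and the `accept_upload` / `run_pending` helpers are all hypothetical names chosen for illustration. The idea is that an upload is acknowledged once its distribution targets are recorded in an append-only journal, and a background pass keeps retrying until every target is marked done.

```python
import json
import os


class Journal:
    """Append-only log of uploads and their distribution state (hypothetical format)."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Flush and fsync so a recorded job survives a process crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self):
        """Replay the log and return (file, target) pairs not yet marked done."""
        todo = set()
        if not os.path.exists(self.path):
            return todo
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                key = (record["file"], record["target"])
                if record["state"] == "pending":
                    todo.add(key)
                else:
                    todo.discard(key)
        return todo


def accept_upload(journal, name, targets):
    """Record one pending distribution job per target before acknowledging."""
    for target in targets:
        journal.append({"file": name, "target": target, "state": "pending"})


def run_pending(journal, send):
    """Try every pending job once; failed jobs stay pending for the next pass."""
    for name, target in sorted(journal.pending()):
        try:
            send(name, target)
        except OSError:
            continue  # destination unreachable: keep the job pending
        journal.append({"file": name, "target": target, "state": "done"})
```

Calling `run_pending` periodically with a `send` callable that copies the file to the target gives at-least-once delivery: a job leaves the pending set only after a successful send has been journaled.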
![Page 14: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/14.jpg)
How do we achieve this
• Control the input
– define a single, unified API
• Functional process isolation
– every function deserves its own process by default
– watchdogs
– monitors
– alerts
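Process isolation with a watchdog can be sketched as follows. This is a minimal illustration rather than the system's actual watchdog: `supervise`, `max_restarts`, and `backoff` are hypothetical names, and a production watchdog would also raise an alert instead of only raising an exception.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff=0.1):
    """Run cmd as its own process and restart it when it exits with an error.

    Returns the number of restarts that were needed before a clean exit.
    Raises RuntimeError once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return restarts
        if restarts == max_restarts:
            raise RuntimeError(f"{cmd!r} still failing after {restarts} restarts")
        restarts += 1
        time.sleep(backoff)  # brief pause so a crash loop does not spin the CPU


if __name__ == "__main__":
    # Example: a child process that exits cleanly needs no restarts.
    print(supervise([sys.executable, "-c", "pass"]))
```

Because each function runs in its own process, a crash in one of them (for example the image processor) is contained and restarted without taking the HTTP handler down with it.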
![Page 15: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/15.jpg)
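[Diagram: storage nodes 0.static through 7.static, each reached over HTTP. Range labels shown: 0.static 00-1f, 2.static 20-3f, 4.static 40-5f, 6.static 60-7f. Nodes 1.static, 3.static, 5.static and 7.static appear without range labels.]
get 37D815B5.jpg
• Go to 37 range servers
• Fallback if not found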
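The lookup in this slide can be sketched as prefix-range routing with fallback. The ranges below are the ones labeled in the diagram; pairing each odd-numbered node with the preceding even-numbered node, the fallback order, and the `candidates` / `fetch` helper names are assumptions made for illustration, not details confirmed by the slides.

```python
# Ranges as labeled in the diagram. Placing the odd-numbered node in the
# same range as the preceding even-numbered node is an assumption.
RANGES = [
    (0x00, 0x1F, ["0.static", "1.static"]),
    (0x20, 0x3F, ["2.static", "3.static"]),
    (0x40, 0x5F, ["4.static", "5.static"]),
    (0x60, 0x7F, ["6.static", "7.static"]),
]


def candidates(filename):
    """Return hosts to try: the servers owning the prefix range, then the rest."""
    prefix = int(filename[:2], 16)  # "37D815B5.jpg" -> 0x37
    for low, high, nodes in RANGES:
        if low <= prefix <= high:
            others = [n for _, _, ns in RANGES if ns is not nodes for n in ns]
            return nodes + others
    raise ValueError(f"no range covers prefix {filename[:2]!r}")


def fetch(filename, get):
    """Try each candidate host in order; `get` returns the body or None."""
    for host in candidates(filename):
        body = get(host, filename)
        if body is not None:
            return host, body
    return None  # not found anywhere


if __name__ == "__main__":
    # In-memory stand-in for the HTTP GET; the file lives on an unexpected node.
    store = {"5.static": {"37D815B5.jpg": b"img"}}
    print(fetch("37D815B5.jpg", lambda host, name: store.get(host, {}).get(name)))
```

Since the range is derived from the file name itself, any node can compute where a file should live without consulting a central index, which fits the goal of having no role servers.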
![Page 16: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/16.jpg)
Fallback Example
![Page 17: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/17.jpg)
Node Architecture
Output – Front
• HTTP handler: lighttpd fork
• Van Gogh: dynamic image processor, lighttpd fork
• Cross Datacenter Distributer: custom
• Local Datacenter Distributer: custom
Input – Supervisor
• Supervisor HTTP handler: Tornado web framework application
![Page 18: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/18.jpg)
Real Life
![Page 19: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/19.jpg)
It’s all about performance
• Non-blocking IO, readiness notification (epoll)
• Asynchronous file IO (AIO)
• Zero copy (sendfile)
• Memory maps
• Inter-process binary protocols
• UNIX socket
• Minimize dynamic memory allocation
• lighttpd memory footprint: 50MB
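As an illustration of the zero-copy item, the sketch below sends a file over a connected socket pair using Python's `socket.sendfile`, which uses the `sendfile` system call where the platform supports it and falls back to ordinary sends otherwise. The real system is built on a lighttpd fork, so this only demonstrates the technique; the helper name `send_file_zero_copy` is hypothetical.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected stream socket; return bytes sent.

    socket.sendfile() lets the kernel move data from the file to the socket
    without copying it through a user-space buffer when os.sendfile is usable.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


if __name__ == "__main__":
    payload = b"x" * 4096
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)

    # socketpair() gives two connected sockets for a local demonstration.
    sender, receiver = socket.socketpair()
    sent = send_file_zero_copy(sender, tmp.name)
    sender.close()

    received = b""
    while chunk := receiver.recv(65536):
        received += chunk
    receiver.close()
    os.unlink(tmp.name)
    print(sent, received == payload)
```

For a workload of hundreds of millions of small static files, avoiding the extra user-space copy on every response reduces CPU and memory pressure per request.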
![Page 20: Managing 100TB of small files…](https://reader035.fdocuments.us/reader035/viewer/2022081512/56815e3a550346895dcca0d8/html5/thumbnails/20.jpg)
Lessons learnt
• Be symmetric
• Control the input
• Design for failure
• Performance matters again
• Simple is hard but a must