Scalable Object Storage with Apache CloudStack and Apache...
![Page 1: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/1.jpg)
Scalable Object Storage with Apache CloudStack and Apache
Hadoop
February 26 2013
Chiradeep Vittal @chiradeep
![Page 2: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/2.jpg)
Agenda
• What is CloudStack
• Object Storage for IaaS
• Current Architecture and Limitations
• Requirements for Object Storage
• Object Storage integrations in CloudStack
• HDFS for Object Storage
• Future directions
![Page 3: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/3.jpg)
• History
• Incubating in the Apache Software Foundation since April 2012
• Open Source since May 2010
• In production since 2009
– Turnkey platform for delivering IaaS clouds
– Full featured GUI, end-user API and admin API
Apache CloudStack
Build your cloud the way the world’s most successful
clouds are built!
![Page 4: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/4.jpg)
How did Amazon build its cloud?
Commodity Servers
Commodity Storage Networking
Open Source Xen Hypervisor
Amazon Orchestration Software
AWS API (EC2, S3, …)
Amazon eCommerce Platform
![Page 5: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/5.jpg)
How can YOU build a cloud?
Servers Storage Networking
Open Source Xen Hypervisor
Amazon Orchestration Software
AWS API (EC2, S3, …)
Amazon eCommerce Platform
Hypervisor (Xen/KVM/VMW/)
CloudStack Orchestration Software
Optional Portal
CloudStack or AWS API
![Page 6: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/6.jpg)
Zone Architecture
[Diagram labels: End users; DC Edge; L3/L2 core; Pod (x4); Access Sw; Hypervisor (Xen/VMWare/KVM) with VMs; Primary Storage (NFS/iSCSI/FC) with Disks; Secondary Storage with Images and Snapshots; CloudStack with MySQL and Admin/User API]
![Page 7: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/7.jpg)
Cloud-Style Workloads
• Low cost
  – Standardized, cookie cutter infrastructure
  – Highly automated and efficient
• Application owns availability
  – At scale everything breaks
  – Focus on MTTR instead of MTBF
![Page 8: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/8.jpg)
Scale
“At scale, everything breaks” - Urs Hölzle, Google
Annual Failure Rate of servers: 8%
Source: Kashi Venkatesh Vishwanath and Nachiappan Nagappan, Characterizing Cloud Computing Hardware Reliability, SoCC’10
Server failure comes from:
ᵒ 70% - hard disk
ᵒ 6% - RAID controller
ᵒ 5% - memory
ᵒ 18% - other factors
Application can still fail for other reasons:
ᵒ Network failure
ᵒ Software bugs
ᵒ Human admin error
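To make the 8% annual failure rate concrete, the sketch below computes the expected number of server failures per year and per day for a fleet. The fleet sizes are illustrative assumptions, not figures from the talk, and failures are assumed to be independent and spread evenly over the year.

```python
# Annual failure rate (AFR) of servers quoted on the slide (SoCC'10 paper).
ANNUAL_FAILURE_RATE = 0.08


def expected_failures_per_year(server_count: int, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """Expected number of server failures in one year, assuming independent failures."""
    return server_count * afr


def expected_failures_per_day(server_count: int, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """The yearly expectation spread evenly over 365 days."""
    return expected_failures_per_year(server_count, afr) / 365


if __name__ == "__main__":
    # Hypothetical fleet sizes, chosen only for illustration.
    for fleet in (100, 10_000, 100_000):
        per_year = expected_failures_per_year(fleet)
        per_day = expected_failures_per_day(fleet)
        print(f"{fleet} servers: {per_year:.0f} failures/year, {per_day:.2f} failures/day")
```

Under these assumptions a fleet of a few thousand servers already sees a hardware failure most days, which is why the focus shifts from MTBF to MTTR.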
![Page 9: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/9.jpg)
At scale…everything breaks
[Diagram labels: DC Edge; L3/L2 core; Pod (x4); Access Sw; Hypervisor (Xen/VMWare/KVM) with VMs; Primary Storage (NFS/iSCSI/FC) with Disks; Secondary Storage with Images and Snapshots]
![Page 10: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/10.jpg)
Region “West”
Zone “West-Alpha”
Zone “West-Beta”
Zone “West-Gamma”
Zone “West-Delta”
Low Latency Backbone (e.g., SONET ring)
Regions and zones
![Page 11: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/11.jpg)
Region “East”
Region “South”
Internet
Geographic separation
Region “West”
Low Latency
![Page 12: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/12.jpg)
Secondary Storage in CloudStack 4.0
• NFS server default
  – Can be mounted by hypervisor
  – Easy to obtain, set up and operate
• Problems with NFS:
  – Scale: max limits of file systems
    • Solution: CloudStack can manage multiple NFS stores (+ complexity)
  – Performance
    • N hypervisors : 1 storage CPU / 1 network link
  – Wide area suitability for cross-region storage
    • Chatty protocol
  – Lack of replication
![Page 13: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/13.jpg)
Object Storage Technology
Region “West”
Zone “West-Alpha”
Zone “West-Beta”
Zone “West-Gamma”
Zone “West-Delta”
Object Storage in a region
• Replication • Audit • Repair • Maintenance
![Page 14: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/14.jpg)
Region “West”
Object Storage enables reliability
![Page 15: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/15.jpg)
Object Storage Technology
Region “West”
Object Storage also enables other applications
Object Store API Servers
• DropBox • Static Content • Archival
![Page 16: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/16.jpg)
Object Storage characteristics
• Highly reliable and durable
  – 99.9% availability for AWS S3
  – 99.999999999% durability
• Massive scale
  – 1.3 trillion objects stored across 7 AWS regions [Nov 2012 figures]
  – Throughput: 830,000 requests per second
• Immutable objects
  – Objects cannot be modified, only deleted
• Simple API
  – PUT/POST objects, GET objects, DELETE objects
  – No seek / no mutation / no POSIX API
• Flat namespace
  – Everything stored in buckets
  – Bucket names are unique
  – Buckets can only contain objects, not other buckets
• Cheap and getting cheaper
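The API and namespace properties above can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not an S3 client: the class and method names are hypothetical, and treating a PUT to an existing key as a whole-object replacement is an assumption about how "no mutation" plays out in practice.

```python
from typing import Dict, List


class ObjectStore:
    """Toy in-memory model of bucket/object semantics (hypothetical, not an S3 client)."""

    def __init__(self) -> None:
        # bucket name -> {object key -> object bytes}
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        # Bucket names are unique across the whole store.
        if bucket in self._buckets:
            raise ValueError(f"bucket already exists: {bucket}")
        self._buckets[bucket] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # Objects are written whole: no seek, no partial update.
        # Assumption: a PUT to an existing key replaces the entire object.
        self._buckets[bucket][key] = bytes(data)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def delete_object(self, bucket: str, key: str) -> None:
        del self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        # The namespace is flat: "a/b/c" is one key string, not nested directories.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


if __name__ == "__main__":
    store = ObjectStore()
    store.create_bucket("templates")
    store.put_object("templates", "centos/6.3/root.vhd", b"image-bytes")
    print(store.list_keys("templates", prefix="centos/"))
```

Slashes in a key only look like directories; the prefix filter in `list_keys` is what gives clients a folder-like view over a flat bucket.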
![Page 17: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/17.jpg)
CloudStack S3 API Server
Object Storage Technology
S3 API Servers
MySQL
![Page 18: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/18.jpg)
CloudStack S3 API Server
• Understands AWS S3 REST-style and SOAP API
• Pluggable backend
  – Backend storage needs to map simple calls to their API
    • E.g., createContainer, saveObject, loadObject
  – Default backend is a POSIX filesystem
  – Backend with Caringo Object Store (commercial vendor) available
  – HDFS backend also available
• MySQL storage
  – Bucket -> object mapping
  – ACLs, bucket policies
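A minimal sketch of what such a pluggable backend could look like, assuming a small interface with the three calls named above. The slide gives only the call names, so the signatures, the Python snake_case spelling, the class names, and the choice to hash object keys into file names are all assumptions made for illustration, not the actual CloudStack code.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical backend interface: the API server maps S3 calls onto these."""

    @abstractmethod
    def create_container(self, container: str) -> None: ...

    @abstractmethod
    def save_object(self, container: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load_object(self, container: str, key: str) -> bytes: ...


class PosixBackend(StorageBackend):
    """Stores each container as a directory and each object as one file."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, container: str, key: str) -> Path:
        # Hash the key so that keys containing "/" or ".." stay inside the container directory.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / container / digest

    def create_container(self, container: str) -> None:
        (self.root / container).mkdir(parents=True, exist_ok=True)

    def save_object(self, container: str, key: str, data: bytes) -> None:
        self._path(container, key).write_bytes(data)

    def load_object(self, container: str, key: str) -> bytes:
        return self._path(container, key).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backend: StorageBackend = PosixBackend(Path(tmp))
        backend.create_container("snapshots")
        backend.save_object("snapshots", "vm-42/2013-02-26", b"snapshot-bytes")
        print(backend.load_object("snapshots", "vm-42/2013-02-26"))
```

An HDFS or vendor backend would implement the same three methods, while bucket-to-object mappings and ACLs stay in the database rather than in the backend.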
![Page 19: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/19.jpg)
Object Store Integration into CloudStack
• For images and snapshots
• Replacement for NFS secondary storage, or augmentation for NFS secondary storage
• Integrations available with
  – Riak CS
  – OpenStack Swift
• New in 4.2 (upcoming):
  – Framework for integrating storage providers
![Page 20: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/20.jpg)
What do we want to build?
• Open source, ASL licensed object storage
• Scales to at least 1 billion objects
• Reliability and durability on par with S3
• S3 API (or similar, e.g., Google Storage)
• Tooling around maintenance and operation, specific to object storage
![Page 21: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/21.jpg)
The following slides are a design discussion
![Page 22: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/22.jpg)
Architecture of Scalable Object Storage
API Servers
Auth Servers
Object Servers Replicators/Auditors
Object Lookup Servers
![Page 23: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/23.jpg)
Why HDFS
• ASF Project (Apache Hadoop)
• Immutable objects, replication
• Reliability, scale and performance
  – 200 million objects in 1 cluster [Facebook]
  – 100 PB in 1 cluster [Facebook]
• Simple operation
  – Just add data nodes
![Page 24: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/24.jpg)
HDFS-based Object Storage
S3 API Servers
S3 Auth Servers
Data nodes
Namenode pair
HDFS API
![Page 25: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/25.jpg)
BUT
• Name Node Scalability
  – 150 bytes RAM / block
  – GC issues
• Name Node SPOF
  – Being addressed in the community ✔
• Cross-zone replication
  – Rack-awareness placement ✔
  – What if the zones are spread a little further apart?
• Storage for object metadata
  – ACLs, policies, timers
![Page 26: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/26.jpg)
Name Node scalability
• 1 billion objects = 3 billion blocks (chunks)
  – Average of 5 MB/object = 5 PB (actual), 15 PB (raw)
  – 450 GB of RAM per Name Node
    • 150 bytes x 3 x 10^9
  – 16 TB / node => 1000 Data nodes
• Requires Name Node federation?
• Or an approach like HAR files
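The sizing above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 PB = 10^15 bytes), a replication factor of 3, and that the slide's 3 billion blocks come from counting one block per object per replica; the variable names are illustrative.

```python
# Inputs taken from the slide.
OBJECTS = 10**9                          # 1 billion objects
REPLICATION = 3                          # assumed replication factor behind "3 billion blocks"
AVG_OBJECT_BYTES = 5 * 10**6             # 5 MB average object size
RAM_BYTES_PER_BLOCK = 150                # Name Node RAM per block
DATA_NODE_CAPACITY_BYTES = 16 * 10**12   # 16 TB of disk per data node

blocks = OBJECTS * REPLICATION                       # 3 billion blocks
actual_bytes = OBJECTS * AVG_OBJECT_BYTES            # 5 PB of user data
raw_bytes = actual_bytes * REPLICATION               # 15 PB on disk
namenode_ram_bytes = blocks * RAM_BYTES_PER_BLOCK    # 450 GB of Name Node RAM

# Ceiling division: number of data nodes needed to hold the raw bytes.
data_nodes = -(-raw_bytes // DATA_NODE_CAPACITY_BYTES)

if __name__ == "__main__":
    print(f"blocks:        {blocks:,}")
    print(f"actual data:   {actual_bytes / 10**15:.0f} PB")
    print(f"raw data:      {raw_bytes / 10**15:.0f} PB")
    print(f"Name Node RAM: {namenode_ram_bytes / 10**9:.0f} GB")
    print(f"data nodes:    {data_nodes}")
```

The computed data node count is 938, which the slide rounds up to roughly 1000. The 450 GB of heap on a single Name Node is the figure that motivates federation or a HAR-like packing approach.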
![Page 27: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/27.jpg)
Name Node Federation
Extension: Federated NameNodes are HA pairs
![Page 28: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/28.jpg)
Federation issues
• HA for name nodes
• Namespace shards
  – Map object -> name node
    • Requires another scalable key-value store
      – HBase?
• Rebalancing between name nodes
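The object -> name node mapping can be sketched as a small directory service. Here a Python dict stands in for the scalable key-value store (HBase is only the slide's suggestion), and the least-loaded placement policy, class name, and method names are assumptions made for illustration.

```python
from typing import Dict, List


class NamespaceDirectory:
    """Hypothetical lookup table: object key -> name node that owns it."""

    def __init__(self, name_nodes: List[str]) -> None:
        if not name_nodes:
            raise ValueError("at least one name node is required")
        self._name_nodes = list(name_nodes)
        self._location: Dict[str, str] = {}   # stand-in for the key-value store
        self._load: Dict[str, int] = {nn: 0 for nn in self._name_nodes}

    def assign(self, object_key: str) -> str:
        """Place a new object on the least-loaded name node (assumed policy)."""
        if object_key in self._location:
            return self._location[object_key]
        target = min(self._name_nodes, key=lambda nn: self._load[nn])
        self._location[object_key] = target
        self._load[target] += 1
        return target

    def lookup(self, object_key: str) -> str:
        """Every read needs this lookup before it can contact a name node."""
        return self._location[object_key]

    def move(self, object_key: str, new_name_node: str) -> None:
        """Rebalancing step: update the mapping after data has been migrated."""
        if new_name_node not in self._load:
            raise ValueError(f"unknown name node: {new_name_node}")
        old = self._location[object_key]
        self._load[old] -= 1
        self._location[object_key] = new_name_node
        self._load[new_name_node] += 1


if __name__ == "__main__":
    directory = NamespaceDirectory(["nn1", "nn2"])
    print(directory.assign("bucket-a/object-1"))
    print(directory.assign("bucket-a/object-2"))
```

The sketch shows why the mapping store itself has to scale: it holds one entry per object and sits on the path of every request.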
![Page 29: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/29.jpg)
Replication over lossy/slower links
A. Asynchronous replication
  – Use distcp to replicate between clusters
  – 6 copies vs. 3
  – Master/Slave relationship
    • Possibility of loss of data during failover
    • Need coordination logic outside of HDFS
B. Synchronous replication
  – API server writes to 2 clusters and acks only when both writes are successful
  – Availability compromised when one zone is down
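The trade-off between the two options can be sketched with toy clusters. Everything here is hypothetical: the `Cluster` class, the function names, and the in-memory queue standing in for a periodic distcp job. Cleanup of a partial write after a failed synchronous PUT is deliberately left out.

```python
from typing import Dict, List, Tuple


class ClusterDown(Exception):
    """Raised when a write is attempted against an unavailable cluster."""


class Cluster:
    """Toy stand-in for one HDFS cluster in one zone."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        if not self.up:
            raise ClusterDown(self.name)
        self.objects[key] = data


def synchronous_put(first: Cluster, second: Cluster, key: str, data: bytes) -> bool:
    """Option B: ack only when both clusters accepted the write."""
    try:
        first.write(key, data)
        second.write(key, data)
    except ClusterDown:
        # One zone is down, so the PUT is refused: availability is compromised.
        return False
    return True


def asynchronous_put(master: Cluster, pending: List[Tuple[str, bytes]],
                     key: str, data: bytes) -> bool:
    """Option A: ack after the master write; the copy to the slave happens later."""
    master.write(key, data)
    pending.append((key, data))
    return True


def drain(pending: List[Tuple[str, bytes]], slave: Cluster) -> None:
    """Background copy step; anything still pending at failover time is lost."""
    while pending:
        key, data = pending[0]
        slave.write(key, data)
        pending.pop(0)


if __name__ == "__main__":
    alpha, beta = Cluster("west-alpha"), Cluster("west-beta")
    beta.up = False
    print(synchronous_put(alpha, beta, "k1", b"v"))   # refused while beta is down
    queue: List[Tuple[str, bytes]] = []
    print(asynchronous_put(alpha, queue, "k2", b"v"))  # accepted, but beta lacks k2
```

With the asynchronous path, a failover to the slave before `drain` runs serves a cluster that is missing the queued objects, which is the data-loss window named on the slide.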
![Page 30: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/30.jpg)
CAP Theorem Consistency or Availability during partition
Many nuances
![Page 31: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/31.jpg)
Storage for object metadata
A. Store it in HDFS along with the object
  – Reads are expensive (e.g., to check ACL)
  – Mutable data, needs layer over HDFS
B. Use another storage system (e.g. HBase)
  – Name node federation also requires this
C. Modify Name Node to store metadata
  – High performance
  – Not extensible
![Page 32: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/32.jpg)
Object store on HDFS Future
• Viable for small-sized deployments
  – Up to 100-200 million objects
  – Datacenters close together
• Larger deployments need development
  – No effort ongoing at this time
![Page 33: Scalable Object Storage with Apache CloudStack and Apache ...archive.apachecon.com/na2013/presentations/26-Tuesday/Cloud_Crowd... · Agenda • What is CloudStack • Object Storage](https://reader030.fdocuments.us/reader030/viewer/2022040311/5d67856288c993d4378b99e7/html5/thumbnails/33.jpg)
Conclusion
• CloudStack needs object storage for “cloud-style” workloads
• Object Storage is not easy
• HDFS comes close but not close enough
• Join the community!