How to Run from a Zombie: CloudStack Distributed Process Management
![Page 1: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/1.jpg)
HOW TO RUN FROM A ZOMBIE: CLOUDSTACK DISTRIBUTED PROCESS MANAGEMENT
John Burwell
([email protected] | [email protected] | @john_burwell)
Tuesday, June 25, 13
![Page 2: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/2.jpg)
I Am Not A Zombie
• Apache CloudStack PMC Member
• Consulting Engineer @ Basho Technologies
• Ran operations and designed automated provisioning for hybrid analytic/virtualization clouds
• Led architectural design and server-side development of a SaaS physical security platform
![Page 3: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/3.jpg)
Current Process Management
• No consistent system-wide model
• Fail slowly, fail quietly
• Resource overcommitment issues
• Lack of instrumentation
![Page 4: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/4.jpg)
What is a cloud?
![Page 5: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/5.jpg)
![Page 6: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/6.jpg)
Hopefully not ...
![Page 7: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/7.jpg)
![Page 8: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/8.jpg)
![Page 9: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/9.jpg)
![Page 10: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/10.jpg)
Hosts
Virtual Routers
Virtual Machines
Primary Storage
Networks
Secondary Storage
Load Balancers
Zone
Cluster
Pod
![Page 11: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/11.jpg)
Resource
Process
State
A “thing” with a bounded capacity
Partition
Orchestration
![Page 12: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/12.jpg)
At its core, CloudStack ...
Integrates infrastructure components
Manages resources
![Page 13: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/13.jpg)
![Page 14: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/14.jpg)
Consistency
Availability
Partition Tolerance
PICK 2
![Page 15: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/15.jpg)
CloudStack provides zones, clusters, and pods to partition resources.
![Page 16: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/16.jpg)
Orchestration operations are eventually consistent
![Page 17: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/17.jpg)
![Page 18: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/18.jpg)
... but resource operations must be consistent & serialized.
![Page 19: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/19.jpg)
![Page 20: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/20.jpg)
A system cannot be simultaneously consistent and available.
![Page 21: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/21.jpg)
AP: Orchestration Processes
CP: Resource Management Processes
![Page 22: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/22.jpg)
CP Resource?
• Ordered/Serialized operations
• Prevent overcommitment
• Execution location independent
• Lock free
![Page 23: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/23.jpg)
Orchestration Coordination
1. Build a list of commands to be executed against a resource
2. Enqueue the list of commands to the resource management layer for execution
3. A process applies the commands to the resource
4. Aggregate the results from the reply
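The four steps above can be sketched in plain Java. This is only an illustration under stated assumptions: `OrchestrationSketch`, `enqueue`, and `aggregate` are hypothetical names, not CloudStack APIs, and a single-threaded executor per resource stands in for the resource management layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types exist in CloudStack.
final class OrchestrationSketch {
    // One single-threaded executor per resource serializes all work for it.
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // Step 2: enqueue the command list built by the caller (step 1).
    // Step 3: the resource's own thread applies the commands in order.
    Future<List<String>> enqueue(String resourceId, List<Supplier<String>> commands) {
        ExecutorService queue = queues.computeIfAbsent(
                resourceId, id -> Executors.newSingleThreadExecutor());
        return queue.submit(() -> {
            List<String> results = new ArrayList<>();
            for (Supplier<String> command : commands) {
                results.add(command.get());
            }
            return results;
        });
    }

    // Step 4: wait for the reply and aggregate the per-command results.
    static String aggregate(Future<List<String>> reply) {
        try {
            return String.join(",", reply.get());
        } catch (Exception e) {
            throw new IllegalStateException("unit of work failed", e);
        }
    }

    // Stops accepting work; already queued work still runs.
    void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }
}
```

A caller would build a list of suppliers for one resource, pass it to `enqueue`, and hand the returned future to `aggregate`. Because each resource has exactly one worker thread, two command lists for the same resource can never interleave.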
![Page 24: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/24.jpg)
Diagram labels: Resource, Process, State, Queue, Unit of Work, Exclusive Consumer (connections marked 1 to 1)
![Page 25: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/25.jpg)
Unit Of Work (UoW)
• Definition: An ordered list of commands executed against one and only one resource.
• Created in the Orchestration layer
• Executed by processes in the resource management layer
• Failure of a command halts UoW execution
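A minimal sketch of the halting rule, assuming a command signals failure by throwing. `UnitOfWork`, `Command`, and `Result` are hypothetical names chosen for illustration; the slides do not specify a concrete API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a Unit of Work; these names are not CloudStack APIs.
final class UnitOfWork {
    // A named step applied to the resource; throwing signals failure.
    static final class Command {
        final String name;
        final Runnable body;

        Command(String name, Runnable body) {
            this.name = name;
            this.body = body;
        }
    }

    // Outcome of a run: the commands that completed and the one that failed.
    static final class Result {
        final List<String> completed;
        final String failedCommand; // null when every command succeeded

        Result(List<String> completed, String failedCommand) {
            this.completed = completed;
            this.failedCommand = failedCommand;
        }

        boolean succeeded() {
            return failedCommand == null;
        }
    }

    private final String resourceId; // exactly one resource per UoW
    private final List<Command> commands;

    UnitOfWork(String resourceId, Command... commands) {
        this.resourceId = resourceId;
        this.commands = Arrays.asList(commands);
    }

    String resourceId() {
        return resourceId;
    }

    // Runs the commands in order and halts at the first failure.
    Result run() {
        List<String> completed = new ArrayList<>();
        for (Command command : commands) {
            try {
                command.body.run();
                completed.add(command.name);
            } catch (RuntimeException e) {
                return new Result(completed, command.name);
            }
        }
        return new Result(completed, null);
    }
}
```

Returning the list of completed commands alongside the failed one gives the orchestration layer enough information to decide whether to retry or compensate.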
![Page 26: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/26.jpg)
Instrumentation
• Collect and report statistics on a per resource basis
• Inspect and remove pending UoWs for a resource
• Kill a running process
• View a history of UoWs completed by a resource
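One way to picture these hooks is a per-resource queue that keeps its own counters and history. The sketch below is an assumption-laden illustration: `InstrumentedResourceQueue` and its methods are hypothetical, UoWs are reduced to string identifiers, and killing a running process is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; UoWs are represented by their identifiers only.
final class InstrumentedResourceQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> history = new ArrayList<>();
    private long completed;
    private long failed;

    synchronized void enqueue(String uowId) {
        pending.addLast(uowId);
    }

    // Inspect pending UoWs without exposing the live queue.
    synchronized List<String> pending() {
        return new ArrayList<>(pending);
    }

    // Remove a pending UoW before it runs; false if it was not queued.
    synchronized boolean remove(String uowId) {
        return pending.remove(uowId);
    }

    // Records the outcome of the UoW at the head of the queue.
    synchronized void completeNext(boolean success) {
        String uowId = pending.pollFirst();
        if (uowId == null) {
            return;
        }
        if (success) {
            completed++;
        } else {
            failed++;
        }
        history.add(uowId + (success ? ":ok" : ":failed"));
    }

    // History of UoWs finished by this resource, oldest first.
    synchronized List<String> history() {
        return new ArrayList<>(history);
    }

    // Per-resource statistics in a simple text form.
    synchronized String stats() {
        return "completed=" + completed + " failed=" + failed
                + " pending=" + pending.size();
    }
}
```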
![Page 27: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/27.jpg)
When Gravity Fails
• Process execution fails
• Resources become unavailable
• Slow consumers
![Page 28: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/28.jpg)
Fail Fast; Fail Loudly
• If the resource can be returned to a consistent state, reply with the process failure
• If the resource cannot be returned to a consistent state, transition the resource to a failure state, drain the queue of pending UoWs, and reply with the process failure for each UoW
• The orchestration layer will determine the appropriate recovery strategy (e.g. retry request on another resource)
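The two failure paths above can be sketched as a small state machine. `FailFastResource`, its `State` enum, and the reply strings are hypothetical, and the sketch assumes the caller already knows whether the resource could be returned to a consistent state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; not thread-safe, one owner thread is assumed.
final class FailFastResource {
    enum State { AVAILABLE, FAILED }

    private State state = State.AVAILABLE;
    private final Queue<String> pending = new ArrayDeque<>();

    State state() {
        return state;
    }

    // Fails fast: a failed resource refuses new work immediately.
    boolean enqueue(String uowId) {
        if (state == State.FAILED) {
            return false;
        }
        pending.add(uowId);
        return true;
    }

    // Called when the running UoW failed. `recoverable` says whether the
    // resource was returned to a consistent state. Returns the replies
    // to send back to the orchestration layer.
    List<String> onFailure(String failedUowId, boolean recoverable) {
        List<String> replies = new ArrayList<>();
        replies.add(failedUowId + ": failed");
        if (!recoverable) {
            state = State.FAILED;
            // Drain the queue and fail every pending UoW loudly.
            String uowId;
            while ((uowId = pending.poll()) != null) {
                replies.add(uowId + ": failed (resource unavailable)");
            }
        }
        return replies;
    }
}
```

The sketch only reports failures; choosing a recovery strategy, such as retrying on another resource, stays with the orchestration layer.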
![Page 29: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/29.jpg)
Preventing A Logjam
• Bounded Queues
• Request and Message Timeouts
• A failure to enqueue a request or a request timeout triggers the resource’s circuit breaker
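A minimal sketch of a bounded queue guarded by a circuit breaker, using `java.util.concurrent.ArrayBlockingQueue`. `GuardedResourceQueue` and its methods are hypothetical, and only the enqueue timeout is modeled; a request timeout on the reply side would trip the same flag.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; UoWs are represented by their identifiers only.
final class GuardedResourceQueue {
    private final BlockingQueue<String> pending;
    private final long enqueueTimeoutMillis;
    private volatile boolean circuitOpen;

    GuardedResourceQueue(int capacity, long enqueueTimeoutMillis) {
        this.pending = new ArrayBlockingQueue<>(capacity); // bounded queue
        this.enqueueTimeoutMillis = enqueueTimeoutMillis;
    }

    // Returns false, and opens the circuit, when the request cannot be
    // enqueued within the timeout. An open circuit rejects immediately.
    boolean submit(String uowId) {
        if (circuitOpen) {
            return false;
        }
        try {
            if (pending.offer(uowId, enqueueTimeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        circuitOpen = true;
        return false;
    }

    // Consumer side: next pending UoW, or null when the queue is empty.
    String next() {
        return pending.poll();
    }

    boolean isCircuitOpen() {
        return circuitOpen;
    }

    // Closes the circuit again, for example after an operator check.
    void reset() {
        circuitOpen = false;
    }
}
```

Once the circuit is open, callers are rejected without waiting, so a slow consumer cannot back up the orchestration layer.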
![Page 30: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/30.jpg)
How could we implement this model?
![Page 31: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/31.jpg)
Lightweight Threads
A thread that is not scheduled by the operating system, avoiding context switch overhead.
![Page 32: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/32.jpg)
Actor Model
• An actor represents state and behavior
• Communicate by message passing
• Each actor is allocated a lightweight thread and mailbox
• Location independent
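To make the mailbox idea concrete, here is a hand-rolled actor in plain Java. It is a sketch under stated assumptions: `ResourceActor`, `tell`, and `stopAndJoin` are hypothetical names, and it uses an ordinary OS thread per actor, whereas a framework would multiplex many actors over lightweight threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of one actor; not how Akka or Quasar implement it.
final class ResourceActor implements Runnable {
    private static final String STOP = "__stop__"; // sentinel message

    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    // State touched only by the actor's own thread, so no locks are needed.
    private final List<String> applied = new ArrayList<>();
    private final Thread thread = new Thread(this);

    ResourceActor start() {
        thread.start();
        return this;
    }

    // Senders communicate only by placing messages in the mailbox.
    void tell(String message) {
        mailbox.add(message);
    }

    // Processes the remaining messages, stops, and returns the final state.
    List<String> stopAndJoin() {
        mailbox.add(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return applied;
    }

    // Handles one message at a time, in arrival order.
    @Override
    public void run() {
        try {
            String message = mailbox.take();
            while (!message.equals(STOP)) {
                applied.add(message);
                message = mailbox.take();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the mailbox is the only way in, messages for one resource are applied strictly in order without explicit locking.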
![Page 33: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/33.jpg)
Mailbox
ResourceActor
FSM
Orchestration
Unit of Work
![Page 34: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/34.jpg)
Java Actor Frameworks
• Akka (http://akka.io)
• Quasar (https://github.com/puniverse/quasar)
![Page 35: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/35.jpg)
Summary
• Orchestration and Resource Management must be properly divided to satisfy CAP
• To provide resource serialization guarantees, assign a queue and a process to each resource
• Fail fast, fail loudly
• An Actor Model based on lightweight threads may provide the scalability required to dedicate a queue and process per resource
![Page 36: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/36.jpg)
Thoughts? Questions?
![Page 37: How to Run from a Zombie: CloudStack Distributed Process Management](https://reader031.fdocuments.us/reader031/viewer/2022020218/5594edc01a28ab965d8b4719/html5/thumbnails/37.jpg)
Thank you!
Slides available @ http://speakerdeck.com/jburwell