Towards Elastic Operating Systems

Amit Gupta, Ehab Ababneh, Richard Han, Eric Keller
University of Colorado, Boulder

Transcript of Towards Elastic Operating Systems

Page 1: Towards Elastic Operating Systems

Towards Elastic Operating Systems

Amit Gupta, Ehab Ababneh, Richard Han, Eric Keller

University of Colorado, Boulder

Page 2: Towards Elastic Operating Systems

OS + Cloud Today

[Diagram labels: OS/Process; ELB/Cloud Mgr]

Resources Limited
• Thrashing
• CPUs limited
• I/O bottlenecks
  • Network
  • Storage

Present Workarounds
• Additional Scripting/Code changes
• Extra Modules/Frameworks
  • Coordination
  • Sync/Aggregating State

Page 3: Towards Elastic Operating Systems

Stretch Process

[Diagram labels: OS/Process]

Advantages
• Expands available memory
• Extends the scope of multithreaded parallelism (more CPUs available)
• Mitigates I/O bottlenecks
  • Network
  • Storage

Page 4: Towards Elastic Operating Systems

ElasticOS: Our Vision

Page 5: Towards Elastic Operating Systems

ElasticOS: Our Goals

• “Elasticity” as an OS service
• Elasticize all resources: memory, CPU, network, …
• Single machine abstraction
  • Apps unaware whether they’re running on 1 machine or 1000 machines
• Simpler parallelism
• Compatible with an existing OS (e.g., Linux, …)

Page 6: Towards Elastic Operating Systems

“Stretched” Process: Unified Address Space

[Diagram labels: OS/Process; Elastic Page Table with columns V, R, Location]

Page 7: Towards Elastic Operating Systems

Movable Execution Context

[Diagram labels: OS/Process]

• OS handles elasticity: apps don’t change
• Partition locality across multiple nodes
• Useful for single (and multiple) threads
  • For multiple threads, seamlessly exploit network I/O and CPU parallelism

Page 8: Towards Elastic Operating Systems

Replicate Code, Partition Data

[Diagram labels: CODE (three copies); Data 1; Data 2]

• Unique copy of data (unlike DSM)
• Execution context follows data (unlike Process Migration, SSI)
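The placement rule on this slide can be illustrated with a minimal Python sketch. It is not the ElasticOS implementation: the `Node` class, `place_pages`, `owner_of`, the page names, and the round-robin placement are all hypothetical, chosen only to show code pages replicated on every node while each data page has exactly one owner.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical machine participating in a stretched process."""
    name: str
    code_pages: set = field(default_factory=set)  # replicated, read-only
    data_pages: set = field(default_factory=set)  # each data page lives on one node


def place_pages(nodes, code_pages, data_pages):
    """Replicate every code page on every node; give each data page one owner."""
    for node in nodes:
        node.code_pages = set(code_pages)
    for i, page in enumerate(data_pages):
        # Round-robin is only a placeholder for a real placement policy.
        nodes[i % len(nodes)].data_pages.add(page)


def owner_of(nodes, page):
    """Return the single node that holds a data page."""
    owners = [n for n in nodes if page in n.data_pages]
    assert len(owners) == 1, "a data page must have exactly one copy"
    return owners[0]


nodes = [Node("node1"), Node("node2")]
place_pages(nodes, code_pages={"c0", "c1"}, data_pages=["d0", "d1", "d2", "d3"])
print(owner_of(nodes, "d1").name)
```

Because code pages are read-only, replicating them needs no consistency protocol; only the location of each data page has to be tracked.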

Page 9: Towards Elastic Operating Systems

Exploiting Elastic Locality

• We need an adaptive page clustering algorithm
• LRU, NSWAP, i.e., “always pull”
• Execution follows data, i.e., “always jump”
• Hybrid (Initial): Pull pages, then Jump
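The three policies listed here can be compared with a toy single-thread simulation. This is a sketch under stated assumptions, not the authors' algorithm: the function `simulate`, the policy names, the `threshold` parameter, the node names "A" and "B", and the access trace are all made up for illustration. The hybrid rule modeled is "pull pages from a remote node until a pull count is reached, then jump the execution context to that node".

```python
def simulate(trace, placement, policy, threshold=3):
    """Count data page pulls and execution context jumps for one thread.

    trace:     data page ids in access order
    placement: dict mapping page id -> node that initially holds it
    policy:    "always_pull", "always_jump", or "hybrid"
    threshold: hybrid only; pulls tolerated from one node before jumping
    """
    owner = dict(placement)      # work on a copy; pulls move pages
    here = owner[trace[0]]       # start where the first page lives
    pulls = jumps = 0
    pulled_from = {}             # pulls per remote node since the last jump
    for page in trace:
        there = owner[page]
        if there == here:
            continue             # local access, nothing to do
        if policy == "always_pull":
            do_jump = False
        elif policy == "always_jump":
            do_jump = True
        elif policy == "hybrid":
            do_jump = pulled_from.get(there, 0) >= threshold
        else:
            raise ValueError(f"unknown policy: {policy}")
        if do_jump:
            here = there         # move execution to the data
            jumps += 1
            pulled_from.clear()
        else:
            owner[page] = here   # move the data to the execution
            pulls += 1
            pulled_from[there] = pulled_from.get(there, 0) + 1
    return pulls, jumps


# Pages 0-3 start on node A, pages 4-9 on node B (illustrative only).
placement = {p: "A" for p in range(4)}
placement.update({p: "B" for p in range(4, 10)})
trace = [0, 1, 4, 5, 6, 7, 8, 9]

for policy in ("always_pull", "always_jump", "hybrid"):
    print(policy, simulate(trace, placement, policy))
```

On this trace the sketch reports 6 pulls and 0 jumps for "always_pull", 0 pulls and 1 jump for "always_jump", and 3 pulls followed by 1 jump for "hybrid". Which policy is actually cheapest depends on the relative cost of a pull and a jump, which this toy model does not capture.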

Page 10: Towards Elastic Operating Systems

Status and Future Work

• Complete our initial prototype
• Improve our page placement algorithm
• Improve context jump efficiency
• Investigate fault tolerance issues

Page 11: Towards Elastic Operating Systems

Thank You

Questions?

Contact: [email protected]

Page 12: Towards Elastic Operating Systems

Algorithm Performance (1)

Page 13: Towards Elastic Operating Systems

Algorithm Performance (2)

Page 14: Towards Elastic Operating Systems

Page Placement: Multinode Adaptive LRU

[Diagram: two nodes, each with CPUs, Mem, and Swap. Labels: “Pull First”; “Pulls Threshold Reached!”; “Jump Execution Context”]

Page 15: Towards Elastic Operating Systems

Locality in a Single Thread

[Diagram: two nodes, each with CPUs, Mem, and Swap. Label: “Temporal Locality”]

Page 16: Towards Elastic Operating Systems

Locality across Multiple Threads

[Diagram: several nodes, each drawn with CPUs, Mem, and Swap]

Page 17: Towards Elastic Operating Systems

Unlike DSM…

Page 18: Towards Elastic Operating Systems

Exploiting Elastic Locality

• Assumptions
  • Replicate Code Pages, Place Data Pages (vs DSM)
• We need an adaptive page clustering algorithm
• LRU, NSWAP
• Us (Initial): Pull pages, then Jump

Page 19: Towards Elastic Operating Systems

Replicate Code, Distribute Data

[Diagram labels: CODE (three copies); Data 1; Data 2; access sequence: Accessing Data 1, Accessing Data 2, Accessing Data 1]

• Unique copy of data (vs DSM)
• Execution context follows data (vs Process Migration)

Page 20: Towards Elastic Operating Systems

Benefits

• OS handles elasticity: apps don’t change
• Partition locality across multiple nodes
• Useful for single (and multiple) threads
• For multiple threads, seamlessly exploit network I/O and CPU parallelism

Page 21: Towards Elastic Operating Systems

Benefits (delete)

• OS handles elasticity
  • Application ideally runs unmodified
• Application is naturally partitioned …
  • By page access locality
  • By seamlessly exploiting multithreaded parallelism
  • By intelligent page placement

Page 22: Towards Elastic Operating Systems

How should we place pages?

Page 23: Towards Elastic Operating Systems

Execution Context Jumping: A Single-Thread Example

[Diagram labels: Address Space (Node 1); Address Space (Node 2); Process; TIME]

Page 24: Towards Elastic Operating Systems

“Stretch” a Process: Unified Address Space

[Diagram labels: Address Space (Node 1); Address Space (Node 2); Process; Page Table with columns V, R, IP Addr]

Page 25: Towards Elastic Operating Systems

Operating Systems Today

Resource Limit = 1 Node

[Diagram labels: OS; CPUs; Mem; Disks; Process]

Page 26: Towards Elastic Operating Systems

Cloud Applications at Scale

[Diagram labels: Cloud Manager; Load Balancer; Process (three instances); Framework (e.g., MapReduce); Partitioned Data (three partitions); “More Resources?”; “More Queries?”]

Page 27: Towards Elastic Operating Systems

Our findings

• Important tradeoff: data page pulls vs. execution context jumps
• Latency cost is realistic
• Our algorithm: worst-case scenario
  • “always pull” == NSWAP
  • marginal improvements
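The pull-versus-jump tradeoff can be written as a simple linear cost comparison. The sketch below is an assumption-laden illustration, not a result from the talk: `total_cost`, `cheaper_to_jump`, and the cost values are hypothetical, and the numbers are placeholders in arbitrary units, not measurements.

```python
def total_cost(pulls, jumps, pull_cost, jump_cost):
    """Linear cost model: every pull and every jump has a fixed price."""
    return pulls * pull_cost + jumps * jump_cost


def cheaper_to_jump(expected_remote_accesses, pull_cost, jump_cost):
    """True when one context jump costs less than pulling the pages instead."""
    return jump_cost < expected_remote_accesses * pull_cost


# Placeholder costs in arbitrary units (not measured values).
PULL_COST = 1.0
JUMP_COST = 4.0

print(cheaper_to_jump(3, PULL_COST, JUMP_COST))  # few remote pages: keep pulling
print(cheaper_to_jump(6, PULL_COST, JUMP_COST))  # many remote pages: jump
```

Under this model, a jump pays off only when enough upcoming accesses land on the remote node, which is why a policy needs some estimate of locality before it moves the execution context.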

Page 28: Towards Elastic Operating Systems

Advantages

• Natural groupings: threads & pages
• Align resources with inherent parallelism
• Leverage existing mechanisms for synchronization

Page 29: Towards Elastic Operating Systems

“Stretch” a Process: Unified Address Space

[Diagram: two machines, each with CPUs, Mem, and Swap; Page Table with columns V, R, IP Addr]

A “Stretched” Process = Collection of Pages + Other Resources { Across Several Machines }

Page 30: Towards Elastic Operating Systems

(delete) Exec. context follows Data

• Replicate Code Pages
  • Read-Only => No Consistency burden
• Smartly distribute Data Pages
• Execution context can jump
  • Moves towards data
  • *Converse also allowed*

Page 31: Towards Elastic Operating Systems

Elasticity in Cloud Apps Today

[Diagram labels: Input Data; D1, D2, Dx; CPUs, Mem, Disk; Output Data]

Page 32: Towards Elastic Operating Systems

[Diagram labels: Input Queries; Load Balancer; D1, D2, Dx, Dy; CPUs, Mem, Disk; Output Data]

Page 33: Towards Elastic Operating Systems

(delete) Goals: Elasticity dimensions

Extend Elasticity to
• Memory
• CPU
• I/O
  • Network
  • Storage

Page 34: Towards Elastic Operating Systems

Thank You

Page 35: Towards Elastic Operating Systems

Bang Head Here!

Page 36: Towards Elastic Operating Systems

Stretching a Thread

Page 37: Towards Elastic Operating Systems

Overlapping Elastic Processes

Page 38: Towards Elastic Operating Systems

*Code Follows Data*

Page 39: Towards Elastic Operating Systems

Application Locality

Page 40: Towards Elastic Operating Systems

Possible Animation?

Page 41: Towards Elastic Operating Systems

Multinode Adaptive LRU

Page 42: Towards Elastic Operating Systems

Possible Animation?

Page 43: Towards Elastic Operating Systems

Open Topics

• Fault tolerance
• Stack handling
• Dynamically linked libraries
• Locking

Page 44: Towards Elastic Operating Systems

Elastic Page Table

Virtual Addr   Phy. Addr   Valid   Node (IP addr)
A              B           1       Localhost
C              D           0       Localhost
E              F           1       128.138.60.1
G              H           0       128.138.60.1

Legend: Local Mem, Swap space, Remote Mem, Remote Swap
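A lookup against a table of this shape might be sketched as follows. This is a hypothetical illustration, not the ElasticOS data structure: the `ElasticPTE` class and `locate` function are invented names, and the mapping from (Valid, Node) to the four legend categories is an assumption inferred from the row order on the slide (valid means resident in memory, invalid means swapped out, on whichever node the entry names).

```python
from dataclasses import dataclass

LOCALHOST = "localhost"


@dataclass(frozen=True)
class ElasticPTE:
    """One row of the table: a page table entry extended with a node field."""
    phys_addr: str
    valid: bool
    node: str  # "localhost" or the IP address of a remote node


def locate(entry):
    """Classify where a page currently lives (assumed interpretation)."""
    is_local = entry.node == LOCALHOST
    if entry.valid:
        return "local memory" if is_local else "remote memory"
    return "local swap" if is_local else "remote swap"


# The four rows from the slide, keyed by virtual address.
page_table = {
    "A": ElasticPTE("B", True, LOCALHOST),
    "C": ElasticPTE("D", False, LOCALHOST),
    "E": ElasticPTE("F", True, "128.138.60.1"),
    "G": ElasticPTE("H", False, "128.138.60.1"),
}

for vaddr, entry in page_table.items():
    print(vaddr, "->", locate(entry))
```

The only change from a conventional page table in this sketch is the extra node field, which lets a fault handler decide between a local access, a local swap-in, a remote pull, or a context jump.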

Page 45: Towards Elastic Operating Systems

“Stretch” a Process

Move beyond resource boundaries of ONE machine
• CPU
• Memory
• Network, I/O

Page 46: Towards Elastic Operating Systems

[Diagram labels: Input Data; D1, D2; two machines, each with CPUs, Mem, and Disk; Output Data]

Page 47: Towards Elastic Operating Systems

[Diagram labels: D1 and D2, each with CPUs, Mem, and Disk; Data]

Page 48: Towards Elastic Operating Systems

Reinventing the Elasticity Wheel