Yi Cloud10 Pres

download Yi Cloud10 Pres

of 29

Transcript of Yi Cloud10 Pres

  • 8/8/2019 Yi Cloud10 Pres

    1/29

    Reducing Costs of Spot Instances via

    Checkpointing in the Amazon Elastic

    Compute Cloud

    Sangho Yi, Derrick Kondo, and Artur Andrzejak

    - at CLOUD 2010 -

    Sangho Yi

    INRIA Grenoble, France([email protected])

  • 8/8/2019 Yi Cloud10 Pres

    2/29

    Outline

    Motivation and background

    Spot Instances on Amazon EC2

    Possible checkpointing strategies Performance evaluation

    Conclusions and future work

  • 8/8/2019 Yi Cloud10 Pres

    3/29

    Motivation:

    Market-based Pricing in Cloud Amazon released Spot instances in Dec.

    2009.

    Current price depends on users bids, andavailable computing resources.

    (based on the supply and demand relationship)

    30~50% price of normal instances.

    In-bid users get instances.

    Out-of-bid users lose their instances.

  • 8/8/2019 Yi Cloud10 Pres

    4/29

    Pricing in Amazon EC2

    (normal instances on demand)Virtual MachineInstance Type

    Linux

    Cost/hour (USD)

    MS Windows

    Cost/hour (USD)

    Standard Small 0.095 0.12

    Standard Large 0.38 0.48

    Standard Extra Large 0.76 0.96High CPU - Medium 0.19 0.29

    High CPU - Extra Large 0.76 1.16Data Transfer

    Type

    Cost/GB-Month

    (USD)

    Inbound Transfer 0.10

    First 10 TB 0.17

    Next 10-50 TB 0.13

    Next 50-150 TB 0.11

    Over 150 TB 0.10

    ** Pricing on eu-west (Ireland), 2010

  • 8/8/2019 Yi Cloud10 Pres

    5/29

    Pricing in Amazon EC2

    (spot-instances bidding system)

    Source: http:/cloudexchange.org

  • 8/8/2019 Yi Cloud10 Pres

    6/29

    An Example of Failures as a

    Function of Users Bid

    Users bid = 0.081

    Availability Availability Availability

    Failure Failure

    Current Price

  • 8/8/2019 Yi Cloud10 Pres

    7/29

    PDF of Failure arrivals

  • 8/8/2019 Yi Cloud10 Pres

    8/29

    Goal

    Understand impact of checkpointing methods onspot instances in terms ofmonetary cost and

    execution time

    0

    Price

    Time (hour)1 2 3 4 5 6 7 8 9 10

    0.1

    0.20.3

    0.4Bid

    availability (5) availability (3)failure (2)

  • 8/8/2019 Yi Cloud10 Pres

    9/29

    Possible Checkpointing

    Strategies In terms of taking points

    The optimal case

    Without checkpointing

    Hourly checkpointing

    Rising edge-driven checkpointing

    Checkpointing with adaptive decision whetherto skip or take a checkpoint

    Combinations of aboves

  • 8/8/2019 Yi Cloud10 Pres

    10/29

    Characteristics of Strategies

    Hourly

    Pay as you go (with checkpointing per hour)

    Rising edge-driven Takes when price goes up (except for out-

    of-bid cases)

    Adaptive taking point decision

    Has overhead for decision, also thedecision may not work well in some cases.

  • 8/8/2019 Yi Cloud10 Pres

    11/29

    Hourly vs. Edge-driven

    tc

    tc

    tc

    Time (minute)0 60 120 180

    Failure

    (without

    payment)

    Recovery

    Pay per hour Pay per hour Pay per hour

    Task execution Task execution Task execution

    Task

    execution

    Time

    Price

    fora

    s

    potinstance

    Users bid

    Available duration Available duration

    Failure

    Recovery

    : checkpoint

    : rising edge

  • 8/8/2019 Yi Cloud10 Pres

    12/29

    Adaptive Taking Point Decision

    Based on time series analysis,

    it compares the expected costs at a certain point

    when Taking a checkpoint

    Skipping a checkpoint

    Then, selects the lower-cost option.

  • 8/8/2019 Yi Cloud10 Pres

    13/29

    System Model

  • 8/8/2019 Yi Cloud10 Pres

    14/29

    To take or skip, that is the question

    tc

    tc

    Time0

    Failure

    Recovery

    (short rollback)

    Task execution Task execution Task execution

    tc

    Time0

    Failure

    Recovery

    (long rollback)

    Task execution Task execution

    (a) When taking a checkpoint at the current time

    Current time

    Current time

    (b) When skipping to take a checkpoint at the current time

  • 8/8/2019 Yi Cloud10 Pres

    15/29

    Cost analysis on the choice

  • 8/8/2019 Yi Cloud10 Pres

    16/29

    Delayed termination a way of

    exploiting Amazons policy

    Time (minute)0 60 120 180

    Failure(out-of-bid)

    Pay per hour Pay per hour Pay per hour

    Task execution Task execution Task execution

    Task

    execution Without payment

    Time (minute)0 60 120 180

    End of execution

    (when user stops)

    Pay per hour Pay per hour Pay per hour

    Task execution Task execution Task execution

    Pay last partial-hour

    (a) When a users spot instance is out-of-bid

    (b) When a user stops using spot instance

    Time (minute)0 60 120 180

    Users bid

    Available durationPrice for spot instance

  • 8/8/2019 Yi Cloud10 Pres

    17/29

    Performance evaluation

    Trace-based simulation

    http://spotckpt.sourceforge.net

    42 spot instances (now Amazon has 64)

    24 (12x2) different checkpointing strategies

  • 8/8/2019 Yi Cloud10 Pres

    18/29

    Checkpointing strategies

    OPT: The optimal case

    NONE: Without checkpointing

    H: Hourly

    E: Rising edge-driven

    AH: H with adaptive decision

    AE: E with adaptive decision

    H+E, H+AE, AH+E, AH+AE

    AF(10): adaptive decision every 10 mins

    AF(30): adaptive decision every 30 mins

  • 8/8/2019 Yi Cloud10 Pres

    19/29

    Results #1: Monetary cost

    Total cost according to Users bid on two spotinstances

  • 8/8/2019 Yi Cloud10 Pres

    20/29

    Results #2: Execution time

    Total execution time according to Users bid onthe spot instances.

  • 8/8/2019 Yi Cloud10 Pres

    21/29

    Results #3: Normalized one

    Results #4: Benefit of using DT

  • 8/8/2019 Yi Cloud10 Pres

    22/29

    Results #5: Best ones on the

    mean price bidding

    Hourly checkpointing shows good performance.

    Fine-grained checkpointing shows good performance

    sometimes.

    Adaptive decision does not show significant impact onthe results.

  • 8/8/2019 Yi Cloud10 Pres

    23/29

    Summary

    Checkpointing strategies have ~50% higheroverhead (execution time * monetary cost)

    than the optimal case.

    Thus, better checkpointing strategies arerequired to reduce both cost and time.

  • 8/8/2019 Yi Cloud10 Pres

    24/29

    Current and Future work

    1) Finding relationship between past andfuture price of spot instances

    2) Developing efficient checkpointingmethod to minimize monetary cost and

    execution time (to appear @ EKC 2010)

    3) Providing decision model to determineusers bid, given reliability and performance

    constraints (to appear @ MASCOTS 2010)

  • 8/8/2019 Yi Cloud10 Pres

    25/29

    Thanks! / Questions?

    Lets walk (or work) on the Cloud!

  • 8/8/2019 Yi Cloud10 Pres

    26/29

    ** On-going work on 2)

    Sangho Yi and Derrick Kondo How CheckpointingCan Reduce Cost of Using Clouds?, EKC 2010, July,

    2010. (accepted)

    We used the known characteristics of AmazonsSpot Instances, and better checkpoint decision

    mechanism.

    The proposed scheme shows much less

    overhead for most cases. (~20% overhead)

  • 8/8/2019 Yi Cloud10 Pres

    27/29

  • 8/8/2019 Yi Cloud10 Pres

    28/29

    ** On-going work on 2)

  • 8/8/2019 Yi Cloud10 Pres

    29/29

    Lets Make Our Cloud Better!

    On a Cloud Vendors point of view,

    TODO

    (efficiently) Adapt management strategies to fullyutilize the limited (very-large) resources (with less

    cost)

    Better

    Utilization,Less Cost

    More

    Cloud Users

    MoreMoney

    More Investment

    on R&D