AWS Resilience and Chaos Engineering Day
Transcript of AWS Resilience and Chaos Engineering Day
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS Resilience and Chaos Engineering Day
Gunnar Grosch
Sr. Developer Advocate, AWS
@gunnargrosch
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
What we’ll cover in this session
• Challenges with distributed systems
• What chaos engineering is
• Phases of chaos engineering
• Common use cases for chaos engineering
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
© 2021, Amazon Web Services, Inc. or its affiliates.
Challenges withdistributed systems
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Distributed systems are complex
Message
Message
Reply
Reply
ServerNetworkClient
https://aws.amazon.com/builders-library/challenges-with-distributed-systems/
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Traditional testing is not enough
TESTING = VERIFYING A KNOWN CONDITION
Unit testingof components
Tested in isolation to ensure function meets expectations
Functional testingof integrations
Each execution path testedto assure expected results
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
And it can get more complicated…
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
IOError: No space left on device
close failed in file object destructor:
logfile
ROTATE
logfile.0
ROTATE
logfile.1
ROTATE
logfile.2
ROTATE
logfile.3
ROTATE
logfile.n
ROTATE
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
© 2021, Amazon Web Services, Inc. or its affiliates.
What is chaos engineering?
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production
principlesofchaos.org
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production
principlesofchaos.org
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production
principlesofchaos.org
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production
principlesofchaos.org
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production
principlesofchaos.org
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
The process of chaos engineering
• Stressing an application by creating disruptive events
• Observing how the system responds
• Implementing improvements
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Fundamental goals with chaos engineering
• Improve resilience and performance
• Uncover hidden issues
• Expose blind spotsMonitoring, observability, and alarm
• And more
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
When to do chaos experiments
• Adding new code
• Creating new features
• Adding services
• Adding or changing dependencies
• Continuously
• And more
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Don’t ask what happens if a system fails; ask what happens when it fails
28
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
© 2021, Amazon Web Services, Inc. or its affiliates.
Phases of chaos engineering
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Phases of chaos engineering
Steady state
Hypothesis
Run experiment
Verify
Improve
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
© 2021, Amazon Web Services, Inc. or its affiliates.
Use cases
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Use cases
One-offexperiments
Periodic game days
Automated experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Use cases
One-offexperiments
Periodic game days
Automated experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Use cases
One-offexperiments
Periodic game days
Automated experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Use cases
One-offexperiments
Periodic game days
Automated experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Automated experiments
Recurring scheduled
experiments
Event-triggered experiments
Continuous delivery experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Automated experiments
Recurring scheduled
experiments
Event-triggered experiments
Continuous delivery experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Automated experiments
Recurring scheduled
experiments
Event-triggered experiments
Continuous delivery experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Automated experiments
Recurring scheduled
experiments
Event-triggered experiments
Continuous delivery experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Automated experiments
Recurring scheduled
experiments
Event-triggered experiments
Continuous delivery experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Use cases
One-offexperiments
Periodic game days
Automated experiments
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Recap
• Challenges with distributed systems
• What chaos engineering is
• Phases of chaos engineering
• Common use cases for chaos engineering
© 2021, Amazon Web Services, Inc. or its affiliates.
AWS RESILIENCE AND CHAOS ENGINEERING DAY
Thank you!
© 2021, Amazon Web Services, Inc. or its affiliates.
Gunnar Grosch
Sr. Developer Advocate, AWS
@gunnargrosch