Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues...
![Page 1: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/1.jpg)
Chaos Engineering at Jet.com
Rachel Reese | @rachelreese | rachelree.se
Jet Technology | @JetTechnology | tech.jet.com
![Page 2: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/2.jpg)
Why do you need chaos testing?
![Page 3: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/3.jpg)
The world is naturally chaotic
![Page 4: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/4.jpg)
But do we need more testing?
Unit Sanity Random Continuous
Usability A/B Localization Acceptance
Regression Performance Integration Security
![Page 5: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/5.jpg)
You’ve already tested all your
components in multiple ways.
![Page 6: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/6.jpg)
![Page 7: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/7.jpg)
It’s super important to test the interactions in your
environment
![Page 8: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/8.jpg)
Jet? Jet who?
![Page 9: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/9.jpg)
Taking on Amazon!
Launched July 22
• Both Apple & Android named our app as one of their tops for 2015
• Over 20k orders per day
• Over 10.5 million SKUs
• #4 marketplace worldwide
• 700 microservices
We’re hiring! http://jet.com/about-us/working-at-jet
![Page 10: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/10.jpg)
Azure: Web sites, Cloud services, VMs, Service bus queues, Service bus topics, Blob storage, Table storage, Queues, Hadoop, DNS, Active directory, SQL Azure
Languages and libraries: F#, Paket, FSharp.Data, Chessie, Unquote, SQLProvider, Deedle, FAKE, FSharp.Async, R, Python, React, Node, Angular, SAS
Infrastructure and data: Storm, Elastic Search, Xamarin, Microservices, Consul, Kafka, PDW, Splunk, Redis, SQL, Puppet, Jenkins, Apache Hive, Apache Tez
![Page 11: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/11.jpg)
Microservices at Jet
![Page 12: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/12.jpg)
Microservices
• An application of the single responsibility principle at the service level.
• Has an input, produces an output.
Benefits
• Easy scalability
• Independent releasability
• More even distribution of complexity
“A class should have one, and only one, reason to change.”
![Page 13: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/13.jpg)
What is chaos engineering?
![Page 14: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/14.jpg)
It’s just wreaking havoc with your code
for fun, right?
![Page 15: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/15.jpg)
![Page 16: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/16.jpg)
Chaos Engineering is…
Controlled experiments on a distributed system that help you build confidence in the system’s ability to tolerate the inevitable failures.
![Page 17: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/17.jpg)
![Page 18: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/18.jpg)
Principles of Chaos Engineering
1. Define “normal”
2. Assume “normal” will continue in both a control group and an experimental group.
3. Introduce chaos: servers that crash, hard drives that malfunction, network connections that are severed, etc.
4. Look for a difference in behavior between the control group and the experimental group.
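The four steps above can be sketched in F# as a small simulation. Everything in it is hypothetical: the `Group` record, the `successRate` metric, the injected failure fraction, and the tolerance are assumptions chosen for illustration, not part of Jet's tooling.

```fsharp
// Hypothetical sketch: compare a steady-state metric between a control group
// and an experimental group that has had failures injected.
type Group = { Name: string; Requests: int; Failures: int }

// Step 1: define "normal" as a success rate.
let successRate (g: Group) =
    if g.Requests = 0 then 1.0
    else float (g.Requests - g.Failures) / float g.Requests

// Step 3: introduce chaos. Here we only pretend that a fraction of the
// requests failed (an assumption made for illustration).
let injectFailures (failureFraction: float) (g: Group) =
    let extra = int (float g.Requests * failureFraction)
    { g with Failures = min g.Requests (g.Failures + extra) }

// Step 4: look for a difference between the two groups.
let deviates (tolerance: float) (control: Group) (experiment: Group) =
    abs (successRate control - successRate experiment) > tolerance

// Step 2: both groups start from the same "normal" behavior.
let control = { Name = "control"; Requests = 1000; Failures = 2 }
let experiment = { control with Name = "experiment" } |> injectFailures 0.05

printfn "control=%.3f experiment=%.3f deviates=%b"
    (successRate control) (successRate experiment)
    (deviates 0.01 control experiment)
```

In a real experiment the metric would come from monitoring rather than from a record literal, and the tolerance would be chosen from historical variation.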
![Page 19: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/19.jpg)
Going further
Build a Hypothesis around Normal Behavior
Vary Real-world Events
Run Experiments in Production
Automate Experiments to Run Continuously
From http://principlesofchaos.org/
![Page 20: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/20.jpg)
Benefits of chaos engineering
![Page 21: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/21.jpg)
Benefits of chaos engineering
• You're awake
• Design for failure
• Healthy systems
• Self service
![Page 22: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/22.jpg)
Current examples of chaos engineering
![Page 23: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/23.jpg)
Maybe you meant Netflix’s Chaos Monkey?
![Page 24: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/24.jpg)
How is Jet different?
![Page 25: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/25.jpg)
We’re not testing in prod (yet).
![Page 26: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/26.jpg)
SQL restarts & geo-replication
Start
- Checks the source db for write access
- Renames db on destination server (to create a new one)
- Creates a geo-replication in the destination region
Stop
- Shuts down cloud services writing to source db
- Sets source db as read-only
- Ends continuous copy
- Allows writes to secondary db
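The Start and Stop sequences above can be modeled as ordered lists of steps. This is only a sketch: the `Step` cases are named after the bullets on the slide, `runSteps` and `logOnly` are hypothetical helpers, and no real Azure calls are made.

```fsharp
// Hypothetical model of the slide's steps; no cloud APIs are called.
type Step =
    | CheckSourceWritable
    | RenameDestinationDb
    | CreateGeoReplication
    | ShutDownWriters
    | SetSourceReadOnly
    | EndContinuousCopy
    | AllowWritesToSecondary

let startSteps = [ CheckSourceWritable; RenameDestinationDb; CreateGeoReplication ]
let stopSteps = [ ShutDownWriters; SetSourceReadOnly; EndContinuousCopy; AllowWritesToSecondary ]

// Runs each step in order and stops at the first failure,
// reporting which step failed and why.
let runSteps (execute: Step -> Result<unit, string>) (steps: Step list) =
    let rec loop completed remaining =
        match remaining with
        | [] -> Ok (List.rev completed)
        | step :: rest ->
            match execute step with
            | Ok () -> loop (step :: completed) rest
            | Error msg -> Error (step, msg)
    loop [] steps

// A stand-in executor that only logs; a real one would call the cloud provider.
let logOnly step =
    printfn "running %A" step
    Ok ()

let startResult = runSteps logOnly startSteps
let stopResult = runSteps logOnly stopSteps
```

Stopping at the first failed step matters for the Stop sequence in particular: writes should not be allowed on the secondary unless the source has actually been set to read-only.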
![Page 27: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/27.jpg)
Azure & F#
![Page 28: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/28.jpg)
Why F#?
![Page 29: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/29.jpg)
![Page 30: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/30.jpg)
What FP means to us
Prefer immutability
Avoid state changes, side effects, and mutable data
Use data in data out transformations
Think about mapping inputs to outputs.
Look at problems recursively
Consider successively smaller chunks of the same problem
Treat functions as unit of work
Higher-order functions
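A minimal sketch of the four ideas above in F#. The `Order` type and the discount rule are invented for illustration.

```fsharp
// Prefer immutability: "changing" a record produces a new value.
type Order = { Id: int; Quantity: int; UnitPrice: decimal }
let original = { Id = 1; Quantity = 2; UnitPrice = 4.50M }
let updated = { original with Quantity = 3 }   // original is untouched

// Data in, data out: a pure function from an order to its total.
let total (order: Order) = decimal order.Quantity * order.UnitPrice

// Look at problems recursively: handle the head, then the smaller rest.
let rec sumTotals orders =
    match orders with
    | [] -> 0M
    | first :: rest -> total first + sumTotals rest

// Functions as the unit of work: pass behavior to a higher-order function.
let applyDiscount (percent: decimal) (order: Order) =
    { order with UnitPrice = order.UnitPrice * (1M - percent / 100M) }

let discounted = [ original; updated ] |> List.map (applyDiscount 10M)

printfn "before=%M after=%M" (sumTotals [ original; updated ]) (sumTotals discounted)
```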
![Page 31: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/31.jpg)
“The F# solution offers us an order of magnitude increase in productivity and allows one developer to perform the work [of] a team of dedicated developers…”
Yan Cui
Lead Server Engineer, Gamesys
![Page 32: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/32.jpg)
Concise and powerful code
C#

public abstract class Transport { }

public class Car : Transport
{
    public string Make { get; private set; }
    public string Model { get; private set; }
    public Car(string make, string model)
    {
        this.Make = make;
        this.Model = model;
    }
}

public class Bus : Transport
{
    public int Route { get; private set; }
    public Bus(int route)
    {
        this.Route = route;
    }
}

public class Bicycle : Transport
{
    public Bicycle() { }
}

F#

type Transport =
    | Car of Make:string * Model:string
    | Bus of Route:int
    | Bicycle

Trivial to pattern match on!
![Page 33: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/33.jpg)
F# pattern matching
C#
![Page 34: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/34.jpg)
Concise and powerful code
C#

public abstract class Transport { }

public class Car : Transport
{
    public string Make { get; private set; }
    public string Model { get; private set; }
    public Car(string make, string model)
    {
        this.Make = make;
        this.Model = model;
    }
}

public class Bus : Transport
{
    public int Route { get; private set; }
    public Bus(int route)
    {
        this.Route = route;
    }
}

public class Bicycle : Transport
{
    public Bicycle() { }
}

F#

type Transport =
    | Car of Make:string * Model:string
    | Bus of Route:int
    | Bicycle
    | Train of Line:int

let getThereVia (transport:Transport) =
    match transport with
    | Car (make,model) -> ...
    | Bus route -> ...
    | Bicycle -> ...

Warning FS0025: Incomplete pattern matches on this expression. For example, the value 'Train' may indicate a case not covered by the pattern(s)
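One way to resolve the FS0025 warning above is to handle the new `Train` case explicitly. The slide elides the match bodies, so the strings returned here are placeholders invented for this sketch.

```fsharp
type Transport =
    | Car of Make: string * Model: string
    | Bus of Route: int
    | Bicycle
    | Train of Line: int

// Every case is handled, so the compiler no longer reports FS0025.
// The returned strings are placeholders for illustration only.
let getThereVia (transport: Transport) =
    match transport with
    | Car (make, model) -> sprintf "Drive the %s %s" make model
    | Bus route -> sprintf "Take bus route %d" route
    | Bicycle -> "Ride the bicycle"
    | Train line -> sprintf "Take train line %d" line

[ Car ("Fiat", "500"); Bus 42; Bicycle; Train 7 ]
|> List.iter (getThereVia >> printfn "%s")
```

Adding a catch-all `| _ -> ...` arm would also silence the warning, but it would hide the next missing case, which defeats the point of the compiler check.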
![Page 35: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/35.jpg)
Units of Measure
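The slide content for this topic did not survive as text, so the following is a generic sketch of F# units of measure rather than the example from the talk. The units `m` and `s` and the values are chosen for illustration.

```fsharp
[<Measure>] type m   // metres
[<Measure>] type s   // seconds

let distance = 100.0<m>
let time = 9.58<s>

// The compiler infers float<m/s> for the result.
let speed = distance / time

// Mixing units is rejected at compile time, for example:
// let nonsense = distance + time

// float strips the unit so the value can be formatted.
printfn "%.2f m/s" (float speed)
```

Units exist only at compile time, so the check costs nothing at run time.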
![Page 36: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/36.jpg)
TickSpec – an F# project
Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
![Page 37: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/37.jpg)
SpecFlow– a comparable C# project
Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
![Page 38: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/38.jpg)
Chaos code!
![Page 39: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/39.jpg)
![Page 40: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/40.jpg)
What do our services look like?

// Define inputs & outputs
type Input =
    | Product of Product

type Output =
    | ProductPriceNile of Product * decimal
    | ProductPriceCheckFailed of PriceCheckFailed

// Define how input transforms to output
let handle (input:Input) =
    async {
        return Some(ProductPriceNile({Sku="343434"; ProductId = 17; ProductDescription = "My amazing product"; CostPer=1.96M}, 3.96M))
    }

// Define what to do with output
let interpret id output =
    match output with
    | Some (Output.ProductPriceNile (e, price)) -> async {()} // write to event store
    | Some (Output.ProductPriceCheckFailed e) -> async {()} // log failure
    | None -> async.Return ()

// Read events, handle, & interpret
let consume = EventStoreQueue.consume (decodeT Input.Product) handle interpret
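The service shape above depends on types and helpers that the slide does not show (`Product`, `PriceCheckFailed`, `EventStoreQueue.consume`, `decodeT`). The following self-contained sketch fills them in with stand-ins so the flow can be run locally: the record fields, the pricing rule, and the in-memory `written` list are all assumptions, not Jet's implementation.

```fsharp
// Stand-in types; the slide does not show the real definitions.
type Product =
    { Sku: string; ProductId: int; ProductDescription: string; CostPer: decimal }
type PriceCheckFailed = { FailedSku: string; Reason: string }

type Input =
    | Product of Product

type Output =
    | ProductPriceNile of Product * decimal
    | ProductPriceCheckFailed of PriceCheckFailed

// Data in, data out: decide on an output without performing side effects.
// The pricing rule is invented for this sketch.
let handle (input: Input) =
    async {
        match input with
        | Product p when p.CostPer > 0M ->
            return Some (ProductPriceNile (p, p.CostPer + 2M))
        | Product p ->
            return Some (ProductPriceCheckFailed { FailedSku = p.Sku; Reason = "no cost" })
    }

// Side effects live here. A mutable list stands in for the event store and log.
let written = System.Collections.Generic.List<string>()

let interpret output =
    async {
        match output with
        | Some (ProductPriceNile (p, price)) ->
            written.Add (sprintf "price %s %M" p.Sku price)
        | Some (ProductPriceCheckFailed e) ->
            written.Add (sprintf "failed %s: %s" e.FailedSku e.Reason)
        | None -> ()
    }

// Stand-in for the queue consumer: feed each input through handle, then interpret.
let consume (inputs: Input list) =
    async {
        for input in inputs do
            let! output = handle input
            do! interpret output
    }

let sample =
    Product
        { Sku = "343434"
          ProductId = 17
          ProductDescription = "My amazing product"
          CostPer = 1.96M }

[ sample ] |> consume |> Async.RunSynchronously
```

Keeping `handle` free of side effects means it can be tested by comparing its output values, while `interpret` is the only place that needs a fake store.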
![Page 41: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/41.jpg)
Our code!
let selectRandomInstance compute hostedService = async {
    try
        let! details = getHostedServiceDetails compute hostedService.ServiceName
        let deployment = getProductionDeployment details
        let instance =
            deployment.RoleInstances
            |> Seq.toArray
            |> randomPick
        return details.ServiceName, deployment.Name, instance
    with e ->
        log.error "Failed selecting random instance\n%A" e
        reraise e
}
![Page 42: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/42.jpg)
Our code!
let restartRandomInstance compute hostedService = async {
    try
        let! serviceName, deploymentId, roleInstance = selectRandomInstance compute hostedService
        match roleInstance.PowerState with
        | RoleInstancePowerState.Stopped ->
            log.info "Service=%s Instance=%s is stopped...ignoring..." serviceName roleInstance.InstanceName
        | _ ->
            do! restartInstance compute serviceName deploymentId roleInstance.InstanceName
    with e ->
        log.error "%s" e.Message
}
![Page 43: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/43.jpg)
Our code!
compute
|> getHostedServices
|> Seq.filter ignoreList
|> knuthShuffle
|> Seq.distinctBy (fun a -> a.ServiceName)
|> Seq.map (fun hostedService -> async {
    try
        return! restartRandomInstance compute hostedService
    with e ->
        log.warn "failed: service=%s . %A" hostedService.ServiceName e
        return ()
})
|> Async.ParallelIgnore 1
|> Async.RunSynchronously
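The pipeline above relies on helpers that are not shown (`getHostedServices`, `ignoreList`, `knuthShuffle`, `randomPick`, `Async.ParallelIgnore`). Below is a runnable approximation with in-memory stand-ins. All types, service names, and helper bodies are assumptions. `Async.Sequential` from FSharp.Core is swapped in for `Async.ParallelIgnore 1`, on the guess that a parallelism of 1 means one restart at a time.

```fsharp
// In-memory stand-ins for hosted services; no Azure SDK calls are made.
type Instance = { InstanceName: string; Stopped: bool }
type HostedService = { ServiceName: string; Instances: Instance list }

let rng = System.Random 42

// Fisher-Yates (Knuth) shuffle over a copy of the input.
let knuthShuffle (items: seq<'a>) =
    let arr = Seq.toArray items
    for i in arr.Length - 1 .. -1 .. 1 do
        let j = rng.Next(i + 1)
        let tmp = arr.[i]
        arr.[i] <- arr.[j]
        arr.[j] <- tmp
    arr

let randomPick (items: 'a[]) = items.[rng.Next(items.Length)]

// Services that must never be touched (hypothetical name).
let protectedServices = set [ "payments" ]
let notIgnored (svc: HostedService) =
    not (protectedServices.Contains svc.ServiceName)

// Records what would have been restarted.
let restarted = System.Collections.Generic.List<string>()

let restartRandomInstance (svc: HostedService) =
    async {
        let instance = svc.Instances |> List.toArray |> randomPick
        if instance.Stopped then
            printfn "Service=%s Instance=%s is stopped...ignoring..."
                svc.ServiceName instance.InstanceName
        else
            // A real implementation would ask the cloud provider to restart it.
            restarted.Add (sprintf "%s/%s" svc.ServiceName instance.InstanceName)
    }

let services =
    [ { ServiceName = "cart"; Instances = [ { InstanceName = "cart_0"; Stopped = false } ] }
      { ServiceName = "search"; Instances = [ { InstanceName = "search_0"; Stopped = true } ] }
      { ServiceName = "payments"; Instances = [ { InstanceName = "payments_0"; Stopped = false } ] } ]

services
|> Seq.filter notIgnored
|> knuthShuffle
|> Seq.distinctBy (fun s -> s.ServiceName)
|> Seq.map (fun svc ->
    async {
        try
            return! restartRandomInstance svc
        with e ->
            printfn "failed: service=%s . %A" svc.ServiceName e
    })
|> Async.Sequential
|> Async.RunSynchronously
|> ignore
```

With this sample data only `cart/cart_0` is recorded: `search_0` is already stopped and `payments` is on the ignore list, so the outcome does not depend on the shuffle order.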
![Page 44: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/44.jpg)
Has it helped?
![Page 45: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/45.jpg)
Elasticsearch restart
![Page 46: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/46.jpg)
Additional chaos finds
- Redis
- Checkpointing
![Page 47: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/47.jpg)
![Page 48: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/48.jpg)
If availability matters, you should be
testing for it.
![Page 49: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/49.jpg)
Azure + F# + Chaos = <3
![Page 50: Chaos testing at Jet - QCon London 2020 · Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory](https://reader036.fdocuments.us/reader036/viewer/2022070711/5ec7f6754d3aba34246b0d26/html5/thumbnails/50.jpg)
Chaos Engineering at Jet.com
Rachel Reese | @rachelreese | rachelree.se
Jet Technology | @JetTechnology | tech.jet.com
Nora Jones | @nora_js