Posted on 16-Apr-2017
Creating a Microservice? Answer These 10 Questions First.
Brian Kelly, VP Engineering, Datawire
DevOpsDays Austin, May 2nd 2016
@brikelly bkelly@datawire.io
datawire.io
Hi!
Me
* Working in distributed systems for most of my career
* Built a number of middleware and messaging products
* Strangled a SaaS monolith with microservices
Datawire
* Based in Boston and San Francisco
* We provide technology for companies adopting microservices
* We’ve spent a lot of time with the master microservices practitioners from high-growth technology companies
Microservices and DevOps: A Perfect Match
Microservices increase development velocity
DevOps increases release velocity
For organizations scaling rapidly, doing one without the other is…“suboptimal”
“There are only two hard problems in distributed systems:
1. Exactly-once delivery
2. Guaranteed order of messages
1. Exactly-once delivery”
@mathiasverraes
Why Ask These 10 Questions?
Make your teams aware of latent concerns
* For example, potential future issues with scalability and reliability
It’s OK not to have sophisticated answers for each question
* But asking them is important!
Developer Infrastructure Teams
The dev infrastructure team focuses on developer education, core infrastructure, and driving standards through a great developer experience (DX).
Investing in the core infrastructure that lets teams iterate independently is key:
* Continuous delivery workflow
* Loosely coupled services
* Application resilience
Continuous Delivery Ecosystem for Microservices
* Define the contract (API, data format, protocol). Options: Datawire Quark, Finagle / Thrift, HTTP / JSON, gRPC / Protobuf
* Code the business logic
* Build and package the code/contract from GitHub / source into a source artifact (JAR, Gem, npm). Options: Circle CI, Go.cd, JFrog, Jenkins, Travis
* Bake the application & dependencies into a deployable artifact (AMI, container, VM). Options: Docker, Packer
* Deploy the artifact to run on the appropriate compute resources. Options: AWS, Cloud Foundry, Docker, GCP, Kubernetes, Mesos, Microsoft Azure
* Connect the microservice to other microservices. Options: Datawire Connect, homegrown, Hystrix / Ribbon, SmartStack
* Monitor the health of the deployed microservice. Options: AppDynamics, DataDog, InfluxData, Nagios, New Relic, SignalFX, Sysdig, Wavefront, Zipkin
The workflow spans both Development and DevOps.
Automated DevOps workflow: Spinnaker
Our Model
Continuous delivery workflow: Quickly move from commit to customer
1. The workflow needs to be defined, but it does not need to be fully automated. Increase automation as the number of microservices grows.
2. You need the service running in production in order to test it fully.
Upgrading your Service
Each upgrade is an opportunity to break the contract between your new service and the services that depend on it
Plenty of techniques exist for reducing the chance of failure:
* Well-specified structural and behavioral service contracts
* Dark launching, to examine the effect of production traffic without risk
* Response diffing, to ensure contract compliance
* Canary testing, for progressive rollout
* Blue/green deployment, for fast rollback
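Canary testing is often implemented as deterministic, percentage-based routing. The Python sketch below assumes each request carries a stable key such as a user ID; `canary_bucket` and `choose_version` are hypothetical names, not the API of any particular load balancer or service mesh.

```python
import hashlib


def canary_bucket(request_key: str, buckets: int = 100) -> int:
    """Map a stable request key (e.g. a user ID) to a bucket in [0, buckets)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_version(request_key: str, canary_percent: int) -> str:
    """Route roughly canary_percent% of keys to the canary version.

    The same key always maps to the same bucket, so a given user sees
    one version consistently for the whole rollout."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if canary_bucket(request_key) < canary_percent else "stable"
```

Because a key's bucket never changes, raising `canary_percent` from 1 to 5 to 25 only adds users to the canary; nobody flips back and forth between versions mid-rollout, and setting it to 0 acts as an immediate rollback.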
Monitoring and Measuring your Service
Ways of monitoring your service’s health:
OK:
* Health check from monitor to service (e.g. GET /health from an ELB)
Better:
* “Call home” health check from service to monitor (APM approach)
Best:
* The client’s experience calling real APIs on the service
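The “best” option measures what clients actually experience. As a rough sketch, assume the client wraps each real API call in a small recorder; `ClientExperience` and its methods are hypothetical, and in practice the numbers would be exported to a monitoring system rather than kept in memory.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, Tuple, TypeVar

T = TypeVar("T")


class ClientExperience:
    """Record the latency and outcome of real API calls, as the client sees them.

    Only the most recent `window` calls are kept (a sliding window)."""

    def __init__(self, window: int = 1000) -> None:
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=window)

    def call(self, fn: Callable[[], T]) -> T:
        """Invoke fn, recording elapsed time and whether it raised."""
        start = time.monotonic()
        ok = False
        try:
            result = fn()
            ok = True
            return result
        finally:
            self.samples.append((time.monotonic() - start, ok))

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        return sum(1 for _, ok in self.samples if not ok) / len(self.samples)

    def latency_percentile(self, p: float) -> float:
        """Nearest-rank percentile of recorded latencies, in seconds."""
        if not self.samples:
            return 0.0
        ordered = sorted(latency for latency, _ in self.samples)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

With a hypothetical client function `fetch_order`, `exp.call(lambda: fetch_order(42))` returns whatever the call returns, and `exp.error_rate()` or `exp.latency_percentile(99)` can then feed an alert.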
Diagnosis
Which service is introducing the most latency into a request?
Which service is the root cause of a cascading failure?
Monitor the traffic, not just the services
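Answering “which service is adding the most latency?” needs request-level traces, not just per-service dashboards. This sketch assumes a tracing system (Zipkin, for example) has already collected the spans for one request; the `Span` record and the functions below are illustrative and do not use any tracer's actual API. Each service is charged only for time not spent waiting on its own downstream calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """One timed unit of work in a request trace (times in milliseconds)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float


def _covered(intervals: List[Tuple[float, float]]) -> float:
    """Total time covered by possibly overlapping intervals (merged, not summed)."""
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_start is not None and cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start
    return total


def self_time_by_service(spans: List[Span]) -> Dict[str, float]:
    """Time each service spent in its own spans, excluding time spent waiting
    on its child calls. Parallel child calls are counted once, not summed."""
    children: Dict[str, List[Span]] = {}
    for span in spans:
        if span.parent_id is not None:
            children.setdefault(span.parent_id, []).append(span)

    totals: Dict[str, float] = {}
    for span in spans:
        # Clip each child to its parent's window before merging.
        waits = [(max(c.start_ms, span.start_ms), min(c.end_ms, span.end_ms))
                 for c in children.get(span.span_id, [])]
        waiting = _covered([(a, b) for a, b in waits if b > a])
        own = (span.end_ms - span.start_ms) - waiting
        totals[span.service] = totals.get(span.service, 0.0) + own
    return totals


def slowest_service(spans: List[Span]) -> str:
    totals = self_time_by_service(spans)
    return max(totals, key=lambda service: totals[service])
```

For a trace where `gateway` (0 to 100 ms) calls `auth` (5 to 15 ms) and `orders` (20 to 90 ms), and `orders` calls `inventory-db` (30 to 80 ms), the self times are 20, 10, 20, and 50 ms, so `slowest_service` points at `inventory-db` even though `gateway` has the longest span.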
Testing
Unit testing a single service is the easy part
What’s harder: testing the entire system
How will a developer verify that their changes to a single microservice will not break other parts of the system?
Staging environments bring a little comfort, but add significant cost, complexity, and distraction
Microservice Testing Is Required on Both Sides of Deployment
Test before launch (reduce probability of failure):
* Mock services
* Sophisticated deployment workflows
* Automated regression tests
Test after launch (reduce impact of failure):
* Dark launch
* Canary testing
* Blue / green deployment
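Dark launching and response diffing can be combined: serve every request from the live version, mirror it to the candidate, and record any differences. A minimal sketch, assuming requests and responses are JSON-like dicts; `DarkLauncher` and `ignore_fields` are hypothetical names. Real deployments usually mirror traffic asynchronously (often at a proxy) so the candidate adds no latency; this version calls the candidate inline to keep the example short.

```python
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

Request = Dict[str, Any]
Response = Dict[str, Any]
Handler = Callable[[Request], Response]


class DarkLauncher:
    """Serve each request from the live version and mirror it to a candidate.

    The candidate's response is only compared with the live one; it is never
    returned, and a failing candidate never affects the caller."""

    def __init__(self, live: Handler, candidate: Handler,
                 ignore_fields: FrozenSet[str] = frozenset()) -> None:
        self.live = live
        self.candidate = candidate
        self.ignore_fields = ignore_fields
        self.mismatches: List[Dict[str, Any]] = []
        self.candidate_errors = 0

    def handle(self, request: Request) -> Response:
        response = self.live(request)
        try:
            shadow = self.candidate(request)
        except Exception:
            self.candidate_errors += 1
            return response
        diff = self._diff(response, shadow)
        if diff:
            self.mismatches.append({"request": request, "diff": diff})
        return response

    def _diff(self, live: Response, shadow: Response) -> Dict[str, Tuple[Any, Any]]:
        """Fields whose values differ, as (live, candidate) pairs."""
        keys = (live.keys() | shadow.keys()) - self.ignore_fields
        return {k: (live.get(k), shadow.get(k))
                for k in sorted(keys) if live.get(k) != shadow.get(k)}
```

`ignore_fields` exists because some fields (timestamps, host names, version tags) legitimately differ between versions and would otherwise drown out real contract breaks.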
Prioritize Potential Attack Vectors
Most likely attack vectors:
* Exploitation of OWASP Top 10 vulnerabilities in your web application
* Internal staff with existing access
* Social engineering
Less likely attack vector:
* An attacker gains access behind your perimeter, logs on to your containers, reverse-engineers your internal service APIs, and sends fake requests to and from each microservice
Configuration
“Configuration” can be categorized as:
• Static configuration (log file locations, ports to listen on, …)
• Runtime configuration (thread pool sizes, JVM heap size, …)
• Behavioral configuration (feature flags, request routing rules, …)
Configuration
Prevent arbitrary static configuration changes to production systems
* Instead, deploy those changes in new immutable, copy-on-write containers
Strive for adaptive, elastic services that need no dynamic configuration changes at runtime to stay healthy
Reserve behavioral configuration for progressive rollouts, dark launching, and routing
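One way to encode these categories in a service is to give each a different level of mutability. The sketch below is illustrative only: the `LISTEN_PORT`/`LOG_PATH` variable names, the defaults, and the “two workers per CPU” heuristic are assumptions made up for the example.

```python
import os
from dataclasses import dataclass
from threading import Lock
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class StaticConfig:
    """Static: fixed for the life of the process. Changing it means
    deploying a new immutable container, not editing a running one."""
    listen_port: int
    log_path: str

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "StaticConfig":
        return cls(listen_port=int(env.get("LISTEN_PORT", "8080")),
                   log_path=env.get("LOG_PATH", "/var/log/service.log"))


def worker_pool_size(cpu_count: Optional[int] = None) -> int:
    """Runtime: derived from the host the container lands on,
    instead of being hand-tuned in production."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(2, cpus * 2)


class BehavioralFlags:
    """Behavioral: the only settings expected to change while the service
    runs (feature flags for progressive rollout, dark launch switches)."""

    def __init__(self, initial: Optional[Mapping[str, bool]] = None) -> None:
        self._lock = Lock()
        self._flags: Dict[str, bool] = dict(initial or {})

    def set(self, name: str, enabled: bool) -> None:
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._flags.get(name, False)
```

Static values are read once at startup (e.g. `StaticConfig.from_env(os.environ)`) and cannot be reassigned, so changing them forces a new deploy; the pool size adapts to whatever host the container lands on; only `BehavioralFlags` is meant to be flipped at runtime.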
Your microservice needs a contract
Your new microservice will provide new value to the rest of the system
But will it offer an SLA for its latency, uptime, and reliability?
Those who consume it will appreciate it:
• They can specify timeouts and trip circuit breakers when response latency is high
• They will know which operations are idempotent
• They could cache some responses for large queries
• They can spot uptime SLA discrepancies
Datawire’s Quark is an IDL that captures both structure and behavior
Structural vs. Behavioral Contracts
Behavioral: Intended for Humans
Structural: Intended for Tools
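To make the distinction concrete, here is a sketch of a contract record that carries both kinds of information: field types a tool can check, and behavioral promises (idempotency, latency SLA, cacheability) a consumer can act on. This is not Quark syntax or any IDL's real API; `OperationContract`, `client_policy`, the 2x timeout factor, and the two-retry budget are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OperationContract:
    name: str
    # Structural: field names and types, checkable by tools.
    request_fields: Dict[str, type]
    response_fields: Dict[str, type]
    # Behavioral: promises that consumers can act on.
    idempotent: bool
    p99_latency_ms: float
    cacheable: bool = False


def validate_request(op: OperationContract, payload: Dict[str, Any]) -> None:
    """Structural check: every declared field is present with the declared type."""
    for field_name, field_type in op.request_fields.items():
        if field_name not in payload:
            raise ValueError(f"{op.name}: missing field {field_name!r}")
        if not isinstance(payload[field_name], field_type):
            raise TypeError(f"{op.name}: field {field_name!r} must be {field_type.__name__}")


def client_policy(op: OperationContract, timeout_factor: float = 2.0) -> Dict[str, Any]:
    """Behavioral use: derive a timeout, retry budget, and caching choice
    from what the contract promises."""
    return {
        "timeout_ms": op.p99_latency_ms * timeout_factor,
        "max_retries": 2 if op.idempotent else 0,  # never blindly retry non-idempotent ops
        "cache_responses": op.cacheable,
    }
```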
Service Discovery
The simpler your discovery system, the less flexibility it offers.
DNS schemes: very simple, but they don’t take availability into account, and they make the developer experience difficult
Strongly consistent datastores (e.g. ZooKeeper): more flexible, but don’t handle network partitions at all
Eventually consistent datastores with pub/sub (e.g. Datawire Connect): very flexible and handle partitions well; clients and services are unaffected even when they can’t reach the discovery system
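The last option's key property, that clients keep working when they cannot reach the discovery system, comes from serving lookups out of a local copy that the registry updates by pushing changes. The sketch below shows only that client-side cache; the subscription transport is left out, and `DiscoveryCache`, `on_update`, and `resolve` are hypothetical names rather than Datawire Connect's API.

```python
import itertools
from threading import Lock
from typing import Dict, Iterable, Iterator, List, Optional


class DiscoveryCache:
    """Client-side copy of service locations, updated by published changes.

    Lookups never contact the registry, so callers keep resolving (possibly
    stale) endpoints while the discovery system is unreachable."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._endpoints: Dict[str, List[str]] = {}
        self._cursors: Dict[str, Iterator[str]] = {}

    def on_update(self, service: str, endpoints: Iterable[str]) -> None:
        """Invoked by the pub/sub subscription whenever the registry publishes a change."""
        with self._lock:
            self._endpoints[service] = list(endpoints)
            self._cursors[service] = itertools.cycle(self._endpoints[service])

    def resolve(self, service: str) -> Optional[str]:
        """Round-robin over the last known endpoints; None if none are known."""
        with self._lock:
            if not self._endpoints.get(service):
                return None
            return next(self._cursors[service])
```

During a partition no updates arrive, but `resolve` keeps returning the last published endpoints, which is the trade-off the eventually consistent approach accepts.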
Knowing your Chokepoint Sequence
What will be the sequence of failures in the event of a large increase in traffic?
* Example sequence: first the database maxes out, then RAM, then CPU, then file descriptors, then ELBs, then NICs
Knowing the likely failure sequence tells you how much headroom you have and helps you build a plan for capacity growth
(Figure: example topology of application nodes, HAProxy instances, and a Cassandra cluster)
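A back-of-the-envelope way to estimate the sequence is to measure, for each resource, its capacity, its idle baseline, and its cost per request, then compute the request rate at which each one runs out. This sketch assumes usage grows linearly with traffic, which real systems only approximate, so treat the output as a starting point for load testing; the function name and the example figures are invented.

```python
from typing import Dict, List, Tuple


def chokepoint_sequence(capacity: Dict[str, float],
                        usage_per_rps: Dict[str, float],
                        baseline: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order resources by the request rate (req/s) at which each one saturates.

    Assumes a linear model: usage = baseline + usage_per_rps * rps."""
    limits: List[Tuple[str, float]] = []
    for resource, cap in capacity.items():
        per_rps = usage_per_rps[resource]
        headroom = cap - baseline.get(resource, 0.0)
        saturates_at = headroom / per_rps if per_rps > 0 else float("inf")
        limits.append((resource, saturates_at))
    return sorted(limits, key=lambda item: item[1])
```

With made-up inputs (200 database connections at 0.5 per req/s; 16,000 MB of RAM with a 2,000 MB baseline at 8 MB per req/s; CPU at 10% baseline and 0.04 percentage points per req/s; 65,536 file descriptors at 10 per req/s), the database saturates first at 400 req/s, then RAM at 1,750, CPU at about 2,250, and file descriptors at about 6,554.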
Dependency Failures
Microservice architectures are, by their nature, highly distributed systems
That means failures will occur, and frequently
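Upstream and Downstream Dependencies
(Figure: upstream microservices send requests to your microservice and receive responses; your microservice in turn sends requests to downstream microservices and receives responses)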
Downstream Dependency Failure
Any microservice calling another must handle downstream failure, with:
* Timeouts
* Circuit breakers to prevent cascading failure
* Backpressure
* Default response values
* Caching prior responses
* Retries
* Fallback to alternative endpoints
Don’t assume that downstream failures manifest as dead endpoints
* Services get sick more often than they die!
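Several of these techniques (circuit breakers, cached prior responses, default values) combine naturally. A minimal, single-threaded sketch; `CircuitBreaker`, `call_with_fallback`, and the thresholds are hypothetical and not the API of Hystrix or any other library. It assumes `fn` enforces its own timeout (for example through the HTTP client's timeout setting) and raises when the dependency is slow, so a sick service trips the breaker, not just a dead one.

```python
import time
from typing import Any, Callable, Dict, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency whose circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency after `failure_threshold` consecutive
    failures; after `reset_after_s` seconds, let a trial call through
    (half-open). Success closes the circuit; failure reopens it.
    Single-threaded sketch: no locking."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            raise CircuitOpenError("circuit open; dependency not called")
        try:
            result = fn()  # fn is assumed to enforce its own timeout
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


def call_with_fallback(breaker: CircuitBreaker, fn: Callable[[], Any],
                       cache: Dict[str, Any], key: str, default: Any) -> Any:
    """Try the dependency; on any failure serve the last good response, else a default."""
    try:
        value = breaker.call(fn)
    except Exception:
        return cache.get(key, default)
    cache[key] = value
    return value
```

While the circuit is open, callers get a cached or default answer immediately instead of piling more requests onto a struggling dependency, which is what keeps one sick service from cascading.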
Failing to Serve Upstream Dependencies
Understand what it means for the rest of the system when (not if) your service fails
A non-critical service (e.g. a logging service invoked asynchronously over UDP) can fail without disrupting upstream callers, at the cost of losing log data
A critical synchronous service (e.g. a credit card payment service invoked over RPC) requires careful handling by upstream components if transactions fail mid-stream
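For the payment case, one common technique (a general pattern, not something the talk prescribes) is an idempotency key: the caller generates one key per logical payment and reuses it on every retry, and the service returns the stored result instead of charging twice. A minimal sketch with an in-memory dict standing in for a durable store; `PaymentService`, `pay_with_retries`, and the receipt format are hypothetical.

```python
import uuid
from threading import Lock
from typing import Callable, Dict, Optional


class PaymentService:
    """Hypothetical payment service that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._results: Dict[str, str] = {}  # stand-in for a durable store
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # replayed request: no second charge
            self.charges_made += 1
            receipt = f"receipt-{self.charges_made}-{amount_cents}"
            self._results[idempotency_key] = receipt
            return receipt


def pay_with_retries(service: PaymentService, amount_cents: int, attempts: int = 3,
                     send: Optional[Callable[[str, int], str]] = None) -> str:
    """Upstream side: one key per logical payment, reused on every retry."""
    key = str(uuid.uuid4())
    send = send or service.charge
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(key, amount_cents)
        except ConnectionError as err:  # e.g. the response was lost after the charge
            last_error = err
    raise ConnectionError(f"payment not confirmed after {attempts} attempts") from last_error
```

The `send` parameter exists so a test can simulate the dangerous case: the charge succeeds but the response is lost, and the retry must not charge the customer again.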
Trying Datawire Connect
It’s free and OSS!
https://github.com/datawire/datawire-connect
We work in a public Slack channel; feel free to join and ask questions about microservices in general, or about our tech (link on the GitHub page)
Watch the talks from our recent Microservices Practitioner Summit (speakers from Facebook, Netflix, Uber, Google, Yelp, New Relic…) on microservices.com
And like every other organization here, we’re hiring!