Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016

MANCHESTER LONDON NEW YORK

Petr Zapletal @petr_zapletal @ScalaByTheBay

@cakesolutions

Top Mistakes When Writing Reactive Applications

Agenda

● Motivation

● Actors vs Futures

● Serialization

● Graceful Shutdown

● Distributed Transactions

● Longtail Latencies

● Quick Tips

Actors vs Futures

Constraints Liberate, Liberties Constrain

Pick the Right Tool for The Job

[Diagram, built up over four slides: Akka Actors, Akka Typed and Akka Streams placed between "Power" and "Constraints", with the labels "Local Abstractions" and "Distribution"]

Actor Use Cases

● State management

● Location transparency

● Resilience mechanisms

● Single writer

● In-memory lock-free cache

● Sharding

Future Use Cases

● Local Concurrency

● Simplicity

● Composition

● Type safety
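The "local concurrency" and "composition" points can be illustrated with plain Scala Futures. The following is a minimal sketch only: `priceOf`, `discountFor` and their hard-coded values are hypothetical stand-ins for independent local work and are not taken from the talk.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureComposition {
  // Hypothetical lookups standing in for independent, local asynchronous work.
  def priceOf(item: String)(implicit ec: ExecutionContext): Future[BigDecimal] =
    Future(if (item == "book") BigDecimal(12) else BigDecimal(3))

  def discountFor(customer: String)(implicit ec: ExecutionContext): Future[BigDecimal] =
    Future(if (customer == "vip") BigDecimal("0.5") else BigDecimal(1))

  def total(customer: String, items: List[String])(
      implicit ec: ExecutionContext): Future[BigDecimal] = {
    // Both futures are started before the for-comprehension, so they run concurrently.
    val prices   = Future.traverse(items)(item => priceOf(item))
    val discount = discountFor(customer)
    // Composition: the compiler checks that the combined result is a BigDecimal.
    for {
      ps <- prices
      d  <- discount
    } yield ps.sum * d
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    // Blocking with Await is for the demo only.
    println(Await.result(total("vip", List("book", "pen")), 2.seconds))
  }
}
```

No actor is needed here: there is no long-lived state, no distribution and no supervision, only a typed pipeline of local computations.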

Avoid Java Serialization

Java serialization is the default in Akka because it is easy to start with, but it is very slow and has a large footprint

Where serialization is involved:

● Sending data through the network (serialization on both the sending and the receiving actor's side)

● Persisting data

Java Serialization - Round Trip

Java Serialization - Footprint

Java Serialization:

----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--xq-~--

XML:

<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>

JSON:

{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}
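The footprint comparison above can be reproduced roughly with the standard library alone. This is a sketch under assumptions: the `Order` and `OrderLine` case classes are a simplified, hypothetical model (not the classes behind the dump above), and the JSON is rendered by hand so that no particular JSON library is implied.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical model, loosely mirroring the Order / OrderLine example above.
final case class OrderLine(lineNumber: Int, cost: Long)
final case class Order(id: Long, totalCost: Long, orderLines: List[OrderLine])

object FootprintDemo {
  // Size in bytes of the default Java serialization of a value.
  def javaSize(value: java.io.Serializable): Int = {
    val buffer = new ByteArrayOutputStream()
    val out    = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.size()
  }

  // Hand-written JSON rendering, to avoid assuming any JSON library.
  def toJson(order: Order): String = {
    val lines = order.orderLines
      .map(l => s"""{"lineNumber":${l.lineNumber},"cost":${l.cost}}""")
      .mkString("[", ",", "]")
    s"""{"id":${order.id},"totalCost":${order.totalCost},"orderLines":$lines}"""
  }

  def jsonSize(order: Order): Int =
    toJson(order).getBytes(StandardCharsets.UTF_8).length

  def main(args: Array[String]): Unit = {
    val order = Order(0L, 0L, List(OrderLine(1, 0L)))
    println(s"Java serialization: ${javaSize(order)} bytes")
    println(s"JSON:               ${jsonSize(order)} bytes")
  }
}
```

The Java-serialized form carries class names, field names and type descriptors for every class involved, which is why it tends to be several times larger than the JSON for a small object like this one.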

Points of Interest

● Performance

● Footprint

● Schema evolution

● Implementation effort

● Human readability

● Language bindings

● Backwards & forwards compatibility

● ...

JSON

● Advantages:

○ Human readability

○ Simple & well known

○ Many good libraries for all platforms

● Disadvantages:

○ Slow

○ Large

○ Object names included

○ No schema (unless one is added, e.g. with JSON Schema)

○ Format and precision issues

● json4s, circe, µPickle, spray-json, argonaut, rapture-json, play-json, …

Binary formats [Schema-less]

● Metadata sent together with data

● Advantages:

○ Implementation effort

○ Performance

○ Footprint *

● Disadvantages:

○ No human readability

● Kryo, Binary JSON (MessagePack, BSON, ... )

Binary formats [Schema]

● Schema defined by some kind of DSL

● Advantages:

○ Performance

○ Footprint

○ Schema evolution

● Disadvantages:

○ Implementation effort

○ No human readability

● Protobuf (+ projects like Flatbuffers, Cap’n Proto, etc.), Thrift, Avro

Summary

● The default (Java serialization) should always be changed

● The right choice depends on the particular use case

● Quick tips:

○ json4s

○ kryo

○ protobuf

Graceful Shutdown

We have thousands of sharded actors on multiple nodes and we want to shut one of the nodes down

High-level Procedure

1. JVM gets the shutdown signal

2. Coordinator tells all local ShardRegions to shut down gracefully

3. Node leaves cluster

4. Coordinator gives singletons a grace period to migrate

5. Actor System & JVM Termination
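The ordered procedure above can be sketched with the standard library only. This is not the coordinator from the talk: `ShutdownCoordinator`, `Step` and the step bodies are hypothetical, and each Akka-specific action (stopping the local ShardRegions, leaving the cluster, waiting for singletons, terminating the actor system) would have to be supplied by the application as a function returning a `Future`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.FiniteDuration
import scala.util.Try

object ShutdownCoordinator {
  // A named shutdown step; `run` returns a Future that completes when the step is done.
  final case class Step(name: String, timeout: FiniteDuration, run: () => Future[Unit])

  // Runs the steps strictly in order. A failed or timed-out step is recorded as
  // `false` and the procedure moves on, so the JVM is never left hanging.
  def runAll(steps: Seq[Step]): List[(String, Boolean)] =
    steps.toList.map { step =>
      val ok = Try(Await.result(step.run(), step.timeout)).isSuccess
      step.name -> ok
    }

  // Step 1 of the procedure: the JVM invokes the hook when it gets the shutdown signal.
  def install(steps: Seq[Step]): Unit = {
    sys.addShutdownHook { runAll(steps); () }
    ()
  }
}
```

A caller would pass four steps in the order of the list above (regions, cluster leave, singleton grace period, actor system termination), each with its own timeout. Bounding every step is a design choice of this sketch: during shutdown it is usually better to give up on one step than to block the remaining ones.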

Integration with Sharded Actors

● Handling of additional messages

○ Passivate message for graceful stop

○ context.stop() for immediate stop

● Priority mailbox

○ Priority message handling

○ Message retrying support

Summary

● We don’t want to lose data (usually)

● Shutdown coordinator on every node

● Integration with sharded actors

Distributed Transactions

Any situation where a single event results in the mutation of two separate sources of data which cannot be committed atomically

What’s Wrong With Them

● Simple happy paths

● 8 Fallacies of Distributed Computing

○ The network is reliable.

○ Latency is zero.

○ Bandwidth is infinite.

○ The network is secure.

○ Topology doesn't change.

○ There is one administrator.

○ Transport cost is zero.

○ The network is homogeneous.

Two-phase commit (2PC)

● Stage 1 - Prepare: the Transaction Manager sends Prepare to each Resource Manager, and each one answers Prepared

● Stage 2 - Commit: the Transaction Manager sends Commit to each Resource Manager, and each one answers Committed
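The two stages of 2PC can be written down as a small in-process sketch. The `ResourceManager` trait and the `rollback` call are assumptions added for illustration (the slide only shows the happy path), and the code deliberately ignores the hard cases, such as the transaction manager crashing between the two stages, which are exactly what the fallacies listed earlier are about.

```scala
import scala.util.Try

object TwoPhaseCommit {
  // Hypothetical participant interface; not an API of any real transaction library.
  trait ResourceManager {
    def prepare(txId: String): Boolean // vote: true means "Prepared"
    def commit(txId: String): Unit
    def rollback(txId: String): Unit   // assumed to be safe to call after a "no" vote
  }

  sealed trait Outcome
  case object Committed extends Outcome
  case object Aborted   extends Outcome

  // Plays the Transaction Manager role for a single transaction.
  def run(txId: String, participants: Seq[ResourceManager]): Outcome = {
    // Stage 1 - Prepare: ask every participant; an exception counts as a "no" vote.
    val votes = participants.toList.map(rm => Try(rm.prepare(txId)).getOrElse(false))
    if (votes.forall(identity)) {
      // Stage 2 - Commit: only reached when everybody voted yes.
      participants.foreach(_.commit(txId))
      Committed
    } else {
      participants.foreach(_.rollback(txId))
      Aborted
    }
  }
}
```

Even in this toy form the cost is visible: every participant is contacted twice, and all of them hold their prepared state until the slowest one has answered.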

Saga Pattern

● A sequence of transactions T1, T2, T3, T4, each paired with a compensating action C1, C2, C3, C4
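A saga runs its transactions T1..Tn one after another and, if one of them fails, undoes the already completed ones by running their compensating actions in reverse order. A minimal in-process sketch follows; the `Step` type and the synchronous execution are simplifying assumptions, and a real saga would also persist its progress so it can resume after a crash.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Saga {
  // One saga step: the forward transaction Ti and its compensation Ci.
  final case class Step(name: String, action: () => Unit, compensation: () => Unit)

  // Right(completed step names) on success, Left(failed step name) after compensating.
  def run(steps: List[Step]): Either[String, List[String]] = {
    @tailrec
    def loop(remaining: List[Step], done: List[Step]): Either[String, List[String]] =
      remaining match {
        case Nil => Right(done.reverse.map(_.name))
        case step :: rest =>
          Try(step.action()) match {
            case Success(_) => loop(rest, step :: done)
            case Failure(_) =>
              // `done` is newest-first, so this compensates in reverse order.
              // Compensations are assumed not to fail; a real system would retry them.
              done.foreach(s => s.compensation())
              Left(step.name)
          }
      }
    loop(steps, Nil)
  }
}
```

Unlike 2PC, nothing is locked across steps: each Ti commits on its own, and consistency is restored afterwards by the compensations.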

The Big Trade-Off

● Distributed transactions can usually be avoided

○ They are hard, expensive, fragile and do not scale

● Every business event needs to result in a single synchronous commit

● Other data sources should be updated asynchronously

● Introducing eventual consistency
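The trade-off above can be sketched as one synchronous commit to a primary journal plus a separate projection step that brings a secondary data source up to date later. Everything here is hypothetical and in-memory (`OrderPlaced`, `OrderService`, the per-customer totals); in a real system the journal would be durable and `projectPending` would be driven by a background worker.

```scala
import scala.collection.mutable

final case class OrderPlaced(orderId: String, customer: String, amount: Long)

final class OrderService {
  private val journal         = mutable.ArrayBuffer.empty[OrderPlaced]
  private val spendByCustomer = mutable.Map.empty[String, Long]
  private var projected       = 0

  // The single synchronous commit: the business event is appended to the journal.
  def place(event: OrderPlaced): Unit = synchronized { journal += event; () }

  // Asynchronous catch-up of the secondary data source; returns how many events
  // were applied. Until it runs, readers see stale totals.
  def projectPending(): Int = synchronized {
    val pending = journal.drop(projected)
    pending.foreach { e =>
      spendByCustomer.update(e.customer, spendByCustomer.getOrElse(e.customer, 0L) + e.amount)
    }
    projected = journal.size
    pending.size
  }

  // Read side: eventually consistent with the journal.
  def spend(customer: String): Long = synchronized {
    spendByCustomer.getOrElse(customer, 0L)
  }
}
```

Between `place` and the next `projectPending` the two data sources disagree. That window is the eventual consistency the last bullet refers to, and it replaces the need for an atomic commit across both.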

Longtail Latencies

Consider a system where each service typically responds in 10ms but with a 99th percentile latency of one second

Latency: Normal vs. Longtail

[Chart: latency (ms, 0 to 50) against percentile (25, 50, 75, 90, 99, 99.9) for two series, Normal and Longtail]

Longtails really matter

● Latency accumulation

● Not just noise

● Don’t have to be power users

● Real problem
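The "latency accumulation" point can be made concrete with the example system above, where 1% of calls take one second. Assuming the calls are independent (an assumption of this sketch), the chance that a request fanning out to n services waits for at least one slow call is 1 - 0.99^n, which is already about 63% for n = 100.

```scala
object TailAtScale {
  // Probability that at least one of `fanOut` independent calls lands in the slow
  // tail, where `tailFraction` is the share of slow calls (0.01 for the 99th percentile).
  def probSlow(fanOut: Int, tailFraction: Double = 0.01): Double =
    1.0 - math.pow(1.0 - tailFraction, fanOut.toDouble)

  def main(args: Array[String]): Unit =
    List(1, 10, 100).foreach { n =>
      println(f"fan-out $n%3d: ${probSlow(n) * 100}%.1f%% of requests hit the tail")
    }
}
```

A rare event for a single service therefore becomes the common case for a request that touches many services.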

Tolerating Longtail Latencies

● Hedging your bet

● Tied requests

● Selectively increase replication factors

● Put slow machines on probation

● Consider ‘good enough’ responses

● Hardware update
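"Hedging your bet" means sending the same request to a second replica when the first one has not answered within a short delay, and using whichever response arrives first. The sketch below uses only the standard library; `Hedging.hedged`, the daemon timer and the idea of passing the two replicas as functions are illustrative assumptions, not an API from Akka or from the talk.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object Hedging {
  // Single daemon timer thread, so the sketch never keeps the JVM alive.
  private val timer: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "hedge-timer")
        t.setDaemon(true)
        t
      }
    })

  // Calls `primary` at once; if no result exists after `hedgeAfter`, also calls
  // `backup`. The first completion (success or failure) wins.
  def hedged[A](hedgeAfter: FiniteDuration)(primary: () => Future[A], backup: () => Future[A])(
      implicit ec: ExecutionContext): Future[A] = {
    val winner = Promise[A]()
    primary().onComplete(r => winner.tryComplete(r))
    val pending = timer.schedule(new Runnable {
      def run(): Unit =
        if (!winner.isCompleted) backup().onComplete(r => winner.tryComplete(r))
    }, hedgeAfter.toMillis, TimeUnit.MILLISECONDS)
    // Once there is a result the hedge is no longer needed.
    winner.future.onComplete(_ => pending.cancel(false))
    winner.future
  }
}
```

Choosing the delay is the key design decision: a delay close to the usual high-percentile latency keeps the extra load small, because only the slowest requests ever trigger the second call. The request must also be safe to execute twice.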

Quick Tips

● Monitoring

● Network partitions & Split Brain Resolver

● Blocking

● Too many actor systems

● Error Handling
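For the "Blocking" tip, one common remedy is to keep blocking calls away from the threads that run the rest of the application. The sketch below shows the idea with a dedicated, bounded thread pool; the name `BlockingIsolation`, the pool size of 4 and the thread names are arbitrary assumptions. In Akka the same idea is usually expressed by configuring a separate dispatcher.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

object BlockingIsolation {
  private val counter = new AtomicInteger(0)

  // Named daemon threads make the pool easy to spot in thread dumps.
  private val factory = new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"blocking-io-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }

  // Dedicated pool: if it saturates, only blocking work queues up.
  val blockingEc: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4, factory))

  // Runs a blocking call (JDBC query, file read, ...) on the dedicated pool.
  def offload[A](body: => A): Future[A] = Future(body)(blockingEc)
}
```

Code that calls `BlockingIsolation.offload(...)` still gets an ordinary `Future`, so it composes with the rest of the application while the blocking itself stays confined to the dedicated pool.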

Questions

@petr_zapletal @cakesolutions

347 708 1518

We are hiring: http://www.cakesolutions.net/careers

Backup Slides

Adding Shutdown Hook

Tell Local Regions to Shut Down

Node Leaves the Cluster

Wait for Singletons to Migrate

Actor System & JVM Termination
