Reducing Microservice Complexity with Kafka and Reactive Streams
Jim Riecken, Senior Software Developer
@jimriecken - [email protected]
Agenda
• Monolith to Microservices + Complexity
• Asynchronous Messaging
• Kafka
• Reactive Streams + Akka Streams
Anti-Agenda
• Details on how to set up a Kafka cluster
• In-depth tutorial on Akka Streams
Monolith to Microservices
[Diagram: development efficiency over time, comparing a monolith (M) with a set of small microservices (S1-S5)]
• Small
• Scalable
• Independent
• Easy to create
• Clear ownership
Network Calls
• Latency
• Failure

Reliability compounds across services: a request that traverses five services, each 99.9% reliable, succeeds only ~99.5% of the time (0.999^5 ≈ 0.995).
Coordination
• Between services
• Between teams
Asynchronous Messaging
[Diagram: synchronous point-to-point service calls vs. asynchronous messaging through a message bus]
Why?
• Decoupling
• Pub/Sub
• Less coordination
• Additional consumers are easy
• Helps scale the organization
Messaging Requirements
• Well-defined delivery semantics
• High throughput
• Highly available
• Durable
• Scalable
• Backpressure
Kafka
What is Kafka?
• Distributed, partitioned, replicated commit log service
• Pub/Sub messaging functionality
• Created by LinkedIn, now an Apache open-source project
Topics + Partitions

[Diagram: producers append new messages to the tail of each numbered partition (P0-P2) of a topic on the Kafka brokers; consumers read from those partitions]
Producers
• Send messages to topics
• Responsible for choosing which partition to send to
  • Round-robin
  • Consistent hashing based on a message key
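The two partition-selection strategies above can be sketched as follows. This is a hypothetical helper for illustration, not Kafka's actual `DefaultPartitioner` (which uses murmur2 hashing rather than `Arrays.hashCode`):

```scala
// Sketch of producer-side partition choice (illustrative, not Kafka's real partitioner).
object PartitionChooser {
  // Keyed messages: consistent hashing, so the same key always lands on the
  // same partition (which preserves per-key ordering).
  def forKey(key: Array[Byte], numPartitions: Int): Int =
    math.abs(java.util.Arrays.hashCode(key) % numPartitions)

  // Unkeyed messages: spread load evenly by cycling through partitions.
  private var counter = -1
  def roundRobin(numPartitions: Int): Int = {
    counter += 1
    counter % numPartitions
  }
}
```

Because `forKey` is deterministic, all messages for a given key (e.g. a user ID) go to the same partition, so one consumer sees them in order.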
Consumers
• Pull messages from topics
• Track their own offset in each partition

[Diagram: a topic with partitions P0-P2 consumed independently by two consumer groups, Group 1 and Group 2]
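Because consumers track their own position, offset management reduces to a per-partition counter. A simplified sketch (the real Kafka client stores committed offsets in the cluster; the names here are hypothetical):

```scala
// Illustrative consumer-side offset tracking: one "next offset to read" per partition.
class OffsetTracker {
  private var offsets = Map.empty[Int, Long].withDefaultValue(0L)

  // Next offset to fetch from the given partition.
  def position(partition: Int): Long = offsets(partition)

  // Record that the message at `offset` in `partition` was processed,
  // advancing the position past it.
  def commit(partition: Int, offset: Long): Unit =
    offsets = offsets.updated(partition, offset + 1)
}
```

Restarting a consumer with the same offsets lets it "start where it left off", which is what makes Kafka consumption resumable.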
How does Kafka meet the requirements?
Kafka is Fast
• Hundreds of MB/s of reads/writes from thousands of concurrent clients
• LinkedIn (2015)
  • 800 billion messages per day (18 million/s peak)
  • 175 TB of data produced per day
  • > 1000 servers in 60 clusters
Kafka is Resilient
• Brokers
  • All data is persisted to disk
  • Partitions replicated to other nodes
• Consumers
  • Start where they left off
• Producers
  • Can retry - at-least-once messaging
Kafka is Scalable
• Capacity can be added at runtime with zero downtime
  • More servers => more disk space
• Topics can be larger than any single node could hold
• Additional partitions can be added to add more parallelism
Kafka Helps with Back-Pressure
• Large storage capacity
  • Topic retention is a consumer SLA
• Almost impossible for a fast producer to overload a slow consumer
• Allows real-time as well as batch consumption
Message Data Format
Messages
• Array[Byte]
• Serialization?
• JSON?
• Protocol Buffers
  • Binary - fast
  • IDL - code generation
  • Message evolution
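Whatever format you pick, Kafka only ever sees `Array[Byte]`. A sketch of round-tripping a value through UTF-8 JSON bytes - hand-rolled here purely for illustration; a real service would use a JSON library or Protocol Buffers:

```scala
// Illustrative message type and hand-rolled (de)serialization to Array[Byte].
case class Greeting(name: String, count: Int)

def serialize(g: Greeting): Array[Byte] =
  s"""{"name":"${g.name}","count":${g.count}}""".getBytes("UTF-8")

def deserialize(bytes: Array[Byte]): Greeting = {
  val s = new String(bytes, "UTF-8")
  // Naive field extraction; assumes the exact layout produced by serialize.
  val name = s.split("\"name\":\"")(1).split("\"")(0)
  val count = s.split("\"count\":")(1).stripSuffix("}").toInt
  Greeting(name, count)
}
```

A binary IDL format like Protocol Buffers replaces the fragile string parsing above with generated code, and lets old consumers skip fields added by newer producers (message evolution).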
Processing Data with Reactive Streams
Reactive Streams
• Standard for async stream processing with non-blocking back-pressure
  • Subscriber signals demand to publisher
  • Publisher sends no more than demand
• Low-level
• Mainly meant for library authors
Publisher[T]
  subscribe(s: Subscriber[-T])

Subscriber[T]
  onSubscribe(s: Subscription)
  onNext(t: T)
  onComplete()
  onError(t: Throwable)

Subscription
  request(n: Long)
  cancel()
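A minimal synchronous sketch of the demand protocol behind these interfaces. These are simplified stand-ins, not the real `org.reactivestreams` API (which, among other rules, forbids the recursive synchronous delivery used here):

```scala
// Simplified stand-ins for the Reactive Streams interfaces.
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }
trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(t: T): Unit
  def onComplete(): Unit
  def onError(t: Throwable): Unit
}
trait Publisher[T] { def subscribe(s: Subscriber[T]): Unit }

// Publisher that emits a range of Ints, but never more than the demand
// the subscriber has signalled via request(n).
class RangePublisher(from: Int, to: Int) extends Publisher[Int] {
  def subscribe(sub: Subscriber[Int]): Unit = {
    var next = from
    sub.onSubscribe(new Subscription {
      def request(n: Long): Unit = {
        var remaining = n
        while (remaining > 0 && next <= to) {
          sub.onNext(next); next += 1; remaining -= 1
        }
        if (next > to) sub.onComplete()
      }
      def cancel(): Unit = { next = to + 1 }
    })
  }
}
```

The key property: if the subscriber only requests 2 elements, the publisher emits exactly 2 and then waits, which is what makes the back-pressure non-blocking.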
Processing Data with Akka Streams
Akka Streams
• Library on top of Akka Actors and Reactive Streams
• Process sequences of elements using bounded buffer space
• Strongly typed
Concepts

[Diagram: the basic Akka Streams shapes - Source, Flow, Sink, FanOut, FanIn - and how they compose into a Runnable Graph]

Composition
Materialization
• Turning on the tap
  • Create actors
  • Open files/sockets/other resources
• Materialized values
  • Source: Actor, Promise, Subscriber
  • Sink: Actor, Future, Producer
Reactive Kafka
• https://github.com/akka/reactive-kafka
• Akka Streams wrapper around the Kafka API
  • Consumer Source
  • Producer Sink
Producer
• Sink - sends messages to a Kafka topic
• Flow - sends messages to a Kafka topic and emits the result downstream
• When the stream completes or fails, the connection to Kafka is automatically closed
Consumer
• Source - pulls messages from Kafka topics
• Offset management
• Back-pressure
• Materialization
  • Materializes an object that can stop the consumer (and complete the stream)
Simple Producer Example

    implicit val system = ActorSystem("producer-test")
    implicit val materializer = ActorMaterializer()

    val producerSettings = ProducerSettings(
      system, new ByteArraySerializer, new StringSerializer
    ).withBootstrapServers("localhost:9092")

    Source(1 to 100)
      .map(i => s"Message $i")
      .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
      .to(Producer.plainSink(producerSettings))
      .run()
Simple Consumer Example

    implicit val system = ActorSystem("consumer-test")
    implicit val materializer = ActorMaterializer()

    val consumerSettings = ConsumerSettings(
      system, new ByteArrayDeserializer, new StringDeserializer, Set("lower")
    ).withBootstrapServers("localhost:9092").withGroupId("test-group")

    val control =
      Consumer.atMostOnceSource(consumerSettings.withClientId("client1"))
        .map(record => record.value)
        .to(Sink.foreach(v => println(v)))
        .run()

    // Later: stop the consumer and complete the stream
    control.stop()
Combined Example

    val control =
      Consumer.committableSource(consumerSettings.withClientId("client1"))
        .map { msg =>
          val upper = msg.value.toUpperCase
          Producer.Message(
            new ProducerRecord[Array[Byte], String]("upper", upper),
            msg.committableOffset)
        }
        .to(Producer.commitableSink(producerSettings))
        .run()

    control.stop()
Demo
Wrap-Up
• Microservices have many advantages, but can introduce failure and complexity.
• Asynchronous messaging can help reduce this complexity, and Kafka is a great option.
• Akka Streams makes reliably processing data from Kafka with back-pressure easy.
Thank you! Questions?
@jimriecken - [email protected]