Streaming Data with scalaz-stream
Uploaded by gary-coady
Contents
• Why do we want streaming APIs?
• Introduction to scalaz-stream
• Use case: Server-Sent Events implementation
Why do we want streaming APIs?
Information with indeterminate/unbounded size
• Lines from a text file
• Bytes from a binary file
• Chunks of data from a TCP connection
• TCP connections
• Data from Kinesis or SQS or SNS or Kafka or…
• Data from an API with paged implementation
“Dangerous” Choices
• scala.collection.Iterable: provides an iterator to step through items in sequence
• scala.collection.immutable.Stream: lazily evaluated, possibly infinite list of values
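One reason an iterator-based API can be risky is that an iterator is stateful and single-pass. The following is a minimal plain-Scala sketch of that behaviour; the object and method names are hypothetical and not taken from the slides.

```scala
object IteratorPitfall {
  // Returns the sum of the items and whether anything is left afterwards.
  def sumAndCheck(items: Iterable[Int]): (Int, Boolean) = {
    val it = items.iterator
    val total = it.sum            // walks the iterator to the end
    val anythingLeft = it.hasNext // false: the same iterator cannot be replayed
    (total, anythingLeft)
  }
}
```

Code that receives an already-used iterator silently sees no data, which makes refactoring harder to reason about.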
Do The Right Thing
• Safe setup and cleanup
• Constant memory usage
• Constant stack usage
• Refactor with confidence
• Composable
• Back-pressure
What is scalaz-stream
• Creates co-data
• Safe resource management
• Referential transparency
• Controlled asynchronous effects
[Diagram: User code, Process.await, “Waiting” for callback, User code, Callback]
sealed trait Process[+F[_], +O]  // F: effect, O: output
case class Halt(cause: Cause) extends Process[Nothing, Nothing]
case class Emit[+O](seq: Seq[O]) extends Process[Nothing, O]
case class Await[+F[_], A, +O](
  req: F[A],
  rcv: (EarlyCause \/ A) => Process[F, O]
) extends Process[F, O]
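To make the three cases above more concrete, here is a deliberately simplified toy model in plain Scala. It is not the library's definition. The effect is fixed to a thunk (`() => A`) instead of a generic `F[A]`, `Emit` carries its own continuation, and there is no `Cause` or error handling. The names `Proc`, `runLog` and `fromIterator` are assumptions made for this sketch.

```scala
object MiniProcess {
  sealed trait Proc[+O]
  // The stream has finished.
  case object Halt extends Proc[Nothing]
  // Output some values, then continue with `next`.
  final case class Emit[+O](values: Seq[O], next: () => Proc[O]) extends Proc[O]
  // Request a value from the outside world, then continue with the result.
  final case class Await[A, +O](req: () => A, rcv: A => Proc[O]) extends Proc[O] {
    def step: Proc[O] = rcv(req())
  }

  // Interpreter: runs the requests and collects every emitted value.
  // It is a tail-recursive loop, so stack usage stays constant.
  def runLog[O](p: Proc[O]): Vector[O] = {
    @annotation.tailrec
    def go(cur: Proc[O], acc: Vector[O]): Vector[O] = cur match {
      case Halt            => acc
      case Emit(vs, next)  => go(next(), acc ++ vs)
      case a @ Await(_, _) => go(a.step, acc)
    }
    go(p, Vector.empty)
  }

  // A source that asks the iterator for one element at a time.
  def fromIterator[A](it: Iterator[A]): Proc[A] =
    Await[Boolean, A](
      () => it.hasNext,
      more => if (more) Emit(Seq(it.next()), () => fromIterator(it)) else Halt
    )
}
```

The point of the design is that a stream is only a description: nothing is read until an interpreter such as `runLog` walks the structure.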
Composition Options
• Process1[I, O]
  - Stateful transducer, converts I => O (with state)
  - Combine with “pipe”
• Channel[F[_], I, O]
  - Takes I values, runs function I => F[O]
  - Combine with “through” or “observe”
• Sink[F[_], I]
  - Takes I values, runs function I => F[Unit]
  - Add with “to”
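The three composition options can be chained in one pipeline. The sketch below assumes a scalaz-stream 0.7/0.8-era API on scalaz 7.1, where `process1.lift`, `channel.lift`, `sink.lift`, `toSource` and `Task#run` are available; the names `CompositionSketch`, `double`, `describe` and `record` are hypothetical.

```scala
import scalaz.concurrent.Task
import scalaz.stream._

object CompositionSketch {
  def demo: Vector[String] = {
    val seen = Vector.newBuilder[String]

    // Process1: a pure transducer, attached with "pipe".
    val double: Process1[Int, Int] = process1.lift[Int, Int](_ * 2)

    // Channel: an effectful function I => F[O], attached with "through".
    val describe: Channel[Task, Int, String] =
      channel.lift[Task, Int, String](i => Task.delay(s"value=$i"))

    // Sink: an effectful consumer I => F[Unit], attached with "to".
    val record: Sink[Task, String] =
      sink.lift[Task, String](s => Task.delay { seen += s; () })

    Process.emitAll(Seq(1, 2, 3)).toSource
      .pipe(double)
      .through(describe)
      .to(record)
      .run  // Task[Unit]: a description of running the whole stream
      .run  // actually executes the Task (blocking)

    seen.result()
  }
}
```

On later scalaz versions the final `Task#run` would need to be replaced by the equivalent unsafe-run method.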
Implementing Server-sent Events (SSE)
This specification defines an API for opening an HTTP connection for
receiving push notifications from a server in the form of DOM events.
case class SSEEvent(eventName: Option[String], data: String)
Example streams

data: This is the first message.

data: This is the second message, it
data: has two lines.

data: This is the third message.

event: add
data: 73857293

event: remove
data: 2153

event: add
data: 113411
We want this type:
Process[Task, SSEEvent]
“A potentially infinite stream of SSE event messages”
async.boundedQueue[A]
• Items added to queue are removed in same order
• Connect different asynchronous domains
• Methods:
  def enqueueOne(a: A): Task[Unit]
  def dequeue: Process[Task, A]
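A small round trip through the queue shows the ordering guarantee and how closing the queue ends the dequeued stream. This is a sketch assuming the scalaz-stream 0.7/0.8-era `async.boundedQueue` API with the default implicit `Strategy` and `Task#run` from scalaz 7.1; `QueueSketch` and `roundTrip` are hypothetical names.

```scala
import scalaz.stream.async

object QueueSketch {
  def roundTrip: Vector[Int] = {
    val queue = async.boundedQueue[Int](10)

    // Each enqueueOne is only a Task; calling run performs it.
    queue.enqueueOne(1).run
    queue.enqueueOne(2).run

    // Closing lets dequeue drain what is already queued and then halt.
    queue.close.run

    queue.dequeue.runLog.run.toVector
  }
}
```

Without the `close`, `dequeue` would keep waiting for more items and `runLog` would never complete.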
HTTP Client Implementation
• Use Apache AsyncHTTPClient
• Hook into onBodyPartReceived callback
• Use async.boundedQueue to convert chunks into stream
def httpRequest(client: AsyncHttpClient, url: String): Process[Task, ByteVector] = {
  // Bounded queue (capacity 10) bridges the client's callbacks and the stream
  val contentQueue = async.boundedQueue[ByteVector](10)
  val req = client.prepareGet(url)
  req.execute(new AsyncCompletionHandler[Unit] {
    override def onBodyPartReceived(content: HttpResponseBodyPart) = {
      // Running the Task blocks the callback thread until the chunk is enqueued
      contentQueue.enqueueOne(
        ByteVector(content.getBodyByteBuffer)
      ).run
      super.onBodyPartReceived(content)
    }
  })
  contentQueue.dequeue
}
How to terminate stream?
req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onCompleted(r: Response): Unit = {
    logger.debug("Request completed")
    contentQueue.close.run
  }
  ...
})
How to terminate stream with errors?
req.execute(new AsyncCompletionHandler[Unit] {
  ...
  override def onThrowable(t: Throwable): Unit = {
    logger.debug("Request failed with error", t)
    contentQueue.fail(t).run
  }
  ...
})
Process[Task, ByteVector]
Process[Task, SSEEvent]
Process[Task, Underpants]
Step 1
Step 2
Step 3
• Split at line endings
• Convert ByteVector into UTF-8 Strings
• Partition by SSE “tag” (“data”, “id”, “event”, …)
• Emit accumulated SSE data when blank line found
• Split at line endings: ByteVector => Seq[ByteVector]
• Convert ByteVector into UTF-8 Strings: ByteVector => String
• Partition by SSE “tag” (“data”, “id”, “event”, …): String => SSEMessage
• Emit accumulated SSE data when blank line found: SSEMessage => SSEEvent
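The four steps above can be sketched with plain Scala collections instead of Process1 transducers. This is an illustration under assumptions, not the talk's implementation. The slides do not show `SSEMessage`, so its shape here (`Field` and `Blank`) is hypothetical, as are the function bodies. Only "\n" and "\r\n" line endings are handled, and tags other than "event" and "data" are ignored.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object SsePipelineSketch {
  case class SSEEvent(eventName: Option[String], data: String)

  // Hypothetical message shape: one "tag: value" line, or a blank line.
  sealed trait SSEMessage
  case class Field(tag: String, value: String) extends SSEMessage
  case object Blank extends SSEMessage

  // Steps 1-2: chunks do not align with lines, so the unfinished tail of
  // each chunk is carried into the next one before decoding as UTF-8.
  def splitLines(chunks: Seq[Array[Byte]]): Vector[String] = {
    val start = (Vector.empty[String], Array.empty[Byte])
    val (lines, rest) = chunks.foldLeft(start) { case ((done, carry), chunk) =>
      val buf = carry ++ chunk
      val lastNl = buf.lastIndexOf('\n'.toByte)
      if (lastNl < 0) (done, buf)
      else {
        val complete =
          new String(buf, 0, lastNl, UTF_8).split("\n", -1).map(_.stripSuffix("\r"))
        (done ++ complete, buf.drop(lastNl + 1))
      }
    }
    if (rest.isEmpty) lines else lines :+ new String(rest, UTF_8)
  }

  // Step 3: split a line into its tag and value (one leading space is dropped).
  def parseLine(line: String): SSEMessage =
    if (line.isEmpty) Blank
    else line.indexOf(':') match {
      case -1 => Field(line, "")
      case i =>
        val raw = line.substring(i + 1)
        Field(line.substring(0, i), if (raw.startsWith(" ")) raw.substring(1) else raw)
    }

  // Step 4: accumulate fields and emit an event at each blank line.
  def toEvents(messages: Seq[SSEMessage]): Vector[SSEEvent] = {
    val start = (Option.empty[String], Vector.empty[String], Vector.empty[SSEEvent])
    val (_, _, events) = messages.foldLeft(start) {
      case ((name, data, out), Blank) =>
        if (data.isEmpty) (None, Vector.empty, out)
        else (None, Vector.empty, out :+ SSEEvent(name, data.mkString("\n")))
      case ((_, data, out), Field("event", v))   => (Some(v), data, out)
      case ((name, data, out), Field("data", v)) => (name, data :+ v, out)
      case (state, _)                            => state
    }
    events
  }
}
```

In the streaming version each fold would become a stateful Process1, so the same logic runs incrementally in constant memory.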
Handling Network Errors
• If a network error occurs:
• Sleep a while
• Set up the connection again and keep going
• Append the same Process definition again!
def sseStream: Process[Task, SSEEvent] = {
  httpRequest(client, url)
    .pipe(splitLines)
    .pipe(emitMessages)
    .pipe(emitEvents)
    .partialAttempt {
      case e: ConnectException => retryRequest
      case e: TimeoutException => retryRequest
    }
    .map(_.merge)
}

def retryRequest: Process[Task, SSEEvent] = {
  time.sleep(retryTime) ++ sseStream
}
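The same "sleep, then try the same definition again" idea can be shown without streams. The plain-Scala sketch below is an analogy, not the scalaz-stream mechanism. It bounds the number of attempts, which is an assumption added here (the definition above retries without a limit), and `RetrySketch`, `retrying`, `attemptsLeft` and `pauseMillis` are hypothetical names.

```scala
import java.net.ConnectException
import java.util.concurrent.TimeoutException
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Runs `connect`; on a connection or timeout error, sleeps and tries again.
  // Any other error, or running out of attempts, rethrows the failure.
  @annotation.tailrec
  def retrying[A](attemptsLeft: Int, pauseMillis: Long)(connect: () => A): A =
    Try(connect()) match {
      case Success(a) => a
      case Failure(_: ConnectException | _: TimeoutException) if attemptsLeft > 1 =>
        Thread.sleep(pauseMillis)
        retrying(attemptsLeft - 1, pauseMillis)(connect)
      case Failure(e) => throw e
    }
}
```

As with partialAttempt, only the listed exception types trigger a retry; everything else still fails the caller.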
Usage
sseStream(client, url) pipe jsonToString to io.stdOutLines
Questions?