Dataflow: Declarative concurrency in Ruby

Posted on 15-Jan-2015



Description

While Ruby is known for its flexibility due to high mutability and meta-programming capability, these features make writing thread-safe programs using manual locking very error-prone. For this reason, some people are switching to languages with easier-to-manage concurrency paradigms, such as Erlang/Scala’s message passing or Clojure/Haskell’s Software Transactional Memory (STM).

This talk is about Dataflow, a pure Ruby gem that adds dataflow variables to the Ruby language. Dataflow variables are write-once (or may be written multiple times with the same value), and they suspend execution in the current thread/context if used before being assigned/bound. We will explore how this technique makes writing concurrent but thread-safe code easy, even making it possible to write tests that spawn threads without needing to worry.

Declarative concurrency is a relatively unknown programming model that is an alternative to message passing and STM. Ruby’s malleability makes it an ideal host for this model. Besides performance implications, dataflow variables also have an important impact on declarative program modeling. The talk will also go over the differences in performance and memory usage of the library across various Ruby implementations.

Transcript of Dataflow: Declarative concurrency in Ruby

Dataflow

The declarative concurrent programming model

Larry Diehl

{:larrytheliquid => %w[.com github twitter]}

Outline

Purpose of presentation

Gradual explanation of concepts

Helpful tips

Purpose

Lexical Scope

foo = :foo
define_method :foo do
  foo
end

Dynamic Scope

def foo
  @foo
end

Mutability

def initialize
  @foo = :foo
end

def foo
  @foo
end

Mutability

def foo
  @foo = :foo
  @foo
end

Mutability+Concurrency

def initialize
  Thread.new { loop { @foo = :shazbot } }
end

def foo
  @foo = :foo
  @foo
end

The Declarative Model

Declarative Synchronous

my_var = :bound
my_var = :rebind # NOT ALLOWED!

Declarative Synchronous

local do |my_var|
  my_var.object_id # thread sleeps
end
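The suspension on this slide can be imitated with nothing but Ruby's standard threading primitives. The class below is a minimal sketch, not the gem's implementation: `SketchVar`, `bind` and `value` are hypothetical names chosen for this example. The gem's variables proxy ordinary method calls such as `object_id`, while this sketch only waits inside an explicit `value` call, parking the reader on a `ConditionVariable` until another thread binds it.

```ruby
# Hypothetical minimal dataflow variable; not the dataflow gem's implementation.
class SketchVar
  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @bound = false
  end

  # Bind exactly once and wake every thread waiting in #value.
  def bind(value)
    @lock.synchronize do
      raise ArgumentError, "already bound" if @bound
      @value = value
      @bound = true
      @cond.broadcast
    end
  end

  # Suspend the calling thread until the variable has been bound.
  def value
    @lock.synchronize do
      @cond.wait(@lock) until @bound
      @value
    end
  end
end

var = SketchVar.new
reader = Thread.new { var.value } # sleeps: var is still unbound
sleep 0.1                         # give the reader time to block
var.bind(:bound)                  # wakes the reader
p reader.value                    # => :bound
```

Unlike the gem, this sketch rejects every second bind, even with an equal value.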

Declarative Synchronous

local do |my_var|
  unify my_var, :bound
  unify my_var, :rebind # => Dataflow::UnificationError,
                        #    ":bound != :rebind"
end
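The rule behind this error, as the description puts it, is write-once "or write multiple times with the same value". A sketch of just that rule might look like the following; `SketchCell` and `SketchUnificationError` are hypothetical stand-ins, and reading a `SketchCell` deliberately does not block, so it shows only the unification check.

```ruby
# Hypothetical stand-in for Dataflow::UnificationError.
class SketchUnificationError < StandardError; end

# Hypothetical write-once cell showing only the unification rule
# (unlike a real dataflow variable, reading it here never blocks).
class SketchCell
  def initialize
    @lock = Mutex.new
    @bound = false
  end

  # The first call binds; later calls must supply an equal value or raise.
  def unify(other)
    @lock.synchronize do
      if @bound
        raise SketchUnificationError, "#{@value.inspect} != #{other.inspect}" if @value != other
      else
        @value = other
        @bound = true
      end
    end
    self
  end

  attr_reader :value
end

cell = SketchCell.new
cell.unify(:bound)
cell.unify(:bound) # equal value: accepted, nothing changes
begin
  cell.unify(:rebind)
rescue SketchUnificationError => e
  puts e.message   # prints ":bound != :rebind"
end
```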

Declarative Synchronous

class MyClass
  declare :my_var

  def initialize
    unify my_var, :bound
  end
end

Declarative Concurrent (MAGIC)

Declarative Concurrent

local do |my_var|
  Thread.new { unify my_var, :bound }
  my_var.should == :bound
end

Dependency Resolution

local do |sentence, middle, tail|
  Thread.new { unify middle, "base are belong #{tail}" }
  Thread.new { unify tail, "to us" }
  Thread.new { unify sentence, "all your #{middle}" }
  sentence.should == "all your base are belong to us"
end
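The same dependency resolution can be reproduced with a hand-rolled variable, assuming only the standard `Mutex` and `ConditionVariable`. `SketchVar` is hypothetical; where the gem's variables forward arbitrary method calls, this one forwards only `to_s`, which is enough because string interpolation calls `to_s` and therefore waits for the value. The threads are started in an order where each one blocks on a variable that is not yet bound.

```ruby
# Hypothetical minimal dataflow variable (readers block until bound).
class SketchVar
  def initialize
    @lock, @cond, @bound = Mutex.new, ConditionVariable.new, false
  end

  def bind(v)
    @lock.synchronize do
      raise ArgumentError, "already bound" if @bound
      @value, @bound = v, true
      @cond.broadcast
    end
  end

  def value
    @lock.synchronize do
      @cond.wait(@lock) until @bound
      @value
    end
  end

  # Interpolation calls to_s, so "#{var}" also waits until var is bound.
  def to_s
    value.to_s
  end
end

sentence, middle, tail = Array.new(3) { SketchVar.new }

# Each thread blocks only on the variables it interpolates.
threads = [
  Thread.new { sentence.bind("all your #{middle}") },
  Thread.new { middle.bind("base are belong #{tail}") },
  Thread.new { tail.bind("to us") }
]

puts sentence.value # prints "all your base are belong to us"
threads.each(&:join)
```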

Asynchronous Output

def Worker.async(output=nil)
  Thread.new do
    result = # do hard work
    unify output, result if output
  end
end

local do |output|
  Worker.async(output)
  output.should == # hard work result
end

Asynchronous Output

local do |output|
  flow(output) do
    # do hard work
  end
  output.should == # hard work result
end
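Judging from these two slides, `flow(output) { ... }` packages the `Worker.async` pattern: run the block on its own thread and bind whatever it returns to `output`. A sketch under that assumption follows; `sketch_flow` and `SketchVar` are hypothetical names, and summing a range stands in for the elided "hard work".

```ruby
# Hypothetical minimal dataflow variable (readers block until bound).
class SketchVar
  def initialize
    @lock, @cond, @bound = Mutex.new, ConditionVariable.new, false
  end

  def bind(v)
    @lock.synchronize do
      raise ArgumentError, "already bound" if @bound
      @value, @bound = v, true
      @cond.broadcast
    end
  end

  def value
    @lock.synchronize do
      @cond.wait(@lock) until @bound
      @value
    end
  end
end

# Hypothetical helper in the spirit of flow(output) { ... }:
# run the block on a new thread and bind its result to output.
def sketch_flow(output)
  Thread.new { output.bind(yield) }
end

output = SketchVar.new
sketch_flow(output) { (1..1_000).sum } # stand-in for the hard work
puts output.value                      # blocks until bound; prints 500500
```

The caller never joins the worker thread; reading `output` is the only synchronization point.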

Anonymous variables

{'google.com' => Dataflow::Variable.new,
 'bing.com'   => Dataflow::Variable.new}.map do |domain, var|
  Thread.new do
    unify var, open("http://#{domain}").read
  end
  var
end

need_later

%w[google.com bing.com].map do |domain|
  need_later { open("http://#{domain}").read }
end
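`need_later` behaves like a transparent future: the work starts right away, and the caller gets back something it can use as if it were the result. A rough sketch of that idea, assuming a `BasicObject` proxy whose `method_missing` waits on a background thread, is below. `SketchFuture` and `sketch_need_later` are hypothetical, and a `sleep` plus a fake page string stands in for the network fetch.

```ruby
# Hypothetical transparent future, loosely modeled on need_later.
# The work starts immediately on its own thread; any method call on the
# future waits for the result and is forwarded to it. Methods that
# BasicObject itself defines (==, !, equal?, __send__) are NOT forwarded.
class SketchFuture < BasicObject
  def initialize(&work)
    @thread = ::Thread.new(&work)
  end

  def method_missing(name, *args, &block)
    @thread.value.__send__(name, *args, &block)
  end
end

def sketch_need_later(&work)
  SketchFuture.new(&work)
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Stand-in for open("http://#{domain}").read; no network access here.
pages = %w[google.com bing.com].map do |domain|
  sketch_need_later { sleep 0.2; "<html>#{domain}</html>" }
end

sizes = pages.map { |page| page.length } # each call waits for its own page
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
p sizes # => [23, 21]
```

Because both fetches run concurrently, the total wait is close to one `sleep` rather than two.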

Chunked Sequential Processing

(1..100).each_slice(10).map do |chunk|
  sleep(1)
  chunk.inject(&:+)
end.inject(&:+) # => ~10s

Chunked Parallel Processing

(1..100).each_slice(10).map do |chunk|
  need_later do
    sleep(1)
    chunk.inject(&:+)
  end
end.inject(&:+) # => ~1s
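The speedup comes from the ten chunks sleeping at the same time. A plain-Ruby analogue, with `Thread#value` standing in for the gem's `need_later` and a shorter `sleep` to keep it quick, might look like this. On MRI this overlap works because `sleep` (like blocking I/O) releases the global lock; CPU-bound chunks would not run in parallel there.

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# One thread per chunk of ten numbers; sleep(0.1) stands in for blocking work.
partials = (1..100).each_slice(10).map do |chunk|
  Thread.new do
    sleep(0.1)
    chunk.sum
  end
end

total = partials.sum(&:value) # Thread#value joins each thread, returns its sum
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts total            # prints 5050
puts elapsed.round(1) # about 0.1, not the 1.0 a sequential loop would take
```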

Leaving Declarative via Async

Ports & Streams

local do |port, stream|
  unify port, Dataflow::Port.new(stream)
  port.send 1
  port.send 2
  stream.take(2).should == [1, 2]
end
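A stream here is a list whose tail is still an unbound variable, and a port appends by binding that tail. The sketch below assumes exactly that cons-cell model; `SketchVar`, `Cell`, `SketchPort` and `take` are hypothetical names, and `deliver` replaces the gem's `send` so the sketch does not shadow `Object#send`. Unlike a `Queue`, reading the stream does not consume it, so a second reader sees the same messages.

```ruby
# Hypothetical single-assignment variable: readers block until bound.
class SketchVar
  def initialize
    @lock, @cond, @bound = Mutex.new, ConditionVariable.new, false
  end

  def bind(v)
    @lock.synchronize do
      raise ArgumentError, "already bound" if @bound
      @value, @bound = v, true
      @cond.broadcast
    end
  end

  def value
    @lock.synchronize do
      @cond.wait(@lock) until @bound
      @value
    end
  end
end

# A stream is a chain of cells; each cell's tail is a still-unbound variable.
Cell = Struct.new(:head, :tail)

# Hypothetical port: appends by binding the stream's current unbound tail.
class SketchPort
  def initialize(stream)
    @lock = Mutex.new
    @tail = stream
  end

  def deliver(message)
    @lock.synchronize do
      next_tail = SketchVar.new
      @tail.bind(Cell.new(message, next_tail))
      @tail = next_tail
    end
  end
end

# Read the first n messages, blocking on any that have not arrived yet.
def take(stream, n)
  Array.new(n) do
    cell = stream.value
    stream = cell.tail
    cell.head
  end
end

stream = SketchVar.new
port = SketchPort.new(stream)
early_reader = Thread.new { take(stream, 3) } # blocks until messages arrive
%w[x y z].each { |letter| port.deliver(letter) }

p early_reader.value # => ["x", "y", "z"]
p take(stream, 3)    # => ["x", "y", "z"] (reading does not consume)
```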

Ports & Streams (async)

local do |port, stream|
  unify port, Dataflow::Port.new(stream)
  Thread.new do
    stream.each do |message|
      puts "received: #{message}"
    end
  end
  %w[x y z].each do |letter|
    Thread.new { port.send letter }
  end
  stream.take(3).sort.should == %w[x y z]
end

FutureQueue

local do |queue, first, second, third|
  unify queue, FutureQueue.new
  queue.pop first
  queue.pop second
  queue.push 1
  queue.push 2
  queue.push 3
  queue.pop third
  [first, second, third].should == [1, 2, 3]
end
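The key property of this slide is that `pop` may come before the matching `push`: the popped variable is simply bound later. The talk's own `FutureQueue` uses two ports and a barrier thread; the sketch below gets the same observable ordering with a single `Mutex` and two arrays instead. `SketchVar` and `SketchFutureQueue` are hypothetical names.

```ruby
# Hypothetical single-assignment variable: readers block until bound.
class SketchVar
  def initialize
    @lock, @cond, @bound = Mutex.new, ConditionVariable.new, false
  end

  def bind(v)
    @lock.synchronize do
      raise ArgumentError, "already bound" if @bound
      @value, @bound = v, true
      @cond.broadcast
    end
  end

  def value
    @lock.synchronize do
      @cond.wait(@lock) until @bound
      @value
    end
  end
end

# Hypothetical future queue: pop(var) binds var now or on a later push.
class SketchFutureQueue
  def initialize
    @lock = Mutex.new
    @values = []  # pushed values nobody has popped yet
    @waiting = [] # variables from pop calls still waiting for a value
  end

  def push(x)
    @lock.synchronize do
      if @waiting.empty?
        @values << x
      else
        @waiting.shift.bind(x)
      end
    end
  end

  def pop(var)
    @lock.synchronize do
      if @values.empty?
        @waiting << var
      else
        var.bind(@values.shift)
      end
    end
  end
end

queue = SketchFutureQueue.new
first, second, third = Array.new(3) { SketchVar.new }
queue.pop(first)
queue.pop(second)
queue.push(1)
queue.push(2)
queue.push(3)
queue.pop(third)
p [first, second, third].map(&:value) # => [1, 2, 3]
```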

Actors

Ping = Actor.new {
  3.times {
    case receive
    when :ping
      puts "Ping"
      Pong.send :pong
    end
  }
}

Pong = Actor.new {
  3.times {
    case receive
    when :pong
      puts "Pong"
      Ping.send :ping
    end
  }
}

Ping.send :ping
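For comparison, the same ping-pong can be built with a thread per actor and a standard `Queue` as its mailbox, where `receive` is just a blocking `pop`. `SketchActor` is a hypothetical class, not the gem's `Actor`; it names its send method `deliver` to avoid shadowing `Object#send`, and it records output in a thread-safe `Queue` instead of calling `puts` so the interleaving can be checked.

```ruby
# Hypothetical minimal actor: one thread plus a Queue used as its mailbox.
class SketchActor
  def initialize(&body)
    @mailbox = Queue.new
    @thread = Thread.new { instance_eval(&body) }
  end

  def deliver(message)
    @mailbox << message
  end

  def join
    @thread.join
  end

  private

  # Blocks the actor's own thread until a message arrives.
  def receive
    @mailbox.pop
  end
end

LOG = Queue.new # thread-safe record of what each actor "printed"

Ping = SketchActor.new do
  3.times do
    case receive
    when :ping
      LOG << "Ping"
      Pong.deliver :pong
    end
  end
end

Pong = SketchActor.new do
  3.times do
    case receive
    when :pong
      LOG << "Pong"
      Ping.deliver :ping
    end
  end
end

Ping.deliver :ping
[Ping, Pong].each(&:join)
log = Array.new(LOG.size) { LOG.pop }
p log # => ["Ping", "Pong", "Ping", "Pong", "Ping", "Pong"]
```

Pong's last reply lands in Ping's mailbox after Ping has finished its three receives; it is simply never read.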

by_need

def baz(num)
  might_get_used = by_need { Factory.gen }
  might_get_used.value if num%2 == 0
end
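`by_need` defers work until the value is actually requested, so odd numbers never pay for `Factory.gen`. A thread-safe lazy value along those lines can be sketched as follows; `SketchByNeed` is hypothetical, and `SketchFactory` is a hypothetical stand-in for the slide's `Factory` that counts how often `gen` really runs.

```ruby
# Hypothetical lazy value: the block runs at most once, on the first #value call.
class SketchByNeed
  def initialize(&work)
    @work = work
    @lock = Mutex.new
    @done = false
  end

  def value
    @lock.synchronize do
      unless @done
        @result = @work.call
        @done = true
      end
      @result
    end
  end
end

# Hypothetical stand-in for Factory, counting how often gen actually runs.
module SketchFactory
  @calls = 0

  class << self
    attr_reader :calls

    def gen
      @calls += 1
      :generated
    end
  end
end

def baz(num)
  might_get_used = SketchByNeed.new { SketchFactory.gen }
  might_get_used.value if num.even?
end

p baz(1)              # => nil (odd: gen never runs)
p baz(2)              # => :generated
p SketchFactory.calls # => 1
```

The `Mutex` matters only when several threads might call `value` at once; it guarantees the block still runs a single time.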

Tips

Modular

local do |my_var|
  Thread.new { unify my_var, :bound }
  # my_var.wait
  my_var.should == :bound
end

Debugging

local do |my_var|
  my_var.inspect # => #<Dataflow::Variable:2637860 unbound>
end

Class/Module methods

Dataflow.local do |my_var|
  Dataflow.async do
    Dataflow.unify my_var, :bound
  end
  my_var.should == :bound
end

Use Cases

General purpose

concurrency for elegant program structure with respect to coordination

concurrency to make use of extra processors/cores (depending on Ruby implementation)

Web development

worker daemons

concurrently munging together data from various REST APIs

Ruby Implementations

Pure Ruby library, should work on any implementation

JRuby in particular has a great GC, no GIL, native threads, and a tunable threadpool option.

Rubinius has more code written in Ruby, so it proxies more method calls (e.g. Array#flatten).

class FutureQueue
  include Dataflow
  declare :push_port, :pop_port

  def initialize
    local do |pushed, popped|
      unify push_port, Dataflow::Port.new(pushed)
      unify pop_port, Dataflow::Port.new(popped)

      Thread.new {
        loop do
          barrier pushed.head, popped.head
          unify popped.head, pushed.head
          pushed, popped = pushed.tail, popped.tail
        end
      }
    end
  end

  def push(x)
    push_port.send x
  end

  def pop(x)
    pop_port.send x
  end
end

The End

sudo gem install dataflow

http://github.com/larrytheliquid/dataflow

freenode: #dataflow-gem