MBrace: Large-scale cloud computation with F# (CUFP 2014)

Presentation at CUFP 2014, Gothenburg.

Page 1

Eirik Tsarpalis – Nessos

MBrace: Large-scale cloud computation with F#

Page 2

About Nessos

ISV / Consultancy based in Athens, Greece.

.NET framework, specializing in F#.

Business applications

◦ Application framework development

◦ Technology migration

◦ Customized software systems

R&D Division

◦ Open Source development

◦ Distributed computation

◦ Optimization frameworks

Page 3

What is MBrace?

A Programming Model.

◦ Large-scale distributed computation.

◦ Inspired by F# asynchronous workflows.

◦ Declarative, compositional, higher-order.

A Cluster Infrastructure.

◦ Based on the .NET framework.

◦ Elastic, fault-tolerant, multitasking.

◦ Open Source – available on GitHub.
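
The link to asynchronous workflows can be made concrete with plain F#. The sketch below uses only the async builder from FSharp.Core, not MBrace: it has the same let!/return shape as the cloud workflows in the rest of the talk, with Async.Parallel standing in for the parallel combinators. The names square and sumOfSquares are hypothetical and chosen for illustration only.

```fsharp
// Plain F# async workflows (FSharp.Core only); no MBrace involved.
// `square` and `sumOfSquares` are illustrative names, not part of any library.
let square (x : int) : Async<int> = async { return x * x }

let sumOfSquares : Async<int> =
    async {
        // Start one async computation per input and wait for all results.
        let! squares = [| 1 .. 10 |] |> Array.map square |> Async.Parallel
        return Array.sum squares
    }

// Run the workflow locally and block until it completes.
let total = Async.RunSynchronously sumOfSquares
printfn "%d" total
```

A cloud workflow reads almost identically; the difference is that the builder decides where the computation runs, which here is the local thread pool.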

Page 4

Hello World

The MBrace Programming Model

    // val hello : Cloud<int>
    let hello =
        cloud {
            printfn "hello, world!"
            return 21
        }

    let result = MBrace.Run hello

Page 5

Sequential Composition

The MBrace Programming Model

    let first = cloud { return 15 }
    let second = cloud { return 27 }

    cloud {
        let! x = first
        let! y = second
        return x + y
    }

Page 6

Sequential fold

The MBrace Programming Model

    // val foldM : ('S -> 'T -> Cloud<'S>) -> 'S -> 'T list -> Cloud<'S>
    let rec foldM f s ts =
        cloud {
            match ts with
            | [] -> return s
            | t :: ts' ->
                let! s' = f s t
                return! foldM f s' ts'
        }

Page 7

Parallel Composition

The MBrace Programming Model

    // val (<||>) : Cloud<'T> -> Cloud<'S> -> Cloud<'T * 'S>
    cloud {
        let first = cloud { return 15 }
        let second = cloud { return 27 }
        let! x,y = first <||> second
        return x + y
    }

Page 8

Parallel Composition (Variadic)

The MBrace Programming Model

    // val Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>
    cloud {
        let sqr x = cloud { return x * x }
        let jobs = Array.map sqr [|1 .. 100|]
        let! sqrs = Cloud.Parallel jobs
        return Array.sum sqrs
    }

Page 9

Exception handling

The MBrace Programming Model

    open System

    let first = cloud { return 17 }
    let second = cloud { return 25 / 0 }

    cloud {
        try
            let! x,y = first <||> second
            return Some(x + y)
        with :? DivideByZeroException -> return None
    }

Page 10

Demo

Page 11

Parallel fold

The MBrace Programming Model

    let parFold (folder : 'S -> 'T -> 'S)
                (combiner : 'S -> 'S -> 'S)
                (id : 'S)
                (inputs : 'T []) =
        cloud {
            // Fold one chunk sequentially on a single worker.
            let seqFold (inputs : 'T []) =
                cloud { return Array.fold folder id inputs }

            let! n = Cloud.GetWorkerCount ()
            // Array.partition is taken here to split the input into n chunks;
            // it is not FSharp.Core's predicate-based Array.partition.
            let chunks : 'T [] [] = Array.partition n inputs
            let! results = chunks |> Array.map seqFold |> Cloud.Parallel
            return Array.reduce combiner results
        }

Page 12

MBrace Data Primitives

Storage entities represented by references.

Conceptually similar to ref cells.

Creation only admissible through the monad.

Immutable*.

Support for SQL, Windows Azure.

Cloud Storage interface

Page 13

CloudRef

MBrace Data Primitives

    module CloudRef =
        begin
            val New : 'T -> Cloud<CloudRef<'T>>
            val Read : CloudRef<'T> -> 'T
        end

Page 14

CloudFile

MBrace Data Primitives

    module CloudFile =
        begin
            val New : (Stream -> unit) -> Cloud<CloudFile>
            val Read : CloudFile -> (Stream -> 'T) -> Cloud<'T>
            val Enumerate : string -> Cloud<CloudFile []>
        end

Page 15

Demo

Page 16

Performance

We tested MBrace against Hadoop.

Tests were staged on Windows Azure.

Clusters of 4, 8, 16 and 32 Large Azure instances.

Two algorithms were tested, grep and k-means.

Source code available on GitHub.

Page 17

Distributed grep

Performance

Find occurrences of given pattern in text files.

Straightforward Map-Reduce algorithm.

Input data was 32, 64, 128 and 256 GB of text.
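
To make the map-reduce shape concrete, here is a minimal local sketch in plain F# async workflows. It is not the benchmark code (the slides say that is available on GitHub) and does not use MBrace. The function names countMatches and grepCount, the sample documents, and the choice to count matching lines rather than individual matches are all assumptions made for illustration.

```fsharp
open System.Text.RegularExpressions

// Map step: count the lines of one document that match the pattern.
// Counting matching lines (not individual matches) is an assumption.
let countMatches (pattern : string) (text : string) : Async<int> =
    async {
        let regex = Regex(pattern)
        return
            text.Split([| '\n' |])
            |> Array.filter (fun line -> regex.IsMatch(line))
            |> Array.length
    }

// Reduce step: run the map step on every document in parallel,
// then add up the per-document counts.
let grepCount (pattern : string) (documents : string []) : Async<int> =
    async {
        let! counts = documents |> Array.map (countMatches pattern) |> Async.Parallel
        return Array.sum counts
    }

// Hypothetical in-memory input standing in for the text files.
let documents = [| "alpha\nbeta\ngamma"; "beta\nbeta"; "delta" |]
let matches = grepCount "beta" documents |> Async.RunSynchronously
printfn "%d" matches
```

In a cloud setting the map step would read each file from storage on a worker; only the small integer counts travel back for the reduce.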

Page 19

Distributed grep

Performance

Page 20

K-means

Performance

Centroid computation out of a set of vectors.

Iterative algorithm.

Not naturally describable in Map-Reduce workflows.

Hadoop implementation using Apache Mahout.

Input was 10^6 randomly generated 100-dimensional points.
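
The iterative structure is easiest to see in code. The following is a minimal single-machine sketch of the standard Lloyd iteration in plain F#, written for illustration; it is neither the MBrace benchmark implementation nor the Mahout one. All names (Point, nearest, step, kmeans), the fixed iteration count, and the tiny two-dimensional sample are assumptions.

```fsharp
// Local sketch of Lloyd's k-means iteration; not the benchmark code.
type Point = float []

let distanceSquared (a : Point) (b : Point) : float =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b

// Index of the centroid closest to point p.
let nearest (centroids : Point []) (p : Point) : int =
    centroids
    |> Array.mapi (fun i c -> i, distanceSquared c p)
    |> Array.minBy snd
    |> fst

// Component-wise mean of a non-empty group of points.
let mean (points : Point []) : Point =
    let n = float points.Length
    Array.init points.[0].Length (fun d ->
        (points |> Array.sumBy (fun p -> p.[d])) / n)

// One iteration: assign every point to its nearest centroid,
// then recompute each centroid as the mean of its group.
// A centroid with no assigned points is left unchanged.
let step (points : Point []) (centroids : Point []) : Point [] =
    let groups = points |> Array.groupBy (nearest centroids) |> dict
    centroids
    |> Array.mapi (fun i c ->
        match groups.TryGetValue i with
        | true, members -> mean members
        | _ -> c)

// Each iteration depends on the previous one; a convergence test
// is omitted and a fixed iteration count is used instead.
let rec kmeans (iterations : int) (points : Point []) (centroids : Point []) : Point [] =
    if iterations <= 0 then centroids
    else kmeans (iterations - 1) points (step points centroids)

// Hypothetical sample: two obvious clusters in two dimensions.
let samplePoints : Point [] =
    [| [| 0.0; 0.0 |]; [| 0.0; 2.0 |]; [| 10.0; 10.0 |]; [| 10.0; 12.0 |] |]
let initial : Point [] = [| [| 0.0; 0.0 |]; [| 10.0; 10.0 |] |]
let centroids = kmeans 5 samplePoints initial
```

Because every call to step needs the centroids produced by the previous call, the loop itself is sequential; only the work inside one step (assignment and averaging) is a candidate for distribution.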

Page 21

K-means

Performance

Page 22

Conclusions

Declarative, composable computation through the cloud monad.

Explicit, dynamic control over parallelism patterns and granularity.

Exception handling!

On-the-fly deployment through the F# REPL.

Open Source.

Page 23

http://m-brace.net

Thank you!