Dr. Chris Musselle – Consultant cmusselle@mango-solutions.com R Meets Julia Dr Chris Musselle.

Post on 27-Dec-2015

Dr. Chris Musselle – Consultant cmusselle@mango-solutions.com

R Meets Julia

Dr Chris Musselle

Outline

• Julia – What, So What, When?
• Julia – Where it's currently at
• Julia and R
• Case Study: Calculating String Similarity

- julialang.org

• A flexible dynamic language appropriate for scientific and numerical computing.

• Arrived Feb 2012 after 2 years of development at MIT.
• Julia 0.3 - released Aug 2014.
• Free and open source (MIT Licensed)

Language Features

• Performance comparable to compiled languages.
• Designed with distributed computing in mind.
• Dynamic typing, optional type declarations, multiple dispatch.
• Libraries written in Julia, git-based package management.
• Direct calling of C and Fortran libraries.
• Interactive REPL (“Read-Eval-Print Loop”).

The Vision

“We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as MATLAB, as good at gluing programs together as the shell.

… something that provides the distributed power of Hadoop - without the kilobytes of boilerplate Java and XML”

--- Julia’s Authors

Source: http://julialang.org/blog/2012/02/why-we-created-julia/

Too Good to be True?

• Scientific computing, though requiring high performance, has shifted towards dynamic languages.
  • More productive.
  • Human time is more expensive than CPU time.

• Many advancements in compiler techniques and language design over the years e.g. JIT.

• Can now greatly mitigate the performance trade-off associated with a dynamic language.

• But has required building from the ground up.

So How Fast is Fast?

Source: http://julialang.org/benchmarks/

Where’s Julia at now?

• Standard Library
  • Core Syntax, Collections and Data Structures
  • Linear Algebra, BLAS, Sparse Matrices
  • Package Manager
  • Graphics
  • Unit and Functional Testing
  • Profiling

• External Packages
  • Total of 384 external packages written by 138 primary authors.
  • http://pkg.julialang.org/

Who Uses it?

• JuliaLang – The Core language
• JuliaStats – Statistics
• JuliaOpt – Numerical Optimization Library
• JuliaSparse – Sparse Matrix Solvers
• JuliaDiff – Differentiation Tools

• JuliaWeb – Web stack tools
• JuliaGPU – GPU computing

• JuliaQuant – Financial Analysis Libraries
• JuliaAstro / JuliaQuantum – Astronomy/Physics/Chemistry

When to Use it?

• Julia allows fast prototyping of code that is also fast to execute.

• Best used to code up bespoke algorithms.
• The Julia ecosystem is in its infancy; the majority of packages focus on numerical computation.
• May need to re-implement ‘tools’ from scratch, e.g. parsers / data structures / algorithms etc.

Julia and R?

• Calling R from Julia: https://github.com/lgautier/Rif.jl

• Calling Julia from R:
  • System calls – New session each time
  • https://github.com/armgong/RJulia

Case Study: String Similarity (Edit Distance)

• The minimum number of “edit” operations needed to turn one string into another, where an edit is:
  • An insertion
  • A deletion
  • A substitution

• E.g. edits from “kitten” to “sitting” (distance 3):
  • Substitute “s” for “k” at position 1
  • Substitute “i” for “e” at position 5
  • Insert “g” at the end (position 7)
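
The three edits in this example can be checked by applying them one at a time. The sketch below is illustrative only and is not code from the talk: the helper names `substitute` and `insert` are hypothetical, and positions are taken to be 1-based, so the inserted “g” ends up at position 7, directly after the sixth character.

```python
def substitute(s: str, pos: int, ch: str) -> str:
    """Replace the character at 1-based position `pos` with `ch`."""
    return s[:pos - 1] + ch + s[pos:]


def insert(s: str, pos: int, ch: str) -> str:
    """Insert `ch` so that it ends up at 1-based position `pos`."""
    return s[:pos - 1] + ch + s[pos - 1:]


word = "kitten"
word = substitute(word, 1, "s")  # "k" becomes "s": sitten
word = substitute(word, 5, "i")  # "e" becomes "i": sittin
word = insert(word, 7, "g")      # append "g":      sitting

print(word)
```

Three single-character edits are enough here, which is why the edit distance between the two words is 3.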

Case Study: String Similarity (Edit Distance)

• This particular formulation is known as the Levenshtein Distance.

• Used the optimised “dynamic programming” approach.
  • Pseudocode available at http://en.wikipedia.org/wiki/Levenshtein_distance

• Applications
  • Spell checking
  • Computational Biology
  • Natural Language Processing
  • Speech Recognition
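
For reference, a minimal sketch of the dynamic programming approach in plain Python. This is not the code benchmarked in the talk; the function name `levenshtein` and the choice to keep only two rows of the table are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance between `a` and `b`, computed row by row."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # and the first j characters of `b`. For the empty prefix of `a`, that is
    # j insertions.
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of `a` into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

Only the previous row is needed to fill in the current one, so memory use grows with the length of one string, while the running time is still proportional to the product of the two lengths.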

Case Study: String Similarity (Edit Distance)

• Compared 5 different approaches:
  • R_lev – Written purely in R
  • R_adist – Using the built-in adist function in R
  • Julia – Written purely in Julia
  • Python_np_lev – Written in Python (using numpy)
  • Python_c_lev – Python wrapper to a C function

Results

Results (minus R lev)

Key Results

• Pure R implementation was over 10 times slower than adist and Python, and 33 times slower than Julia.

• Found Julia 2.5 to 3 times faster than Python and R
• Reading line by line <<< Reading in all at once
• Python + numpy ~ R’s built-in adist

Summary

• Julia – Certainly has great potential
  • Strengths – numerical computation in a dynamic “REPL” language with clean syntax.
  • Weaknesses – playing catch-up with tools and libraries.

• Early days for integration with other languages.
  • Calling other languages from Julia is good though.

• Don’t prototype your next algorithm in R if speed matters!

• Found Julia 2.5 to 3 times faster than Python and R

Thank You For Your Attention

Any Questions?

- julialang.org
- Calling R from Julia: https://github.com/lgautier/Rif.jl
- Calling Julia from R: https://github.com/armgong/RJulia
- Edit distance: http://en.wikipedia.org/wiki/Levenshtein_distance

What’s Next?

• Accepted GSoC projects 2014
  • Libgit2 support
  • Linear algebra for generic types
  • Julia + Light Table – IDE development
  • IJulia Interactive Widgets
  • 3D Visualization Package for Julia