
Future of AI on the JVM

Scala Days Amsterdam 2015

Adam Gibson, creator of Deeplearning4j (and 4s :)

What is AI?
● Not Terminator (despite our name)
● Many subfields
● Our focus: Machine learning

Big Data?

Problem Space
● Spam Classification
● Summarization
● Face Detection
● Eye Tracking
● Targeted Ads
● Recommendation Engines

Current State of ML
● Simpler models
● Most of industry barely uses logistic regression (see the sketch below)
● Many problems are binary
  o e.g. fraud, spam
● Some unsupervised (clustering, recommendations)
● Lots of ML frameworks on the JVM
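
Most of those binary problems come down to a weight vector, a dot product, and a threshold. A minimal plain-Scala sketch of logistic-regression scoring (the weights and features below are made up for illustration):

  object LogisticDemo extends App {
    def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

    // score = sigmoid(w . x + b); in practice w and b come from training
    def score(w: Array[Double], b: Double, x: Array[Double]): Double =
      sigmoid(w.zip(x).map { case (wi, xi) => wi * xi }.sum + b)

    // e.g. spam vs. not-spam on two hand-picked features
    val isSpam = score(Array(1.2, -0.7), b = -0.3, x = Array(3.0, 1.0)) > 0.5
    println(isSpam) // true: sigmoid(2.6) is about 0.93
  }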

ML Frameworks on JVM...
● Apache Mahout
● Spark’s MLlib
● Weka (is that R?)

ML GUIs
● Prediction.io
● Encog

Problems
● Monolithic
● Makes assumptions about data
● Hard to use
● No separation of concerns

Ring a Bell?
● We call that “monolithic”
● Separate ML concerns:
  o Data pipelines/vectorization
  o Scoring
  o Model training
  o Evaluation

Micro-Services + ML?
● Kinda like micro-services
● Reduce lock-in
● Take math, data cleaning, model training, choosing algorithms...
● ...and separate them (see the sketch below)
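
A minimal sketch of that separation as Scala traits (these names are hypothetical, not Deeplearning4j’s API):

  // Each ML concern behind its own small interface, so implementations can be swapped
  trait Vectorizer[Raw] {
    def vectorize(input: Raw): Array[Double]
  }

  trait Trainer[Model] {
    def train(features: Seq[Array[Double]], labels: Seq[Double]): Model
  }

  trait Scorer[Model] {
    def score(model: Model, features: Array[Double]): Double
  }

  trait Evaluator {
    def evaluate(predictions: Seq[Double], labels: Seq[Double]): Double
  }

A pipeline then composes whichever implementations fit the job, instead of being locked into one framework’s stack.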

Math
● Parametric models (matrices!)
● Non-parametric (random forests)
● Focusing on matrices (the hard part of ML systems)

Matrices
● NDArrays (> 2d)
● Tensors (think of pages of matrices)
● Example: 2 x 2 x 2 (two 2 x 2 matrices stacked along a third axis; see the sketch below)
● Applies to graphs w/ sparse representations
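
What that looks like with ND4J from Scala (a sketch assuming nd4j on the classpath; API details may vary by version):

  import org.nd4j.linalg.factory.Nd4j

  object TensorDemo extends App {
    // A 2 x 2 x 2 tensor: two 2 x 2 matrices stacked along the first axis
    val tensor = Nd4j.linspace(1, 8, 8).reshape(2, 2, 2)

    // Slice out each 2 x 2 "page"
    println(tensor.slice(0)) // [[1, 2], [3, 4]]
    println(tensor.slice(1)) // [[5, 6], [7, 8]]
  }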

Chips/Hardware/Matrices
● CPUs - we work with these
● GPUs - CUDA, ditto
● FPGAs
  o Intel bought Altera, an FPGA maker, for $17 billion this month
  o The edge, the cloud

Why New Chips?

Why New Chips?
● See the numbers yourself:
● http://www.slideshare.net/airbots/cuda-29330283
● http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/
● http://jcuda.org

Mixed clusters
● GPUs aren’t good for all workloads
● Because of latency
● Need to upload data: not good for small problems
● Mixed CPU/GPU clusters are the best bet

Data Pipelines
● More data will be binary
● Frameworks today can’t process binary well
● Binary data has different semantics
● Moving windows for audio (see the sketch below)
● 3d for images...
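
A moving window over raw audio, for example, is a one-liner in plain Scala (a hypothetical sketch, not Canova’s API):

  object WindowDemo extends App {
    // Pretend these are raw PCM audio samples
    val samples = Vector.tabulate(16)(i => math.sin(i * 0.5))

    // Overlapping windows of width 4 with stride 2; each window becomes one input vector
    samples.sliding(4, 2).foreach(w => println(w.mkString(", ")))
  }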

People Roll Their Own b/c
● Current frameworks assume clean data :(
● Pipelines are brittle, hard to maintain
● Moving towards being composable (reuse)

Dedicated Libraries
● Let’s focus on vectorization -- now!
● Because IoT
● Because more access to raw media
● Should fit into current big data frameworks

Scoring
● AUC
● F1
● Different loss functions
● Hyperparameter optimization

All independent
● These things work for different models
● Shouldn’t be tied to a particular system
● Should be embeddable (see the sketch below)
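
As an example of how embeddable these metrics are, F1 is a few lines of Scala with no framework attached (a minimal sketch from raw counts):

  object F1Demo extends App {
    // F1 = harmonic mean of precision and recall
    def f1(tp: Int, fp: Int, fn: Int): Double = {
      val precision = tp.toDouble / (tp + fp)
      val recall    = tp.toDouble / (tp + fn)
      2 * precision * recall / (precision + recall)
    }

    println(f1(tp = 80, fp = 10, fn = 20)) // about 0.842
  }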

Training
● Split train/test (see the sketch below)
● Sample data (no, not all the data ;) to validate the model
● Increasingly compute-intensive
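
A train/test split in plain Scala: shuffle first, then hold out a fraction for validation (a minimal sketch with a made-up dataset):

  import scala.util.Random

  object SplitDemo extends App {
    val data = (1 to 100).toVector // stand-in for real examples

    // Shuffle, then hold out 20% for testing
    val shuffled = Random.shuffle(data)
    val (train, test) = shuffled.splitAt((shuffled.size * 0.8).toInt)

    println(s"train: ${train.size}, test: ${test.size}") // train: 80, test: 20
  }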

Deep Learning
● Most done in Python...
● Normal training time is measured in hours/days -- weeks!?
● Work being done in HPC (model parallelism)
● DistBelief (data parallelism)

Automatic Learning
● Good at unstructured data
● Images, text, audio and sensors
● Quick, baseline feature engineering
● Not good at feature introspection

Or are they?

t-SNE

Where Does Scala Fit In?
● Akka - real-time streaming analytics/micro-services
● Spark - DataFrames/number crunching (see the sketch below)
● JVM key/value stores
● Pistachio (powers Yahoo’s ad network)
  o http://yahooeng.tumblr.com/post/118860853846/distributed-word2vec-on-top-of-pistachio
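
On the Spark side, training a binary classifier with MLlib takes only a few lines (a sketch against the Spark 1.x RDD API; the local master and toy data are assumptions):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  object MLlibDemo extends App {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))

    // Toy binary data: a label plus a feature vector per example
    val data = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(2.0, 3.0)),
      LabeledPoint(0.0, Vectors.dense(0.2, 0.1))
    ))

    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)
    println(model.predict(Vectors.dense(1.5, 2.5)))

    sc.stop()
  }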

The Way We Learn Now
● Monolithic ML frameworks
● No per-chip optimizations
● No tensors (come on guys, it’s 2015...)
● Need isolation and less lock-in
● JVM is the platform to make it happen

Other Links
● http://deeplearning4j.org/
● http://nd4j.org/
● https://github.com/deeplearning4j/Canova

Questions?
● adam@skymind.io
● @agibsonccc
● github.com/agibsonccc