Sparkling Water Meetup: Deep Learning for Public Safety

Michal Malohlava, Alex Tellez, Amy Wang and H2O.ai

Building Machine Learning Applications with Sparkling Water Series

04/22/2015 Meetup

Deep Learning for Public Safety: Fighting Crime with Open City Data


Sparkling Water Download: http://h2o.ai/download/

http://h2o-release.s3.amazonaws.com/sparkling-water/master/95/index.html

Where is the code? https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/README.md

Scalable Machine Learning

For Smarter Applications

Smarter Applications

Scalable Applications

Distributed

Able to process huge amounts of data from different sources

Easy to develop and experiment

Powerful machine learning engine inside

BUT how to build them?

Build an application with …

?

…with Spark and H2O

Spark: Open-source distributed execution platform

User-friendly API for data transformation based on RDD

Platform components: SQL, MLlib, text mining

Multitenancy

Large and active community

H2O: Open-source scalable machine learning platform

Tuned for efficient computation and memory use

Production-ready machine learning algorithms

R, Python, Java, Scala APIs

Interactive UI, robust data parser

Sparkling Water Provides

Transparent integration of H2O with Spark ecosystem

Transparent use of H2O data structures and algorithms with Spark API

Excels in existing Spark workflows requiring advanced Machine Learning algorithms

Platform for building Smarter Applications

Sparkling Water Design

[Diagram components: spark-submit, Spark Master JVM, three Spark Worker JVMs, and a Sparkling Water Cluster of three Spark Executor JVMs, each containing H2O]

Sparkling App: a regular Spark application that also contains Sparkling Water classes

Data Distribution

[Diagram components: Data Source (e.g. HDFS), H2O RDD and Spark RDD inside the three Spark Executor JVMs of the Sparkling Water Cluster, each containing H2O; conversions labeled toRDD and toH2OFrame]

RDDs and DataFrames share the same memory space

Let's build an application!

Deep Learning for Public Safety: Fighting Crime with OPEN City Data

Predict probability of arrest

CHICAGO

OPEN CRIME DATA

Crime Dataset: Crimes from 2001 to the present day, ~4.6 million crimes

THE WINDY CITY

Harvest Chicago Weather data since 2001

SOCIOECONOMIC FACTORS

Crimes segmented into Community Area IDs

Percent of households below poverty, unemployed, etc.

H2O.ai Machine Intelligence


Crime("02/08/2015 11:43:58 PM", 1811, "NARCOTICS", "STREET", false, 422, 4, 7, 46, 18)

Crime("02/08/2015 11:00:39 PM", 1150, "DECEPTIVE PRACTICE", "RESIDENCE", false, 923, 9, 14, 63, 11)
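
The two sample records above can be modeled as a small typed record. What follows is a minimal, dependency-free Python sketch, not the application's actual code (that lives in the Sparkling Water example scripts). The slide shows only positional values, so every field name here is an assumption based on typical column names in the Chicago crime dataset, and `time_features` is a hypothetical helper illustrating the kind of calendar features that can be derived from the raw timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Crime:
    # Field names are assumptions; the slide gives only positional values.
    date: str
    iucr: int
    primary_type: str
    location_description: str
    domestic: bool
    beat: int
    district: int
    ward: int
    community_area: int
    fbi_code: int

    def time_features(self) -> dict:
        """Derive calendar features from the timestamp (hypothetical helper)."""
        ts = datetime.strptime(self.date, "%m/%d/%Y %I:%M:%S %p")
        return {
            "year": ts.year,
            "month": ts.month,
            "day": ts.day,
            "hour": ts.hour,
            "weekday": ts.isoweekday(),        # 1 = Monday ... 7 = Sunday
            "weekend": ts.isoweekday() >= 6,   # Saturday or Sunday
        }


# The two records shown on the slide.
crimes = [
    Crime("02/08/2015 11:43:58 PM", 1811, "NARCOTICS", "STREET",
          False, 422, 4, 7, 46, 18),
    Crime("02/08/2015 11:00:39 PM", 1150, "DECEPTIVE PRACTICE", "RESIDENCE",
          False, 923, 9, 14, 63, 11),
]

for crime in crimes:
    print(crime.primary_type, crime.time_features())
```

Keeping the raw timestamp as a string and deriving features on demand is only one possible design; a real pipeline would more likely compute such columns once during data munging.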

ARREST?

Predict arrest probability for crime events

ML Workflow

Crimes, Census, Weather

Data munging

Spark SQL join

Split table

Deep Learning, GBM

Collect model metrics

Evaluate models and score new crimes
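
The join and split steps of this workflow can be illustrated without a cluster. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spark SQL, so it shows the shape of the join rather than the application's code. All table names, column names and rows are invented toy data, and `split_table` with its 80/20 ratio is a hypothetical helper, not a setting taken from the talk.

```python
import random
import sqlite3

# In-memory database standing in for the three data sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crimes  (id INTEGER, day TEXT, community_area INTEGER, arrest INTEGER);
CREATE TABLE census  (community_area INTEGER, pct_below_poverty REAL);
CREATE TABLE weather (day TEXT, mean_temp REAL);
""")

# Toy rows, invented purely for illustration.
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?, ?, ?)",
    [(i,
      "2015-02-08" if i % 2 else "2015-02-09",
      46 if i % 3 else 63,
      i % 2)
     for i in range(1, 11)],
)
conn.executemany("INSERT INTO census VALUES (?, ?)", [(46, 20.5), (63, 15.0)])
conn.executemany("INSERT INTO weather VALUES (?, ?)",
                 [("2015-02-08", -3.0), ("2015-02-09", 1.5)])

# Join crimes to census by community area and to weather by day.
joined = conn.execute("""
    SELECT c.id, c.arrest, s.pct_below_poverty, w.mean_temp
    FROM crimes c
    JOIN census  s ON c.community_area = s.community_area
    JOIN weather w ON c.day = w.day
    ORDER BY c.id
""").fetchall()


def split_table(rows, ratio=0.8, seed=42):
    """Shuffle rows reproducibly and cut them into train/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_table(joined)
print(len(joined), "joined rows ->", len(train), "train /", len(test), "test")
```

The train part would then feed the model-building step and the held-out part the metrics step; in the actual application those steps run on H2O frames inside the Sparkling Water cluster.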

Application environment

sparkling-shell

Where is the code? https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/


More info about the app: https://kddnuggets.com/2015/04/deep-learning-fight-crime.html

Complete app code at GitHub

https://github.com/h2oai/sparkling-water/

Check out H2O.ai Training Books

http://learn.h2o.ai/

Check out H2O.ai Blog

http://h2o.ai/blog/

Check out H2O.ai YouTube Channel

https://www.youtube.com/user/0xdata

Check out GitHub

https://github.com/h2oai/sparkling-water

Meetups

https://meetup.com/

More info

Learn more at h2o.ai

Follow us at @h2oai

Thank you!

Sparkling Water is an open-source ML application platform combining the power of Spark and H2O