BSSML16 L8. REST API, Bindings, and Basic Workflows


Transcript of BSSML16 L8. REST API, Bindings, and Basic Workflows

Page 1: BSSML16 L8. REST API, Bindings, and Basic Workflows

Automating Machine Learning

API, bindings, BigMLer and Basic Workflows

#BSSML16

December 2016

#BSSML16 Automating Machine Learning December 2016 1 / 29

Page 2: BSSML16 L8. REST API, Bindings, and Basic Workflows

Outline

1 Introduction: ML as a System Service

2 ML as a RESTful Cloudy Service

3 Client-side workflows: REST API and bindings

4 Client-side workflows: Bigmler


Page 3: BSSML16 L8. REST API, Bindings, and Basic Workflows

Outline

1 Introduction: ML as a System Service

2 ML as a RESTful Cloudy Service

3 Client-side workflows: REST API and bindings

4 Client-side workflows: Bigmler


Page 4: BSSML16 L8. REST API, Bindings, and Basic Workflows

Machine Learning as a System Service

The goal

Machine Learning as a system-level service

The means

• APIs: ML building blocks

• Abstraction layer over feature engineering

• Abstraction layer over algorithms

• Automation


Page 5: BSSML16 L8. REST API, Bindings, and Basic Workflows

The Roadmap


Page 6: BSSML16 L8. REST API, Bindings, and Basic Workflows

Outline

1 Introduction: ML as a System Service

2 ML as a RESTful Cloudy Service

3 Client-side workflows: REST API and bindings

4 Client-side workflows: Bigmler


Page 7: BSSML16 L8. REST API, Bindings, and Basic Workflows

RESTful-ish ML Services


Page 8: BSSML16 L8. REST API, Bindings, and Basic Workflows

RESTful-ish ML Services


Page 9: BSSML16 L8. REST API, Bindings, and Basic Workflows

RESTful-ish ML Services


Page 10: BSSML16 L8. REST API, Bindings, and Basic Workflows

RESTful-ish ML Services

• Excellent abstraction layer

• Transparent data model

• Immutable resources and UUIDs: traceability

• Simple yet effective interaction model

• Easy access from any language (API bindings)

Algorithmic complexity and computing resources management problems mostly washed away


Page 11: BSSML16 L8. REST API, Bindings, and Basic Workflows

RESTful done right: Whitebox resources

• Your data, your model

• Model reverse engineering becomes moot

• Maximizes reach (Web, CLI, desktop, IoT)


Page 12: BSSML16 L8. REST API, Bindings, and Basic Workflows

Outline

1 Introduction: ML as a System Service

2 ML as a RESTful Cloudy Service

3 Client-side workflows: REST API and bindings

4 Client-side workflows: Bigmler


Page 13: BSSML16 L8. REST API, Bindings, and Basic Workflows

Higher-level Machine Learning


Page 14: BSSML16 L8. REST API, Bindings, and Basic Workflows

Example workflow: Batch Centroid

Objective: Label each row in a Dataset with its associated centroid.

We need to...

• Create Dataset

• Create Cluster

• Create BatchCentroid from Cluster and Dataset

• Save BatchCentroid as new Dataset


Page 15: BSSML16 L8. REST API, Bindings, and Basic Workflows

Example workflow: building blocks

# $AUTH holds the credentials as a query string
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H "content-type: application/json" \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'

curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H "content-type: application/json" \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'

curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H "content-type: application/json" \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'

curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
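The same building blocks can be sketched in Python with only the standard library. This is a minimal sketch rather than the official bindings: the helper name `build_request`, the placeholder credentials, and the URL layout `https://bigml.io/<resource>?<auth>` are assumptions, and the requests are only constructed, never sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://bigml.io"
# Placeholder credentials (hypothetical); the slide keeps them in $AUTH.
AUTH = "username=alice;api_key=0123456789abcdef"


def build_request(path, payload=None):
    """Build (but do not send) a request for one resource.

    With a payload this is a POST that creates a resource;
    without one it is a GET that retrieves an existing resource.
    """
    url = f"{BASE_URL}/{path}?{AUTH}"
    if payload is None:
        return Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


# The four steps of the batch centroid workflow, as request objects.
steps = [
    build_request("dataset", {"source": "source/56fbbfea200d5a3403000db7"}),
    build_request("cluster", {"dataset": "dataset/43ffe231a34fff333000b65"}),
    build_request("batchcentroid", {
        "dataset": "dataset/43ffe231a34fff333000b65",
        "cluster": "cluster/33e2e231a34fff333000b65"}),
    build_request("dataset/1234ff45eab8c0034334"),
]

for request in steps:
    print(request.get_method(), request.full_url)
```

Every step is one HTTP call on one resource type, which is what makes the API easy to wrap in any language.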


Page 16: BSSML16 L8. REST API, Bindings, and Basic Workflows

Example workflow: Web UI


Page 17: BSSML16 L8. REST API, Bindings, and Basic Workflows

Automation via bindings

from bigml.api import BigML
api = BigML()

# `source` (a data file path or URL) is not defined on the slide
project = api.create_project({'name': 'ToyBoost'})
orig_source = api.create_source(source,
                                {"name": "ToyBoost",
                                 "project": project['resource']})
api.ok(orig_source)
orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
api.ok(orig_dataset)

# `trainset` is assumed to hold a dataset id prepared in a step
# that the slide does not show
trainset = api.get_dataset(trainset)
for loop in range(0, 10):
    api.ok(trainset)
    model = api.create_model(trainset, {
        "name": "ToyBoost - Model%d" % loop,
        "objective_fields": ["letter"],
        "excluded_fields": ["weight"],
        "weight_field": "100011"})
    api.ok(model)
    batchp = api.create_batch_prediction(model, trainset, {
        "name": "ToyBoost - Result%d" % loop,
        "all_fields": True,
        "header": True})
    api.ok(batchp)
    batchp = api.get_batch_prediction(batchp)
    batchp_dataset = api.get_dataset(batchp['object'])
    # the batch prediction output becomes the next training set
    trainset = api.create_dataset(batchp_dataset, {})


Page 18: BSSML16 L8. REST API, Bindings, and Basic Workflows

Example workflow: Python bindings

from bigml.api import BigML

api = BigML()

source = 'source/5643d345f43a234ff2310a3e'

# create dataset and cluster, waiting for both

dataset = api.create_dataset(source)

api.ok(dataset)

cluster = api.create_cluster(dataset)

api.ok(cluster)

# create a batch centroid with output to dataset

centroid = api.create_batch_centroid(cluster, dataset,

{'output_dataset': True,

'all_fields': True})

api.ok(centroid)

# wait again, via polling, until the dataset is finished

batch_dataset_id = centroid['object']['output_dataset_resource']

batch_dataset = api.get_dataset(batch_dataset_id)

api.ok(batch_dataset)
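The "wait, via polling" step can be illustrated with a generic loop. This is a sketch of the idea only, not how `api.ok` is implemented: the function names, the `fetch` callable standing in for an HTTP GET, and the status codes (5 for finished, -1 for faulty) are assumptions made for the example.

```python
import time

# Status codes assumed for this sketch (hypothetical constants).
FINISHED = 5
FAULTY = -1


def wait_until_finished(fetch, resource_id, interval=0.0, max_polls=50):
    """Poll `fetch(resource_id)` until the resource reports it is finished.

    `fetch` is any callable returning a dict with a
    {'status': {'code': ...}} entry; it stands in for an HTTP GET.
    """
    for _ in range(max_polls):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(f"{resource_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"{resource_id} not finished after {max_polls} polls")


def make_fake_fetch(codes):
    """Return a fake fetcher that reports the given status codes in order."""
    remaining = iter(codes)

    def fetch(resource_id):
        return {"resource": resource_id, "status": {"code": next(remaining)}}

    return fetch


fake_fetch = make_fake_fetch([1, 2, 3, 5])
result = wait_until_finished(fake_fetch, "dataset/1234ff45eab8c0034334")
print(result["status"]["code"])
```

Each resource in the workflow needs one such wait before the next step can use it, which is part of the detail that client-side code has to carry.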


Page 19: BSSML16 L8. REST API, Bindings, and Basic Workflows

Client-side automation via bindings

Strengths of bindings-based solutions

• Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)

• Native: Easy to support any programming language

• Offline: Whitebox models allow local use of resources (e.g., real-time predictions)


Page 20: BSSML16 L8. REST API, Bindings, and Basic Workflows

Client-side automation via bindings

Strengths of bindings-based solutions

from bigml.model import Model

model_id = 'model/5643d345f43a234ff2310a3e'

# Download of (whitebox) resource

local_model = Model(model_id)

# Purely local calculations

local_model.predict({'plasma glucose': 132})
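What "purely local calculations" means can be shown with a toy tree walk. This is a minimal sketch under stated assumptions: the dictionary layout, the `bmi` field, the thresholds and the labels are all invented for illustration and do not reflect the structure of a downloaded model.

```python
# A toy decision tree in a hypothetical layout; thresholds and labels
# are made up for illustration only.
toy_tree = {
    "field": "plasma glucose",
    "threshold": 123,
    "left": {"output": "false"},
    "right": {
        "field": "bmi",
        "threshold": 30,
        "left": {"output": "false"},
        "right": {"output": "true"},
    },
}


def predict_local(node, inputs, default=0):
    """Walk the tree without any network call.

    Go left when the input value is <= the node threshold, else right.
    Missing inputs fall back to `default` (a crude choice for the sketch).
    """
    while "output" not in node:
        value = inputs.get(node["field"], default)
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["output"]


print(predict_local(toy_tree, {"plasma glucose": 132, "bmi": 35}))
```

Once the model is on the client, each prediction is a handful of comparisons, so it can run offline and in real time.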


Page 21: BSSML16 L8. REST API, Bindings, and Basic Workflows

Client-side automation via bindings

Problems of bindings-based solutions

• Complexity: Lots of details outside the problem domain

• Reuse: No inter-language compatibility

• Scalability: Client-side workflows are hard to optimize

Not enough abstraction


Page 22: BSSML16 L8. REST API, Bindings, and Basic Workflows

Outline

1 Introduction: ML as a System Service

2 ML as a RESTful Cloudy Service

3 Client-side workflows: REST API and bindings

4 Client-side workflows: Bigmler


Page 23: BSSML16 L8. REST API, Bindings, and Basic Workflows

Higher-level Machine Learning


Page 24: BSSML16 L8. REST API, Bindings, and Basic Workflows

Simple workflow in a one-liner

# 1-click cluster
bigmler cluster \
    --output-dir output/job \
    --train data/iris.csv \
    --test-datasets output/job/dataset \
    --remote \
    --to-dataset

# the created dataset id:

cat output/job/batch_centroid_dataset


Page 25: BSSML16 L8. REST API, Bindings, and Basic Workflows

Simple automation: “1-click” tasks

# "1-click" ensemble

bigmler --train data/iris.csv \

--number-of-models 500 \

--sample-rate 0.85 \

--output-dir output/iris-ensemble \

--project "vssml tutorial"

# "1-click" dataset with parameterized fields

bigmler --train data/diabetes.csv \

--no-model \

--name "4-featured diabetes" \

--dataset-fields \

"plasma glucose,insulin,diabetes pedigree,diabetes" \

--output-dir output/diabetes \

--project vssml_tutorial


Page 26: BSSML16 L8. REST API, Bindings, and Basic Workflows

Rich, parameterized workflows: cross-validation

# Parameterized input: the dataset id saved by an earlier task
#   --k-folds  number of folds during validation
bigmler analyze --cross-validation \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 3 \
    --output-dir output/diabetes-validation
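The meaning of the number of folds can be sketched in a few lines of Python. This is an illustration of k-fold splitting in general, assuming a simple round-robin assignment of rows to folds; how bigmler partitions the dataset is not shown on the slide, and the function name is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k (train, test) pairs.

    Rows are assigned to folds round-robin; each fold is used once
    as the test set while the other k-1 folds form the training set.
    """
    if not 1 < k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, test))
    return splits


for train, test in k_fold_indices(n_rows=7, k=3):
    print("train:", train, "test:", test)
```

With `k=3`, three models are trained and evaluated, one per held-out fold, and their metrics are averaged.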


Page 27: BSSML16 L8. REST API, Bindings, and Basic Workflows

Rich, parameterized workflows: feature selection

# Parameterized input: the dataset id saved by an earlier task
#   --k-folds    number of folds during validation
#   --staleness  stopping criterion
#   --optimize   optimization metric
#   --penalty    algorithm parameter
bigmler analyze --features \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 2 \
    --staleness 2 \
    --optimize precision \
    --penalty 1 \
    --output-dir output/diabetes-features-selection
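How a staleness criterion and a per-feature penalty interact can be sketched with plain greedy forward selection. This is a simplification chosen for illustration, not a description of the search bigmler performs; the function name, the toy scoring function and the gain values are hypothetical.

```python
def select_features(features, score, staleness=2, penalty=0.001):
    """Greedy forward selection with a staleness stop.

    `score(subset)` returns a quality metric (higher is better).
    Each selected feature costs `penalty`; the search stops after
    `staleness` consecutive rounds without improvement.
    """
    selected, best_subset = [], []
    best_value = float("-inf")
    stale = 0
    remaining = list(features)
    while remaining and stale < staleness:
        # Try each remaining feature and keep the best single addition.
        candidates = [
            (score(selected + [f]) - penalty * (len(selected) + 1), f)
            for f in remaining
        ]
        value, feature = max(candidates)
        selected.append(feature)
        remaining.remove(feature)
        if value > best_value:
            best_value, best_subset, stale = value, list(selected), 0
        else:
            stale += 1
    return best_subset, best_value


# Toy metric: each feature contributes a fixed, made-up gain.
gains = {"plasma glucose": 0.5, "bmi": 0.2, "age": 0.0005, "insulin": 0.0}


def toy_score(subset):
    return sum(gains[f] for f in subset)


best, value = select_features(list(gains), toy_score)
print(best)
```

In a real run the score would come from cross-validating a model on each candidate subset, which is why the number of folds appears alongside the search parameters.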


Page 28: BSSML16 L8. REST API, Bindings, and Basic Workflows

Client-side Machine Learning Automation

Problems of client-side solutions

• Complex: Too fine-grained, leaky abstractions

• Cumbersome: Error handling, network issues

• Hard to reuse: Tied to a single programming language

• Hard to scale: Parallelization again a problem

• Hard to generalize: CLI tools like bigmler hide complexity at the cost of flexibility

Algorithmic complexity and computing resources management problems, earlier "mostly washed away", are back!
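The "error handling, network issues" point can be made concrete with the kind of wrapper a client-side workflow ends up needing around every remote call. This is a generic sketch, assuming failures surface as `ConnectionError`; `with_retries` and `FlakyCreate` are hypothetical names and nothing here is part of the bindings.

```python
import time


def with_retries(call, attempts=3, base_delay=0.0):
    """Run `call()`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * 2 ** attempt)


class FlakyCreate:
    """Hypothetical stand-in for a remote 'create resource' call."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network issue")
        return {"resource": "dataset/1234ff45eab8c0034334"}


create_dataset = FlakyCreate(failures=2)
created = with_retries(create_dataset)
print(created["resource"], "after", create_dataset.calls, "calls")
```

None of this code concerns the machine learning problem itself, which is the sense in which client-side abstractions leak.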



Page 30: BSSML16 L8. REST API, Bindings, and Basic Workflows

Questions?
