H2O 3 REST API Overview

Posted on 16-Apr-2017

Raymond Peck, Director of Product Engineering, H2O.ai

Long version of this content is here:


Why?

• Use the REST API to drive H2O from an external script or program in any language.

• Use the REST API when you want API stability.

• Use the Java API if you want to call the internal APIs from Java, Scala, etc.

Who?

• Software developers proficient in a scripting or programming language.

• Those familiar with nested data representations like JSON.

• Those familiar with the functionality of H2O, at least well enough to convert a data scientist's Flow, R or Python script.

What?

Any H2O functionality in Flow, R or Python can be accessed via the REST API:
- data import
- model building
- model comparison
- generating predictions
- admin functions

How?

You can call the REST API:

• from your browser

• using browser tools such as Postman in Chrome

• using curl

• using the language of your choice

Bindings

For Python and R, simply use the supplied packages.

For JVM clients:
- H2O currently ships with REST API payload POJOs.
- We're working on endpoint proxies.
- These are generated as part of the build using a Python script.

Versioning and Stability, Part 1

• The current version is 3.

• Non-breaking changes are allowed; examples:

• adding output fields

• adding parameters with defaults that maintain old behavior

• Well-written clients should not break as functionality is added to version 3.

Versioning and Stability, Part 2

• Backward compatibility is tested with each release, including nightlies.

• Functionality under development is version 99.

• /99 endpoints can be called via /EXPERIMENTAL.

Examples:
- /3/Frames
- /3/Frames/my_frame
- /3/Frames/my_frame/summary
- /3/Models
- /3/Models/my_model
- /3/Cloud

© H2O.ai, 2015
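As a rough illustration of this URL layout, the helpers below build versioned paths. This is a minimal sketch: `versioned_path` and `experimental_path` are hypothetical names, not part of any H2O client, and the base address `http://localhost:54321` is assumed from the curl examples later in this post.

```python
BASE_URL = "http://localhost:54321"  # assumed address of a local H2O node


def versioned_path(resource: str, version: int = 3) -> str:
    """Build a REST path such as /3/Frames/my_frame (hypothetical helper)."""
    return f"/{version}/{resource.lstrip('/')}"


def experimental_path(resource: str) -> str:
    """Address a version-99 endpoint through the /EXPERIMENTAL alias (hypothetical helper)."""
    return f"/EXPERIMENTAL/{resource.lstrip('/')}"


if __name__ == "__main__":
    # Print full URLs for the example resources listed above.
    for resource in ["Frames", "Frames/my_frame/summary", "Models/my_model", "Cloud"]:
        print(BASE_URL + versioned_path(resource))
```

Keeping the version as a parameter means a client pinned to version 3 only changes one argument if it later opts into a /99 endpoint.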

HTTP Verbs

• GET requests fetch data and do not cause side effects.

GET /3/Frames/my_frame_name?row_offset=10000&row_count=1000

• POST requests create a new object.

They use the application/x-www-form-urlencoded input format.

• DELETE requests delete an object.
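The verb conventions above map naturally onto Python's standard library. A minimal sketch, assuming a node at `http://localhost:54321`; `build_request` is a hypothetical helper that only constructs the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request
from typing import Mapping, Optional

BASE_URL = "http://localhost:54321"  # assumed address of a local H2O node


def build_request(method: str, path: str,
                  params: Optional[Mapping[str, object]] = None) -> urllib.request.Request:
    """Construct (but do not send) a request following the verb conventions above."""
    url = BASE_URL + path
    encoded = urllib.parse.urlencode(dict(params or {}))
    if method == "POST":
        # POST parameters go in the body as application/x-www-form-urlencoded.
        return urllib.request.Request(
            url,
            data=encoded.encode("ascii"),
            method="POST",
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    if encoded:
        # GET and DELETE carry any parameters in the query string instead.
        url += "?" + encoded
    return urllib.request.Request(url, method=method)
```

For example, `urllib.request.urlopen(build_request("GET", "/3/Frames/my_frame_name", {"row_offset": 10000, "row_count": 1000}))` would issue the paged GET shown above.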

HTTP Status Codes

• 200 OK (all is well)

• 400 Bad Request (the request URL is bad)

• 404 Not Found (a specified object was not found)

• 412 Precondition Failed (bad parameters or other problem handling the request)

• 500 Internal Server Error (unanticipated failure)
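A client usually needs to decide which of these codes means "fix the request" and which means "the server failed". The sketch below groups them that way; the grouping and the function names `classify_status` and `describe_status` are illustrative assumptions, not something the API defines.

```python
# Meanings as listed above; any other code falls back to a generic label.
STATUS_MEANINGS = {
    200: "OK (all is well)",
    400: "Bad Request (the request URL is bad)",
    404: "Not Found (a specified object was not found)",
    412: "Precondition Failed (bad parameters or other problem handling the request)",
    500: "Internal Server Error (unanticipated failure)",
}


def classify_status(code: int) -> str:
    """Return 'ok', 'client_error' or 'server_error' for an HTTP status code."""
    if 200 <= code < 300:
        return "ok"
    if 400 <= code < 500:
        return "client_error"  # bad URL, missing object or bad parameters
    if 500 <= code < 600:
        return "server_error"  # unanticipated failure on the H2O side
    raise ValueError(f"unexpected HTTP status {code}")


def describe_status(code: int) -> str:
    """Human-readable meaning of a status code."""
    return STATUS_MEANINGS.get(code, f"HTTP {code}")
```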

Schemas, Part 1

Schemas define input and output formats.

Schema fields can be simple values, nested schemas, or arrays or dictionaries (maps) of these.

Schemas, Part 2

Each schema field has the following metadata:

• type

• default value

• help string

• direction (in, out or inout)

• required

• importance

{
  "__meta": {
    "schema_name": "ModelParameterSchemaV3",
    "schema_type": "Iced",
    "schema_version": 3
  },
  "actual_value": {
    "URL": "/3/Models/prostate_glm",
    "__meta": {
      "schema_name": "ModelKeyV3",
      "schema_type": "Key<Model>",
      "schema_version": 3
    },
    "name": "prostate_glm",
    "type": "Key<Model>"
  },
  "default_value": null,
  "help": "Destination id for this model; auto-generated if not specified",
  "label": "model_id",
  "level": "critical",
  "name": "model_id",
  "required": false,
  "type": "Key<Model>",
  "values": []
}
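Because every field carries this metadata, a client can render or validate parameters generically. A minimal sketch that summarizes one field from a parameters list, using a trimmed copy of the example above; `summarize_field` is a hypothetical helper.

```python
import json

# One entry from a "parameters" array, trimmed from the example above.
FIELD_JSON = """
{
  "actual_value": {"name": "prostate_glm", "type": "Key<Model>"},
  "default_value": null,
  "help": "Destination id for this model; auto-generated if not specified",
  "level": "critical",
  "name": "model_id",
  "required": false,
  "type": "Key<Model>"
}
"""


def summarize_field(field: dict) -> str:
    """Render a schema field's metadata as a one-line description."""
    actual = field.get("actual_value")
    # Key-typed values are nested schemas; show only their name.
    if isinstance(actual, dict):
        actual = actual.get("name")
    flag = "required" if field.get("required") else "optional"
    return (f"{field['name']} ({field['type']}, {flag}, {field.get('level')}): "
            f"default={field.get('default_value')!r}, actual={actual!r} - {field.get('help')}")


if __name__ == "__main__":
    print(summarize_field(json.loads(FIELD_JSON)))
```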

Error Condition Payloads

On error, the REST API will:

• return a non-2xx HTTP status code

• return a standardized error payload with:
  - an end-user message
  - a developer message
  - the HTTP status
  - an optional dictionary of relevant values

Example Error

{
  "__meta": { "schema_type": "H2OError", ... },
  "timestamp": 1438634936808,
  "error_url": "/3/Frames/missing_frame",
  "msg": "Object 'missing_frame' not found for argument: key",
  "dev_msg": "Object 'missing_frame' not found for argument: key",
  "http_status": 404,
  "values": {
    "argument": "key",
    "name": "missing_frame"
  },
  "exception_type": "water.exceptions.H2OKeyNotFoundArgumentException",
  "exception_msg": "Object 'missing_frame' not found for argument: key",
  "stacktrace": [ ... ]
}
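Since error payloads are standardized, one function can turn any failed response into an exception. A sketch, assuming the body is JSON shaped like the example above; `H2OError` and `raise_for_h2o_error` are hypothetical names, and only a few of the payload fields are kept.

```python
import json


class H2OError(Exception):
    """Raised for a non-2xx response carrying an H2OError payload (hypothetical class)."""

    def __init__(self, payload: dict):
        # The end-user message becomes the exception text.
        super().__init__(payload.get("msg", "unknown H2O error"))
        self.http_status = payload.get("http_status")
        self.dev_msg = payload.get("dev_msg")
        self.values = payload.get("values", {})
        self.exception_type = payload.get("exception_type")


def raise_for_h2o_error(status: int, body: bytes) -> dict:
    """Return the decoded JSON for 2xx responses; otherwise raise H2OError."""
    payload = json.loads(body)  # assumes the error body is JSON
    if 200 <= status < 300:
        return payload
    raise H2OError(payload)
```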

Example Endpoints

For the complete list, check the reference docs or /Metadata/endpoints. As of August 6, 2015, there are 105 endpoints, in these groups:

• Loading and parsing data files

• Frames and Models

• Administrative and utility

• Job management and polling

• Persistence

Loading and parsing data files

GET /3/ImportFiles - Import raw data files into a single-column H2O Frame.

POST /3/ParseSetup - Guess the parameters for parsing raw byte-oriented data into an H2O Frame.

POST /3/Parse - Parse a raw byte-oriented Frame into a useful columnar data Frame.

Frames

GET /3/Frames - Return all Frames in the H2O distributed K/V store.

GET /3/Frames/(?.*) - Return the specified Frame.

GET /3/Frames/(?.*)/summary - Return a Frame, including the histograms, after forcing computation of rollups.

GET /3/Frames/(?.*)/columns/(?.*)/summary - Return the summary metrics for a column, e.g. mins, maxes, mean, sigma, percentiles, etc.

DELETE /3/Frames/(?.*) - Delete the specified Frame.

DELETE /3/Frames - Delete all Frames.

Building models

GET /3/ModelBuilders - Return the Model Builder metadata for all available algorithms.

GET /3/ModelBuilders/(?.*) - Return the Model Builder metadata for the specified algorithm.

POST /3/ModelBuilders/deeplearning/parameters - Validate a set of Deep Learning model builder parameters.

POST /3/ModelBuilders/deeplearning - Train a Deep Learning model on the specified Frame.

Accessing and using models

GET /3/Models - Return all Models from the H2O distributed K/V store.

GET /3/Models/(?.*?)(\.java)? - Return the specified Model. Use the .java extension to get the Java POJO.

POST /3/Predictions/models/(?.*)/frames/(?.*) - Generate predictions for the specified Frame and Model.

DELETE /3/Models/(?.*) - Delete the specified Model.

DELETE /3/Models - Delete all Models.

Administrative and utility

GET /3/About - Return information about this H2O cluster.

GET /3/Cloud - Determine the status of the nodes in the H2O cloud.

HEAD /3/Cloud - Determine the status of the nodes in the H2O cloud.

Job management and polling

GET /3/Jobs - Get a list of all the H2O Jobs (long-running actions).

GET /3/Jobs/(?.*) - Get the status of the given H2O Job (long-running action).

POST /3/Jobs/(?.*)/cancel - Cancel a running job.

Persistence

POST /3/Frames/(?.*)/export - Export a Frame to the given path with optional overwrite.

POST /99/Models.bin/(?.*) - Import the given binary model into H2O.

GET /99/Models.bin/(?.*) - Export the given model.

Example workflows using curl

Some fields have been omitted for brevity.

When using curl, you can pipe (|) the output through python -m json.tool to pretty-print the JSON:

curl -X GET http://localhost:54321/3/Frames | python -m json.tool
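The same pretty-printing can be done inside a Python client. A minimal sketch; `get_pretty` is a hypothetical helper, and the `opener` argument exists only so the function can be exercised without a running cluster.

```python
import json
import urllib.request


def get_pretty(url: str, opener=urllib.request.urlopen) -> str:
    """GET a JSON endpoint and pretty-print it, similar to `| python -m json.tool`."""
    with opener(url) as response:
        payload = json.load(response)
    # json.tool's default layout: 4-space indent, keys kept in their original order.
    return json.dumps(payload, indent=4)
```

Against a running node, `print(get_pretty("http://localhost:54321/3/Frames"))` should print output similar to the curl pipeline.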

GBM_Example.flow, Step 1: Import

In Flow:

importFiles ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]

In curl:

curl -X GET \
  http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz
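The import step can also be issued from Python. A sketch, assuming the `GET /3/ImportFiles` endpoint listed earlier takes the file location as a `path` query parameter (the name is inferred from the `path` field echoed in the result below) and a node at `http://localhost:54321`; `import_files_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:54321"  # assumed address of a local H2O node
DATA_URL = ("http://s3.amazonaws.com/h2o-public-test-data/smalldata/"
            "flow_examples/arrhythmia.csv.gz")


def import_files_request(path: str) -> urllib.request.Request:
    """Build GET /3/ImportFiles; the `path` parameter name is an assumption."""
    # urlencode percent-escapes the file URL so it survives as one query value.
    query = urllib.parse.urlencode({"path": path})
    return urllib.request.Request(f"{BASE_URL}/3/ImportFiles?{query}", method="GET")
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body should yield a result like the one below, whose `destination_frames` feed the next step.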


GBM_Example.flow, Step 1 Result

{
  "__meta": {
    "schema_name": "ImportFilesV3",
    "schema_type": "Iced",
    "schema_version": 3
  },
  "destination_frames": [
    "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
  ],
  "fails": [],
  "files": [
    "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
  ],
  "path": "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
}

GBM_Example.flow, Step 2: ParseSetup

In Flow:

setupParse paths: ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]

In curl:

curl -X POST \
  --data 'source_frames=["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]'
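A Python version of the ParseSetup call, sketched under the assumption of a node at `http://localhost:54321`; `parse_setup_request` is a hypothetical helper. The list of frames is JSON-encoded into a single form field, mirroring the curl `--data` argument.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:54321"  # assumed address of a local H2O node


def parse_setup_request(source_frames: list) -> urllib.request.Request:
    """Build POST /3/ParseSetup with source_frames sent as a JSON-style list."""
    body = urllib.parse.urlencode({"source_frames": json.dumps(source_frames)})
    return urllib.request.Request(
        f"{BASE_URL}/3/ParseSetup",
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```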

GBM_Example.flow, Step 2 Result

{
  "source_frames": [
    {
      "URL": "\/3\/Frames\/http:\/\/s3.amazonaws.com\/h2o-public-test-data\/smalldata\/flow_examples\/arrhythmia.csv.gz"
    }
  ],
  "parse_type": "CSV",
  "separator": 44,
  "column_names": null,
  "column_types": [ "Numeric", "Numeric", ... ],
  "destination_frame": "arrhythmia.hex",
  "header_lines": 0,
  "number_columns": 280,
  "data": [ [ "75", "0", "190", ... ], ... ]

GBM_Example.flow, Step 3: Parse

In Flow:

parseFiles
  paths: ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]
  destination_frame: "arrhythmia.hex"
  parse_type: "CSV"
  separator: 44
  number_columns: 280
  single_quotes: false
  column_names: null
  column_types: ["Numeric","Numeric",...,"Numeric"]
  delete_on_done: true
  check_header: -1
  chunk_size: 4194304


GBM_Example.flow, Step 3: Parse

In curl:

curl -X POST \
  --data 'destination_frame=arrhythmia.hex' \
  --data 'source_frames=["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]' \
  --data 'parse_type=CSV' \
  --data 'separator=44' \
  --data 'number_columns=280' \
  --data 'single_quotes=false' \
  --data 'column_names=' \
  --data 'column_types=["Numeric"...,"Numeric","Numeric","Numeric","Numeric","Numeric","Numeric","Numeric"]' \
  --data 'check_header=-1' \
  --data 'delete_on_done=true' \
  --data 'chunk_size=4194304'
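Flow fills in the Parse parameters from the ParseSetup guess, and a script can do the same. A sketch: `parse_params` is a hypothetical helper, `check_header` and `delete_on_done` are copied from the Flow example because the trimmed ParseSetup output above does not show them, and `single_quotes` and `chunk_size` are left to the server's defaults (an assumption).

```python
import json


def parse_params(import_result: dict, setup: dict) -> dict:
    """Form fields for POST /3/Parse, derived from ImportFiles and ParseSetup results."""
    compact = {"separators": (",", ":")}  # match the compact lists used in the curl call
    names = setup.get("column_names")
    return {
        "destination_frame": setup["destination_frame"],
        # The frames created by ImportFiles are the inputs to Parse.
        "source_frames": json.dumps(import_result["destination_frames"], **compact),
        "parse_type": setup["parse_type"],
        "separator": str(setup["separator"]),  # 44 is the ASCII code for ','
        "number_columns": str(setup["number_columns"]),
        "column_names": "" if names is None else json.dumps(names, **compact),
        "column_types": json.dumps(setup["column_types"], **compact),
        # Not shown in the trimmed ParseSetup output; copied from the Step 3 example.
        "check_header": "-1",
        "delete_on_done": "true",
    }
```

The resulting dict can be form-encoded with `urllib.parse.urlencode` and POSTed to `/3/Parse`.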

GBM_Example.flow, Step 3 Result

{
  "job": {
    "key": {
      "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_b98fc5bba38d21ea53da2a0834c44f7a"
    },
    "description": "Parse",
    "status": "RUNNING",
    "progress_msg": "Ingesting files.",
    "dest": {
      "URL": "\/3\/Frames\/arrhythmia.hex"
    },
    "exception": null,
    "messages": [ ],
    "error_count": 0
  },
  ...
}

GBM_Example.flow, Step 4: Poll for job completion

Flow polls for Job completion automagically.


GBM_Example.flow, Step 4: Result

"jobs": [
  {
    "key": {
      "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_b98fc5bba38d21ea53da2a0834c44f7a"
    },
    "description": "Parse",
    "status": "RUNNING",
    "progress_msg": "Ingesting files.",
    "dest": {
      "name": "arrhythmia.hex",
      "URL": "\/3\/Frames\/arrhythmia.hex"
    },
    "error_count": 0,
    "exception": null,
    "messages": []
  }
]
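Polling can be written as a small loop over `GET /3/Jobs/<id>`, reading the `status` field shown above. A sketch, assuming the job's final states are `DONE`, `FAILED` and `CANCELLED` (only `RUNNING` appears in this post); `job_url` and `wait_for_job` are hypothetical helpers, and `fetch` stands in for whatever function your client uses to GET a path and decode its JSON.

```python
import time
from typing import Callable

# Assumed final job states; only RUNNING appears in the responses above.
FINAL_STATES = {"DONE", "FAILED", "CANCELLED"}


def job_url(response: dict) -> str:
    """Extract the job's REST URL from a Parse or model-building response."""
    return response["job"]["key"]["URL"]


def wait_for_job(url: str, fetch: Callable[[str], dict],
                 interval: float = 1.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET <url> until the job reaches a final state; return that job object."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(url)["jobs"][0]
        if job["status"] in FINAL_STATES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"{url} still {job['status']} after {timeout} s")
        sleep(interval)
```

Injecting `fetch` and `sleep` keeps the loop independent of the HTTP library and lets it be tested without a cluster. The same loop applies to the model-building job in Step 6.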

GBM_Example.flow, Step 5: Train the Model

In Flow:

buildModel 'gbm', {
  "model_id": "gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1",
  "training_frame": "arrhythmia.hex",
  "score_each_iteration": false,
  "response_column": "C1",
  "ntrees": "20",
  "max_depth": 5,
  "min_rows": "25",
  "nbins": 20,
  "learn_rate": "0.3",
  "distribution": "AUTO",
  "balance_classes": false,
  "max_confusion_matrix_size": 20,
  "max_hit_ratio_k": 10,
  "class_sampling_factors": [],
  "max_after_balance_size": 5,
  "seed": 0
}

GBM_Example.flow, Step 5: Train the Model

In curl:

curl -X POST \
  --data 'model_id=gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1' \
  --data 'training_frame=arrhythmia.hex&response_column=C1' \
  --data 'score_each_iteration=false&ntrees=20&max_depth=5' \
  --data 'min_rows=25&nbins=20&learn_rate=0.3&distribution=AUTO' \
  --data 'balance_classes=false&max_confusion_matrix_size=20' \
  --data 'max_hit_ratio_k=10&class_sampling_factors=' \
  --data 'max_after_balance_size=5&seed=0'
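Going from Flow's `buildModel` dictionary to the curl form body is mechanical. A sketch of that conversion: booleans become `true`/`false`, an empty list becomes an empty value (as `class_sampling_factors=` does above), and other lists are JSON-encoded as in the Parse example. `flow_params_to_form` is a hypothetical helper, and POSTing the body to `/3/ModelBuilders/gbm` is an assumption made by analogy with the `/3/ModelBuilders/deeplearning` endpoint listed earlier.

```python
import json
from urllib.parse import urlencode


def flow_params_to_form(params: dict) -> str:
    """Convert a Flow buildModel parameter dict into an x-www-form-urlencoded body."""
    fields = {}
    for key, value in params.items():
        if isinstance(value, bool):  # checked before other types: bool is a subclass of int
            fields[key] = "true" if value else "false"
        elif isinstance(value, list):
            # Empty lists are sent as an empty value, as in the curl example.
            fields[key] = json.dumps(value, separators=(",", ":")) if value else ""
        else:
            fields[key] = str(value)
    return urlencode(fields)
```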

GBM_Example.flow, Step 5: Result

{
  "job": {
    "key": {
      "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_881e60f52af792b71d20540604b742dd"
    },
    "description": "GBM",
    "status": "RUNNING",
    "progress_msg": "Running...",
    "dest": {
      "URL": "\/3\/Models\/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1",
      ...
    },
    ...
  },
  "algo": "gbm",
  "algo_full_name": "Gradient Boosting Machine",
  "messages": [],
  "error_count": 0,
  "parameters": [ ... ]
}

GBM_Example.flow, Step 6: Poll for job completion

Same as for Parse (Step 4).

GBM_Example.flow, Step 7: View the Model

In Flow:

getModel "gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1"

In curl:

curl -X GET ''

GBM_Example.flow, Step 7: Result

{
  "model_id": {
    "URL": "\/3\/Models\/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1"
  },
  "algo": "gbm",
  "parameters": [...],
  "output": {
    "__meta": {
      "schema_name": "GBMModelOutputV3"
    },
    "model_category": "Regression",
    "scoring_history": { ... },
    "training_metrics": {
      "model_category": "Regression",
      "MSE": 31.32188458883,
      "r2": 0.88422887487626,
      "mean_residual_deviance": 31.32188458883
    },
    "status": "DONE",
    "run_time": 3211,
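Reading results out of the model JSON is plain dictionary navigation. A minimal sketch against the structure shown above; `training_summary` is a hypothetical helper, and the exact nesting of a full `GET /3/Models/...` response may differ from this trimmed excerpt.

```python
def training_summary(model: dict) -> dict:
    """Headline training results from a model object shaped like the excerpt above."""
    output = model["output"]
    metrics = output["training_metrics"]
    return {
        "category": output["model_category"],
        "status": output.get("status"),
        "MSE": metrics["MSE"],
        "r2": metrics["r2"],
    }
```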

GBM_Example.flow, Step 8: Predictions

In Flow:

predict model: "gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1", frame: "arrhythmia.hex", predictions_frame: "prediction-9d6f23f3-45c2-4e1f-a48e-393b1b7de6db"

In curl:

curl -X GET \
  '?column_offset=0&column_count=20'
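The Flow `predict` command corresponds to the `POST /3/Predictions/models/.../frames/...` endpoint listed earlier. A sketch that builds that request, assuming a local node and that the endpoint accepts `predictions_frame` as a form field (as the Flow command suggests); `predict_request` is a hypothetical helper, and the IDs are inserted without escaping because the ones in this example contain only URL-safe characters.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:54321"  # assumed address of a local H2O node


def predict_request(model_id: str, frame_id: str,
                    predictions_frame: str) -> urllib.request.Request:
    """Build POST /3/Predictions/models/<model>/frames/<frame> (hypothetical helper)."""
    # IDs are inserted as-is; the ones in this example are URL-safe.
    url = f"{BASE_URL}/3/Predictions/models/{model_id}/frames/{frame_id}"
    # Assumption: the endpoint accepts predictions_frame as a form field.
    body = urllib.parse.urlencode({"predictions_frame": predictions_frame})
    return urllib.request.Request(
        url,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```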

GBM_Example.flow, Step 8: Result

"model_metrics": [ { "predictions": { "frame_id": { "URL": "\/3\/Frames\/prediction-9d6f23f3-45c2-4e1f-a48e-393b1b7de6db" }, "total_column_count": 1, "rows": 452, "columns": [ { "label": "predict", "data": [ 35.275735166748, 53.253980894466, 41.531820529033 ] } ], "MSE": 31.321880321916, "r2": 0.88422889064751, "mean_residual_deviance": 31.321880321916
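The predicted values sit under `model_metrics[...].predictions.columns`. A minimal sketch that collects them by column label, using the values shown above; `prediction_columns` is a hypothetical helper, and it returns only whatever rows the response actually includes.

```python
def prediction_columns(result: dict) -> dict:
    """Map each predictions-frame column label to the data returned for it."""
    predictions = result["model_metrics"][0]["predictions"]
    return {col["label"]: col["data"] for col in predictions["columns"]}
```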

Documentation

• long version of this content is here:


• reference in the Help sidebar in Flow

• reference on the H2O.ai website, http://docs.h2o.ai/

• reference doc is generated via the /Metadata endpoints, so it's always current
