Post on 16-Apr-2017
H2O 3 REST API Overview
Raymond Peck
Director of Product Engineering, H2O.ai
rpeck@h2oai.com
© H2O.ai, 2015
Long version of this content is here:
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/api/REST/h2o_3_rest_api_overview.md

Why?
• Use the REST API to drive H2O from an external script or program in any language.
• Use the REST API when you want API stability.
• Use the Java API if you want to call the internal APIs from Java, Scala, etc.

Who?
• Software developers proficient in a scripting or programming language.
• Those familiar with nested data representations like JSON.
• Those familiar with the functionality of H2O, at least well enough to convert a Flow, R or Python script from a Data Scientist.

What?
Any H2O functionality in Flow, R or Python can be accessed via the REST API:
- data import
- model building
- model comparison
- generating predictions
- admin functions

How?
You can call the REST API:
• from your browser
• using browser tools such as Postman in Chrome
• using curl
• using the language of your choice

Bindings
For Python and R simply use the supplied packages.
For JVM clients:
- H2O currently ships with REST API payload POJOs.
- We're working on endpoint proxies.
- These are generated as part of the build using a Python script.
We'll work with you to generate bindings for other languages. One user generated C# bindings without difficulty.

Versioning and Stability, Part 1
• Current version is 3.
• Non-breaking changes are allowed; examples:
  • adding output fields
  • adding parameters with defaults that maintain old behavior
• Well-written clients should not break as functionality is added to version 3.

Versioning and Stability, Part 2
• Backward compatibility is tested with each release, including nightlies.
• Functionality under development is version 99.
• /99 endpoints can be called via /EXPERIMENTAL.

URLs
http://your_server:54321/version/Resource{/...}
Examples:
- /3/Frames
- /3/Frames/my_frame
- /3/Frames/my_frame/summary
- /3/Models
- /3/Models/my_model
- /3/Cloud
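
The URL pattern above can be assembled programmatically. What follows is a minimal Python sketch, not part of H2O's supplied packages: the helper name h2o_url and its default base address are illustrative assumptions, and percent-encoding each path segment assumes the server decodes it in the standard HTTP way.

```python
from urllib.parse import quote, urlencode


def h2o_url(*segments, version=3, base="http://localhost:54321", **query):
    """Return base/version/Resource{/...}, plus a query string if given.

    The function name and its defaults are hypothetical, for illustration only.
    """
    # Encode each segment on its own so that a key containing '/' or ':'
    # cannot be mistaken for additional path components.
    path = "/".join(quote(str(s), safe="") for s in segments)
    url = f"{base}/{version}/{path}"
    if query:
        url += "?" + urlencode(query)
    return url


print(h2o_url("Frames", "my_frame", "summary"))
print(h2o_url("Models", version=99))
```

Keeping the version as a parameter makes it easy to point a single call at an experimental /99 endpoint without touching the rest of the client.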

HTTP Verbs
• GET requests fetch data and do not cause side effects.
  GET /3/Frames/my_frame_name?row_offset=10000&row_count=1000
• POST requests create a new object. They use x-www-form-urlencoded input format.
• DELETE requests delete an object.
• HEAD requests return just the HTTP status.
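
To make the difference between the verbs concrete, here is a small sketch using only the Python standard library. It builds request objects without sending them. The BASE address and the helper name build_request are assumptions for illustration, as is the choice to put parameters for DELETE and HEAD in the query string. The file name my_file.csv is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:54321"  # assumption: a local H2O instance


def build_request(method, path, **fields):
    """Build (but do not send) a request for the given verb.

    POST sends its fields as an x-www-form-urlencoded body; the other
    verbs carry them in the query string (an assumption for DELETE/HEAD).
    """
    encoded = urlencode(fields)
    if method == "POST":
        return Request(
            BASE + path,
            data=encoded.encode("ascii"),
            method="POST",
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    url = BASE + path + ("?" + encoded if encoded else "")
    return Request(url, method=method)


get_req = build_request("GET", "/3/Frames/my_frame_name", row_offset=10000, row_count=1000)
post_req = build_request("POST", "/3/ParseSetup", source_frames='["my_file.csv"]')
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Passing either object to urllib.request.urlopen would actually issue the call against a running server.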

HTTP Status Codes
• 200 OK (all is well)
• 400 Bad Request (the request URL is bad)
• 404 Not Found (a specified object was not found)
• 412 Precondition Failed (bad parameters or other problem handling the request)
• 500 Internal Server Error (unanticipated failure)
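
A client typically branches on these codes before looking at the response body. The sketch below encodes the list above as a lookup table. The names H2O_STATUS_MEANINGS and describe_status are hypothetical, and any code outside the list is simply reported as unlisted rather than guessed at.

```python
# Status codes and meanings as listed on the slide above.
H2O_STATUS_MEANINGS = {
    200: "OK (all is well)",
    400: "Bad Request (the request URL is bad)",
    404: "Not Found (a specified object was not found)",
    412: "Precondition Failed (bad parameters or other problem handling the request)",
    500: "Internal Server Error (unanticipated failure)",
}


def describe_status(code):
    """Return (ok, meaning) for an HTTP status code returned by the REST API."""
    ok = 200 <= code < 300
    meaning = H2O_STATUS_MEANINGS.get(code, "status not listed in this overview")
    return ok, meaning


for code in (200, 404, 412):
    print(code, describe_status(code))
```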

Schemas, Part 1
Schemas define input and output formats.
Schema fields can be simple values or nested schemas, or arrays or dictionaries (maps) of these.

Schemas, Part 2
Each field's metadata includes:
• type
• default value
• help string
• direction (in, out or inout)
• required
• importance
• allowed values for enumerated fields

{
  "__meta": {
    "schema_name": "ModelParameterSchemaV3",
    "schema_type": "Iced",
    "schema_version": 3
  },
  "actual_value": {
    "URL": "/3/Models/prostate_glm",
    "__meta": {
      "schema_name": "ModelKeyV3",
      "schema_type": "Key<Model>",
      "schema_version": 3
    },
    "name": "prostate_glm",
    "type": "Key<Model>"
  },
  "default_value": null,
  "help": "Destination id for this model; auto-generated if not specified",
  "label": "model_id",
  "level": "critical",
  "name": "model_id",
  "required": false,
  "type": "Key<Model>",
  "values": []
},
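
Because every field carries its own metadata, a client can describe parameters without hard-coding them. The following sketch parses an abbreviated copy of the example above and reduces it to one line. The function name summarize_field is hypothetical, and only keys visible in the example are read.

```python
import json

# Abbreviated copy of the ModelParameterSchemaV3 example above.
FIELD_JSON = """
{
  "__meta": {"schema_name": "ModelParameterSchemaV3", "schema_type": "Iced", "schema_version": 3},
  "actual_value": {"URL": "/3/Models/prostate_glm", "name": "prostate_glm", "type": "Key<Model>"},
  "default_value": null,
  "help": "Destination id for this model; auto-generated if not specified",
  "label": "model_id",
  "level": "critical",
  "name": "model_id",
  "required": false,
  "type": "Key<Model>",
  "values": []
}
"""


def summarize_field(field):
    """Reduce one parameter schema to a one-line, human-readable summary."""
    requirement = "required" if field["required"] else "optional"
    # An empty "values" list is read here as "no enumerated restriction".
    allowed = ", ".join(field["values"]) or "any"
    return (f'{field["name"]} ({field["type"]}, {requirement}, {field["level"]}): '
            f'{field["help"]} [allowed: {allowed}]')


field = json.loads(FIELD_JSON)
summary = summarize_field(field)
print(summary)
```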

Error Condition Payloads
• return a non-2xx HTTP status code
• return standardized error payloads:
  • end-user message
  • developer message
  • HTTP status
  • optional dictionary of relevant values
  • exception information if applicable.

Example Error
{
  "__meta": { "schema_type": "H2OError", ... },
  "timestamp": 1438634936808,
  "error_url": "/3/Frames/missing_frame",
  "msg": "Object 'missing_frame' not found for argument: key",
  "dev_msg": "Object 'missing_frame' not found for argument: key",
  "http_status": 404,
  "values": { "argument": "key", "name": "missing_frame" },
  "exception_type": "water.exceptions.H2OKeyNotFoundArgumentException",
  "exception_msg": "Object 'missing_frame' not found for argument: key",
  "stacktrace": [ ... ]
}
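
Since the error payload is standardized, a client can turn it into a native exception in one place. This is a sketch under assumptions: the class H2OApiError and the function raise_for_error are hypothetical names, not part of any H2O package, and the sample is an abbreviated copy of the example error above.

```python
import json


class H2OApiError(Exception):
    """Hypothetical client-side exception built from an H2OError payload."""

    def __init__(self, payload):
        self.http_status = payload.get("http_status")
        self.dev_msg = payload.get("dev_msg")
        self.values = payload.get("values", {})
        self.exception_type = payload.get("exception_type")
        # The end-user message becomes the exception text.
        super().__init__(payload.get("msg", "unknown H2O error"))


def raise_for_error(status_code, body_text):
    """Raise H2OApiError for any non-2xx response; otherwise return parsed JSON."""
    payload = json.loads(body_text)
    if not 200 <= status_code < 300:
        raise H2OApiError(payload)
    return payload


# Abbreviated copy of the example error above.
SAMPLE = """{
  "__meta": {"schema_type": "H2OError"},
  "error_url": "/3/Frames/missing_frame",
  "msg": "Object 'missing_frame' not found for argument: key",
  "dev_msg": "Object 'missing_frame' not found for argument: key",
  "http_status": 404,
  "values": {"argument": "key", "name": "missing_frame"},
  "exception_type": "water.exceptions.H2OKeyNotFoundArgumentException"
}"""

try:
    raise_for_error(404, SAMPLE)
except H2OApiError as err:
    caught = err
    print(err.http_status, err.values["name"], err)
```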

Example Endpoints
For the complete list check the reference docs or /Metadata/endpoints. As of August 6, 2015 there are 105 endpoints:
- Loading and parsing data files
- Frames and Models
- Administrative and utility
- Job management and polling
- Persistence

Loading and parsing data files
GET /3/ImportFiles - Import raw data files into a single-column H2O Frame.
POST /3/ParseSetup - Guess the parameters for parsing raw byte-oriented data into an H2O Frame.
POST /3/Parse - Parse a raw byte-oriented Frame into a useful columnar data Frame.

Frames
GET /3/Frames - Return all Frames in the H2O distributed K/V store.
GET /3/Frames/(?.*) - Return the specified Frame.
GET /3/Frames/(?.*)/summary - Return a Frame, including the histograms, after forcing computation of rollups.
GET /3/Frames/(?.*)/columns/(?.*)/summary - Return the summary metrics for a column, e.g. mins, maxes, mean, sigma, percentiles, etc.
DELETE /3/Frames/(?.*)
DELETE /3/Frames

Building models
GET /3/ModelBuilders - Return the Model Builder metadata for all available algorithms.
GET /3/ModelBuilders/(?.*) - Return the Model Builder metadata for the specified algorithm.
POST /3/ModelBuilders/deeplearning/parameters - Validate a set of Deep Learning model builder parameters.
POST /3/ModelBuilders/deeplearning - Train a Deep Learning model on the specified Frame.

Accessing and using models
GET /3/Models - Return all Models from the H2O distributed K/V store.
GET /3/Models/(?.*?)(\.java)? - Return the specified Model. Use the .java extension for a Java POJO.
POST /3/Predictions/models/(?.*)/frames/(?.*) - Generate predictions for the specified Frame and Model.
DELETE /3/Models/(?.*)
DELETE /3/Models

Administrative and utility
GET /3/About - Return information about this H2O cluster.
GET /3/Cloud - Determine the status of the nodes in the H2O cloud.
HEAD /3/Cloud - Determine the status of the nodes in the H2O cloud.

Job management and polling
GET /3/Jobs - Get a list of all the H2O Jobs (long-running actions).
GET /3/Jobs/(?.*) - Get the status of the given H2O Job (long-running action).
POST /3/Jobs/(?.*)/cancel - Cancel a running job.

Persistence
POST /3/Frames/(?.*)/export - Export a Frame to the given path with optional overwrite.
POST /99/Models.bin/(?.*) - Import given binary model into H2O.
GET /99/Models.bin/(?.*) - Export given model.

Example workflows using curl
Some fields have been omitted for brevity.
When using curl you can pipe (|) the output through python -m json.tool to pretty-print the JSON:
curl -X GET http://localhost:54321/3/Frames | python -m json.tool

GBM_Example.flow, Step 1: Import
In Flow:
importFiles ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]
In curl:
curl -X GET 'http://127.0.0.1:54321/3/ImportFiles?path=http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz'

GBM_Example.flow, Step 1 Result
{
  "__meta": {
    "schema_name": "ImportFilesV3",
    "schema_type": "Iced",
    "schema_version": 3
  },
  "destination_frames": [
    "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
  ],
  "fails": [],
  "files": [
    "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
  ],
  "path": "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
}
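
The import step can be scripted as one URL to request and one reply to inspect. The sketch below builds the URL and reads the reply, using a copy of the Step 1 result in place of a live server. The helper names are hypothetical. Percent-encoding the path value is standard query-string practice but differs from the unencoded curl call above, so treat it as an assumption about what the server accepts.

```python
import json
from urllib.parse import urlencode

BASE = "http://127.0.0.1:54321"
DATA_URL = ("http://s3.amazonaws.com/h2o-public-test-data/"
            "smalldata/flow_examples/arrhythmia.csv.gz")


def import_files_url(path):
    """URL for GET /3/ImportFiles with the path value percent-encoded."""
    return f"{BASE}/3/ImportFiles?" + urlencode({"path": path})


def imported_frames(response_text):
    """Return the raw frame keys, refusing to continue if any import failed."""
    result = json.loads(response_text)
    if result["fails"]:
        raise RuntimeError(f"import failed for: {result['fails']}")
    return result["destination_frames"]


# Abbreviated copy of the Step 1 result above, standing in for a live reply.
sample = json.dumps({"destination_frames": [DATA_URL], "fails": [],
                     "files": [DATA_URL], "path": DATA_URL})
print(import_files_url(DATA_URL))
print(imported_frames(sample))
```

The keys returned in destination_frames are what the next step passes to ParseSetup as source_frames.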

GBM_Example.flow, Step 2: ParseSetup
In Flow:
setupParse paths: ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]
In curl:
curl -X POST http://127.0.0.1:54321/3/ParseSetup --data \
  'source_frames=["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]'

GBM_Example.flow, Step 2 Result
{
  "source_frames": [
    { "URL": "\/3\/Frames\/http:\/\/s3.amazonaws.com\/h2o-public-test-data\/smalldata\/flow_examples\/arrhythmia.csv.gz" }
  ],
  "parse_type": "CSV",
  "separator": 44,
  "column_names": null,
  "column_types": [ "Numeric", "Numeric", ... ],
  "destination_frame": "arrhythmia.hex",
  "header_lines": 0,
  "number_columns": 280,
  "data": [ [ "75", "0", "190", ... ], ... ]
}
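
The ParseSetup call passes a list inside a form field, which is easy to get wrong by hand. A minimal sketch of producing that body in Python follows. The function name parse_setup_body is hypothetical, and using JSON array syntax for the list simply mirrors the bracketed, quoted form in the curl call above.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_setup_body(source_frames):
    """Form body for POST /3/ParseSetup: source_frames as a bracketed, quoted list."""
    # json.dumps yields the ["...","..."] list syntax used in the curl call.
    array_text = json.dumps(list(source_frames), separators=(",", ":"))
    return urlencode({"source_frames": array_text})


body = parse_setup_body([
    "http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"
])
print(body)
# Decoding the body again shows the value the server would receive.
print(parse_qs(body)["source_frames"][0])
```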

GBM_Example.flow, Step 3: Parse
In Flow:
parseFiles
  paths: ["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]
  destination_frame: "arrhythmia.hex"
  parse_type: "CSV"
  separator: 44
  number_columns: 280
  single_quotes: false
  column_names: null
  column_types: ["Numeric","Numeric",...,"Numeric"]
  delete_on_done: true
  check_header: -1
  chunk_size: 4194304

GBM_Example.flow, Step 3: Parse
In curl (curl joins multiple --data values with &):
curl -X POST http://127.0.0.1:54321/3/Parse \
  --data 'destination_frame=arrhythmia.hex' \
  --data 'source_frames=["http://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/arrhythmia.csv.gz"]' \
  --data 'parse_type=CSV&separator=44&number_columns=280&single_quotes=false' \
  --data 'column_names=' \
  --data 'column_types=["Numeric"...,"Numeric","Numeric","Numeric","Numeric","Numeric","Numeric","Numeric"]' \
  --data 'check_header=-1&delete_on_done=true&chunk_size=4194304'

GBM_Example.flow, Step 3 Result
{
  "job": {
    "key": { "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_b98fc5bba38d21ea53da2a0834c44f7a" },
    "description": "Parse",
    "status": "RUNNING",
    "progress_msg": "Ingesting files.",
    "dest": { "URL": "\/3\/Frames\/arrhythmia.hex" },
    "exception": null,
    "messages": [ ],
    "error_count": 0
  },
  ...
}
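
Most of the Parse parameters in Step 3 are copied from the ParseSetup reply of Step 2, so a script can map one onto the other. This is only a sketch: parse_fields is a hypothetical helper, and the values for single_quotes, check_header, delete_on_done and chunk_size are taken from the Flow call above rather than from the reply. The sample reply is shortened to two columns and a placeholder file name.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_fields(setup, source_frames):
    """Map a ParseSetup reply onto the form fields of POST /3/Parse."""
    def as_list(items):
        return json.dumps(items, separators=(",", ":"))

    names = setup["column_names"]
    return {
        "destination_frame": setup["destination_frame"],
        "source_frames": as_list(source_frames),
        "parse_type": setup["parse_type"],
        "separator": setup["separator"],
        "number_columns": setup["number_columns"],
        "single_quotes": "false",
        # Null column names are sent as an empty value, as in the curl call.
        "column_names": "" if names is None else as_list(names),
        "column_types": as_list(setup["column_types"]),
        "check_header": -1,
        "delete_on_done": "true",
        "chunk_size": 4194304,
    }


# Shortened ParseSetup reply; the real one lists all 280 column types.
setup = {"parse_type": "CSV", "separator": 44, "column_names": None,
         "column_types": ["Numeric", "Numeric"],
         "destination_frame": "arrhythmia.hex", "number_columns": 2}
body = urlencode(parse_fields(setup, ["arrhythmia.csv.gz"]))
print(body)
```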

GBM_Example.flow, Step 4: Poll for job completion
Flow polls for Job completion automagically:

GBM_Example.flow, Step 4: Result
"jobs": [
  {
    "key": { "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_b98fc5bba38d21ea53da2a0834c44f7a" },
    "description": "Parse",
    "status": "RUNNING",
    "progress_msg": "Ingesting files.",
    "dest": { "name": "arrhythmia.hex", "URL": "\/3\/Frames\/arrhythmia.hex" },
    "error_count": 0,
    "exception": null,
    "messages": []
  }
]
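
Outside Flow, the client has to do the polling itself. Below is a minimal polling loop, written so it can be run without a cluster: fetch_json is an injected callable (hypothetical) that would GET the job URL and return decoded JSON. RUNNING and DONE appear in the results in this deck. CREATED as a pre-start state is an assumption and may not match the server's actual vocabulary.

```python
import time


def wait_for_job(job_url, fetch_json, poll_seconds=1.0, max_polls=600, sleep=time.sleep):
    """Poll a job until it leaves the running state and return its final record."""
    # "CREATED" is an assumed pre-start state; "RUNNING" is shown above.
    still_working = {"CREATED", "RUNNING"}
    for _ in range(max_polls):
        job = fetch_json(job_url)["jobs"][0]
        if job["status"] not in still_working:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job at {job_url} still running after {max_polls} polls")


# Stand-in for the server: two RUNNING replies, then DONE.
replies = iter(["RUNNING", "RUNNING", "DONE"])


def fake_fetch(url):
    return {"jobs": [{"status": next(replies), "dest": {"name": "arrhythmia.hex"}}]}


final = wait_for_job("/3/Jobs/example_job", fake_fetch, sleep=lambda seconds: None)
print(final["status"], final["dest"]["name"])
```

The caller should still check the final status and the exception field, since leaving the running state does not by itself mean the job succeeded.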

GBM_Example.flow, Step 5: Train the Model
In Flow:
buildModel 'gbm', {"model_id":"gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1","training_frame":"arrhythmia.hex","score_each_iteration":false,"response_column":"C1","ntrees":"20","max_depth":5,"min_rows":"25","nbins":20,"learn_rate":"0.3","distribution":"AUTO","balance_classes":false,"max_confusion_matrix_size":20,"max_hit_ratio_k":10,"class_sampling_factors":[],"max_after_balance_size":5,"seed":0}

GBM_Example.flow, Step 5: Train the Model
In curl (curl joins multiple --data values with &):
curl -X POST http://127.0.0.1:54321/3/ModelBuilders/gbm \
  --data 'model_id=gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1' \
  --data 'training_frame=arrhythmia.hex&response_column=C1' \
  --data 'score_each_iteration=false&ntrees=20&max_depth=5' \
  --data 'min_rows=25&nbins=20&learn_rate=0.3&distribution=AUTO' \
  --data 'balance_classes=false&max_confusion_matrix_size=20' \
  --data 'max_hit_ratio_k=10&class_sampling_factors=' \
  --data 'max_after_balance_size=5&seed=0'

GBM_Example.flow, Step 5: Result
{
  "job": {
    "key": { "URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_881e60f52af792b71d20540604b742dd" },
    "description": "GBM",
    "status": "RUNNING",
    "progress_msg": "Running...",
    "dest": { "URL": "\/3\/Models\/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1", ... },
    ...
  },
  "algo": "gbm",
  "algo_full_name": "Gradient Boosting Machine",
  "messages": [],
  "error_count": 0,
  "parameters": [ ... ]
}
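
The training reply carries the two URLs a script needs next: the job to poll and the model that will exist once it finishes. The sketch below extracts them from an abbreviated copy of the Step 5 result. The name job_and_model_urls is hypothetical, and treating a non-zero error_count as a validation failure is an assumption. The escaped slashes (\/) are ordinary JSON escapes and decode to plain slashes.

```python
import json

# Abbreviated copy of the Step 5 result above; note the escaped slashes.
RESPONSE = r"""
{
  "job": {
    "key": {"URL": "\/3\/Jobs\/$03010a010a7f32d4ffffffff$_881e60f52af792b71d20540604b742dd"},
    "description": "GBM",
    "status": "RUNNING",
    "dest": {"URL": "\/3\/Models\/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1"}
  },
  "algo": "gbm",
  "messages": [],
  "error_count": 0
}
"""


def job_and_model_urls(response_text):
    """Return (job URL to poll, URL of the model being built)."""
    reply = json.loads(response_text)
    # Assumption: a non-zero error_count means the parameters were rejected.
    if reply["error_count"]:
        raise ValueError(f"parameter validation failed: {reply['messages']}")
    job = reply["job"]
    return job["key"]["URL"], job["dest"]["URL"]


job_url, model_url = job_and_model_urls(RESPONSE)
print(job_url)
print(model_url)
```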

GBM_Example.flow, Step 6: Poll for job completion
Same as for Parse

GBM_Example.flow, Step 7: View the Model
In Flow:
getModel "gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1"
In curl:
curl -X GET 'http://127.0.0.1:54321/3/Models/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1'

GBM_Example.flow, Step 7: Result
{
  "model_id": { "URL": "\/3\/Models\/gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1" },
  "algo": "gbm",
  "parameters": [...],
  "output": {
    "__meta": { "schema_name": "GBMModelOutputV3", },
    "model_category": "Regression",
    "scoring_history": { ... },
    "training_metrics": {
      "model_category": "Regression",
      "MSE": 31.32188458883,
      "r2": 0.88422887487626,
      "mean_residual_deviance": 31.32188458883
    },
    "status": "DONE",
    "run_time": 3211,

GBM_Example.flow, Step 8: Predictions
In Flow:
predict model: "gbm-51b9780b-70d0-40d0-9b5a-c723a3f358c1", frame: "arrhythmia.hex", predictions_frame: "prediction-9d6f23f3-45c2-4e1f-a48e-393b1b7de6db"
In curl:
curl -X GET 'http://127.0.0.1:54321/3/Frames/prediction-9d6f23f3-45c2-4e1f-a48e-393b1b7de6db?column_offset=0&column_count=20'

GBM_Example.flow, Step 8: Result
"model_metrics": [
  {
    "predictions": {
      "frame_id": { "URL": "\/3\/Frames\/prediction-9d6f23f3-45c2-4e1f-a48e-393b1b7de6db" },
      "total_column_count": 1,
      "rows": 452,
      "columns": [
        {
          "label": "predict",
          "data": [ 35.275735166748, 53.253980894466, 41.531820529033 ],
        }
      ],
      "MSE": 31.321880321916,
      "r2": 0.88422889064751,
      "mean_residual_deviance": 31.321880321916
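
The curl call in Step 8 only fetches the predictions Frame. Generating it goes through the POST /3/Predictions/models/.../frames/... endpoint listed earlier. The sketch below builds both requests without sending them. The helper names are hypothetical. Passing predictions_frame as a form field is an assumption that mirrors the argument of the same name in the Flow call, and the short ids used in the demonstration are placeholders.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:54321"


def predict_request(model_id, frame_id, predictions_frame):
    """POST /3/Predictions/models/<model>/frames/<frame> (built, not sent)."""
    path = (f"/3/Predictions/models/{quote(model_id, safe='')}"
            f"/frames/{quote(frame_id, safe='')}")
    # Assumption: predictions_frame is accepted as a form field.
    body = urlencode({"predictions_frame": predictions_frame}).encode("ascii")
    return Request(BASE + path, data=body, method="POST")


def frame_columns_url(frame_id, column_offset=0, column_count=20):
    """GET URL for a window of columns from a Frame, as in the curl call above."""
    query = urlencode({"column_offset": column_offset, "column_count": column_count})
    return f"{BASE}/3/Frames/{quote(frame_id, safe='')}?{query}"


req = predict_request("gbm-1", "arrhythmia.hex", "pred-1")
print(req.get_method(), req.full_url, req.data)
print(frame_columns_url("pred-1"))
```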

Documentation
• long version of this content is here:
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/api/REST/h2o_3_rest_api_overview.md
• reference in the Help sidebar in Flow
• reference on the H2O.ai website, http://docs.h2o.ai/
• reference doc is generated via the /Metadata endpoints, so it's always current

THANKS!
Questions?