HTTP, JSON, JavaScript, Map&Reduce built-in to MySQL


Ulf Wendel, Oracle

Make it happen, today.


HTTP, JSON, JavaScript,
Map and Reduce
built-in to MySQL

The speaker says...

MySQL is more than SQL!

What if...

... MySQL would talk HTTP and reply JSON

... MySQL had built-in server-side JavaScript for MySQLApps

... MySQL had poor man's Map&Reduce for JSON documents

We, you and I, make it happen. Today.

You are watching the proof of concept.

New client protocols

New access methods, additional data models

New output formats

MySQL as a storage framework

Mycached (2009, Cybozu)

HandlerSocket (2010, DeNA)

Drizzle HTTP JSON (2011, Stewart Smith)

InnoDB Memcached (2012, Oracle)

NDB/MySQL Cluster Memcached (2012, Oracle)

JavaScript/HTTP Interface (today, You)

Groundbreaking eye-openers

The speaker says...

We thought pluggable storage was cool: different storage backends for different purposes. We thought the dominant relational data model was the one and only, and SQL the appropriate query language. We thought crash-safety, transactions and scale-out through replication count.

You wanted maximum performance. You had CPU-bound in-memory workloads. You wanted the Key-Value model in addition to the relational one. Key-Value is fantastic for sharding. You wanted the lightweight Memcached protocol and lightweight JSON replies. You did not need a powerful query language such as SQL. Luckily, you saw MySQL as a storage framework!

Memcached for Cluster, Memcached for InnoDB

Like PHP extensions! But not so popular

Library, loaded into the MySQL process

Expose tables, variables, functions to the user

Can start network servers/listeners

Can access data with and without SQL

MySQL Server daemon plugins

[Diagram: MySQL speaking SQL (relational model) on port 3306 and Memcached Key/Value on port 11211]

The speaker says...

MySQL daemon plugins can be compared with PHP Extensions or Apache Modules. A daemon plugin can be anything that extends the MySQL Server. The MySQL source has a blueprint, a daemon plugin skeleton: it contains as little as 300 lines of code. The code is easy to read. MySQL is written in C and portable C++.

The books MySQL 5.1 Plugin Development (Sergei Golubchik, Andrew Hutchings) and Understanding MySQL Internals (Sasha Pachev) get you started with plugin development. Additional information, including examples, is in the MySQL Reference Manual.
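To make the "300 lines" claim concrete, here is a minimal sketch of a daemon plugin declaration, modeled on the daemon_example plugin in the MySQL 5.5 source tree. The plugin name conn_js and the empty init/deinit bodies are placeholders, and the exact descriptor layout differs slightly between server versions.

#include <mysql/plugin.h>

static int conn_js_plugin_init(void *p)   { return 0; }  /* start servers here    */
static int conn_js_plugin_deinit(void *p) { return 0; }  /* tear them down here   */

static struct st_mysql_daemon conn_js_plugin_info =
{ MYSQL_DAEMON_INTERFACE_VERSION };

mysql_declare_plugin(conn_js)
{
  MYSQL_DAEMON_PLUGIN,
  &conn_js_plugin_info,
  "conn_js",                        /* plugin name shown in SHOW PLUGINS */
  "PoC author",                     /* author                            */
  "HTTP/JSON daemon plugin (PoC)",  /* description                       */
  PLUGIN_LICENSE_GPL,
  conn_js_plugin_init,              /* called at plugin load             */
  conn_js_plugin_deinit,            /* called at plugin unload           */
  0x0100,                           /* version 1.0                       */
  NULL,                             /* status variables                  */
  NULL,                             /* system variables                  */
  NULL                              /* reserved                          */
}
mysql_declare_plugin_end;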

Memcached Key/Value access to MySQL

MySQL benefits: crash safe, replication, ...

A few easy-to-understand client commands

Small footprint network protocol

Community reports 10,000+ queries/s single threaded and 300,000+ queries/s with 100 clients

Performance

The speaker says...

I couldn't resist and will continue to show performance figures to make my point. From now on, the machine used for benchmarking is my notebook: Celeron Duo, 1.2 GHz, 32-bit virtual machine running SuSE 12.1 with 1.5 GB of RAM assigned, Win XP as the host. Small datasets ensure that all benchmarks run out of main memory.

Don't let the benchmarks distract you. Dream of MySQL as a storage for many data models, many network protocols and even more programming languages.

For example, think of client-side JavaScript developers.

The PoC creates a direct wire

Sandbox: HTTP/Websocket network protocols only

Needs proxying to access MySQL

Extra deployments for proxying: LAMP or node.js

Proxying adds latency and increases system load

MySQL for client-side JavaScript

[Diagram: Browser (JavaScript) -> port 80 -> Apache/PHP -> port 3306 -> MySQL]

The speaker says...

Client-side JavaScript runs in a sandbox. JavaScript developers do HTTP/Websocket background requests to fetch data from the web server.

Because MySQL does not understand HTTP/Websocket, users need to set up and deploy a proxy for accessing MySQL. For example, one can use PHP for proxying: PHP interprets GET requests from JavaScript, connects to MySQL, executes some SQL, translates the result into JSON and returns it to JavaScript via HTTP.

Let's give JavaScript a direct wire to MySQL!

Like with PHP extensions!

Copy daemon plugin example, add your magic

Glue libraries: libevent (BSD), libevhtp (BSD)

Handle GET /?sql=, reply JSON

HTTP and JSON for MySQL

The speaker says...

First, we add an HTTP server to MySQL. MySQL shall listen on port 8080, accept GET /?sql=SELECT%201, run the SQL and reply the result as JSON to the user. The HTTP server part is easy: we glue together existing, proven BSD-licensed libraries.

A benchmark first, to motivate you. The chart compares the resulting MySQL Server daemon plugin with a PHP script that accepts a GET parameter with a SQL statement, connects to MySQL, runs the SQL and returns JSON. System load reported by top is not shown. At a concurrency of 32, the load is 34 for PHP and 2.5 for the MySQL Server daemon plugin...

Don't look at extending MySQL network modules!

Virtual I/O (vio) and Network (net) are fused

Start your own socket server in plugin init()

Mission HTTP

/* Plugin initialization method called by MySQL */
static int conn_js_plugin_init(void *p) {
  ...
  /* See libevent documentation */
  evthread_use_pthreads();
  base = event_base_new();

  /* Register generic callback to handle events */
  evhttp_set_gencb(http, conn_js_send_document_cb, docroot);

  handle = evhttp_bind_socket_with_handle(http, host, port);
  event_base_dispatch(base);
}

The speaker says...

Don't bother using any network or I/O related code of the MySQL server. Everything there is optimized for the MySQL protocol.

The way to go is setting up your own socket server when the plugin is started during MySQL startup. Plugins have init() and deinit() methods, very much like PHP extensions have M|RINIT and M|RSHUTDOWN hooks.

You will easily find proper code examples on using libevent and libevhtp. I show pseudo-code derived from my working proof of concept.

Request handling - see libevent examples

Done with HTTP for now

static void conn_js_send_document_cb(struct evhttp_request *req, void *arg) {
  /* ... */
  *uri = evhttp_request_get_uri(req);
  decoded = evhttp_uri_parse(uri);
  /* parsing is in the libevent examples */
  if (sql[0]) {
    query_in_thd(&json_result, sql);
    evb = evbuffer_new();
    evbuffer_add_printf(evb, "%s", json_result.c_ptr());
    evhttp_add_header(evhttp_request_get_output_headers(req),
                      "Content-Type", "application/json");
    evhttp_send_reply(req, 200, "OK", evb);
  }
}

The speaker says...

You are making huge steps forward doing nothing but copying public libevent documentation examples and adapting them!

The hardest part is yet to come: learning how to run a SQL statement and how to convert the result into JSON.

query_in_thd() is about SQL execution. For JSON conversion we will need to create a new Protocol class.

Before (left) and after (right)

[Diagram, left (before): Browser (JavaScript) -> HTTP, JSON -> Apache/PHP -> MySQL Protocol, binary -> MySQL. Right (after): Browser (JavaScript) -> HTTP -> MySQL]

The speaker says...

Throughout the presentation I take short breaks to reflect upon the work. These cliff-hangers take a step back to show the overall architecture and progress. Don't get lost in the source code.

On the left you see today's proxying architecture, with Apache/PHP standing in for LAMP. On the right you see what has been created already.

The new plugins come unexpectedly

How about a SQL service API for plugin developers?

How about a handler service API for plugin developers?

Plugin development would be even easier!

Additional APIs would be cool

/* NOTE: must have to get access to THD! */
#define MYSQL_SERVER 1

/* For parsing and executing a statement */
#include "sql_class.h"    // struct THD
#include "sql_parse.h"    // mysql_parse()
#include "sql_acl.h"      // SUPER_ACL
#include "transaction.h"  // trans_commit

The speaker says...

The recommended books do a great job introducing you to core MySQL components. So does the MySQL documentation. You will quickly grasp what modules there are. There is plenty of information on writing storage engines, creating INFORMATION_SCHEMA tables, SQL variables and user-defined SQL functions, but executing SQL is a bit more difficult.
The new class of server plugins needs comprehensive service APIs for plugin developers for accessing data, both using SQL and using the low-level handler storage interface.

The story about THD (thread descriptor)...

Every client request is handled by a thread

Our daemon needs THDs and the define...

#define MYSQL_SERVER 1

int query_in_thd() {
  /* ... */
  my_thread_init();
  thd = new THD(false);

  /* From event_scheduler.cc, pre_init_event_thread(THD* thd) */
  thd->client_capabilities = 0;
  thd->security_ctx->master_access = 0;
  thd->security_ctx->db_access = 0;
  thd->security_ctx->host_or_ip = (char*) CONN_JS_HOST;
  thd->security_ctx->set_user((char*) CONN_JS_USER);

  my_net_init(&thd->net, NULL);
  thd->net.read_timeout = slave_net_timeout;

The speaker says...

MySQL uses one thread for every client connection/request. Additional system threads exist. To run a SQL statement we must create and set up a THD object. It is THE object passed around during request execution.

The event scheduler source is a good place to learn about setting up and tearing down a THD object. The event scheduler starts SQL threads for events just like we start SQL threads to answer HTTP requests.

THD setup, setup, setup...

/* MySQL's network abstraction - vio, virtual I/O */
my_net_init(&thd->net, NULL);
thd->net.read_timeout = slave_net_timeout;
thd->slave_thread = 0;
thd->variables.option_bits |= OPTION_AUTO_IS_NULL;
thd->client_capabilities |= CLIENT_MULTI_RESULTS;

/* MySQL THD housekeeping */
mysql_mutex_lock(&LOCK_thread_count);
thd->thread_id = thd->variables.pseudo_thread_id = thread_id++;
mysql_mutex_unlock(&LOCK_thread_count);

/* Guarantees that we will see the thread in SHOW PROCESSLIST
   though its vio is NULL. */
thd->proc_info = "Initialized";
thd->set_time();

DBUG_PRINT("info", ("Thread %ld", thd->thread_id));

THD setup, setup, setup...

/* From lib_sql.cc */
thd->thread_stack = (char*) &thd;
thd->store_globals();

/* Start lexer and put THD to sleep */
lex_start(thd);
thd->set_command(COM_SLEEP);
thd->init_for_queries();

/* FIXME: ACL ignored, super user enforced */
sctx = thd->security_ctx;
sctx->master_access |= SUPER_ACL;
sctx->db_access |= GLOBAL_ACLS;

/* Make sure we are in autocommit mode */
thd->server_status |= SERVER_STATUS_AUTOCOMMIT;

/* Set default database */
thd->db = my_strdup(CONN_JS_DB, MYF(0));
thd->db_length = strlen(CONN_JS_DB);

The speaker says...

The setup is done. Following the motto make it happen, today, we ignore some nasty details such as access control, authorization or, in what follows, character sets. It can be done, that's for sure. I leave it to the ones in the know, the MySQL server developers.

Access control? With HTTP? With our client-side JavaScript code and all its secret passwords embedded in the clear text HTML document downloaded by the browser?

Hacking is fun!

Executing SQL

thd->set_query_id(get_query_id());
inc_thread_running();

/* From sql_parse.cc - do_command() */
thd->clear_error();
thd->get_stmt_da()->reset_diagnostics_area();

/* From sql_parse.cc - dispatch_command() */
thd->server_status &= ~SERVER_STATUS_CLEAR_SET;

/* Text protocol and plain question, no prepared statement */
thd->set_command(COM_QUERY);

/* To avoid confusing VIEW detectors */
thd->lex->sql_command = SQLCOM_END;

/* From sql_parse.cc - alloc_query() = COM_QUERY package parsing */
query = my_strdup(CONN_JS_QUERY, MYF(0));
thd->set_query(query, strlen(query) + 1);

/* Free here lest PS break */
thd->rewritten_query.free();
if (thd->is_error()) {
  return;
}

Heck, where is the result?

Parser_state parser_state;
parser_state.init(thd, thd->query(), thd->query_length());

/* From sql_parse.cc */
mysql_parse(thd, thd->query(), thd->query_length(), &parser_state);

/* NOTE: multi query is not handled */
if (parser_state.m_lip.found_semicolon != NULL) {
  return;
}

if (thd->is_error()) {
  return;
}
thd->update_server_status();
if (thd->killed) {
  thd->send_kill_message();
  return;
}

/* Flush output buffers, protocol is mostly about output format */
thd->protocol->end_statement();

/* Reset THD and put to sleep */
thd->reset_query();
thd->set_command(COM_SLEEP);

The speaker says...

Our query has been executed. Unfortunately, the result is gone with the wind.

MySQL has streamed the results during query execution into the Protocol object of the THD. Protocol, in turn, has converted the raw results into MySQL (text) protocol packets and sent them out using the vio/net modules. The net module was set to NULL by us earlier. The results are lost.

Let's hack a JSON Protocol class that returns a string to the caller. The result is stored in a string buffer.

We are here...

[Diagram, left: Browser (JavaScript) -> Apache/PHP -> MySQL. Right: Browser (JavaScript) -> GET /?sql= -> MySQL]

The speaker says...

Quick recap.

MySQL now understands the GET /?sql= request. The value of the sql parameter is used as the statement string. The statement has been executed.

Next: return the result as JSON.

JSON Protocol

class Protocol_json : public Protocol_text {
private:
  String json_result;

public:
  Protocol_json() {}
  Protocol_json(THD *thd_arg) : Protocol_text(thd_arg) {}
  void init(THD* thd_arg);
  virtual bool store_tiny(longlong from);
  /* ... */
  virtual bool json_get_result(String * buffer);
  virtual void json_begin_result_set_row();
  virtual bool json_add_result_set_column(uint field_pos, const uchar* s,
                                          uint32 s_length);
  virtual bool json_add_result_set_column(uint field_pos, String str);
  virtual bool json_add_result_set_column_cs(uint field_pos, const char* s,
                                             uint32 s_length,
                                             const CHARSET_INFO *fromcs,
                                             const CHARSET_INFO *tocs);
  /* ... */
};

The speaker says...

The proof-of-concept daemon plugin shall be simplistic. Thus, we derive a class from the old MySQL 4.1 style text protocol, used for calls like mysql_query(), mysqli_query() and so forth. Prepared statements use a different Protocol class.

The method implementation is straightforward. We map every store_*() call to json_add_result_set_column(). Everything becomes a C/C++ string (char*, ...). Returning a numeric column type as a number of the JSON world is possible.
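As an illustration only, a minimal sketch of how such a mapping could look for one of the store_*() methods; the decimal conversion and the column position counter field_pos are assumptions, not the PoC's actual code.

/* Hypothetical sketch: store_tiny() converts the value to its decimal string
   and hands it to json_add_result_set_column(); field_pos is an assumed
   per-row column counter maintained by the Protocol class. */
bool Protocol_json::store_tiny(longlong from)
{
  char buff[22];
  size_t len = my_snprintf(buff, sizeof(buff), "%lld", (long long) from);
  return json_add_result_set_column(field_pos++, (const uchar*) buff,
                                    (uint32) len);
}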

JSON Protocol method

bool Protocol_json::json_add_result_set_column(uint field_pos, const uchar* s,
                                               uint32 s_length)
{
  DBUG_ENTER("Protcol_json::json_add_result_set_column()");
  DBUG_PRINT("info", ("field_pos %u", field_pos));
  uint32 i, j;
  uchar * buffer;

  if (0 == field_pos) {
    json_begin_result_set_row();
  }
  json_result.append("\"");

  /* TODO CHARSETs, KLUDGE type conversions, JSON escape incomplete! */
  buffer = (uchar*)my_malloc(s_length * 2 * sizeof(uchar), MYF(0));
  for (i = 0, j = 0; i < s_length; i++, j++) {
    switch (s[i]) {
      case '"':
      case '\\':
      case '/':
      case '\b':
      case '\f':
      case '\n':
      case '\r':
      case '\t':
        buffer[j] = '\\';
        j++;
        break;
    }
    buffer[j] = s[i];
  }
  /*...*/

The speaker says...

It is plain vanilla C/C++ code one has to write. Please remember, I show proof of concept code. Production code from the MySQL Server team is of much higher quality. For example, can you explain the reasons for memcpy() in this code?

func(uchar *pos) {
  ulong row_num;
  memcpy(&row_num, pos, sizeof(row_num));
}

Leave the riddle for later. JSON is not complex!

Use of JSON Protocol

int query_in_thd(String * json_result) {
  /* */
  thd = new THD(false);

  /* JSON, replace protocol object of THD */
  protocol_json.init(thd);
  thd->protocol = &protocol_json;
  DBUG_PRINT("info", ("JSON protocol object installed"));

  /*... execute COM_QUERY SQL statement ...*/

  /* FIXME: THD will call Protocol::end_statement, the parent implementation.
     Thus, we cannot hook end_statement() but need an extra call in
     Protocol_json to fetch the result. */
  protocol_json.json_get_result(json_result);

  /* Calling should not be needed in our case */
  thd->protocol->end_statement();
  /*...*/

The speaker says...

Straightforward: we install a different protocol object into the THD and fetch the result after the query execution.
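For completeness, a minimal sketch of what json_get_result() might do, assuming the store_*() callbacks have appended everything to the json_result buffer; the body below is an assumption, not the PoC's exact code.

/* Hypothetical sketch: hand the accumulated JSON document to the caller
   and reset the buffer for the next statement. */
bool Protocol_json::json_get_result(String * buffer)
{
  buffer->copy(json_result);  /* copy "[[...]]" into the caller's buffer     */
  json_result.length(0);      /* reset for the next statement                */
  return false;               /* false = success, following MySQL convention */
}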

Proof: MySQL with HTTP, JSON

nixnutz@linux-rn6m:~/> curl -v http://127.0.0.1:8080/?sql=SELECT%201
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1... connected
> GET /?sql=SELECT%201 HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 7
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
[["1"]]

The speaker says...

Extracting metadata (types, column names) is missing: the Protocol_json method for sending metadata was not implemented, as it proved too big a task for the PoC.

We wrap up the HTTP task with a multi-threaded libevhtp based HTTP interface. Once again, copy and adapt examples from the library documentation...

We are here...

[Diagram, left: Browser (JavaScript) -> Apache/PHP -> MySQL. Right: Browser (JavaScript) -> GET /?sql= / JSON reply -> MySQL]

The speaker says...

Quick recap.

MySQL understands a new GET /?sql= command. The value of the sql parameter is used as the statement string. The statement has been executed. The result has been formatted as a JSON document. An HTTP reply has been sent.

Next: from single-threaded to multi-threaded.

Multi-threaded HTTP server

int conn_js_evhtp_init() {
  evthread_use_pthreads();
  base = event_base_new();
  htp = evhtp_new(base, NULL);
  evhtp_set_gencb(htp, conn_js_evhtp_send_document_cb, NULL);
  evhtp_use_threads(htp, conn_js_evhtp_init_thread, 16, NULL);
  evhtp_bind_socket(htp, CONN_JS_HTTP_HOST, port, 1024);
  event_base_loop(base, 0);
}

void conn_js_evhtp_init_thread(evhtp_t * htp, evthr_t * thread, void * arg) {
  struct worker_data * worker_data;

  /* Thread local storage (TLS) */
  worker_data = (struct worker_data *) calloc(sizeof(struct worker_data), 1);
  worker_data->evbase = evthr_get_base(thread);
  evthr_set_aux(thread, worker_data);
}

The speaker says...

Multi-threading out-of-the box thanks to libevhtp. Libevhtp is a BSD library that aims to replace the HTTP functionality in libevent.

Note the thread-local storage (TLS) of the HTTP worker threads. I have missed the opportunity of caching the THD in TLS. Doing so may further improve performance.

A web server needs a script language. Let's add server-side JavaScript! TLS will come in handy soon.
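A hypothetical sketch of the per-worker TLS structure implied by the snippets above; the fields other than evbase, and the cached THD in particular, are assumptions rather than the PoC's actual layout.

/* Per HTTP worker thread state, stored via evthr_set_aux(). */
struct worker_data {
  struct event_base *evbase;     /* the worker's libevent base (see above)   */
  void              *v8_context; /* cached V8 state, used in later slides    */
  THD               *thd;        /* possible optimization: cache the THD too */
};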

We are here...

[Diagram, left: JavaScript -> Apache/PHP -> MySQL. Right: JavaScript -> MySQL]

GET /?sql=SELECT%201, 32 concurrent clients

PHP proxy: 400 Req/s, Load 34
Server plugin: 1606 Req/s, Load 2.5

The speaker says...

MySQL understands a new GET /?sql= command. The value of the sql parameter is used as the statement string. The statement has been executed. The result has been formatted as a JSON document. An HTTP reply has been sent. The MySQL HTTP Interface is multi-threaded, as MySQL itself has always been.

The need for proxying (see the left) is gone. No extra deployment of a proxying solution any more. MySQL gets more system resources, resulting in a performance boost.

Like with PHP extensions!

Copy daemon plugin example, add your magic

Glue libraries: Google V8 JavaScript engine (BSD)

Handle GET /?app=

JavaScript for MySQL

The speaker says...

The chart shows Hello world with Apache/PHP compared to MySQL/server-side JavaScript. I am caching the JavaScript source code once it's loaded from a database table. The JavaScript is compiled and run upon each request.

System load reported by top during ab2 -n 50000 -c32 is 27 during the PHP test and 5 for MySQL/server-side JavaScript...

mysql> select * from js_applications where name='internetsuperhero'\G
*************************** 1. row ***************************
  name: internetsuperhero
source: function main() { return "Hello world"; } main();

Cache expensive operations

Keep v8::Context in thread local storage

Cache the script code after fetching from MySQL

Embedding Google V8

#include <stdio.h>
#include <v8.h>

using namespace v8;

int main(int argc, char* argv[]) {
  HandleScope handle_scope;
  Persistent<Context> context = Context::New();
  Context::Scope context_scope(context);
  Handle<String> source = String::New("'Hello' + ', World!'");
  Handle<Script> script = Script::Compile(source);
  Handle<Value> result = script->Run();
  context.Dispose();
  String::AsciiValue ascii(result);
  printf("%s\n", *ascii);
  return 0;
}

The speaker says...

Google V8 is the JavaScript engine used in Google Chrome, Google's open source browser. It is written in C++ and said to be a fast engine. It is used by node.js and some NoSQL databases.

Armed with the previously developed function query_in_thd() to fetch the source of a MySQLApp stored in a table into a string, it's easy going. Learn the basic concepts of V8 and make it happen. Once done with the V8 documentation, study http://www.codeproject.com/Articles/29109/Using-V8-Google-s-Chrome-JavaScript-Virtual-Machin

Load the source

int conn_js_v8_run_program(const char * name, ::String * script_result) {
  /* */
  buffer_len = sizeof(SELECT_JS_APPLICATIONS) + 1 + strlen(name) + 3;
  buffer = (char*)my_malloc(buffer_len, MYF(0));

  /* KLUDGE: escaping */
  my_snprintf(buffer, buffer_len, "%s'%s'", SELECT_JS_APPLICATIONS, name);
  query_in_thd(&json_result, buffer);
  my_free(buffer);

  /* */
  buffer_len = json_result.length() - 6;
  buffer = (char*)my_malloc(buffer_len, MYF(0));
  for (i = 3; i < (buffer_len + 2); i++) {
    buffer[i - 3] = json_result.c_ptr()[i];
  }
  buffer[buffer_len - 1] = '\0';

  conn_js_v8_run_code(buffer, script_result);
  my_free(buffer);
}

The speaker says...

Final code would store the source in a system table. The table would be accessed through the handler interface. NoSQL would be used, so to speak. Many integrity checks would be done.

However, you haven't learned yet how to use the Handler interface. Thus, we use what we have: query_in_thd(). It is amazing how far we can get with only one function.

Make it run fast

void conn_js_evhtp_init_thread(evhtp_t * htp, evthr_t * thread, void * arg) {
  /* ... */
  conn_js_v8_init_thread(&worker_data->v8_context);
  evthr_set_aux(thread, worker_data);
}

void conn_js_v8_init_thread(void ** tls) {
  v8_thread_context * context = (v8_thread_context *)my_malloc(...);
  /* */
  context->isolate = v8::Isolate::New();
  context->have_context = 0;
  *tls = context;
}

static void conn_js_v8_run_using_context(::String * res, void * tls) {
  v8_thread_context * context = (v8_thread_context *)tls;
  Isolate::Scope iscope(context->isolate);
  Locker l(context->isolate);
  HandleScope handle_scope;
  if (!context->have_context) {
    context->context = v8::Context::New();
    context->have_context = 1;
  }
  /*...*/

The speaker says...

To boost the performance we cache the v8::Context in the thread-local storage of our HTTP worker threads. The v8::Context is needed for compiling and running scripts. A v8::Context contains all built-in utility functions and objects.

For fast multi-threaded V8, each HTTP worker gets its own v8::Isolate object. We want more than one global v8::Isolate to boost concurrency. Isolate? Think of it as a mutex. Additionally, we cache the script source code in the TLS. Teach your HTTP server to call the functions. Done.
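Putting the pieces together, a hypothetical sketch of the v8_thread_context structure implied by the code above; the script source cache field in particular is an assumption.

/* Per HTTP worker V8 state, kept in the worker's TLS (see worker_data). */
struct v8_thread_context {
  v8::Isolate                 *isolate;       /* one isolate per worker        */
  int                          have_context;  /* context created lazily        */
  v8::Persistent<v8::Context>  context;       /* reused for every request      */
  char                        *script_source; /* assumed: cached MySQLApp code */
};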

Proof: Server-side JavaScript

~/> curl -v http://127.0.0.1:8080/?app=internetsuperhero
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1... connected
> GET /?app=internetsuperhero HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 11
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
Hello world

The speaker says...

Boring MySQLApp!

JavaScript has no access to MySQL tables!

We are here...

[Diagram, left: JavaScript -> Apache/PHP -> MySQL. Right: JavaScript -> MySQL (server-side JavaScript)]

GET hello.php vs. GET /app=hello, 32 concurrent clients

Apache/PHP: 1107 Req/s, Load 27
Server plugin: 2360 Req/s, Load 5

The speaker says...

Quick recap.

MySQL has a built-in multi-threaded web server. Users can use JavaScript for server-side scripting. Hello world runs faster than on Apache/PHP. The CPU load is lower.

This is an intermediate step. This is not a new general purpose web server.

Next: Server-side JavaScript gets SQL access.

Like with PHP extensions!

Copy daemon plugin example, add your magic

Glue libraries: Google V8 JavaScript engine (BSD)

Handle GET /?app=

Server-side JS does SELECT 1

The speaker says...

The charts and the system load are what we all expect. The MySQL Server daemon plugin proof of concept remains in the top position. It is faster and uses less CPU.

Here's the server-side JavaScript I have benchmarked. The PHP counterpart uses mysqli to execute SELECT 1 and converts the result into JSON.

mysql> select * from js_applications where name='select'\G
*************************** 1. row ***************************
  name: select
source: function main() { return ulf("SELECT 1"); } main();
1 row in set (0,00 sec)

ulf() for server-side JavaScript

static void conn_js_v8_run_using_context(::String * script_result, void * tls) {
  /* ... */
  if (!context->have_context) {
    Handle<ObjectTemplate> global = ObjectTemplate::New();
    global->Set(v8::String::New("ulf"), FunctionTemplate::New(UlfCallback));
    context->context = v8::Context::New(NULL, global);
    context->have_context = 1;
  }
  /* ... */
}

static Handle<Value> UlfCallback(const Arguments& args) {
  if (args.Length() < 1)
    return v8::Undefined();
  ::String json_result;
  HandleScope scope;
  Handle<Value> sql = args[0];
  v8::String::AsciiValue value(sql);
  query_in_thd(&json_result, *value);
  return v8::String::New(json_result.c_ptr(), json_result.length());
}

The speaker says...

Sometimes programming means gluing pieces together. This time, query_in_thd() is connected with V8.

Imagine server-side JavaScript had access to more functions to fetch data. That would be fantastic for map & reduce, assuming you want it.

Proof: JS runs ulf('SELECT 1')

> curl -v http://127.0.0.1:8080/?app=select
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1... connected
> GET /?app=select HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 7
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
[["1"]]

The speaker says...

SELECT 1 is great to show the base performance of a technology. The SQL runtime is as short as possible. The SQL runtime is constant. The SQL runtime contributes little to the overall runtime. With long running SQL, the majority of the time is spent on SQL and the differences between proxying through Apache/PHP and server-side JavaScript diminish.

But SELECT 1 is still boring. What if we put a BLOB into MySQL, store JSON documents in it and filter them at runtime using server-side JavaScript?

We are here...

[Diagram, left: JavaScript -> Apache/PHP -> MySQL. Right: JavaScript -> MySQL (server-side JavaScript)]

GET /?app=select, 32 concurrent clients

PHP: 448 Req/s, Load 34
Server plugin: 1312 Req/s, Load 5.5

The speaker says...

This is a second intermediate step on the way to the main question that has driven the author: how can MySQL be more than SQL? MySQL is not only SQL.

Next: poor-man's document storage.

Like with PHP extensions!

Copy daemon plugin example, add magic

Glue libraries: Google V8 JavaScript engine (BSD)

Handle GET /?app=

JS to filter JSON documents

The speaker says...

MySQL with server-side JavaScript is still faster than PHP, but at 32 concurrent clients the system load reported by top is 9. Have we reached a dead end?

Or should I buy myself a new notebook? The subnotebook that runs all benchmarks in a VM is four years old. Please take my absolute performance figures with a grain of salt. Benchmarking on modern commodity server hardware was not a goal.

We are here...

[Diagram, left: JavaScript -> Apache/PHP -> MySQL. Right: JavaScript -> MySQL (server-side JavaScript), JSON documents stored in a BLOB]

GET /?map=greetings, 32 concurrent clients

PHP: 358 Req/s, Load 33.5
Server plugin: 641 Req/s, Load 9

The speaker says...

The illustration shows the vision together with first benchmark impressions.

This is how far I got linking three BSD software libraries to the MySQL Server using a MySQL Server daemon plugin. Allow me a personal note, this was the status a week before the presentation.

Use the full potential of MySQL as storage

Stream results into the map() function?

Handler interface instead of SQL

Cache the result: create a view

Mission JSON document mapping

The speaker says...

Attention - you are leaving the relational model and entering the NoSQL section of this talk.

We are no longer talking about relations. We talk about JSON documents. SQL as an access language can't be used anymore. We must map & reduce documents to filter out information, cache results and use triggers to maintain integrity between derived, cached documents and the originals.

If you want, you can get access to the API MySQL uses internally when executing SQL: the handler interface.

Too many iterations to filter the documents?

First loop inside the plugin to fetch rows

Storing all rows in a string eats memory

Second loop inside the server-side JavaScript

Maybe the loops are slow?

function filter_names() {
  var s = ulf("SELECT document FROM test_documents");
  var docs = JSON.parse(s);
  var res = [];
  for (i = 0; i < docs.length; i++) {
    var doc = JSON.parse(docs[i]);
    if (doc.firstname !== undefined) {
      res[i] = "Hi " + doc.firstname;
    }
  }
  return JSON.stringify(res);
}

The speaker says...

The map() function API is not beautiful. First, we iterate over all rows in our plugin and create a string. Then, we pass the string to JavaScript and do the same loop again.

Use handler interface

Open table

For each row: populate C++ object with document

For each row: run map() function and access object

Maybe this is faster?


function map() {
  var res;
  var row = JSON.parse(doc.before);
  if (row.firstname !== undefined)
    res = "Hi " + row.firstname;
  doc.after = JSON.stringify(res);
}
map();

The speaker says...

The user API, the JavaScript function, is still not nice but it is a step forward. A good one?

Using the handler interface

int handler_copy_example(const char * db_name, const char * from_table_name,
                         const char * to_table_name, String * result) {
  /* ... */
  TABLE_LIST tables[2];
  TABLE * table_from = NULL;
  TABLE * table_to = NULL;

  /* create and setup THD as in query_in_thd() */

  /* from sql_acl.cc */
  tables[0].init_one_table(db_name, strlen(db_name), from_table_name,
                           strlen(from_table_name), from_table_name, TL_READ);
  tables[1].init_one_table(db_name, strlen(db_name), to_table_name,
                           strlen(to_table_name), to_table_name, TL_WRITE);
  tables[0].next_local = tables[0].next_global = tables + 1;

  open_and_lock_tables(thd, tables, FALSE, MYSQL_LOCK_IGNORE_TIMEOUT);

  table_from = tables[0].table;
  table_from->use_all_columns();
  table_to = tables[1].table;

  table_from->file->ha_rnd_init(TRUE);

The speaker says...

For demonstrating the handler interface I show a function that copies all rows from one table to another. It is assumed that the tables have identical structures. The loop has most of what is needed to create a view or read from a view.

Before you can use the handler interface you must create a THD object. Use the setup and tear down code from query_in_thd(). Once done, create a table list to be passed to open_and_lock_tables(), tell the handler which columns we will access and announce our plan to start reading by calling ha_rnd_init().

Using the handler interface

do {
  if ((err = table_from->file->ha_rnd_next(table_to->record[0]))) {
    switch (err) {
      case HA_ERR_RECORD_DELETED:
      case HA_ERR_END_OF_FILE:
        goto close;
        break;
      default:
        table_from->file->print_error(err, MYF(0));
        goto close;
    }
  } else {
    table_to->file->ha_write_row(table_to->record[0]);
  }
} while (1);

close:
/* from sql_base.cc - open_and_lock_tables failure */
table_from->file->ha_rnd_end();
if (! thd->in_sub_stmt) {
  trans_commit_stmt(thd);
}
close_thread_tables(thd);

The speaker says...

Read rows from one table into a buffer and write the buffer into the target table. Stop in case of an error or when all rows have been read. Such loops can be found all over in the MySQL Server code.

When done, close the table handles and tear down THD before exiting.

Extracting data for map()

my_bitmap_map * old_map;
my_ptrdiff_t offset;
Field * field;
::String tmp, *val;
/*...*/
do {
  /* the handler loop */
  old_map = dbug_tmp_use_all_columns(table_from, table_from->read_set);
  offset = (my_ptrdiff_t)0;
  for (i = 0; i < table_from->s->fields; i++) {
    field = table_from->field[i];
    field->move_field_offset(offset);
    if (!field->is_null()) {
      /* document is the C++/JavaScript data exchange object */
      document->before = field->val_str(&tmp, &tmp);
      /* run map() function */
      result = v8::script->Run();
      /* store modified value */
      field->store(document->after.c_ptr(), document->after.length(),
                   system_charset_info);
      field->move_field_offset(-offset);
    }
    dbug_tmp_restore_column_map(table_from->read_set, old_map);
    /* ... */
  }
} while (1);

The speaker says...

This code goes into the handler loop instead of the simple copy done with table_to->file->ha_write_row(table_to->record[0]);

For reference it is shown how to loop over all columns of a row and extract the data. In case of the document mapping one needs to read only the data for the BLOB column and call the JavaScript map() function.

A C++ object is used for data exchange with JavaScript. The object is populated before the map() function is run and inspected afterward.
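As an illustration of that exchange, a hypothetical sketch using the old (2012-era) V8 API seen elsewhere in this talk; the PoC may wire the object up differently. Here, blob_value stands for the String returned by field->val_str() and script is the compiled map() script from the code above.

/* Fill a plain JavaScript object "doc" before each map() call and read
   doc.after back afterwards. */
v8::Handle<v8::Object> doc = v8::Object::New();
doc->Set(v8::String::New("before"),
         v8::String::New(blob_value.ptr(), blob_value.length()));
context->context->Global()->Set(v8::String::New("doc"), doc);

script->Run();  /* runs the user's map() function, which assigns doc.after */

v8::Handle<v8::Value> after = doc->Get(v8::String::New("after"));
v8::String::AsciiValue after_ascii(after);
/* *after_ascii now holds the mapped value to store back into the row */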

Are C++/ V8-JS context switches expensive?

Calling JS for every row is a bad idea?

Using a C++ object for data exchange does not fly?

We should send rows in batches to reduce switches

Surprise: no major difference

The speaker says...

Calling the map() function for every row reduces the performance a bit. Let's recap how good the performance is: it is twice as fast as the PHP/Apache proxying approach.

Detailed bottleneck analysis and further benchmarking are beyond the scope and interest of this proof of concept. It has been proven that mapping is possible at very reasonable performance.

8,300 documents mapped per second with V8

8,700 docs/s if map() is an empty function

11,500 docs/s if not calling map()

12,800 docs/s is the base without v8 during read

Single threaded read

The speaker says...

There is a simple solution for getting to the baseline of 12,800 documents read per second: we cache the result in a view.

The view is a SQL table that the plugin creates when the view is accessed for the first time. Triggers could be used to update the view whenever the underlying data changes.

Please note, the figure of 12,800 is extrapolated from ab2 -n 1000 -c 1 127.0.0.1:8080/?map= repeatedly scanning a small table with 522 rows (documents) using the handler interface.
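A hypothetical sketch of what such lazy materialization could look like, reusing query_in_thd(); the function name, table name and SQL are illustrative only, and the real thing would use the handler interface and triggers as described above.

static void conn_js_materialize_view(const char *view_table)
{
  ::String json_result;

  /* create the cache ("view") table on first access; assumed schema */
  query_in_thd(&json_result,
               "CREATE TABLE IF NOT EXISTS test_documents_greetings "
               "(document BLOB) ENGINE=InnoDB");

  /* then run the handler-interface mapping loop shown earlier and
     ha_write_row() each mapped document into the cache table */
}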

Map and reduce with MySQL

[Diagram: JavaScript client -> MySQL running server-side JavaScript over JSON documents, once via SQL and once via the handler interface]

GET /map=greeting, 32 concurrent clients

Server plugin (SQL): 641 Req/s, Load 9
Server plugin (handler interface): 571 Req/s, Load 9
The speaker says...

As the name says, Map&Reduce is a two stage process. Mapping is optionally followed by reducing. If you are new to map and reduce, think of reduce as the aggregation step in a SELECT ... FROM ... GROUP BY statement.

It has been shown that map() is possible. Results can be persisted in a table. Reducing can be understood as a second mapping that works on the results of the map() function. Mapping has been proven to be possible, thus reducing is too. An implementation was beyond the author's goals.

Imagine someone created a BLOB optimized storage engine for MySQL. Storage engine development is covered in the books...

Imagine Websocket were used instead of HTTP. Websocket is a raw, do-as-you-like connection with much less overhead. GET /?sql=SELECT%201 returns 71 bytes of which 7 are the payload...

Imagine Websocket were used: transactions, events, streaming - all this is within reach...

Areas for future work

PS: This is a proof of concept. No less, no more. I have created it in my after-work office. For the next couple of weeks I plan to focus on nothing but my wedding. Otherwise the bride may decide that a 19-year-long evaluation is not enough. She might fear I could be coding during the ceremony...

Would you create the MySQL HTTP Interface, today? I'm busy with the wedding.

Happy Hacking!

THE END

Contact: [email protected]

Requests/s by concurrency, PHP proxy vs. server plugin:

Concurrency (ab2 -c) | PHP proxy (Req/s) | Server plugin (Req/s)
                   1 |               293 |                  1049
                   4 |               478 |                  1539
                   8 |               466 |                  1697
                  16 |               447 |                  1858
                  32 |               400 |                  1606

Requests/s by concurrency, PHP vs. server plugin:

Concurrency (ab2 -c) | PHP (Req/s) | Server plugin (Req/s)
                   1 |         823 |                  1314
                   4 |        1307 |                  1854
                   8 |        1320 |                  2375
                  16 |        1296 |                  2314
                  32 |        1107 |                  2360

Requests/s by concurrency, PHP vs. server plugin:

Concurrency (ab2 -c) | PHP (Req/s) | Server plugin (Req/s)
                   1 |         290 |                   817
                   4 |         459 |                  1184
                   8 |         490 |                  1351
                  16 |         465 |                  1344
                  32 |         448 |                  1312

Requests/s by concurrency, PHP vs. server plugin:

Concurrency (ab2 -c) | PHP (Req/s) | Server plugin (Req/s)
                   1 |         231 |                   433
                   4 |         374 |                   693
                   8 |         386 |                   693
                  16 |         366 |                   714
                  32 |         358 |                   641

Requests/s by concurrency, server plugin SQL vs. handler interface:

Concurrency (ab2 -c) | Server plugin, SQL (Req/s) | Server plugin, handler interface (Req/s)
                   1 |                        433 |                                      486
                   4 |                        693 |                                      723
                   8 |                        693 |                                      638
                  16 |                        714 |                                      590
                  32 |                        641 |                                      571

Documents processed per second (ab2 -c 1):

No V8 in loop | V8 but no script run | V8, empty map() | V8, filtering map()
     12862.08 |             11567.52 |         8764.38 |             8362.44