Improving sanity with structured binary formats and protocols

Post on 06-May-2015

description

Did you know that you can turn a JSON API into a binary one using MessagePack with one line of code? There are a lot of myths surrounding binary formats, and a lot of things even the "experts" get wrong. Making a binary format that is easy for developers to work with is just as important as making one that is high-performance. This talk describes how not to implement a binary format, then introduces structured ways to implement one that maximize language compatibility and make everybody's lives easier. An early version of this talk was given at Realtime Conf Europe and was recorded here: http://www.youtube.com/watch?v=ZlKrnOD-4TQ

Transcript of Improving sanity with structured binary formats and protocols

Learning how to let go: Improving sanity with structured binary data-interchange formats and protocols

Kyle Drake, Net Brew Ventures

If you’re doing a lot of text-based serialized object passing, and want a way to improve serialization time and payload size, try a binary format.

Why

• Smaller size, representing the same information (esp. vs XML)

• Usually faster for CPU as a consequence

• Surprisingly easy to implement (with help)

• Lots of well-tested, streamlined solutions

Why Not

• No quick language support (if custom)

• Hard to understand without good docs

• Potentially harder to debug low-level if something bad happens (data entanglement/corruption?)

• Binary != Performance (Measure!)

Two Approaches

• Custom, hand-rolled format you write

• Frameworks (Protocol Buffers, MessagePack, BERT, others)

Custom Binary Format

• This is often a bad idea.

• No, seriously.

• Requires a very compelling reason IMHO.

Let’s look at a poorly designed custom binary format, via: The Apple Push Notification Service.

Wait, why not just use JSON for the whole thing then?
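To make the question concrete, here is a minimal sketch of what such a frame looks like. It assumes the legacy APNs "simple notification" layout: a command byte of 0, a 2-byte big-endian token length, the device token, a 2-byte big-endian payload length, then the payload, which is itself JSON. The function name and the sample payload are made up for illustration.

```python
import json
import struct


def build_simple_notification(device_token: bytes, payload: dict) -> bytes:
    """Pack one notification frame (hypothetical helper, legacy layout assumed)."""
    # The payload is JSON text, wrapped inside a hand-rolled binary envelope.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # "!" = network byte order, no padding; B = uint8, H = uint16, Ns = N raw bytes.
    layout = f"!BH{len(device_token)}sH{len(body)}s"
    return struct.pack(layout, 0, len(device_token), device_token, len(body), body)


# A fake all-zero 32-byte token, just to show the framing.
frame = build_simple_notification(bytes(32), {"aps": {"alert": "hi"}})
print(len(frame), frame[37:])
```

So the format is binary on the outside and JSON on the inside, and every client has to get the byte offsets right by hand.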

Success: no response

Error: socket disconnection

Interim: unreported data loss

W.T.F.

ROFLSCALE TIPS FTW

My solution:

Meanwhile, at Google:

I think this crap is why people hate working with binary so much. But there is a better way.

Let’s look at some structured ways to work with binary!

Protocol Buffers

• Developed by Google

• “3 to 10 times smaller, 20 to 100 times faster than XML”

• Requires a pre-defined .proto file

• In effect, it has a language agnostic “schema”.
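To see where the size savings come from, here is a rough sketch of how a single integer field is laid out on the wire. It assumes the base-128 varint encoding and the `(field_number << 3) | wire_type` field key described in the Protocol Buffers encoding documentation. The helper names are made up for this example; real code would use the classes generated from a .proto file, not hand-packed bytes.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key (field number plus wire type 0 for varint) followed by the value."""
    wire_type_varint = 0
    key = (field_number << 3) | wire_type_varint
    return encode_varint(key) + encode_varint(value)


# Field number 1 holding the value 150.
print(encode_varint_field(1, 150).hex())  # -> 089601
```

The field name never appears on the wire, only its number, which is why both sides need the same .proto schema.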

Protocol Buffers

I’m doing work with the Bitcoin protocol right now. I really wish it used Protocol Buffers. Not just for size, but for safety and ease of use.

Bitcoin’s lead core developer is warm to the idea: https://bitcointalk.org/index.php?topic=632.msg6656#msg6656

Let’s look at some “schemaless” structured binary formats.

JSON

• Number (50, 2.33)

• String “howdy”

• Boolean (true or false)

• Array [1,2,3]

• Object {“key”: “value”}

• null (empty)

MessagePack

Basically the same thing.

Except faster and smaller!

Super simple:
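In a real project this is a single call into a MessagePack library. To show what the bytes actually look like, the sketch below hand-encodes a small subset of the format (nil, booleans, integers 0 to 127, float64, and short strings, arrays and maps) using the type bytes from the MessagePack specification. The `pack` function here is an illustration only, not a replacement for a library.

```python
import json
import struct


def pack(obj) -> bytes:
    """Encode a small subset of MessagePack (illustrative sketch only)."""
    if obj is None:
        return b"\xc0"                          # nil
    if obj is True:
        return b"\xc3"                          # true
    if obj is False:
        return b"\xc2"                          # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                 # positive fixint
        raise ValueError("sketch only handles integers 0..127")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj) # float 64, big-endian
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only handles strings up to 31 bytes")
        return bytes([0xA0 | len(raw)]) + raw   # fixstr
    if isinstance(obj, (list, tuple)):
        if len(obj) > 15:
            raise ValueError("sketch only handles arrays up to 15 items")
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("sketch only handles maps up to 15 pairs")
        pairs = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + pairs  # fixmap
    raise TypeError(f"unsupported type: {type(obj).__name__}")


doc = {"compact": True, "schema": 0}
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(doc)
print(len(as_json), len(as_msgpack))  # -> 27 18
```

Same data model as JSON, but the type and length of each value fit in one leading byte instead of quotes, colons and braces.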

Painless API Integration

• /route?format=msgpack

• Content-Type: application/x-msgpack

• Accept: application/x-msgpack

• Plugs right in, if you’re using JSON.
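The routing side of this can be sketched in a few lines. The function below is a hypothetical helper, not part of any framework: it picks the response content type from the `format` query parameter or the `Accept` header listed above, and falls back to JSON so existing clients keep working. It ignores `q` weights for simplicity.

```python
from typing import Optional

MSGPACK = "application/x-msgpack"
JSON = "application/json"


def choose_content_type(accept_header: Optional[str], query_format: Optional[str]) -> str:
    """Return the content type to respond with (hypothetical helper)."""
    # An explicit ?format=msgpack wins over the header.
    if query_format == "msgpack":
        return MSGPACK
    if accept_header:
        # Split "a/b;q=0.9, c/d" into bare media types, dropping parameters.
        offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
        if MSGPACK in offered:
            return MSGPACK
    # Default: behave exactly like the existing JSON API.
    return JSON


print(choose_content_type("application/x-msgpack", None))
print(choose_content_type("text/html, application/json", None))
```

The handler then serializes the same object with whichever encoder matches the chosen type, so the rest of the API code does not change.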

HAVE A JSON API? TRY A SCHEMALESS STRUCTURED BINARY FORMAT! IT’S EASY.