

Pass on the JSON, and choose binary encoding formats instead

Find out how developers can achieve significant performance boosts using new binary encoding formats as alternatives to JSON and XML.

Binary encoding formats promise significant performance improvements for communications-intensive apps. TheServerSide caught up with Martin Thompson, founder of Real Logic, who implemented a new binary encoding format for financial trading that is over 1,000 times faster than JSON.

What is your take on the status of new techniques for replacing JSON with more efficient binary encoding formats and protocols like Google Protocol Buffers?

Martin Thompson: First up, none of these formats are protocols. They are codecs. A protocol is a means of describing a communication interaction. Each of the individual messages used in a protocol can be encoded or decoded in a particular format by a codec.

What's your take on the relative performance of JSON/REST, compared with binary encoding formats like Google Protocol Buffers, Avro, Thrift, Bond and Simple Binary?

Thompson: Text-based encodings are typically 10x slower than the less efficient binary codecs such as GPB. There are binary encodings that are 10x to 100x more efficient, such as FlatBuffers, Cap'n Proto and SBE (Simple Binary Encoding).
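As a rough, hand-rolled illustration of where the gap comes from (this is not SBE or GPB themselves; the trade record and its field layout are invented for the example), compare a JSON encoding of a small record with a fixed-layout binary encoding:

```python
import json
import struct

# A small, invented trade record: id, price, quantity.
record = {"id": 42, "price": 101.25, "qty": 500}

# Text encoding: JSON spells out field names and renders numbers digit by digit.
text = json.dumps(record).encode("utf-8")

# Binary encoding: a fixed little-endian layout of int32 id, int64 price
# (in ticks of 0.01) and int32 qty -- no field names, no per-digit parsing.
binary = struct.pack("<iqi", record["id"], int(record["price"] * 100), record["qty"])

print(len(text), len(binary))  # 39 16 -- the binary form is well under half the size
```

The size gap is modest here; the bigger win is that decoding the binary form is a few fixed-offset reads rather than character-by-character parsing.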

Does this kind of efficiency just reduce latency, or do you see a role for more efficient cloud usage by moving message parsing from JSON to these more efficient binary formats?


Thompson: This increase in efficiency results in direct reductions in latency and increases in throughput. We can also see bandwidth reduction due to more compact encodings. One of the biggest wins can be on mobile devices, where battery usage is significantly reduced.

If you profile the typical business application, you will likely be shocked at how much CPU time and memory is dedicated to protocols and codecs relative to the business logic. It seems our applications are mostly doing protocol handling and encoding, and as a side effect do a little business logic.

What are the types of applications where binary encoding format efficiency might translate into the most significant gains or reduce the cloud instance size required for a particular type of application?

Thompson: Any application that does a significant amount of communication or encoding, such as microservices or monitoring data. Text-based logging is an abomination.

What are the limitations of binary encoding formats, and particularly SBE? Are there places where it is not as good a fit?

Thompson: The main limitation is lack of understanding and experience in the development community. We spend so much of our time debugging all types of applications. Text encodings are easier for those inexperienced with binary encodings. However, with experience, binary encodings become easy to work with and in many cases are even simpler to debug because there are fewer edge cases.
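With a known schema, a binary message can be inspected field by field at fixed offsets; a hex view plus a couple of lines of decoding is often all the tooling needed. A minimal sketch (the 16-byte message layout here is invented for illustration):

```python
import struct

# An invented 16-byte message: int32 id, int64 price-in-ticks, int32 qty,
# all little-endian.
msg = struct.pack("<iqi", 42, 10125, 500)

# A hex view makes the fixed layout visible while debugging ...
print(msg.hex(" "))  # 2a 00 00 00 8d 27 00 00 00 00 00 00 f4 01 00 00

# ... and each field can be read directly at its known offset.
trade_id = struct.unpack_from("<i", msg, 0)[0]
ticks = struct.unpack_from("<q", msg, 4)[0]
qty = struct.unpack_from("<i", msg, 12)[0]
print(trade_id, ticks, qty)  # 42 10125 500
```

Because every field sits at a fixed offset, there is no quoting, escaping or whitespace handling to reason about, which is where many of the edge cases in text formats live.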

Where would you see these formats being used in the communications stack compared with lower-level protocols like UDP/TCP, and higher-level protocols like WebSockets, XMPP, CoAP and MQTT?

Thompson: In the OSI layer model these encodings are Layer 6, i.e., presentation. UDP is Layer 4, TCP is a mix of Layers 4 and 5. WebSockets, XMPP, HTTP, etc. are Layer 7 application protocols.

What are the development challenges around using SBE in terms of debugging compared with GPB and REST?

Thompson: SBE compared to GPB is very similar in usage. SBE has the restriction that messages with repeating groups must be accessed in order rather than arbitrarily. Some find this restricting; I find it is just a matter of development discipline. Arbitrary memory access does not play well with the prefetchers in a CPU's cache subsystem. CPUs love predictable patterns. REST is a Layer 7 protocol and does not compare.
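The in-order restriction can be pictured with a hand-rolled repeating group (the layout is invented, not SBE's actual wire format): the decoder walks a cursor forward through the buffer, so every read is sequential.

```python
import struct

# Encode an invented order message: int32 order id, then a uint16 count
# followed by that many (int32 qty, int64 price-in-ticks) legs.
def encode(order_id, legs):
    buf = struct.pack("<iH", order_id, len(legs))
    for qty, price in legs:
        buf += struct.pack("<iq", qty, price)
    return buf

# Decode with a forward-only cursor: legs must be visited in order,
# which keeps memory access sequential and prefetcher-friendly.
def decode(buf):
    order_id, count = struct.unpack_from("<iH", buf, 0)
    offset, legs = 6, []
    for _ in range(count):
        legs.append(struct.unpack_from("<iq", buf, offset))
        offset += 12
    return order_id, legs

order_id, legs = decode(encode(7, [(100, 10125), (200, 10150)]))
print(order_id, legs)
```

Random access to leg N would require skipping over every earlier leg anyway (the group is variable-length), so the forward-only API just makes the cheap access pattern the only one.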

What do you see as some of the factors holding back the wider adoption of binary encoding formats like GPB and SBE?

Thompson: Lack of experience and awareness. The cool kids are mostly using JSON these days. This is such a shame because it is such a poor encoding. It has no types and is very inefficient.

What would you consider the best practices for organizations to replace the use of REST and JSON with these more efficient formats?

Thompson: Try them on a small project and build experience. Then, in time, write tools to help with debugging, such as Wireshark dissectors and viewing tools. The viewing tools do not need to be complex; simple command-line tools can be enough.
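A viewing tool in that spirit can be very small. A sketch, assuming an invented 16-byte trade message (little-endian int32 id, int64 price in ticks of 0.01, int32 qty): it takes a hex dump of a message on the command line and prints the decoded fields.

```python
import struct
import sys

def dump(hex_string):
    """Render the invented 16-byte trade message in human-readable form."""
    msg = bytes.fromhex(hex_string)
    trade_id, ticks, qty = struct.unpack("<iqi", msg)
    return f"id={trade_id} price={ticks / 100:.2f} qty={qty}"

if __name__ == "__main__":
    # e.g. python viewer.py 2a0000008d27000000000000f4010000
    print(dump(sys.argv[1]))
```

Piping captured messages through a tool like this gives most of the "human readability" of a text format back, without paying the encoding cost on every message.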


Join the conversation



Have you experienced performance improvements by switching to a binary encoding format for your data exchanges?
I think this is a bit one-sided. There is a reason to use text-based formats which conspicuously has not been mentioned. Sending data in binary form is older than using something like XML or JSON, so this shouldn't be presented as a new idea.

The reason text formats are used is clearly not for raw performance, but for interoperability. Despite the claim above, a lot of business applications wouldn't benefit massively from a binary format because comms isn't a big part of the overall application. However, the example above highlights financial transactions, which, like some other activities, ARE very performance sensitive, so it's appropriate to use binary there. In a much larger portion of the computing world, I would argue that reusable, language-agnostic formats provide a different kind of benefit.

I'd compare this to an argument about source code legibility vs. highly optimized, performance-sensitive code: one is not the other. There are needs for both, but don't claim the world's problems will be solved by this magic fix-all solution. This is simply another tool to be used for the right application (and it's NOT new - look up COBOL copybooks and C structs).
pmasters has it right.

In the "old days," when processing power and storage were scarce resources, everything had to be binary or suffer unacceptable latency and expensive storage consequences.

Today, with terabyte storage falling below ~$25, multi-core/multi-threaded CPUs all running > 1 GHz, and network speeds on the rise, hardware compensates admirably for bloated (size-wise) text-based schemes.

As pmasters pointed out, there are design choices that are driven by the application, and there is no "one method fits all" answer.
Those who forget history are doomed to repeat it.
This is a blanket statement. Performance is not always the key to designing a system's information format. There are places, like finance operations, where binary formats may not be acceptable because they have issues with readability, maintenance and tolerance for loss. A simple question to ask: how do you know which financial transaction was corrupt when passed between systems if your binary message is truncated and cannot be normalized to a readable format? That is why JSON or XML might be preferred at a sacrifice of performance. You can always scale differently for performance, but not necessarily on formatting information. Also, one may consider BSON. Why not?
Most data transferred over an HTTP connection uses gzip compression. Text compresses much better than binary formats, so bandwidth usage is not significantly different between binary and text.

If efficiency were the only issue, then we'd all be writing code in assembler.

Text formats in general are easier to debug.

A few 'standard' text-based encodings are a lot better than many different binary formats.

Everything used to be binary-encoded (think .doc files); where there are sufficient CPU cycles, text always wins.
Wow!! A blast from the past. In the '90s, using binary formats was called the worst possible alternative for communicating information. Binary formats had been used since the 1950s.

Binary formats fell out of favor when a few things happened:
1) The information communicated was a moderate byte count
2) The dropping cost of communication bandwidth
3) Human readability was important to facilitate development.

Some may argue text formats were for interoperability; however, anyone who knows how to read and implement a data format specification can provide interoperability. The truth is more likely that no one wanted to spend time arguing over big-endian vs. little-endian technology, which would have allowed one vendor marketing brownie points.

However, mobile network efficiency and over-provisioning, coupled with increasing data transfer sizes, are driving the need for efficient transfer of information. Will developers need to learn and require new tools to decode binary information? We can find opportunity everywhere.

Read pmasters' and CCasey's comments.
Coming from a large system environment (not IBM), I was appalled to find the use of text to store and transmit numeric data. The amount of time to convert numbers (i.e., integers and floats) to/from text is much more significant than most people appreciate. I cannot believe it when people complain about transmission time when numbers are transmitted as zero-filled text like 0000012.

This is further complicated by the use of generic tags like <integer>. What happened to text formats like kkkk=vvvv (example: MIN=153) and text indicators like quotes or apostrophes (example: COMPANYNAME='TheServerSide')?

Claims of better interoperability and ease of debugging with text only apply to novice programmers without proper tools to view the data. The Open Systems Interconnection model was (with great care, thought and research) designed to address exactly these types of issues.
While using compression (the responsibility of the OSI presentation layer, not the application) reduces transmission time, parsing and conversion times are still significant.
Detection of data corruption (reliable transmission) is the responsibility of the OSI data link and transport layers, not the application.

The comparison to source code optimization is not really applicable.

The very old response that "the hardware improvements will fix the software bloat" hasn't really come true, partly because we 'need' more and more data.