A few weeks ago, I was invited to visit the New York City office of XLABS, a group that claims to have achieved...
99% compression for videos being streamed over the Internet. If the claim is valid, its significance is enormous. Naturally, I needed to see a demo.
I started my visit by discussing the XLABS project with Chief Operating Officer Steven Saldana. I asked Saldana about the theoretical limits for compressing data. I haven't done any calculations and I'm not pretending to be an expert in such matters, but intuitively, the XLABS claim seems to approach the limit of what is theoretically possible. Saldana acknowledged this issue and said that the process, named RITA, takes a brand new approach to the compression problem. According to Saldana, traditional compression algorithms look for patterns in the data, whereas the XLABS algorithm goes beyond simple pattern matching.
On to the demonstration
Saldana made an eight-second video of me talking into a camera. For a moment, let's pretend that we're in a court of law and call the video Exhibit A. The video, captured with the built-in camera on an ordinary Mac, consumed 1.83 MB in MP4 format, and Saldana stored the video on a USB drive that I happened to have brought with me. I expected him to compress and decompress the video immediately after making it, but instead, we took the video to a secure, off-site location. At this point, we were joined by XLABS Chief Executive Officer Oliver Christie.
To respect the XLABS group's wishes for confidentiality, I won't reveal anything about the off-site location. I'll simply say that the amount of security surrounding the exact nature of the compression algorithm is impressive. In my naive view, I imagined that as soon as you discover a way to do 99% lossless compression, you can shout it to the world. If you're worried about the process being stolen, mail it to yourself in a date-stamped envelope, or send it to a patent office where the algorithm, and the identity of its originators, is safe and sound. But that's not the way it goes. At this early stage in the unveiling of the compression technique, XLABS takes great care to make sure that the algorithm doesn't get into the wrong hands. They're keeping the process under tight wraps while they refine the code, define the use cases, and secure financial backing. (While I waited for a short time in the secure area, I realized that I must be the poorest person to be seeing this demo. Others are probably potential backers with very deep pockets.)
The rest of the demo was fairly simple. On an ordinary Windows laptop, Saldana used the command line to invoke a utility that created a compressed file from my original video file. He performed this compression step twice, and the results were slightly different each time. The algorithm uses some encryption, and the randomness in the encryption accounts in part for these differences. Saldana said that the algorithm's own secret sauce also plays a role.
460 to 1 compression?
The first time through, compression took 24 seconds. The second time, compression took only 17 seconds. (The amount of time that it takes to perform the compression and uncompression is no small concern, but I didn't press Saldana and Christie on that point.) Each time, the compressed file size was about 4 KB. If you're keeping score, that's a compression ratio of 460-to-1. It represents a space savings of 99.78%.
On the same Windows laptop, Saldana invoked another utility that uncompressed the 4 KB file. Uncompressing took about 10 seconds. The uncompressed file was identical to the original file. Despite maintaining a healthy skepticism, I have two reasons for believing the files were identical. First, Saldana played the uncompressed video on the Windows laptop. The uncompressed video looked exactly like the original video. That was nice, but it didn't prove anything. I wouldn't be able to identify an unfaithful video copy if it bit me on the nose.
The second proof that the original and uncompressed files were identical was much more compelling. Saldana put the uncompressed file on my own USB drive. Later that day, on my own computer, I ran the original and uncompressed files through a bit-by-bit comparison program. Sure enough, the two files contained the exact same sequence of 0s and 1s. It should be noted that, for security reasons, Saldana didn't copy the 4 KB compressed file to my USB drive, so I don't know what bit sequence was inside the 99.78% compressed file.
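A bit-by-bit comparison of this kind can be done with nothing but Python's standard library. This is a minimal sketch for illustration, not the tool I actually used; the file names and contents below are stand-ins.

```python
import filecmp
import os
import tempfile

def files_identical(path_a, path_b):
    """Compare two files byte by byte (shallow=False forces a content read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)

# Quick self-check with two throwaway files holding the same bytes
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "original.mp4")
    b = os.path.join(d, "roundtrip.mp4")
    data = os.urandom(1024)
    for path in (a, b):
        with open(path, "wb") as f:
            f.write(data)
    same = files_identical(a, b)

print(same)  # True
```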
What's it all about?
Computing lives or dies with things called resources, things whose supply is potentially limited. And for Internet traffic, one of the most important resources is bandwidth: the number of bits that you can send over the Internet in a given time interval. Without lots of bandwidth, consumers don't get what they want when they want it. On February 26, the Federal Communications Commission voted on Net Neutrality, rules to regulate the use of bandwidth on the Internet in the US.
Consider Netflix, whose streaming videos account for 34% of all Internet traffic during peak viewing times in the United States. Without sufficient bandwidth, you can't watch a video the way it's meant to be watched. If you don't have enough bandwidth, you must compromise. You either watch the video in pieces, with starts and stops happening when the signal doesn't arrive on time, or you watch a degraded version of the video. Having grown up in the days of rabbit ear antennas, I much prefer the latter. But for today's viewers, neither option makes much sense.
To keep video traffic moving, Netflix needs to reduce the amount of data that it sends to your home. Reducing the data is called compression. Netflix compresses a video and sends the compressed version to your home. At your end, a processor uncompresses the video data and presents it to you as if it was never compressed. The whole thing is a juggling act. Netflix's software must reduce the amount of data to be transmitted but do so in a way that you, the consumer, don't notice. There are dozens of ways to achieve this. Traditionally these ways fall into a few categories.
One category analyzes the frequency of certain strings of bits. To understand how this works, think about text data. In a text document, you can compress data by representing common letters, like 'e', with small strings of bits, and reserving larger strings of bits for uncommon letters like 'q'. By representing some letters with very short strings, you can compress text quite a bit. Your computer recovers this kind of compressed data when you download a .zip file and expand it.
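The frequency idea can be made concrete with a tiny Huffman coder, the classic algorithm behind this style of compression. This is a sketch for illustration, not what any particular .zip tool does verbatim.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries are (weight, tiebreak, tree); the unique tiebreak
    # keeps Python from ever comparing two trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, n, (t1, t2)))
        n += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: record the symbol's code
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("the quick brown fox sees the lazy dog")
# The common letter 'e' never gets a longer code than the rare 'q'
print(len(codes["e"]) <= len(codes["q"]))  # True
```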
Other compression tricks involve substituting descriptions of bits for the bits themselves. Take a sequence of bits and summarize it in some way that can be recovered later. For example, the description 1000x0 + 1000x1 + 1000x0 is much shorter than an actual sequence of three thousand 0s and 1s. If you apply this idea to video, you might describe only the changes from one video frame to the next. For example, if the host on a talk show moves her head toward a guest, send data to describe the new head position, but don't resend the description of the background in the TV studio. A brief code to summarize the fact that, from one video frame to the next, all pixels in the background remain the same, is much more efficient.
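That summarizing trick is essentially run-length encoding. Here's a minimal sketch using the article's own 3000-bit pattern:

```python
from itertools import groupby

def rle_encode(bits):
    """Summarize runs of repeated symbols as (count, symbol) pairs."""
    return [(len(list(g)), b) for b, g in groupby(bits)]

def rle_decode(runs):
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(b * n for n, b in runs)

bits = "0" * 1000 + "1" * 1000 + "0" * 1000
runs = rle_encode(bits)
print(runs)                      # [(1000, '0'), (1000, '1'), (1000, '0')]
print(rle_decode(runs) == bits)  # True: the summary recovers every bit
```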
Going beyond data patterns
Early on in my meeting, Saldana compared the XLABS algorithm to the traditional trick of finding patterns. In my simple 0s and 1s example, the repeating of a thousand 0s is just such a pattern. When the background behind a talk show host doesn't change from one frame to the next, that's a pattern. According to Saldana, the XLABS algorithm has better tricks up its virtual sleeves.
I should mention that some schemes for summarizing data can get you tangled up pretty quickly. For example, Berry's paradox, which asks for "the smallest positive integer not definable in fewer than twelve words", uses exactly eleven words to describe a particular number, a number that supposedly can't be defined in so few words. Furthermore, audio compression can involve the omission of certain sounds. Humans don't hear sounds lower than 20 Hz or higher than 20 kHz. To reduce the number of bits used by an audio file, you scrap all information about sounds outside the 20 Hz to 20 kHz range. The science of psychoacoustics provides other clues about sounds that won't be missed if they're discarded.
How much can you compress data?
In the world of compression, the compression ratio is the size of the original data divided by the size of the compressed data. For example, the description 1000x0 + 1000x1 + 1000x0 consumes 192 bits when I store it on my computer. So, by storing 1000x0 + 1000x1 + 1000x0, I've described a string of 3000 bits in only 192 bits. The ratio is 3000 / 192, which is about 15 to 1. Seen from the flip side, the space savings is 1 - (1/ratio). For a compression ratio of 3000 / 192, the space savings is about 93%. Of course, this is an extreme case. Useful data seldom comes in the form of three simple patches, namely all 0s, then all 1s, and finally all 0s.
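The arithmetic is easy to check. In the sketch below, the demo's byte counts are approximations (1.83 MB taken as 1,830,000 bytes, 4 KB taken as 4,000 bytes), since I never saw exact figures.

```python
def ratio_and_savings(original_size, compressed_size):
    """Compression ratio and the space savings it implies."""
    ratio = original_size / compressed_size
    savings = 1 - 1 / ratio
    return ratio, savings

# The 3000-bit pattern stored as a 192-bit description
r1, s1 = ratio_and_savings(3000, 192)
print(f"{r1:.1f} to 1, savings {s1:.1%}")   # 15.6 to 1, savings 93.6%

# The demo: roughly 1.83 MB down to roughly 4 KB
r2, s2 = ratio_and_savings(1_830_000, 4_000)
print(f"{r2:.1f} to 1, savings {s2:.2%}")   # 457.5 to 1, savings 99.78%
```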
Aside from all the number crunching, you can describe a compression technique as either lossy or lossless. With lossy compression, the data that you get after uncompressing a file isn't exactly the same as the data that you had before you did the compression. Any audio compression that eliminates sounds, even sounds that most people can't hear, is lossy compression. But with lossless compression, when you uncompress a file, you have an exact copy of the file that was originally compressed. No data is lost by compressing and then uncompressing. Lossy compression often yields a better compression ratio, but lossless compression is the gold standard.
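The lossless guarantee is easy to demonstrate with an off-the-shelf lossless compressor such as zlib: after a compress/uncompress round trip, the bytes must match exactly.

```python
import zlib

data = bytes(range(256)) * 100          # 25,600 bytes of repetitive sample data
packed = zlib.compress(data, level=9)   # lossless compression
restored = zlib.decompress(packed)

print(restored == data)                 # True: nothing was lost
print(len(data), "->", len(packed))     # repetitive data compresses well
```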
The limits of compression
This brings me to the limits of compression. In 1935, E. B. White wrote a satirical essay about Reader's Digest, and about the way publications try to make reading more palatable for hurried consumers. In White's essay, a researcher develops a way to summarize each day's reading in one six-letter word. The first day's word was IRTNOG. (This challenges my image of life in the 1930s. Even then, people were short on time!) Of course, the satirical point is that you can't summarize a day's reading in six letters. There's no way to compress information so compactly.
With a very tiny example, you can see that certain compression ratios are simply impossible. Imagine that you have only four messages that you can transmit. For example, you can transmit NICE, GOOD, FAIR, or POOR. You can't summarize these messages with only one bit because, with only one bit, there are only two possibilities. So with only four words that can be transmitted, you can't take the word GOOD, which is commonly represented with 32 bits, and summarize the word's meaning with only one bit. In other words, you can't achieve 32-to-1 compression.
In the four-word scenario, you can summarize your message in two bits. You can use 11 for NICE, 10 for GOOD, 01 for FAIR, and 00 for POOR. So with these four words, the best you can do is to send two bits, and that's 16-to-1 compression. Yes, 16-to-1 is a wonderful compression ratio, but we're describing an unusual (and very artificial) situation. The data being transmitted is only one of four words.
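Here's that four-word scheme as code. The word list and the two-bit codes are just the article's example:

```python
# Four possible messages fit in 2 bits each (2**2 = 4 distinct codes).
codebook = {"NICE": "11", "GOOD": "10", "FAIR": "01", "POOR": "00"}
decode = {bits: word for word, bits in codebook.items()}

message = ["GOOD", "POOR", "NICE"]
wire = "".join(codebook[w] for w in message)
print(wire)   # 100011

# Split the bit stream back into 2-bit chunks and look each one up
restored = [decode[wire[i:i + 2]] for i in range(0, len(wire), 2)]
print(restored == message)   # True
```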
Theoreticians have analyzed the limits of compression for various kinds of data. For any particular file, the theoretical limit depends on the amount of information in the file, and you can measure the amount of information with a formula for entropy. Roughly speaking, low probability means high information content. The lower the probability that you'll see the word "cow" used in this article, the more information you get if you actually read the word "cow." As the amount of information increases, the best possible ratio for compressing the information decreases. Searching the web for lossless video compression techniques, I find results as good as 3.5 to 1. That's a space savings of about 70%. For comparison, the compression that Netflix uses cuts video sizes in half, and that compression technique is slightly lossy.
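Shannon's entropy formula makes "amount of information" precise: sum -p * log2(p) over the symbol probabilities. A sketch, with made-up strings chosen only to show the two extremes:

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(text):
    """Shannon entropy: average bits of information per symbol."""
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in Counter(text).values())

# Uniformly distributed symbols carry the most information per symbol...
print(entropy_bits_per_symbol("ABCD"))                 # 2.0
# ...while skewed, predictable data carries much less,
# which is exactly what makes it compressible.
print(round(entropy_bits_per_symbol("AAAAAAAB"), 2))   # 0.54
```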
XLABS does compression
Can you compress any video in a lossless way to a file that's 1/1000th of its original size? The people at XLABS say they can. I first met XLABS Chief Operating Officer Steven Saldana, pronounced Sal-dan-ya, at one of Google's Solve for X events. His claim of 99.92% lossless space savings came across loud and clear. But many attendees at the Solve for X event were skeptical. So I arranged to see it for myself at the XLABS headquarters in New York City. (The URL for XLABS is http://xlabs.ai. According to ICANN, the ai stands for Anguilla -- a territory in the Caribbean. But for the people at XLABS, ai stands for "artificial intelligence.")
The XLABS team members began studying some very big problems a number of years ago. The problem areas included machine learning, pattern recognition, optimization, big data, financial modeling, and data security. To attack these problems, they applied artificial intelligence techniques.
As part of their research, the XLABS team members worked with a neural net named RITA, named in honor of neurobiologist Rita Levi-Montalcini. RITA suggested a compression method that, apparently, was far better than any of the methods that are traditionally used. The people at XLABS took this method, refined it, and devised the method that, according to their claims, does lossless video compression with a whopping 99.92% space savings. (RITA sparked the burst in creativity but, for now, RITA has retreated into the background. The XLABS team's compression method runs on conventional computers.)
Emulating neural nets
In 1945, John Von Neumann wrote a paper in which he described what has come to be known as the Von Neumann computer: a processor with instructions stored in its memory. The processor executes instructions, one after another, until one of the instructions says "jump to such-and-such an instruction." A program counter keeps track of the next instruction to be executed.
Almost every computer in use today is a Von Neumann computer. If you're reading this article online, you're almost certainly staring at a Von Neumann computer's screen. But Von Neumann computers have limitations. In particular, following one instruction after another is woefully uncreative. Humans don't seem to follow instructions when they come up with new, innovative ideas. So in work that goes as far back as 1943, researchers McCulloch and Pitts described the basic components of a neural net. A neural net doesn't follow instructions that are stored in memory. Instead, in a neural net, signals and information flow through units that mimic cells in the human brain. Based on the results of ongoing calculations, the net changes the connections among its neurons and adjusts the neurons' firing thresholds. In this way, a neural net mimics the human acts of learning, improving, and possibly creating.
In the 1960s and 1970s, researchers debated the capabilities of neural nets, arguing that simple neural nets can't do some basic Boolean logic operations, operations that a Von Neumann machine has no trouble doing. But in 1975, researchers filled in the gap, showing how a slightly more complicated neural net, a net with intermediate layers and the backpropagation algorithm for adjusting its many parameters, can in fact do what a Von Neumann computer can do.
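A threshold unit in the McCulloch-Pitts style, and the hidden-layer fix for the hard case of XOR, can be sketched in a few lines. The weights below are set by hand for illustration; in a trained net they would be learned, for example by backpropagation.

```python
def unit(inputs, weights, threshold):
    """Fire (1) if the weighted sum of inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# A single unit handles AND and OR...
AND = lambda a, b: unit((a, b), (1, 1), threshold=2)
OR = lambda a, b: unit((a, b), (1, 1), threshold=1)

# ...but XOR needs a hidden layer: XOR(a, b) = OR(a, b) and not AND(a, b)
def XOR(a, b):
    return unit((OR(a, b), AND(a, b)), (1, -1), threshold=1)

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([AND(a, b) for a, b in pairs])   # [0, 0, 0, 1]
print([XOR(a, b) for a, b in pairs])   # [0, 1, 1, 0]
```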
The whole story about basic logic makes neural nets look like second-class citizens. But when it comes to devising new compression techniques, basic logic might not have the final word. For one thing, neural nets can adjust firing thresholds by indefinitely tiny amounts. (They don't simply jump from 0 to 1 the way Von Neumann computers do.) And neural nets are trained with more sophisticated algorithms like deep learning, in which abstract problems are divided into many layers. (To understand what deep learning is all about, consider human vision. Visual information is processed in stages. Different features, such as colors, contours, and motion, are processed by groups of neurons, some of which may overlap or interact with one another. Human vision takes place in layers, and in our day-to-day conscious lives, we're completely unaware of what goes on in most of these layers.)
In my view, neural nets don't perform magic. They can't implement algorithms that Von Neumann computers don't already implement, and their mimicry of human brain cells doesn't make them conscious or truly creative. But that's beside the point. Neural nets represent a different approach to problem solving than the approach that we typically use. If neural nets like RITA, with their learning-centric approaches, can inspire new ways for us to think about video compression, then neural nets are worth their weight in gold.
What do we make of all this?
Goof that I am, I teased Saldana once again about the need for security. Why can't they just patent the compression algorithm, take the money, and run? On this point, Saldana was quite clear. The XLABS plan isn't only about a compression algorithm. The RITA neural net came up with a revolutionary solution to an age-old computing problem. The net did this because, unlike most computers, RITA can think on its own. With abilities of this kind, what else can RITA do? The world has hundreds, maybe thousands, of pressing problems, and RITA's power can be applied to many of them. That's what separates XLABS from a plain old data compression company. The people at XLABS are taking the long-term view.
Have the folks at XLABS really achieved 99.92% compression? Only a careful audit of their files and methods can produce a reliable answer. And, for the time being, their files and methods are under wraps. I have no way to peek behind the curtains of the XLABS demo, and I don't want to speculate about things that I can get completely wrong. I had fun seeing the XLABS demo and, for now, that's what matters most to me.
How would you change the modern approach to data compression? Let us know.