Consummate dilettantism!

Tuesday, August 4, 2009

Kolmogorov Complexity: My Life In Two Words

This is it. Kolmogorov complexity. I've been obsessed with this idea for as long as I can remember, but only now do I know the name; in fact, I never even knew there was a name. For all I knew, it was an idea entirely of my own devising, my own little secret in a universe otherwise devoid of sentience. But it's pretty simple, really; let Wikipedia explain:
In algorithmic information theory (a subfield of computer science), the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object. For example, consider the following two strings of length 64, each containing only lowercase letters, numbers, and spaces:

abababababababababababababababababababababababababababababababab
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7uraqbjfdppa0q7nieieqe9noc4cvafzf


The first string admits a short English language description, namely "ab 32 times", which consists of 11 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, which has 64 characters.
[Caption of an accompanying image, not reproduced here:] This image illustrates part of the Mandelbrot set fractal. Simply storing the 24-bit color of each pixel in this image would require 1.62 million bits; but a small computer program can reproduce these 1.62 million bits using the definition of the Mandelbrot set. Thus, the Kolmogorov complexity of the raw file encoding this bitmap is much less than 1.62 million.


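Wikipedia's two-string example is easy to poke at yourself. Here's a little sketch of my own (zlib is only a crude stand-in for Kolmogorov complexity, which is uncomputable, but it captures the asymmetry):

```python
import zlib

# The two 64-character strings from the example above.
s1 = "ab" * 32
s2 = "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7uraqbjfdppa0q7nieieqe9noc4cvafzf"

# A general-purpose compressor finds the redundancy in the first string
# but can do nothing with the second.
print(len(zlib.compress(s1.encode())))  # well under 64 bytes
print(len(zlib.compress(s2.encode())))  # around 64 bytes or more (overhead)
```

The regular string shrinks; the patternless one doesn't, and zlib's fixed header overhead can even make it slightly longer.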
This has a lot to do with compression. In fact, what got me thinking about it was the idea of a "compressed" file, which is not like a compressed physical object: the latter is not truly "compressed" (its elements do not take up less space), while a computer file seemingly is -- how tremendous an idea! Something out of nothing! But it's all just an illusion, a blueprint, simple repetition. You're not "compressing" (for there is no such thing, you know); you're just eliminating redundancy, just as you do when you write 10^9 instead of 1,000,000,000, replacing 13 characters with 4.* In fact, what you're doing is returning to the original formula that produced that number. (There is not always one, of course; those strings -- and that includes most of everything in the universe -- that are not redundant are completely incompressible.) You are creating the blueprint from the product, if you can, and then distributing the (hopefully much lighter) blueprint.
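The blueprint-from-the-product idea can be sketched with a toy of my own devising, a run-length encoder. It finds one kind of redundancy (runs of repeated characters), records it as a blueprint, and can rebuild the original exactly:

```python
def run_length_encode(s: str) -> list:
    """Collapse runs of repeated characters into (char, count) pairs --
    a tiny 'blueprint' that eliminates one kind of redundancy."""
    blueprint = []
    for ch in s:
        if blueprint and blueprint[-1][0] == ch:
            blueprint[-1] = (ch, blueprint[-1][1] + 1)
        else:
            blueprint.append((ch, 1))
    return blueprint

def run_length_decode(blueprint: list) -> str:
    """Rebuild the product from the blueprint."""
    return "".join(ch * n for ch, n in blueprint)

s = "aaaaaaaaaabbbbbbbbbb"              # redundant, so it compresses well
print(run_length_encode(s))             # [('a', 10), ('b', 10)]
assert run_length_decode(run_length_encode(s)) == s
```

On a string with no runs, of course, the blueprint is as long as the product -- which is the whole point: incompressible strings have no shorter description.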

Finally, this is all just a summary. The main thrust of my thoughts has been what sort of data are compressible and what are not, which is covered in detail on the Wikipedia page.

*There's a catch: you need a language, whether natural (English) or artificial (C), to interpret the description. But this doesn't undermine the notion of complexity, because a language's rules can interpret many different things, and those rules can be "smaller" than their products.
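To make the catch concrete (my own illustration): the length of a description depends on which interpreter you assume, but the interpreter is a fixed, one-time cost shared across every string it can decode.

```python
# Two "languages" describing the same 64-character string:
english = "ab 32 times"   # 11 characters, needs an English reader
python  = '"ab" * 32'     # 9 characters, needs a Python interpreter

target = "ab" * 32
assert eval(python) == target  # the Python blueprint really reproduces it
print(len(english), len(python), len(target))
```

Either description beats writing out all 64 characters; switching languages changes the count only by a bounded amount, since the interpreter itself never changes.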
