Images are just too big. A 3 MB bitmap compresses down to a 500 KB JPEG. Don’t get me wrong, 16% of the original size is great, but why 500 KB? That’s still pretty large.
Now we do.
A week or so ago, Stable Diffusion was released, and the world went crazy, and for good reason. Stable Diffusion, if you haven’t heard, is a new AI that generates realistic images from a text prompt. You basically give it a description of the image you want, and it generates it.
Now, this alone would be revolutionary, but we got double the revolution this time: This thing can also take an image and tell you the prompt you can use to generate it.
Are you thinking what I’m thinking?
That’s right, why compress an image to 500 KB when you can compress it to 50 bytes, where the bytes are the prompt that can be used to generate the exact same image again?
You wouldn’t, of course not.
Instead, you would ask the image-describing AI to describe the image, take the resulting (very small) prompt, and transmit it over the wire. The recipient would then use the prompt to generate the image again.
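The round trip described above can be sketched in a few lines. This is only a minimal illustration, not a real implementation: `compress`, `decompress`, `toy_describe` and `toy_generate` are hypothetical names, and the two toy functions merely stand in for whatever image-to-prompt and prompt-to-image models you plug in.

```python
from typing import Callable

# Hypothetical interfaces: any image-to-prompt and prompt-to-image model fits here.
Describe = Callable[[bytes], str]
Generate = Callable[[str], bytes]


def compress(image: bytes, describe: Describe) -> bytes:
    """Sender side: the "compressed file" is just the prompt, encoded as UTF-8."""
    return describe(image).encode("utf-8")


def decompress(payload: bytes, generate: Generate) -> bytes:
    """Recipient side: regenerate an image from the transmitted prompt."""
    return generate(payload.decode("utf-8"))


# Toy stand-ins so the sketch runs without any AI (assumptions, not real models).
def toy_describe(image: bytes) -> str:
    return "a photo of a cat wearing a hat"


def toy_generate(prompt: str) -> bytes:
    return f"<image of {prompt}>".encode("utf-8")


original = bytes(3_000_000)  # a 3 MB "bitmap" of zeroes
payload = compress(original, toy_describe)
restored = decompress(payload, toy_generate)
print(len(original), "bytes ->", len(payload), "bytes")
```

Passing the models in as plain callables keeps the sketch independent of any particular library, so swapping in a real describer or generator should not change the two functions that do the "compression".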
I call this technique STAV, or Stable Transcription and Artistic Validation. Yes, the acronym might not contain any of the words “image”, “compression”, “reconstruction”, or “diffusion”, but Philip Katzip isn’t going to be the only one giving his name to compression techniques.
As is widely known, a picture is worth 1000 words. At an average English word length of 4.7 characters, we can expect each image to take up to 4.7 KB, regardless of its original size. The corollary here is that we can use this method to also upscale images without any loss in quality, which I have accepted as a very fortunate side-effect of my technique.
Sure, this may have some loss of quality, but it would generally depend on the number of iterations you ran when generating the image.
Based on the numbers above, here are some rough estimates on the gains we can expect to see:
[Table: size compared to STAV]
As we can see, because STAV has a fixed size, it is easily potentially infinitely smaller than both AVIF and JPEG, which is good.
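For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted earlier in the post (a 3 MB bitmap, a 500 KB JPEG, and 1000 words at 4.7 characters each). It assumes decimal units (1 KB = 1000 bytes), one byte per character, and no spaces between words; the function names are made up for illustration. No AVIF figure appears above, so it is left out.

```python
# Figures quoted in the post; decimal units (1 KB = 1000 bytes) are an assumption.
WORDS_PER_PICTURE = 1000
AVG_WORD_LENGTH = 4.7  # characters, counted as one byte each, spaces ignored

SIZES_IN_BYTES = {
    "bitmap": 3_000_000,
    "JPEG": 500_000,
}


def stav_size_bytes(words: int = WORDS_PER_PICTURE,
                    avg_len: float = AVG_WORD_LENGTH) -> int:
    """Upper bound on a STAV file: every one of the thousand words is used."""
    return round(words * avg_len)


def times_larger_than_stav(size_bytes: int) -> float:
    """How many worst-case STAV files fit into a file of the given size."""
    return size_bytes / stav_size_bytes()


for name, size in SIZES_IN_BYTES.items():
    print(f"{name}: about {times_larger_than_stav(size):.0f}x the size of STAV")
```

Note that this is the worst case of a full thousand words. A prompt of around 50 bytes, as suggested earlier, would make the gap roughly a hundred times wider.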
Of course, no new compression method is complete without real-world benchmark data to back up its claims. This is why I’ve compiled an extensive analysis of sample images from Unsplash, and am presenting them here.
In the images that follow, the leftmost is the uncompressed (raw) image, the middle image is compressed with JPEG, and the rightmost image is compressed with STAV. I haven’t bothered to include the raw and JPEG sizes, as they’re thousands of times larger than the sizes of the STAV images.
For your edification, I have also included the entire STAV-compressed data below each image, in the form of the prompt that was recognized by img2prompt. Let’s analyze them one by one.
Objects in shot
As we can see, the compressor deals with objects in the shot excellently. There is no visible degradation at all, and the final image is sharp and vibrant.
One interesting note here: img2prompt has correctly intuited that the image is from Unsplash, and has mentioned that in its generated prompt. This will doubtless improve compression even further.
Another excellent performance here. The lighting is impeccable, the hairs are sharp and well-defined, and the hat looks great on the lady.
Performance here isn’t as stellar as in the other shots, as the colors are imperceptibly more muted than the original, but overall there is almost no difference. The original and the STAV-compressed images are nearly indistinguishable. JPEG is disappointing, as there are visible artifacts.
Somehow, the food in the STAV-compressed image looks even more delicious than the original. Otherwise, there is no perceptible quality difference.
Food and people
This particular image posed a challenge for the compressor, with its sharp detail and subtle blur, but the compressor pulled through. Details are preserved and vibrant, and even the blur is visible. Why we’d want to keep the blur, I don’t know, but a compressor must be faithful above all.
There isn’t much to say here. STAV blows JPEG out of the water: the flower looks almost alive, even though the original image contains no flower. If anything, this enhancement showcases a strength of this technique.
We can see here that the compressor has preserved every tiny detail of the original image, except the house, which was, admittedly, kind of ugly. It’s heartening to see this method go from strength to strength as it even enhances images.
As you can see, there is basically no loss in quality, even though the images’ sizes are around a ten-thousandth of the originals’. This is an absolutely astonishing result, and will definitely herald a new era of compression. There are even some cases where quality is better than the original, and it is astonishing for a compressor to achieve 100%+ quality.
There are some minor kinks that need to be worked out, such as the fact that each image takes around a day to generate on mobile, but this is more than acceptable in certain domains. Website visitors, for example, are well-accustomed to such loading times, and would barely notice any difference.
In conclusion, I really believe that this method can help lower file sizes and make a significant difference in various niches, e.g. the web, or games that come on multiple floppy disks. I urge you to give it a try and see what kind of results you get.
If you have any feedback, please tweet or toot at me, or email me directly. I would especially like to hear of any pathological edge cases where the final image is somehow significantly different from the original, so I can investigate.