Sat Nov 5 13:28:54 EDT 2011
shooting yourself in the foot with great verve and accuracy
So I was doing my usual morning routine, which is looking at the tumblr themes of homestuck fans while sighing heavily, when I noticed something even more egregiously stupid than the usual fare.
That page loads 322 files. (Ugh) One of them is... different.
It would save 118kb? But wait, judging from the filename, that's a 40x40 avatar image!
Turns out, in total, it's one hundred and twenty-one fucking kilobytes. Running it through PNGsquash takes it down to 3.54 kilobytes. The old file is thirty-four times bigger. Just for shits and giggles, I popped it into GIMP and saved it as an entirely uncompressed 32-bit BMP file. Here it is:
Double the size! Awful, terrible! It's now 6.3 kilobytes.
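That 6.3 kilobytes is about what the arithmetic predicts. A quick Python check, assuming a plain 54-byte BMP header (GIMP may well write a larger one for 32-bit files) and rows padded to a multiple of 4 bytes; bmp_size is just a name I made up:

```python
def bmp_size(width, height, bytes_per_pixel=4, header_bytes=54):
    """Size in bytes of an uncompressed BMP, under the assumptions above."""
    row_bytes = width * bytes_per_pixel
    padded_row = (row_bytes + 3) // 4 * 4  # BMP rows are padded to 4 bytes
    return header_bytes + padded_row * height


size = bmp_size(40, 40)
print(size)                   # 6454
print(round(size / 1024, 1))  # 6.3
```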
Now, not everyone can be as awesome as me, and use a 296 byte avatar image, but still, a 121 kilobyte 40x40 image file is a bit bloody much. Let's run it through pngchunks:
Chunk: Data Length 13 (max 2147483647), Type 1380206665 [IHDR]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IHDR Width: 40
  IHDR Height: 40
  IHDR Bitdepth: 8
  IHDR Colortype: 6
  IHDR Compression: 0
  IHDR Filter: 0
  IHDR Interlace: 0
  IHDR Compression algorithm is Deflate
  IHDR Filter method is type zero (None, Sub, Up, Average, Paeth)
  IHDR Interlacing is disabled
  Chunk CRC: -1929463699
Chunk: Data Length 106022 (max 2147483647), Type 1346585449 [iCCP]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -1377520713
Chunk: Data Length 6 (max 2147483647), Type 1145523042 [bKGD]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -113001601
Chunk: Data Length 9 (max 2147483647), Type 1935231088 [pHYs]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1976496277
Chunk: Data Length 6991 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1156069395
Chunk: Data Length 6313 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -331828581
Chunk: Data Length 52 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -1807344212
Chunk: Data Length 1491 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1166967249
Chunk: Data Length 3325 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -384872633
Chunk: Data Length 37 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 437683276
Chunk: Data Length 37 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1800092912
Chunk: Data Length 17 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 745887135
Chunk: Data Length 32 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -376046480
Chunk: Data Length 0 (max 2147483647), Type 1145980233 [IEND]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IEND contains no data
  Chunk CRC: -1371381630
As you would know if you had read the Wikipedia article on PNG (Neil also has a good overview of the format), it's one of the modern "container" types: a signature followed by a sequence of typed chunks. The bulky ones (IDAT, zTXt, iCCP) are compressed with DEFLATE, the same algorithm behind gzip and zip. This is why compressing a PNG file does little to nothing: the bulk of it is already compressed. pngchunks just lists the chunks inside the container.
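Listing chunks is not hard, by the way. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC over type plus data. Here's a rough Python sketch of walking them; this is my own illustration, not how pngchunks is actually written, and iter_chunks and avatar.png are made-up names:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (type, data_length, crc_ok) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # 4-byte big-endian length, then the 4-byte chunk type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and the data, but not the length field.
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == crc
        yield ctype.decode("ascii"), length, crc_ok
        pos += 12 + length


# Hypothetical usage, with a file name of your choosing:
# with open("avatar.png", "rb") as f:
#     for name, length, ok in iter_chunks(f.read()):
#         print(name, length, "ok" if ok else "BAD CRC")
```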
As Wikipedia will tell you, there are four critical chunks: IHDR (header), PLTE (palette), IDAT (the actual image), and IEND (image end). This image doesn't have a palette, since it's in full 32-bit RGBA color (color type 6, 8 bits per channel). Here's IDAT, again:
Chunk: Data Length 3325 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -384872633
3,325 bytes. That makes sense.
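Incidentally, the "Critical, public, PNG 1.2 compliant, unsafe to copy" line in each entry comes straight from the chunk's name. Per the PNG spec, bit 5 (the lowercase bit) of each of the four letters is a flag. A small sketch of decoding it in Python; describe and its dictionary keys are names I picked for illustration:

```python
def describe(chunk_type):
    """Decode the four property bits packed into a 4-letter PNG chunk name."""
    first, second, third, fourth = chunk_type.encode("ascii")
    lower = 0x20  # bit 5 is set for lowercase ASCII letters
    return {
        "critical": (first & lower) == 0,     # uppercase 1st letter: critical
        "public": (second & lower) == 0,      # uppercase 2nd letter: public
        "reserved_ok": (third & lower) == 0,  # 3rd letter must be uppercase
        "safe_to_copy": (fourth & lower) != 0,  # lowercase 4th letter: safe
    }


print(describe("IDAT")["critical"])      # True
print(describe("iCCP")["critical"])      # False
print(describe("tEXt")["safe_to_copy"])  # True
```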
Then there are 8 tEXt and zTXt chunks. One contains EXIF metadata, one contains separate (?) IPTC XMP metadata. Then there's what is probably another copy of the image, in Adobe 8BIM format. These, combined, use up 14,970 bytes, four and a half times the size of the image data itself.
That's dumb. But it gets dumber. So far we've only accounted for 18,308 bytes of the file (the 13-byte header, the image data, and the text chunks). But if we look at the chunk list again...
Chunk: Data Length 106022 (max 2147483647), Type 1346585449 [iCCP]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -1377520713
That, my friends, is a 106 kilobyte Kodak sRGB color profile for a 3 kilobyte image file. Gaze in awe.
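For anyone checking my work, here is the tally in Python, using the data lengths from the chunk list above. The whole-file figure assumes that list is complete, and adds the 8-byte PNG signature plus 12 bytes of length, type, and CRC per chunk:

```python
# (type, data length) pairs copied from the pngchunks output
CHUNKS = [
    ("IHDR", 13), ("iCCP", 106022), ("bKGD", 6), ("pHYs", 9),
    ("zTXt", 6991), ("zTXt", 6313), ("tEXt", 52), ("zTXt", 1491),
    ("IDAT", 3325), ("tEXt", 37), ("tEXt", 37), ("tEXt", 17),
    ("tEXt", 32), ("IEND", 0),
]


def total(*types):
    """Sum the data lengths of all chunks whose type is in `types`."""
    return sum(length for name, length in CHUNKS if name in types)


text_bytes = total("tEXt", "zTXt")
accounted = total("IHDR", "IDAT") + text_bytes
file_bytes = 8 + sum(12 + length for _, length in CHUNKS)

print(text_bytes)                                 # 14970
print(round(text_bytes / total("IDAT"), 1))       # 4.5
print(accounted)                                  # 18308
print(file_bytes)                                 # 124521
print(round(100 * total("iCCP") / file_bytes))    # 85
```

So the color profile alone is roughly 85% of the file.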
This is not exactly an unknown problem. Google constantly harps on optimizing images, but Tumblr blindly reuses images that its users hand it. This makes me sad.
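It wouldn't even take a fancy optimizer. Here's a minimal Python sketch that copies a PNG and drops the profile and text chunks. The drop list and the strip_chunks name are my own choices, not anything Tumblr or PNGsquash does, and throwing away iCCP can shift colors slightly in color-managed viewers. It leaves the image data untouched:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Assumed drop list: the color profile plus all three text chunk types.
DROP = frozenset({b"iCCP", b"tEXt", b"zTXt", b"iTXt"})


def strip_chunks(data, drop=DROP):
    """Return a copy of the PNG bytes without the chunk types in `drop`."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = bytearray(PNG_SIGNATURE)
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        end = pos + 12 + length  # length + type + data + CRC
        if ctype not in drop:
            # Copy the chunk verbatim, so its CRC stays valid.
            out += data[pos:end]
        pos = end
    return bytes(out)


# Hypothetical usage, with file names of your choosing:
# with open("avatar.png", "rb") as f:
#     slim = strip_chunks(f.read())
# with open("avatar-slim.png", "wb") as f:
#     f.write(slim)
```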