Sat Nov 5 13:28:54 EDT 2011

shooting yourself in the foot with great verve and accuracy

So I was doing my usual morning routine, which is looking at the Tumblr themes of Homestuck fans while sighing heavily, when I noticed something even more egregiously stupid than the usual fare.

That page loads 322 files. (Ugh) One of them is... different.

It would save 118 KB? But wait, judging from the filename, that's a 40x40 avatar image!

Turns out, in total, it's one hundred and twenty-one fucking kilobytes. Running it through PNGsquash takes it down to 3.54 kilobytes. The old file is thirty-four times bigger. Just for shits and giggles, I popped it into GIMP and saved it as an entirely uncompressed 32-bit BMP file. Here it is:

Double the size! Awful, terrible! It's now 6.3 kilobytes.
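
If you want to check that 6.3 figure, the arithmetic is short. Here's a rough sketch in Python, assuming the usual BMP layout of a 14-byte file header plus an info header. I don't know which info header GIMP actually wrote, so the two sizes below are just plausible candidates, and bmp_size is a made-up helper name:

```python
WIDTH, HEIGHT = 40, 40        # avatar dimensions
BYTES_PER_PIXEL = 4           # 32-bit: one byte each for R, G, B and alpha

FILE_HEADER = 14              # BITMAPFILEHEADER
# Assumed candidates for the info header; which one GIMP used is a guess.
INFO_HEADERS = {"BITMAPINFOHEADER": 40, "BITMAPV5HEADER": 124}

def bmp_size(info_header_bytes):
    """Size in bytes of an uncompressed 32-bit BMP with the given info header."""
    # BMP rows are padded to a multiple of 4 bytes; 40 * 4 = 160 needs no padding.
    row = (WIDTH * BYTES_PER_PIXEL + 3) // 4 * 4
    return FILE_HEADER + info_header_bytes + row * HEIGHT

for name, header in INFO_HEADERS.items():
    total = bmp_size(header)
    print(f"{name}: {total} bytes = {total / 1024:.2f} KiB")
```

Either way, the pixels alone are 6,400 bytes and the headers are a rounding error on top of that.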

Now, not everyone can be as awesome as me, and use a 296-byte avatar image, but still, a 121-kilobyte 40x40 image file is a bit bloody much. Let's run it through pngchunks:

Chunk: Data Length 13 (max 2147483647), Type 1380206665 [IHDR]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IHDR Width: 40
  IHDR Height: 40
  IHDR Bitdepth: 8
  IHDR Colortype: 6
  IHDR Compression: 0
  IHDR Filter: 0
  IHDR Interlace: 0
  IHDR Compression algorithm is Deflate
  IHDR Filter method is type zero (None, Sub, Up, Average, Paeth)
  IHDR Interlacing is disabled
  Chunk CRC: -1929463699
Chunk: Data Length 106022 (max 2147483647), Type 1346585449 [iCCP]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -1377520713
Chunk: Data Length 6 (max 2147483647), Type 1145523042 [bKGD]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -113001601
Chunk: Data Length 9 (max 2147483647), Type 1935231088 [pHYs]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1976496277
Chunk: Data Length 6991 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1156069395
Chunk: Data Length 6313 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -331828581
Chunk: Data Length 52 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -1807344212
Chunk: Data Length 1491 (max 2147483647), Type 1951945850 [zTXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1166967249
Chunk: Data Length 3325 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -384872633
Chunk: Data Length 37 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 437683276
Chunk: Data Length 37 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 1800092912
Chunk: Data Length 17 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 745887135
Chunk: Data Length 32 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -376046480
Chunk: Data Length 0 (max 2147483647), Type 1145980233 [IEND]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IEND contains no data
  Chunk CRC: -1371381630
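
Those "Critical, public, PNG 1.2 compliant, unsafe to copy" lines aren't looked up in a table, by the way. The PNG spec encodes them in the capitalization of the four letters of the chunk name. A small Python sketch of that decoding (chunk_properties and its dictionary keys are names I made up, not anything pngchunks uses):

```python
def chunk_properties(ctype):
    """Decode the four property bits hidden in a PNG chunk type name."""
    if len(ctype) != 4 or not (ctype.isascii() and ctype.isalpha()):
        raise ValueError("chunk type must be four ASCII letters")
    return {
        "critical": ctype[0].isupper(),      # lowercase first letter = ancillary
        "public": ctype[1].isupper(),        # lowercase second letter = private
        "compliant": ctype[2].isupper(),     # third letter is reserved, must be uppercase
        "safe_to_copy": ctype[3].islower(),  # uppercase fourth letter = unsafe to copy
    }

for name in ("IHDR", "iCCP", "tEXt"):
    print(name, chunk_properties(name))
```

So iCCP reads as ancillary, public, compliant, unsafe to copy, which is exactly what the listing says.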

As you would know if you had read the Wikipedia article on PNG (Neil also has a good overview of the format), it's one of the modern "container" types, made up of various types of chunks. The bulky ones (IDAT, zTXt, iCCP) are compressed with DEFLATE, the same algorithm gzip and zip use. This is why compressing a PNG file does little to nothing: it's already compressed. pngchunks just lists the chunks inside the container format.
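
Listing chunks is not much code. Here's a minimal Python sketch of the idea, assuming the standard layout: an 8-byte signature, then a run of length/type/data/CRC records. The names iter_chunks and list_chunks are made up for this example, and this is not how pngchunks itself is written:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data):
    """Yield (type, payload, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and the payload, but not the length field.
        crc_ok = (zlib.crc32(ctype + payload) & 0xFFFFFFFF) == crc
        yield ctype.decode("latin-1"), payload, crc_ok
        pos += 12 + length

def list_chunks(path):
    """Print one line per chunk, roughly the gist of what pngchunks reports."""
    with open(path, "rb") as f:
        for ctype, payload, crc_ok in iter_chunks(f.read()):
            print(f"{ctype}: {len(payload)} bytes, CRC {'ok' if crc_ok else 'BAD'}")
```

Calling list_chunks("avatar.png") (hypothetical filename) prints the chunk names and sizes in file order. A truncated file will blow up with a struct.error, which is fine for a sketch.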

As Wikipedia will tell you, there are four critical chunks: IHDR (header), PLTE (palette), IDAT (the actual image), and IEND (image end). This image doesn't have a palette, since it's in full 32-bit RGBA color (that's what color type 6 at bit depth 8 means in the IHDR). Here's IDAT, again:

Chunk: Data Length 3325 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -384872633

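3,325 bytes. That makes sense: 40x40 pixels at four bytes each (RGBA, per the IHDR) is 6,400 bytes raw, so DEFLATE roughly halved it.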

Then there are eight tEXt and zTXt chunks. One contains EXIF metadata, one contains separate (?) IPTC XMP metadata. Then there's what is probably another copy of the image, in Adobe 8BIM format. These, combined, use up 14,970 bytes, four and a half times the size of the image data itself.

That's dumb. But it gets dumber. So far we've only accounted for 18,308 bytes of the file (header, image data, and all that text). But if we look at the chunk list again...

Chunk: Data Length 106022 (max 2147483647), Type 1346585449 [iCCP]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -1377520713

That, my friends, is a 106-kilobyte Kodak sRGB color profile attached to 3 kilobytes of image data. Gaze in awe.
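
For anyone who wants to redo the bookkeeping, here it is as a short Python sketch. The lengths are copied from the pngchunks listing above. The 8-byte signature and 12 bytes of per-chunk overhead come from the PNG format, and the whole-file total assumes the listing is complete. total_for is just a helper name I made up:

```python
# (type, data length) pairs copied from the pngchunks listing, in file order
CHUNKS = [
    ("IHDR", 13), ("iCCP", 106022), ("bKGD", 6), ("pHYs", 9),
    ("zTXt", 6991), ("zTXt", 6313), ("tEXt", 52), ("zTXt", 1491),
    ("IDAT", 3325), ("tEXt", 37), ("tEXt", 37), ("tEXt", 17),
    ("tEXt", 32), ("IEND", 0),
]

SIGNATURE = 8        # fixed PNG file signature
CHUNK_OVERHEAD = 12  # length + type + CRC fields, 4 bytes each

def total_for(*types):
    """Sum the data lengths of every chunk whose type is in `types`."""
    return sum(length for ctype, length in CHUNKS if ctype in types)

text = total_for("tEXt", "zTXt")
image = total_for("IDAT")
profile = total_for("iCCP")
file_size = SIGNATURE + sum(length + CHUNK_OVERHEAD for _, length in CHUNKS)

print(f"text metadata: {text} bytes ({text / image:.1f}x the image data)")
print(f"IHDR + IDAT + text: {total_for('IHDR', 'IDAT', 'tEXt', 'zTXt')} bytes")
print(f"whole file: {file_size} bytes ({file_size / 1024:.1f} KiB)")
print(f"color profile share: {profile / file_size:.0%}")
```

That gives 14,970 bytes of text, 18,308 bytes for header plus image plus text, and 124,521 bytes for the whole file, of which the color profile is about 85%.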

This is not exactly an unknown problem. Google constantly harps on optimizing images, but Tumblr blindly reuses images that its users hand it. This makes me sad.
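
Getting rid of the dead weight doesn't even require recompressing anything. Below is a minimal Python sketch that copies a PNG while skipping the metadata chunks. The DROP list is my own choice, not a standard, and strip_chunks is a made-up name. Real optimizers also recompress the image data, which this doesn't attempt. One caveat: dropping iCCP can change how a color-managed viewer renders the image, so it's only safe when the profile is effectively plain sRGB.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types to throw away. This selection is an assumption for the example.
DROP = {b"iCCP", b"tEXt", b"zTXt", b"iTXt"}

def strip_chunks(data, drop=DROP):
    """Return a copy of the PNG bytes without the chunk types in `drop`."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = 8
    while pos < len(data):
        # Read the 4-byte big-endian length and the 4-byte chunk type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # length + type + payload + CRC
        if ctype not in drop:
            out.append(data[pos:end])  # copied verbatim, so the CRC stays valid
        pos = end
    return b"".join(out)
```

Typical use would be reading the file in binary mode, passing the bytes through strip_chunks, and writing the result to a new file. On the avatar above, that alone would remove the 106,022-byte profile and the 14,970 bytes of text.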

