Sat Jan 14 21:07:46 EST 2012

fun and games with unix pipes

So Atomic's started a new thing. The first post is interesting, though probably not in the way she intended.

It consists of an image thumbnail named tumblr_lxr0pclKpO1rn6clco1_500.gif, which links to the larger version, tumblr_lxr0pclKpO1rn6clco1_1280.gif. _500.gif is odd in several ways. For one, it's actually a JPEG, delivered with the image/jpeg MIME type. For another, it's huge, weighing in at 1,369 kilobytes... for a 500x346 pixel thumbnail. The original GIF is only 147 kilobytes, which makes the thumbnail more than nine times the size of the full-size file.
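You don't have to trust the extension or the HTTP headers to see the mismatch. Here's a minimal sketch that looks only at the first few bytes of a file; sniff_image_type is a name made up for this example, and it only knows about the two formats involved here.

```shell
# sniff_image_type FILE  (hypothetical helper, not part of any tool)
# Prints "jpeg", "gif", or "unknown" based on the leading bytes of FILE,
# ignoring whatever the file extension claims.
sniff_image_type() {
    # Hex-dump the first four bytes, with no offsets and no whitespace.
    magic=$(od -An -v -tx1 -N 4 "$1" | tr -d ' \t\n')
    case "$magic" in
        ffd8ff*)  echo jpeg ;;     # JPEG start-of-image, then another marker
        47494638) echo gif ;;      # ASCII "GIF8" (GIF87a or GIF89a)
        *)        echo unknown ;;
    esac
}
```

If the file is what the post describes, running sniff_image_type on tumblr_lxr0pclKpO1rn6clco1_500.gif should print jpeg despite the name.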

We've been down this road before. Let's take a look at the file.

exiftool -htmlFormat -v tumblr_lxr0pclKpO1rn6clco1_500.gif > report.html

If you look at that report, you'll see that the first 57,324 bytes are a perfectly normal quality 92 JPEG file, of an entirely sane size for a 500x346 image. And then there's 1,344,572 bytes of "unknown trailer", which begins right after 0xffd9, the JPEG end-of-image marker (the magic number that opens a JPEG is 0xffd8). Let's do a quick bgrep for that end marker...

bbot@neon:~$ bgrep ffd9 tumblr_lxr0pclKpO1rn6clco1_500.gif 
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0000dfea
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0001c111
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0002a300
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000385b0
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0004690c
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00054d19
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0006318d
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000716aa
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0007fc70
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0008e2fe
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0009ca24
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000ab212
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000b9a00
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000c8126
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000d67b4
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000e4d7a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000f3297
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010170b
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010fb18
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0011de74
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0012c124
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0013a313
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0014843a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00156426
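bgrep isn't installed everywhere. As a rough stand-in, the same search can be sketched with od and awk. find_marker is a made-up name, and the bytes have to be given as two lowercase hex pairs; it is a sketch for this one job, not a general binary grep.

```shell
# find_marker FILE BYTE1 BYTE2  (hypothetical helper)
# Prints the hex offset of every occurrence of the two-byte sequence
# BYTE1 BYTE2 (lowercase hex, e.g. ff d9) in FILE, one offset per line.
find_marker() {
    # od prints one hex byte per field; awk walks them in file order.
    od -An -v -tx1 "$1" | awk -v a="$2" -v b="$3" '
        {
            for (i = 1; i <= NF; i++) {
                # pos is the offset of the current byte, so a matching
                # pair starts one byte earlier, at pos - 1.
                if (prev == a && $i == b) printf "%08x\n", pos - 1
                prev = $i
                pos++
            }
        }'
}
```

Something like find_marker tumblr_lxr0pclKpO1rn6clco1_500.gif ff d9 | wc -l then gives the count directly. One caveat: a JPEG with an embedded EXIF thumbnail carries an extra ff d9 of its own, so the count is a hint, not proof.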

Huh. 24 instances. 24 * 57 kilobytes = 1,368 kilobytes, which is about how big our file is. How many frames are there in the original animation?

bbot@neon:~$ identify tumblr_lxr0pclKpO1rn6clco1_1280.gif | wc -l
24

Somehow, when producing the 500 pixel thumbnail, Tumblr managed to produce a thumbnail for each individual animation frame, then concatenated all of them.
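If that's right, cutting the file after every end-of-image marker should give back one viewable JPEG per frame. Here's a sketch of that, assuming none of the JPEGs contain an embedded thumbnail (which would add a stray ff d9 and cut a frame in half). split_jpegs and the output naming are invented for this example.

```shell
# split_jpegs FILE PREFIX  (hypothetical helper)
# Cuts FILE after every ff d9 pair and writes the pieces to
# PREFIX-0.jpg, PREFIX-1.jpg, ... Bytes after the last marker are dropped.
split_jpegs() {
    start=0
    n=0
    od -An -v -tx1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                pos++
                # pos is now the byte count up to and including this byte,
                # i.e. the offset at which the next piece would begin.
                if (prev == "ff" && $i == "d9") print pos
                prev = $i
            }
        }' |
    while read -r end; do
        # bs=1 is slow but portable; dd reads FILE itself, not the pipe.
        dd if="$1" of="$2-$n.jpg" bs=1 skip="$start" count=$((end - start)) 2>/dev/null
        start=$end
        n=$((n + 1))
    done
}
```

Run as split_jpegs tumblr_lxr0pclKpO1rn6clco1_500.gif frame, this should leave frame-0.jpg onward in the current directory, one file per marker found.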

Wow.

Whoops.

How did they do this? Well, I'm guessing it was a pipe.

One of ssh's many, many party tricks is providing a transparent unix pipe between two machines. Presumably Tumblr has a front-end machine that accepts uploads from users, scales them down with ImageMagick, then transfers them to Amazon S3. Here's a one-liner that replicates the bug:

$ convert -resize 500 input.gif jpg:- | ssh user@server.example.com "dd of=output.gif"

It's less obvious why this command is being executed. Additionally, while it replicates the bug, it doesn't produce the exact same file: the result is about 141 kilobytes smaller.

One possible reason is that ImageMagick chokes on the original file, becoming extremely confused when you ask it to scale the overlay frames. This command, which should Just Work:

convert -resize 500 -layers optimize tumblr_lxr0pclKpO1rn6clco1_1280.gif 500.gif

Produces this:

The result is both extravagantly broken and ten times larger than the original file, even though the original has the larger image dimensions. So Tumblr might have added a step in their asset pipeline to normalize certain GIF animations that ImageMagick chokes on.

(There might be a more graceful way to do this than converting it to MNG, the animated cousin of PNG. If there is, tell me.)

convert input.gif mng:- | convert -resize 500 -layers optimize - output.gif

As you can see, this actually works, though the "thumbnail" is still twice the size of the original file.

Now (putting ourselves in the shoes of the nameless sysadmin who was doing this), let's add the next step, where we actually upload the file to the remote server. Except that, whoops! We were hacking on the JPEG thumbnail code earlier, and we accidentally tell ImageMagick to send the image data as JPEG.

convert input.gif mng:- | convert -resize 500 - jpg:- | ssh user@server.example.com "dd of=output.gif"

And so we end up with this ridiculous situation where the thumbnail is nine times bigger than the original file. I guess the moral of the story is to always check that something which is supposed to make files smaller actually makes files smaller.
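That check is cheap to bolt onto a pipeline. Here's a minimal sketch of a guard that compares byte counts and fails loudly when the "thumbnail" isn't smaller. check_smaller is a name invented here; I have no idea what Tumblr's pipeline actually looks like.

```shell
# check_smaller THUMBNAIL ORIGINAL  (hypothetical helper)
# Succeeds only if THUMBNAIL is strictly smaller than ORIGINAL, in bytes.
check_smaller() {
    # $(( ... )) strips the padding some wc implementations print.
    thumb_size=$(($(wc -c < "$1")))
    orig_size=$(($(wc -c < "$2")))
    if [ "$thumb_size" -lt "$orig_size" ]; then
        return 0
    fi
    echo "refusing: $1 ($thumb_size bytes) is not smaller than $2 ($orig_size bytes)" >&2
    return 1
}
```

In a script it would sit between the resize and the upload, something like check_smaller 500.gif input.gif || exit 1, so a bloated thumbnail never leaves the front-end machine.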

(Previously.)

