Sat Jan 14 21:07:46 EST 2012
fun and games with unix pipes
It consists of an image thumbnail named
tumblr_lxr0pclKpO1rn6clco1_500.gif, which links to the larger version,
tumblr_lxr0pclKpO1rn6clco1_1280.gif. _500.gif is odd in several ways. For one, it's actually a JPEG, delivered with the image/jpeg MIME type. For another, it's huge, weighing in at 1,369 kilobytes... for a 500x346 pixel thumbnail. The original GIF is only 147 kilobytes, which makes the thumbnail more than nine times larger than the full size file.
We've been down this road before. Let's take a look at the file.
exiftool -htmlFormat -v tumblr_lxr0pclKpO1rn6clco1_500.gif > report.html
If you look at that report, you'll see that the first 57,324 bytes are a perfectly normal quality 92 JPEG file, of an entirely sane size for a 500x346 image. And then there's 1,344,572 bytes of "unknown trailer". Every JPEG stream ends with 0xffd9, the end-of-image marker (the magic number at the start of a JPEG is 0xffd8), so let's do a quick bgrep for it...
bbot@neon:~$ bgrep ffd9 tumblr_lxr0pclKpO1rn6clco1_500.gif
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0000dfea
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0001c111
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0002a300
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000385b0
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0004690c
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00054d19
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0006318d
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000716aa
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0007fc70
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0008e2fe
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0009ca24
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000ab212
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000b9a00
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000c8126
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000d67b4
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000e4d7a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000f3297
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010170b
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010fb18
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0011de74
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0012c124
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0013a313
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0014843a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00156426
Huh. 24 instances. 24 * 57 kilobytes = 1368, which is about how big our file is. How many frames are there in the original animation?
bbot@neon:~$ identify tumblr_lxr0pclKpO1rn6clco1_1280.gif | wc -l
24
Somehow, when producing the 500 pixel thumbnail, Tumblr managed to produce a thumbnail for each individual animation frame, then concatenated all of them.
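The bgrep offsets can be cross-checked against exiftool's numbers with a bit of shell arithmetic. This is only a sketch, assuming each 0xffd9 hit is the two-byte end-of-image marker that closes one embedded JPEG; the variable names are made up for illustration.

```shell
# Offsets of the first and last 0xffd9 hits, copied from the bgrep output.
first_eoi=$((0xdfea))
last_eoi=$((0x156426))

# The marker is two bytes long, so a stream ends two bytes past its offset.
first_len=$((first_eoi + 2))
total_len=$((last_eoi + 2))
trailer_len=$((total_len - first_len))

echo "first JPEG: $first_len bytes"                 # 57324
echo "trailer:    $trailer_len bytes"               # 1344572
echo "whole file: $((total_len / 1024)) kilobytes"  # 1369
```

The first two numbers match the exiftool report, and the third matches the size of the thumbnail, so the file really does appear to be nothing but 24 JPEGs laid end to end.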
How did they do this? Well, I'm guessing it was a pipe.
One of ssh's many, many party tricks is providing a transparent unix pipe between two machines. Presumably Tumblr has a front-end machine that accepts uploads from users, scales them down with ImageMagick, then transfers them to Amazon S3. Here's a one-liner that replicates the bug:
$ convert -resize 500 input.gif jpg:- | ssh email@example.com "dd of=output.gif"
It's less obvious why this command is being executed. Additionally, while it replicates the bug, it doesn't produce the exact same file: the result is about 141 kilobytes smaller.
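The mechanism is easy to imitate without ImageMagick or a remote host. In this rough sketch, emit_frames is a hypothetical stand-in for convert writing one JPEG-like stream per frame to stdout, sh -c stands in for the ssh hop, and count_eoi does roughly what the bgrep run did. None of this is Tumblr's actual code.

```shell
# emit_frames N: hypothetical stand-in for convert writing N frames to stdout.
# Each fake frame is a start marker (ff d8), a payload, and an end marker (ff d9).
emit_frames() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '\377\330frame%d\377\331' "$i"
        i=$((i + 1))
    done
}

# count_eoi FILE: count ff d9 byte pairs in FILE.
count_eoi() {
    od -A n -v -t x1 "$1" | tr '\n' ' ' | tr -s ' ' | grep -o 'ff d9' | wc -l | tr -d ' '
}

out=$(mktemp)
# sh -c stands in for the ssh hop; dd just writes whatever arrives on stdin.
emit_frames 24 | sh -c 'dd of="$1" 2>/dev/null' sh "$out"
echo "end-of-image markers in one output file: $(count_eoi "$out")"
```

A pipe has no notion of where one frame stops and the next begins, so dd on the far end writes all 24 streams into a single file.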
As for why anyone would run a pipeline like this: one possible reason is that ImageMagick chokes on the original file, becoming extremely confused when you ask it to scale the overlay frames. Consider this command, which should Just Work:
convert -resize 500 -layers optimize tumblr_lxr0pclKpO1rn6clco1_1280.gif 500.gif
The result is both extravagantly broken and ten times larger than the original file, even though the original has the larger image dimensions. So Tumblr might have added a step in their asset pipeline to normalize certain GIF animations that ImageMagick chokes on.
(There might be a more graceful way to do this than converting it to MNG, the animated cousin of PNG. If there is, tell me.)
convert input.gif mng:- | convert -resize 500 -layers optimize - output.gif
As you can see, this actually works, though the "thumbnail" is still twice the size of the original file.
Now, putting ourselves in the shoes of the nameless sysadmin who was doing this, let's add the next step, where we actually upload the file to the remote server. Except that, whoops! We were hacking on the JPEG thumbnail code earlier, and we accidentally tell ImageMagick to send the image data as JPEG.
convert input.gif mng:- | convert -resize 500 - jpg:- | ssh firstname.lastname@example.org "dd of=output.gif"
And so we end up with this ridiculous situation where the thumbnail is nine times bigger than the original file. I guess the moral of the story is to always check that something which is supposed to make files smaller actually makes files smaller.
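That check is cheap to automate. Here's a minimal sketch, assuming a hypothetical helper called is_smaller that a thumbnailing script could call before uploading anything. The demo uses dummy files, one byte per kilobyte of the real sizes.

```shell
# is_smaller THUMB ORIGINAL: succeed only if THUMB has fewer bytes than ORIGINAL.
is_smaller() {
    thumb_bytes=$(( $(wc -c < "$1") ))
    orig_bytes=$(( $(wc -c < "$2") ))
    [ "$thumb_bytes" -lt "$orig_bytes" ]
}

# Dummy stand-ins for the 1,369 kilobyte thumbnail and the 147 kilobyte original.
thumb=$(mktemp)
original=$(mktemp)
dd if=/dev/zero of="$thumb" bs=1 count=1369 2>/dev/null
dd if=/dev/zero of="$original" bs=1 count=147 2>/dev/null

if is_smaller "$thumb" "$original"; then
    echo "ok: thumbnail is smaller"
else
    echo "warning: thumbnail is not smaller than the original" >&2
fi
```

With these sizes the warning branch fires, which is exactly the alarm that would have caught this bug.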