Filed under: Linux | the bblog

introducing tinypass.py v1.0

Samuel Bierwagen — 2014-03-13T22:41:11-07:00

Before I can talk about what I did right, I have to talk about what I did wrong.

I host some files for a friend. They're great big zip files full of art, which he sells for money, so he'd like to put a password on them.

"Easy enough, this is exactly what HTTP Basic authentication is for."

But he'd like to be able to set passwords on files without having to ask me to manually fiddle with nginx config files.

"Well, I'll just whip up a quick forms-based thing for editing nginx config files. How hard could it be?"

(A chill wafts over your skin. Dread shivers up your spine.)

It took POSTed form data (filename, username, password) from a static HTML page, created a hashed password file from that password, appended a location /filename block to a config file, then called /etc/init.d/nginx reload.

And as soon as it was actually used by someone who didn't write it, it blew up.

Oh v0.1, there were so many things wrong with you, how could I possibly count them all?

1.) It had to have permissions to edit nginx config files and reload the server. So I just ran it as root, which meant that I was running a python web server as root, which is an absolute security disaster. I'm listing this first, even though nothing bad actually happened (as far as I can tell) because it was just a complete unforced error. This was the first warning sign that I was doing something dumb, and I completely ignored it.

2.) filename was just a text box, not a dropdown menu or picker, so it was trivially easy to typo a filename, and "set a password" on something that didn't exist. v0.1 had no error checking of any kind, so it couldn't refuse to do that.

3.) HTTP Basic is user-granular, but for this particular use we're doing file-granular permissions. HTTP Basic doesn't handle this very gracefully: if you're already logged in, and try to access a file you don't have permissions for, (say, if you bought several different items, or if you're me, and are trying to troubleshoot your broken fucking login system) then it just hits you with a 401 Authorization Required error, no login window. Since HTTP Basic doesn't have a log out button, (hint: where would you put it?) you have to restart the browser, or just wait around until the browser expires your login credentials, which is, as you'd guess, implementation-specific.

4.) Remember when I said that it just appended lines to a config file and reloaded the server? v0.1 had no conception of records-- it was a basic CRUD app in theory, but in practice it only created records, it couldn't read, update or delete them. It would quite happily create two location blocks for the same file.

Nginx will refuse to load a config file that has contradictory options. If you restart it with a bad config file, then it won't start back up, and your web server goes down until you fix it.

A minor decision I made early on really saved my ass here. I heard that using reload instead of restart let nginx wait for clients to finish transferring data, so I used it in the script. Luckily, reload won't take down your server with a bad config file, it'll just refuse to load it.

So instead of blowing up the server, v0.1 just silently stopped applying changes until the config file was manually fixed.
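Problems 2 and 4 both come down to v0.1 never looking before it wrote. Here's a minimal sketch of the missing checks, not the actual v0.1 or tinypass.py code: the function names, the known_files set, and the htpasswd path are all made up for illustration, while auth_basic and auth_basic_user_file are the stock nginx directives for HTTP Basic.

```python
import re


def has_location_block(config_text, filename):
    """Return True if config_text already contains a `location /filename {` block."""
    pattern = r"^\s*location\s+(?:=\s+)?/" + re.escape(filename) + r"\s*\{"
    return re.search(pattern, config_text, flags=re.MULTILINE) is not None


def add_location_block(config_text, filename, htpasswd_path, known_files):
    """Return config_text with one new password-protected location block.

    known_files is a hypothetical set of filenames that really exist on disk.
    Raises ValueError instead of writing a config that nginx would reject.
    """
    if filename not in known_files:
        raise ValueError("no such file: %r" % filename)
    if has_location_block(config_text, filename):
        raise ValueError("%r already has a location block" % filename)
    block = (
        "location /%s {\n" % filename
        + '    auth_basic "Restricted";\n'
        + "    auth_basic_user_file %s;\n" % htpasswd_path
        + "}\n"
    )
    return config_text + block
```

Running nginx -t before the reload would be a second line of defense, since it checks the config file without applying it.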

Now, all these problems have solutions. You could conceivably train the end user to carefully work around the problems, on the theory that your software is great but the user is dumb, but when your tool collapses in a great heap of splinters at the slightest touch, then it's not the fault of the user, it's your fault.

You could also fix each of these bugs, add tests, etc, but the basic architecture of the program is just bad. It's fucked.

Ctrl+a, del.

Let's try again.

tinypass.py v1.0 is designed to replace HTTP Basic with something about as secure, but a little friendlier to use, as well as letting the end user set and change passwords without fatally confusing nginx.

Rather than sending login credentials in the clear over HTTP headers, like HTTP Basic, tinypass.py sends credentials in the clear over cookies, which is exactly as secure. Best practices here would be hashed passwords and session ID cookies, which would require more work. I didn't feel like doing that work, because...

"Principles of an Indie Game Bottom Feeder"

I don't really make a living selling games. I sell an ethical life.
How could I make a living selling games? Anyone who wants to pay me for my games doesn't have to. It's not like buying a chair, where they'll chase you down and taser you if you grab it and run out of the store. Nobody who wants my game on Windows or Mac has to pay for it to get it. Frankly, most of them don't.
So why do people pay for it? Because they understand a fundamental fact: For these games to exist, someone has to pay. If everyone just takes it, I'll have to get a real job and the supply will shut off. I don't want to get into one of the eternal tedious arguments about "software piracy". I will instead focus on one single, incontrovertible fact: I have a family to feed. If nobody pays for my games, I can't make them.
So what does someone get when they pay for my game? They get the knowledge that they are Part of the Solution and not Part of the Problem. They know that, in this case, they are one of the Good Guys. It is well-earned self-satisfaction, and it is valuable. To know they are doing the right thing, some people will happily pay 20 bucks. This is how I stay in business.

You can't stop piracy. DRM never works. You can't let somebody look at something without also letting them copy it. Cannot be done, impossible, full stop.

So tinypass.py is a speedbump, not an impassable wall. Since there are no confidential login credentials at risk, I don't go to any great lengths to keep them secure.
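To make the speedbump concrete, here's a rough sketch of what a password-in-a-cookie check can look like using only the standard library. This is not the tinypass.py source: the cookie name, the PASSWORDS table, and the function name are all assumptions for illustration.

```python
import hmac
from http.cookies import SimpleCookie

# Hypothetical per-file passwords; how tinypass.py really stores them isn't shown here.
PASSWORDS = {"art.zip": "hunter2"}
COOKIE_NAME = "password"  # assumed cookie name


def cookie_allows(cookie_header, filename):
    """Return True if the Cookie header carries the right password for filename."""
    expected = PASSWORDS.get(filename)
    if expected is None:
        return False  # no password on record: refuse rather than guess
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(morsel.value.encode(), expected.encode())
```

The password still crosses the wire in the clear on every request, so this is a convenience over HTTP Basic, not a security upgrade.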

So hey! That's it. Check it out, I guess, just as long as you don't look at the commit history.

building dtwenty.org

Samuel Bierwagen — 2013-01-04T21:03:31-07:00

767 days ago, I commented on a HN submission about a random number generator:

3.) Providing random numbers as an advertisement for your fine line of hardware random number generators. Here it doesn't matter how much money you make [providing the numbers], you just want people to buy the hardware that made them. Oddly enough, none of the random number services (and there are quite a few) do this, for some inexplicable reason. There's not even an argument-from-proprietary technology, since HRNGs are supposed to generate perfectly random noise, and there's no way an attacker could stage a replay attack.

I left it there, because I was lazy. But last month, notorious badass Maciej Cegłowski created The Pinboard Co-Prosperity Cloud.

What is it?
The Pinboard Co-Prosperity Cloud is a startup self-incubator. Six successful applicants will receive a modest amount of funding and as much publicity as I can provide for their sustainable and useful business idea.
Is this a joke?
It is not a joke.
What are the requirements?
You must have a good idea that you are capable of building, a willingness to build it, and a plan for making it mildly profitable.
How much funding will I get?
Each successful applicant will receive $37. This will cover the cost of six months of hosting at prgmr.com and a productivity-enhancing hot beverage.

So I entered. Ha ha why not?

The more I thought about it though, the more I realized that I wasn't getting the joke. The idea was trivially simple. I already had a web server. I didn't need all that mad cash. I could just... build it.

So I did. It's right here. (EDIT 2013/3/22: I let the domain name lapse, and moved the content to bbot.org)

It was amazingly easy, even though this was pretty much my first major (har) piece of programming. I had never used python, javascript, or jquery before.

Web programming in the year 2012 has the smooth, well polished feel of something that has had the sharp edges worn off by the passage of thousands of other people. Getting nginx to talk to the WSGI server was a snap. Installing bottle.py was easy. JQuery was no problem.

Any time I had a problem, googling the error message would return a helpful, relevant page, explaining how my "build it as fast as possible, while learning as little as possible" design methodology had screwed me over again.

At the time, of course, it seemed a vast edifice of impossible complexity, but in retrospect it was painless. "It's easy to do if you know how to do it", maybe.

The only difficulty I faced was the hardware random number generator. The numbers had to come from it, since that was the whole point of the site; but my server was a virtual machine on the east coast, and my HRNG was sitting on my desk.

The "money" solution would be to buy a rackmount server, plug the widget into it, then slot it into a colo, but I didn't have money, and instead I had to be creative.

I couldn't just run the web server locally, since my ISP blocks port 80. Enter the ugly hack: I plugged the entropykey into a spare laptop, ran the application server on that, then ran a SSH tunnel to my web server, which communicates with the front end via JSON. It works, at the cost of an extra 150ms of latency per roll.
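The application-server half of that arrangement can be sketched in a few lines. This is an illustration, not the dtwenty.org code: os.urandom stands in for the entropykey, and the JSON field names are made up. The one non-obvious part is rejection sampling, which keeps the die fair when a byte's 256 values don't divide evenly by the number of sides.

```python
import json
import os


def roll_die(sides=20, read_bytes=os.urandom):
    """Return a fair roll in 1..sides from a source of random bytes.

    read_bytes(n) must return n random bytes; os.urandom is a stand-in here
    for whatever reads the hardware generator.
    """
    if not 1 <= sides <= 256:
        raise ValueError("sides must be between 1 and 256")
    limit = 256 - (256 % sides)  # largest multiple of sides that fits in a byte
    while True:
        value = read_bytes(1)[0]
        if value < limit:  # discard values that would bias the low faces
            return value % sides + 1


def roll_response(sides=20, read_bytes=os.urandom):
    """Return the JSON body a front end could consume (field names are hypothetical)."""
    return json.dumps({"sides": sides, "roll": roll_die(sides, read_bytes)})
```

For a d20, bytes 240 through 255 get thrown away and re-drawn, so each face maps to exactly twelve byte values.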

There's room to improve, of course. You could probably list off a dozen features dtwenty.org needs without pausing to draw breath, (starting with "make it less ugly") but, the ideal of the minimum viable product shines bright.

The second biggest problem after integrating the HRNG was the ad copy that makes up most of the page. It was originally twice as long-- ruthless editing has reduced it to merely "too long" from "far, far too long." This too could use improvement.

But! It's done and it works! Programming is fun.

why does nanoblogger generate broken links

Samuel Bierwagen — 2012-10-15T18:24:59-07:00

(Attention conservation notice: I found an obscure bug in my blog publishing software. You are unlikely to care about it.)

Why the hell does my site have so many broken links?

I'll spare you the grimy details of the hour of troubleshooting, and jump right to the punchline. Nanoblogger 3.4.2 has a bug which generates bad relative links when you do ./nb update all

Nanoblogger is no longer updated, so this isn't a problem that can be solved by upgrading. I didn't want to dive into the parsing engine, so I had to find a workaround, which turned out to be pretty simple: just update it a year at a time. ./nb update YYYY works perfectly. (ex. ./nb update 2012) I've only got six years of archives, so all I had to do was run it six times.
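If you have more than a handful of years to get through, the workaround is easy to script. A small sketch, assuming the archives run from 2007 to 2012 (the post only says "six years") and that it's run from the directory containing nb; the function names are mine.

```python
import subprocess


def update_commands(first_year, last_year):
    """Build one `./nb update YYYY` command per year, inclusive."""
    return [["./nb", "update", str(year)] for year in range(first_year, last_year + 1)]


def update_by_year(first_year, last_year):
    """Run the per-year updates in order, stopping at the first failure."""
    for command in update_commands(first_year, last_year):
        subprocess.run(command, check=True)
```

Calling update_by_year(2007, 2012) would then run the six updates back to back.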

I'm posting this incredibly boring post in the hopes it'll save one of the six other users of nanoblogger some confusion in the future.

escape.sh

Samuel Bierwagen — 2012-09-30T07:13:13-07:00

I've pasted a lot of IRC logs into a lot of HTML documents, which is always a pain, since angle brackets are obviously a special character in HTML, which means I have to do a search and replace with the equivalent entity codes. I usually did this manually, using whatever graphical text editor was handy.

But that's Not The Hacker Way. I'm editing a text file produced by one program, so another program will accept it. String processing isn't a job fit for a human. This is something that should be done by a third program.

Thus:

#!/bin/bash
#
# escape.sh - Escapes angle brackets in text files
#
# Turns angle brackets into &lt; and &gt; HTML entities.
# With --irc, replaces the first 8 columns (the timestamp) with an
# opening angle bracket, using an ugly hack.
#
# This is free and unencumbered software released into the public domain.

if [[ $* == *--irc* ]]
then
    # Usage: escape --irc FILE
    sed -i 's/>/\&gt;/g' "$2"
    sed -i 's/^......../\&lt;/g' "$2"
else
    # Usage: escape FILE
    sed -i 's/</\&lt;/g;s/>/\&gt;/g' "$1"
fi

(Github)

Then I stuck it in my $PATH with sudo cp escape.sh /usr/local/bin/escape. This way you can run it from any directory just by doing escape example.txt.

(It's not actually very Unixy-- it doesn't play well with pipes, and wildcard expansion in a directory will blow it up.)
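For comparison, here's a sketch of what a pipe-friendly version could look like: a filter that reads one stream and writes another instead of editing files in place. It's written in Python rather than shell, the function names are mine, and like the script it only touches angle brackets, not ampersands.

```python
def escape_angle_brackets(text):
    """Replace < and > with their HTML entities, leaving everything else alone."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def escape_stream(source, sink):
    """Copy lines from source to sink, escaping angle brackets on the way."""
    for line in source:
        sink.write(escape_angle_brackets(line))
```

Wiring it up with escape_stream(sys.stdin, sys.stdout) would let it sit in the middle of a pipeline, and since it never takes filenames, wildcard expansion can't hurt it.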

Have fun!

ntpblogging II

Samuel Bierwagen — 2012-08-25T19:00:25-07:00

(previously)

So now bbot.org is a Stratum 2 NTP Pool server. (Its wiki page.)

Joining the pool is pretty easy: You create an account, give them your server's IP address, wait for the monitoring server to decide you're stable enough (~8 hours) and boom, you're in.

(The interface is a bit awkward: you paste the address in there, you don't click the "Add a server" link, which apparently doesn't do anything.)

I found four upstream servers by pinging 0.us.pool.ntp.org repeatedly, and choosing the ones that were closest to me. Since bbot.org is in a datacenter right on the internet backbone, close can be very close:

# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-72.26.198.240   209.51.161.238   2 u  273 1024  377    2.320    3.100   1.201
+69.164.217.193  128.59.59.177    3 u  825 1024  377    3.713    0.239   0.371
-108.61.73.243   209.51.161.238   2 u  237 1024  377    3.174   -1.069   0.398
+128.113.28.67   18.26.4.105      2 u  383 1024  377    6.828    0.382   0.141
*128.118.25.5    .WWV.            1 u  426 1024  377   11.537    0.225   0.310

I had hoped that <10ms ping times would result in magically low offset numbers, measured in the tens of microseconds, but apparently jitter becomes a bigger problem when you get that low.

My reference stratum 1 server is wwv.tns.its.psu.edu, an open-access server that John Balogh runs. Thanks John!

ntpblogging

Samuel Bierwagen — 2012-08-04T06:52:57-07:00

So I was farting about trying to figure out how to ask an NTP server what it thinks the time is without having to edit ntp.conf on the client machine, when I discovered that NTP is like SSH: any machine with it installed acts as a server.

So now both of my machines get their time from bbot.org:

magnesium:~ $ ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+76.72.161.27    138.236.128.112  3 u   44   64  377   81.174   -2.459   1.158
+209.177.158.233 134.21.35.167    3 u   42   64  377   62.137   -3.428   1.459
-64.34.171.122   198.60.22.240    2 u   42   64  377   85.527   -7.019   2.248
*69.50.219.51    209.51.161.238   2 u   36   64  377   54.501    0.361  45.780

bbot@neon:~ $ ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+76.72.161.27    138.236.128.112  3 u   60  128  377   80.328    0.690  10.561
*67.23.181.241   128.4.1.1        2 u   35  128  377   82.895    1.003   7.373
+69.167.160.102  204.9.54.119     2 u   70  128  377   75.712    4.162  12.724
+50.16.231.185   192.5.41.40      2 u   26  128  377   87.617   -0.344  64.595

(The legend for the inscrutable linux bullshit can be found in ntpq’s manual file)

(Of course, at the moment I took these screenshots, neither neon nor magnesium was syncing to bbot.org…)

The magic incantation to ask an NTP server for the time is sntp:

$ sntp 0.pool.ntp.org
2012 Aug 04 01:30:15.000857 + 0.001475 +/- 0.083787 secs

Which will return (if you’ve got a machine with accurate time) a tiny drift number (1.5ms) swamped by a giant lake of uncertainty (83.8ms). This is because sntp can only request a single packet, which means it doesn’t have a good idea of the jitter between you and the remote machine. The NTP daemon manages to extract accurate time from the storm of random network noise by requesting lots of packets, then doing a lot of clever things. You can sanity-check sntp by going to time.is, which for any Linux machine will tell you that your clock is bang-on accurate. (NTP on Windows will only get the clock within 1000ms of the true time, by design. Microsoft assumes that their users aren’t pedantically obsessive nerds who absolutely positively must have the most accurate computer clocks possible. The same assumption can’t be made of Linux users.)
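For the curious, the arithmetic behind a single exchange is short. This is a sketch of the standard NTP offset and delay calculation from the four timestamps of one request/response pair, not sntp's actual source; the function and parameter names are mine.

```python
def offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request left
    t1: server clock when the request arrived
    t2: server clock when the reply left
    t3: client clock when the reply arrived
    All values are in seconds.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay
```

Feeding it t0=99.990, t1=100.040, t2=100.041, t3=100.071 gives an offset of about 10ms and a delay of about 80ms. With only one sample, the offset is only as good as the assumption that both directions took equally long, and the error can be as large as half the round trip. That's why the daemon wants lots of packets.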

using application/ecmascript on nginx

2012-02-16T09:12:36-07:00

So while I was browsing the Wikipedia article on Internet Media Types, (I am a very boring person) I noticed this:

application/ecmascript: ECMAScript/JavaScript; Defined in RFC 4329 (equivalent to application/javascript but with stricter processing rules)

Hey now, what's all this then? "Stricter processing rules?" I love stricter processing rules! Let's look at that RFC:

3.  Deployed Scripting Media Types and Compatibility

   Various unregistered media types have been used in an ad-hoc fashion
   to label and exchange programs written in ECMAScript and JavaScript.
   These include:

      +-----------------------------------------------------+
      | text/javascript          | text/ecmascript          |
      | text/javascript1.0       | text/javascript1.1       |
      | text/javascript1.2       | text/javascript1.3       |
      | text/javascript1.4       | text/javascript1.5       |
      | text/jscript             | text/livescript          |
      | text/x-javascript        | text/x-ecmascript        |
      | application/x-javascript | application/x-ecmascript |
      | application/javascript   | application/ecmascript   |
      +-----------------------------------------------------+

So, it looks like a typical web "standards" clusterfuck. I had been wondering why nginx served Javascript with the application/x-javascript mime-type, whose x- prefix marks an unregistered, non-standard type, and now I know.

The only use of JavaScript on bbot.org, prettify.js, a Javascript-based code prettyprinting package, is actually a prime use for bleeding edge standards wankery. Code highlighting is pure progressive enhancement so the readers still using IE6 on their 2002-vintage PowerPC iMacs shouldn't miss anything.

(My prettify.js implementation is actually backported from my low-signal, high-noise adjunct blog, where I wrote a pair of posts on the topic.)

Enabling it on nginx is fairly easy. Just add a line to /etc/nginx/mime.types:

    application/ecmascript                es;

This will deliver any file with the .es extension as application/ecmascript, which should Just Work in any modern browser. Pow! Whammo! Easy!

However, nginx by default will serve it uncompressed, and with a fairly short cache lifetime. Let's change that.

First, add application/ecmascript to wherever you keep your gzip_types declarations. (In my case, conf.d/compression.conf ) Next, tell nginx to deliver it with the proper cache headers. In my case, I already had a location{} block inside my server{} virtual host block that did that for a bunch of filetypes, so I added es to it:

location ~* \.(?:ico|css|js|gif|jpe?g|png|es)$ {
  expires max;
  add_header Cache-Control public;
}

This matches against a bunch of extensions, (including both jpg and jpeg) and delivers them with the Cache-Control: public header, as well as the maximum allowable time for the Expires header, which for nginx is Thu, 31 Dec 2037 23:55:55 GMT. This is a bit silly, since any browser will request a new copy long, long before the year 2037, but hey, why the hell not. Here's what the complete headers look like now:

bbot@neon:~$ curl -I --compressed bbot.org/prettify.es
HTTP/1.1 200 OK
Server: nginx/1.1.8
Date: Thu, 16 Feb 2012 13:48:01 GMT
Content-Type: application/ecmascript
Last-Modified: Thu, 16 Feb 2012 12:53:10 GMT
Connection: keep-alive
Vary: Accept-Encoding
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Cache-Control: public
Content-Encoding: gzip
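
The max-age=315360000 in that Cache-Control header isn't an arbitrary number: expires max sets it to ten years, counted as 365-day years in seconds. A couple of lines of Python confirm the arithmetic (the variable names are mine):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Ten 365-day years, expressed in seconds
max_age = 10 * 365 * SECONDS_PER_DAY
print(max_age)
```

So the two headers disagree slightly: Expires points at the end of 2037, while max-age points ten years past whenever the request was made.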

fun and games with unix pipes

2012-01-14T21:07:46-07:00

So Atomic's started a new thing. The first post is interesting, however, probably not in the way she intended.

It consists of an image thumbnail named tumblr_lxr0pclKpO1rn6clco1_500.gif, which links to the larger version, tumblr_lxr0pclKpO1rn6clco1_1280.gif. _500.gif is odd in several ways. For one, it's actually a JPEG, delivered with the image/jpeg mime-type. Secondly, it's huge, weighing in at 1,369 kilobytes... for a 500x346 pixel thumbnail. The original GIF is only 147 kilobytes, which makes the thumbnail more than nine times larger than the full size file.

We've been down this road before. Let's take a look at the file.

exiftool -htmlFormat -v tumblr_lxr0pclKpO1rn6clco1_500.gif > report.html

If you look at that report, you'll see that the first 57,324 bytes are a perfectly normal quality 92 JPEG file, of an entirely sane size for a 500x346 image. And then there's 1,344,572 bytes of "unknown trailer", which starts right after the first 0xffd9, the JPEG end-of-image marker. Let's do a quick bgrep for that marker...

bbot@neon:~$ bgrep ffd9 tumblr_lxr0pclKpO1rn6clco1_500.gif 
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0000dfea
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0001c111
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0002a300
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000385b0
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0004690c
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00054d19
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0006318d
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000716aa
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0007fc70
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0008e2fe
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0009ca24
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000ab212
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000b9a00
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000c8126
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000d67b4
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000e4d7a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 000f3297
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010170b
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0010fb18
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0011de74
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0012c124
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0013a313
tumblr_lxr0pclKpO1rn6clco1_500.gif: 0014843a
tumblr_lxr0pclKpO1rn6clco1_500.gif: 00156426

Huh. 24 instances. 24 * 57 kilobytes = 1368, which is about how big our file is. How many frames are there in the original animation?

bbot@neon:~$ identify tumblr_lxr0pclKpO1rn6clco1_1280.gif | wc -l
24

Somehow, when producing the 500 pixel thumbnail, Tumblr managed to produce a thumbnail for each individual animation frame, then concatenated all of them.
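If you don't have bgrep handy, the same count takes a few lines of Python. This is a naive sketch with names of my own choosing: it searches for the two-byte end-of-image marker anywhere in the file, so it could over-count on a file whose metadata happens to embed another JPEG (an EXIF thumbnail, say).

```python
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def marker_offsets(data, marker=JPEG_EOI):
    """Return the byte offset of every occurrence of marker in data."""
    offsets = []
    position = data.find(marker)
    while position != -1:
        offsets.append(position)
        position = data.find(marker, position + 1)
    return offsets


def count_embedded_jpegs(path):
    """Count end-of-image markers in a file, one per concatenated JPEG."""
    with open(path, "rb") as handle:
        return len(marker_offsets(handle.read()))
```

The first offset in the bgrep output checks out against the exiftool report, too: 0xdfea is 57,322, and adding the two bytes of the marker itself gives the 57,324-byte length of the first JPEG.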

Wow.

Whoops.

How did they do this? Well, I'm guessing it was a pipe.

One of ssh's many, many party tricks is providing a transparent unix pipe between two machines. Presumably Tumblr has a front-end machine that accepts uploads from users, scales it down with Imagemagick, then transfers it to Amazon S3. Here's a one-liner that replicates the bug:

$ convert -resize 500 input.gif jpg:- | ssh user@server.example.com "dd of=output.gif"

It's less obvious why this command is being executed. Additionally, while it replicates the bug, it doesn't produce the exact same file, it's about 141 kilobytes smaller.

One possible reason is that Imagemagick chokes on the original file, becoming extremely confused when you ask it to scale the overlay frames. Given this command, which should Just Work:

convert -resize 500 -layers optimize tumblr_lxr0pclKpO1rn6clco1_1280.gif 500.gif

Produces this:

Which is both extravagantly broken, and ten times larger than the original, larger in image dimensions, file. So Tumblr might have added a step in their asset pipeline to normalize certain GIF animations that Imagemagick chokes on.

(There might be a more graceful way to do this than converting it to MNG, the animated sibling of PNG. If there is, tell me.)

convert input.gif mng:- | convert -resize 500 -layers optimize - output.gif

As you can see, this actually works, though the "thumbnail" is still twice the size of the original file.

Now, (putting ourselves in the shoes of the nameless sysadmin who was doing this) let's add the next step, where we actually upload the file to the remote server. Except that, whoops! We were hacking on the JPEG thumbnail code earlier, and we accidentally tell Imagemagick to send the image data as JPEG.

convert input.gif mng:- | convert -resize 500 - jpg:- | ssh user@server.example.com "dd of=output.gif"

And so, we end up with this ridiculous situation where the thumbnail is nine times bigger than the original file. I guess the moral of the story is to always check to make sure that something which is supposed to make files smaller, actually makes files smaller.
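That check is cheap to automate. Here's a minimal sketch of the sort of guard that could sit at the end of a thumbnailing step; it is not anything Tumblr actually runs, and the function name and error message are mine.

```python
import os


def thumbnail_ratio(original_path, thumbnail_path):
    """Return thumbnail size as a fraction of the original's size.

    Raises ValueError if the "thumbnail" is not actually smaller.
    """
    original = os.path.getsize(original_path)
    thumbnail = os.path.getsize(thumbnail_path)
    if thumbnail >= original:
        raise ValueError(
            "thumbnail is %d bytes but the original is only %d" % (thumbnail, original)
        )
    return thumbnail / original
```

Run against the two files above, it would have raised instead of quietly shipping a 1,369 kilobyte thumbnail of a 147 kilobyte image.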

(Previously.)

when "pretty secure" isn't secure enough

2012-01-08T14:58:11-07:00

"Richard Stallman Was Right All Along"

"As a member of the Walkman generation, I have made peace with the fact that I will require a hearing aid long before I die, and of course, it won't be a hearing aid, it will be a computer I put in my body," Doctorow explains, "So when I get into a car - a computer I put my body into - with my hearing aid - a computer I put inside my body - I want to know that these technologies are not designed to keep secrets from me, and to prevent me from terminating processes on them that work against my interests."

Something I've been thinking about off and on for the last seven years or so, is what the security model for an em would look like.

Background info, for non-transhumanists: "Em" is a short, pithy word coined by Robin Hanson to refer to a person running on a computer. The basic idea behind Whole Brain Emulation is to scan a human brain with an electron microscope, then make a model of all the scanned atoms, and run that model in a physics simulator, which will run all the chemical interactions between neurons like it was a physical brain. This model will have all the memories of the person that was scanned, but has all the advantages of software: functional immortality, easy copying, can be run millions of times faster than real time...

The problem arises when you start to think about what kind of computer you're going to run this simulation on. It must be completely, flawlessly, secure. It absolutely cannot be hacked, because once you lose control of that computer, that's the ballgame. A copy of your brain-state is you. It's got all your memories, knows all your passwords.

That's bad. It gets worse: a brain-state is software, it can't "die" in the organic sense of the word. You could torture it to death, over and over, for a thousand years, if you felt like it. "They populate the simulation spaces of its mind, exploring all the possible alternative endings to their life."

So it's pretty clear that the operating system for an em is going to have to be very special indeed. Qubes isn't paranoid enough. OpenBSD isn't paranoid enough. seL4 isn't paranoid enough. You will need a degree of paranoia hitherto unseen outside of nuclear weapons safety protocols and space shuttle flight control systems. Multiple, concentric, airgapped systems. ASICs that refuse to export their contents. Physical safety interlocks. Power draw monitoring. (Here being used in a somewhat unusual way: monitoring the power draw of a secure processor to verify that it hasn't been compromised) Provably secure code. Self-destruct charges!

Some of the sting of "killing yourself rather than be captured by the enemy" is taken out by having a couple dozen copies as backup, however.

This is a bar set amazingly, impossibly high; and it goes absolutely without saying that no general-purpose commercial OS clears it. However, many of the freedom-destroying technologies cut both ways. The Xbox 360, which has been out for six years now, uses code signing to enforce a closed platform. Downside: no third-party software, at all. Upside: there has never been a virus on the 360. (apt-get uses a weak form of code signing, and to the best of my knowledge, has never distributed a virus either)

The Trusted Platform Module can be used to build a computer which you can only install Windows on, but can also be used by Linux to protect against certain attacks.

This blog post doesn't really have a point, I just wanted to talk about some stuff. Sorry.

I'm certainly not saying that there's some kind of tradeoff between open-source and security. That would just be utter, blithering nonsense. I guess if there's any point here that I'm flailing in the direction of, it's that there are certain dual-use technologies, which are in danger of being misused by people looking to make money at the expense of the users; also known as the Facebook strategy.

adventures in HTML optimization

2011-06-10T09:41:29-07:00

[1]

Last week, someone linked me to Google Page Speed. This sucked, since it directly resulted in me spending rather a lot more time than strictly necessary dicking around with Apache configuration files.

My server doesn't get a whole lot of traffic,[2] so I hadn't bothered setting Expires: headers, under the "who cares" school of thought. When Apache CPU utilization doesn't get above .01% when you're getting 5 hits a second from HN, there's not a whole lot of incentive to aggressively cache files. But when I ran bbot.org through Page Speed, I received the humiliating news that it only scored 68/100. 68! That's a low number!

Resolving most of the issues was easy, (Turning on Expires, gzip compression, changing the black and white logo image from full 24-bit color to grayscale, etc.) but going from 98/100 to 100/100 was kinda painful.

Page Speed is a vast improvement over YSlow, but it shares some of the inherent problems of an automated performance tool.

For one, it doesn't seem to care much about actual page load times, as long as you follow its rules. It gives cracked.com, ("Auschwitz for javascript engines") a phat 90/100, even though the front page takes 8.2 seconds to load, makes 188 HTTP requests from about a billion domains, loads 36 javascript files, throws 166 warnings in Chrome's audit tab, has 466 unused CSS rules, and is, in fact, pure evil. For a period of time while I was testing things, the links div loaded its own font-face, the smallcaps version of Linux Libertine. Now, anyone with half a brain can tell you that loading a 300 kilobyte font just to style 20 words of text, something you can just do in CSS with font-variant:small-caps anyway, is pants-on-head retarded; but Page Speed was totally fine with it.

For two, while it doesn't actually come out and say that you should "Use a CMS" like YSlow does, (A monumentally useless piece of "advice") it sure does wink a lot and nod suggestively towards it.

The sticking point, that robbed me of two points and kept me from that tantalizing perfect score, was "Inline small CSS". I, of course, kept the stylesheet in an external file, and linked to it from every page, because it's easier to maintain that way. Except, of course, it's a small stylesheet, and page rendering would be faster if you just stuck it in the HTML file. This would be a pain... unless you used a CMS, which could just seamlessly inline a stylesheet when publishing a document. Funny.

Inlining the CSS granted me the two points, but then page speed turned right around and docked me one, since the stylesheet had a lot of whitespace, and pushed the file over the tipping point where Google thought it would be worthwhile to minify the source code. Now this is a pain, since I hand-edit my code, and minifying makes code hideously ugly. Would have been trivially easy to do if I used a CMS, of course. Minifying was tedious and fiddly, since the tool I used liked to munge the inlined CSS, and scribble all over my link formatting. It was worth it, though! After twenty minutes of swearing, I finally trimmed off that last 120 bytes, and scored a perfect 100/100! Yeah!

Now, if you'll excuse me, I have to go put everything back to the way it was.

1: Google showing off their mad UI skillz there on the "refresh results" button.

2: How much traffic does it get? Last month, my host called in a panic. Apparently, my box had consumed 1000% more bandwidth than it did from the month before-- it had used up 10 megabytes! My site doesn't get a lotta traffic, I tell ya, every page load takes 30 seconds, because the disks end up spinning down between hits! My traffic is so low, my Alexa site rank is measured in scientific notation! It's low, I tell ya!