Sun Sep 30 07:13:13 EDT 2012

I've pasted a lot of IRC logs into a lot of HTML documents, and it's always a pain: angle brackets are special characters in HTML, so I have to search and replace them with the equivalent entity codes. I usually did this manually, using whatever graphical text editor was handy.

But that's Not The Hacker Way. I'm editing a text file produced by one program, so another program will accept it. String processing isn't a job fit for a human. This is something that should be done by a third program.


#!/bin/bash
# escape - Escapes angle brackets in text files
# Turns angle brackets into &lt; and &gt; HTML entities.
# With --irc, replaces the first 8 columns (the timestamp) with an
# opening angle bracket, using an ugly hack.
# This is free and unencumbered software released into the public domain.

if [[ $* == *--irc* ]]
then
    # Usage: escape --irc FILE
    sed -i 's/>/\&gt;/g' "$2"
    sed -i 's/^......../\&lt;/g' "$2"
else
    # Usage: escape FILE
    sed -i 's/</\&lt;/g' "$1"
    sed -i 's/>/\&gt;/g' "$1"
fi
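
To see what the two modes do, here is a small sketch that runs the same substitutions on one sample line through a pipe instead of editing a file in place. The sample line is made up, and I'm assuming an irssi-style log where the first 8 columns are an HH:MM timestamp, a space, the opening bracket and a one-character nick-mode column; the script itself only says "the first 8 columns".

```shell
# Hypothetical log line, made up for illustration (assumed irssi-style layout).
line='12:34 < alice> 2 > 1'

# Default mode: escape every angle bracket.
plain=$(printf '%s\n' "$line" | sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g')

# --irc mode: escape ">" first, then swap the first 8 columns for "&lt;".
irc=$(printf '%s\n' "$line" | sed -e 's/>/\&gt;/g' -e 's/^......../\&lt;/g')

printf '%s\n' "$plain" "$irc"
# 12:34 &lt; alice&gt; 2 &gt; 1
# &lt;alice&gt; 2 &gt; 1
```

The --irc branch never escapes a "<" that shows up later in the message, which is part of why it's an ugly hack.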


Then I stuck it in my $PATH by copying it to /usr/local/bin/escape with sudo cp. This way you can run it from any directory just by doing escape example.txt.

(It's not actually very Unixy: it edits a single file in place, so it doesn't play well with pipes, and wildcard expansion in a directory will blow it up.)
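
For what it's worth, a more pipe-friendly variant might look something like the sketch below. This is not the script I actually use. The name escape_filter is hypothetical, and it only covers the default mode, not --irc. It leans on the fact that sed reads standard input when it gets no file arguments and accepts any number of files otherwise.

```shell
# escape_filter: hypothetical filter-style variant, not the original script.
# Reads the named files, or stdin when none are given, and writes to stdout
# instead of editing anything in place.
escape_filter() {
    sed -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -- "$@"
}
```

With that, escape_filter < irc.log > irc.html works in a pipeline, and escape_filter *.txt survives wildcard expansion, though it writes all the files to stdout one after another rather than changing each one.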

Have fun!

Posted by Samuel Bierwagen | Permanent link | File under: Linux