The blog post hiatus has ended! Here's what's new in the world o' Pablotron. First of all, the main hard drive on vault — my file/database/LDAP/email server — bit the dust last Wednesday. Fortunately the drive
just started to fail (instead of dying outright). I had ample
room to do immediate backups, and I had an unused 160G drive lying
around. I spent most of Sunday afternoon and all of Monday evening
partitioning the new drive and copying stuff back to it. As far as I
can tell, the only thing I actually lost was the words file for spamprobe. I don't really consider that much of a loss, since I save all my email (even the cursed spam), so I can easily toss the requisite good and bad corpora at spamprobe to get things
going again. Even though I'm short a 100G drive now, the experience
overall has been a positive one. Here are some thoughts I had; maybe
they'll prevent a week of stress for someone else:
- Regular backups are just something you do. The ad-hoc backups I've been doing are better than nothing, but they wouldn't have done me any good if my drive had died outright. Had the circumstances been different, I would have lost weeks, possibly even a month of email. My solution is (rather, will be, once everything is up and running again) an NFS-mounted backup directory on every machine (obviously not for people who don't like NFS). Each machine will be responsible for its own daily and weekly backups, via cron. Depending on how large this data set is, I'll be burning DVDs of the backup directory contents on a weekly or bi-weekly basis. Aside: Richard (richlowe) has been advocating revision-controlled config files for quite a while (e.g. cvs -d pabs@cvs:/cvs co etc-files/vault); maybe I'll give that a spin, too.
- Distribute services across machines. I've got 4 other machines sitting around twiddling their thumbs at the moment. Any of them could easily be an authentication, database, email, LDAP, or CVS server, but instead they're all idle (to be fair, sumo is my IRC/Postgres machine, but that hardly qualifies as a crippling load).
- Keep extra hardware lying around. As a true geek you're already doing this, of course :). The drive in vault started failing at 1:30 on a Wednesday morning. I was able to start making backups and moving stuff around right then. If I didn't have the extra hard drive, I would have been SOL for several platter-scraping hours.
- Losing your spam filter settings means you get to say cool words like "corpora" on your web page.
On the non-catastrophic hardware failure front, I upgraded halcyon to the latest Xorg, then promptly downgraded to the latest stable release. Here's the approximate order of events:
- Spent an hour or two configuring, compiling, and installing the latest Xorg.
- Ran X, and found out that the proprietary NVidia driver isn't compatible with the latest CVS snapshot of Xorg.
- Discovered just how painful the composite extension is without hardware acceleration by foolishly attempting to run X using the nv driver. Hint: Imagine using Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an RLE Gaussian Blur on a 100 meg image in the background.
- Promptly downgraded to the stable release, cursing both NVidia for their proprietary silliness and the bastards at freedesktop.org for having the audacity to make source code changes that inconvenienced me. I spent plenty of time on this step, so go ahead and re-read that last sentence a couple of times.
Since I spent the majority of a Sunday afternoon recompiling X no less than 3 times, I also took the opportunity to try out the latest Enlightenment DR16 from CVS (yes Kim, I'm one of the few people still using e16). It's got its own built-in, mostly (semi?) working composite manager, so neither the patch nor the xcompmgr hackery I describe in this post is necessary any more. The new default theme looks great, too!
Why use other people's broken software when you can write your own? Here's the latest on the Pablotron coding front:
- I've converted the RSS feeds on pablotron.org, paulduncan.org, and raggle.org from steaming loads of standards-incompliant crap to pedantically-correct RSS 2.0. If your RSS aggregator couldn't read my pages before, it probably can now (unless your aggregator is based on the RSS library built into Ruby 1.8, but I'll get to that part of the story in a few minutes...)
- Lots and lots and lots of updates to the next version of Raggle. Some of the changes are even by me! Thomas Kirchner (redshift) has been doing an unbelievable amount of work on the CVS version of Raggle. So much so, in fact, that I feel kind of embarrassed calling this latest version mine at all. So I think when it's ready for release, we'll call it kirchneraggle or something more suitable ;).
- This patch for Ruby, which adds wcolor_set support to the built-in Curses interface. Ville suggested it eons ago, and that was the last thing stopping me from porting Raggle from Ncurses-Ruby.
- A partially working Curses windowing library for Ruby. This isn't in CVS just yet, but don't worry, I've got some new stuff for you to play with. Keep reading...
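For the curious, here's roughly what reading one of those pedantically-correct RSS 2.0 feeds looks like with the RSS library built into Ruby (the one some aggregators are based on); the feed content below is a made-up example, not an actual pablotron.org feed:

```ruby
require 'rss'   # in Ruby 1.8 this was require 'rss/2.0'

# a minimal, standards-compliant RSS 2.0 feed (made-up content)
xml = <<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Pablotron</title>
    <link>http://pablotron.org/</link>
    <description>News from the world o' Pablotron.</description>
    <item>
      <title>Back from hiatus</title>
      <link>http://pablotron.org/?cid=1</link>
      <description>Hard drives, Xorg, and libfeed.</description>
    </item>
  </channel>
</rss>
XML

# parse (and validate) the feed, then poke at the channel and items
feed = RSS::Parser.parse(xml)
puts feed.channel.title      # => "Pablotron"
puts feed.items.first.title  # => "Back from hiatus"
```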
The big stuff I've been working on lately is the core of the future Raggle. Before I begin, here's a high-level overview of how the components interact with one another (yup, a diagram!):
I've mentioned Squaggle previously, but for those of you sleeping in the back of the class (you know who you are), here's a brief recap. Squaggle is the SQLite-Ruby-based engine for Raggle. It's cleaner and faster, it uses less memory, and it lets me do all sorts of cool things I can't really do with the current engine (fancy Delicious-style tagging, fast cross-feed searching, smart/auto categorization, and more). The version of Squaggle in CVS is functional (it even includes a usable WEBrick-based interface).
So what's this new stuff on ye olde diagram? libptime is a C-based RFC822 datetime and W3C datetime parsing library. It's BSD licensed, so you can download version 0.1.0 (signature), and use it to your heart's content. The other new library on the diagram is libfeed, an Expat-based RSS (0.9x, 1.0, and 2.0)/Atom feed parser. Why bother writing an RSS parser in C? The existing Raggle engine is slow, partly from being DOM-based, and partly from being written in Ruby. Don't get me wrong, REXML is a great XML parser, but RSS aggregators deal in volume, and I want to be sure the volume isn't constrained by parsing. I also noticed there wasn't a nice C-based RSS/Atom parsing library. Now there is (well, almost!). If that doesn't convince you, then maybe this will:
pabs@halcyon:~/cvs/libfeed/test> du -sh data/big-pdo-wdom.rss
15M data/big-pdo-wdom.rss
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
'$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
real 7m56.892s
user 4m31.578s
sys 0m19.939s
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
'$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
real 5m57.838s
user 4m28.727s
sys 0m3.703s
pabs@halcyon:~/cvs/libfeed/test> time ruby -rrss/2.0 -e \
'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
real 2m30.950s
user 1m46.904s
sys 0m8.610s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
>/dev/null 2>&1
real 0m2.195s
user 0m1.472s
sys 0m0.104s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
>/dev/null 2>&1
real 0m2.010s
user 0m1.475s
sys 0m0.099s
The Perl times were so bad I had to run them twice to be sure. Roughly 75 times faster than Ruby and nearly 180 times faster than Perl; I'd say that's a pretty good start :).
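Since libptime's own C API isn't shown here, here's what the two datetime formats it handles actually look like, illustrated with Ruby's stdlib time module (the dates themselves are made up):

```ruby
require 'time'

# RFC822 datetimes show up in RSS 2.0 <pubDate> elements...
rfc822 = 'Sun, 14 Nov 2004 01:30:00 -0500'
t1 = Time.rfc2822(rfc822)

# ...while W3C datetimes (a profile of ISO 8601) show up in RSS 1.0
# <dc:date> and Atom <issued>/<modified> elements.
w3c = '2004-11-14T01:30:00-05:00'
t2 = Time.iso8601(w3c)

puts t1 == t2   # => true -- same instant, two spellings
```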
Unfortunately, I have to be awake in three hours, so I'll have to save the rest of the next-gen Raggle description for another day...