New Raggle Engine in CVS

October 25, 2004

What will probably become the new Raggle engine is now in CVS, under the module name squaggle. Here's what I've got so far:

  • SQLite backend.
  • Full Conditional HTTP GET support (both ETag and Last-Modified)
  • HTTP proxy support (via the http_proxy env variable or the config hash; there's a stub for win32 proxy support at the moment)
  • HTTP 1.0 basic authentication support
  • Simple adding and listing of feeds (via the Squaggle#feeds and Squaggle#feed_items methods)
  • Engine should be Ruby thread safe, but at the moment there's some quirk with the SQLite behavior.
  • Significantly better memory consumption (memory use will ultimately depend on the interface implementation, but the engine is designed so the interface can query as much or as little information about feeds and feed items as it wants)
  • Basic RSS 0.91-0.92 (Userland), 1.0, and 2.0 support (presumably it'll work with Netscape 0.90-0.91 and Userland 0.93-0.94 feeds as well, although I haven't tested with those). There are stubs for RSS 1.0 modules (via the feed_attrs table, for elements I haven't implemented yet), and for Atom support as well. I have more to say about this one below.
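For anyone unfamiliar with conditional GET, here is a minimal sketch of the idea using Ruby's standard Net::HTTP. It is not the actual Squaggle code: the conditional_headers and fetch_feed helpers and the shape of the cached hash are made up for illustration. The client sends back the ETag and Last-Modified values it saw last time, and the server answers 304 Not Modified if nothing has changed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Squaggle): build the validator headers
# for a conditional GET from whatever was saved on the previous fetch.
def conditional_headers(cached)
  headers = {}
  headers['If-None-Match'] = cached[:etag] if cached[:etag]
  headers['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]
  headers
end

# Hypothetical helper: fetch a feed, returning nil when the server says
# 304 Not Modified (the cached copy is still current). Otherwise return
# the body plus the new validators to store for the next request.
def fetch_feed(url, cached = {})
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri, conditional_headers(cached))
  end
  return nil if response.is_a?(Net::HTTPNotModified)
  {
    body: response.body,
    etag: response['ETag'],
    last_modified: response['Last-Modified']
  }
end
```

Sending both validators when both are available matches the "both ETag and Last-Modified" item in the list; a server that only understands one of them can ignore the other.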

I spent a bunch of time in the last month reading through as many RSS specs as I could get my hands on. I read through the Atom spec as well. The three biggest problems users have had with Raggle are speed, memory use, and supported feeds. I'm attempting to address the speed issue mainly by deferring as much of the internal searching and sorting as possible to SQLite (aside: this also has the side benefit of dramatically simplifying the code, since all the funky array indexing, time conversions, ID hashing, etc. go away and become SQL queries :D). The memory use has also been addressed, with a caveat (see my note above about the end-user interfaces and memory requirements). Paradoxically, the Ncurses interface may end up using more memory than the web interface, because the Ncurses interface has more speed and caching requirements than the web interface. As for proper feed support, that one is a little bit trickier.

Supporting RSS properly is actually kind of a bitch, because there is no official standard (although there are plenty of specifications). Even worse, a lot of feeds play fast and loose with requirements, so strict RSS parsers (like the undocumented one included with Ruby 1.8, or Chad Fowler's Ruby/RSS module) are nice pieces of code, but useless for writing an RSS aggregator, in the same way that strict HTML parsers are useless for web browsers.

The way I dealt with this problem in previous versions of Raggle was to simply ignore the specs that were out there and look for specific elements in feeds. This has worked so well I'm going to keep doing it, with a twist. My goal with Squaggle is to keep Raggle aware of as much of the RSS spectrum as I can, but have the engine (Squaggle) only pay attention to what it absolutely has to. For example, if a feed has mixed RSS 0.92/1.0 elements, Raggle will parse it blindly and save what it can.
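To make the "look for specific elements" approach concrete, here is a rough sketch using REXML. This is not Squaggle's parser: the lenient_items helper and the WANTED list are invented for the example. It ignores the declared RSS version entirely, treats any element named item as a feed item wherever it appears, and keeps only the child elements it recognizes.

```ruby
require 'rexml/document'

# Element names (local names, namespace prefix ignored) that this sketch
# cares about; anything else inside an item is skipped.
WANTED = %w[title link description].freeze

# Hypothetical helper, not Squaggle's actual parser: walk the whole tree
# and collect every element named "item", regardless of where it sits or
# which RSS version the feed claims to be.
def lenient_items(xml)
  doc = REXML::Document.new(xml)
  items = []
  doc.each_recursive do |el|
    next unless el.name == 'item'
    item = {}
    el.elements.each do |child|
      next unless WANTED.include?(child.name)
      # Keep the first occurrence of each wanted element.
      item[child.name] ||= child.text.to_s.strip
    end
    items << item
  end
  items
end
```

Fed a document with one item inside channel (RSS 0.9x/2.0 style) and another as a sibling of channel (RSS 1.0 style), this picks up both and quietly drops elements it does not know about, which is the "parse it blindly and save what it can" behavior described above.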

What I've got so far is available in CVS under the module squaggle. Play around with it and let me know what you think.