Retreat!

December 6, 2003

I give up. I'm going to disable the parallel feed grabbing in Raggle so we can put out a new version. Claes (pekdon) suggested I try to rewrite it, but the implementation is already pretty simple. Here's a high-level view of how the old non-parallel and new parallel feed-grabbing code works:

Old Code

$config['feeds'].each { |feed|
  # download feed
}

New Code

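threads = { }
$config['feeds'].each { |the_feed|
  # spawn one download thread per feed, keyed by the feed's URL
  threads[the_feed['url']] = Thread::new(the_feed) { |feed|
    # download feed
  }

  # non-parallel mode would wait for this feed's thread here
  # before starting the next one (the join is commented out)
  thread = threads[the_feed['url']]
  if thread && thread.status == 'run' &&
     !$config['grab_in_parallel']
    # thread.join
  end

  # crude thread cap: poll until the number of live threads
  # (main thread included) drops below max_threads (default 10)
  until Thread::list.size < ($config['max_threads'] || 10)
    $log.puts 'DEBUG: waiting for threads'
    sleep 5
  end
}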

Of course, looking at this code as I'm pasting it, it just occurred to me that if you have two feeds with the same URL, you could have two threads trying to muck with the feed at the same time. I wonder if that's what's causing Ruby to freak out. By the way, this is why I really dislike threads. Not because I'm an ignoramus, but because they encourage subtle bugs like this. Anyway, let's see if that fixes our random crash woes.
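One way to rule out the duplicate-URL race is to make sure each URL gets exactly one thread. The sketch below is not Raggle's code: it assumes a hypothetical feed list where each feed is a hash with a `'url'` key (mirroring `the_feed['url']` above), groups feeds by URL, and spawns one thread per unique URL that handles every feed entry for that URL in turn. The download step is a stand-in that only records which URL it touched.

```ruby
# Hypothetical feed list: two entries share a URL, which the loop
# above would hand to two separate threads.
feeds = [
  { 'url' => 'http://example.com/a.rss', 'title' => 'A' },
  { 'url' => 'http://example.com/b.rss', 'title' => 'B' },
  { 'url' => 'http://example.com/a.rss', 'title' => 'A (again)' },
]

# Stand-in for the real download step: records each URL fetched.
fetched = []
fetched_lock = Mutex.new

# Group feeds by URL so each URL gets exactly one thread; that thread
# processes every feed entry pointing at its URL, one after another.
threads = feeds.group_by { |f| f['url'] }.map do |url, same_url_feeds|
  Thread.new(url, same_url_feeds) do |u, group|
    group.each do |feed|
      # download feed (hypothetical): only this thread ever touches u
      fetched_lock.synchronize { fetched << u }
    end
  end
end

# Wait for all downloads to finish.
threads.each(&:join)
```

Here only two threads are created for three feed entries, and the two entries for `a.rss` are handled sequentially by the same thread.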

Oh, and before anyone asks, yes, I realize that's not the best way to implement the thread capping stuff. And yes, I realize thread pooling would be more efficient. Right now I'm just trying to get it to work reliably, then I'll focus on optimization.
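For comparison, here is a minimal sketch of the thread-pool approach, assuming a hypothetical list of feed URLs and a hypothetical `max_threads` setting standing in for `$config['feeds']` and `$config['max_threads']`. It is not Raggle's implementation. A fixed number of worker threads pull feeds off a shared `Queue` until it is empty, and the main thread simply joins them instead of polling `Thread.list` and sleeping.

```ruby
require 'thread'  # Queue; built in on modern Ruby, harmless to require

# Hypothetical stand-ins for $config['feeds'] and $config['max_threads'].
feed_urls = (1..25).map { |i| "http://example.com/feed#{i}.rss" }
max_threads = 4

# Fill a work queue with every feed up front.
queue = Queue.new
feed_urls.each { |url| queue << url }

results = Queue.new  # thread-safe collection of finished URLs

# Start a fixed number of workers; each pulls feeds until none remain.
workers = Array.new(max_threads) do
  Thread.new do
    loop do
      begin
        url = queue.pop(true)  # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break                  # no work left, so this worker exits
      end
      # download feed (hypothetical): fetch and parse url here
      results << url
    end
  end
end

# Block until every worker has finished.
workers.each(&:join)
```

Because the pool tracks its own workers, the cap is exact: `Thread.list` also counts the main thread (and any other threads the program runs), so the polling loop above effectively allows one fewer download thread than `max_threads`. A pool on its own does not prevent the duplicate-URL race, though; two queue entries with the same URL can still be processed by two workers at once, so feeds would still need to be deduplicated first.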