2007 / October 7th/ Long running tasks in Rails: Backgroundrb
A few months ago, I was looking for a solution to long running tasks in Rails. A long running task is basically just a process that can detach from the main application. Let’s say you have a task that will take 20 minutes to complete — you obviously don’t want your users waiting for a 20 minute pageload. The solution is to use something like backgroundrb.
The following is part introduction, part tutorial, and part commentary on backgroundrb based on my experience with it. Be forewarned: this article turned out much longer than initially anticipated — guess it goes to show you how amazing I think this plugin is. As an aside: this guide should give you all you need to start using backgroundrb 0.2.1.
Introduction
From the RDoc Page:
BackgrounDRb is a ruby job server and scheduler. It main intent is to be used with Ruby on Rails applications for offloading long running tasks. Since a rails application blocks while servicing a request it is best to move long running tasks off into a background process that is divorced from the http request/response cycle.
There are three main components to backgroundrb: the Rails plugin, the backgroundrb service, and the actual worker files. Installing the Rails plugin is a little non-standard, but easy nonetheless. From the docs:
cd RAILS_ROOT/vendor/plugins
svn co \
http://svn.devjavu.com/backgroundrb/tags/release-0.2.1 \
backgroundrb
cd RAILS_ROOT
rake backgroundrb:setup
You may need some additional ruby gems, but with a little bit of tweaking it was really easy getting it installed on my machine.
Workers
Workers are the meat and potatoes of backgroundrb: they’re the actual ruby that does something. Workers are pretty simple ruby files that are stored in lib/workers. Workers can have different bases: a plain worker, or a Rails worker. Rails workers have access to the Rails framework and all of your models. Plain workers are just plain old ruby files.
Creating a worker is pretty easy using the built-in generator:
./script/generate worker Ranker
At the end of the day, Workers are just ruby files — so remember that. Here’s a simple Worker:
class ExampleWorker < BackgrounDRb::Worker::Base
  # do_work is called when the worker is created
  def do_work(args)
    logger.info('ExampleWorker do work')
  end
end
ExampleWorker.register
Backgroundrb service
Since backgroundrb workers are disconnected from your application’s request/response cycle, backgroundrb needs its own server (it can’t hook into mongrel, apache, etc.). You start and stop that server through the backgroundrb script:
./script/backgroundrb start
./script/backgroundrb stop
The one huge downside to this is that your classes are not dynamically reloaded: if you change a worker file and hit save, the running server keeps using the old code. Every time you change a worker file, you need to restart the server to see the change.
This is an unfortunate necessity — and one that requires you tackle worker development slightly differently than standard Rails work. I cover some useful tips in the example below to help ease development.
When to use: a ranking algorithm
Sometimes it’s difficult to determine when you might need to use a long running task. For several months now I’ve been using backgroundrb to process and encode uploaded videos (using mencoder/ffmpeg). It’s been working really well, but that example is almost overly complex for your introduction to Backgroundrb.
A better introduction to backgroundrb is something I just implemented last night — a ranking algorithm. For the project I was working on, I wanted to be able to rank users and their videos on a weekly, monthly, yearly, and all-time basis based on 5-6 factors. Doing this on the fly is clearly unacceptable as I’d have to do some mad SQL calcs coupled with joining nearly every piece of data in the site just to return a list of 5 users.
So, I implemented a rankings table that effectively cached the ranking (via a points system) of each video and user per time period. I knew that updating this table was going to be more and more taxing on the system as time went on. So I decided to use a long running task to generate the table. This way, if the task takes 30 minutes to finish, it won’t matter.
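The real rank_value is kept private further down, so here is a purely hypothetical sketch of what a points system like this could look like: a weighted sum over the factors. The weights and the handling of true/false flags are assumptions made up for illustration, not the formula used on the site.

```ruby
# Hypothetical weights: the real formula is not published in this article.
WEIGHTS = {
  :comments  => 5,
  :views     => 1,
  :purchases => 20,
  :featured  => 50
}.freeze

# Returns a weighted sum of whichever factors are present in the hash.
# Missing or nil factors count as zero; true/false flags count as 1/0.
def rank_value(information)
  WEIGHTS.inject(0) do |points, (factor, weight)|
    raw = information[factor]
    count = case raw
            when true then 1
            when false, nil then 0
            else raw.to_i
            end
    points + count * weight
  end
end

puts rank_value(:comments => 5, :views => 500, :featured => 1)
```

Keeping the weights in one hash means the formula can be tuned without touching the code that walks the clips.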
The worker
I started off with a simple Rails-based Worker. There’s a lot of looping, and it’s definitely not the most optimized — but it gives you a good idea of how to implement a worker to do some work on your database.
class Ranker < BackgrounDRb::Worker::RailsBase
  def do_work(options)
    @options = options
    logger.info "***** STARTING RANKING PROCESS *****"
    start_work
    logger.info "***** DONE WITH RANKING PROCESS ******"
  end

  def start_work
    logger.info "Ranking all clips (all time)"
    rank_clips('all')
    logger.info "Ranking all clips (weekly)"
    rank_clips('weekly')
    logger.info "Ranking all clips (monthly)"
    rank_clips('monthly')
    logger.info "Ranking all clips (yearly)"
    rank_clips('yearly')
    logger.info "Ranking all users (all time)"
    rank_users('all')
    logger.info "Ranking all users (weekly)"
    rank_users('weekly')
    logger.info "Ranking all users (monthly)"
    rank_users('monthly')
    logger.info "Ranking all users (yearly)"
    rank_users('yearly')
  end

  def rank_clips(time_period)
    # find the start time for the time period
    case time_period
    when 'all'
      start_time_raw = Time.now - 100.years
    when 'weekly'
      start_time_raw = Time.now.beginning_of_week
    when 'monthly'
      start_time_raw = Time.now.beginning_of_month
    when 'yearly'
      start_time_raw = Time.now.beginning_of_year
    end
    start_time = start_time_raw.to_formatted_s(:db)

    # calculate the points for the given time period
    Clip.find(:all).each do |clip|
      points = rank_value(:comments => clip.comments.count("created_at > '#{start_time}'"),
                          :views => clip.views.count("created_at > '#{start_time}'"),
                          :purchases => clip.payments.count("created_at > '#{start_time}'"),
                          :featured => clip.featured ? (clip.featured_at > start_time_raw ? clip.featured : 0) : 0
                         )
      case time_period
      when 'all'
        clip.ranking = clip.build_ranking(:value => 0, :time_period => "all") if (!clip.ranking)
        clip.ranking.value = points
        clip.ranking.save
      when 'weekly'
        clip.weekly_ranking = clip.build_weekly_ranking(:value => 0, :time_period => "weekly") if (!clip.weekly_ranking)
        clip.weekly_ranking.value = points
        clip.weekly_ranking.save
      when 'monthly'
        clip.monthly_ranking = clip.build_monthly_ranking(:value => 0, :time_period => "monthly") if (!clip.monthly_ranking)
        clip.monthly_ranking.value = points
        clip.monthly_ranking.save
      when 'yearly'
        clip.yearly_ranking = clip.build_yearly_ranking(:value => 0, :time_period => "yearly") if (!clip.yearly_ranking)
        clip.yearly_ranking.value = points
        clip.yearly_ranking.save
      end
    end
  end

  def rank_users(time_period)
    # find the start time for the time period
    case time_period
    when 'all'
      start_time_raw = Time.now - 100.years
    when 'weekly'
      start_time_raw = Time.now.beginning_of_week
    when 'monthly'
      start_time_raw = Time.now.beginning_of_month
    when 'yearly'
      start_time_raw = Time.now.beginning_of_year
    end
    start_time = start_time_raw.to_formatted_s(:db)

    # calculate the points for the given time period
    User.find(:all).each do |user|
      points = 0
      case time_period
      when 'all'
        user.clips.find(:all, :include => [:ranking]).each do |clip|
          points += clip.ranking.value
        end
        user.ranking = user.build_ranking(:value => 0, :time_period => "all") if (!user.ranking)
        user.ranking.value = points
        user.ranking.save
      when 'weekly'
        user.clips.find(:all, :include => [:weekly_ranking]).each do |clip|
          points += clip.weekly_ranking.value
        end
        user.weekly_ranking = user.build_weekly_ranking(:value => 0, :time_period => "weekly") if (!user.weekly_ranking)
        user.weekly_ranking.value = points
        user.weekly_ranking.save
      when 'monthly'
        user.clips.find(:all, :include => [:monthly_ranking]).each do |clip|
          points += clip.monthly_ranking.value
        end
        user.monthly_ranking = user.build_monthly_ranking(:value => 0, :time_period => "monthly") if (!user.monthly_ranking)
        user.monthly_ranking.value = points
        user.monthly_ranking.save
      when 'yearly'
        user.clips.find(:all, :include => [:yearly_ranking]).each do |clip|
          points += clip.yearly_ranking.value
        end
        user.yearly_ranking = user.build_yearly_ranking(:value => 0, :time_period => "yearly") if (!user.yearly_ranking)
        user.yearly_ranking.value = points
        user.yearly_ranking.save
      end
    end
  end

  # rank_value determines the final point value used for ranking
  # it has one input: a hash containing several different things a clip can be ranked for
  # example usage: rank_value(:comments => 5, :views => 500, :featured => 1)
  def rank_value(information)
    # removed for secretiveness :)
    return 50
  end
end
Ranker.register
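The worker repeats the same start-time case statement in rank_clips and rank_users, and the four ranking branches differ only in the association name. One possible way to factor that out is sketched below in plain Ruby. The helper names period_start and ranking_association are made up for this example, Date arithmetic stands in for ActiveSupport's beginning_of_week and friends, weeks are assumed to start on Monday, and the "all" date is an arbitrary far-back date rather than the worker's 100.years.

```ruby
require "date"

# Plain-Ruby sketch so it runs outside Rails; hypothetical helper names.
# Returns the first day of the given time period.
def period_start(time_period, today = Date.today)
  case time_period
  when "all"     then Date.new(1900, 1, 1) # arbitrary "far enough back" date
  when "weekly"  then today - ((today.wday + 6) % 7) # back up to Monday
  when "monthly" then Date.new(today.year, today.month, 1)
  when "yearly"  then Date.new(today.year, 1, 1)
  else raise ArgumentError, "unknown time period: #{time_period.inspect}"
  end
end

# Maps a time period to the association name used in the worker:
# "all" => :ranking, "weekly" => :weekly_ranking, and so on.
def ranking_association(time_period)
  time_period == "all" ? :ranking : :"#{time_period}_ranking"
end

%w[all weekly monthly yearly].each do |period|
  puts "#{period}: starts #{period_start(period)}, uses #{ranking_association(period)}"
end
```

Inside the worker, the association name could then be passed to send (for example clip.send(ranking_association(time_period))) so that one branch covers all four periods. Treat that as a direction to explore, not a tested drop-in change.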
Catch your syntax errors first
Remember that whole bit about re-starting the server every time you make a change to the worker files? This means you need to be pro-active about your error catching: so let’s tackle the easy ones first. On the command line just run the file with a simple ruby command:
ruby ./lib/workers/ranker.rb
You’ll either get a syntax error (which you can then fix) or you’ll get a notice about how it can’t find a require’d file: that’s fine, and if you get that far, you can just keep going forward.
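If you'd rather check syntax without executing the file at all, Ruby's -c flag parses a file and reports syntax errors without running it. Here is a small sketch that applies it to every worker file. The helper names syntax_ok? and broken_workers are made up for this example, and the lib/workers default is simply the directory mentioned earlier.

```ruby
require "rbconfig"

# Returns true when the Ruby interpreter can parse the file.
# `ruby -c` checks syntax only; it does not run the file.
def syntax_ok?(path)
  system(RbConfig.ruby, "-c", path, :out => File::NULL, :err => File::NULL) == true
end

# Checks every .rb file directly under the given directory and
# returns the paths that failed to parse.
def broken_workers(dir = "lib/workers")
  Dir.glob(File.join(dir, "*.rb")).sort.reject { |path| syntax_ok?(path) }
end

broken_workers.each { |path| puts "Syntax error in #{path}" }
```

Running this before restarting the server only catches syntax errors. A misspelled method name will still show up only in the logs.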
Logging is your friend
Backgroundrb produces two different logs while it’s running: the server log, and the backgroundrb log.
log/backgroundrb_server.log logs all errors caused by the startup / shutdown of the server itself. This means if you’ve got a syntax error in your worker, the error (well, an error) will show up in this file telling you the worker couldn’t be loaded.
log/backgroundrb.log logs all error/debug/info output coming from your workers themselves. If your worker throws an exception during its work, the stack trace will be logged here.
Logging is by far the best debugging tool available to backgroundrb. If you put logger.info (debug, etc) anywhere in your worker, it’ll get thrown into this file. This is really the only easy way to gain insight into what your workers are doing as they’re working.
Use logging. It is your friend.
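As a rough illustration of the kind of logging that pays off in a long loop, here is a sketch using Ruby's standard Logger instead of the worker's own logger. The process_with_progress helper and its every parameter are made up for this example. Inside a worker you would call the worker's logger the same way.

```ruby
require "logger"

# Logs a progress line every `every` items (and at the very end) so a
# long loop stays visible in the log without flooding it.
# Errors are logged with their class and message, then re-raised.
def process_with_progress(items, logger, every = 100)
  items.each_with_index do |item, index|
    yield item
    done = index + 1
    if done % every == 0 || done == items.size
      logger.info("processed #{done}/#{items.size}")
    end
  end
rescue StandardError => e
  logger.error("#{e.class}: #{e.message}")
  raise
end

logger = Logger.new($stdout)
process_with_progress((1..250).to_a, logger) { |n| n * 2 }
```

Logging every Nth item keeps a 30 minute job readable in the log file while still telling you how far it got before anything went wrong.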
Tying it into the request cycle
Backgroundrb allows you to create workers from within the request cycle of Rails itself. It does this through a MiddleMan object and things called job keys. Each time a new worker is created, a hash called a job key is assigned to it. Storing this job key allows you to check on the progress of the worker at a later time.
Here’s a sample method inside one of my controllers that manually calls the ranker:
def rank
  job_key = MiddleMan.new_worker(:class => :ranker)
  session[:ranking_key] = job_key
  flash[:notice] = "Clips are being ranked..."
  redirect_to :back
end
There are also other options for the new_worker method that allow you to send arguments or define the job key manually. In the simplest form, you need only to include the class.
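To make the job key idea concrete, here is a toy model of the pattern in plain Ruby: start the work, keep the key, and use the key later to look the worker up and ask how far along it is. None of these class or method names (ToyWorker, ToyJobServer, wait) come from backgroundrb. With the real plugin the lookup goes through MiddleMan, and the exact calls should be checked against the 0.2.1 RDoc.

```ruby
require "securerandom"

# Toy stand-in for a job server. These names are hypothetical and only
# model the "start now, ask later" pattern behind job keys.
class ToyWorker
  attr_reader :progress

  def initialize
    @progress = 0
  end

  # Pretends to work through `steps` steps, updating a percentage.
  def do_work(steps)
    steps.times { |i| @progress = ((i + 1) * 100) / steps }
  end
end

class ToyJobServer
  def initialize
    @workers = {}
    @threads = {}
  end

  # Starts the work on a background thread and returns a key for it.
  def new_worker(steps)
    job_key = SecureRandom.hex(8)
    worker = ToyWorker.new
    @workers[job_key] = worker
    @threads[job_key] = Thread.new { worker.do_work(steps) }
    job_key
  end

  # Looks a worker up again by its key; returns nil for unknown keys.
  def worker(job_key)
    @workers[job_key]
  end

  # Blocks until the job is finished (handy for this demo only).
  def wait(job_key)
    thread = @threads[job_key]
    thread.join if thread
  end
end

server = ToyJobServer.new
key = server.new_worker(4) # a controller would store this in the session
server.wait(key)
puts server.worker(key).progress
```

In a Rails app the second half would live in a different action: read the key back out of the session, look the worker up, and render whatever progress it reports.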
Scheduling
As the last little part of my project, I needed my ranker to run every night. My initial reaction was to run a cron job. Unfortunately I’m still a newbie to DRb, so I needed to look up how to create a new worker thread and have it run. A quick search on cron backgroundrb resulted in some unexpected results.
Backgroundrb has scheduling built into it. Meaning cron-jobs, without the system-level integration. Finally a solution where my application level logic (scheduled tasks) was built into the application! One simple YAML file, a restart of the backgroundrb server and my scheduled task was running.
scheduled_ranker:
  :class: :ranker
  :job_key: :scheduled_ranker
  :worker_method: :do_work
  :worker_method_args: nil
  :trigger_args:
    :start: <%= Time.now + 5.seconds %>
    :repeat_interval: <%= 1.day %>
Holy crap. Easiest cron. Ever.
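The ERB tags in the schedule suggest the file is run through ERB before being parsed as YAML. The sketch below shows that two-step evaluation using only Ruby's standard library. It is an assumption about how the file is processed, not backgroundrb's actual loading code. The SCHEDULE_TEMPLATE constant and load_schedule helper are made up, the :start entry is left out, and a plain integer replaces 1.day so that it runs without Rails.

```ruby
require "erb"
require "yaml"

# Hypothetical stand-in for the schedule file. 24 * 60 * 60 replaces the
# ActiveSupport helper 1.day (86,400 seconds) so this runs without Rails.
SCHEDULE_TEMPLATE = <<~SCHEDULE
  scheduled_ranker:
    :class: :ranker
    :job_key: :scheduled_ranker
    :worker_method: :do_work
    :worker_method_args: nil
    :trigger_args:
      :repeat_interval: <%= 24 * 60 * 60 %>
SCHEDULE

# Step 1: evaluate the ERB tags. Step 2: parse the result as YAML.
def load_schedule(template)
  rendered = ERB.new(template).result
  YAML.load(rendered)
end

entry = load_schedule(SCHEDULE_TEMPLATE)["scheduled_ranker"]
puts entry[:class].inspect
puts entry[:trigger_args][:repeat_interval].inspect
puts entry[:worker_method_args].inspect
```

One detail this makes visible: in YAML a bare nil is an ordinary string, so :worker_method_args loads as "nil" and not as Ruby's nil (YAML spells null as ~ or null). Whether that matters depends on what the worker does with its argument.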
Conclusion
Just after setting up my scheduled worker I was about to proclaim my mad love towards backgroundrb. Luckily I remembered that Ruby is a harsh mistress and I should probably thank Ezra instead.
Ezra: thank you for making such a kick ass plugin for Rails. If I could have found a donate button or Amazon wish list, you would have surely gotten something (hint hint).
If you’ve got some long running tasks that you need to implement: look no further than backgroundrb. It’s the cleanest solution I’ve ever seen. It makes creating detached processes so easy that you might actually use them. The only real caveat I’ve found is that it does add one more layer of complexity to your Rails stack — starting / stopping your backgroundrb service.
7 Comments
Warpspire is the place that web professional Kyle Neath writes about the web. 


October 9th | #
I think this is exactly what I need. I have a ruby script that caches my flickr and delicious stuff to a file, then I marshall it in MANUALLY! I actually login to the shell and run the script periodically. I’ve tried cron for this, but no luck for whatever reason. I’ve also tried to call the script from my Rails application using %x[] and system, but no luck there, either.
Do you think backgroundrb is a good thing to use for my situation? I’d probably schedule the task at 8:00 AM and 8:00 PM — seem reasonable?
October 10th | #
Yep, I think it’d work perfectly for that.
January 25th | #
Could you be a bit more specific on the scheduling part? You never mentioned where to create that yaml file or how to name it. Anyway backgroundrb really sounds great and this tutorial has really been of help - I am only trying to help you make it perfect ;-) .
February 15th | #
Hi Kyle,
I’ve just been playing around with backgroundrb today, I’ve just got a quick question.
I’m working on a site that needs to convert videos in the background (similar to what you mentioned early in your article). Is the best option to start a new worker for each video or just have one worker that deals with all the videos?
Thanks for the article, really helped me get my head around the whole background process jazz.
April 9th | #
This looks like exactly what I was looking for! Thanks a lot!
May 5th | #
Good article, a couple points of feedback.
You mention grabbing the job key so we can check back on the job, but you never mention how that might be done, which renders the point of mentioning the capturing of the job key rather meaningless.
While logging is always your friend, I would assert from my backgroundRb experience that the Ruby debugger is a much better friend than logfiles. It can work with backgroundRb so long as you don’t start it as a process (leave out the “start” param when starting BDRb) and you include the ruby debugger files in your worker such that it knows what the “debugger” line means.
Those points notwithstanding, great work!