2007 / October 7th / Long running tasks in Rails: Backgroundrb

A few months ago, I was looking for a solution to long running tasks in Rails. A long running task is basically just a process that can detach from the main application. Let’s say you have a task that will take 20 minutes to complete — you obviously don’t want your users waiting for a 20 minute pageload. The solution is to use something like backgroundrb.

The following is part introduction, part tutorial, and part commentary on backgroundrb based on my experience with it. Be forewarned: this article turned out much longer than initially anticipated — guess it goes to show you how amazing I think this plugin is. As an aside: this guide should give you all you need to start using backgroundrb 0.2.1.

Introduction

From the RDoc Page:

BackgrounDRb is a ruby job server and scheduler. It main intent is to be used with Ruby on Rails applications for offloading long running tasks. Since a rails application blocks while servicing a request it is best to move long running tasks off into a background process that is divorced from the http request/response cycle.

There are three main components to backgroundrb: the Rails plugin, the backgroundrb service, and the actual worker files. Installing the Rails plugin is a little non-standard, but easy nonetheless. From the docs:

  cd RAILS_ROOT/vendor/plugins
  svn co \
    http://svn.devjavu.com/backgroundrb/tags/release-0.2.1 \
    backgroundrb
  cd RAILS_ROOT
  rake backgroundrb:setup

You may need some additional ruby gems, but with a little bit of tweaking it was really easy getting it installed on my machine.

Workers

Workers are the meat and potatoes of backgroundrb: they’re the actual ruby that does something. Workers are pretty simple ruby files that are stored in lib/workers. Workers can have different bases: a plain worker, or a Rails worker. Rails workers have access to the Rails framework and all of your models. Plain workers are just plain old ruby files.

Creating a worker is pretty easy using the built-in generator:

    ./script/generate worker Ranker

At the end of the day, Workers are just ruby files — so remember that. Here’s a simple Worker:

class ExampleWorker < BackgrounDRb::Worker::Base

  # do_work is called when the worker is created
  def do_work(args)
    logger.info('ExampleWorker do work')
  end

end
ExampleWorker.register

Backgroundrb service

Since backgroundrb workers are disconnected from your application’s request/response cycle, backgroundrb needs its own server (it can’t hook into mongrel, apache, etc.). You start and stop this server through the backgroundrb script:

    ./script/backgroundrb start
    ./script/backgroundrb stop

The one huge downside to this is that your classes are not dynamically loaded. If you change a worker file and hit save, your code will not be updated in the running server. Every time you make a change to the worker files, you need to restart the server to see the changes.

This is an unfortunate necessity, and one that requires you to tackle worker development slightly differently than standard Rails work. I cover some useful tips in the example below to help ease development.
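To see why a restart is needed, here’s a minimal sketch in plain Ruby. It assumes the server reads each worker file once at startup, which is what the behaviour above suggests; I haven’t checked how backgroundrb actually loads files. `DemoWorker` and `observe_reload` are made-up names for illustration. A second `require` of an edited file is a no-op in a running process, while `load` re-reads it:

```ruby
require "tmpdir"

# Writes a tiny module to disk, edits it, and reports which version a
# long-lived Ruby process sees after `require` and after `load`.
# DemoWorker is a hypothetical stand-in, not a real backgroundrb worker.
def observe_reload
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo_worker.rb")
    source = ->(n) { "module DemoWorker\n  def self.version\n    #{n}\n  end\nend\n" }

    File.write(path, source.call(1))
    require path                      # first require: the file is read
    first = DemoWorker.version

    File.write(path, source.call(2))  # "edit and save" the worker
    require path                      # already loaded, so nothing happens
    after_require = DemoWorker.version

    load path                         # load always re-reads the file
    after_load = DemoWorker.version

    { first: first, after_require: after_require, after_load: after_load }
  end
end
```

The running process keeps the old definition until something re-reads the file, and restarting the server does that for every worker at once.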

When to use: a ranking algorithm

Sometimes it’s difficult to determine when you might need to use a long running task. For several months now I’ve been using backgroundrb to process and encode uploaded videos (using mencoder/ffmpeg). It’s been working really well, but that example is almost overly complex for your introduction to Backgroundrb.

A better introduction to backgroundrb is something I just implemented last night — a ranking algorithm. For the project I was working on, I wanted to be able to rank users and their videos on a weekly, monthly, yearly, and all-time basis based on 5-6 factors. Doing this on the fly is clearly unacceptable as I’d have to do some mad SQL calcs coupled with joining nearly every piece of data in the site just to return a list of 5 users.

So, I implemented a rankings table that effectively cached the ranking (via a points system) of each video and user per time period. I knew that updating this table was going to be more and more taxing on the system as time went on. So I decided to use a long running task to generate the table. This way, if the task takes 30 minutes to finish, it won’t matter.
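As a rough sketch of the idea, here’s a points system that caches one value per owner and time period. The weights, `rank_points` and `refresh_rankings` are all invented for illustration (the real formula isn’t published here), and a plain Hash stands in for the rankings table:

```ruby
# Hypothetical weights per factor; not the formula used on the real site.
RANK_WEIGHTS = { comments: 5, views: 1, purchases: 20, featured: 50 }.freeze

# Turns activity counts into a single point value. Missing factors count as 0.
def rank_points(counts, weights = RANK_WEIGHTS)
  weights.sum { |factor, weight| weight * counts.fetch(factor, 0) }
end

# Stores one cached value per (owner, time period), so a page can read a
# stored number instead of recomputing it from every table on each request.
def refresh_rankings(cache, activity_by_period, owner)
  activity_by_period.each do |period, counts|
    cache[[owner, period]] = rank_points(counts)
  end
  cache
end
```

Reading a ranking is then a single lookup, and all the expensive counting happens wherever `refresh_rankings` is called from.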

The worker

I started off with a simple Rails-based Worker. There’s a lot of looping, and it’s definitely not the most optimized — but it gives you a good idea of how to implement a worker to do some work on your database.

class Ranker < BackgrounDRb::Worker::RailsBase

  def do_work(options)        
    @options = options
    logger.info "***** STARTING RANKING PROCESS *****"
    start_work
    logger.info "***** DONE WITH RANKING PROCESS ******"
  end

  def start_work
    logger.info "Ranking all clips (all time)"
    rank_clips('all')
    logger.info "Ranking all clips (weekly)"
    rank_clips('weekly')
    logger.info "Ranking all clips (monthly)"
    rank_clips('monthly')
    logger.info "Ranking all clips (yearly)"
    rank_clips('yearly')
    logger.info "Ranking all users (all time)"
    rank_users('all')
    logger.info "Ranking all users (weekly)"
    rank_users('weekly')
    logger.info "Ranking all users (monthly)"
    rank_users('monthly')
    logger.info "Ranking all users (yearly)"
    rank_users('yearly')
  end

  def rank_clips(time_period)
    # find the start time for the time period
    case time_period
    when 'all'
      start_time_raw = Time.now - 100.years
    when 'weekly'
      start_time_raw = Time.now.beginning_of_week
    when 'monthly'
      start_time_raw = Time.now.beginning_of_month
    when 'yearly'
      start_time_raw = Time.now.beginning_of_year
    end
    start_time = start_time_raw.to_formatted_s(:db)

    # calculate the points for the given time period
    Clip.find(:all).each do |clip|
      points = rank_value(:comments => clip.comments.count("created_at > '#{start_time}'"), 
                          :views => clip.views.count("created_at > '#{start_time}'"), 
                          :purchases => clip.payments.count("created_at > '#{start_time}'"), 
                          :featured => clip.featured ? (clip.featured_at > start_time_raw ? clip.featured : 0) : 0
              )
      case time_period
      when 'all'
        clip.ranking = clip.build_ranking(:value => 0, :time_period => "all") if (!clip.ranking)
        clip.ranking.value = points
        clip.ranking.save
      when 'weekly'
        clip.weekly_ranking = clip.build_weekly_ranking(:value => 0, :time_period => "weekly") if (!clip.weekly_ranking)
        clip.weekly_ranking.value = points
        clip.weekly_ranking.save
      when 'monthly'
        clip.monthly_ranking = clip.build_monthly_ranking(:value => 0, :time_period => "monthly") if (!clip.monthly_ranking)
        clip.monthly_ranking.value = points
        clip.monthly_ranking.save
      when 'yearly'
        clip.yearly_ranking = clip.build_yearly_ranking(:value => 0, :time_period => "yearly") if (!clip.yearly_ranking)
        clip.yearly_ranking.value = points
        clip.yearly_ranking.save
      end
    end

  end

  def rank_users(time_period)
    # find the start time for the time period
    case time_period
    when 'all'
      start_time_raw = Time.now - 100.years
    when 'weekly'
      start_time_raw = Time.now.beginning_of_week
    when 'monthly'
      start_time_raw = Time.now.beginning_of_month
    when 'yearly'
      start_time_raw = Time.now.beginning_of_year
    end
    start_time = start_time_raw.to_formatted_s(:db)

    # calculate the points for the given time period
    User.find(:all).each do |user|
      points = 0
      case time_period
      when 'all'
        user.clips.find(:all, :include => [:ranking]).each do |clip|
          points += clip.ranking.value
        end
        user.ranking = user.build_ranking(:value => 0, :time_period => "all") if (!user.ranking)
        user.ranking.value = points
        user.ranking.save
      when 'weekly'
        user.clips.find(:all, :include => [:weekly_ranking]).each do |clip|
          points += clip.weekly_ranking.value
        end
        user.weekly_ranking = user.build_weekly_ranking(:value => 0, :time_period => "weekly") if (!user.weekly_ranking)
        user.weekly_ranking.value = points
        user.weekly_ranking.save
      when 'monthly'
        user.clips.find(:all, :include => [:monthly_ranking]).each do |clip|
          points += clip.monthly_ranking.value
        end
        user.monthly_ranking = user.build_monthly_ranking(:value => 0, :time_period => "monthly") if (!user.monthly_ranking)
        user.monthly_ranking.value = points
        user.monthly_ranking.save
      when 'yearly'
        user.clips.find(:all, :include => [:yearly_ranking]).each do |clip|
          points += clip.yearly_ranking.value
        end
        user.yearly_ranking = user.build_yearly_ranking(:value => 0, :time_period => "yearly") if (!user.yearly_ranking)
        user.yearly_ranking.value = points
        user.yearly_ranking.save
      end
    end
  end

  # rank_value determines the final point value used for ranking
  # it has one input: a hash containing several different things a clip can be ranked for
  # example usage: rank_value(:comments => 5, :views => 500, :featured => 1)
  def rank_value(information)
    # removed for secretiveness :)
    return 50
  end

end

Ranker.register
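The worker above repeats the same case statement several times. One way to trim that down is sketched below in plain Ruby, with no Rails required. `period_start` and `ranking_association` are hypothetical helpers, not part of the worker as written, and the sketch assumes weeks start on Monday:

```ruby
require "date"

# Start of the ranking window for a time period.
# Assumption: weeks start on Monday.
def period_start(time_period, today = Date.today)
  case time_period
  when "all"     then today << (12 * 100)            # 100 years back
  when "weekly"  then today - ((today.wday + 6) % 7) # back to Monday
  when "monthly" then Date.new(today.year, today.month, 1)
  when "yearly"  then Date.new(today.year, 1, 1)
  else raise ArgumentError, "unknown time period: #{time_period.inspect}"
  end
end

# Association name for a time period, following the naming in the worker:
# "all" maps to :ranking, "weekly" to :weekly_ranking, and so on.
def ranking_association(time_period)
  time_period == "all" ? :ranking : :"#{time_period}_ranking"
end
```

With helpers like these, the four near-identical branches in rank_clips and rank_users could probably collapse into one, using `send` with the association name and its matching `build_` method. I haven’t tried that against the real models.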

Catch your syntax errors first

Remember that whole bit about restarting the server every time you make a change to the worker files? This means you need to be proactive about catching errors, so let’s tackle the easy ones first. On the command line, just run the file with a simple ruby command:

    ruby ./lib/workers/ranker.rb

You’ll either get a syntax error (which you can then fix) or you’ll get a notice about how it can’t find a require’d file: that’s fine, and if you get that far, you can just keep going forward.
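If you’d rather check syntax without running the file at all, the interpreter’s `-c` flag does that from the shell. The same check can be sketched in Ruby itself. `syntax_error_in` is a made-up helper name, and `RubyVM::InstructionSequence` is specific to the standard (MRI) interpreter:

```ruby
# Returns nil when the source parses, or the parser's message when it does not.
# The source is compiled but never executed, so a missing require is not
# reported here. MRI-only: relies on RubyVM::InstructionSequence.
def syntax_error_in(source, filename = "(worker)")
  RubyVM::InstructionSequence.compile(source, filename)
  nil
rescue SyntaxError => e
  e.message
end
```

To check every worker before a restart, you could loop over `Dir["lib/workers/*.rb"]` and pass `File.read(path)` and the path to this helper.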

Logging is your friend

Backgroundrb produces two different logs while it’s running: the server log, and the backgroundrb log.

log/backgroundrb_server.log logs all errors caused by the startup / shutdown of the server itself. This means if you’ve got a syntax error in your worker, the error (well, an error) will show up in this file telling you the worker couldn’t be loaded.

log/backgroundrb.log logs all error/debug/info output coming from your workers themselves. If your worker throws an exception during its work, the stack trace will be logged here.

Logging is by far the best debugging tool available to backgroundrb. If you put logger.info (debug, etc) anywhere in your worker, it’ll get thrown into this file. This is really the only easy way to gain insight into what your workers are doing as they’re working.

Use logging. It is your friend.
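Here’s a small sketch of the kind of logging wrapper I mean, using Ruby’s standard Logger rather than the logger backgroundrb hands to a worker. `run_logged` is a hypothetical helper. It logs the start and end of a step, and if the step raises, it logs the exception and backtrace before re-raising:

```ruby
require "logger"
require "stringio"

# Runs a block of work, logging progress and any exception with its backtrace.
# The exception is re-raised so the caller still sees the failure.
def run_logged(logger, name)
  logger.info("#{name}: starting")
  result = yield
  logger.info("#{name}: done")
  result
rescue StandardError => e
  logger.error("#{name}: #{e.class}: #{e.message}")
  logger.error(e.backtrace.join("\n")) if e.backtrace
  raise
end

# Usage: log to an in-memory buffer here; a worker would use its own logger.
buffer = StringIO.new
run_logged(Logger.new(buffer), "rank_clips(weekly)") { 40 + 2 }
```

Inside a worker, the same pattern would wrap each rank_clips or rank_users call, so the log shows exactly which step was running when something broke.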

Tying it into the request cycle

Backgroundrb allows you to create workers from within the request cycle of Rails itself. It does this through a MiddleMan object and things called job keys. Each time a new worker is created, a hash called a job key is assigned to it. Storing this job key allows you to check on the progress of the worker at a later time.

Here’s a sample method inside one of my controllers that manually calls the ranker:

def rank
  job_key = MiddleMan.new_worker(:class => :ranker)
  session[:ranking_key] = job_key
  flash[:notice] = "Clips are being ranked..."
  redirect_to :back
end

There are also other options for the new_worker method that allow you to send arguments or define the job key manually. In the simplest form, you need only to include the class.
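To make the job key idea concrete, here’s a toy version in plain Ruby threads. This is not the backgroundrb API. `JobRegistry` and its methods are invented for illustration and only show the pattern: start work, get a key back straight away, and use the key later to ask how things went.

```ruby
require "securerandom"

# Toy stand-in for the job-key pattern; NOT the backgroundrb MiddleMan API.
class JobRegistry
  def initialize
    @jobs = {}
    @threads = {}
    @lock = Mutex.new
  end

  # Starts the block on a background thread and returns its key immediately.
  def start(&work)
    key = SecureRandom.hex(8)
    @lock.synchronize { @jobs[key] = { status: :running, result: nil } }
    thread = Thread.new do
      outcome = begin
        { status: :done, result: work.call }
      rescue StandardError => e
        { status: :failed, result: e.message }
      end
      @lock.synchronize { @jobs[key] = outcome }
    end
    @lock.synchronize { @threads[key] = thread }
    key
  end

  # Looks up the current state of a job by key without blocking.
  def fetch(key)
    @lock.synchronize { @jobs.fetch(key).dup }
  end

  # Blocks until the job has finished, then returns its final state.
  def wait(key)
    @lock.synchronize { @threads.fetch(key) }.join
    fetch(key)
  end
end
```

In a controller you would store the key (as the rank action does with the session) and call something like `fetch` from a later request to decide what to show the user.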

Scheduling

As the last little part of my project, I needed my ranker to run every night. My initial reaction was to run a cron job. Unfortunately I’m still a newbie to DRb, so I needed to look up how to create a new worker thread and have it run. A quick search on cron backgroundrb turned up some unexpected results.

Backgroundrb has scheduling built into it. Meaning cron-jobs, without the system-level integration. Finally a solution where my application level logic (scheduled tasks) was built into the application! One simple YAML file, a restart of the backgroundrb server and my scheduled task was running.

scheduled_ranker:
  :class: :ranker
  :job_key: :scheduled_ranker
  :worker_method: :do_work
  :worker_method_args: nil
  :trigger_args:
      :start: <%= Time.now + 5.seconds %>
      :repeat_interval: <%= 1.day %>

Holy crap. Easiest cron. Ever.
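For a feel of what a start-plus-interval trigger means, here’s a sketch of the arithmetic in plain Ruby. `next_run` is a hypothetical helper, not backgroundrb’s scheduler code. It returns the first run at or after a given moment:

```ruby
# First run at or after `now` for a trigger that fires at `start` and then
# every `repeat_interval` seconds. Illustration only, not the real scheduler.
def next_run(start, repeat_interval, now)
  return start if now <= start

  elapsed = now - start                               # seconds since first run
  intervals = elapsed.fdiv(repeat_interval).ceil      # whole intervals needed
  start + intervals * repeat_interval
end
```

One thing to keep in mind: if the ERB tags are evaluated when the server reads the file (which the template syntax suggests, though I haven’t verified it), the start time is "five seconds after the last restart", so the daily run drifts to whatever time of day you last restarted the server.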

Conclusion

Just after setting up my scheduled worker I was about to proclaim my mad love towards backgroundrb. Luckily I remembered that Ruby is a harsh mistress and I should probably thank Ezra instead.

Ezra: thank you for making such a kick ass plugin for Rails. If I could have found a donate button or Amazon wish list, you would have surely gotten something (hint hint).

If you’ve got some long running tasks that you need to implement: look no further than backgroundrb. It’s the cleanest solution I’ve ever seen. It makes creating detached processes so easy that you might actually use them. The only real caveat I’ve found is that it does add one more layer of complexity to your Rails stack — starting / stopping your backgroundrb service.

7 Comments

  1. Ryan

    October 9th

    I think this is exactly what I need. I have a ruby script that caches my flickr and delicious stuff to a file, then I marshall it in MANUALLY! I actually login to the shell and run the script periodically. I’ve tried cron for this, but no luck for whatever reason. I’ve also tried to call the script from my Rails application using %x[] and system, but no luck there, either.

    Do you think backgroundrb is a good thing to use for my situation? I’d probably schedule the task at 8:00 AM and 8:00 PM — seem reasonable?

  2. Kyle

    October 10th

    Yep, I think it’d work perfectly for that.

  3. […] Long running tasks in Rails: Backgroundrb […]

  4. izomorphius

    January 25th

    Could you be a bit more specific on the scheduling part? You never mentioned where to create that yaml file or how to name it. Anyway backgroundrb really sounds great and this tutorial has really been of help - I am only trying to help you make it perfect ;-) .

  5. Jim Neath

    February 15th

    Hi Kyle,

    I’ve just been playing around with backgroundrb today, I’ve just got a quick question.

    I’m working on a site that needs to convert videos in the background (similar to what you mentioned early in your article). Is the best option to start a new worker for each video or just have one worker that deals with all the videos?

    Thanks for the article, really helped me get my head around the whole background process jazz.

  6. Peter

    April 9th

    This looks like exactly what I was looking for! Thanks a lot!

  7. Bill Harding

    May 5th

    Good article, a couple points of feedback.

    1. You mention grabbing the job key so we can check back on the job, but you never mention how that might be done, which renders the point of mentioning the capturing of the job key rather meaningless.

    2. While logging is always your friend, I would assert from my backgroundRb experience that the Ruby debugger is a much better friend than logfiles. It can work with backgroundRb so long as you don’t start it as a process (leave out the “start” param when starting BDRb) and you include the ruby debugger files in your worker such that it knows what the “debugger” line means.

    Those points notwithstanding, great work!
