cjohansen.no

Collecting tweets with Twibot and ActiveRecord

I recently launched a website that among other things displays "tweets" (status messages on Twitter) from a predefined set of Twitter users. In this article I'll show you how you can create a stateful Twibot bot with a custom handler to put tweets in your database using ActiveRecord from a Rails application.

Creating a Twibot bot

Back in March when I first announced Twibot, I showed several examples of how you can use the provided DSL to create a Twitter bot, like this one:

require 'twibot'

# Respond to @replies if they come from the right crowd
#
reply :from => [:cjno, :irbno] do |message, params|
  post_reply message, "I agree"
end

Run ruby bot.rb --login myaccount and you've got a bot up and running.

While the DSL is awesome for many cases, it may not always be enough. Let's say you're running a website where users can register their Twitter account to appear on your site. To fetch their tweets with Twibot, you'd need a bot that changes over time. In the example above, we've hardcoded two users to accept tweets from, but in our website example, we really want to fetch the list of users from the database.

Stateful bots

When you use the Twibot DSL, at the very least a Twibot::Bot and a Twibot::Handler object are created for you behind the scenes. Bypassing the DSL, you can create these yourself. Rehashing the above example, take a look at this:

require "twibot"

class MyBotHandler < Twibot::Handler
  def initialize
    # The two parameters are the same as those accepted by the DSL:
    # pattern and options
    super(nil, :from => [:cjno, :irbno])
  end

  def handle(message, params)
    post_reply message, "I agree"
  end
end

# Create a bot instance, and hook it up with our handler
# The bot uses the default configuration, overridden by any settings found in the
# configuration file
# The default location of the file is config/bot.yml
# The second parameter tells Twibot to prompt for credentials if none are provided
# through configuration
bot = Twibot::Bot.new(Twibot::Config.default << Twibot::FileConfig.new, true)
bot.add_handler(:tweet, MyBotHandler.new)
bot.run!

This yields a few more lines of code, but the extended possibilities should be quite clear. The handler is now a stateful object, which can alter its state while the bot is running. This is particularly interesting for our website example: we can alter the internal :from option whenever there are new users available in the database. This way, the bot will pick up tweets (and store them in the database) from new users as soon as they are available.

Creating the tweet consuming bot

To solve the task at hand, all we really need is a custom Twibot::Handler (like the one above), a Tweet model of some kind (I won't go into the specifics of that here, it's entirely up to you how you want to record data), and a small script that creates a bot instance, attaches the handler and runs it (like above). This could be solved by a rake task.

The handler: storing tweets

To get a basic handler going that can store tweets, we need a handler that:

  1. Recognizes Twitter users stored in the database
  2. Saves Tweet objects in its handle method

class TweetCollector < Twibot::Handler
  def initialize
    super(nil, :from => screen_names)
  end

  # Store tweets in the database
  #
  def handle(message, params)
    Tweet.from_status(message).save
  end

 protected
  # Return array of users screen names
  #
  def screen_names
    users.collect { |user| user.screen_name.downcase }
  end

  # Array of Twitter users to store tweets from
  #
  def users
    @users ||= TwitterUser.all
  end
end

The Tweet.from_status method is assumed to convert a Twitter::Status object (from Twitter4R) to a local Tweet object (which is an ActiveRecord model). The resulting object is saved to the database.
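Since I'm leaving the Tweet model up to you, here is only a rough sketch of what a from_status method could look like. To keep it runnable outside Rails, plain Ruby objects stand in for both the ActiveRecord model and the Twitter4R status, and every attribute name (twitter_id, screen_name, body, tweeted_at) is an assumption made for illustration, not something Twibot or Twitter4R prescribes.

```ruby
# Stand-ins so the sketch runs without Rails or Twitter4R. In the real
# application Tweet would be an ActiveRecord model and the status object
# would come from Twitter4R; all attribute names here are assumptions.
FakeUser   = Struct.new(:screen_name)
FakeStatus = Struct.new(:id, :text, :user, :created_at)

class Tweet
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Build a local Tweet from a remote status object. Keeping the remote id
  # in its own attribute makes it possible to detect duplicates later on.
  def self.from_status(status)
    new(:twitter_id  => status.id,
        :screen_name => status.user.screen_name.downcase,
        :body        => status.text,
        :tweeted_at  => status.created_at)
  end
end

# Usage
status = FakeStatus.new(42, "Hello from Twibot", FakeUser.new("CJNO"),
                        Time.utc(2009, 6, 1, 12, 0, 0))
tweet = Tweet.from_status(status)
puts tweet.attributes[:screen_name]
```

With a real ActiveRecord model, the same hash of attributes could be passed straight to Tweet.new, and the handler's call to save would take care of the rest.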

So far, so good. Unfortunately, the current implementation does not pick up new users after the bot has started, since it only ever loads the user list once, inside the constructor. There are several ways to fix this. I'll show you one.

Picking up new users after the bot has started

Twitter is a real-time medium. For this reason, the bot probably needs to poll the Twitter service fairly often to be useful. This means that we should probably avoid loading a new list of users from the database on every call to handle. We'll give the handler an option, users_ttl: the number of seconds the list of users stays valid. When this many seconds have passed, we'll reload the users from the database.

class TweetCollector < Twibot::Handler
  def initialize(users_ttl = 30.minutes)
    super(nil, :from => screen_names)
    @users_ttl = users_ttl
    @users_updated = Time.now
  end

  # Store tweets in the database
  #
  def handle(message, params)
    Tweet.from_status(message).save
    update_users
  end

 protected
  # Update internal user list, if it has not been updated the last @users_ttl seconds
  def update_users
    return if @users_updated > @users_ttl.ago

    @users = nil
    @options[:from] = screen_names
    @users_updated = Time.now
  end

  # Return array of users screen names
  #
  def screen_names
    users.collect { |user| user.screen_name.downcase }
  end

  # Array of Twitter users to store tweets from
  #
  def users
    @users ||= TwitterUser.all
  end
end

Making the bot come alive is a simple addition: we add an internal timestamp to keep track of when the users list was last updated. Then, every time we handle a tweet, we check if we need to update the list. If the list is updated, the new users will be in the loop from here on.

I set the default timeout to 30 minutes. How often you need to update the users depends on how often people register Twitter users at your site, and how long a delay you're willing to allow from when a user registers to when their tweets appear on the site.
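If you want to convince yourself that the expiry logic behaves as described without waiting half an hour, the same idea can be pulled out into a small standalone class. This is only a sketch under my own assumptions: ExpiringList and its injectable clock are hypothetical names that are not part of Twibot, and the loader block stands in for the TwitterUser.all lookup.

```ruby
# Caches the result of a loader block for ttl seconds. The clock is
# injectable (a hypothetical addition) so expiry can be exercised without
# waiting for real time to pass.
class ExpiringList
  def initialize(ttl, clock = lambda { Time.now }, &loader)
    @ttl = ttl
    @clock = clock
    @loader = loader
    reload
  end

  # Return the cached items, reloading first if they are older than ttl
  def items
    reload if stale?
    @items
  end

  private

  # True once at least ttl seconds have passed since the last reload
  def stale?
    @clock.call - @loaded_at >= @ttl
  end

  def reload
    @items = @loader.call
    @loaded_at = @clock.call
  end
end

# Usage with a fake clock and an in-memory "database"
now = Time.at(0)
names = ["cjno"]
list = ExpiringList.new(1800, lambda { now }) { names.dup }

names << "irbno"
list.items   # cache is still fresh, so the new name is not visible yet
now += 1800
list.items   # ttl has passed, so the list is reloaded
```

The handler above gets away with less machinery because the staleness check piggybacks on handle, but the comparison is the same: reload once the last update is at least users_ttl seconds old.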

Making it stable

The bot will work as intended as is, but we're still missing the one piece that will allow your bot to run continuously. Keep the process running long enough, and you're bound to lose your database connection at some point. Now, ActiveRecord provides ActiveRecord.with_connection to help out with this (among other things). Unfortunately, it's not currently very useful unless you're executing raw SQL. Luckily, it will be soon.

Until it's fixed, we'll just roll our own. All we really need is a method that takes a block, executes it, and retries if it fails due to a dead database connection. Something like this:

def monitor_connection
  begin
    yield
  rescue ActiveRecord::StatementInvalid => e
    # MySQL reports a dropped connection as "MySQL server has gone away"
    if e.to_s =~ /away/
      ActiveRecord::Base.establish_connection and retry
    else
      raise e
    end
  end
end
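One thing this helper does not do is give up: if the database stays down, it may keep retrying. A possible variation, sketched here under my own assumptions, caps the number of attempts. To make it runnable outside Rails, FakeStatementInvalid is a hypothetical stand-in for ActiveRecord::StatementInvalid, and reconnect is any callable that would wrap ActiveRecord::Base.establish_connection in a real application.

```ruby
# Hypothetical stand-in for ActiveRecord::StatementInvalid so the sketch
# runs without Rails
class FakeStatementInvalid < StandardError; end

# Run the block, reconnecting and retrying at most max_retries times when
# the error message indicates a dropped connection. Any other error, or a
# dropped connection after the retries are used up, is re-raised.
def with_reconnect(reconnect, max_retries = 3)
  attempts = 0
  begin
    yield
  rescue FakeStatementInvalid => e
    raise unless e.message =~ /away/ && attempts < max_retries
    attempts += 1
    reconnect.call
    retry
  end
end

# Usage: the first two attempts fail, the third succeeds
calls = 0
reconnects = 0
result = with_reconnect(lambda { reconnects += 1 }) do
  calls += 1
  raise FakeStatementInvalid, "MySQL server has gone away" if calls < 3
  :saved
end
```

For the rest of this article I'll stick with the simpler monitor_connection.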

Using monitor_connection, we can now present the final version of our bot:

class TweetCollector < Twibot::Handler
  def initialize(users_ttl = 30.minutes)
    super(nil, :from => screen_names)
    @users_ttl = users_ttl
    @users_updated = Time.now
  end

  # Store tweets in the database
  #
  def handle(message, params)
    monitor_connection { Tweet.from_status(message).save }
    update_users
  end

 protected
  # Update internal user list, if it has not been updated the last @users_ttl seconds
  def update_users
    return if @users_updated > @users_ttl.ago

    @users = nil
    monitor_connection { @options[:from] = screen_names }
    @users_updated = Time.now
  end

  # Return array of users screen names
  #
  def screen_names
    users.collect { |user| user.screen_name.downcase }
  end

  # Array of Twitter users to store tweets from
  #
  def users
    @users ||= TwitterUser.all
  end
end

Put this in lib/twitter/tweet_collector.rb

You can put it all together in a rake task. Create lib/tasks/twitter/bot.rake with this:

require 'twibot'
require 'twitter/tweet_collector'

namespace :twitter do
  desc "Run the tweet collecting Twitter bot"
  task :bot => :environment do
    bot = Twibot::Bot.new(Twibot::Config.default << Twibot::FileConfig.new, true)
    bot.add_handler :tweet, TweetCollector.new
    bot.run!
  end
end

Running the bot is then a simple matter of:

rake twitter:bot

Extension ideas

Remember that you can use the first parameter to Twibot::Handler#initialize to provide a string or regex pattern to only match certain tweets. Refer to the Readme on how to use these.

The bot I'm currently using also employs a similar system to update locally cached Twitter user data. Every few hours, when the bot receives a tweet, it looks up the locally cached user data, and updates attributes from the remote Twitter::User object.

Hopefully, this inspires you to create some stateful bots. If you have any trouble with Twibot, this example, or want to help out in any way, do get in touch. I'd also love to hear back from anyone using Twibot - let me know what you're doing with it!
