The difference between length, size and count in ActiveRecord associations

Even experienced Ruby on Rails developers sometimes seem to forget the subtle differences between these methods.

The goal here is to count how many objects an association has persisted in the database.

We have a few methods to do that, and each has pros and cons.

Given that:

class User < Account # let's forget I'm using STI in this example
  has_many :tracks
end

Let’s fire up a console and do some testing.

user = User.find(2)
  User Load (0.6ms)  SELECT `accounts`.* FROM `accounts` WHERE `accounts`.`type` IN ('User') AND `accounts`.`id` = 2 LIMIT

collection#count

user.tracks.count
 (5.6ms)  SELECT COUNT(*) FROM `tracks` WHERE `tracks`.`account_id` = 2
 => 3758

This method runs a query. More specifically a COUNT query, which is not bad if we need fresh data.

collection#length

user.tracks.length
 Track Load (299.5ms)  SELECT `tracks`.* FROM `tracks` WHERE `tracks`.`account_id` = 2
 => 3758

Uh.. Apparently this bad boy just loaded all the tracks for the given user in memory and counted them. Not good. Go check your code now if you are doing that.

collection#size

user.tracks.size
 => 3758

No query was run. That is because in the accounts table I have set up a ‘tracks_count’ integer field and configured it to be used as counter_cache in the Track model:

class Track < ActiveRecord::Base
  belongs_to :account, counter_cache: true # :account and not user because I'm using STI
end
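
To make the three behaviours easier to compare, here is a toy model in plain Ruby. It is only a sketch and not ActiveRecord code: the FakeAssociation class, its cached_count argument and the query strings are all invented for illustration, and the order in which size picks its source (loaded records first, then the counter cache, then a COUNT) is my understanding of the behaviour rather than something shown in the console output above.

```ruby
# Toy model of a has_many collection. NOT ActiveRecord code:
# every name here is invented for illustration.
class FakeAssociation
  attr_reader :queries

  # rows:         the records "in the database"
  # cached_count: value of the counter cache column, or nil if there is none
  def initialize(rows, cached_count: nil)
    @rows = rows
    @cached_count = cached_count
    @loaded = nil
    @queries = []
  end

  # Always issues a COUNT query.
  def count
    @queries << "SELECT COUNT(*)"
    @rows.size
  end

  # Loads every record (once), then counts them in memory.
  def length
    load_target.size
  end

  # Picks the cheapest source available (assumed order, see the note above).
  def size
    return @loaded.size if @loaded
    return @cached_count if @cached_count
    count
  end

  private

  def load_target
    @loaded ||= begin
      @queries << "SELECT *"
      @rows.dup
    end
  end
end

tracks = FakeAssociation.new(%w[a b c], cached_count: 3)
tracks.size     # answered from the cached counter
tracks.queries  # still empty, no query was recorded
```

Without the cached_count argument the same call to size records a COUNT query instead, which is what you should expect on an association that has no counter cache.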

Thoughts

Having a counter_cache column greatly improves the object count performance when we need this data because obviously it is cached directly in our table.

However it’s the way in which counter cache works that doesn’t make me feel comfortable at times, mainly because of the way it is calculated.

When ActiveRecord wants to update a counter cache it initializes the attribute to 0 if it’s NULL and then adds 1 to (or subtracts 1 from) the current value of this field.

While this may not be a bad thing per se, I believe it doesn’t provide *real* data consistency.

In my eyes it would be much better, albeit with a performance penalty, to write out counter cache values that actually are the result of a fresh count.

In any case I kind of agree with the ActiveRecord developers playing this +/- game here, because it’s fairly easy to change this behaviour and code your custom counter caches with a real SELECT COUNT(*) should you feel like doing so.

I occasionally do that when I want to be absolutely sure the value has not been corrupted by previous breakages, failed transactions or squirrels playing with power plugs.

The strategy you want to employ definitely should be tailored to the importance given to the counters in your project.

For example, if those counters are used for billing I’d definitely go with a custom counter cache or another type of mechanism (daily cron?) that guarantees the counter fields are always up to date.

It’s just another thing to be aware of and this post is a friendly reminder.
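
As a rough illustration of the "fresh count" idea, the sketch below reconciles cached counters against the real number of rows. It is plain Ruby working on in-memory stand-ins: AccountRow, TrackRow and reconcile_counters! are hypothetical names, and the grouping loop merely stands in for a SELECT COUNT(*) grouped by account_id. Newer Rails versions also ship a reset_counters class method that does this for a single record, so check whether yours has it before rolling your own.

```ruby
# In-memory stand-ins for the accounts and tracks tables (hypothetical data model).
AccountRow = Struct.new(:id, :tracks_count)
TrackRow   = Struct.new(:id, :account_id)

# Overwrites every drifted counter with the fresh count and returns
# a hash of account id => [cached value, actual value] for the ones it fixed.
def reconcile_counters!(accounts, tracks)
  actual = Hash.new(0)
  # Stands in for: SELECT account_id, COUNT(*) FROM tracks GROUP BY account_id
  tracks.each { |track| actual[track.account_id] += 1 }

  accounts.each_with_object({}) do |account, fixed|
    fresh = actual[account.id]
    next if account.tracks_count == fresh

    fixed[account.id] = [account.tracks_count, fresh]
    account.tracks_count = fresh
  end
end
```

Run from a daily job, the returned hash doubles as a report of how often, and by how much, the +/- bookkeeping drifted.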


How to recover memory on Mac OS X Lion for free

Since I upgraded to Mac OS X Lion I’ve seen what I consider an abnormal rise in the App Store of micro utilities to recover memory.
Judging from some screenshots, those small applications reside in the menu bar and, when clicked, present a button that starts some procedure to reclaim memory and make it available again for the operating system to use.

My late 2008 MacBook Pro has “only” 4GB of RAM and is suffering a lot in the memory department.
While it’s true that developing scalable web applications should not consume too many resources, all the hope is lost when you call your army of Virtual Machines to do your bidding.

I went for a fresh Lion setup and installed only a handful of applications specifically ready for Lion along with all the available updates.
Compared with the previous Snow Leopard setup, now happily living in an external disk thanks to SuperDuper!, Lion feels slow, and clearly, just by looking at the Activity Monitor, one of the bottlenecks here is the RAM.
I’m currently waiting for my 8GB upgrade to see if the situation improves (if it works at all, of course).

The reason for this post is that I noticed that every major app on my fresh install is taking 200MB or more, without even including the hungry and infamous the-more-I-have-the-more-I-want kernel_task, which seems to cause a lot of confusion on the official Apple forums.

I’ll illustrate in Lion how to achieve the same benefits that those applications provide, in a set-and-forget mode without spending a dime.

Using the technique below I noticed significant gains. You can verify them as well by opening Activity Monitor.app, right-clicking its icon in the Dock and setting it to show memory usage.
Keep an eye on the green part of the pie chart when the script runs.

First, open the text editor of your choice (I personally use TextMate) and create a new file:

mate ~/Library/LaunchAgents/org.icoretech.purge.plist

For the bash novices the tilde (~) sign indicates your home directory, usually /Users/yourname

Copy/paste this code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>org.icoretech.purge</string>
  <key>Program</key>
  <string>/usr/bin/purge</string>
  <key>StartInterval</key>
  <integer>3600</integer>
</dict>
</plist>

Save the file (CMD+S will suffice) and close the editor.

What we just did is build a plist file that instructs our system to run the value of the Program key (/usr/bin/purge) every 3600 seconds.

Open up a Terminal and write this:

launchctl load -w ~/Library/LaunchAgents/org.icoretech.purge.plist

Now this job will run every hour and free up your memory, you guessed it, for free. You can tweak the interval by changing the StartInterval integer (in seconds).
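
If you expect to experiment with the interval, a small Ruby helper can regenerate the file instead of editing the XML by hand. This is only a sketch: purge_plist is a made-up helper name, and it simply reproduces the plist shown above with the interval filled in.

```ruby
# Builds the launchd job definition shown above for a given interval.
# purge_plist is a hypothetical helper, not part of any library.
def purge_plist(interval_seconds, label: "org.icoretech.purge")
  unless interval_seconds.is_a?(Integer) && interval_seconds > 0
    raise ArgumentError, "interval must be a positive integer"
  end

  <<~PLIST
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>#{label}</string>
      <key>Program</key>
      <string>/usr/bin/purge</string>
      <key>StartInterval</key>
      <integer>#{interval_seconds}</integer>
    </dict>
    </plist>
  PLIST
end

# Example: run purge every 30 minutes instead of every hour.
# File.write(File.expand_path("~/Library/LaunchAgents/org.icoretech.purge.plist"), purge_plist(1800))
```

launchd reads the file when the job is loaded, so after rewriting it you need to unload and load the job again for the new interval to take effect.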

To stop it:

launchctl unload -w ~/Library/LaunchAgents/org.icoretech.purge.plist

A downside of this is that the purge command will kind of freeze your Mac for a few seconds when it runs. It can be annoying at times.

Rare Rails pills before the jump to Rails 3

This post is dedicated to the odd stuff currently present in Rails 2-3-stable and some gems/plugins.

It’s written in a hurry. I’ll probably write another post detailing my experience migrating a big project from Rails 2 to 3; I can’t tell when it will happen but hopefully soon.

1) find_each maintains the scopes forever.

Suppose you have a normal AR callback on a Class (Track in this example) and some named_scope.
Take a look at this code:

tracks.marked_for_deletion.find_each(:batch_size => 100) do |trk|
  trk.destroy
end

When your callback executes some basic AR finders, they will keep any scope defined on the find_each call, in this case user and marked_for_deletion.
Not fun, but that is how it works; a normal find doesn’t retain scopes.
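
The leak is easier to see with a toy model of a scope stack. The sketch below is plain Ruby and not Rails code: ToyModel and all of its methods are invented, and it only assumes that find_each yields to your block while its scope is still active, which matches the behaviour described above. Rails 2.3 does have a with_exclusive_scope class method that serves as the escape hatch modelled here; as far as I know it is protected there, so check how your version exposes it.

```ruby
# Toy model of how a scoped iteration leaks into nested finders.
# NOT Rails code: ToyModel and its methods are invented for this example.
class ToyModel
  RECORDS = [
    { :id => 1, :user_id => 1, :deleted => true  },
    { :id => 2, :user_id => 1, :deleted => false },
    { :id => 3, :user_id => 2, :deleted => false }
  ]

  @scopes = []

  class << self
    # Runs the block with extra conditions pushed on the scope stack.
    def with_scope(conditions)
      @scopes.push(conditions)
      yield
    ensure
      @scopes.pop
    end

    # Runs the block with the scope stack temporarily emptied.
    def with_exclusive_scope
      saved = @scopes
      @scopes = []
      yield
    ensure
      @scopes = saved
    end

    # Every finder silently merges whatever scopes are active.
    def find_all(conditions = {})
      active = @scopes.reduce({}, :merge).merge(conditions)
      RECORDS.select { |record| active.all? { |key, value| record[key] == value } }
    end

    # The caller's block runs while the scope is still on the stack.
    def find_each(conditions)
      with_scope(conditions) { find_all.each { |record| yield record } }
    end
  end
end

ToyModel.find_each(:user_id => 1, :deleted => true) do |_record|
  scoped   = ToyModel.find_all                                   # only sees record 1
  unscoped = ToyModel.with_exclusive_scope { ToyModel.find_all } # sees all three
end
```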

2) to_json does not support :procs.

In the rare occasion that you need to support Proc in your custom to_json you’re out of luck.

3) Use VCR if your tests need an internet connection.

VCR is an awesome gem that in my opinion has surpassed every other tool in its category. You can find it here: http://github.com/myronmarston/vcr

4) Fewer dependencies

The road to upgrade a big app to Rails 3 is not exactly straightforward, especially if you made use of ‘hacks’.
But hacks are the lesser problem when you are tied to plugins/gems that break every now and then.
I learned to stay away from gems such as searchlogic and formtastic: they break with every Rails upgrade (even stable) and reported tickets don’t get looked at if the problem you are describing is odd or no one else confirms it.
Being an edge user I do my homework and whip up a patch if possible, but sometimes it just gets ignored because “we aren’t targeting this rails release yet”.
Forms are boring to write, but if you write them “the rails way”, maybe with a custom builder, you know for sure that they will always work.

5) TDD is very useful, but..

I bet that even amongst the TDD evangelists there is someone that doesn’t always write code in advance.
I’m not sure if I’m the first to point out the obvious (or the most courageous one, admit it), but you can’t always know where your code is going.
Sometimes you are in a hurry and write tests later; sometimes you do not write tests at all because the thingie is so easy (or at least it seems so) that skipping the test feels harmless: it can’t break.
Well, it will.
So, my advice is not to lose your sanity trying to always write tests first (especially when you know your app is complicated and not a collection of scaffolds), but please do write them.
If you know that you are falling behind with your tests, take a few hours to write them and run rcov, which should indicate where you need them the most. Don’t accumulate too much uncovered code; it’s guaranteed to backfire horribly.

6) If you are using paperclip and you need reliable content-type detection..

You are not so lucky. Paperclip is a cool plugin/gem but for some reason when I want to extend the code I lose my sanity. Now, after one year of using it extensively, I know it rather well, but of the one million tickets Thoughtbot has on this problem very few get integrated.
Reliable content-type detection is a pain. The best workaround I found was to examine all the paperclip forks and merge their changes with mine.
If you want to take a look, just search for masterkain on GitHub.

7) Shoulda changed heavily

The new rspec-matcher syntax is pretty cool. However, converting everything took me a solid day of work; if you intend to keep Shoulda around, better start now.

8) Bundler and :git repositories

At the time of writing Bundler has a small problem: if you run Passenger in development, Bundler cannot find gems you included through :git.
And yes, I use Bundler on Rails 2-3-stable; it comes with some pain but at least there is less stuff to upgrade when making the switch to Rails 3.

Install nokogiri and libxml2 on Snow Leopard

This gets me every time.
You have installed libxml2 through MacPorts but Nokogiri won’t compile, even with complex command line arguments.

Solution:

sudo port upgrade --enforce-variants libxml2 +universal
sudo port upgrade --enforce-variants libxslt +universal
gem install nokogiri -- --with-xml2-include=/opt/local/include/libxml2 --with-xml2-lib=/opt/local/lib --with-xslt-dir=/opt/local

Now I can install capybara or webrat:

gem install capybara

Rails users < membership > groups? Enter Workflow

I bet there are a fair number of plugins that handle user/group membership relations; however, I’m writing a quick method here because I would like to introduce you to a better state machine: the Workflow plugin for Ruby on Rails.

All this is tested against Rails 2.3-stable (git)

Do not fear join models; they may feel dirty at start, because we like the idea of never touching them by doing group.users, but here’s an example where touching a join model isn’t a bad idea.

I wrote this code in 10 minutes, so excuse me if it’s not highly optimized, the point really is to just illustrate Workflow.
Let’s start with a basic User and Group model:

class User < ActiveRecord::Base
  has_many :group_users, :dependent => :destroy
  has_many :groups, :through => :group_users
end
class Group < ActiveRecord::Base
  has_many :group_users, :dependent => :destroy
  has_many :users, :through => :group_users
  # Use pure AR.
  has_many :founders, :class_name => 'User', :conditions => ['group_users.role = ?', 'founder'], :through => :group_users, :foreign_key => 'user_id', :source => :user
  has_many :moderators, :class_name => 'User', :conditions => ['group_users.role = ?', 'moderator'], :through => :group_users, :foreign_key => 'user_id', :source => :user
  has_many :members, :class_name => 'User', :conditions => ['group_users.role = ?', 'member'], :through => :group_users, :foreign_key => 'user_id', :source => :user
  has_many :waiting_users, :class_name => 'User', :conditions => ['group_users.role = ?', 'waiting'], :through => :group_users, :foreign_key => 'user_id', :source => :user
  has_many :banned_users, :class_name => 'User', :conditions => ['group_users.role = ?', 'banned'], :through => :group_users, :foreign_key => 'user_id', :source => :user
  has_many :active_users, :class_name => 'User', :conditions => ['group_users.role IN (?)', %w(founder moderator member)], :through => :group_users, :foreign_key => 'user_id', :source => :user
end

Very basic. Watch out for the :dependent => :destroy code in the User model because you may have special logic when deleting a User. For example you may want to force the User to disband a Group first if he’s the only founder left.

Also note that I went with GroupUser to have fewer problems with inflections and class names; however, you can use what you prefer.

But let’s take a look at the join model, shall we?

 
#  id         :integer(4)      not null, primary key
#  group_id   :integer(4)
#  user_id    :integer(4)
#  role       :string(255)
#  created_at :datetime
#  updated_at :datetime
class GroupUser < ActiveRecord::Base
  include Workflow
  ROLES = %w(founder moderator member waiting banned)
 
  belongs_to :user
  belongs_to :group
  validates_presence_of :role, :user_id, :group_id
  validates_inclusion_of :role, :in => ROLES
 
  named_scope :founders, :conditions => { :role => 'founder' }
 
  workflow_column :role
 
  workflow do
    state :waiting do
      event :accept, :transitions_to => :member
    end
    state :member do
      event :ban, :transitions_to => :banned
      event :promote_to_moderator, :transitions_to => :moderator
    end
    state :banned do
      event :accept, :transitions_to => :member
    end
    state :moderator do
      event :demote_to_member, :transitions_to => :member
      event :promote_to_founder, :transitions_to => :founder
    end
    state :founder
  end
 
  def ban(committer = nil) # an event can accept optional parameters
    # We can't ban anyone except members.
    halt! "Cannot ban this user, has role #{current_state}" unless member?
  end
 
end

A state machine like Workflow is composed of many parts, the most important being states and events.
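
If states and events are new to you, the idea fits in a few lines of plain Ruby. The sketch below is not the Workflow gem and does not use its API: ToyMembership, its NoTransition error and the fire! method are invented names, and the transition table simply mirrors the roles defined in the join model above.

```ruby
# Minimal hand-rolled state machine mirroring the role transitions above.
# NOT the Workflow gem: class, error and method names are invented.
class ToyMembership
  class NoTransition < StandardError; end

  # state => { event => target state }
  TRANSITIONS = {
    :waiting   => { :accept => :member },
    :member    => { :ban => :banned, :promote_to_moderator => :moderator },
    :banned    => { :accept => :member },
    :moderator => { :demote_to_member => :member, :promote_to_founder => :founder },
    :founder   => {}
  }

  attr_reader :role

  def initialize(role = :waiting)
    @role = role
  end

  # Fires an event: moves to the target state, or raises if the
  # current state does not define that event.
  def fire!(event)
    target = TRANSITIONS.fetch(@role)[event]
    raise NoTransition, "no event #{event} in state #{@role}" unless target

    @role = target
  end
end

membership = ToyMembership.new
membership.fire!(:accept)  # waiting -> member
membership.fire!(:ban)     # member  -> banned
# membership.fire!(:ban) would now raise ToyMembership::NoTransition
```

Keeping the allowed moves in one table is the whole point: an illegal change fails loudly in one place instead of being guarded by conditionals scattered around the app.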

Workflow is not ActiveRecord dependent and also works with CouchDB; however, it has a nice ActiveRecord integration and saved states will be immediately accessible in our object instance.

Workflow, like other state machines, enforces the state of an object, so when you try to change state through an event that isn’t defined for the current state it will throw an exception. We can then smartly intercept this exception and reduce clutter in our code by using rescue.

# group.rb
 
  # Returns the membership object of a user towards this group.
  def membership_of(user)
    group_users.find(:first, :conditions => { :user_id => user.id })
  end

  # Quick helper to check if the user has the passed role in the group.
  #   @group.role_is?(current_user, 'member')
  def role_is?(user, str)
    (membership_of(user).current_state.to_s == str) rescue false
  end
 
  # Ban a user from a group.
  # The entry will be preserved in GroupUser table.
  def ban(usr)
    begin
      usr = User.find_by_username!(usr) if usr.is_a?(String) # this is entirely another exception, add other rescue based on your app.
      membership_of(usr).ban!
      true # weak point, a state operation will return nil.
    rescue Workflow::NoTransitionAllowed => e
      false
    end
  end

Then in the restful actions of your controller you can easily do if @group.ban(user) ; else ; end

Workflow provides a lot of hooks to enter the lifecycle of the object.
I placed the state machine right into the join model, resulting in a lot of flexibility.

You can run code when entering or leaving a state, right before a transition and so on.
For further information consult the plugin’s GitHub page; it is very informative and helpful.

Use Amazon CloudFront when appropriate.

Every project that starts growing needs to track down performance issues and bottlenecks; audiobox.fm is no exception.

However, there are cases when the cure is worse than the disease, and this is one.

We do streaming, a lot of it. The entire concept of audiobox.fm revolves around streaming your media over the Internet and it should be as fast as possible.

A solution was needed to store and stream this content privately. After some evaluation we eventually settled on Amazon S3 and things shaped up pretty nicely from then on: instant streaming even from Europe by fetching data contained in a US bucket.

I then received an email from Amazon, in which they explained they had opened up CloudFront, a content delivery network working over HTTP and RTMP.

The advantage over the traditional fetch-from-bucket system is that the content is served from the location nearest to the requesting user, thus greatly reducing latency.

It was all fun and games until I noticed that the streaming was actually slower than fetching from the US bucket. It may be that the servers are suffering a heavy load, but I think there’s more.

I started digging around and I think I have some explanation to that, making audiobox a non-use case.

A bit of background

My doubts about using CloudFront started when I was investigating the possibility of using this CDN as an asset server, serving css/javascript to the end user; however, I did notice that many users on the Amazon forums started to ask why their assets were not in sync with the actual content of the bucket after an update.

The answer is simple: while CloudFront fetches from the S3 bucket, it does cache the file, which is in fact the purpose of a CDN.

It’s not possible, at the moment, to manually expire a file so that it gets re-fetched from the bucket; instead the developer is asked to either:
- wait 24 hours
- rename the asset

So?

While many developers try to find a way to expire their assets, we have the opposite problem: we would like to see them stored there forever (or at least for a long time).

When making an initial request to CloudFront, the system checks the existence of the file on the bucket and then it gets transferred using the internal Amazon resources to the nearest point.

If I ask for a file when I’m in Europe, CloudFront will fetch the file from the US before serving it to me, thus adding an overhead to our request.

A 25MB file needs to travel through Amazon internal network, getting stored on CloudFront server and then served to me.

Now, this solution is super-fine when streaming content to the general public, let’s say a video, because it will be requested multiple times and chances are that the file is already in the geographically located CloudFront distribution.

But for a private collection of audio, say 1000 files, this is impractical because records will expire in 24 hours. Files don’t get stored on the CDN for longer. There will always be a “fetch-again” from the US bucket, adding the extra overhead.

We will continue monitoring and testing CloudFront, but for our use there are disadvantages:

- slower streaming for a private, single-user cloud
- sometimes the stream starts only after the file has been fully downloaded
- the copy on the CDN is useless because, if the user wants to listen to a file again (within the 24 hours range), chances are the browser cache will serve it without any request to the CDN, defeating its purpose
- effective cost (CloudFront is not free)
- coupled with the fact that Safari/WebKit suffers from an HTML5 bug where the audio and video tag src gets requested twice (sometimes even three times), it’s a killer

The ideal solution would be for CloudFront to proactively mirror buckets in each of its geographic locations, but that will never happen for many reasons.

Installing MySQL Ruby Gem in Snow Leopard Server

Snow Leopard Server doesn’t come with the MySQL client and library files.
In order to get them we need to download a package; for Snow Leopard Server 10.6.2:
http://www.opensource.apple.com/other/MySQL-49.binaries.tar.gz (71.2MB)

The main entry page should be http://www.opensource.apple.com/release/mac-os-x-1062/ but I cannot find the package there; getting the url was guesswork from reading an Apple support page: http://support.apple.com/kb/TA25017

What we are doing is uncompressing an archive directly into our root.
Once you get the file you should decompress it, unless your browser already did that for you.
Since Safari did it for me, I have MySQL-49.binaries.tar in my Downloads folder.

Open up a terminal.

cd Downloads
sudo tar -xvf MySQL-49.binaries.tar -C / # use -zxvf if you still have the .tar.gz file

We should now have the libraries required to install the MySQL Ruby Gem.

You may have already read the previous articles about MySQL and Ruby; however, the procedure here is different:

sudo env ARCHFLAGS="-arch x86_64" gem install mysql

This should give you a working installation of MySQL Gem.

Hosting Git repositories through WebDAV on Snow Leopard Server

This will be quick and dirty; the need comes from some particular source code that I can’t trust a third-party vendor/provider to host online.

I have a Snow Leopard Server, so I will make use of the tools provided. There’s little information about the subject, so here it is.

Also, I chose WebDAV since my SSH is closed to the outside world: in order to access it I require a VPN connection, but since I want to share some repos with the outside world, WebDAV comes in handy.

I host my repositories in a subdomain, so before anything else, open up the DNS section of Server Admin and add a record for your shiny new hostname.
I assume you are using a subdomain along the lines of repositories.mydomain.com; you don’t have to do this step if you are using an IP address, this is up to you.

Log in to your server through ssh or whichever way you like and create a repositories index directory somewhere.
In my particular setup I have this structure for my websites:

/Sites/kain/mydomain.com/[www, macruby, etc]

So go where you like and create a directory; I’ll use repositories:

mkdir /Sites/kain/mydomain.com/repositories

Assign this directory to the _www user:

chown -R _www:_www /Sites/kain/mydomain.com/repositories
cd /Sites/kain/mydomain.com/repositories

Now create a new directory that will host the git repository:

mkdir myproject.git

After that go in the Web section, and create (or copy) a new website.

Relevant General setup:
Host name: repositories.mydomain.com
Web folder: /Sites/kain/mydomain.com/repositories

Relevant Options setup:
Folder listing: enabled (not really necessary but useful for testing what you see in index later)
WebDAV: enabled
Rest disabled

Relevant Realm setup:
create a new realm by clicking the plus sign
Realm name: myproject
Authentication: Basic (will try Digest/Kerberos in future)
Select Location for the select widget instead of Folder and write

/myproject.git

In the right section now add the users that will have access, then change their permissions.
Since my repos are private I only keep the users that have write access there, setting their permissions to Browse and Read/Write WebDAV, and None for Everyone.

Relevant Logging setup:
enable logging as you please

Relevant Security setup:
I chose to use SSL for this tutorial, so you are going to use a certificate and port 443.

Save and close Server Admin.

Let’s go back to our SSH session on the server.
Enter your project directory and run these commands:

cd /Sites/kain/mydomain.com/repositories/myproject.git
sudo git init --bare
sudo chown -R _www:_www .
sudo mv hooks/post-update.sample hooks/post-update
sudo git update-server-info

Now switch to your client.

Edit your ~/.netrc file and add this information:

machine repositories.mydomain.com
login myusername
password mypassword
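
Since ~/.netrc keeps the password in plain text, it is worth making the file readable only by you. If you prefer to script this step, here is a small Ruby sketch that appends the entry and restricts the permissions. add_netrc_entry is a made-up helper name and the host, login and password in the usage line are the same placeholders used above.

```ruby
# Appends a machine entry to a netrc-style file and restricts its permissions.
# add_netrc_entry is a hypothetical helper, not part of git or Ruby.
def add_netrc_entry(path, machine, login, password)
  File.open(path, "a", 0600) do |file|
    file.puts "machine #{machine}"
    file.puts "login #{login}"
    file.puts "password #{password}"
  end
  # Also tightens the mode of a file that already existed.
  File.chmod(0600, path)
end

# Example (placeholders, adjust to your setup):
# add_netrc_entry(File.expand_path("~/.netrc"), "repositories.mydomain.com", "myusername", "mypassword")
```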

Enter your project dir and run these commands:

cd myproject
git config --global http.sslVerify false # only if your server lacks a valid SSL certificate; drop --global (and run it after git init) to limit it to this project
git init # gitify the directory
git add . # add files to git repo
git commit -m "hi" # first commit
git remote add origin https://repositories.mydomain.com/myproject.git/ # end slash is important
git push origin master --force -v

Now edit this file, still in your project root dir:

mate .git/config

And add

[branch "master"]
	 remote = origin
	 merge = refs/heads/master

This way you can do git push and git pull normally.

Done.

AudioBox.fm

Hi readers,
I’m putting together a new project, dubbed audiobox.fm.

It’s in active development; if you feel like it, you can leave your email or follow the project on Twitter to be notified when the doors open for the beta stage.

I’m going to limit the initial access to the app, so hurry :)

Cheers.

My ToolBox

Being a computer junkie I usually use a certain number of tools, ranging from productivity to my amusement. Here’s a generic list.

Hardware
MacBookPro5,1 Intel Core 2 Duo 2.53 GHz 4 GB RAM
Time Capsule 1TB
Random Logitech Mouse

Development
TextMate
Safari
Firefox
Xcode and Interface Builder
Terminal.app
Navicat Premium
phpMyAdmin
phpPgAdmin
Sequel Pro
Parallels 5
GitX

System Utilities
Server Admin and Workgroup Manager
Angry IP Scanner
SuperDuper!
DiskWarrior
WhatSize

Imaging
Adobe Photoshop CS3
Skitch

Video
VLC
ReduxEncoder
iShowU HD
HandBrake

Networking
ForkLift
Tweetie
Skype
Transmission
X-Chat Aqua
Adium

Music
iTunes
CoverSutra
XLD
Mp3/Tag Studio (on Parallels) plus various home-made tools

Everyday Apps
Mail.app
Disco
AppCleaner
RAR Expander

Still something is missing, but you get the picture.