The Sad/Amazing History of a Ruby[Gems] bug

In software development, bugs sometimes live for a very long time before being truly fixed. Even when you think you've fixed one, it may suddenly reappear somewhere else. In a sense, bugs are like viruses: they can appear, fall asleep, move, mutate, and so on. I recently came across one such bug in the ruby community. Its story is amazing and its consequences are of some relevance to the ruby community, so I decided to spend a day tracking and documenting it.

The signature of the bug is easy to recognize: you end up with a ruby error carrying the following message:

/usr/local/lib/site_ruby/1.8/rubygems/requirement.rb:109:
  in `hash': bignum too big to convert into `long' (RangeError)

If you found this post recently (around March 2011), you most likely ran into this bug in one of the following situations (if not, it may be the very same bug in a mutated form; read on, the fixes are probably the same):

  • When running bundle install [--deployment|--local]
  • When installing several ruby gems at once via third-party tools (capistrano, rails, ...)
  • When Phusion Passenger attempts to start your rails/sinatra application

Temporal bug review

The original bug is already old (about two years) and is caused by a Fixnum -> Bignum overflow when computing a hash code, especially on arrays. It seems to have been discovered in REXML and was first reported on the ruby-lang bug tracker on August 5, 2009. It was then fixed quickly; more precisely, it had already been fixed by the time the issue was created.
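
To make the Fixnum -> Bignum overflow concrete, here is a minimal sketch, assuming a naive multiply-and-add hash combiner. The names `naive_combine` and `C_LONG_MAX` are hypothetical and this is not the actual REXML or ruby code. Ruby integers never wrap around: they silently grow past the range of a C `long` (on ruby 1.8 the result's class becomes Bignum; since ruby 2.4 both are simply Integer), so C code that later converts such a hash code back to a `long` fails with the RangeError quoted above.

```ruby
# Largest value a C long can hold. Assumption: 64-bit build; on the i386
# builds bisected in this post a long is 32 bits (2**31 - 1).
C_LONG_MAX = 2**63 - 1

# Hypothetical, illustrative combiner: NOT the actual REXML/ruby code.
# Each step multiplies the running value, so its magnitude keeps growing.
def naive_combine(element_hashes)
  element_hashes.inject(17) { |acc, h| acc * 31 + h }
end

element_hashes = (1..20).map { |i| i * 1_000_003 }
combined = naive_combine(element_hashes)

# Ruby promotes the result to a big integer instead of wrapping around,
# so it no longer fits in a C long.
puts combined > C_LONG_MAX
```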

Similar bug reports soon appeared in the rubygems bug tracker, on August 20, 2009 and September 21, 2009. This is where the first mutation occurs: Gem::Specification.hash is blamed because it may return a Bignum in certain cases. The link with ruby's original bug remains clear, and fixes are suggested to address the bug independently of ruby, that is, in rubygems itself. Unfortunately, the root cause analysis was not carried through, and the bug was not fixed exactly as suggested.

After that, the bug seems to fall asleep: according to Google, it stays silent for about a year, at least in the main bug trackers of the ecosystem. During that same year, however, the bundler project matures, and on August 29, 2010, its first stable version, 1.0.0, is officially released together with Rails 3.0.

The bug mutates and reappears within two days, with a new entry in bundler's issue tracker on September 1, 2010. Users quickly perform a root cause analysis, rediscover the bug reports on rubygems and then on ruby, and close the issue on bundler itself.

Since then, the bug report on rubygems has attracted a discussion from October to December 2010, and the bug keeps surfacing elsewhere: a first request on rubygems help (September 21, 2010), quickly followed by a second one (November 20, 2010), a Rails-related question on Stack Overflow (November 28, 2010), a mail on bundler's mailing list (December 14, 2010), and another Stack Overflow question about installing rspec 2.5.0 (February 14, 2011).

The most recent bug report on rubygems was closed on February 1, 2010 with the message "This is fixed." and no particular patch applied.

Affected ruby[gems] versions

The issue was fixed in ruby 1.8.7 in revision 25661 on November 5, 2009, by merging changeset r22308. Strangely, ruby 1.8.7's ChangeLog file contains no information about changesets applied between April 8 and May 26, so no official trace of this fix has been kept. However, a quick bisection (thanks to rvm, in passing) gives the following results (KO: the bug shows up; OK: it does not):

  • ruby-1.8.7-p160 [ i386 ], KO, April 18, 2009
  • ruby-1.8.7-p174 [ i386 ], KO, (no official announcement)
  • ruby-1.8.7-p248 [ i386 ], OK, December 25, 2009
  • ree-1.8.7-2011.03 [ i386 ], OK, March 01, 2011
  • ruby-1.9.2-p136 [ i386 ], OK, December 25, 2010

Despite the bug reports and patches submitted to the rubygems bug tracker, no patch has officially been applied to rubygems itself so far, probably for lack of material to reproduce the bug and write a regression test. Therefore, unless your ruby version is at least ruby-1.8.7-p248, you are affected by the bug. In other words:

  • ruby 1.8.7-p174 + rubygems 1.5.3, KO
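
The rule above can be turned into a quick check. This is a minimal sketch, assuming the bisection results are representative: `affected_by_hash_bug?` is a hypothetical helper (not part of ruby or rubygems) that treats ruby-1.8.7-p248 as the first good release, and it relies only on the standard `RUBY_VERSION` and `RUBY_PATCHLEVEL` constants.

```ruby
# Hypothetical helper, not part of ruby or rubygems. Encodes the rule above:
# anything older than ruby-1.8.7-p248 is considered affected.
def affected_by_hash_bug?(version = RUBY_VERSION, patchlevel = RUBY_PATCHLEVEL)
  # Compare "major.minor.teeny" numerically against 1.8.7.
  cmp = version.split(".").map { |s| s.to_i } <=> [1, 8, 7]
  return cmp < 0 unless cmp == 0

  # Same 1.8.7 line: decide on the patchlevel. Development builds report -1,
  # which this check conservatively treats as affected.
  patchlevel < 248
end

puts affected_by_hash_bug?("1.8.7", 174)  # => true (p174 was KO in the bisection)
puts affected_by_hash_bug?("1.8.7", 248)  # => false
puts affected_by_hash_bug?                # the interpreter running this script
```

The check only looks at version numbers, so vendor builds with backported fixes (for example, some distribution packages) may be misreported; treat the answer as a hint rather than a verdict.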

How to fix it?

At the time of writing, the only clean way to fix this bug is to upgrade ruby. As you probably know, if you are on Debian/Ubuntu and want to rely on an official ruby package (at least for ruby itself), then... well... you'll probably have to wait.

Now that this bug has been dissected, I hope a fix will land in rubygems itself for users relying on Debian packages on production servers. In the meantime, you can patch your rubygems installation as follows (see the rubygems bug tracker for details):

blambeau@aello $ pwd
/usr/local/lib/site_ruby/1.8/rubygems

blambeau@aello $ diff -u requirement.rb.orig requirement.rb
--- requirement.rb.orig 2011-03-01 19:03:20.000000000 +0100
+++ requirement.rb  2011-03-01 19:03:41.000000000 +0100
@@ -106,7 +106,7 @@
   end

   def hash # :nodoc:
-     requirements.hash
+     requirements.inject(0) { |h, r| h ^ r.first.hash ^ r.last.hash}
   end

   def marshal_dump # :nodoc:
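
The idea behind the patch can be tried out in isolation. The following is a minimal sketch, assuming a hypothetical `FakeRequirement` class that, like `Gem::Requirement`, stores its requirements as `[operator, version]` pairs (plain strings here instead of `Gem::Version` objects). Its `hash` combines element hashes with XOR, exactly as in the patch, instead of calling `hash` on the nested array. XOR never needs more bits than its operands, so as long as each element hash fits in a C `long`, the combined value does too, which is presumably why the patch avoids the overflow.

```ruby
# Hypothetical stand-in for Gem::Requirement: a list of [operator, version]
# pairs (plain strings here, where rubygems uses Gem::Version objects).
class FakeRequirement
  attr_reader :requirements

  def initialize(*pairs)
    @requirements = pairs
  end

  # Same combination as the patch: XOR the hashes of each operator and
  # version. XOR never needs more bits than its operands, so the result
  # cannot grow the way an accumulating sum or product can.
  def hash
    requirements.inject(0) { |h, r| h ^ r.first.hash ^ r.last.hash }
  end

  # hash must stay consistent with eql? for Hash keys to work.
  def eql?(other)
    other.is_a?(FakeRequirement) && requirements == other.requirements
  end

  def ==(other)
    eql?(other)
  end
end

a = FakeRequirement.new([">=", "1.0"], ["<", "2.0"])
b = FakeRequirement.new([">=", "1.0"], ["<", "2.0"])

lookup = { a => :found }
puts lookup[b]  # equal requirements hash equally, so the lookup succeeds
```

XOR is order-insensitive and lets identical entries cancel out, so it produces more collisions than a positional combiner; that is acceptable for a hash code as long as `eql?` stays exact, as it does here.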

Conclusion

I won't spend too much time on conclusions. Bugs happen, and they are sometimes hard to understand and fix. However, I can't resist giving my personal reading of all this in light of recent discussions about ruby packaging and compatibility issues.

The fact that old bugs suddenly resurrect certainly means that users rely on ruby versions that important committers in the ecosystem already consider 'oldies'... While ops people should certainly migrate (and need support to do so), committers should keep those older versions in mind.

Links