Ruby 2.1 Garbage Collector

    While building my Twitter bot, I ran into very poor performance. The cause turned out to be Ruby’s garbage collector, which by default runs after every 8 to 32 MB allocated through malloc(). Since I was dealing with single strings of roughly 400 MB, that default was far too aggressive. I found that I could change this behavior by setting some environment variables:

    export RUBY_GC_OLDMALLOC_LIMIT=1503238553
    export RUBY_GC_OLDMALLOC_LIMIT_MAX=1503238553
    export RUBY_GC_MALLOC_LIMIT=1503238553
    export RUBY_GC_MALLOC_LIMIT_MAX=1503238553
    export RUBY_GC_HEAP_INIT_SLOTS=200000000
    export RUBY_GC_HEAP_FREE_SLOTS=2000000

    These settings improved performance by an order of magnitude: Ruby’s GC now runs only after every 1.5 GB of malloc()s. However, you cannot set these limits much higher than about 1.5 GB, because the GC uses a signed 32-bit int counter internally. It grows the limit with a formula like limit = limit * 1.4 and then checks whether limit > max. A signed 32-bit int tops out at 2^31 - 1 (about 2.1 billion), so a limit much above 1.5 GB overflows when multiplied by 1.4 and becomes negative!
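
To make the overflow concrete, here is a rough sketch of that growth step in Ruby. It is not the GC's actual source: the 1.4 factor and the 32-bit counter are taken from the description above, `wrap_int32` and `grow_limit` are hypothetical helpers, and the wrap-around is only an assumption about how the overflow plays out on a typical two's-complement platform.

```ruby
# Sketch only: mimics "limit = limit * 1.4" stored in a signed 32-bit int.
# The factor and the int32 counter come from the post, not from Ruby's source.
INT32_MAX = 2**31 - 1
GROWTH_FACTOR = 1.4

# Ruby Integers never overflow, so wrap the value by hand into the
# signed 32-bit range (assumed two's-complement wrap-around).
def wrap_int32(value)
  ((value + 2**31) % 2**32) - 2**31
end

# Hypothetical helper: grow the limit once, then store it in 32 bits.
def grow_limit(limit)
  wrap_int32((limit * GROWTH_FACTOR).to_i)
end

safe  = 1_503_238_553 # the value used in the exports above
risky = 1_600_000_000 # hypothetical value a bit past the ceiling

puts "safe:  #{grow_limit(safe)}"   # stays positive
puts "risky: #{grow_limit(risky)}"  # wraps around to a negative number

# Largest starting limit that still fits after one growth step.
puts "ceiling: #{(INT32_MAX / GROWTH_FACTOR).floor}"
```

Under these assumptions the ceiling works out to roughly 1.53 billion bytes, which is why the exports above stop just short of it.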