Here at Salsify, many of our customers regularly import large amounts of tabular product data into the system. This data needs to be sorted prior to being handled by different parts of the import process. Since we are running on Heroku, memory is a scarce resource. Sorting these arbitrarily large tabular data files requires great care. Reading all of the data into memory at once can result in extremely long execution times due to increased pressure on the Ruby garbage collector and can starve other processes on the same system of memory.
We needed a way to sort large files using a predictable amount of memory. Big data technologies like map reduce were overkill for our data scale. Creating an offline-sort gem to do this turned out to be quite the adventure and forced me to dig deeper into how Ruby manages memory, ultimately requiring a specialized heap implementation.