There are many things to praise about the Ruby language: adherence to the object orientation paradigm, simplicity, elegance, and powerful meta-programming capabilities. Unfortunately, performance isn’t top of mind when people think about the many qualities of the language.
For years, people have been denouncing the Ruby programming language as slow. Is it? Sure, some people will spread a fair amount of FUD around. That’s the tech industry. But there’s no denying that, at least in its main implementations, performance isn’t the area where Ruby shines the most.
In today’s post, we will cover explanations for Ruby performance issues. Then we’ll cover practical tips on what to do to improve the performance of your Ruby apps. Let’s get started!
Ruby performance is sometimes slow, and since we’re not happy with that, we must learn how to speed it up. To do that, we have to understand the causes of Ruby’s slowness. And that’s what we will do in this section.
Ruby has high memory consumption. It’s an inescapable feature of the language that stems from its design. Remember when I said earlier that one of the most often-celebrated aspects of Ruby is its loyalty to the object orientation paradigm? In Ruby, everything (or almost everything) is an object. This characteristic, in the opinion of many people including myself, makes for code that feels more predictable and consistent, causing fewer “what the heck” moments for the reader. This is often called “the principle of least astonishment,” or POLA.
But this characteristic is also one cause of poor Ruby performance. Programs need extra memory to represent data—even primitives like integers—as objects.
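To get a feel for that overhead, here is a minimal sketch that asks the interpreter how many bytes a few empty objects occupy. It assumes CRuby and the standard “objspace” library; the labels and variable names are made up for illustration, and the exact numbers vary by Ruby version and platform.

```ruby
require 'objspace'

# A few "empty" objects; each one still costs memory just by existing.
samples = {
  "empty string" => "",
  "empty array"  => [],
  "empty hash"   => {},
  "plain object" => Object.new
}

# ObjectSpace.memsize_of reports the approximate size of one object in bytes.
sizes = samples.transform_values { |obj| ObjectSpace.memsize_of(obj) }

sizes.each { |label, bytes| puts "#{label}: #{bytes} bytes" }
```

Multiply those per-object costs by the number of objects a typical request creates and the memory pressure becomes easier to picture.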
Finally, Ruby’s GC (garbage collector) isn’t that great, at least in versions before 2.1, which introduced generational collection. Before that, the GC used a plain “mark-and-sweep” algorithm, which has to walk the whole heap on every run and gets slower as the number of live objects grows. It also has to stop the application during garbage collection. Double performance penalty!
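If you want to see how hard the GC is working in your own code, a rough sketch like the one below can help. It assumes CRuby 2.2 or later, where GC.stat exposes the :total_allocated_objects key; the gc_profile helper is a hypothetical name, not part of Ruby.

```ruby
# Hypothetical helper: reports GC runs and object allocations caused by a block.
def gc_profile
  runs_before = GC.count
  allocated_before = GC.stat(:total_allocated_objects)
  yield
  {
    gc_runs: GC.count - runs_before,
    allocated_objects: GC.stat(:total_allocated_objects) - allocated_before
  }
end

# Build 100,000 short strings and see what that cost.
report = gc_profile { 100_000.times.map { |i| "item #{i}" } }
puts report
```

Running the same helper before and after a change gives a quick, if crude, way to check whether the change reduced allocations.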
Since you’re reading a post called “Ruby performance tuning,” I’m sure you’re looking forward to seeing performance tuning tips! That’s what we will see in this section. We’ll cover common Ruby performance problems a developer can face when writing Ruby code and what you can do to solve them—or avoid them altogether.
Reading an entire file at once can take a heavy toll on memory. Sure, the larger the file you’re reading, the larger the amount of memory needed, but even relatively small files can put a lot of pressure on your program’s performance. And why is that? Simple: more often than not, reading the file is just the first step. After that, you’ll most likely want to perform parsing to extract the data found in the file and do something useful with it. This will inevitably lead to the creation of additional objects, which takes more memory and exerts even more pressure on the GC.
Let’s see a quick example. Consider the following excerpt of code:
require 'date'
# Read every line of the file into memory at once.
lines = File.readlines('dates.txt')
# Keep only the year part of each "dd/mm/yyyy" line.
years = lines.map { |line| line.split('/')[2].to_i }
leap_years = years.count { |y| Date.leap?(y) }
puts "The file contains #{leap_years} dates in leap years."
This is a silly example, but should suffice for our needs. We have this file called “dates.txt,” which contains—you guessed it!—lots of dates. And they’re in the “dd/mm/yyyy” format, even though this isn’t particularly relevant to the whole performance thing.
The code reads the whole file into memory with the “readlines” method, which returns all the lines in the file as an array. After that, we map over the lines, executing a code block that splits each line using the “/” character as a separator. Then we retrieve the last part of the result, which is the year, and convert it to an integer.
We assign the result of all of this to the “years” variable. Finally, we use the “count” method to determine how many of the years are leap ones, and we print that information.
The code itself is simple, right? But think about it: why do we need to load the whole file into memory before starting the process? The short answer is “we don’t.” Since we’re doing the parsing and leap year verification line by line, we could, and should, retrieve the lines in the file one by one and then perform the parsing:
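require 'date'
leap_years = 0
# The block form closes the file automatically when we're done.
File.open("dates.txt", "r") do |file|
  # Read one line at a time instead of loading the whole file.
  while (line = file.gets)
    year = line.split('/')[2].to_i
    leap_years += 1 if Date.leap?(year)
  end
end
puts "The file contains #{leap_years} dates in leap years."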
Even this version still has room for further Ruby performance tuning, but that can be an exercise for the reader.
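As one possible take on that exercise, here is a sketch that uses File.foreach, which yields one line at a time and closes the file for you. The count_leap_year_dates method name is hypothetical, and the sketch assumes every line holds a date in the “dd/mm/yyyy” format.

```ruby
require 'date'

# Counts the dates (one "dd/mm/yyyy" date per line) that fall in a leap year.
# File.foreach streams the file line by line, so only one line is held at a time.
def count_leap_year_dates(path)
  File.foreach(path).count do |line|
    Date.leap?(line.split('/')[2].to_i)
  end
end

# Only runs when the sample file from the article is present.
if File.exist?('dates.txt')
  puts "The file contains #{count_leap_year_dates('dates.txt')} dates in leap years."
end
```

Note that there is no intermediate array of years here: each line is parsed, checked, and then left for the garbage collector.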
You already know Ruby, because of its design, allocates more memory. You’ve also learned that Ruby’s GC, particularly in the older versions, isn’t that fast. To overcome this important roadblock to better Ruby performance, we must use strategies to save as much memory as possible.
One strategy at your disposal amounts to changing objects in place. Many Ruby methods that modify an object come in two versions. One version returns a new object with the desired modification, keeping the original object unchanged. The other version changes the original object, thus avoiding the extra memory allocation. The method that changes the original object usually has the same name as the version that returns a new object, plus an exclamation mark.
Let’s begin by covering strings. In scenarios where you won’t need the original string after the modification, consider using the versions that change in place. With strings, you should always be careful when concatenating them. To be honest, this is a common performance tuning tip for many languages, not just Ruby. The most used way of concatenating strings will create new objects in memory, which is what we’re trying to avoid:
message = "I like"
message += " Ruby"
In situations like this, favor using the “<<” (append) method instead:
message = "I like"
message << " Ruby"
That way, Ruby won’t create a new string object, and you’ll avoid the extra memory allocation.
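You can verify the difference yourself by comparing object IDs before and after each operation. The snippet below is a small sketch with made-up variable names. It uses the unary plus to get a mutable string, since “<<” raises an error on frozen strings (for example, literals in a file that enables the frozen_string_literal magic comment).

```ruby
base = +"I like"            # unary plus returns a mutable (unfrozen) string
original_id = base.object_id

appended = base
appended << " Ruby"          # mutates the existing string in place
same_object = appended.object_id == original_id

reassigned = base
reassigned += "!"            # builds a brand-new string and rebinds the variable
new_object = reassigned.object_id != original_id

puts "<< kept the same object: #{same_object}"
puts "+= created a new object: #{new_object}"
```

Keep in mind that in-place changes are visible through every variable that points at the same string, so only use them when nobody else needs the original value.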
Strings aren’t the only objects that present this memory-saving opportunity. Arrays and hashes are also objects that you can modify in place. The reasoning here is the same as with strings: change the objects in place for situations where you’ll no longer need the original ones.
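Here is a short sketch of the same idea applied to an array and a hash, using the bang versions of a few common methods. The data and variable names are invented for the example.

```ruby
numbers = [3, 1, 2]
numbers_id = numbers.object_id

numbers.map! { |n| n * 2 }   # doubles each element inside the same array
numbers.sort!                # sorts the same array instead of returning a copy

settings = { retries: 1 }
settings_id = settings.object_id

settings.merge!({ timeout: 5 })  # adds the key to the existing hash

puts numbers.inspect
puts settings.inspect
puts numbers.object_id == numbers_id    # still the original array
puts settings.object_id == settings_id  # still the original hash
```

The non-bang versions (map, sort, merge) would each allocate a new collection, which is exactly the extra work we’re trying to spare the GC.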
Ruby iterators can be a source of bad performance because of their intrinsic characteristics. First, the object being iterated won’t be garbage-collected until the iterator is done. The implications of this can be serious. Imagine you have a big list in memory and you’re iterating over it. The whole list will stay in memory. Have no use for the items already traversed in the list? Too bad. They’ll stay in memory just the same.
Here’s the second important point about iterators: they’re methods, and as such they can create temporary objects. What does that mean? You guessed it. It means more pressure on our friend the garbage collector.
Now that we understand how iterators can harm Ruby performance, let’s see tips to counter that problem.
Suppose you have a large list of a certain object. You iterate over it and use each item in some calculation, discarding the list after you’re done. It would be better to use a while loop and remove elements from the list as soon as they’re processed.
There might be negative effects of list modification inside the loop, but you shouldn’t worry about them. GC time savings will outweigh those effects if you process lots of objects, which happens when your list is large and when you load linked data from these objects — for example, Rails associations.
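A rough sketch of that approach follows. The consume_total method and the sample data are hypothetical; the loop is a while loop in spirit, written with “until” to avoid a negated condition. It removes each element with Array#shift, so nothing in the list keeps a reference to items that were already processed.

```ruby
# Sums the sizes of the items, dropping each item from the list once it is used.
# Warning: this empties the array the caller passed in.
def consume_total(items)
  total = 0
  until items.empty?
    item = items.shift   # removes and returns the first element
    total += item.size
  end
  total
end

words = %w[ruby performance tuning]
puts consume_total(words)
puts words.empty?
```

Because the method destroys its input, it only makes sense when the list really is throwaway, as in the scenario described above.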
When dealing with iterators, algorithmic complexity matters a lot, because any cost inside the block is paid once per element. Each millisecond you can shave off counts. So when using iterators, you should avoid certain methods known to be slow. Instead, search for alternatives that can give you the same result without the performance penalty.
The Date.parse method is bad for performance because it has to guess the format of its input, so avoid it. One solution is to specify the expected date format when parsing. So, suppose you’re expecting dates in the “dd/mm/yyyy” format. You could do it like this:
Date.strptime(date, '%d/%m/%Y')
This will be considerably faster than what you’d get by using the Date.parse method.
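If you’d like to measure the difference on your own machine, a sketch along these lines should do. It uses the standard “benchmark” library; the sample date and the iteration count are arbitrary, and the timings will vary from one environment to another.

```ruby
require 'date'
require 'benchmark'

raw = '25/12/2021'
format = '%d/%m/%Y'
iterations = 50_000

# Time the format-guessing parser.
parse_time = Benchmark.realtime do
  iterations.times { Date.parse(raw) }
end

# Time the parser that is told the exact format up front.
strptime_time = Benchmark.realtime do
  iterations.times { Date.strptime(raw, format) }
end

puts format('Date.parse:    %.3f s', parse_time)
puts format('Date.strptime: %.3f s', strptime_time)
```

Besides speed, passing an explicit format also removes any ambiguity about which number is the day and which is the month.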
Now let’s consider type checks.
The Object#class, Object#is_a?, and Object#kind_of? methods might perform badly when used in loops. The checks can take about 20ms in large-ish loops. That might not sound too awful, but those times add up. Imagine a web application that performs such a comparison millions of times per request. If possible, move the calls to those methods out of iterators, or even out of methods that are called a lot.
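As an illustration, the sketch below shows the same hypothetical method written twice: once with the type check inside the iterator, and once with the check hoisted out so it runs a single time. Both method names are made up for the example.

```ruby
# Checks the type of `factor` once per element, which is wasted work.
def scale_all_checked_each_time(values, factor)
  values.map do |value|
    raise ArgumentError, 'factor must be numeric' unless factor.is_a?(Numeric)
    value * factor
  end
end

# Checks the type of `factor` once, before iterating.
def scale_all(values, factor)
  raise ArgumentError, 'factor must be numeric' unless factor.is_a?(Numeric)
  values.map { |value| value * factor }
end

puts scale_all([1, 2, 3], 2).inspect
```

Hoisting like this only works when the thing being checked doesn’t change between iterations; when each element needs its own check, consider validating the data once at the boundary where it enters your program.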
I’ll finish this list of tips on how to tune Ruby performance by advising you to… not write Ruby code all the time. Yeah, I know this might sound weird and defeatist, but hear me out for a moment. Ruby is an awesome language, and it’s a general, multi-purpose language. This means that, yes, in theory, there’s nothing stopping you from using Ruby to solve any kind of problem. But just because you can doesn’t mean you necessarily should.
The essential point of this section is to say while Ruby is a great language, it doesn’t have to be the sole tool in your tool belt. It’s perfectly okay for you to mix and match and use other tools in areas where Ruby doesn’t shine.
With that out of the way, let’s stop talking abstractions. Instead, we’ll offer examples of realistic scenarios where maybe Ruby isn’t the greatest choice, performance-wise. We’ll also talk about better alternatives.
Since the main implementation of Ruby is written in C, rewriting a slow part of your Ruby code in C is always an alternative when you’re facing performance issues. But there are better ways of leveraging the power of C’s raw performance than writing fixes yourself. How would you do that? Simple: by using gems written by third parties in C.
Some authors identify at least two types of those gems written in C. The first type would be gems that should replace slow parts of the Ruby language or even the Ruby on Rails framework. The other type refers to gems that implement specific tasks in C.
It’s not uncommon these days for developers—especially web developers—to ignore the most advanced capabilities of databases, using them essentially as glorified data storage tools. And that’s unfortunate since databases often have sophisticated ways of dealing with complex computations using large amounts of data. This shouldn’t be that surprising, but many developers miss out on those features.
They probably opt for the comfort of relying on abstractions such as ORMs in order to not have to deal with SQL and other complex and inconvenient facets of databases. But by doing that, those developers are renouncing the data processing capabilities offered by those database systems. Understanding how to best leverage SQL databases and ORMs is a key component of successful Ruby performance tuning.
Sometimes, when doing classification or ranking of data, you find yourself in a bad performance scenario, even after installing several gems and using them exactly as told. Maybe in this situation, the best solution would be to just perform the computation in SQL and be done with it.
Sometimes developers will struggle with their ORMs or other tools when often-neglected database features such as materialized views would be a better solution.
Today’s post featured basic tips you can apply to optimize the performance of your Ruby application. The list isn’t exhaustive by any means; rather, treat it as a starting point to help you understand some of the most common Ruby performance problems.
Once you get the right mindset, you will identify and troubleshoot not only the problems covered in this post but also problems in other areas.
You don’t have to face that journey all alone though. There are tools available that can help you troubleshoot performance issues in your Ruby and Ruby on Rails applications.
One of these tools is Retrace, a leading APM tool by Stackify.
This might be just what you needed to take the performance of your Ruby apps to a whole new level.