Garbage collection is a key component of many modern programming languages, including C#. It’s hard to imagine what programming in C#, or in other modern languages such as Java and Ruby, would look like without it.
Despite being a valuable asset that improves the programming experience, garbage collection can still give you a hard time, especially when it comes to performance.
With that in mind, what can a C# developer do to ensure that C# garbage collection acts as a friend instead of a foe? How can you write code in such a way that you reap all the benefits of this tool without suffering from any of the issues it can cause?
That’s what today’s post is all about.
What is garbage? It’s something that was once useful but isn’t anymore, like a broken device. Or it might be the residue of some activity, such as vegetable peels. In short, garbage is anything you want to get rid of because it wastes space or could cause harm.
Guess what? Our programs also generate garbage. Think about objects that were created, performed their jobs and are now useless, but are still there occupying valuable space in memory. Shouldn’t we get rid of them? We should indeed, and this process is sometimes called “memory management.”
Older languages required developers to manage memory manually. They had to explicitly release objects that were no longer in use (for instance, by calling free in C), returning that memory to the program.
The problem with this is that manually freeing objects can be an extremely hard and error-prone process. Failing to free obsolete objects often resulted in memory leaks. Freeing objects that were still in use, on the other hand, could result in runtime errors and inconsistent behavior. In short: a real pain in the neck. Manual memory management prevented developers from fully focusing on the business logic of the applications they were writing and kept them in a constant state of worry. Talk about cognitive load!
In response to those problems, garbage collection was created. So here’s our definition for garbage collection:
Garbage collection is an automated process that is able to figure out which objects are no longer needed and get rid of them, freeing space in memory.
With the definition bit out of the way, we can move on to learn more about garbage collection in C#. But to fully appreciate the traits and properties of C# GC, we have to first dive a little deeper into the different types of garbage collection that exist.
The following two types—reference counting and tracing—are by no means an exhaustive list. They’re simply meant to give you an overview of the main types of GC, so later you can understand how C# garbage collection fits into the bigger picture.
It’s also important to keep in mind that these are overall strategies for GC. Each of them can be implemented through a variety of algorithms, which can vary wildly in terms of performance and other characteristics.
Reference counting, as the name suggests, is the process of counting all the references that point to a given object. Every object in the program has a field that holds the number of references pointing to it. Every time a new reference is created, the count is increased. The inverse is also true—every time a reference ceases to exist, the count is decreased. When the count for a given object reaches zero, that means the object is inaccessible. In other words, it’s garbage, and thus can be reclaimed by the collector.
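To make the bookkeeping concrete, here is a minimal C# sketch of reference counting. It is a toy model under stated assumptions: the heap is simulated with two dictionaries keyed by hypothetical object names, and `Allocate`, `AddRef`, and `Release` are illustrative helpers, not a real API. The .NET runtime does not manage objects this way.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical object names map to their reference counts
// and to the names of the objects they reference.
var refCount = new Dictionary<string, int>();
var children = new Dictionary<string, List<string>>();
var freed = new List<string>();

// Creates an object; each of its outgoing references bumps the target's count.
void Allocate(string name, params string[] targets)
{
    refCount[name] = 0;
    children[name] = new List<string>(targets);
    foreach (var target in targets) refCount[target]++;
}

// A new reference (e.g., a local variable) now points at the object.
void AddRef(string name) => refCount[name]++;

// A reference disappears. At zero the object is reclaimed immediately,
// which drops its own outgoing references and can cascade.
void Release(string name)
{
    refCount[name]--;
    if (refCount[name] > 0) return;
    freed.Add(name);
    foreach (var child in children[name]) Release(child);
}

Allocate("leaf");
Allocate("node", "leaf");   // node -> leaf: leaf's count is 1
Allocate("root", "node");   // root -> node: node's count is 1
AddRef("root");             // a local variable holds root

Release("root");            // the variable goes out of scope
Console.WriteLine(string.Join(", ", freed)); // root, node, leaf
```

Releasing the single reference to `root` frees the whole chain at once, which shows both the immediate reclamation and the cascading work that reference counting can trigger.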
Reference counting has advantages and disadvantages. Its primary advantage is that objects can be reclaimed as soon as their count reaches zero. This way, each object has a deterministic lifetime, and collection work is spread out instead of happening in long pauses, which can make the application more responsive.
Now, for the disadvantages. Reference counting requires frequent updates: every time a reference is created, copied, or destroyed, a count has to change. Reclaiming a single object can also trigger a chain reaction. Freeing it decrements the counts of every object it references, which may free those objects too, and so on, so releasing the top of a large data structure can cause a noticeable pause. In addition, this approach requires extra space to store the reference count for each object in the application.
Finally, reference counting has trouble dealing with reference cycles. Think about an object that references its children, which, in turn, reference the parent back. Each of these objects will have a reference count greater than zero, so none of them is collected, even after the rest of the program can no longer reach them. There are approaches that can handle this issue, but at the cost of adding more overhead and complexity.
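The cycle problem is easy to reproduce in a toy model. This sketch assumes a hypothetical heap where object names simply map to reference counts; a parent and a child point at each other, and then the only external reference to the parent is dropped.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical object names map to reference counts.
var refCount = new Dictionary<string, int> { ["parent"] = 0, ["child"] = 0 };
var freed = new List<string>();

refCount["parent"]++; // a local variable points at parent
refCount["child"]++;  // parent references child
refCount["parent"]++; // child references parent back: a cycle

// The local variable goes away, dropping the only external reference.
refCount["parent"]--;

// A pure reference counter reclaims only objects whose count is zero.
foreach (var (name, count) in refCount)
    if (count == 0) freed.Add(name);

int parentCount = refCount["parent"];
int childCount = refCount["child"];
Console.WriteLine($"parent={parentCount}, child={childCount}, freed={freed.Count}"); // parent=1, child=1, freed=0
```

Both objects are unreachable, but neither count ever drops to zero, so a plain reference counter leaks them.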
The other main overall strategy for garbage collection is tracing. This approach basically consists of determining which objects are reachable, following a path of references that begin with certain root objects.
To greatly simplify the process, we could say it works like this: the GC starts from a root and marks the object it points to as live. It then follows the references that object holds and marks those objects as live as well. It repeats these steps until every reachable object has been marked. Any object that was never marked is unreachable, so the collector can reclaim it.
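Here is a minimal mark-and-sweep sketch of that tracing idea. It assumes a hypothetical object graph stored as a dictionary from object names to the names they reference, with a single root; it illustrates the strategy and is not how the .NET GC is implemented internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy object graph (hypothetical): each object lists the objects it references.
var heap = new Dictionary<string, string[]>
{
    ["a"] = new[] { "b" },
    ["b"] = new[] { "c" },
    ["c"] = Array.Empty<string>(),
    ["x"] = new[] { "y" },   // x and y reference each other,
    ["y"] = new[] { "x" },   // but no root leads to them
};
var roots = new[] { "a" };

// Mark: start from the roots and follow references, marking everything reached.
var marked = new HashSet<string>();
var pending = new Stack<string>(roots);
while (pending.Count > 0)
{
    var obj = pending.Pop();
    if (!marked.Add(obj)) continue; // already visited
    foreach (var target in heap[obj]) pending.Push(target);
}

// Sweep: whatever was never marked is unreachable and can be reclaimed.
var garbage = heap.Keys.Where(k => !marked.Contains(k)).OrderBy(k => k).ToList();
Console.WriteLine(string.Join(", ", garbage)); // x, y
```

The x and y cycle, which a pure reference counter could never reclaim, is collected here simply because no path from a root reaches it.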
C# garbage collection belongs to the tracing variety. It’s often called a generational approach since it employs the concept of generations to figure out which objects are eligible for collection.
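The managed heap is divided into generations. The collector starts by claiming objects in the youngest generation, then promotes the survivors to the next generation. The C# garbage collection uses three generations in total:
Generation 0: the youngest generation, which holds short-lived objects such as temporary variables. Collection happens most often here.
Generation 1: short-lived objects that survived a generation 0 collection. It serves as a buffer between short-lived and long-lived objects.
Generation 2: long-lived objects, such as static data that stays alive for the duration of the process.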
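You can watch promotion happen with the real `GC.GetGeneration` API. This is a sketch for experimentation only: `GC.Collect` is forced here purely to make promotion observable, and the exact generation numbers can vary with runtime version and GC configuration, so the comments say “typically.”

```csharp
using System;

// Demonstration only: collections are forced to make promotion visible.
var survivor = new object();
int initial = GC.GetGeneration(survivor);      // a new small object starts in generation 0

GC.Collect();                                  // survivor is still referenced...
int afterFirst = GC.GetGeneration(survivor);   // ...so it is promoted, typically to 1

GC.Collect();
int afterSecond = GC.GetGeneration(survivor);  // typically 2, the oldest generation

Console.WriteLine($"{initial} -> {afterFirst} -> {afterSecond} (max generation: {GC.MaxGeneration})");
GC.KeepAlive(survivor);                        // keep survivor reachable until here
```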
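According to the Microsoft docs, the GC uses the following information to determine whether an object is live:
Stack roots: stack variables provided by the just-in-time (JIT) compiler and the stack walker.
Garbage collection handles: handles that point to managed objects and that can be allocated by user code or by the common language runtime.
Static data: static objects in application domains that could be referencing other objects.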
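The effect of roots can be observed with `WeakReference`, which tracks an object without keeping it alive. In this sketch, one array is reachable only through a weak reference created by a hypothetical helper, `CreateUnrooted`, while another stays rooted by a local variable (a stack root). After a forced full collection, typically only the rooted array survives. `GC.Collect` is used purely for demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;

var unrooted = CreateUnrooted();

// This array stays reachable through a local variable, i.e. a stack root.
var rootedArray = new byte[1024];
var rooted = new WeakReference(rootedArray);

// Forced full collection, used here only to make the effect observable.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Console.WriteLine($"unrooted alive: {unrooted.IsAlive}, rooted alive: {rooted.IsAlive}");
GC.KeepAlive(rootedArray); // keeps the stack root live up to this point

// Hypothetical helper: allocates an array that nothing references once this
// method returns, and hands back only a weak reference to it.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference CreateUnrooted() => new WeakReference(new byte[1024]);
```

`NoInlining` keeps the allocation inside its own method frame, so no leftover stack slot in the caller can accidentally act as a root.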
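Before a collection starts, the collector suspends all managed threads except the one that triggered the collection. Then the collection happens in these phases:
1. Marking: the GC finds all live objects and builds a list of them.
2. Relocating: the GC updates the references to the objects that will be compacted.
3. Compacting: the GC reclaims the space occupied by dead objects and moves the surviving objects together.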
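Marking, relocating, and compacting can be illustrated with a toy model. This sketch assumes a hypothetical heap of five slots, each holding an object name and the slots it references. It marks from the roots, computes each survivor’s new slot and rewrites references to match (relocating), then slides the survivors together (compacting). The real .NET GC works on memory segments and object headers, not arrays like these.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy heap (hypothetical): slot i holds an object name and the slots it references.
string[] names = { "A", "dead1", "B", "dead2", "C" };
int[][] refs = { new[] { 2 }, Array.Empty<int>(), new[] { 4 }, new[] { 0 }, Array.Empty<int>() };
int[] roots = { 0 };

// 1. Mark: find every live object by tracing from the roots.
var live = new bool[names.Length];
var work = new Stack<int>(roots);
while (work.Count > 0)
{
    int slot = work.Pop();
    if (live[slot]) continue;
    live[slot] = true;
    foreach (int target in refs[slot]) work.Push(target);
}

// 2. Relocate: give each survivor its new slot and rewrite references to point there.
var newSlot = new int[names.Length];
int next = 0;
for (int i = 0; i < names.Length; i++)
    if (live[i]) newSlot[i] = next++;
var movedRefs = Enumerable.Range(0, names.Length)
    .Where(i => live[i])
    .Select(i => refs[i].Select(t => newSlot[t]).ToArray())
    .ToArray();

// 3. Compact: slide survivors together, reclaiming the gaps left by dead objects.
var compacted = Enumerable.Range(0, names.Length)
    .Where(i => live[i])
    .Select(i => names[i])
    .ToArray();

Console.WriteLine(string.Join(" ", compacted)); // A B C
```

Note that `dead2` references `A`, but that does not keep anything alive: only references reachable from a root matter.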
Developers new to GC will sometimes ask, “When is it appropriate to force the collection to occur?” And the answer is: (almost) never. Think about it. The whole point of this GC thing is to free you from having to manually manage memory. Wouldn’t it be self-defeating to make you manually trigger the collection process?
“Fine,” you might say. “But is there anything I can do about all that? Is there some way to write code so I don’t put unnecessary stress on GC?”
Sure, there are some things you can do. You’ve just learned that generation 0 is where collection happens most often. And which kinds of objects live there? You guessed it: short-lived ones. So one way to avoid putting additional pressure on the GC is to avoid excessive memory allocations, especially of objects you know will be short-lived.
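To see how much allocation a small change can save, you can count allocated bytes with `GC.GetAllocatedBytesForCurrentThread`. This sketch compares a hypothetical workload that allocates a fresh buffer for every chunk against one that reuses a single buffer. Exact byte counts depend on the runtime, but the reuse version should allocate far less.

```csharp
using System;

// Hypothetical workload: fill and sum 1,000 chunks of 256 numbers each.
int chunks = 1_000;
int chunkSize = 256;

long AllocatePerChunk()
{
    long total = 0;
    for (int c = 0; c < chunks; c++)
    {
        var buffer = new int[chunkSize];   // a fresh short-lived array per chunk
        for (int i = 0; i < chunkSize; i++) buffer[i] = i;
        foreach (int v in buffer) total += v;
    }
    return total;
}

long ReuseBuffer()
{
    long total = 0;
    var buffer = new int[chunkSize];       // one allocation, reused for every chunk
    for (int c = 0; c < chunks; c++)
    {
        for (int i = 0; i < chunkSize; i++) buffer[i] = i;
        foreach (int v in buffer) total += v;
    }
    return total;
}

// Runs the workload and reports how many bytes it allocated on this thread.
long Measure(Func<long> work)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    long total = work();
    long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
    Console.WriteLine($"sum={total}, allocated={allocated:N0} bytes");
    return allocated;
}

long perChunkBytes = Measure(AllocatePerChunk);
long reusedBytes = Measure(ReuseBuffer);
```

For buffers whose size varies between calls, `System.Buffers.ArrayPool<T>.Shared` (with `Rent` and `Return`) offers the same kind of reuse without managing the buffer yourself.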
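You can also use structs. Structs are value types, so they aren’t allocated as separate objects on the heap: a struct held in a local variable typically lives on the stack, and a struct stored in a field of a class or in an array is stored inline inside that object. By using structs when it makes sense to, you avoid extra allocations that put more pressure on the GC. Keep in mind that boxing a struct, for example by casting it to object or to an interface, does allocate a heap copy.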
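To see the struct/class difference without declaring new types, this sketch compares `Tuple<int, int>` (a class) with the value tuple `(int, int)` (a struct, `System.ValueTuple<int, int>`), storing 10,000 of each in an array and measuring allocated bytes. The exact numbers are runtime-dependent; the point is only that the class version allocates one extra heap object per element.

```csharp
using System;

int count = 10_000;

// Tuple<int, int> is a class: each element is a separate heap object,
// and the array only stores references to them.
long before = GC.GetAllocatedBytesForCurrentThread();
var objects = new Tuple<int, int>[count];
for (int i = 0; i < count; i++) objects[i] = Tuple.Create(i, i);
long classBytes = GC.GetAllocatedBytesForCurrentThread() - before;

// (int, int) is a ValueTuple, a struct: the values are stored inline
// in the array, so the array itself is the only allocation.
before = GC.GetAllocatedBytesForCurrentThread();
var values = new (int, int)[count];
for (int i = 0; i < count; i++) values[i] = (i, i);
long structBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"class: {classBytes:N0} bytes, struct: {structBytes:N0} bytes");
GC.KeepAlive(objects);
GC.KeepAlive(values);
```

The same reasoning applies to your own `struct` types: an array of them holds the values inline, whereas an array of a class type holds references to separately allocated objects.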
One of the most famous aphorisms in computer science is “Premature optimization is the root of all evil.” Donald Knuth at his best.
People often misinterpret this quote, but the way I see it, it means this: don’t change your code just because you think you’ll get a performance gain. In the best case, the gains will be negligible. In the worst case, you might inadvertently harm performance and decrease code quality.
Instead, learn to measure. There are tools—such as Stackify’s Retrace—that let you easily track your application’s performance. When, and if, you notice something shady going on, that’s the time to act.
Performance is far from the only thing you should monitor, though. You’d be wise to also track the garbage collection process itself. When your app throws an OutOfMemoryException, for instance, that could be a sign of a memory leak.
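In a garbage-collected language, a memory leak usually means that objects you no longer need are still reachable, so the GC isn’t allowed to reclaim them. This sketch assumes a hypothetical cache (a plain `List<byte[]>`) that only ever grows, and uses a `WeakReference` purely to observe whether an entry survives a forced collection. `AddToCache` is an illustrative helper, not a real API.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical cache that only ever grows: everything added stays reachable.
var cache = new List<byte[]>();

var tracked = AddToCache();
GC.Collect();                              // forced only for the demonstration
bool aliveWhileCached = tracked.IsAlive;   // the cache still references the entry

cache.Clear();                             // drop the lingering reference
GC.Collect();
bool aliveAfterClear = tracked.IsAlive;

Console.WriteLine($"while cached: {aliveWhileCached}, after clear: {aliveAfterClear}");

// Adds an entry to the cache; the weak reference is only for observing it.
[MethodImpl(MethodImplOptions.NoInlining)]
WeakReference AddToCache()
{
    var entry = new byte[1024];
    cache.Add(entry);
    return new WeakReference(entry);
}
```

The GC did its job correctly in both cases: the entry was reachable until the cache let go of it.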
Such errors deserve investigation, and Retrace can help you with that: it can show you how many collections take place per minute and the average duration of each collection.
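Retrace gathers these statistics for you in production. For a quick local check, the runtime also exposes raw collection counters through `GC.CollectionCount`. This sketch snapshots them around a hypothetical allocation-heavy loop and times it with `Stopwatch`; it is a rough diagnostic, not a substitute for proper monitoring.

```csharp
using System;
using System.Diagnostics;

// Snapshot how many collections each generation has had so far.
int[] Snapshot()
{
    var counts = new int[GC.MaxGeneration + 1];
    for (int gen = 0; gen <= GC.MaxGeneration; gen++) counts[gen] = GC.CollectionCount(gen);
    return counts;
}

var start = Snapshot();
var clock = Stopwatch.StartNew();

// Hypothetical workload that churns through short-lived strings.
long length = 0;
for (int i = 0; i < 200_000; i++) length += i.ToString().Length;

clock.Stop();
var end = Snapshot();
for (int gen = 0; gen < end.Length; gen++)
    Console.WriteLine($"gen {gen}: {end[gen] - start[gen]} collections in {clock.ElapsedMilliseconds} ms");
```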
If you would like to be a guest contributor to the Stackify blog please reach out to stackify@stackify.com