Blog: Demystifying Ruby 3/3 : Memory Management
Garbage collection (GC) is an essential process in any programming language that manages memory for the programmer. In Ruby (this article is about the MRI), the garbage collector plays a crucial role in ensuring efficient memory usage and preventing memory leaks.
The Heap and the Stack
Memory in a Ruby program is divided into two main regions:
- The Heap: This is where dynamically allocated memory lives. Objects created at runtime, such as strings, arrays, and hashes, are stored here. The garbage collector primarily operates on the heap to free up unused memory.
- The Stack: This is where method calls, local variables, and control flow data are stored. The stack is managed automatically and follows a Last In, First Out (LIFO) structure. Each time a method is called, a new stack frame is created; when the method completes, its stack frame is removed.
Ruby’s garbage collector is responsible for:
- Memory Allocation: When a new object is created, Ruby allocates memory from the heap to store it.
- Garbage Collection: When an object is no longer referenced, it becomes eligible for garbage collection. The GC process reclaims memory from unused objects and makes it available for future allocations.
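Both responsibilities can be observed from Ruby itself through the counters exposed by GC.stat. The following is a minimal sketch: the helper name allocations_during is made up for this illustration, and the exact numbers it prints will vary between Ruby versions and runs.

```ruby
# Count how many objects a block allocates, using the counters MRI exposes
# through GC.stat. `allocations_during` is a hypothetical helper name.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocated = allocations_during do
  1_000.times { |i| "user-#{i}" } # each interpolation builds a new String on the heap
end
puts "objects allocated by the block: #{allocated}"

# Nothing references those strings any more, so they are eligible for collection.
freed_before = GC.stat(:total_freed_objects)
GC.start
puts "objects freed by this GC run: #{GC.stat(:total_freed_objects) - freed_before}"
```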
Two sides of the same coin: The Tail (AKA Collection)
Let’s dive into garbage collection first. When thinking about objects, we can represent them as a tree of sub-objects linked to a particular node known as the Root Set. Let’s consider this example.
data = {
  person: {
    name: "John",
    age: 30,
    address: {
      city: "Paris",
      country: "France"
    }
  }
}
In this code example, data is stored on the stack and holds a pointer (reference) to a Hash object on the heap.
The hash can be described using this tree:
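The same nesting can also be printed from Ruby. This is only a sketch: tree_lines is a helper written for this illustration, and it shows how the values nest inside each other, not how MRI actually lays them out in memory.

```ruby
data = {
  person: {
    name: "John",
    age: 30,
    address: { city: "Paris", country: "France" }
  }
}

# Walk nested hashes and return one indented line per node.
# `tree_lines` is a hypothetical helper, not part of Ruby.
def tree_lines(node, label = "data", depth = 0)
  indent = "  " * depth
  return ["#{indent}#{label}: #{node.inspect}"] unless node.is_a?(Hash)

  ["#{indent}#{label} (Hash)"] +
    node.flat_map { |key, value| tree_lines(value, key, depth + 1) }
end

lines = tree_lines(data)
puts lines
```

Each level of indentation corresponds to one reference that the garbage collector would have to follow from the root.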
Since memory is limited, the goal of garbage collection is to scan memory for unused data chunks that can be freed and reused by the program.
Using this example, suppose our Ruby VM executes a line like this one:
data[:person][:address] = nil
This will be represented as
This is where the garbage collector comes in. The GC identifies the memory chunks that are no longer linked to the Root Set (the blocks highlighted in green) and collects them in order to free memory space. After collection, the memory looks like this:
There are plenty of strategies to get this done. The most basic one is mark and sweep. As the name implies, it first:
- marks: start from the root and follow edges, marking all reachable objects
then it
- sweeps: all objects that are not marked are freed
Rinse and repeat, and voilà, you have a garbage collector.
A garbage collector is responsible for finding objects that are no longer used and freeing the memory attached to them.
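The two phases can be sketched in a few lines of Ruby. This is a toy model and not MRI's implementation: ToyObject, toy, mark and sweep are names invented for this illustration, and the "heap" is just an array.

```ruby
# A toy object graph: each object only knows its outgoing references.
ToyObject = Struct.new(:name, :refs, :marked)

def toy(name)
  ToyObject.new(name, [], false)
end

# Mark phase: start from the roots and follow every edge.
def mark(roots)
  worklist = roots.dup
  until worklist.empty?
    node = worklist.pop
    next if node.marked

    node.marked = true
    worklist.concat(node.refs)
  end
end

# Sweep phase: keep marked objects, drop the others,
# and clear the marks so the next cycle starts fresh.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |node| node.marked = false }
  [live, dead]
end

data, person, address, city = %w[data person address city].map { |name| toy(name) }
data.refs << person
person.refs << address
address.refs << city
heap = [data, person, address, city]

# Drop the reference to address, like setting the :address key to nil.
person.refs.delete(address)

mark([data])
heap, freed = sweep(heap)
puts "live:  #{heap.map(&:name).join(', ')}"
puts "freed: #{freed.map(&:name).join(', ')}"
```

Note that city is freed as well: nothing reachable from the root points to it once address is gone.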
One issue with this approach is that we need a static view of the memory space—even if only briefly—because we can’t perform garbage collection while allocating memory.
This requires a “stop the world” pause: while GC tasks are running, the program cannot execute any other logic. Stopping the entire program and scanning the complete object tree is slow and resource-intensive, suggesting we need a better approach.
To address the limitations of basic garbage collection methods like “mark and sweep,” Ruby employs a more advanced technique called generational garbage collection (GC). This approach is based on the observation that most objects in a program are short-lived, meaning they are created and discarded quickly.
By dividing objects into different “generations” based on their lifespan, generational GC optimizes memory management by focusing its efforts on newly created objects, which are more likely to be garbage, while less frequently scanning older objects that are more likely to persist.
This reduces the number of nodes to visit, since older objects are not visited as often as objects in the younger generation.
A generational garbage collector focuses its effort on short-lived objects and scans the stable, older objects less often, keeping a good balance between the memory reclaimed and the time spent collecting.
But there’s a hole in this: what if an object in Gen 1 points to something in Gen 0, like in this example? If we strictly scan the Gen 0 memory space, we don’t have the full picture, so we cannot know that the object is actually referenced from Gen 1, and we certainly don’t want to free it!
For this to work, Ruby uses a Remembered Set to track references from Gen 1 to Gen 0. When we check Gen 0, we also check the Remembered Set so that, as the name suggests, we don’t forget anything and avoid freeing an object that is still in use. Additionally, Ruby uses what we call a write barrier to keep this set up to date. The write barrier intercepts reference writes while the program runs and updates the Remembered Set if needed.
A Remembered Set ensures that when scanning the young generation’s memory space, we still have hints about existing links from the older generation, so that we do not free an object referenced from the older generation.
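Here is a toy model of a minor collection that relies on a remembered set and a write barrier. Everything in it is an assumption made for illustration (the GenObject and GenerationalHeap classes, the method names, and the rule that an object is promoted after surviving a single collection, which is simpler than what MRI really does).

```ruby
require 'set'

class GenObject
  attr_reader :name, :refs
  attr_accessor :old

  def initialize(name)
    @name = name
    @refs = []
    @old = false
  end
end

class GenerationalHeap
  attr_reader :young, :old

  def initialize
    @young = []
    @old = []
    @remembered = Set.new # old objects that point to young objects
  end

  def allocate(name)
    object = GenObject.new(name)
    @young << object
    object
  end

  # Write barrier: every reference write goes through here, so an
  # old -> young edge is recorded in the remembered set.
  def write(from, to)
    from.refs << to
    @remembered << from if from.old && !to.old
  end

  # Minor GC: only young objects are traversed. Old objects are assumed to be
  # alive; their young children are found through the remembered set.
  def minor_gc(roots)
    marked = Set.new
    worklist = roots.reject(&:old) + @remembered.flat_map(&:refs)
    until worklist.empty?
      object = worklist.pop
      next if object.old || marked.include?(object)

      marked << object
      worklist.concat(object.refs)
    end

    survivors, freed = @young.partition { |object| marked.include?(object) }
    survivors.each { |object| object.old = true } # simplified promotion rule
    @old.concat(survivors)
    @young = []
    @remembered.clear # no old -> young edges remain after promotion
    freed
  end
end

heap = GenerationalHeap.new
config = heap.allocate("config")
heap.minor_gc([config])        # config survives and is promoted

session = heap.allocate("session")
heap.allocate("temp")          # never referenced by anything
heap.write(config, session)    # old -> young reference, remembered

freed = heap.minor_gc([config])
puts "freed: #{freed.map(&:name).inspect}"
puts "old generation: #{heap.old.map(&:name).inspect}"
```

Without the remembered set, session would look unreachable during the second collection, because the only reference to it comes from an old object that the minor GC does not traverse.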
Is this sufficient? Partially. As mentioned earlier in this article, scanning the memory space requires a Stop-the-World pause for the entire duration of the process to ensure proper memory cleanup. Rails applications, by nature, tend to consume a significant amount of memory. The larger the memory footprint, the longer the pause for garbage collection.
But what if we could make garbage collection interruptible? If we could pause and resume the GC at will, we’d have much better control over execution time, ensuring we don’t spend excessive time collecting garbage when an HTTP request needs processing. This is where tri-color marking GC comes in, allowing incremental garbage collection while minimizing application pauses.
Tri-color marking classifies every object as white (not yet visited, a candidate for collection), gray (visited, but its references have not been scanned yet), or black (live, retained in memory). This approach enables incremental garbage collection, making it interruptible, so the process can pause and resume as needed, reducing long application pauses. Of course, as in the previous example, there are some new corner cases we need to handle:
- what if a new object is created while the GC is in progress but paused? It is marked directly as a black object (using a write barrier, as with the Remembered Set)
- what if we dereference objects that were already processed? They are kept as is for now; they will be left white, and therefore collected, on the next GC cycle!
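The incremental behaviour and both corner cases can be sketched with a toy marker. This is the textbook tri-color scheme under simplifying assumptions, not a description of MRI internals: ColorNode, IncrementalMarker and its methods are hypothetical names, and the write barrier rule used here (shade the newly referenced object gray when the writer is black) is the classic formulation.

```ruby
ColorNode = Struct.new(:name, :refs, :color)

class IncrementalMarker
  def initialize(roots)
    @gray = []
    roots.each { |root| shade(root) }
  end

  # Turn a white object gray and queue it for scanning.
  def shade(node)
    return unless node.color == :white

    node.color = :gray
    @gray << node
  end

  # Scan at most `budget` gray objects, then hand control back to the program.
  # Returns true once no gray objects are left.
  def step(budget)
    budget.times do
      node = @gray.shift
      break if node.nil?

      node.refs.each { |child| shade(child) }
      node.color = :black
    end
    @gray.empty?
  end

  # Objects created while marking is in progress start out black.
  def allocate(name)
    ColorNode.new(name, [], :black)
  end

  # Write barrier: a black object must never point to a white one,
  # so the newly referenced object is shaded gray.
  def write(from, to)
    from.refs << to
    shade(to) if from.color == :black
  end
end

root, a, b, late, orphan =
  %w[root a b late orphan].map { |name| ColorNode.new(name, [], :white) }
root.refs << a
a.refs << b

marker = IncrementalMarker.new([root])
marker.step(1)                    # root is black, a is gray, marking is paused

marker.write(root, late)          # the program runs: root now points to a white object
fresh = marker.allocate("fresh")  # and it allocates a brand new object
marker.write(root, fresh)

finished = marker.step(10)        # resume and finish marking
colors = [root, a, b, late, fresh, orphan].to_h { |node| [node.name, node.color] }
puts colors
```

At the end of the cycle only orphan is still white, so it is the only object a sweep would reclaim.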
The evolution of Ruby’s garbage collection followed a long journey: from Stop-the-World GC, which paused the entire program, to Generational GC (Ruby 2.1), which optimized collection by focusing on young objects, and finally to Tri-Color Marking GC (Ruby 2.2), enabling incremental, interruptible garbage collection for reduced pauses and better responsiveness.
Two sides of the same coin: The Head (AKA Allocation)
Let’s take a look at how Ruby allocates memory for objects under the hood. In CRuby (MRI), Ruby doesn’t use the system’s malloc for every object. Instead, it manages its own heap, which is divided into chunks called pages.
Each page contains multiple slots, and each slot is capable of storing one Ruby object.
MRI keeps track of which slots are available using a free list. Rather than using a separate data structure, Ruby cleverly reuses the memory of the free slots themselves: each free slot stores a pointer to the next available slot. This forms a linked list of free slots, making allocation fast and efficient (no need to scan the entire heap!).
Once a page is full, Ruby allocates a new page for new objects to be stored.
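The free list idea can be sketched with a toy page whose free slots store the index of the next free slot. This is an illustration under stated assumptions: ToyPage and SlotHeap are invented names, a page holds only 4 slots here instead of hundreds, and array indexes stand in for real pointers.

```ruby
class ToyPage
  SLOT_COUNT = 4 # tiny on purpose, real pages hold hundreds of slots

  def initialize
    # Each free slot stores the index of the next free slot (nil ends the list).
    @slots = Array.new(SLOT_COUNT) { |i| i + 1 < SLOT_COUNT ? i + 1 : nil }
    @free_head = 0
  end

  def full?
    @free_head.nil?
  end

  # Pop the head of the free list: constant time, no scan of the page.
  def allocate(object)
    raise "page is full" if full?

    index = @free_head
    @free_head = @slots[index]
    @slots[index] = object
    index
  end

  # Push the slot back onto the front of the free list.
  def free(index)
    @slots[index] = @free_head
    @free_head = index
  end
end

class SlotHeap
  attr_reader :pages

  def initialize
    @pages = [ToyPage.new]
  end

  # Use the first page with a free slot; add a new page when all are full.
  def allocate(object)
    page = @pages.find { |candidate| !candidate.full? }
    unless page
      page = ToyPage.new
      @pages << page
    end
    [@pages.index(page), page.allocate(object)]
  end
end

heap = SlotHeap.new
locations = %w[a b c d e].map { |value| heap.allocate(value) }
puts locations.inspect          # [page index, slot index] pairs

heap.pages[0].free(2)           # the slot that held "c" goes back on the free list
reused = heap.allocate("f")
puts reused.inspect
```

The fifth allocation lands on a second page because the first one is full, and the slot freed afterwards is the first one handed out again.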
A slot is about 40 bytes, and a page is about 16 KB (meaning 400+ objects per page). Knowing this, we now have a new question: what if we want to store a gigantic string?
test = "hi" * 1_000_000
# 2 bytes * 1_000_000 = 2_000_000 bytes, so way more than 40!
For large objects like the one in this example (and others, such as objects with many instance variables, big arrays, or big hashes), Ruby allocates the data separately on the system heap. It does not store the full content of the string in the slot; instead, the slot stores a pointer to the memory region where the string data actually lives.
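The difference is visible with ObjectSpace.memsize_of from the objspace standard library, which reports the memory consumed by an object, including data stored outside its slot. The sketch below makes no assumption about exact sizes, since they depend on the Ruby version and platform, and the documentation describes the returned value as a hint only.

```ruby
require 'objspace'

small = "hi"
large = "hi" * 1_000_000

small_size = ObjectSpace.memsize_of(small)
large_size = ObjectSpace.memsize_of(large)

puts "small string: #{small_size} bytes"
puts "large string: #{large_size} bytes"
```

The small string should be reported at roughly the size of a slot, while the large one should be reported at a little over 2,000,000 bytes, far more than any slot can hold.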
Now let’s bridge the gap between allocation and garbage collection. From the first part of this article, we know that there is a mechanism in place to identify data that needs to be freed, but what does this mean at the level of pages and slots?
When Ruby’s GC runs, it doesn’t reclaim entire pages immediately. Instead, it works at the slot level. During the sweep phase, every slot that holds an unreachable object is marked as free and linked back into the free list. This means the slot is now ready for reuse in future allocations — super efficient, and no expensive system calls needed.
But here’s the catch: even if most of the slots in a page are free, Ruby can’t release the page itself to the system unless all the slots are free. That means pages with just a handful of live objects become partially unusable while still being reserved for Ruby and not released to the operating system for other programs to use.
To deal with this memory fragmentation issue, Ruby introduced a compacting GC in version 2.7. Compaction relocates live objects from scattered pages into more densely packed ones. Once a page has no live objects left, it can be completely reclaimed, either for reuse or even to be returned to the system. Compaction does not run on its own by default, so you have to call it manually:
GC.compact
With all this theory in mind, we now have what we need to experiment with the Ruby VM!
Fun with GC
After theory comes practice. Let’s look at some GC-related code and how we can explore what the GC is doing. objspace is required here, as it is the standard library in Ruby MRI for inspecting objects in memory; the GC state itself and the GC-related routines are available through the built-in GC module.
require 'objspace'
GC.start
puts GC.stat
10.times do
  "a" * 100_000
end
GC.start
puts GC.stat
{:count=>11, :time=>4, :marking_time=>3, :sweeping_time=>0, :heap_allocated_pages=>24, :heap_sorted_length=>201, :heap_allocatable_pages=>177, :heap_available_slots=>24453, :heap_live_slots=>18007, :heap_free_slots=>6446, :heap_final_slots=>0, :heap_marked_slots=>17974, :heap_eden_pages=>24, :heap_tomb_pages=>0, :total_allocated_pages=>24, :total_freed_pages=>0, :total_allocated_objects=>62153, :total_freed_objects=>44146, :malloc_increase_bytes=>880, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>7, :major_gc_count=>4, :compact_count=>0, :read_barrier_faults=>0, :total_moved_objects=>0, :remembered_wb_unprotected_objects=>0, :remembered_wb_unprotected_objects_limit=>175, :old_objects=>17508, :old_objects_limit=>35016, :oldmalloc_increase_bytes=>880, :oldmalloc_increase_bytes_limit=>16777216}
{:count=>12, :time=>4, :marking_time=>3, :sweeping_time=>1, :heap_allocated_pages=>24, :heap_sorted_length=>201, :heap_allocatable_pages=>177, :heap_available_slots=>24453, :heap_live_slots=>18022, :heap_free_slots=>6431, :heap_final_slots=>0, :heap_marked_slots=>18020, :heap_eden_pages=>24, :heap_tomb_pages=>0, :total_allocated_pages=>24, :total_freed_pages=>0, :total_allocated_objects=>62252, :total_freed_objects=>44230, :malloc_increase_bytes=>832, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>7, :major_gc_count=>5, :compact_count=>0, :read_barrier_faults=>0, :total_moved_objects=>0, :remembered_wb_unprotected_objects=>0, :remembered_wb_unprotected_objects_limit=>179, :old_objects=>17952, :old_objects_limit=>35904, :oldmalloc_increase_bytes=>832, :oldmalloc_increase_bytes_limit=>16777216}
- One major GC occurred after the string allocations (:major_gc_count +1).
- Only 15 new live objects remained after allocations: Ruby’s GC cleaned up fast.
- Heap size stayed stable (no new pages allocated).
- GC was fast — total time stayed at 4ms, with minimal sweeping.
- No compaction was triggered (:compact_count still 0).
require 'objspace'
objs = []
10_000.times do
  objs << "x" * rand(100..1000)
end
# Remove some random ones to fragment the heap
5000.times { objs.delete_at(rand(objs.size)) }
GC.start
puts "Before compaction:"
puts GC.stat.slice(:heap_eden_pages, :heap_live_slots, :heap_free_slots, :heap_allocated_pages, :compact_count)
GC.compact
puts "\nAfter compaction:"
puts GC.stat.slice(:heap_eden_pages, :heap_live_slots, :heap_free_slots, :heap_allocated_pages, :compact_count)
Before compaction:
{:heap_eden_pages=>145, :heap_live_slots=>48028, :heap_free_slots=>76266, :heap_allocated_pages=>145, :compact_count=>0}
After compaction:
{:heap_eden_pages=>136, :heap_live_slots=>48054, :heap_free_slots=>68875, :heap_allocated_pages=>136, :compact_count=>1}
Value | Before | After | Commentary |
---|---|---|---|
:heap_allocated_pages | 145 | 136 | ✅ 9 pages were freed: compaction successfully reduced memory usage! |
:heap_live_slots | 48028 | 48054 | +26 |
:heap_free_slots | 76266 | 68875 | 🔻 Free slots dropped because Ruby repacked objects more densely, using fewer pages |
:compact_count | 0 | 1 | Confirms that one compaction ran |
Ruby’s garbage collector handles memory management for you, quietly keeping things running smoothly behind the scenes. It’s got all the right tools in its belt: generational collection for speed, tri-color marking for shorter pauses, and compaction to tidy up the mess. Together, these techniques ensure that memory is used efficiently without bogging down your app.
You don’t need to be a GC expert to write clean Ruby code, but knowing how it works can give you an edge, whether you’re optimizing performance, tracking down memory issues, or just curious about what’s going on under the hood. So, next time you hit that GC.start, you will have all the knowledge to understand what’s going on!