- In C/C++, programmer is responsible for both creation and destruction of objects. Usually programmer neglects destruction of useless objects. Due to this negligence, at certain point, for creation of new objects, sufficient memory may not be available and entire program will terminate abnormally causing OutOfMemoryErrors.
- But in Java, the programmer need not to care for all those objects which are no longer in use. Garbage collector destroys these objects.
- Garbage collector is best example of Daemon thread as it is always running in background.
- Main objective of Garbage Collector is to free heap memory by destroying unreachable objects.
Unreachable objects: An object is said to be unreachable iff it doesn’t contain any reference to it. Also note that objects which are part of island of isolation are also unreachable.
Integer i = new Integer(4); // the new Integer object is reachable via the reference in 'i' i = null; // the Integer object is no longer reachable.
Eligibility for garbage collection: An object is said to be eligible for GC(garbage collection) iff it is unreachable. In above image, after
i = null; integer object 4 in heap area is eligible for garbage collection.
Ways to make an object eligible for GC
- Even though programmer is not responsible to destroy useless objects but it is highly recommended to make an object unreachable (thus eligible for GC) if it is no longer required.
- There are generally four different ways to make an object eligible for garbage collection.
- Nullifying the reference variable
- Re-assigning the reference variable
- Object created inside method
- Island of Isolation
All above ways with examples are discussed in separate article : How to make object eligible for garbage collection
Ways for requesting JVM to run Garbage Collector
- Once we made object eligible for garbage collection, it may not destroy immediately by garbage collector. Whenever JVM runs Garbage Collector program, then only object will be destroyed. But we can not expect when JVM runs Garbage Collector.
- We can also request JVM to run Garbage Collector. There are two ways to do it:
System.gc()method : System class contain static method
gc()for requesting JVM to run Garbage Collector.
Runtime.getRuntime().gc()method : Runtime class allows the application to interface with the JVM in which the application is running. Hence by using its
gc()method, we can request JVM to run Garbage Collector.
- There is no guarantee that any one of above two methods will definitely run Garbage Collector.
- The call
System.gc()is effectively equivalent to the call :
Just before destroying an object, Garbage Collector calls
finalize() method on the object to perform cleanup activities. Once
finalize() method completes, Garbage Collector destroys that object.
finalize() method is present in Object class with following signature:
protected void finalize() throws Throwable
Based on the requirement, we can override
finalize() method for perform our cleanup activities like closing connection from database.
finalize()method called by Garbage Collector, not JVM. Although Garbage Collector is one of the modules of JVM.
Object.finalize()method has empty implementation, thus it is recommended to override
finalize()method to dispose of system resources or to perform other cleanup.
finalize()method is never invoked more than once for any given object.
- If an uncaught exception is thrown by the
finalize()method, the exception is ignored and finalization of that object terminates.
How GC Works
Fragmenting and Compacting
Whenever sweeping takes place, the JVM has to make sure the areas filled with unreachable objects can be reused. This can (and eventually will) lead to memory fragmentation which, similarly to disk fragmentation, leads to two problems:
- Write operations become more time-consuming as finding the next free block of sufficient size is no longer a trivial operation.
- When creating new objects, JVM is allocating memory in contiguous blocks. So if fragmentation escalates to a point where no individual free fragment is large enough to accommodate the newly created object, an allocation error occurs.
To avoid such problems, the JVM is making sure the fragmenting does not get out of hand. So instead of just marking and sweeping, a “memory defrag” process also happens during garbage collection. This process relocates all the reachable objects next to each other, eliminating (or reducing) the fragmentation. Here is an illustration of that:
As we have mentioned before, doing a garbage collection entails stopping the application completely. It is also quite obvious that the more objects there are the longer it takes to collect all the garbage. But what if we would have a possibility to work with smaller memory regions? Investigating the possibilities, a group of researchers has observed that most allocations inside applications fall into two categories:
- Most of the objects become unused quickly
- The ones that usually survive for a (very) long time
These observations come together in the Weak Generational Hypothesis. Based on this hypothesis, the memory inside the VM is divided into what is called the Young Generation and the Old Generation. The latter is sometimes also called Tenured.
Having such separate and individually cleanable areas allows for a multitude of different algorithms that have come a long way in improving the performance of the GC.
This is not to say there are no issues with such an approach. For one, objects from different generations may in fact have references to each other that also count as “de facto” GC roots when collecting a generation.
But most importantly, the generational hypothesis may in fact not hold for some applications. Since the GC algorithms are optimized for objects which either “die young” or “are likely to live forever”, the JVM behaves rather poorly with objects with “medium” life expectancy.
The following division of memory pools within the heap should be familiar. What is not so commonly understood is how Garbage Collection performs its duties within the different memory pools. Notice that in different GC algorithms some implementation details might vary but, again, the concepts in this chapter remain effectively the same.
Eden is the region in memory where the objects are typically allocated when they are created. As there are typically multiple threads creating a lot of objects simultaneously, Eden is further divided into one or more Thread Local Allocation Buffer (TLAB for short) residing in the Eden space. These buffers allow the JVM to allocate most objects within one thread directly in the corresponding TLAB, avoiding the expensive synchronization with other threads.
When allocation inside a TLAB is not possible (typically because there’s not enough room there), the allocation moves on to a shared Eden space. If there’s not enough room in there either, a garbage collection process in Young Generation is triggered to free up more space. If the garbage collection also does not result in sufficient free memory inside Eden, then the object is allocated in the Old Generation.
When Eden is being collected, GC walks all the reachable objects from the roots and marks them as alive.
We have previously noted that objects can have cross-generational links so a straightforward approach would have to check all the references from other generations to Eden. Doing so would unfortunately defeat the whole point of having generations in the first place. The JVM has a trick up its sleeve: card-marking. Essentially, the JVM just marks the rough location of ‘dirty’ objects in Eden that may have links to them from the Old Generation. You can read more on that in Nitsan’s blog entry.
After the marking phase is completed, all the live objects in Eden are copied to one of the Survivor spaces. The whole Eden is now considered to be empty and can be reused to allocate more objects. Such an approach is called “Mark and Copy”: the live objects are marked, and then copied (not moved) to a survivor space.
Next to the Eden space reside two Survivor spaces called from and to. It is important to notice that one of the two Survivor spaces is always empty.
The empty Survivor space will start having residents next time the Young generation gets collected. All of the live objects from the whole of the Young generation (that includes both the Eden space and the non-empty ‘from’ Survivor space) are copied to the ‘to’ survivor space. After this process has completed, ‘to’ now contains objects and ‘from’ does not. Their roles are switched at this time.
This process of copying the live objects between the two Survivor spaces is repeated several times until some objects are considered to have matured and are ‘old enough’. Remember that, based on the generational hypothesis, objects which have survived for some time are expected to continue to be used for very long time.
Such ‘tenured’ objects can thus be promoted to the Old Generation. When this happens, objects are not moved from one survivor space to another but instead to the Old space, where they will reside until they become unreachable.
To determine whether the object is ‘old enough’ to be considered ready for propagation to Old space, GC tracks the number of collections a particular object has survived. After each generation of objects finishes with a GC, those still alive have their age incremented. Whenever the age exceeds a certain tenuring threshold the object will be promoted to Old space.
The actual tenuring threshold is dynamically adjusted by the JVM, but specifying
-XX:+MaxTenuringThreshold sets an upper limit on it. Setting
-XX:+MaxTenuringThreshold=0 results in immediate promotion without copying it between Survivor spaces. By default, this threshold on modern JVMs is set to 15 GC cycles. This is also the maximum value in HotSpot.
Promotion may also happen prematurely if the size of the Survivor space is not enough to hold all of the live objects in the Young generation.
The implementation for the Old Generation memory space is much more complex. Old Generation is usually significantly larger and is occupied by objects that are less likely to be garbage.
GC in the Old Generation happens less frequently than in the Young Generation. Also, since most objects are expected to be alive in the Old Generation, there is no Mark and Copy happening. Instead, the objects are moved around to minimize fragmentation. The algorithms cleaning the Old space are generally built on different foundations. In principle, the steps taken go through the following:
- Mark reachable objects by setting the marked bit next to all objects accessible through GC roots
- Delete all unreachable objects
- Compact the content of old space by copying the live objects contiguously to the beginning of the Old space
As you can see from the description, GC in Old Generation has to deal with explicit compacting to avoid excessive fragmentation.
Prior to Java 8 there existed a special space called the ‘Permanent Generation’. This is where the metadata such as classes would go. Also, some additional things like internalized strings were kept in Permgen. It actually used to create a lot of trouble to Java developers, since it is quite hard to predict how much space all of that would require. Result of these failed predictions took the form of java.lang.OutOfMemoryError: Permgen space. Unless the cause of such OutOfMemoryError was an actual memory leak, the way to fix this problem was to simply increase the permgen size similar to the following example setting the maximum allowed permgen size to 256 MB:
java -XX:MaxPermSize=256m com.mycompany.MyApplication
As predicting the need for metadata was a complex and inconvenient exercise, the Permanent Generation was removed in Java 8 in favor of the Metaspace. From this point on, most of the miscellaneous things were moved to regular Java heap.
The class definitions, however, are now loaded into something called Metaspace. It is located in the native memory and does not interfere with the regular heap objects. By default, Metaspace size is only limited by the amount of native memory available to the Java process. This saves developers from a situation when adding just one more class to the application results in the java.lang.OutOfMemoryError: Permgen space. Notice that having such seemingly unlimited space does not ship without costs - letting the Metaspace to grow uncontrollably you can introduce heavy swapping and/or reach native allocation failures instead.
In case you still wish to protect yourself for such occasions you can limit the growth of Metaspace similar to following, limiting Metaspace size to 256 MB:
java -XX:MaxMetaspaceSize=256m com.mycompany.MyApplication
Minor GC vs Major GC vs Full GC
The Garbage Collection events cleaning out different parts inside heap memory are often called Minor, Major and Full GC events. In this section we cover the differences between these events. Along the way we can hopefully see that this distinction is actually not too relevant.
What typically is relevant is whether the application meets its SLAs, and to see that you monitor your application for latency or throughput. And only then are GC events linked to the results. What is important about these events is whether they stopped the application and how long it took.
But as the terms Minor, Major and Full GC are widely used and without a proper definition, let us look into the topic in a bit more detail.
Collecting garbage from the Young space is called Minor GC. This definition is both clear and uniformly understood. But there are still some interesting takeaways you should be aware of when dealing with Minor Garbage Collection events:
- Minor GC is always triggered when the JVM is unable to allocate space for a new object, e.g. Eden is getting full. So the higher the allocation rate, the more frequently Minor GC occurs.
- During a Minor GC event, Tenured Generation is effectively ignored. References from Tenured Generation to Young Generation are considered to be GC roots. References from Young Generation to Tenured Generation are simply ignored during the mark phase.
- Against common belief, Minor GC does trigger stop-the-world pauses, suspending the application threads. For most applications, the length of the pauses is negligible latency-wise if most of the objects in the Eden can be considered garbage and are never copied to Survivor/Old spaces. If the opposite is true and most of the newborn objects are not eligible for collection, Minor GC pauses start taking considerably more time.
So defining Minor GC is easy - Minor GC cleans the Young Generation.
Major GC vs Full GC
It should be noted that there are no formal definitions for those terms - neither in the JVM specification nor in the Garbage Collection research papers. But on the first glance, building these definitions on top of what we know to be true about Minor GC cleaning Young space should be simple:
- Major GC is cleaning the Old space.
- Full GC is cleaning the entire Heap - both Young and Old spaces.
Unfortunately it is a bit more complex and confusing. To start with - many Major GCs are triggered by Minor GCs, so separating the two is impossible in many cases. On the other hand - modern garbage collection algorithms like G1 perform partial garbage cleaning so, again, using the term “cleaning” is only partially correct.
This leads us to the point where instead of worrying about whether the GC is called Major or Full GC, you should focus on finding out whether the GC at hand stopped all the application threads or was able to progress concurrently with the application threads.
This confusion is even built right into the JVM standard tools. What I mean by that is best explained via an example. Let us compare the output of two different tools tracing the GC on a JVM running with Concurrent Mark and Sweep collector (