A number of people whose algorithms I’ve briefly described took the time
to explain them to me. I’m particularly grateful to Christine Flood of Red
Hat and Gil Tene from Azul Systems, both of whom also took the trouble
to read the complete draft and provide feedback.
Ben Evans read an early draft of the book and provided a tremendous
amount of invaluable feedback.
Foundations
The heap and pointers
Key terms
Collector mechanisms
General trade-offs
If any of these points surprise you, or if you aren’t sure what the
differences are between, say, parallelism and concurrency, or what a
garbage-collection safe point is, or what the differences are between
HotSpot CMS, G1, and C4, and when you would choose one instead of
another, then this book can help. My goal in writing it is to provide a
quick, accessible guide for Java developers and architects who want to
understand what garbage collection is, how it works, and how it impacts
the execution of their programs.
to the subject was worthwhile. An InfoQ mini-book seemed like the ideal
format for this. If reading this book makes you want to learn more about
the subject, The Garbage Collection Handbook is a good place to go next,
as are the many talks and other online resources I reference in the text.
In the “Suggestions for further reading” section at the end of the book, I
provide links to the academic papers that describe in more detail many of
the algorithms I talk about here, and some other resources you may find
useful.
Conventions
We use a number of typographical conventions within this book that
distinguish between different kinds of information.
Important additional notes are shown using callouts like this.
Code in the text, including database table names, folder names, file
extensions, pathnames, dummy URLs, user input, and Twitter handles,
is shown as follows:
“The size of the Java heap can typically be controlled by two flags, -Xms
for the initial size and -Xmx for the maximum size.”
List<GarbageCollectorMXBean> gcMxBeans =
    ManagementFactory.getGarbageCollectorMXBeans();
Reader feedback
We always welcome feedback from our readers. Let us know what you
think about this book — what you liked or disliked. Reader feedback helps
us develop titles that you get the most out of.
If you have a topic that you have expertise in and you are interested in
either writing or contributing to a book, please take a look at our
mini-book guidelines on http://www.infoq.com/minibook-guidelines.
Introduction
The Java Language Specification mandates the inclusion of automatic
storage management. “Typically,” the spec states, “using a garbage
collector, to avoid the safety problems of explicit deallocation (as in C’s
free or C++’s delete).”1
1 http://docs.oracle.com/javase/specs/jls/se8/html/jls-1.html
A corollary is the typical size of the heap. In my experience most Java
programs in production today are given heap sizes of 1 GB to 4 GB,
because 4 GB is about the most that they can cope with whilst
having pauses of an acceptable length. Gil Tene did a quick informal
survey of the audience for his talk2 at SpringOne 2011 and saw results in
line with this. More recently, Kirk Pepperdine did a similar exercise3 at
QCon New York, again with similar results.
Whilst 10 years ago, a 512-MB to 1-GB heap size would have been
considered substantial, and a high-end commodity server might have
shipped with 1-2 GB of RAM and a two-core CPU, a modern commodity-
hardware server typically has around 96-256 GB of memory running on
a system with 24 to 48 virtual or physical cores. Over a period of time
during which commodity hardware memory capacities have increased
a hundredfold, commonly used garbage-collector heap sizes have only
doubled. To put this another way, the performance of garbage collectors
has lagged significantly behind both the hardware and software demands
of many larger enterprise applications.
2 http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
3 http://www.infoq.com/presentations/g1-gc-logs
PART
ONE
Foundations
Objects are stored in Java’s heap memory. Created when the Java Virtual
Machine (JVM) starts up, the Java heap is the run-time data area from
which memory for all object instances (including arrays) is allocated.
In addition, heap memory can be shared between threads. All instance
fields, static fields, and array elements are stored in the Java heap. In
contrast, local variables, formal method parameters, and exception-handler
parameters reside outside the Java heap; they are never shared
between threads and are unaffected by the memory model. The heap is
mutated by a mutator, which is just a fancy name for your application.
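To make the split concrete, here is a minimal sketch (the class and method names are hypothetical). The static counter is a heap object that both threads update, while each thread's local variable lives in its own stack frame, so a single run should report 1000 for each local counter and 2000 for the shared one.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HeapVersusStack {
    // A static field lives in the heap: every thread sees this same counter object.
    static final AtomicInteger sharedCounter = new AtomicInteger();

    // Starts two threads; each bumps the shared heap counter and a local counter of its own.
    static int[] run(int iterations) {
        int[] localTotals = new int[2]; // the array object itself is allocated in the heap
        Thread[] threads = new Thread[2];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                int local = 0; // lives in this thread's stack frame and is never shared
                for (int i = 0; i < iterations; i++) {
                    local++;
                    sharedCounter.incrementAndGet();
                }
                localTotals[id] = local; // hand the result back through a heap array element
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // join() also makes the threads' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return localTotals;
    }

    public static void main(String[] args) {
        int[] locals = run(1_000);
        System.out.println("local counters: " + locals[0] + " and " + locals[1]);
        System.out.println("shared counter: " + sharedCounter.get());
    }
}
```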
FOUNDATIONS
The size of the Java heap can typically be controlled by two flags, -Xms
for the initial size and -Xmx for the maximum size. It is worth noting
that most JVMs also use memory outside the Java heap for storing material
other than Java objects, such as the code cache, VM threads, VM and
garbage-collection structures, and so on, so the total size of the process
memory will be larger than the maximum Java heap size you give it.
Generally, you only need to worry about this if you are working in a very
memory-constrained environment.
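The JVM reports its own view of these limits through the standard Runtime and java.lang.management APIs. The sketch below (class name hypothetical) prints the heap ceiling, the currently committed heap, and the JVM's non-heap figure. The non-heap number covers only some of the memory outside the heap (it does not include thread stacks, for instance), so treat it as a lower bound on the extra process footprint rather than the whole story.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapReport {
    private static final long MB = 1024L * 1024L;

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // maxMemory() approximates the -Xmx ceiling; totalMemory() is what is committed right now.
        System.out.printf("max heap:       %d MB%n", runtime.maxMemory() / MB);
        System.out.printf("committed heap: %d MB%n", runtime.totalMemory() / MB);
        System.out.printf("free in heap:   %d MB%n", runtime.freeMemory() / MB);

        // Memory the JVM manages outside the Java heap (e.g. code cache, class metadata).
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.printf("non-heap used:  %d MB%n", nonHeap.getUsed() / MB);
    }
}
```

Running it with, say, java -Xms64m -Xmx512m HeapReport should show a maximum close to 512 MB; the exact figure can be a little lower, depending on how the collector accounts for its spaces.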
An example may help make this clearer. In figure 2, we have two stack
frames, for Foo and Bar. They have immediate values like 42 and 3.1416,
but they also have references to other objects allocated in the heap. As
you can see, some of these objects are referenced by a single stack frame,
some by multiple stack frames, and one of them is not referenced from
anywhere. All the garbage collector is doing is finding the objects that are
pointed to, compacting them into an area of memory so that you have
better cache-line behaviour, and freeing up space for you to allocate new
objects.
THE JAVA GARBAGE COLLECTION MINI-BOOK
Key terms
Before we move on to look at individual collectors, there are a few other
terms you need to be familiar with.
in a positive light and to pull attention away from the opposite qualities
they carry. Specifically, “mostly concurrent” should actually be read to
mean “sometimes stop-the-world”, “mostly incremental” should be read
to mean “sometimes monolithic”, and “mostly parallel” should be read to
mean “sometimes serial”.
All commercial server JVMs use precise collectors and use a form of
moving collector at some point in the garbage collection cycle.
Safe points
Garbage-collection events occur at safe points. A garbage-collection safe
point is a point or range in a thread’s execution when the collector can
identify all the references in that thread’s execution stack.
Bringing a thread to a safe point is the act of getting a thread to reach a
safe point and then not executing past it. This is not necessarily the same
as stopping at a safe point; you can still be using CPU cycles. For example,
if you make a call out to native code via JNI, that thread is at a safe point
while you are running in native code, since native code can only reach Java
objects through JNI handles, not through raw heap pointers.
A global safe point involves bringing all threads to a safe point. These
global safe points represent the STW behaviour commonly needed by
certain garbage-collection operations and algorithms. Their length
depends on two things: the duration of the operation to take place during
the global safe point and the time it takes to reach the safe point itself
(also known as “time to safe point”).
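A toy, one-shot model of a global safe point, with hypothetical names throughout: mutator threads call poll() between units of work, and the "VM thread" measures how long everyone takes to arrive (time to safe point) separately from the operation itself. Real JVMs compile safe point polls into generated code; this only mirrors the structure described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// A hypothetical one-shot global safe point, for illustration only.
class ToySafepoint {
    private volatile boolean requested;
    private final CountDownLatch arrived;
    private final CountDownLatch resume = new CountDownLatch(1);

    ToySafepoint(int mutators) {
        arrived = new CountDownLatch(mutators);
    }

    // Safe point poll, called by mutators between units of work.
    void poll() {
        if (!requested) {
            return;
        }
        arrived.countDown();           // this thread has reached the safe point...
        try {
            resume.await();            // ...and does not execute past it until released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called by the "VM thread"; returns {timeToSafepointNanos, operationNanos}.
    long[] runAtSafepoint(Runnable operation) {
        long start = System.nanoTime();
        requested = true;
        try {
            arrived.await();           // time to safe point: wait for every mutator
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long reached = System.nanoTime();
        operation.run();               // the stop-the-world operation itself
        long done = System.nanoTime();
        requested = false;
        resume.countDown();            // release all mutators
        return new long[] {reached - start, done - reached};
    }

    // Returns true if no mutator made progress while the operation ran.
    static boolean demo(int mutators) {
        ToySafepoint safepoint = new ToySafepoint(mutators);
        AtomicLong progress = new AtomicLong();
        AtomicBoolean stop = new AtomicBoolean();
        Thread[] threads = new Thread[mutators];
        for (int i = 0; i < mutators; i++) {
            threads[i] = new Thread(() -> {
                while (!stop.get()) {
                    progress.incrementAndGet(); // a unit of application work
                    safepoint.poll();
                }
            });
            threads[i].start();
        }
        boolean[] frozen = new boolean[1];
        long[] timings = safepoint.runAtSafepoint(() -> {
            long before = progress.get();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            frozen[0] = progress.get() == before;
        });
        stop.set(true);
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        System.out.printf("time to safe point: %d us, operation: %d us%n",
                timings[0] / 1_000, timings[1] / 1_000);
        return frozen[0];
    }
}
```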
It’s worth saying that the terms “safe point” and “garbage-collection safe
point” are commonly used interchangeably, but there are other reasons
for taking a global safe point in a JVM — for instance, de-optimisation
safe points. The JVM can de-optimise code because an assumption it
made was wrong; a common case where this occurs is in class hierarchy
analysis. Suppose the compiler has recognised that a certain method only
has one implementer, so it optimises it, perhaps inlining it and making
it a static call instead of a virtual call. Then you load a new class that
overrides that method, making the underlying assumption wrong; if
we keep running, we’ll end up calling the wrong function. When this
happens, the JVM needs to take the compiled frame, throw it away, and
reconstruct an equivalent interpreted stack frame. A safe point for doing
this is clearly broader than the garbage-collection safe point since you
need to know where every variable is, including the non-pointer ones,
and where all the state information is.
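The class hierarchy analysis scenario can be reproduced in a few lines of Java. The types below are hypothetical: while Square is the only loaded Shape, a JIT compiler may devirtualise and inline area() in totalArea, and loading Circle invalidates that assumption. Whether HotSpot actually inlines here (via class hierarchy analysis or via its type profile), and exactly when Circle gets loaded, is up to the JVM; running with -XX:+PrintCompilation and looking for "made not entrant" lines is one way to watch deoptimisation happen.

```java
// Hypothetical types illustrating the class hierarchy analysis scenario.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class DeoptDemo {
    // A hot call site: while Square is the only Shape in use, the JIT may inline area() here.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    // Kept in its own method so that Circle is only needed when this actually runs.
    static Shape firstCircle() {
        return new Circle(1.0);
    }

    public static void main(String[] args) {
        Shape[] squares = {new Square(1), new Square(2), new Square(3)};
        double warm = 0;
        for (int i = 0; i < 100_000; i++) {
            warm += totalArea(squares); // warm up so the JIT compiles totalArea
        }
        // A second implementer breaks the "single implementer" assumption, so compiled
        // code that relied on it has to be deoptimised before it runs again.
        Shape[] mixed = {new Square(2), firstCircle()};
        System.out.println(warm + " " + totalArea(mixed));
    }
}
```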
Generational collection
Almost all commercial Java collectors take advantage of generational
collection in some way to achieve significantly more efficient collection.
JVMs that do this segregate the heap between short-lived objects and long-
lived objects. These two separate “generations” are typically physically
distinct areas or sets of regions of the heap. The young (or “new”)
generation is collected in preference to old (or “tenured”) generation,
and objects that survive long enough are promoted (or “tenured”) from
the young generation to the old. Collection of the young generation is
sometimes referred to as a “minor garbage-collection event”.
matter (as opposed to rising with the size of the young heap). As long
as the generational hypothesis actually holds for the young generation,
the sparseness of the younger heap portions will provide such algorithms
with significantly improved efficiency in reclaiming empty space. This
generational filter typically allows the collector to maximise recovered
space (sometimes called “yield”) whilst minimising effort.
Managed run times similar to Java have been leveraging the weak
generational hypothesis since the mid-1980s and it appears to hold true
for pretty much every program we run on JVMs today. However, it is
important to understand that this powerful observation does not actually
allow us to avoid collecting the old generation altogether; instead, it
simply allows us to reduce the frequency of old-generation collections.
Collector mechanisms
Precise garbage collectors, including all collectors in current commercial
JVMs, use tracing mechanisms for collection. Such precise collectors
can (and will) freely move objects in the heap, and will identify and
recycle any dead matter in the heap regardless of the topology of the
object graph. Cyclic object graphs are trivially and safely collected in
such collectors, allowing them to reliably collect dead matter from heaps
with an efficiency that greatly exceeds that of inaccurate techniques like
reference counting.
From a complexity point of view, the work performed during the mark
phase increases linearly with the size of the live set rather than the size of
the heap. Therefore, if you have a huge heap and a tiny live set, the tracing
part of the collector has very little work to do.
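The sketch below models marking over a toy object graph (the classes are hypothetical, not a JVM heap layout). It counts how many objects the marker traces: with three live objects in a cycle, the count stays at three whether the heap holds ten objects or a hundred thousand, and the mark bit also stops the cycle from being traced forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A toy object graph, not a real heap layout: each Obj lists the objects it references.
class MarkDemo {
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
    }

    // Marks everything reachable from the roots; returns how many objects were traced.
    static int mark(List<Obj> roots) {
        int traced = 0;
        Deque<Obj> workList = new ArrayDeque<>();
        for (Obj root : roots) {
            if (!root.marked) {
                root.marked = true;
                workList.push(root);
            }
        }
        while (!workList.isEmpty()) {
            Obj current = workList.pop();
            traced++;
            for (Obj target : current.refs) {
                if (!target.marked) {          // the mark bit also stops cycles looping forever
                    target.marked = true;
                    workList.push(target);
                }
            }
        }
        return traced;
    }

    // Builds a heap with three live objects (in a cycle) plus garbage, then marks it.
    static int markHeapOfSize(int heapSize) {
        List<Obj> heap = new ArrayList<>();
        for (int i = 0; i < heapSize; i++) {
            heap.add(new Obj());
        }
        Obj a = heap.get(0), b = heap.get(1), c = heap.get(2);
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a);                         // a cycle: no problem for a tracing collector
        for (int i = 3; i + 1 < heapSize; i++) {
            heap.get(i).refs.add(heap.get(i + 1)); // garbage that references other garbage
        }
        return mark(List.of(a));
    }

    public static void main(String[] args) {
        System.out.println(markHeapOfSize(10) + " " + markHeapOfSize(100_000));
    }
}
```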
Sweep
Collector algorithms that include a sweep pass (e.g. mark/sweep, mark/
sweep/compact) scan through the entire heap, identify all the dead objects
(those not marked live), and recycle their space in some way — e.g. by
tracking their locations in free lists of some sort or by preparing the dead
areas for later compaction. Sweeping work correlates with heap size since
you have to look at the entire heap to find all the dead stuff; even if you
have a large heap with very little that is alive, the sweep phase still has to
cover the entire heap.
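A matching sketch of the sweep, under the same kind of toy assumptions (hypothetical cells listed in address order, each already marked or not): it has to examine every cell to build its free list, coalescing adjacent dead cells as it goes.

```java
import java.util.ArrayList;
import java.util.List;

// A toy heap: cells in address order, each marked live or left unmarked by the mark phase.
class SweepDemo {
    static final class Cell {
        final int start;
        final int size;
        boolean marked;

        Cell(int start, int size, boolean marked) {
            this.start = start;
            this.size = size;
            this.marked = marked;
        }
    }

    static int cellsExamined;

    // Visits every cell, live or dead, and returns a free list of {start, size} chunks,
    // merging dead neighbours. The work tracks heap size, not live-set size.
    static List<int[]> sweep(List<Cell> heap) {
        List<int[]> freeList = new ArrayList<>();
        cellsExamined = 0;
        for (Cell cell : heap) {
            cellsExamined++;
            if (cell.marked) {
                cell.marked = false;           // reset the mark bit for the next cycle
                continue;
            }
            int[] last = freeList.isEmpty() ? null : freeList.get(freeList.size() - 1);
            if (last != null && last[0] + last[1] == cell.start) {
                last[1] += cell.size;          // coalesce with the previous free chunk
            } else {
                freeList.add(new int[] {cell.start, cell.size});
            }
        }
        return freeList;
    }
}
```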
Compact/relocate
Compaction is a necessary evil in virtually all JVMs: without compaction,
memory reclaimed from dead objects of variable sizes will tend to
fragment over time. With such fragmentation, the heap will eventually
reach a point where a large amount of available memory exists but is
spread around in small chunks, meaning that there is no slot large enough
to accommodate an object you want to create. Unless you have fixed-
sized objects with fixed-population counts, this will eventually happen
to any Java heap.
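A small worked example of the problem, using a hypothetical "checkerboard" heap: half of a 1 KB heap is free, but only in 16-byte pieces separated by live objects, so a 32-byte allocation cannot be satisfied.

```java
import java.util.ArrayList;
import java.util.List;

// A toy free list of chunk sizes, in bytes; no real memory is involved.
class FragmentationDemo {
    static int totalFree(List<Integer> freeChunks) {
        int total = 0;
        for (int size : freeChunks) {
            total += size;
        }
        return total;
    }

    // An allocation succeeds only if a single chunk is big enough (objects must be contiguous).
    static boolean canAllocate(List<Integer> freeChunks, int request) {
        for (int size : freeChunks) {
            if (size >= request) {
                return true;
            }
        }
        return false;
    }

    // A 1 KB heap filled with 16-byte objects in which every other object has died.
    static List<Integer> checkerboardHeap() {
        List<Integer> freeChunks = new ArrayList<>();
        for (int address = 0; address < 1024; address += 32) {
            freeChunks.add(16); // the dead half of each 32-byte pair; the live half separates them
        }
        return freeChunks;
    }

    public static void main(String[] args) {
        List<Integer> free = checkerboardHeap();
        System.out.println("free bytes: " + totalFree(free));
        System.out.println("fits 32 bytes? " + canAllocate(free, 32));
    }
}
```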
Copy
A copying collector uses a different technique. In its simplest form, a
copying collector splits a heap it is managing into two equally sized spaces,
which will alternately be referred to as “from” and “to”. The current “to”
space is always kept completely empty except during collection, while all
actual objects are in the “from” space. All allocations go into the designated
“from” space until that space is full, which triggers a collection cycle. The
copying collector performs a trace through all reachable objects, moving
all encountered objects from the “from” space to the “to” space as it goes,
and correcting all encountered references in all objects as it does so. The
copy is completed in a single pass: at the start of the copy, all objects were
in “from” space and all references pointed to “from” space, and at the end
of the copy all live objects are in “to” space, and all live references point
to “to” space. At that point, the collector reverses the roles of the “from”
and “to” spaces, the collection is done, and allocation into the newly
designated “from” space can proceed. The amount of work a copying
collector performs generally rises linearly with the size of the live set.
Copying collectors are typically monolithic: all objects must move from
the “from” to the “to” space for the collection to complete, and there is no
way back nor any means of dealing with half-completed collections that
cannot proceed due to lack of space in the “to” part of the heap. This is
why, in a single-generation copying collector, the size of the “from” and
“to” space must be equal, because at the beginning of the collection there
is no way to tell how much of the “from” space is actually alive.
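The copying scheme can be sketched with Cheney's breadth-first algorithm, one standard way of implementing it (the text does not say which variant any particular JVM uses). This toy version copies Java objects rather than raw memory. To-space doubles as the work queue, and each from-space object records a forwarding pointer so that shared and cyclic references are copied only once.

```java
import java.util.ArrayList;
import java.util.List;

// A toy semispace collector over Java objects; a real one copies raw bytes between address ranges.
class CopyingDemo {
    static final class Obj {
        final String name;
        final Obj[] fields;
        Obj forwardee; // set on the from-space original once it has been copied

        Obj(String name, Obj[] fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    // Copies everything reachable from the roots into to-space in one pass (Cheney's algorithm).
    static List<Obj> collect(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) {
            roots[i] = forward(roots[i], toSpace);
        }
        // To-space doubles as the work queue: scan the copies in order, fixing their fields.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int f = 0; f < copy.fields.length; f++) {
                copy.fields[f] = forward(copy.fields[f], toSpace);
            }
        }
        return toSpace; // only live objects were copied; from-space can now be reused wholesale
    }

    // Returns the to-space copy of an object, copying it on first encounter.
    static Obj forward(Obj from, List<Obj> toSpace) {
        if (from == null) {
            return null;
        }
        if (from.forwardee == null) {
            from.forwardee = new Obj(from.name, from.fields.clone()); // fields still point to from-space
            toSpace.add(from.forwardee);
        }
        return from.forwardee;
    }
}
```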
General trade-offs
Each of the different collector mechanisms we’ve talked about has its own
strengths and weaknesses. A copying collector works in a single pass;
Deciding the rate at which you promote objects from the young
generation to the old generation is where a lot of time generally gets
spent when tuning a HotSpot collector. Typically, you’ll want to keep
surviving objects in the young generation for at least one cycle before
promotion, since immediate promotion can dramatically reduce the
efficiency of the generational filter. Conversely, waiting too long can
dramatically increase the copying work.
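The trade-off can be explored with a small simulation. Everything here is an assumption chosen for illustration: each cycle allocates 90 objects that die before the next collection, 8 that survive exactly two minor collections, and 2 that live forever; an object is promoted once it has survived more than tenuringThreshold collections.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A toy young-generation model; the allocation mix and thresholds are illustrative only.
class TenuringDemo {
    static final int FOREVER = Integer.MAX_VALUE;

    static final class Obj {
        final int lifetimeGcs; // number of minor GCs this object survives
        int gcsSurvived;
        Obj(int lifetimeGcs) { this.lifetimeGcs = lifetimeGcs; }
    }

    // Returns {survivorCopies, promotions, prematurePromotions} after the given number of cycles.
    static long[] simulate(int tenuringThreshold, int cycles) {
        List<Obj> young = new ArrayList<>();
        long copies = 0, promotions = 0, premature = 0;
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (int i = 0; i < 90; i++) young.add(new Obj(0));       // dead by the next GC
            for (int i = 0; i < 8; i++) young.add(new Obj(2));        // survives two GCs
            for (int i = 0; i < 2; i++) young.add(new Obj(FOREVER));  // effectively immortal

            // Minor GC: reclaim the dead, copy young survivors, promote old-enough ones.
            for (Iterator<Obj> it = young.iterator(); it.hasNext(); ) {
                Obj o = it.next();
                if (o.gcsSurvived >= o.lifetimeGcs) {
                    it.remove();                                      // garbage
                    continue;
                }
                o.gcsSurvived++;
                if (o.gcsSurvived > tenuringThreshold) {
                    it.remove();
                    promotions++;
                    if (o.lifetimeGcs != FOREVER) {
                        premature++;                                  // will die in the old generation
                    }
                } else {
                    copies++;                                         // copied into a survivor space
                }
            }
        }
        return new long[] {copies, promotions, premature};
    }

    public static void main(String[] args) {
        for (int threshold : new int[] {0, 2, 15}) {
            long[] r = simulate(threshold, 10);
            System.out.printf("threshold %2d: %d copies, %d promoted, %d promoted prematurely%n",
                    threshold, r[0], r[1], r[2]);
        }
    }
}
```

With a threshold of 0, every medium-lived object is promoted and then dies in the old generation; a threshold of 2 keeps them young long enough to die there, while a threshold of 15 keeps copying the long-lived objects between survivor spaces.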
In the next part of the book, we will introduce the generational HotSpot
collectors in more detail.
PART
TWO
Two-region collectors
The .NET CLR comes with two collectors: a client one and a server one.
OpenJDK and the Oracle JDK, on the other hand, each come with four,
and there are a number of other collectors available from different JVM
providers.
Since J2SE 5.0, default values for the garbage collector, heap size,
and HotSpot virtual machine (client or server) are automatically chosen
based on the platform and operating system on which the application
is running. The JVM will often do a pretty decent job of selecting a
garbage collector for you and it may be that you never have to consider
the choices that it makes. You can, however, select the algorithm that the
JVM is using for your program. Knowing what each collector does and
how it works may help you in choosing the most appropriate one for
your needs, though benchmarking is also important.
If you don’t know which collector you are running then the following
program will show you:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GCInformation {
    public static void main(String[] args) {
        // One MXBean per collector the JVM is currently using
        List<GarbageCollectorMXBean> gcMxBeans =
            ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gcMxBean : gcMxBeans) {
            System.out.println(gcMxBean.getName());
        }
    }
}
This prints the young-generation collector first; with the Parallel collector, for example, it shows:
PS Scavenge
PS MarkSweep
TWO-REGION COLLECTORS
G1 will output:
G1 Young Generation
G1 Old Generation
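If you also want to see how much work each collector has done, the same MXBeans expose a collection count and an accumulated collection time. A sketch (class name hypothetical) using only methods defined on GarbageCollectorMXBean and its MemoryManagerMXBean parent:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GCStatistics {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) since JVM start; either may be -1 if unsupported.
            System.out.printf("%s: %d collections, %d ms, pools: %s%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime(),
                    String.join(", ", gc.getMemoryPoolNames()));
        }
    }
}
```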
Heap structure
The collectors we’ll look at in this part of the book divide the heap into
two regions — young/new and tenured — to exploit the weak generational
hypothesis.
PermGen
Prior to Java 8, HotSpot also had a permanent generation (“PermGen”)
contiguous with the Java heap in which the runtime stored objects that it
believed were effectively immortal, along with per-class metadata such as
hierarchy information, method data, stack and variable sizes, the runtime
constant pool, resolved symbolic references, and vtables.
The move to Metaspace was necessary since the PermGen was really
hard to tune. One problem was that its size was limited based on
-XX:MaxPermSize. This had to be set on the command line before
starting the JVM or it would default to 64 MB (85 MB for 64-bit scaled
pointers). Sizing depended on a lot of factors, such as the total number
of classes, the size of the constant pools, size of methods, etc., and was
notoriously difficult. By contrast, the class metadata is now allocated out
of native memory, which means that, unless it is capped with
-XX:MaxMetaspaceSize, the maximum available space is the total
available system memory.
Object allocation
HotSpot uses the bump-the-pointer technique to get faster memory
allocations, combined with thread-local allocation buffers (TLABs) in
multithreaded environments.
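A minimal model of the two techniques (hypothetical names; addresses are plain numbers and nothing is really allocated). Threads claim whole TLABs from a shared eden with an atomic compare-and-set, then allocate inside their own TLAB with nothing more than a bounds check and a pointer bump. Oversized requests take a simple slow path straight into eden; HotSpot's real slow paths are more involved.

```java
import java.util.concurrent.atomic.AtomicLong;

// A toy model: "addresses" are plain numbers and nothing is really allocated.
class BumpPointerDemo {
    // Shared eden: blocks are carved off with one compare-and-set on the shared top pointer.
    static final class Eden {
        private final AtomicLong top;
        private final long end;

        Eden(long start, long size) {
            top = new AtomicLong(start);
            end = start + size;
        }

        // Returns the start of a block of 'bytes', or -1 if eden is full (a minor GC would run).
        long claim(long bytes) {
            while (true) {
                long current = top.get();
                if (current + bytes > end) {
                    return -1;
                }
                if (top.compareAndSet(current, current + bytes)) {
                    return current;
                }
            }
        }
    }

    // Thread-local allocation buffer: the fast path is a bounds check plus a pointer bump.
    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long top;
        private long end;

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        long allocate(long bytes) {
            if (bytes > tlabSize) {
                return eden.claim(bytes);       // too big for a TLAB: slow path into eden
            }
            if (top + bytes > end) {            // TLAB exhausted: retire it (its tail is wasted)
                long start = eden.claim(tlabSize);
                if (start < 0) {
                    return -1;
                }
                top = start;
                end = start + tlabSize;
            }
            long address = top;
            top += bytes;                       // the "bump"
            return address;
        }
    }
}
```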
TLAB size then objects that fit in the TLAB will not be created in the old
generation.
Serial collector
The Serial collector (-XX:+UseSerialGC) is the simplest collector and is
a good option for single-processor systems.
The sweep phase identifies garbage, and the collector then performs
sliding compaction, moving the objects towards the beginning of the
old/permanent-generation space, and leaving any free space in a single
contiguous free chunk at the opposite end. The compaction allows any
future allocations to use the fast bump-the-pointer technique.
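Sliding compaction can be sketched as the classic three-pass Lisp-2 algorithm; the text does not say which variant HotSpot uses, so treat this as illustrative. Objects keep their address order, live ones slide down towards address 0, references are rewritten to the new addresses, and everything above the returned top is one contiguous free chunk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy heap: objects listed in address order, with references stored as addresses.
class SlidingCompactionDemo {
    static final class Obj {
        int address;
        final int size;
        final boolean live;   // result of the preceding mark phase
        final int[] refs;     // addresses of the objects this one points to

        Obj(int address, int size, boolean live, int... refs) {
            this.address = address;
            this.size = size;
            this.live = live;
            this.refs = refs;
        }
    }

    // Lisp-2-style sliding compaction; returns the new top of the used space.
    static int compact(List<Obj> heap) {
        // Pass 1: give each live object the next free address. A side table is used here;
        // real collectors typically keep the forwarding address in the object header.
        Map<Integer, Integer> forwarding = new HashMap<>();
        int free = 0;
        for (Obj o : heap) {
            if (o.live) {
                forwarding.put(o.address, free);
                free += o.size;
            }
        }
        // Pass 2: rewrite every reference held by a live object.
        for (Obj o : heap) {
            if (o.live) {
                for (int i = 0; i < o.refs.length; i++) {
                    o.refs[i] = forwarding.get(o.refs[i]);
                }
            }
        }
        // Pass 3: move the live objects and drop the dead ones; address order is preserved.
        heap.removeIf(o -> !o.live);
        for (Obj o : heap) {
            o.address = forwarding.get(o.address);
        }
        return free; // from here to the end of the space is a single free chunk
    }
}
```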
You might imagine that the Serial collector isn’t that useful any more, but
this isn’t the case. The Serial collector is the collector of choice for most
applications that are run on client-style machines that do not require low
pause times. As a rough guide, at the time of writing, the Serial collector
can manage heaps of around 64 MB with worst-case pauses of less than
half a second for full collections. Perhaps more surprisingly, it is finding
some server-side use in cloud environments. Speaking at JavaOne in
2014, Christine Flood, a principal software engineer at Red Hat, said:
“The OpenShift guys at Red Hat, who want to use a minimal amount of resources
and a constrained footprint, are pretty happy with Serial GC right now. They
can give their program a small, fixed amount of memory, and they can give it
a goal, saying, ‘I want you to come back here when you can.’ So if your program
ramps up your data, and then goes down again, Serial GC is really good at
giving that memory back so that another JVM can use it.”2
Parallel collector
The Parallel collector is the default server-side collector. It uses a
monolithic, stop-the-world copying collector for the new generation,
and a monolithic, stop-the-world mark/sweep for the old generation. It
has, though, no impact on a running application until a collection occurs.
It comes in two forms: Parallel and Parallel Old. The Parallel collector
(-XX:+UseParallelGC) uses multiple threads to run a parallel version of
the young-generation-collection algorithm used by the Serial collector.
It is still a stop-the-world copying collector, but performing the young-
generation collection in parallel, using many threads, decreases garbage-
2 https://www.parleys.com/play/543f8a2ce4b08dc7823e5418/about
The Parallel collector only processes weak and soft references during a
stop-the-world pause, and is therefore sensitive to the number of weak
or soft references a given application uses.
It has a mostly concurrent multipass marker that marks the heap while
the mutator is running. Since CMS runs the mark phase concurrently,
the object graph is changing whilst marking is happening. This results
in an obvious race called, with no prizes for originality, the “concurrent
marking race”. To understand the problem, imagine that the mutator
takes a reference that the collector hasn’t seen yet and copies that
reference into a place that the collector has already visited. As far as
the collector is concerned, it already has this covered, so it never sees
the reference, doesn’t mark the object as alive, and thus the object gets
collected during the sweep phase, corrupting the heap. This race must
somehow be intercepted and closed.
There are two ways to deal with this problem: incremental update and
“snapshot at the beginning” (SATB). CMS uses incremental update, which
tracks the references the mutator changes during marking and revisits
them, as described below. SATB, which G1 uses, instead takes a logical
snapshot of the set of live objects in the heap at the beginning of the
marking cycle, using a pre-write barrier to record and mark the objects
that are a part of the logical snapshot.
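The race and the SATB fix can be replayed step by step in a single thread (hypothetical names throughout). The collector has already visited X; the mutator then copies Y's only reference to B into X and clears Y. Without the pre-write barrier B is never marked; with it, the overwritten reference is logged and marked when the queue is drained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// A sequential replay of the concurrent marking race; names and structure are illustrative.
class SatbDemo {
    static final class Obj {
        final String name;
        Obj field;
        Obj(String name) { this.name = name; }
    }

    static boolean marking;
    static final Deque<Obj> satbQueue = new ArrayDeque<>();

    // SATB pre-write barrier: while marking, log the value that is about to be overwritten.
    static void writeField(Obj holder, Obj newValue) {
        if (marking && holder.field != null) {
            satbQueue.add(holder.field);
        }
        holder.field = newValue;
    }

    // Returns whether object B survives marking, with or without the barrier.
    static boolean bIsMarked(boolean useBarrier) {
        Obj x = new Obj("X"), y = new Obj("Y"), b = new Obj("B");
        y.field = b;                        // B is reachable only through Y
        Set<Obj> marked = new HashSet<>();
        marking = true;
        satbQueue.clear();

        marked.add(x);                      // the collector has already visited X (field empty)

        // Mutator: copy the reference to B into X, then erase it from Y.
        if (useBarrier) {
            writeField(x, y.field);
            writeField(y, null);
        } else {
            x.field = y.field;
            y.field = null;
        }

        marked.add(y);                      // the collector visits Y, which no longer holds B
        if (y.field != null) {
            marked.add(y.field);
        }
        while (!satbQueue.isEmpty()) {
            marked.add(satbQueue.poll());   // logged objects count as live (a real marker traces them)
        }
        marking = false;
        return marked.contains(b);
    }
}
```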
HotSpot already has a generational collector and also has a blind write
barrier to track every store of every reference. This means that all
mutations are already tracked, and the card table reflects mutations. So
if, during marking, we clean the card table in some way, then whatever
accumulated in the card table whilst we were marking is stuff that was
changed, and the collector can revisit these and repeat the marking. Of
course, whilst it’s doing this, the object graph is still shifting, so it has to
repeat the process.
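The card-table approach can be sketched the same way, replaying the same race. The sizes are illustrative, and a real card table stores one byte per card rather than a boolean. Every reference store dirties the card holding the updated object; the re-mark loop cleans dirty cards and rescans marked objects on them, repeating until no dirty cards remain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A sequential sketch of incremental update via card marking; sizes are illustrative.
class CardMarkingDemo {
    static final int CARD_SHIFT = 9;                       // 512-byte cards
    static final boolean[] dirtyCards = new boolean[64];   // covers a 32 KB toy heap

    static final class Obj {
        final long address;
        Obj field;
        Obj(long address) { this.address = address; }
    }

    static int cardOf(Obj o) {
        return (int) (o.address >>> CARD_SHIFT);
    }

    // Blind write barrier: every reference store dirties the card of the updated object.
    static void writeField(Obj holder, Obj value) {
        holder.field = value;
        dirtyCards[cardOf(holder)] = true;
    }

    // Clean dirty cards and rescan marked objects on them; repeat until nothing is dirty.
    // CMS runs rounds like this concurrently, then a short stop-the-world final re-mark.
    static void remark(List<Obj> heap, Set<Obj> marked) {
        boolean foundDirty = true;
        while (foundDirty) {
            foundDirty = false;
            for (int card = 0; card < dirtyCards.length; card++) {
                if (!dirtyCards[card]) {
                    continue;
                }
                dirtyCards[card] = false;
                foundDirty = true;
                for (Obj o : heap) {
                    if (cardOf(o) == card && marked.contains(o) && o.field != null) {
                        marked.add(o.field);       // a real marker would also trace onward
                    }
                }
            }
        }
    }

    // Replays the marking race; returns whether B ends up marked.
    static boolean bIsMarked() {
        Arrays.fill(dirtyCards, false);
        Obj x = new Obj(0x0000), y = new Obj(0x1000), b = new Obj(0x2000);
        y.field = b;
        Set<Obj> marked = new HashSet<>();
        marked.add(x);                  // X already visited
        writeField(x, y.field);         // mutator copies the reference into X...
        writeField(y, null);            // ...and removes it from Y
        marked.add(y);                  // collector visits Y: nothing to find there now
        remark(List.of(x, y, b), marked);
        return marked.contains(b);
    }
}
```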
Eventually the collector will decide that the amount of work left to do is
small enough that it can perform a brief stop-the-world pause to catch
up and be done. As with the Parallel collector, CMS processes all weak
and soft references in the stop-the-world pause, so programs that make
extensive use of weak and soft references can expect to see a longer stop-
the-world pause when using CMS.
CMS also has a concurrent sweep phase. Sweeping is actually fairly easy
to do concurrently since, in marked contrast to any zombie film you
may have watched, dead stuff doesn’t come back to life. Not on the JVM,
anyway.
When CMS does a promotion to tenured space, it again makes use of the
free list, putting the objects into a place on the free list and recycling the
memory. This works for a while. However, if you see a “promotion
failed” message in the CMS GC log, or a pause of more than a second
or two, then a promotion has failed. CMS will fall back to a full
stop-the-world, monolithic collection when it fails to promote objects because
either the tenured space is too fragmented or it fails a concurrent load
operation with a marker.
That CMS is mostly concurrent with the application has some other
implications you should be aware of. First, CPU time is taken by the
collector, thus reducing the CPU available to the application. The
amount of time required by CMS grows linearly with the amount of
object promotion to the tenured space. In addition, for some phases of
the concurrent GC cycle, all application threads have to be brought to
a safe point for marking GC roots and performing a parallel re-mark to
check for mutation.
To sum up then, CMS makes a full GC event less frequent at the expense
of reduced throughput, more expensive minor collections, and a greater
footprint. The reduction in throughput can be anything from 10% to 40%
compared to the Parallel collector, depending on promotion rate. CMS
also requires a 20% greater footprint to accommodate additional data
structures and floating garbage that can be missed during the concurrent
marking and so gets carried over to the next cycle.
In the next part of the book, we’ll take a look at Garbage First (or G1),
which was intended to be a replacement for CMS in most cases, IBM’s
PART
THREE
Multi-region collectors
Heap structure
The collectors in this part of the book use a hybrid heap structure. Here
the heap is based on logical as opposed to physical generations, specifically
a collection of non-contiguous regions of the young generation and a
remainder in the old generation. A distinct advantage of this approach
is that neither the young nor the old generation has to be contiguous,
allowing for more dynamic sizing of the generations. If the collector has
to allocate a humongous object (an object that is larger than one of the
regions), it can grab two or more adjacent regions and allocate the object
across them.
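Finding room for a humongous object is mostly arithmetic plus a search for adjacent free regions. The sketch below uses illustrative sizes and a simple first-fit search; G1 itself treats an object as humongous once it reaches half a region, a detail this toy version leaves out.

```java
// A toy region map; region sizes are illustrative and the search is simple first fit.
class HumongousDemo {
    // Number of adjacent regions needed to hold an object of the given size.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    // Index of the first run of 'count' adjacent free regions, or -1 if there is none.
    static int findContiguous(boolean[] regionFree, int count) {
        int run = 0;
        for (int i = 0; i < regionFree.length; i++) {
            run = regionFree[i] ? run + 1 : 0;
            if (run == count) {
                return i - count + 1;
            }
        }
        return -1;
    }
}
```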
The figure below, based on the Garbage First collector from Oracle,
illustrates how this works.
MULTI-REGION COLLECTORS
Garbage First
The first version of G1 (-XX:+UseG1GC) was non-generational, but the
current version is a generational collector that groups objects of a similar
age. G1 is still comparatively new and thus in only limited deployment
so far in the real world. Like CMS, G1 attempts to address the old-
generation pauses. It also uses a monolithic, stop-the-world collector
for the young generation. It has a mostly concurrent marker that is
very similar to the one used in CMS and it has a concurrent sweep. It
uses many other techniques we’ve already looked at — for example, the
concepts of allocation, copying to survivor space, and promotion to old
generation are similar to previous HotSpot GC implementations. Eden
and survivor regions still make up the young generation. It uses SATB to
close the concurrent marking race.
The remembered sets can get pretty large. Each region has an associated
remembered set, which indicates all locations that might contain pointers
to (live) objects within the region. Maintaining these remembered sets
requires the mutator threads to inform the collector when they make
pointer modifications that might create inter-region pointers.
This notification uses a card table (essentially a byte array indexed by
address) in which every 512-byte card in the heap maps to a 1-byte entry
in the table. Each thread has an associated remembered-set log, a current
buffer or sequence of modified cards. In addition, there is a global set of
filled RS buffers.
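A toy model of the barrier and remembered-set update, with an illustrative 1 MB region size. Only stores that create an inter-region pointer are logged to the thread's buffer; a later refinement step files each card under the region it points into. As a simplification, the log records the target region directly, whereas G1 logs cards and works out their targets by rescanning them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A toy model of remembered-set maintenance; the region size is an assumption.
class RememberedSetDemo {
    static final int CARD_SHIFT = 9;     // 512-byte cards
    static final int REGION_SHIFT = 20;  // 1 MB regions (illustrative)

    // Per-thread log of {card, target region} pairs awaiting refinement.
    final List<long[]> threadLog = new ArrayList<>();
    // For each region: cards elsewhere in the heap that may hold pointers into it.
    final Map<Long, Set<Long>> rememberedSets = new HashMap<>();

    // Post-write barrier: only stores that create an inter-region pointer are logged.
    void onReferenceStore(long fieldAddress, long targetAddress) {
        long fromRegion = fieldAddress >>> REGION_SHIFT;
        long toRegion = targetAddress >>> REGION_SHIFT;
        if (fromRegion != toRegion) {
            threadLog.add(new long[] {fieldAddress >>> CARD_SHIFT, toRegion});
        }
    }

    // Refinement: move logged cards into the remembered set of the region they point into.
    void refine() {
        for (long[] entry : threadLog) {
            rememberedSets.computeIfAbsent(entry[1], region -> new HashSet<>()).add(entry[0]);
        }
        threadLog.clear();
    }
}
```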
In many respects, G1 looks and feels like the other generational collectors
we’ve looked at, but it behaves rather differently. For one thing, G1 has
a pause prediction model built into the collector. When it goes through
the mark phase of the collection, it calculates the live occupancy of a
particular region and uses that to determine whether the region is ripe
for collection. So when it goes to collect, it puts all of the young regions
into the collection set (generally referred to as the “CSet”) and then looks
at how long it’s going to take to collect all of these regions; if there is time
left in the budget, based on the soft-defined pause-time goal, it can add
some of the old-generation regions to the CSet as well. The pause-time
goal here is a hint, not a guarantee — if the collector can’t keep up, it will
blow the pause-time goal.
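The collection-set choice can be sketched as a greedy loop. The cost predictions are made-up inputs, not G1's actual model: every young region goes in, then old regions are added in order of increasing live fraction while the predicted pause still fits the goal. Because young regions are added unconditionally, a small enough goal is simply exceeded, as the text warns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A toy collection-set chooser; the per-region time predictions are made-up inputs.
class CSetDemo {
    static final class Region {
        final String name;
        final boolean young;
        final double liveFraction;   // from the last marking cycle
        final double predictedMs;    // predicted cost to evacuate this region

        Region(String name, boolean young, double liveFraction, double predictedMs) {
            this.name = name;
            this.young = young;
            this.liveFraction = liveFraction;
            this.predictedMs = predictedMs;
        }
    }

    static List<Region> chooseCSet(List<Region> regions, double pauseGoalMs) {
        List<Region> cset = new ArrayList<>();
        List<Region> oldRegions = new ArrayList<>();
        double predicted = 0;
        for (Region r : regions) {
            if (r.young) {
                cset.add(r);                 // young regions are always collected,
                predicted += r.predictedMs;  // even if that alone blows the goal
            } else {
                oldRegions.add(r);
            }
        }
        // Old regions with the least live data give the best yield, so try them first.
        oldRegions.sort(Comparator.comparingDouble((Region r) -> r.liveFraction));
        for (Region r : oldRegions) {
            if (predicted + r.predictedMs > pauseGoalMs) {
                break;                       // no time left in the pause budget
            }
            cset.add(r);
            predicted += r.predictedMs;
        }
        return cset;
    }
}
```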
as many regions pointing into any one region. The complexity, and the
amount of work I have to do, both grow considerably.
Balanced
IBM’s WebSphere Application Server version 8 introduced the
new region-based Balanced garbage collector. Though developed
independently, it is similar to G1, at least at a high level. You enable it
through the command line option -Xgcpolicy:balanced. The Balanced
collector aims to even out pause times and reduce the overhead of some
of the costlier operations typically associated with garbage collection.
Aside from arrays, objects are always allocated within the bounds of
a single region so, unlike G1, the region size imposes a limit on the
maximum size of an object. An array which cannot fit within a single
region is represented using a discontiguous format known as an arraylet.
Large array objects appear as a spine, which is the central object and the
only entry that can be referenced by other objects. Actual array elements
are then held as leaves which can be scattered throughout the heap in any
position and order.
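The arraylet layout comes down to two-level indexing: the spine holds the leaves, and element i lives at offset i % leafLength in leaf i / leafLength. A minimal sketch for an array of longs, with a hypothetical class name and leaf length:

```java
// A toy arraylet of longs; the leaf length is an assumption (the real one follows from region size).
class ArrayletDemo {
    private final long[][] leaves;   // the spine: the only object other code references
    private final int leafLength;
    private final int length;

    ArrayletDemo(int length, int leafLength) {
        this.length = length;
        this.leafLength = leafLength;
        int leafCount = (length + leafLength - 1) / leafLength;
        leaves = new long[leafCount][];
        for (int i = 0; i < leafCount; i++) {
            // In the real collector each leaf could sit in any region, in any order.
            leaves[i] = new long[Math.min(leafLength, length - i * leafLength)];
        }
    }

    long get(int index) {
        checkIndex(index);
        return leaves[index / leafLength][index % leafLength]; // spine lookup, then leaf offset
    }

    void set(int index, long value) {
        checkIndex(index);
        leaves[index / leafLength][index % leafLength] = value;
    }

    int leafCount() {
        return leaves.length;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```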
Balanced deals with this by binding each heap region to one of the system’s
NUMA nodes on startup. The heap is divided as evenly as possible so that
all nodes have approximately the same number of regions.
As threads allocate objects, they attempt to place those objects in their local
memory for faster access. If there is insufficient local memory, the thread
can borrow memory from other nodes. Although this memory is slower,
it is still preferable to an unscheduled GC or an OutOfMemoryError.
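A sketch of the two policies described above, both simplified assumptions rather than IBM's actual code: regions are bound to nodes in near-equal contiguous blocks, and an allocating thread prefers a free region on its own node but will borrow a remote one before giving up.

```java
// A toy NUMA-aware region allocator; the policy is an illustrative assumption.
class NumaDemo {
    // Binds regions to nodes in contiguous, near-equal blocks at "startup".
    static int[] bindRegions(int regionCount, int nodeCount) {
        int[] nodeOf = new int[regionCount];
        for (int i = 0; i < regionCount; i++) {
            nodeOf[i] = (int) ((long) i * nodeCount / regionCount);
        }
        return nodeOf;
    }

    // Prefers a free region on the allocating thread's node; otherwise borrows a remote one.
    static int pickRegion(int[] nodeOf, boolean[] free, int localNode) {
        int remote = -1;
        for (int i = 0; i < nodeOf.length; i++) {
            if (!free[i]) {
                continue;
            }
            if (nodeOf[i] == localNode) {
                return i;          // fast, local memory
            }
            if (remote < 0) {
                remote = i;        // slower, but better than forcing a GC
            }
        }
        return remote;             // -1: no free region anywhere, so a collection is needed
    }
}
```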
Metronome
IBM’s Metronome is an incremental mark-sweep collector with partial
on-demand compaction to avoid fragmentation. It uses a deletion write-
barrier, marking live any object whose reference is overwritten during
a write. After sweeping to reclaim garbage, Metronome compacts if
necessary to ensure that enough contiguous free space is available to
satisfy allocation requests until the next collection. Like Shenandoah,
Metronome uses Brooks-style forwarding pointers, imposing an
indirection on every mutator access.
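A Brooks-style forwarding pointer can be sketched as an extra field that every object carries (hypothetical names; real collectors keep it in the object header and install it atomically, racing mutator writes that this sketch ignores). Mutators always go through it, which is the indirection the text mentions, so even a stale reference to the old copy reaches the new one after evacuation.

```java
// A toy Brooks-style forwarding pointer, for illustration only.
class BrooksDemo {
    static final class Obj {
        Obj forwardee = this;   // points to itself until the object is evacuated
        int value;
        Obj(int value) { this.value = value; }
    }

    // Every mutator access pays one extra indirection through the forwarding pointer.
    static int read(Obj o) { return o.forwardee.value; }
    static void write(Obj o, int value) { o.forwardee.value = value; }

    // The collector copies the object, then redirects the old copy to the new one.
    static Obj evacuate(Obj from) {
        Obj copy = new Obj(from.value);
        from.forwardee = copy;
        return copy;
    }
}
```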
Metronome has latency advantages in most cases over CMS and G1,
but may see throughput limitations and still suffers significant
stop-the-world events. It can achieve single-digit millisecond pause times
when allocation rates and live-set sizes are constrained.
C4
Azul’s C4 collector, included in their HotSpot-based Zing JVM 1, is both
parallel and concurrent. It has been widely used in production systems for
several years now and has been successful at removing, or significantly
reducing, sensitivity to the factors that typically cause other concurrent
collectors to pause. Zing is currently available on Linux only, and is
commercially licensed.
Since a number of aspects of C4 are genuinely novel, I’ll spend some time
examining them.
If either of the above conditions is not met, the loaded value barrier will
trigger a “trap condition”, and the mutator will correct the reference to
adhere to the required invariants before it becomes visible to application
code. The use of the LVB test in combination with self-healing trapping
(see below) ensures safe single-pass marking, preventing the application
thread from causing live references to escape the collector’s reach. The
same barrier and trapping mechanism combination is also responsible
1 http://www.azulsystems.com/products/zing/
Self-healing
Key to C4’s concurrent collection is the self-healing nature of handling
barrier trap conditions. This feature dramatically lowers the time cost
of C4’s LVB (by orders of magnitude) compared to other types of read
barrier. C4’s LVB is currently the only read barrier in use in a production
JVM, and its self-healing qualities are one of the main reasons for its
evident viability in a high-throughput, ultra-low-pause collector. When
an LVB test indicates that a loaded reference value must be changed before
the application code proceeds, the value of both the loaded reference and
of the memory location from which it was loaded will be modified to
adhere to the collector’s current invariant requirements (e.g. to indicate
that the reference has already been marked through or to remap the
reference to a new object location). By correcting the cause of the trap in
the source memory location (possible only with a read barrier, such as the
LVB, that intercepts the source address), the GC trap has a self-healing
effect: the same object references will not re-trigger additional GC traps
for this or other application threads. This ensures a finite and predictable
amount of work in a mark phase, as well as the relocate and remap
phases. Azul coined the term “self-healing” in their first publication of
the pauseless GC algorithm in 2005, and Tene believes this self-healing
aspect is still unique to the Azul collector.
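The self-healing idea can be illustrated with a toy simulation in Java. Everything here is hypothetical and greatly simplified. Heap slots are elements of an AtomicReferenceArray. A reference is a Ref value holding an integer "address" and an NMT bit. Relocation is recorded in a forwarding map. On a trap, the barrier computes the corrected reference and writes it back to the slot it was loaded from with compareAndSet, so the next load of that slot, by any thread, takes the fast path. Azul's actual LVB is emitted by the JIT as machine code and is not structured like this.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Toy model of a self-healing loaded value barrier. "Addresses" are plain ints.
final class SelfHealingLvb {

    // An immutable reference value: target address plus an NMT ("not marked through") bit.
    static final class Ref {
        final int address;
        final boolean nmt;

        Ref(int address, boolean nmt) {
            this.address = address;
            this.nmt = nmt;
        }
    }

    final AtomicReferenceArray<Ref> heapSlots;                          // locations holding refs
    final Map<Integer, Integer> forwarding = new ConcurrentHashMap<>(); // relocated: old -> new
    final Set<Integer> marked = ConcurrentHashMap.newKeySet();          // seen by the marker
    final AtomicInteger traps = new AtomicInteger();
    volatile boolean expectedNmt = true;                                // flipped per mark phase

    SelfHealingLvb(int slots) {
        heapSlots = new AtomicReferenceArray<>(slots);
    }

    // Every reference load from the heap goes through this barrier.
    Ref load(int slot) {
        Ref ref = heapSlots.get(slot);
        if (ref == null) {
            return null;
        }
        Integer newAddress = forwarding.get(ref.address);
        if (ref.nmt == expectedNmt && newAddress == null) {
            return ref;                                // fast path: invariants hold, no trap
        }
        traps.incrementAndGet();                       // trap condition
        Ref fixed = new Ref(newAddress != null ? newAddress : ref.address, expectedNmt);
        marked.add(fixed.address);                     // make sure the marker knows about it
        // Self-healing: write the corrected value back to the source location,
        // so this slot will not trigger the trap again.
        heapSlots.compareAndSet(slot, ref, fixed);
        return fixed;
    }

    public static void main(String[] args) {
        SelfHealingLvb lvb = new SelfHealingLvb(1);
        lvb.heapSlots.set(0, new Ref(100, false)); // stale NMT bit; object since relocated
        lvb.forwarding.put(100, 200);
        lvb.load(0);                               // traps, remaps to 200 and heals slot 0
        lvb.load(0);                               // fast path
        System.out.println(lvb.traps.get());       // prints 1
    }
}
```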
At the start of the mark phase, the marker’s work list is primed with a
root set that includes all object references in application threads. As is
common to all markers, the root set generally includes all refs in CPU
registers and on the threads’ stacks. Running threads collaborate by
marking their own root set, while blocked (or stalled) threads get marked
in parallel by the collector’s mark-phase threads.
2 http://www.srl.inf.ethz.ch/papers/pldi06-cgc.pdf
Rather than use a global, stop-the-world safe point (where all application
threads are stopped at the same time), the marker algorithm uses a
checkpoint mechanism. Each thread can immediately proceed after its
root set has been marked (and expected-NMT flipped) but the mark
phase cannot proceed until all threads have crossed the checkpoint.
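The difference from a global safe point can be sketched with a CountDownLatch standing in for the checkpoint. In this toy model, where all names are hypothetical, each application thread marks its own root set, counts down, and then continues immediately. The collector waits on the latch before moving on to concurrent marking. The model omits blocked threads, whose roots the collector's own threads would mark in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Toy model of a checkpoint used instead of a global stop-the-world safe point.
final class CheckpointDemo {

    // Starts one "application thread" per root set; returns everything marked from the roots.
    static Set<String> markRoots(List<List<String>> rootSetPerThread) {
        Set<String> marked = ConcurrentHashMap.newKeySet();
        CountDownLatch checkpoint = new CountDownLatch(rootSetPerThread.size());

        for (List<String> roots : rootSetPerThread) {
            new Thread(() -> {
                marked.addAll(roots);   // each thread marks its own root set...
                checkpoint.countDown(); // ...crosses the checkpoint...
                // ...and would carry on running application code from here,
                // without waiting for any other thread.
            }).start();
        }

        try {
            // The collector moves on only once every thread has crossed the checkpoint.
            checkpoint.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return marked;
    }

    public static void main(String[] args) {
        List<List<String>> roots = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(markRoots(roots).size()); // prints 3
    }
}
```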
After all root sets are marked, the algorithm continues with a parallel and
concurrent marking phase. It pulls live refs from the work lists, marks
their target objects as live, and recursively works on their internal refs.
The mark phase continues until the marker work list is exhausted, at
which point all live objects have been traversed. At the end of the mark
phase, any object that is not marked “live” is known to be dead, and all
valid references have their NMT bit set to “marked through”.
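A sequential sketch of the work-list marking loop follows. It assumes a toy object graph expressed as a map from object ids to the ids they reference, which is a hypothetical representation. C4 runs this work in parallel, and concurrently with the application, and the sketch does not attempt either. It only shows the shape of the traversal and why an exhausted work list means every reachable object has been visited.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sequential sketch of work-list marking over a toy object graph
// (object id -> ids of the objects it references).
final class WorkListMarker {

    static Set<Integer> mark(Map<Integer, List<Integer>> graph, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> workList = new ArrayDeque<>(roots); // primed with the root set
        while (!workList.isEmpty()) {                       // run until the work list is exhausted
            Integer obj = workList.pop();
            if (!live.add(obj)) {
                continue;                                   // already marked live
            }
            for (Integer ref : graph.getOrDefault(obj, Collections.<Integer>emptyList())) {
                workList.push(ref);                         // trace its internal references
            }
        }
        return live;                                        // anything not in live is dead
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, Arrays.asList(2, 3));
        graph.put(2, Arrays.asList(3));
        graph.put(4, Arrays.asList(1));                     // 4 is unreachable from the roots
        System.out.println(mark(graph, Arrays.asList(1)));  // prints [1, 2, 3]
    }
}
```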
During relocation, sets of pages (starting with the sparsest pages) are
selected for relocation and compaction. Each page in the set is protected
from mutator access, and all live objects in the page are copied out and
relocated into contiguous, compacted pages. Forwarding information
During and after relocation, any attempt by the mutator to use references
to relocated objects is intercepted and corrected. Attempts by the mutator
to load such references will trigger the loaded value barrier’s trapping
condition, at which point the stale reference will be corrected to point
to the object’s proper location, and the original memory location from
which the reference was loaded will be healed to avoid future triggering
of the LVB condition.
Using a feature that Tene calls “quick release”, the C4 collector immediately
recycles memory page resources without waiting for remapping to
complete. By keeping all forwarding information outside the original
page, the collector is able to safely release physical memory immediately
after the page contents have been relocated and before remapping has
been completed. A compacted page’s virtual-memory space cannot be
freed until no more stale references to that page remain in the heap
(which will only be reliably true at the end of the next remap phase)
but the physical-memory resources backing that page are immediately
released by the relocate phase and recycled at new virtual-memory
locations as needed. The quickly released physical resources are used to
satisfy new object allocations as well as the collector’s own compaction
pipeline. With “hand over hand” compaction along with quick release,
the collector can use page resources released by the compaction of one
page as compaction destinations for additional pages, and the collector is
able to compact the entire heap in a single pass without requiring prior
empty target memory to compact into.
During the remap phase, which follows the relocate phase, GC threads
complete reference remapping by traversing the object graph and
executing an LVB test on every live reference found in the heap. If a
reference is found to be pointing to a relocated object, it is corrected to
point to the object’s proper location. Once the remap phase completes, no
live heap reference can exist that would refer to pages protected by the
previous relocate phase, and at that point the virtual memory for those
pages is freed.
Since the remap phase traverses the same live object graph as a mark
phase would, and because the collector is in no hurry to complete
the remap phase, the two logical phases are rolled into one in actual
implementation, known as the “combined mark and remap phase”.
In each combined mark/remap phase, the collector will complete the
Shenandoah
Red Hat’s Shenandoah is, at the time of writing, still under development. It
is open source, and is expected to be incorporated into OpenJDK via JEP
189³ at some point in the future, most likely Java 10 or later.
The collector uses a regional heap structure similar to that in G1 and C4,
but with no generational filter applied, at least in the current version.
Project lead Christine Flood told me that this might change in the future,
although development on a generational version hasn’t started at the
time of writing:
“The initial thinking was that we didn’t need to treat young-generation objects
differently. The garbage collector would pick the regions with the most space to
reclaim, regardless of whether the application fit the generational hypothesis or
not. This was based on a survey of currently trendy applications that claimed
that they had mostly medium-lived objects, which would not benefit from a
generational collector. However, SpecJVM is still important, and it does behave
in a generational manner, so we are reconsidering whether it makes sense to
make a generational Shenandoah. It’s basically added complexity for a payoff
for some applications.”
3 http://openjdk.java.net/jeps/189
4 https://www.youtube.com/watch?v=QcwyKLlmXeY
does another pass over the root set to make sure everything currently
in the root set is marked live. This may require some draining of SATB
queues, but most of that work should have already been done. The
collector only pushes unmarked objects onto a queue to be scanned; if an
object is already marked, the collector doesn’t need to push it.
Eventually, a thread will run out of objects. When that happens, the GC
work thread attempts to steal from another thread’s queue. When all
threads are done, i.e. no thread can steal from any other thread any more,
the mark phase is complete.
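Work stealing and its termination condition can be sketched in Java as follows. This is a toy model with assumed names, not Shenandoah's code. Each GC worker owns a ConcurrentLinkedDeque, pops from its own end, and steals from the opposite end of other workers' deques when it runs dry. An object is marked at the moment it is pushed, so already-marked objects are never queued twice. For termination, the sketch uses a shared counter of queued-but-unscanned objects rather than a full stealing protocol. Once that counter reaches zero, no worker can find anything to steal and marking is complete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Toy parallel marker: one deque per GC worker, with stealing when a worker runs dry.
final class StealingMarker {

    static Set<Integer> mark(Map<Integer, List<Integer>> graph, List<Integer> roots, int workers) {
        Set<Integer> marked = ConcurrentHashMap.newKeySet();
        AtomicInteger pending = new AtomicInteger(); // objects queued but not yet scanned
        List<ConcurrentLinkedDeque<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            queues.add(new ConcurrentLinkedDeque<>());
        }
        int next = 0;
        for (Integer root : roots) {
            if (marked.add(root)) {                  // only unmarked objects are queued
                pending.incrementAndGet();
                queues.get(next++ % workers).push(root);
            }
        }

        Thread[] threads = new Thread[workers];
        for (int id = 0; id < workers; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                while (pending.get() > 0) {
                    Integer obj = queues.get(me).pollFirst();            // own queue first
                    for (int v = 1; obj == null && v < workers; v++) {
                        obj = queues.get((me + v) % workers).pollLast(); // then steal
                    }
                    if (obj == null) {
                        Thread.yield();  // nothing to take right now, but others are still busy
                        continue;
                    }
                    for (Integer ref : graph.getOrDefault(obj, Collections.<Integer>emptyList())) {
                        if (marked.add(ref)) {           // already-marked objects are not pushed
                            pending.incrementAndGet();
                            queues.get(me).push(ref);
                        }
                    }
                    pending.decrementAndGet();           // finished scanning obj
                }
                // pending == 0: no deque holds work and no worker is scanning.
            });
            threads[id].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        for (int i = 0; i < 999; i++) {
            graph.put(i, Collections.singletonList(i + 1)); // a chain 0 -> 1 -> ... -> 999
        }
        System.out.println(mark(graph, Collections.singletonList(0), 4).size()); // prints 1000
    }
}
```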
After the concurrent marking phase is done, Shenandoah stops the world
and processes the SATB list by marking all objects in it, traversing their
references where necessary.
After the whole marking phase is done, the collector knows how much live
data (and therefore garbage) each region contains, and all reachable
objects are marked as such.
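Once marking has produced per-region liveness, choosing what to evacuate can be as simple as sorting regions by reclaimable space. This matches Flood's description above of picking “the regions with the most space to reclaim”. The sketch below is a hypothetical illustration: the Region class, the used/live byte counts, and the rule of stopping once a reclaim target is met are all assumptions. Shenandoah's actual selection heuristics differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: choose regions with the most garbage first, using per-region liveness from marking.
final class CollectionSetChooser {

    // Hypothetical per-region summary produced at the end of marking.
    static final class Region {
        final int id;
        final long usedBytes;
        final long liveBytes;

        Region(int id, long usedBytes, long liveBytes) {
            this.id = id;
            this.usedBytes = usedBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return usedBytes - liveBytes;
        }
    }

    // Takes regions in order of reclaimable space until targetBytes would be reclaimed.
    static List<Region> choose(List<Region> regions, long targetBytes) {
        List<Region> byGarbage = new ArrayList<>(regions);
        byGarbage.sort(Comparator.comparingLong(Region::garbageBytes).reversed());
        List<Region> collectionSet = new ArrayList<>();
        long reclaimed = 0;
        for (Region r : byGarbage) {
            if (reclaimed >= targetBytes) {
                break;
            }
            collectionSet.add(r);
            reclaimed += r.garbageBytes();
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 100, 90), new Region(1, 100, 10), new Region(2, 100, 50));
        for (Region r : choose(regions, 100)) {
            System.out.println("region " + r.id); // prints region 1, then region 2
        }
    }
}
```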
5 http://www.memorymanagement.org/glossary/f.html#forwarding.pointer
Known trade-offs
Since Shenandoah is very much in development, not all the trade-offs
are yet known. However, we can list some strengths and weaknesses as
it currently stands.
Since CAS involves a write operation, which can only happen in “to”
space, a CAS operation is expensive for Shenandoah. Three things have
to be copied from “from” space to “to” space: the original object, the new
object, and the compare value. Copies are, however, bounded by region
size, with objects larger than a region handled separately.
PART
FOUR
General monitoring
and tuning advice
Instead, this part of the book will examine the three major attributes
of garbage-collection performance, how to select the best collector for
your application, fundamental garbage-collection principles, and the
information to collect when tuning a HotSpot garbage collector.
However, for most applications, one or two of these are more important
than the others. Therefore, as with any performance-tuning exercise,
you need to decide which is the most important for you before you start,
agree on targets for throughput and latency, and then tune the JVM’s
garbage collector for two of the three.
GENERAL MONITORING AND TUNING ADVICE
Getting started
Once you’ve got targets, start by developing a set of representative load
tests that can be run repeatedly. A mistake that I’ve seen several
organisations make is to have the development team run performance
testing and tune the system before releasing it to production for the first
time, and then never look at it again. Obtaining optimum performance is
not a one-time task; as your application and its underlying data change,
you will almost certainly need to repeat the exercise regularly to keep the
system running correctly.
With a set of performance tests, you can start by benchmarking your
application, and then resort to tuning if you need to.
Choosing a collector
The following is a simple approach to choosing a collector:
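Benchmark your application with the Parallel collector, both in case it is
good enough and as a useful reference as you look at other collectors.
If the Parallel collector doesn’t hit your performance goals, try
benchmarking with CMS.
If CMS doesn’t hit your performance goals, try benchmarking with G1.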
If G1 also doesn’t hit your performance goals, you need to decide whether
to spend time and effort tuning your application and collector to hit your
requirements, or try a low-pause-time collector.
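When benchmarking several collectors, it is worth confirming which one the JVM actually selected. A small check through the standard java.lang.management API prints the names of the active collectors. On HotSpot these are typically “PS Scavenge” and “PS MarkSweep” for the Parallel collector (-XX:+UseParallelGC), “ParNew” and “ConcurrentMarkSweep” for CMS (-XX:+UseConcMarkSweepGC), and names beginning with “G1” for G1 (-XX:+UseG1GC). The exact names are implementation details and may vary between JVMs and versions. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class WhichCollector {

    // Names of the collectors the running JVM registered (typically one per generation).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
    }
}
```

Running it with the same GC flags as the benchmark is a quick way to confirm that a flag actually took effect.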
The C4 collector used in Azul’s Zing product is, as far as I know, the
only Java collector currently available that can be truly concurrent for
collection and compaction while maintaining high throughput for all
generations. As we’ve seen, it does, though, need specific OS-level hooks
in order to work. It’s perhaps also worth pointing out that although the
published C4 algorithm is properly pauseless (in that it requires no global
stop-the-world safe point), the GC implementation of C4 in Zing does
include some actual pauses that deal with other JVM-related details and
phase changes. While C4 will pause up to four times per GC cycle, these
stop-the-world events are used for phase shifts in the collection cycle and
do not involve any work that is related to live-set size, heap size, allocation
rate, or mutation rate. The resulting phase-shift pauses are sufficiently
small to be hard to detect among the stalls that are regularly caused by the
operating system itself; applications running Zing for weeks on a
well-tuned Linux system commonly see worst-case pauses of 2 ms.
Measured stalls larger than that are generally dominated by OS-related
causes (scheduling, swapping, file-system lockups, power management,
etc.) that have not been tuned out, and which would affect any process
on the system.
Tuning a collector
I recommend enabling GC logging while running your production
applications. The overhead for this is minimal and it produces invaluable
tuning data.
For GC tuning, the starting point should be the GC logs. Collecting these
basically has no overhead and, assuming your application is already in
production, will provide you with the best information you can get. If
enabling the logs does affect application
The JVM-tuning decisions made in the tuning process utilise the metrics
observed from monitoring garbage collections.
If you have latency spikes outside your acceptable range, try to correlate
these with the GC logs to determine if GC is the issue. It is possible that
other issues may be causing the latency spike, such as a poorly performing
database query, or a bottleneck on the network.
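GC logs remain the primary source for this correlation, but a rough in-process cross-check is also possible. The sketch below, with hypothetical helper names, samples the cumulative collection time reported by each GarbageCollectorMXBean before and after an operation. The figure is approximate. It is process-wide rather than per-request, and whether concurrent phases are included depends on the collector. A non-zero value therefore only suggests that GC may have contributed to a slow request.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcCorrelation {

    // Approximate accumulated GC time (ms) across all collectors since JVM start.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs an operation; returns {elapsed wall-clock ms, GC ms accrued meanwhile}.
    static long[] timeWithGc(Runnable operation) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        operation.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] {elapsedMs, totalGcMillis() - gcBefore};
    }

    public static void main(String[] args) {
        long[] result = timeWithGc(() -> {
            // A hypothetical "request" that creates plenty of short-lived garbage.
            List<int[]> batch = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                batch.add(new int[16]);
                if (batch.size() == 1000) {
                    batch.clear();
                }
            }
        });
        System.out.println("elapsed " + result[0] + " ms, GC " + result[1] + " ms");
    }
}
```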
package com.conissaunce.gcbook;

import java.util.HashMap;
import java.util.Map;

public class SimpleGarbageMaker {

    public static void main(String[] args) {
        System.out.println("InfoQ GC minibook test program");
        String stringDataPrefix = "InfoQ GC minibook test";
        {
            // Using HashMap
            Map<String, String> stringMap = new HashMap<>();

            // Add 5,000,000 entries to build up a large live set.
            for (int i = 0; i < 5000000; ++i) {
                String newStringData = stringDataPrefix + " index_" + i;
                stringMap.put(newStringData, String.valueOf(i));
            }
            System.out.println("MAP size: " + stringMap.size());

            // Remove 4,000,000 of them, turning those entries into garbage.
            for (int i = 0; i < 4000000; ++i) {
                String newStringData = stringDataPrefix + " index_" + i;
                stringMap.remove(newStringData);
            }
            System.out.println("MAP size: " + stringMap.size());

            // Request a full collection.
            System.gc();
        }
    }
}
Compile and run the program. The System.out output should show this:
InfoQ GC minibook test program
MAP size: 5000000
MAP size: 1000000
The numbers before and after the first arrow (e.g. “66048K->10736K”
from the first line) indicate the combined size of live objects before and
after garbage collection, respectively. The next number, in parentheses,
is the committed size of the heap: the amount of space usable for Java
objects without requesting more memory from the operating system.
So in this case:
• 66,048 kB of live objects before GC, 10,736 kB after GC. The
committed size of the heap is 76,800 kB.
• 66,048 kB heap used before GC, 45,920 kB after GC, and total heap
size is 251,392 kB.
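The before->after(committed) pattern is easy to extract programmatically, for example to compute how much each collection freed. This is a minimal sketch that assumes the triple format described above. The input strings are assembled from the numbers in the two bullets rather than copied from a complete log line, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogTriple {

    // Matches "<before>K-><after>K(<committed>K)", e.g. "66048K->10736K(76800K)".
    private static final Pattern TRIPLE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns {before, after, committed, freed}, all in kB.
    static long[] parse(String text) {
        Matcher m = TRIPLE.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no before->after(committed) triple in: " + text);
        }
        long before = Long.parseLong(m.group(1));
        long after = Long.parseLong(m.group(2));
        long committed = Long.parseLong(m.group(3));
        return new long[] {before, after, committed, before - after};
    }

    public static void main(String[] args) {
        // Strings assembled from the two bullets above, not a complete log line.
        long[] first = parse("66048K->10736K(76800K)");
        long[] heap = parse("66048K->45920K(251392K)");
        System.out.println("first triple freed " + first[3] + " kB; heap freed " + heap[3] + " kB");
        // prints: first triple freed 55312 kB; heap freed 20128 kB
    }
}
```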
CMS
As we noted in Part 2, CMS makes a full GC event less frequent at the
expense of reduced throughput. For this test, switching to CMS reduced
the number of full GC events to a single one, but with 22 pauses.
The log file includes details of the various phases of the collection process.
It is similar but not identical to that of the Parallel collector.
Marking took a total of 1.632 seconds of CPU time and 4.525 seconds of
wall time (that is, the actual time that step took to execute its assigned
tasks).
Next we have the pre-cleaning step, which is also a concurrent phase. This
is where the collector looks for objects that got updated by promotions
from young generation, along with new allocations and anything that got
updated by mutators, whilst it was doing the concurrent marking.
Start of pre-cleaning:
2015-03-07T22:36:31.589+0000: 5.231: [CMS-concurrent-preclean-start]
End of pre-cleaning:
2015-03-07T22:36:32.166+0000: 5.808: [CMS-concurrent-preclean: 0.530/0.530 secs]
[Times: user=1.11 sys=0.05, real=0.58 secs]
Concurrent pre-cleaning took 0.530 seconds of total CPU time and the
same amount of wall time.
This is a stop-the-world phase:
2015-03-07T22:36:32.181+0000: 5.823: [Rescan (parallel) , 0.0159590 secs]
2015-03-07T22:36:32.197+0000: 5.839: [weak refs processing, 0.0000270 secs]
2015-03-07T22:36:32.197+0000: 5.839: [scrub string table, 0.0001410 secs]
[1 CMS-remark: 1201997K(1203084K)] 1360769K(1509772K), 0.0161830 secs]
[Times: user=0.07 sys=0.00, real=0.01 secs]
This phase rescans any residual updated objects in the CMS heap, retraces
from the roots, and also processes Reference objects. Here, the rescanning
work took 0.0159590 seconds and weak reference-object processing
took 0.0000270 seconds. This phase took a total of 0.0161830 seconds to
complete.
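The [Times: ...] section that ends many of these lines reports user, sys and real (wall-clock) seconds. The ratio (user + sys) / real gives a rough idea of how many CPUs were busy during a phase. Below is a minimal sketch, with hypothetical names, applied to the pre-cleaning and remark lines above. As far as I know, HotSpot takes these figures from process-wide CPU counters, so during a concurrent phase they can include application threads as well. Also, real=0.01 is too coarse for the remark ratio to be more than indicative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcTimes {

    // Matches the "user=... sys=..., real=..." part of a [Times: ...] section.
    private static final Pattern TIMES =
            Pattern.compile("user=([\\d.]+)\\s+sys=([\\d.]+),\\s*real=([\\d.]+)");

    // (user + sys) / real: a rough count of CPUs kept busy during the phase.
    static double cpuPerWallSecond(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] section in: " + line);
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        // real=0.00 means the phase was below the clock resolution, so no meaningful ratio.
        return real == 0 ? Double.NaN : (user + sys) / real;
    }

    public static void main(String[] args) {
        double preclean = cpuPerWallSecond("[Times: user=1.11 sys=0.05, real=0.58 secs]");
        double remark = cpuPerWallSecond("[Times: user=0.07 sys=0.00, real=0.01 secs]");
        // Rounded to one decimal place for display.
        System.out.println(Math.round(preclean * 10) / 10.0); // prints 2.0
        System.out.println(Math.round(remark * 10) / 10.0);   // prints 7.0
    }
}
```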
2015-03-07T22:36:32.197+0000: 5.840: [CMS-concurrent-sweep-start]
The CMS log file may also show a number of error conditions. Some of
the most common are:
Garbage First
Repeating the test with G1 produced 17 young collections, i.e. collections from
Eden and survivor regions, and one full collection. The G1 log file starts
like this:
2015-03-08T08:03:05.171+0000: 0.173: [GC pause (young)
Desired survivor size 1048576 bytes, new threshold 15
(max 15)
, 0.0090140 secs]
As with the CMS log, the G1 log breaks out each step and provides a great
deal of information.
Parallel time is the total elapsed time spent by all the parallel GC worker
threads. The indented lines indicate the parallel tasks performed by these
worker threads in the total parallel time, which in this case is 8.7 ms.
Ext Root Scanning gives us the minimum, average, maximum, and diff
of the times all the worker threads spent scanning the roots (globals,
registers, thread stacks, and VM data structures).
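The four summary figures are simple statistics over the per-worker times. Diff is the spread between the slowest and fastest worker (maximum minus minimum), which hints at load imbalance between the GC threads. Here is a sketch using hypothetical per-worker values and a hypothetical class name:

```java
import java.util.Arrays;

final class WorkerStats {

    // Summarises per-worker times (ms) as {Min, Avg, Max, Diff}, where Diff = Max - Min.
    static double[] summarise(double... perWorkerMs) {
        if (perWorkerMs.length == 0) {
            throw new IllegalArgumentException("no worker times");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (double t : perWorkerMs) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
        }
        return new double[] {min, sum / perWorkerMs.length, max, max - min};
    }

    public static void main(String[] args) {
        // Hypothetical Ext Root Scanning times for four GC worker threads.
        System.out.println(Arrays.toString(summarise(1.0, 1.5, 2.5, 3.0)));
        // prints [1.0, 2.0, 3.0, 2.0]
    }
}
```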
Then:
• Scan RS: The time each worker thread spent scanning the
remembered sets.
• Clear CT: This is the time spent in clearing the card table. This task
is performed in serial mode.
• Other: Total time spent performing other tasks. These are then
broken out as individual subtasks:
Finally, the Eden line details the heap size changes caused by the
evacuation pause. This shows that Eden had an occupancy of 14 MB and
its capacity was also 14 MB before the collection. After the collection, its
occupancy got reduced to 0 MB (since everything is evacuated/promoted
from Eden during a collection) and its target size shrank to 10 MB. The
new Eden capacity of 10 MB is not reserved at this point; rather, this
value is the target size of Eden. Regions are added to Eden on demand,
and when Eden reaches the target size, the next collection starts.
Like CMS, G1 can fail to keep up with promotion rates, and will fall
back to a stop-the-world, full GC. Like CMS and its “concurrent mode
failure”, G1 can suffer an evacuation failure, seen in the logs as “to-space
overflow”. This occurs when there are no free regions into which objects
can be evacuated, which is similar to a promotion failure.
If the problem persists, increasing the size of the heap may help. If you
can’t do this for some reason then you’ll need to do some more analysis:
1. If you notice that the marking cycle is not starting early enough
for G1 GC to be able to reclaim the old generation, drop your
-XX:InitiatingHeapOccupancyPercent. The default for this is
45% of your total Java heap. Dropping the value will help start the
marking cycle earlier.
4. If the availability of “to” space survivor regions is the issue, then increase the
-XX:G1ReservePercent. The default is 10% of the Java heap. A false
Balanced
IBM recommends that you restrict tuning Balanced to adjusting the size
of Eden using -Xmnx<size> to set its maximum size. As a general rule,
for optimal performance the amount of data surviving from Eden space
in each collection should be kept to approximately 20% or less. Some
systems might be able to tolerate parameters outside these boundaries,
based on total heap size or number of available GC threads in the system.
If you set the size of Eden too small, your system may pause for GC
more frequently than it needs to, reducing performance. It will also have
consequences if there is a spike in workload and the available Eden space
is exceeded. Conversely, if Eden is too large, you reduce the amount of
memory available to the general heap. This forces the Balanced collector
to incrementally collect and defragment large portions of the heap in each
partial collection cycle in an effort to keep up with demand, resulting in
long GC pauses.
tal="10628366336" percent="82">
<mem type="Eden" free="0" total="671088640" percent="0" />
<numa common="10958264" local="1726060224" non-local="0" non-local-percent="0" />
<remembered-set count="352640" freebytes="422080000" totalbytes="424901120"
percent="99" regionsoverflowed="0" />
</mem-info>
</gc-start>
<allocation-stats totalBytes="665373480" >
<allocated-bytes non-tlh="2591104" tlh="662782376" arrayletleaf="0"/>
<largest-consumer threadName="WXYConnection[192.168.1.1,port=1234] "
threadId="0000000000C6ED00 " bytes="148341176" />
</allocation-stats >
<gc-op id="142" type="copy forward" timems="71.024" contextid="139"
timestamp="2015-03-22T16:18:32.527 ">
<memory-copied type="Eden" objects="171444" bytes="103905272"
bytesdiscarded="5289504" />
<memory-copied type="other" objects="75450" bytes="96864448" bytesdiscarded="4600472" />
<memory-cardclean objects="88738" bytes="5422432" />
<remembered-set-cleared processed="315048" cleared="53760" durationms="3.108" />
<finalization candidates="45390" enqueued="45125" />
<references type="soft" candidates="2" cleared="0" enqueued="0" dynamicThreshold="28"
maxThreshold="32" />
<references type="weak" candidates="1" cleared="0" enqueued="0" />
</gc-op>
<gc-op id="143" type="classunload" timems="0.021" contextid="139"
timestamp="2015-03-22T16:18:32.527 ">
<classunload-info classloadercandidates="178" classloadersunloaded ="0"
classesunloaded="0" quiescems="0.000" setupms="0.018"
scanms="0.000" postms="0.001" />
</gc-op>
<gc-end id="144" type="partial gc" contextid="139" durationms="72.804"
timestamp="2015-03-22T16:18:32.527 ">
Metronome
Metronome can struggle to keep up with allocation rate. If both the
target utilisation and allocation rate are high, the application can run
out of memory, forcing the GC to run continuously and dropping the
utilisation to 0% in most cases. If this scenario is encountered, you must
choose to decrease the target utilisation to allow for more GC time,
increase the heap size to allow for more allocations, or a combination
of both. The relationship between utilisation and heap size is highly
application dependent, and striking an appropriate balance requires
iterative experimentation with the application and VM parameters.
</verbosegc>
The trigger GC events correspond to the GC cycle’s start and end points.
They’re useful for delimiting batches of heartbeat GC events, which
roll up the information of multiple GC quanta into one summarised
verbose event. Note that this is unrelated to the alarm-thread heartbeat.
The quantumcount attribute corresponds to the number of GC quanta
rolled up in the heartbeat GC. The <quantum> tag represents timing
information about the GC quanta rolled up in the heartbeat GC. The
<heap> and <immortal> tags contain information about the free
memory at the end of the quanta rolled up in the heartbeat GC. The
<gcthreadpriority> tag contains information about the priority of the
GC thread when the quanta began.
The quantum time values correspond to the pause times seen by the
application. Mean quantum time should be close to 500 microseconds,
and the maximum quantum times must be monitored to ensure they fall
within the acceptable pause times for the real-time application. Large
pause times can occur when other processes in the system preempt the GC
and prevent it from completing its quanta and allowing the application
to resume, or when certain root structures in the system are abused and
grow to unmanageable sizes.
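Checking a batch of quantum times against those two criteria is straightforward arithmetic. In the sketch below, the quantum values and the 1,000-microsecond acceptable pause are made-up numbers for illustration, and the class name is hypothetical. The point is that the maximum, not just the mean, has to be compared against the application's limit.

```java
final class QuantumCheck {

    // Mean and maximum quantum time (microseconds) for a batch of heartbeat data.
    static double[] meanAndMax(double... quantumMicros) {
        if (quantumMicros.length == 0) {
            throw new IllegalArgumentException("no quanta");
        }
        double sum = 0;
        double max = 0;
        for (double q : quantumMicros) {
            sum += q;
            max = Math.max(max, q);
        }
        return new double[] {sum / quantumMicros.length, max};
    }

    public static void main(String[] args) {
        double acceptablePauseMicros = 1000;             // hypothetical application limit
        double[] stats = meanAndMax(480, 510, 495, 2300); // hypothetical quantum times
        System.out.println("mean=" + stats[0] + " max=" + stats[1]); // mean=946.25 max=2300.0
        if (stats[1] > acceptablePauseMicros) {
            System.out.println("maximum quantum exceeds the acceptable pause time");
        }
    }
}
```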
and alarm threads while time is actually taxed to the application because
the GC for that VM is inactive.
1 http://www-01.ibm.com/software/support/isa/
example, it displays the application utilisation over time and inspects the
time taken for various GC phases.
2 https://github.com/chewiebug/GCViewer
3 http://www.jclarity.com/censum/
4 http://visualvm.java.net
The Classes view displays a list of classes and the number and percentage
of instances referenced by that class. You can view a list of the instances
of a specific class by right-clicking the name and choosing “Show in
Instances View”.
The Instances view displays object instances for a selected class. When
you select an instance from the Instance pane, VisualVM displays the
fields of that class and references to that class in the respective panes.
In the References pane, you can right-click an item and choose “Show
Nearest GC Root” to display the nearest GC root object.
5 http://www.azulsystems.com/jHiccup
sessions tend to get collected in the young generation. The larger spike
will be an old-generation pause.
doesn’t depend on how much empty memory there is but on how much
live data is left.
Between these two extremes, you’ll more or less follow a 1/x curve. This
typically means that doubling the empty memory halves the work the
garbage collector has to do: if your collector is using 50% of the CPU to
do its job, then doubling the empty memory drops that to 25%. Double
it again and it will drop to 12.5%, and so on. This is the most powerful
tool you have for controlling your collector’s consumption of CPU cycles.
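The 1/x relationship can be written as a one-line model calibrated from a single observation. This is a deliberately simple sketch under that assumption: the 50% at 1,024 MB of empty memory is a hypothetical starting point, the class name is hypothetical, and the model only makes sense between the two extremes described above.

```java
final class EmptyMemoryModel {

    // Toy 1/x model: the GC's share of CPU is inversely proportional to empty heap memory.
    // Calibrated from one observation: observedFraction of CPU at observedEmptyMb empty.
    static double gcCpuFraction(double observedFraction, double observedEmptyMb, double emptyMb) {
        return observedFraction * observedEmptyMb / emptyMb;
    }

    public static void main(String[] args) {
        // Hypothetical observation: 50% of CPU spent in GC with 1,024 MB of empty heap.
        for (double emptyMb : new double[] {1024, 2048, 4096}) {
            double fraction = gcCpuFraction(0.5, 1024, emptyMb);
            System.out.println((long) emptyMb + " MB empty -> " + fraction * 100 + "% GC CPU");
        }
    }
}
```

The loop prints 50.0%, 25.0% and 12.5%, mirroring the doubling sequence in the text.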
When the heap grows or shrinks, the JVM must recalculate the sizes of
the old and new generations to maintain a predefined NewRatio; server-side
applications can sometimes have the values of -Xms and -Xmx set
equal to each other for a fixed heap size.
The following are important guidelines for sizing the Java heap:
• The old generation must typically be larger than the new generation.
larger heap-size to live-set ratio. Given memory’s relatively low cost and
the huge memory spaces available to today’s systems, footprint is seldom
an issue on the server side.
Survivor ratio
The SurvivorRatio parameter controls the size of the two survivor
spaces. For example, -XX:SurvivorRatio=6 sets the ratio between each
survivor space and Eden to be 1:6, so each survivor space will occupy one
eighth of the young generation.
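The arithmetic behind the one-eighth figure works as follows. The young generation is Eden plus two survivor spaces, and Eden is SurvivorRatio times the size of one survivor space, so each survivor space is young / (SurvivorRatio + 2). Here is a sketch with a hypothetical class name and a hypothetical 80 MB young generation. It ignores the alignment the JVM applies to real space sizes.

```java
final class SurvivorRatioMath {

    // young = eden + 2 * survivor and eden = ratio * survivor,
    // so survivor = young / (ratio + 2). Returns {eden, survivor}.
    static long[] split(long youngSize, int survivorRatio) {
        long survivor = youngSize / (survivorRatio + 2);
        long eden = youngSize - 2 * survivor;
        return new long[] {eden, survivor};
    }

    public static void main(String[] args) {
        long[] sizes = split(80, 6); // hypothetical 80 MB young generation, -XX:SurvivorRatio=6
        System.out.println("Eden " + sizes[0] + " MB, each survivor " + sizes[1] + " MB");
        // prints: Eden 60 MB, each survivor 10 MB
    }
}
```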
“Adhering to this principle helps reduce the number and frequency of full garbage
collections experienced by the application. Full garbage collections typically have
the longest duration and as a result are the number one reason for applications
not meeting their latency or throughput requirements”.
Conclusion
GC tuning can become a highly skilled exercise that often requires
application changes to reduce object allocation rates or object lifetimes.
If this is the case, then a commercial trade-off between time and resource
spent on GC tuning and application changes versus purchasing one of
6 http://www.amazon.co.uk/Java-Performance-Addison-Wesley-Charlie-Hunt/dp/0137142528
“This is the hell that the JVM developers created, giving you options so you
could paint yourself into a corner. We left the Java developers alone. I’m sorry.”
– Eva Andreasson, “Garbage Collection Is Good”, QCon London 2014⁷
7 http://www.infoq.com/presentations/garbage-collection-benefits
PART
FIVE
Programming for
less garbage
1. Live-set size: The amount of live data in the heap will generally
determine the amount of work a collector must do in a given cycle.
When a collector performs operations that depend on the amount of
live data in the heap during stop-the-world pauses, the application’s
responsiveness becomes sensitive to the live-set size. Operations
that tend to depend on live-set size are marking (in mark/sweep or
mark/compact collectors), copying (in copying collectors), and the
remapping of references during heap compaction (in any compacting
collector).
PROGRAMMING FOR LESS GARBAGE
There are a number of techniques that can help reduce the frequency of
GC events. Using a profiler can help identify the hot methods in your
application that may be worth examining for code that can be optimised
for this purpose. There are a few basic principles to keep in mind.
However, for projects that have latency performance goals that are less
extreme, tuning at a code level, as opposed to adjusting the JVM GC
values, can be worthwhile.
1 https://groups.google.com/forum/#!msg/mechanical-sympathy/jdIhW0TaZQ4/UyXPDGQVVngJ
Using primitives
The primitive data types in Java use memory that does need to be
reclaimed, but the overhead of doing so is smaller: a primitive field is
reclaimed together with the object holding it, so it has no additional
impact. For example, an object with just one instance variable containing
an int is reclaimed in a single object reclamation. If the same object holds
an Integer, the garbage collector needs to reclaim two objects. Moreover,
temporary primitive values held in local variables exist only on the stack
and do not need to be garbage collected at all.
Similarly, you can hold a Date object as an int (or long), thus creating
one less object and saving the associated GC overhead. Of course, this is
another trade-off since those conversion calculations may take up more
time.
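Both points can be sketched in a few lines of Java; the class and method names are hypothetical. The boxed loop is the classic autoboxing trap. Each sum += i on a Long unboxes, adds, and boxes the result again, which allocates a new Long for any value outside the small-value cache. The Event class keeps a timestamp as epoch milliseconds and creates a Date only when one is requested. The JIT's escape analysis can sometimes eliminate such allocations, so treat this as an illustration and measure before changing real code.

```java
import java.util.Date;

final class PrimitivesDemo {

    // sum += i on a Long unboxes, adds, then boxes the result again (via Long.valueOf),
    // allocating a new Long for values outside the small-value cache.
    static long boxedSum(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // The same arithmetic on a primitive local allocates nothing on the heap.
    static long primitiveSum(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Hypothetical holder that stores a point in time as epoch milliseconds, not as a Date.
    static final class Event {
        private final long timestampMillis;

        Event(long timestampMillis) {
            this.timestampMillis = timestampMillis;
        }

        // Pay for the conversion (and the extra object) only when a Date is actually needed.
        Date timestampAsDate() {
            return new Date(timestampMillis);
        }
    }

    public static void main(String[] args) {
        System.out.println(boxedSum(1000) == primitiveSum(1000));     // prints true
        System.out.println(new Event(0L).timestampAsDate().getTime()); // prints 0
    }
}
```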
This is being worked on for a future version of Java but until then, there
are some libraries that provide primitive trees, maps, and lists for each of
Java’s primitive types. Trove² is one example that I’ve used and found to
be particularly good. The GS Collections³ framework that Goldman Sachs
open sourced in January 2012 also has support for primitive collections as
well as optimised replacements for the standard JDK collections classes
like ArrayList , HashSet, and HashMap.
Array-capacity planning
The convenience of Java’s dynamic collections, such as ArrayLists,
makes it easy to overuse them. ArrayLists and HashMaps are implemented
using underlying arrays (an Object[] in the case of ArrayList). As with
Strings (which are wrappers over char[] arrays), the size of an array is
fixed once it has been created.
int x = 20;
2 http://trove.starlight-systems.com
3 https://github.com/goldmansachs/gs-collections
The value of x determines the size of the ArrayList once the loop has
finished, but this value is unknown to the ArrayList constructor, which
therefore allocates a new Object[] array of default size. Whenever the
capacity of the internal array is exceeded, it is replaced with a new array
of sufficient length, making the previous array garbage.
To avoid this, whenever possible allocate lists and maps with an initial
capacity:
List<MyObject> items = new ArrayList<MyObject>(len);
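Here is a sketch of presizing both kinds of collection; the method names are hypothetical. One subtlety applies to HashMap. It resizes once its size exceeds capacity × load factor (0.75 by default), so passing the expected number of entries as the initial capacity is not quite enough. The capacity should instead be the expected count divided by the load factor. On Java 19 and later, HashMap.newHashMap(int) performs this calculation for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Presize {

    // The backing array is allocated once at the right size instead of being regrown.
    static List<String> presizedList(int len) {
        List<String> items = new ArrayList<>(len);
        for (int i = 0; i < len; i++) {
            items.add("item" + i);
        }
        return items;
    }

    // HashMap resizes when size exceeds capacity * load factor (0.75 by default),
    // so the initial capacity is derived from the expected number of entries.
    static Map<String, Integer> presizedMap(int expectedEntries) {
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(presizedList(20).size()); // prints 20
        System.out.println(presizedMap(100).size()); // prints 100
    }
}
```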
If you are writing your own parser, writing your own string-handling
classes could be worthwhile. At the very least, use StringBuilder (or
StringBuffer, if the buffer is shared between threads) in preference to the
string-concatenation operator (+), and be aware of which
methods alter objects directly as opposed to making copies and which
ones return a copy of the object. For example, any String method that
changes the string, such as String.trim(), returns a new string object,
whilst a method like Vector.setSize() does not. If you do not need a copy,
use (or create) methods that do not return a copy of the object.
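The difference is easiest to see in a loop; the method names are hypothetical. In the sketch below, the + version creates a new String on every iteration, and depending on the compiler also a temporary builder. The StringBuilder version grows one buffer and creates a single String at the end. StringBuilder is the unsynchronised counterpart of StringBuffer and is usually the better choice when the buffer is confined to one thread.

```java
final class Concat {

    // Each += creates a new String (and, depending on the compiler, a temporary builder).
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One growable buffer; a single String is created at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5).equals(withBuilder(5))); // prints true
    }
}
```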
Weak references
Back in Part 1, we introduced the heap and pointers with a fairly simplified
model, and stated that garbage collection works through following chains
of pointers. Things do get a little more complicated than this, however.
One thing to keep in mind is that a reference may be either strong or
weak. A strong reference is an ordinary Java reference. A line of code
such as:
StringBuffer buffer = new StringBuffer();
Elsewhere in your code, you can use weakWidget.get() to get a strong
reference to the actual widget object. Since the weak reference itself
isn’t strong enough to prevent garbage collection, you may find that the
weakWidget.get() suddenly starts returning null. Once a WeakReference
starts returning null, the object it pointed to has become garbage and
associated object, it will return null, which means they can be instantly
cleaned up.
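Here is a small sketch of the behaviour described above, using java.lang.ref.WeakReference and a hypothetical Widget class. While a strong reference to the widget exists, weakWidget.get() is guaranteed to return it. Once the last strong reference is dropped, get() may return null after a collection. However, System.gc() is only a hint, so the final line may print either outcome.

```java
import java.lang.ref.WeakReference;

final class WeakWidgetDemo {

    // A hypothetical object we only want to keep while something else is using it.
    static final class Widget {
        final String name;

        Widget(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Widget widget = new Widget("w1");                       // strong reference
        WeakReference<Widget> weakWidget = new WeakReference<>(widget);

        // Guaranteed: a strongly reachable referent is never cleared.
        System.out.println(weakWidget.get() == widget);         // prints true

        widget = null;  // drop the only strong reference
        System.gc();    // only a hint; a collection is not guaranteed
        Widget maybe = weakWidget.get();
        System.out.println(maybe == null ? "collected" : "still here: " + maybe.name);
    }
}
```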
Try-with-resources
Java 7 introduced the try-with-resources statement. This is somewhat
analogous to C#’s “using” statement or the approach used in C++ where
the class implementor would define a destructor function that performs
the cleanup whenever an object of that class goes out of scope.
The advantage of this approach is that the user of the object can’t forget to
clean it up: the destructor gets called automatically, even if an exception
is thrown. This approach is known by the frankly terrible name of RAII,
for “resource acquisition is initialisation”.
A try-with-resources statement can have catch and finally blocks just
like an ordinary try statement. In a try-with-resources statement, any
catch or finally block is run after the resources declared have been closed.
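import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TryWithResources {
    public static void main(String[] args) throws IOException {
        // Assumption: the file to print is passed on the command line, since the
        // original listing's input source is not shown here.
        try (InputStream input = new FileInputStream(args[0])) {
            int data = input.read();
            while (data != -1) {
                System.out.print((char) data);
                data = input.read();
            }
        } // input has been closed here, even if read() threw an exception
    }
}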
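The ordering rule for catch and finally blocks can be checked with a hypothetical resource that records events. The recorded sequence shows that both the catch block and the finally block run after close(). All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {

    // A hypothetical resource that records when it is opened and closed.
    static final class Resource implements AutoCloseable {
        private final List<String> events;

        Resource(List<String> events) {
            this.events = events;
            events.add("open");
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (Resource resource = new Resource(events)) {
            events.add("body");
            throw new IllegalStateException("failure inside the try block");
        } catch (IllegalStateException e) {
            events.add("catch");   // runs after close()
        } finally {
            events.add("finally"); // also runs after close()
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [open, body, close, catch, finally]
    }
}
```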
Distributing programs
A widely used technique amongst enterprise Java teams is to distribute
programs. This can both keep the heap size smaller, making the pauses
shorter, and allow some requests to continue whilst others are paused.
In certain specific situations, this may be the correct thing to do from
an engineering standpoint as well — the example I cited in the preface,
of a web application that was also required to perform ad hoc batch-job-
type functions, absolutely required breaking into parts and distributing
in order for it to work.