Investigating Java Heap memory and Native memory leaks
What is a memory leak?
Wikipedia definition: In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory that is no longer needed is not released. A memory leak may also happen when an object is stored in memory but cannot be accessed by the running code. A memory leak has symptoms similar to several other problems and generally can only be diagnosed by a programmer with access to the program’s source code.
In this article, we discuss a memory leak issue we had in one of our JAVA applications where we used the SDK library built by third-party security providers. The library was built using C++, so we used JNI (Java Native Interface) to interact between JAVA and C++ code.
Before we dive into the issue discussion and understanding, let’s go through the JAVA memory model and JNI to understand them better.
JAVA Memory model
Generally speaking, the JVM Memory Model is the architecture that defines the interaction of various run-time data areas used during the execution of a program:
- Heap: The heap area represents the runtime data area, from which the memory is allocated for all class instances and arrays; it is created during the virtual machine startup. This heap is divided into further smaller regions for performance and effective ways to handle garbage collection.
EDEN space: This is the space where all the objects are created.
Survivor space: Objects that survive a garbage collection cycle in the EDEN space are moved here.
Old Generation: Objects which survived multiple garbage collection cycles in the survivor space will be moved to the Old generation space and these objects are known as long-lived objects.
- Stack: Each JVM thread has a private stack, created at the same time as the thread. The stack stores frames. A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions.
- Metaspace: The place where class metadata is stored.
- Code cache: Java programs run as bytecode, but at run time the JVM's JIT compiler converts frequently executed bytecode to machine instructions. Once the JVM has compiled a particular piece of code, it caches those instructions for reuse, avoiding repeated bytecode-to-instruction conversion.
- Shared libs: The shared library region enables address spaces to share dynamic link library (DLL) files. All native code that is not written in Java is loaded here (.so files on UNIX).
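The areas listed above can be observed from inside a running JVM. The following is a minimal sketch, assuming a standard JVM that exposes its memory pools through the java.lang.management API; the class name MemoryPoolsDemo is only illustrative, and the pool names printed depend on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Illustrative helper (hypothetical name): lists the memory pools the JVM reports.
class MemoryPoolsDemo {

    /** Returns one line per memory pool: name, type (heap or non-heap), and used bytes. */
    static String describePools() {
        StringBuilder sb = new StringBuilder();
        List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
        for (MemoryPoolMXBean pool : pools) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            sb.append(String.format("%-35s %-16s used=%d bytes%n",
                    pool.getName(), pool.getType(), used));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describePools());
    }
}
```

Depending on the collector, the output typically contains pools for Eden, survivor, and old generation space as well as Metaspace and the code cache.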
How JAVA allocates memory
Every time we create a new object using the new keyword, the JVM allocates memory for it on the Java heap. The JVM in turn obtains its memory from the operating system: on UNIX, its native allocations go through the malloc() function from the dynamic memory allocator library (glibc/libc), which internally makes sbrk/brk/mmap system calls to allocate a new memory segment or to extend an existing one.
As Java performs these allocations for objects created with the new keyword, it keeps track of all allocated objects and the references to them. This bookkeeping data is then used by the garbage collectors to free memory by removing unused objects.
More details about garbage collection can be found in the Plumbr GC Handbook.
JAVA application memory details using YourKit profiler
Heap Memory leak
A heap memory leak in Java refers to holding on to memory that is no longer needed. If this memory keeps growing, it slows down the application because:
- There is not enough memory to allocate new objects in the heap.
- As memory usage increases, the garbage collector runs more often and works harder trying to free memory by collecting unreferenced objects.
As memory usage keeps increasing, the application will eventually run out of memory, throw an OutOfMemoryError, and the process will terminate.
How to find a memory leak
In the JVM, objects are created in the EDEN space. Objects that survive a configured number of garbage collection cycles are promoted to the old generation heap block.
If there is a memory leak in the Java code, we should see old generation memory in the heap increase over time. The screenshot below shows such an increase in the application memory, which implies that something is holding on to the memory.
If we display only the memory of the old generation space, the increase becomes even clearer:
Showing same Old gen heap details using YourKit
Also, as the memory increases, the garbage collector spends more time trying to free memory, and application threads are paused during garbage collection. Because the heap is nearly full, garbage collection also runs more often, so user request latency keeps spiking. The garbage collector details are shown below:
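Besides a profiler view, the same garbage collection figures can be read from inside the application. This is a minimal sketch, assuming the standard java.lang.management API; the class and method names are only illustrative. Logging these totals periodically makes it visible when the collection count and collection time start to climb.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): totals GC activity across all collectors.
class GcStatsDemo {

    /** Total number of collections reported so far by all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) total += millis;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("GC count: " + totalGcCount()
                + ", GC time: " + totalGcTimeMillis() + " ms");
    }
}
```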
How to determine which part of the code is holding on to the memory
We can take the heap dump of the memory to check for the dominating objects and the reference to those objects, which will tell us from where these objects are held.
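A heap dump can be taken with a profiler or with JDK tools, and it can also be triggered from code. The sketch below assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean is available; the class name HeapDumpDemo and the file name are only illustrative. The resulting .hprof file can then be opened in a heap analyzer.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper (hypothetical name): writes a heap dump of the current JVM.
class HeapDumpDemo {

    /** Dumps the heap to a new .hprof file in a temporary directory and returns its path. */
    static Path dumpToTempFile() {
        try {
            // The target file must not exist yet, and recent JDKs require the .hprof suffix.
            Path target = Files.createTempDirectory("heapdump").resolve("application.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toString(), true); // true = dump only reachable (live) objects
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written to " + dumpToTempFile());
    }
}
```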
The application used for the demo is built to have a memory leak where it stores items in an unbounded list.
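A leak of this kind can be as small as the following sketch. The DataStore and User names follow the demo application, but the fields and methods shown here are assumptions made for illustration and are not taken from the demo's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the demo's User; the real class may differ.
class User {
    private final String name;

    User(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// The static list is reachable for the lifetime of the class, and nothing ever removes entries,
// so every stored User stays reachable and can never be garbage collected.
class DataStore {
    private static final List<User> USERS = new ArrayList<>();

    static void store(User user) {
        USERS.add(user);
    }

    static int size() {
        return USERS.size();
    }
}

class HeapLeakDemo {
    /** Simulates one request: each call adds a User that is never released. */
    static void handleRequest(String userName) {
        DataStore.store(new User(userName));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest("user-" + i);
        }
        System.out.println("Users retained: " + DataStore.size());
    }
}
```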
Below you can see how the heap dump looks (Yourkit is used here to analyze the heap dump):
As you can see from the example application, the DataStore has a list that stores User objects, and this list keeps growing.
We have allocated 1000 MB as the max heap (-Xmx1000m), so as the memory keeps increasing, the application will eventually terminate with an OutOfMemoryError, as shown below:
So, here are two ways to avoid this:
- We can limit the number of items.
- A Set can be used instead of a List to avoid duplicates.
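Both ideas can be combined. The following is a minimal sketch, not the demo's actual fix: BoundedDataStore is a hypothetical class that keeps at most maxItems distinct entries and evicts the oldest one when the limit is exceeded. It stores plain strings for brevity; storing a custom User class in a Set would additionally require equals() and hashCode() so that duplicates are detected.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical bounded replacement for an unbounded list.
class BoundedDataStore {
    private final int maxItems;
    // LinkedHashSet rejects duplicates and remembers insertion order.
    private final Set<String> userNames = new LinkedHashSet<>();

    BoundedDataStore(int maxItems) {
        if (maxItems <= 0) {
            throw new IllegalArgumentException("maxItems must be positive");
        }
        this.maxItems = maxItems;
    }

    synchronized void store(String userName) {
        if (!userNames.add(userName)) {
            return; // duplicate: nothing new to keep
        }
        if (userNames.size() > maxItems) {
            // Evict the oldest entry so the store never grows beyond maxItems.
            Iterator<String> oldest = userNames.iterator();
            oldest.next();
            oldest.remove();
        }
    }

    synchronized boolean contains(String userName) {
        return userNames.contains(userName);
    }

    synchronized int size() {
        return userNames.size();
    }
}
```

Whether evicting the oldest entry is acceptable depends on the application; rejecting new items or using a cache with expiry are alternatives.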
Native memory leak
The JNI enables programmers to write native methods to handle situations when an application cannot be written entirely in the Java programming language.
For example, if we have a library written in C or C++, we can generate the .so or .dylib file and call it from Java using the native keyword.
More details about JNI can be found here: Oracle JNI documentation
In one of our applications, we use an SDK from our partner, written in C/C++, to generate a DRM license. Since our applications are written in Java, we get the .so file from our partner and interact with this native code from Java using the native keyword.
The problem with using native code in Java applications is that the memory allocation in the native code is not handled by the JVM.
The JVM does not keep the record of memory allocation in the native code; as a result, the application memory will grow, but the HEAP increase won’t be shown, as this memory allocation is not recorded by the JVM.
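One way to make this gap visible is to compare the heap usage the JVM reports with the memory the operating system attributes to the process. The sketch below assumes Linux, where /proc/self/status exposes the resident set size as VmRSS; the class name is only illustrative, and on other systems the helper simply returns -1. If the resident size keeps growing while heap usage stays flat, the growth is happening outside the Java heap.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (hypothetical name): compares JVM heap usage with process memory.
class HeapVsProcessMemory {

    /** Heap bytes currently used, as tracked by the JVM. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Resident set size of the process in bytes, or -1 if it cannot be determined. */
    static long residentSetBytes() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1; // not Linux, or /proc is not available
        }
        try {
            for (String line : Files.readAllLines(status)) {
                if (line.startsWith("VmRSS:")) {
                    // The line has the form "VmRSS:   123456 kB".
                    String[] parts = line.trim().split("\\s+");
                    return Long.parseLong(parts[1]) * 1024L;
                }
            }
        } catch (IOException | RuntimeException e) {
            return -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsedBytes() + " bytes");
        System.out.println("process RSS: " + residentSetBytes() + " bytes");
    }
}
```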
Below we provide an example to demonstrate this issue and our method to profile the memory leak in native code. We are using a simple Spring Boot application that has a REST endpoint. When a user sends a request to this endpoint, the Java code calls the underlying code in the .so machine code library generated from C++ code.
We can provide our native code to the Java application in two ways:
- Using LD_LIBRARY_PATH, the predefined environment variable in Linux/Unix that sets the paths the dynamic linker searches when loading dynamic/shared libraries. Example:
LD_LIBRARY_PATH="{path_to_so_files}" java -jar application.jar
- Loading the library using the system loader at application startup. Example:
java -Djava.library.path={path_to_so_file} -jar application.jar
# And in the application, load the .so file:
static {
    System.loadLibrary("datastore");
}
For more details about how to call native code from Java using the native keyword, there is a well-explained article that you can follow: Call C++ code from JAVA
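On the Java side, the declaration could look like the following sketch. The class name CppDataStore, the method storeData, and the library name datastore are inferred from the JNI symbol and the loadLibrary call used in this article; the exact signature and the tryLoadLibrary helper are assumptions. The package declaration is omitted to keep the sketch self-contained, whereas the real class has to live in package example.data for the JVM to find Java_example_data_CppDataStore_storeData.

```java
// Sketch of the Java half of a JNI binding; names are inferred, not copied from the demo.
class CppDataStore {

    /**
     * Tries to load lib<name>.so (or the platform equivalent) from java.library.path.
     * Returns false instead of failing when the library cannot be found.
     */
    static boolean tryLoadLibrary(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // No Java body: the implementation is looked up in the loaded native library
    // the first time the method is called.
    native long storeData(String data);

    public static void main(String[] args) {
        boolean loaded = tryLoadLibrary("datastore");
        System.out.println(loaded
                ? "native library 'datastore' loaded"
                : "native library 'datastore' not found on java.library.path");
    }
}
```

A static initializer that calls System.loadLibrary directly, as shown earlier, is the more common form; the helper here only makes a missing library easier to report.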
For our native memory leak investigation, we will use our example code, which accepts HTTP requests and forwards them to the underlying C++ code using native calls.
This C++ code leaks memory, and we will see how to find the leak using the tools at our disposal.
Let’s run the application and see how it behaves:
java -Djava.library.path={path_to_so_file} -Xmx500m -Xms250m -jar application.jar
The application memory is increasing steadily, but when we look at the heap details we do not see it is increasing, as shown below:
As we can see, the old generation is stable, which means objects are garbage collected in the Eden and survivor spaces within a few GC cycles of being created.
When we saw this behavior, we were sure we did not have a memory leak in the Java code. Yet the overall process memory was still increasing, so we concluded that there was a memory leak, not in the JVM heap but somewhere in the native memory.
Once we were confident we had a memory leak in the native code, we started checking our code base for code that allocates memory not accounted for by the JVM. Then we realized that our flow uses an SDK from a third-party DRM vendor. This SDK is written in C++, and we use JNI to interact between the C++ code and the Java code.
So, we needed to find where exactly in the C++ native code the memory was leaking.
To track down the native leak, we found a very helpful library, jemalloc, which can profile how the whole process allocates memory and show which code paths allocate the most memory and hold on to it. For more details, see jemalloc leak checking.
By default, UNIX uses the glibc/libc library to allocate memory using malloc, which makes system calls internally.
So at first, we needed to build the jemalloc library, with profiling enabled (the --enable-prof configure option), from the source code on GitHub: jemalloc GitHub
Once libc is replaced by jemalloc for memory allocation, we can instruct the latter to keep profiling the memory while running our process, as shown below:
# Preload libjemalloc as a shared library using LD_PRELOAD
export LD_PRELOAD=/usr/local/lib/libjemalloc.so
# Tell jemalloc to profile allocations and write heap profile files to the given location
export MALLOC_CONF=prof:true,lg_prof_interval:31,lg_prof_sample:17,prof_prefix:/tmp/heap/jeprof
# Run the application
java -Djava.library.path={so_lib_folder_path} -Xmx500m -Xms250m -jar application.jar
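Note that lg_prof_interval is the base-2 logarithm of the number of bytes allocated between dumps, not a time in seconds. With lg_prof_interval:31, jemalloc writes a heap profile file roughly every 2^31 bytes (2 GiB) of allocation activity. Similarly, lg_prof_sample:17 sets the average sampling interval to 2^17 bytes (128 KiB) of allocation.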
We can use jeprof to generate reports from the heap profile files created, as follows:
#!/bin/bash
jeprof --svg /tmp/heap/jeprof.$1.* > /tmp/heap/$1-report.svg 2>/dev/null
jeprof --text /tmp/heap/jeprof.$1.* > /tmp/heap/$1-report.txt 2>/dev/null
Now we can check these reports to figure out where the memory leak is.
The graph and the text report show that most of the process memory is allocated and held in Java_example_data_CppDataStore_storeData (99.5%), and this method is not part of the Java code. It belongs to the underlying C++ shared library, which is why the leak does not show up as an increase in the JVM heap.
The C++ code of this method shows that it takes a string coming from Java, converts it to cpp_string, and then creates a MyDataStore object with new. The local cpp_string is destroyed automatically when the method returns, but the MyDataStore object, together with any memory it owns, is never deleted, so every call leaks it. This is also visible as MyDataStore::MyDataStore (77.1%) in the jemalloc report.
JNIEXPORT jlong JNICALL Java_example_data_CppDataStore_storeData (JNIEnv * env, jobject thisObject, jstring data) {
    // Copy the Java string into a C++ string, then release the JNI buffer
    const char *char_string = env->GetStringUTFChars(data, NULL);
    std::string cpp_string = std::string(char_string);
    env->ReleaseStringUTFChars(data, char_string);
    // Leak: allocated with new on every call and never deleted
    MyDataStore *d = new MyDataStore(cpp_string);
    return 1;
}
Fixing this, by deleting the MyDataStore object once it is no longer needed, stops the native memory growth and makes the application stable.
An important point to remember: if we create a stateful object in C++ from Java and set state on it through multiple native method calls, then once we are done using the object, we need to clean it up through another native method.
Example: In our case, we were doing the steps below:
1. Create a stateful object in C++ using a native method and return the object to Java.
2. Set some data on the object in C++ by calling the second native method.
3. Once we are done using the object, call the terminate native method in C++, which deletes/frees the memory used by the session object.
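The three steps above can be enforced on the Java side by wrapping the native handle in an AutoCloseable. The sketch below is an illustration under stated assumptions: SessionApi, InMemorySessionApi, and NativeSession are hypothetical names, and the in-memory implementation only stands in for the real native methods so that the example runs without a native library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical view of the three native calls: create, set data, terminate.
interface SessionApi {
    long create();
    void setData(long handle, String data);
    void terminate(long handle);
}

// Stand-in for the native library: tracks which handles are still alive.
class InMemorySessionApi implements SessionApi {
    private final Map<Long, StringBuilder> live = new HashMap<>();
    private long nextHandle = 1;

    public long create() {
        long handle = nextHandle++;
        live.put(handle, new StringBuilder());
        return handle;
    }

    public void setData(long handle, String data) {
        StringBuilder state = live.get(handle);
        if (state == null) {
            throw new IllegalStateException("unknown or terminated handle " + handle);
        }
        state.append(data);
    }

    public void terminate(long handle) {
        live.remove(handle); // the real native method would delete the C++ object here
    }

    int liveCount() {
        return live.size();
    }
}

// Owns one handle and guarantees that terminate is called exactly once.
class NativeSession implements AutoCloseable {
    private final SessionApi api;
    private final long handle;
    private boolean closed;

    NativeSession(SessionApi api) {
        this.api = api;
        this.handle = api.create(); // step 1
    }

    void setData(String data) {
        if (closed) {
            throw new IllegalStateException("session already closed");
        }
        api.setData(handle, data); // step 2
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            api.terminate(handle); // step 3
        }
    }
}
```

Used in a try-with-resources block, for example try (NativeSession s = new NativeSession(api)) { s.setData("x"); }, the terminate call runs even when an exception is thrown between the steps.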
So, as you can see, it is a bit hard to find native memory leaks, which are outside the JVM-managed memory (heap), but tools like jemalloc will help you find and fix these issues.
Resources used for the demo and investigation
The example code used for both the JVM and native memory leak scenarios is available in the GitHub repository: Memory Leak Demo Example Github
To generate load to the application we used a lightweight load generator: Hey Load Generator
Profilers used: