Java Profiling, continued
admin originally posted this on Wayne Adams' Blog.
I've been building a Java profiler lately, partly to address what I think are shortcomings in the current set of free/shareware profilers, and also to enjoy the flexibility of obtaining only the profiling information I need. If you've spent much time with profilers, you already know they can have quite an impact on the application they're profiling! My goal with my own profiler is to obtain a narrow set of information, having more control over the size and frequency of the data collection.One feature I really like about hprof (see http://java.sun.com/developer/technicalArticles/Programming/HPROF.html) is the heap=sites option. If you're not familiar with this option, it is described in the above link, but in short, hprof can provide a profile of the allocation of objects per site, where a site is a unique Java stack trace of a fixed depth. This can be very handy information when you are investigating the heap usage of an allocation, but it is extremely costly to get with hprof (in one recent case I was investigating, starting an application with the hprof agent slowed the application down by several orders of magnitue).
Today I'll discuss how I get this same information without the overhead of hprof. I'll build on my last blog entry about the java.lang.instrument package and combine bytecode injection with the ThreadMXBean to get this same information.
The basic outline is:
- Create a java.lang.instrument profiling agent.
- Intercept all constructors as they are loaded, instrumenting them on constructor entry and constructor exit. Information passed back to the profiler application will include the thread ID and the size of the object created.
- In the profiling agent, keep track of the total memory allocated for each object type, as well as a list of stack trace signatures at object instantiation, with frequency of each. The stack trace is obtained via the ThreadMXBean of the JVM.
- Rank objects by total amount of memory allocated, the set of unique stack trace signatures for each constructor, and the number of times the object was instantiated via each stack trace.
This information will approximate the heap=sites output of hprof, and can be obtained more economically for a number of reasons:
- We can restrict the classes (constructors) instrumented. Note that if we do not instrument every class, we may not see every instance of a particular object's instantiation, but in practice what you usually care about are objects instantiated by your application. Learning in hprof that 20 MB of java.lang.Objects were created as a side effects of a java.* method call usually doesn't help diagnose performance problems.
- The custom profiler can get this information in a very small fraction of the same time that hprof would require to get it, as it is a much more targeted application.
- With the custom profiler, we have a lot more control over the interval during which the data is collected. For example, we can clear our profiler's data, initiate an operation in the profiled application, and see just the allocation history for the time period of interest. I actually do not know how to obtain this information with hprof; the heap=sites option dumps its summary at application termination (as a snapshot of the heap at that point in time), which may not be what you want.
Object size is a pesky issue in Java, especially if you have experienced the convenience of the sizeof() operator. I use the java.lang.instrument.Instrumentation instance, supplied by the JVM in the call to the profiler's premain() method, to get the size of the objects (using getObjectSize(Object)). It isn't exactly clear what size this method returns; the javadocs are a little cagey and it's safe to safe this value isn't a deep copy. While it is possible to drill down through object references in a class and attempt to accumulate a deep-copy value, this operation itself can be very time-consuming, incomplete (due to inaccessible attributes) and slightly misleading (static references, anyone?). Not to mention you can easily get a stack overflow error in the process. It may not be perfect, but I have chosen to simply use the value returned by Instrumentation.getObjectSize(Object), and to keep in mind that what I'm seeing is useful in a rough, rather than a precise, sense.
Since the profiler runs in the same JVM as the profiled application, it is fairly simple to get a reference to the JVM ThreadMXBean, using java.lang.management.ManagementFactory.getThreadMXBean(). Given a thread ID (which you can arrange for your instrumented constructor to pass back to your profiler), you can get the stack trace for the thread with ThreadMXBean.getThreadInfo(long,int). Note that there are several signatures for this method call; remember to pick the one where you specify the maximum depth of the stack trace, as the default value is zero!
The value of collecting information like this in a profiler should be self-evident; however, there are greater opportunities available for this data. For example, a variety of profiling information can be used to generate events which can be used to monitor the health of the profiled system. These don't all need to be just events generated by the profiler; they could include SCM and system-maintenance events, with the result being that maybe you could see that your ASN.1 parser suddenly takes three times as long to parse a switch file since the last change was checked in and rolled out to production. In a later posting, I'll look at how such a system could be implemented, looking at the open-source tools available to analyze this kind of data and make associations between events that can provide much faster resolution of operational and performance issues.