Java EE Support Patterns: CPU

5.02.2012

Java High CPU troubleshooting guide – part 1

This article is part 1 of a series that will provide you with a comprehensive guide on how to troubleshoot and identify the root cause of Java high CPU problems.

This guide is also applicable to standalone Java programs but is primarily designed to help individuals involved in day-to-day Java EE enterprise production support. It will also include a list of the most common high CPU problems along with high-level resolution strategies.

Production problems resolution mindset review

Before we go any further, it is important to review your production problem resolution mindset. One of the most common “reflexes” that I have seen in my experience with Java EE production support teams is that a Java VM / middleware restart is often the first recovery action that is performed. While a premature restart can quickly eliminate the business impact, it can also prevent you from getting all the technical facts, reducing your ability to identify the root cause and exposing the platform to future recurrences of the problem.

Before pulling the trigger and shutting down your Java VM process, ask yourself the following question: do I have all the data I need to perform a root cause analysis after the restart? If the answer is no, then my recommendation is to review and improve your current platform monitoring and / or troubleshooting approaches. Proper performance data gathering before and during a high CPU problem is critical.

Java high CPU – what is it exactly?

Now back to our original topic: a high CPU problem is defined by the observation of one or many Java VM processes consuming excessive CPU on your physical host(s). Excessive CPU can also be described as abnormally high CPU utilization vs. a known & established baseline. Ex: if the average CPU utilization of your Java VM under peak load conditions is 40%, then the excessive CPU threshold can be set around 80%.


A typical Java VM process contains several Java Threads, some waiting to do work and others currently executing tasks. The # of Threads can be very low in the case of a simple standalone Java program and very high for Java EE enterprise platforms processing heavy concurrent transactions.

In order to identify the source of high CPU for one or many of your Java processes, you will need to perform a full breakdown of all Threads of your Java VM so you can pinpoint the biggest contributors. This analysis exercise can be visualized as per the diagram below.
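Before we dig into the OS-specific tools covered later in this series, note that the JDK itself can already give you a first, in-process view of the CPU time consumed per Thread via the java.lang.management API. Below is a minimal sketch (a hypothetical helper class, not one of the official tools referenced in this guide) using ThreadMXBean; keep in mind that the id reported here is the JVM Thread id, not the native Thread id that OS tools such as prstat or Process Explorer report, and that Thread CPU time measurement must be supported and enabled on your JVM.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Hypothetical helper: in-process CPU per Thread breakdown using ThreadMXBean.
 */
public class ThreadCpuBreakdown {

    public static void main(String[] args) {

        ThreadMXBean threadMX = ManagementFactory.getThreadMXBean();

        if (!threadMX.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        threadMX.setThreadCpuTimeEnabled(true);

        // Print the CPU time (in ms) consumed so far by each live Thread
        for (long id : threadMX.getAllThreadIds()) {
            long cpuTimeNanos = threadMX.getThreadCpuTime(id);
            ThreadInfo info = threadMX.getThreadInfo(id);
            if (cpuTimeNanos >= 0 && info != null) {
                System.out.println("Thread id=" + id + " [" + info.getThreadName()
                        + "] cpu=" + (cpuTimeNanos / 1000000L) + " ms");
            }
        }
    }
}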


Understand your average CPU utilization

As I mentioned in the earlier section, it is very important that you understand your current average CPU utilization, what I refer to as a baseline. This is crucial data that needs to be monitored on a regular basis as part of a comprehensive and ongoing platform capacity planning strategy.

Proper understanding and tracking of the average and “healthy” CPU utilization observed from your Java VM processes will allow you to quickly detect abnormal CPU surge scenarios and correlate them with possible root causes (a problem introduced by a project, an unexpected load increase, etc.). Finally, this will provide you with the threshold values necessary to configure pro-active CPU related alerts using the monitoring tools of your choice.
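If you do not have a commercial monitoring solution in place yet, even a small JMX-based poller can get you started with baseline tracking. The sketch below is a hypothetical example only: the baseline value, alert factor and sampling interval are assumptions that you must adjust to your own platform, and it relies on OperatingSystemMXBean.getSystemLoadAverage() (JDK 6+), which is only an approximation of CPU utilization and returns -1 on platforms where it is not available (e.g. Windows).

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Hypothetical example: periodic sampling of the OS load average vs. an
 * established baseline. All threshold values below are assumptions.
 */
public class CpuBaselineMonitor {

    // Assumed baseline: average load observed under normal peak load conditions
    private static final double BASELINE_LOAD = 4.0;

    // Alert when the current load reaches 2x the baseline (e.g. the 40% -> 80% rule of thumb)
    private static final double ALERT_FACTOR = 2.0;

    public static void main(String[] args) throws InterruptedException {

        OperatingSystemMXBean osMX = ManagementFactory.getOperatingSystemMXBean();

        while (true) {
            double load = osMX.getSystemLoadAverage(); // -1 if not available on this platform
            if (load >= 0 && load >= BASELINE_LOAD * ALERT_FACTOR) {
                System.out.println("ALERT: abnormal CPU surge detected, load=" + load);
            } else {
                System.out.println("Current load=" + load);
            }
            Thread.sleep(60000L); // sample every minute
        }
    }
}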

Understand your production environment and available tools

As the middleware and / or application support prime, you really need to understand your production environment, including the out-of-the-box tools available for you to perform low level troubleshooting tasks. This may be trivial for some individuals, but if you just recently started to work on a new Java or Java EE platform for a new client, my recommendation is that you spend enough time understanding your client’s environment specifications & business situation as per below:

-        Physical & virtual host configuration and capacity (total # of assigned CPU cores, RAM etc.)
-        OS vendor, version and patch level
-        Middleware vendor, versions and patch level
-        Java vendor & versions (including 32-bit vs. 64-bit) along with patch level
-        Third party API’s used within the Java or Java EE applications
-        Existing monitoring tools that you can leverage for historical data and trend analysis
-        History of the environment, known issues, resources utilization etc.
-        Business traffic breakdown per application along with average and peak traffic level of the platform; including peak business periods

Collecting all of the above facts will definitely help your root cause analysis process going forward, including for high CPU related problems.

Your homework before jumping to the part 2

Before we jump into part 2 of this CPU troubleshooting guide, I highly recommend that you study and understand each of the articles below. Focus on the ones applicable to your environment. Each of these articles provides you with a technical step-by-step on how to break down CPU per Thread for a Java VM; a key troubleshooting skill to acquire in order to investigate Java CPU related problems. The technique shares some common aspects across operating systems but also has OS-specific steps.

# CPU per Thread analysis on Solaris

# CPU per Thread analysis on Linux

# CPU per Thread analysis on AIX

# CPU per Thread analysis on Windows

I hope this article has provided you with a good starting point on Java CPU problems. Part 2 will provide you with the troubleshooting guide itself, including flow diagrams allowing you to choose the right investigation path depending on your problem case.

Please do not hesitate to post any comment or question.

4.25.2012

Java Thread CPU analysis on Windows

This article will provide you with a tutorial on how you can quickly pinpoint the Java Thread contributors to a high CPU problem on the Windows OS. Windows, like other operating systems such as Linux, Solaris & AIX, allows you to monitor CPU utilization not only at the process level but also for each individual Thread executing a task within a process.

Tutorials are also available for other operating systems such as Solaris, Linux and AIX (see the related posts on this blog).

For this tutorial, we created a simple Java program that will allow you to learn this technique in a step by step manner.

Troubleshooting tools

The following tools will be used below for this tutorial:

-        Windows Process Explorer (to pinpoint high CPU Thread contributors)
-        JVM Thread Dump (for Thread correlation and root cause analysis at code level)

High CPU simulator Java program

The program below simply loops and creates new String objects, which will allow us to perform this CPU per Thread analysis. I recommend that you import it into an IDE of your choice, e.g. Eclipse, and run it from there. You should observe an increase in CPU on your Windows machine as soon as you execute it.

package org.ph.javaee.tool.cpu;

/**
 * HighCPUSimulator
 * @author Pierre-Hugues Charbonneau
 * http://javaeesupportpatterns.blogspot.com
 *
 */
public class HighCPUSimulator {
      
       private final static int NB_ITERATIONS = 500000000;
      
       // ~1 KB data footprint
       private final static String DATA_PREFIX = "datadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadatadata";

      
       /**
        * @param args
        */
       public static void main(String[] args) {
            
             System.out.println("HIGH CPU Simulator 1.0");
             System.out.println("Author: Pierre-Hugues Charbonneau");
             System.out.println("http://javaeesupportpatterns.blogspot.com/");

              try {

                     for (int i = 0; i < NB_ITERATIONS; i++) {

                            // Perform some String manipulations to slow down and expose the looping process...
                            String data = DATA_PREFIX + i;

                     }

              } catch (Throwable any) {
                     System.out.println("Unexpected Exception! " + any.getMessage()
                                         + " [" + any + "]");
              }

             System.out.println("HighCPUSimulator done!");
       }

}

Step #1 – Launch Process Explorer

The Process Explorer tool shows CPU usage dynamically, which makes it good for live analysis. If you need historical data on CPU per Thread, you can also use Windows perfmon with the % Processor Time & Thread Id data counters. You can download Process Explorer from the link below:
http://technet.microsoft.com/en-us/sysinternals/bb896653

In our example, you can see that the Eclipse javaw.exe process is now using ~25% of total CPU utilization following the execution of our sample program.



Step #2 – Launch Process Explorer Threads view

The next step is to display the Threads view of the javaw.exe process. Simply right click on the javaw.exe process, select Properties and open the Threads tab. The Threads view will be displayed as per the snapshot below:



-        The first column is the Thread Id (decimal format)
-        The second column is the CPU utilization % used by each Thread
-        The third column is another counter indicating whether the Thread is currently running on the CPU

In our example, we can see our primary culprit is Thread Id #5996 using ~ 25% of CPU.

Step #3 – Generate a JVM Thread Dump

At this point, Process Explorer will no longer be useful. The goal was to pinpoint one or multiple Java Threads consuming most of the Java process CPU utilization, which is what we achieved. In order to go to the next level in your analysis, you will need to capture a JVM Thread Dump. This will allow you to correlate the Thread Id with the Thread Stack Trace so you can pinpoint what type of processing is consuming such high CPU.

JVM Thread Dump generation can be done in a few different ways. If you are using the JRockit VM, you can simply use the jrcmd tool as per the example below (on the HotSpot VM, the JDK jstack tool can be used to achieve the same result):


Once you have the Thread Dump data, simply search for the Thread Id and locate the Thread Stack Trace that you are interested in.

For our example, the “Main Thread”, which was fired from Eclipse, was exposed as the primary culprit; this is exactly what we wanted to demonstrate.

"Main Thread" id=1 idx=0x4 tid=5996 prio=5 alive, native_blocked
 at org/ph/javaee/tool/cpu/HighCPUSimulator.main
   (HighCPUSimulator.java:31)
 at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
 -- end of trace

Step #4 – Analyze the culprit Thread(s) Stack Trace and determine root cause

At this point you should have everything that you need to move forward with the root cause analysis. You will need to review each Thread Stack Trace and determine what type of problem you are dealing with. That final step is typically where you will spend most of your time, and the problem can be as simple as infinite looping or as complex as garbage collection related problems.

In our example, the Thread Dump revealed that the high CPU originates from our sample Java program around line 31. As expected, it exposed the looping condition that we engineered on purpose for this tutorial.

for (int i = 0; i < NB_ITERATIONS; i++) {

       // Perform some String manipulations to slow down and expose the looping process...
       String data = DATA_PREFIX + i;

}

I hope this tutorial has helped you understand how you can analyze and pinpoint the root cause of Java CPU problems on the Windows OS. Please stay tuned for more updates; the next article will provide you with a Java CPU troubleshooting guide, including how to tackle that last analysis step along with common problem patterns.

2.29.2012

HashMap.get High CPU – Case Study

This article will describe the complete root cause analysis and solution of a HashMap High CPU problem (infinite looping) affecting a Weblogic 10.0 environment running on the Java HotSpot VM 1.5.

This case study will again demonstrate the importance of mastering Thread Dump analysis skills and CPU correlation techniques such as Solaris prstat.

Environment specifications

-        Java EE server: Oracle Weblogic Portal 10.0
-        Middleware OS: Solaris 10
-        Java VM: Java HotSpot VM 1.5
-        Platform type: Portal application

Monitoring and troubleshooting tools

-        JVM Thread Dump (HotSpot format)
-        Solaris prstat (CPU contributors analysis)

Problem overview

-        Problem type: High CPU observed from our Weblogic production environment

A high CPU problem was observed on our Solaris physical servers hosting a Weblogic Portal 10 environment. Users were also reporting a major slowdown of the portal application.

Gathering and validation of facts

As usual, a Java EE problem investigation requires the gathering of technical and non-technical facts so we can either derive other facts and / or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·        What is the client impact? HIGH
·        Recent change of the affected platform? No
·        Any recent traffic increase to the affected platform? Yes
·        How does this high CPU manifest itself? A sudden CPU increase was observed and it does not go down, even after the load goes down, e.g. to a near zero level.
·        Did a recycle of the affected Weblogic servers resolve the problem? Yes, but the problem returned after a few hours or a few days (unpredictable pattern)

-        Conclusion #1: The high CPU problem appears to be intermittent rather than purely correlated with load
-        Conclusion #2: Since the high CPU remains after the load goes down, this typically indicates either the presence of infinite looping or heavy-processing Threads

Solaris CPU analysis using prstat

Solaris prstat is a powerful OS command that allows you to obtain the CPU utilization per process and, more importantly, per Thread within a process. As you can see below from our case study, the CPU utilization was confirmed to go up as high as 100% (saturation level).

## PRSTAT (CPU per Java Thread analysis)
prstat -L -p 8223 1 1

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME      CPU  PROCESS/LWPID
8223 bea10      2809M 2592M sleep   59    0  14:52:59        38.6% java/494
8223 bea10      2809M 2592M sleep   57    0   12:28:05       22.3% java/325
8223 bea10      2809M 2592M sleep   59    0   11:52:02       28.3% java/412
8223 bea10      2809M 2592M sleep   59    0   5:50:00         0.3% java/84
8223 bea10      2809M 2592M sleep   58    0   2:27:20         0.2% java/43
8223 bea10      2809M 2592M sleep   59    0   1:39:42         0.2% java/41287
8223 bea10      2809M 2592M sleep   59    0   4:41:44         0.2% java/30503
8223 bea10      2809M 2592M sleep   59    0   5:58:32         0.2% java/36116
……………………………………………………………………………………

As you can see from the above data, 3 Java Threads were found to be using, together, close to 100% of the CPU.

For our root cause analysis, we focused on Thread #494 (decimal format), corresponding to 0x1ee (hexadecimal format).
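For reference, this decimal to hexadecimal conversion can be done with any calculator or with a one-liner in Java. The small sketch below is only an illustration, assuming you simply want to print the nid keyword to search for within the Thread Dump:

/**
 * Illustration only: convert a prstat / top decimal thread id to the
 * hexadecimal nid format found in HotSpot Thread Dumps.
 */
public class ThreadIdConverter {

    public static void main(String[] args) {
        int lwpId = 494; // decimal thread id as reported by prstat (java/494)
        System.out.println("Search the Thread Dump for: nid=0x" + Integer.toHexString(lwpId));
        // prints: Search the Thread Dump for: nid=0x1ee
    }
}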

Thread Dump analysis and PRSTAT correlation

Once the culprit Threads were identified, the next step was to correlate this data with the Thread Dump data (which was captured exactly at the same time as prstat).

A quick search within the generated Thread Dump file revealed the Thread Stack Trace (Weblogic Stuck Thread #125) for nid=0x1ee, as per below.

"[STUCK] ExecuteThread: '125' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x014c5030 nid=0x1ee runnable [0x536fb000..0x536ffc70]
       at java.util.HashMap.get(HashMap.java:346)
       at org.apache.axis.encoding.TypeMappingImpl.getClassForQName(TypeMappingImpl.java:715)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:170)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:160)
       at org.apache.axis.encoding.TypeMappingImpl.getDeserializer(TypeMappingImpl.java:454)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:108)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:102)
       at org.apache.axis.encoding.DeserializationContext.getDeserializer(DeserializationContext.java:457)
       at org.apache.axis.encoding.DeserializationContext.getDeserializerForType(DeserializationContext.java:547)
       at org.apache.axis.encoding.ser.BeanDeserializer.getDeserializer(BeanDeserializer.java:514)
       at org.apache.axis.encoding.ser.BeanDeserializer.onStartChild(BeanDeserializer.java:286)
       at org.apache.axis.encoding.DeserializationContext.startElement(DeserializationContext.java:1035)
       at org.apache.axis.message.SAX2EventRecorder.replay(SAX2EventRecorder.java:165)
       at org.apache.axis.message.MessageElement.publishToHandler(MessageElement.java:1141)
       at org.apache.axis.message.RPCElement.deserialize(RPCElement.java:236)
       at org.apache.axis.message.RPCElement.getParams(RPCElement.java:384)
       at org.apache.axis.client.Call.invoke(Call.java:2467)
       at org.apache.axis.client.Call.invoke(Call.java:2366)
       at org.apache.axis.client.Call.invoke(Call.java:1812)
…………………………………………………………………………………………………………………………………………

Thread Dump analysis – HashMap.get() infinite loop condition!

As you can see from the above Thread Stack Trace, the Thread is currently stuck in an infinite loop over a java.util.HashMap that originates from the Apache Axis TypeMappingImpl Java class.

This finding was quite revealing. The 2 other Threads using high CPU also revealed an infinite looping condition within the same Apache Axis HashMap object.

Root cause: non Thread safe HashMap in Apache Axis 1.4

Additional research revealed a known defect affecting Apache Axis 1.4, which is the version our application was using.

As you may already know, using a non Thread safe / non synchronized HashMap under concurrent Thread conditions is very dangerous and can easily lead to internal HashMap index corruption and / or infinite looping. This is also a golden rule for any middleware software such as Oracle Weblogic, IBM WAS and Red Hat JBoss, which rely heavily on HashMap data structures for various Java EE and caching services.

This best practice is also applicable to any open source third party API such as Apache Axis.

The most common solution is to use the ConcurrentHashMap data structure, which is designed for this type of concurrent Thread execution context.
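As an illustration only (this is not the actual Apache Axis source code), the sketch below contrasts the unsafe shared HashMap pattern with the two usual fixes: wrapping the map via Collections.synchronizedMap or, preferably, using ConcurrentHashMap.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustration only: Thread safe alternatives to a shared HashMap cache.
 */
public class TypeMappingCacheExample {

    // UNSAFE: a plain HashMap shared across concurrent Threads can end up with a
    // corrupted internal structure, leading to infinite looping inside get()
    private final Map<String, Class<?>> unsafeCache = new HashMap<String, Class<?>>();

    // Fix #1: synchronize every access (coarse-grained locking)
    private final Map<String, Class<?>> synchronizedCache =
            Collections.synchronizedMap(new HashMap<String, Class<?>>());

    // Fix #2 (preferred): ConcurrentHashMap, designed for concurrent Thread access
    private final Map<String, Class<?>> concurrentCache =
            new ConcurrentHashMap<String, Class<?>>();

    public Class<?> lookup(String qName) {
        return concurrentCache.get(qName); // safe under concurrent Threads
    }

    public void register(String qName, Class<?> type) {
        concurrentCache.put(qName, type);
    }
}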

Solution

Our team applied the proposed patch from Apache (synchronizing the non Thread safe HashMap), which resolved the problem. We are also currently looking at upgrading our application to a newer version of Apache Axis.

Conclusion

I hope this case study has helped you understand how to pinpoint the root cause of high CPU Threads and the importance of proper Thread safe data structures for highly concurrent applications.

Please don’t hesitate to post any comment or question.


2.03.2012

PRSTAT Linux – How to pinpoint high CPU Java VM Threads

This article will provide you with an equivalent approach, for a JVM running on the Linux OS, to the powerful Solaris prstat command, allowing you to quickly pinpoint the high CPU Java VM Thread contributors.

One key concept to understand for a Java VM running on the Linux OS is that Java threads are implemented as native Threads, which results in each thread showing up as a separate Linux process (lightweight process).

Ok thanks for the info but why is this related to prstat?

Well, this key concept means that you don’t need a prstat equivalent on the Linux OS. Since each Java VM Thread is implemented as a native Thread, the CPU % of each Thread can simply be extracted out-of-the-box using the top command.

You still need to generate Thread Dump data of your JVM process in order to correlate with the Linux top command output.

Thanks for this explanation. Now please show me how to do this

Please simply follow the instructions below:

1)     Execute the top command (press SHIFT-H to get the Threads toggle view) or use the -H option (to show all Threads) and find the PID(s) associated with your affected / high CPU WLS process(es) (remember, many entries may show up since each Java Thread is implemented as a separate Linux process)
2)     Immediately after, generate a few Thread Dump snapshots using kill -3 <PID> against the parent WLS process. The Thread Dump provides you with the complete list of Java Threads within your JVM process, along with the Stack Trace of each one
3)     Now, convert the PID(s) extracted from the top command output to HEX format
4)     The next step is to search within the Thread Dump data for a matching nid=<HEX PID> (a small helper sketch automating steps 3 and 4 is shown right after this list)
5)     The final step is to analyze the affected Thread(s) and their Stack Trace so you can determine where in the code the problem lies (application code, middleware itself, JDK etc.)
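Before we look at a concrete example, here is a minimal Java sketch that automates steps 3 and 4 above. This is a hypothetical helper, not an official tool; it assumes the HotSpot Thread Dump format where each Thread entry starts with a quoted name line containing nid=0x... and is terminated by a blank line.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * Hypothetical helper: converts a decimal thread id from top to HEX and
 * extracts the matching nid=0x... Stack Trace block from a Thread Dump file.
 * Usage: java ThreadDumpNidFinder <decimal thread id> <thread dump file>
 */
public class ThreadDumpNidFinder {

    public static void main(String[] args) throws IOException {

        int decimalTid = Integer.parseInt(args[0]); // e.g. 22111 from the top output
        String dumpFile = args[1];                  // path to your Thread Dump file

        // Step 3: decimal to HEX conversion (e.g. 22111 >> 565f)
        String nid = "nid=0x" + Integer.toHexString(decimalTid);
        System.out.println("Searching Thread Dump for: " + nid);

        // Step 4: locate and print the matching Thread Stack Trace block
        BufferedReader reader = new BufferedReader(new FileReader(dumpFile));
        try {
            String line;
            boolean printing = false;
            while ((line = reader.readLine()) != null) {
                if (line.toLowerCase().contains(nid)) {
                    printing = true;  // header line of the culprit Thread
                } else if (printing && line.trim().length() == 0) {
                    break;            // a blank line ends the Stack Trace block
                }
                if (printing) {
                    System.out.println(line);
                }
            }
        } finally {
            reader.close();
        }
    }
}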

Example: top command output captured for a Weblogic Server Java Thread running at 40% CPU utilization, along with Thread Dump data generated via kill -3 <PID>

## top output sample 
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
...........
22111 userWLS 9 0 86616 84M 26780 S 0.0 40.1 0:00 java

  • Decimal to HEX conversion of Java Thread (native Thread) 22111 >> 0x565F
  • Now using the HEX value, we can search within the Thread Dump for the following keyword: nid=0x565F

## Thread Dump output sample: Thread matching the above search criteria (nid=0x565F)
"ExecuteThread: '0' for queue: 'default'" daemon prio=1 tid=0x83da550 nid=0x565F waiting on monitor [0x56138000..0x56138870]
  at java.util.zip.ZipFile.getEntry(Native Method)
  at java.util.zip.ZipFile.getEntry(ZipFile.java:172)
  at java.util.jar.JarFile.getEntry(JarFile.java:269)
  at java.util.jar.JarFile.getJarEntry(JarFile.java:252)
  at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:989)
  at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:967)
  at sun.misc.URLClassPath.findResource(URLClassPath.java:262)
  at java.net.URLClassLoader$4.run(URLClassLoader.java:763)
  at java.security.AccessController.doPrivileged(AccessController.java:224)
  at java.net.URLClassLoader.findResource(URLClassLoader.java:760)
  at java.lang.ClassLoader.getResource(ClassLoader.java:444)
  at java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:504)
  ............................................

In the above example, the problem was related to excessive class loading / IO.

As you can see, this approach allows you to quickly pinpoint the high CPU Thread contributors, but you will still need to spend additional time analyzing the Stack Trace in order to determine the root cause, which is now your job.

Need any additional help?

I hope this short tutorial has helped you understand how you can pinpoint high CPU Thread contributors for your JVM running on the Linux OS.

For any question or additional help, please simply post a comment or question below this article. You can also email me directly at [email protected].