How to find and fix the top Java performance problems in seconds

Table of Contents

- Introduction
- Thread deadlock
- Memory leaks
- DB pool contention & gridlocks
- Summary


Introduction

This article is a set of tips I have learned over the years fixing performance problems for big enterprises. I will describe what is, in my opinion, a systematic way to find and fix the three most impactful ones.

According to the article “Top 10 Java Performance Problems” (http://info.appdynamics.com/rs/appdynamics/images/Top_10_Java_Performance_Problems_eBook.pdf), these are the most common and harmful ones:

- thread deadlocks
- memory leaks
- slow methods: I will use the problem of database connection pools as an example, but the same approach also applies to finding thread gridlocks or the death by a thousand cuts

As a consequence, I will be describing how to find and fix five of the top 10 performance problems covered in the mentioned article.

There are several tools out there you can use to identify these kinds of issues. However, I will focus on javOSize (http://www.javosize.com/) because it is free, compact and incredibly powerful in the “fixing arena”.

One of the most interesting aspects of the tool is its ability to attach to any running JVM without prior configuration or a restart. I like this approach in troubleshooting missions because when you arrive you can start working without losing any time or, more importantly, the problematic state, which usually vanishes when a restart is applied.

Let's see how to take this bull by the horns.


Thread deadlock

Problem signature: This problem is usually evidenced by very low CPU usage, because the system is blocked, and by some user requests that never finish. The system seems halted or totally unresponsive.

Deadlocks are a classic problem when dealing with concurrency. They occur when two or more threads block each other because of the order in which they acquire exclusively shared resources.

Let's imagine two guys trying to eat two steaks with only one fork and one knife. Each picks up the fork, picks up the knife, eats a piece of steak and releases both. So in the first iteration guy A gets the knife, then picks up the fork, slices a piece of steak and releases both resources.

Up to here there is no lock; the system is flowing naturally. Now guy B picks up the knife and guy A picks up the fork immediately after. Guy B is waiting for guy A to release the fork and guy A is waiting for guy B to release the knife. Neither of them can keep eating, their steaks get cold and the system is in a deadlock.

As the article about the top 10 performance problems mentions, deadlocks are the most harmful problems and very difficult to troubleshoot because they are hard to reproduce in non-production environments.

So the first thing I have to do is identify whether there is a deadlock in my system. I will use javOSize's ability to auto-detect deadlocks, so just by executing the “ls” command in the PROBLEMS entity I get this information:

One thing I find really useful is that I can call javOSize from an external script that periodically checks whether there is a deadlock and sends me a notification if there is:


So now I know there is a deadlock; it's time to find where the threads are locked. This is usually something I can find with a thread dump. However, thread dumps are sometimes a little messy and hard to read because they show too much information. That's why I prefer javOSize for retrieving this information: it gives me only what I need. First of all, I need a summary of the threads:

Here I can easily see that there are two BLOCKED threads. One of them is “eater-thread2”, which is blocked by “eater-thread1”, and the other one is “eater-thread1”, which is blocked by “eater-thread2”. This is the classic deadlock pattern, as described in the introduction of this section: A waits for B while B waits for A.

The next thing I need is the full stack trace of the blocked threads. Again, this is something I could get with a thread dump, but that would also give me information about threads I'm not interested in. I want to focus only on the blocked threads, so I will use the “cat” command to dump these details:


My threads are stuck in the method “com.acme.test.Eater.eat”. Problem identified. But I said “find and fix”. Let’s see how.

Having a look at the source code is useful for understanding why these threads are blocked. I can use the “vi” editor in the CLASSES entity for that:
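Sketched in plain Java, the kind of code this points at looks roughly like the following; only the Eater.eat method and the fork and knife objects come from the description, everything else is an assumption rather than the application's actual source:

    // Hedged sketch: only Eater.eat, "fork" and "knife" come from the application;
    // the constructor, the pause and the thread-entry point are assumptions.
    class Eater implements Runnable {

        private final Object fork;
        private final Object knife;

        Eater(Object fork, Object knife) {
            this.fork = fork;
            this.knife = knife;
        }

        @Override
        public void run() {
            eat();
        }

        void eat() {
            synchronized (fork) {        // take the first shared resource
                pause();                 // linger long enough for the race to bite
                synchronized (knife) {   // then try to take the second one
                    // slice and eat a piece of steak
                }
            }
        }

        private static void pause() {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }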

I can easily see the synchronized block where my threads get the object “fork” and then try to get the object “knife”. Up to here it doesn't seem that I have any problem, but let's take a look at the class that is starting the threads. Since I know the architecture of the application, I know the threads are created in “com.acme.test.DeadLock”:
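Again as a hedged sketch (the real class is only described here, so the details are assumptions), a starter class that produces exactly this behaviour would look like:

    // Hedged sketch of a starter class that reproduces the described behaviour.
    class DeadLock {

        public static void main(String[] args) {
            Object fork = new Object();
            Object knife = new Object();

            Thread eater1 = new Thread(new Eater(fork, knife), "eater-thread1");
            // The arguments are swapped here, so what this eater calls its "fork"
            // is really the knife: the two threads lock the resources in opposite order.
            Thread eater2 = new Thread(new Eater(knife, fork), "eater-thread2");

            eater1.start();
            eater2.start();
        }
    }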


I can see that starting the threads this way makes each of them try to acquire a different resource first. So I will change the class and invert the order in which “thread2” takes the resources. Once I save the changes, the class will be hot-swapped in the application server, so subsequent requests will not lock the threads.
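In terms of the sketch above, the whole fix is a one-line change so that both threads take the resources in the same order:

    // After the edit, both threads lock the fork first and the knife second,
    // so neither can end up holding the resource the other one needs first.
    Thread eater2 = new Thread(new Eater(fork, knife), "eater-thread2");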

The next thing I might think about is killing the blocked threads. Let me just say that killing threads is absolutely not recommended because of the unknown effects it may have on the system. And although javOSize allows you to do it, there are scenarios where the JVM prevents threads from being killed, for example when they are blocked on native locks, that is, in a “synchronized” statement on an object. This is a known limitation of JVMs, and as a consequence these two threads will stay blocked until the next application server restart. But since I have edited the class, subsequent requests will not generate any deadlock.

Deadlock solved.

There is another kind of deadlock, created by threads in WAITING status that are never notified. These are not detected as deadlocks by the JVM, because the threads are waiting for a notification that might still arrive. However, if we see threads in this state for a long time, it may be that they will never be notified, so these threads will never finish. In effect, it is a deadlock.

Let's see how to find and fix this kind of deadlock. Again, I will list the status of the threads, and I can now see two threads in WAITING status:


Checking the details of each of the waiting threads I can see they are both blocked in “com.acme.test.Resource.getSharedObject”, which is called from “com.acme.test.Eater2.eat”:

Let's have a look at the class “com.acme.test.Eater2”. I can see that the threads try to get the fork first but, if it's busy, they take the knife instead. If one thread gets the fork and the other one gets the knife, each will wait for the other to release its resource, and that will never happen:
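A hedged sketch of the pattern this description suggests follows; apart from Resource.getSharedObject, Eater2.eat and the fork and knife, the names and details are assumptions rather than the application's actual code:

    // Hedged sketch: a shared resource whose getSharedObject() parks the caller in
    // WAITING status until release() is called, plus an eater that grabs whichever
    // resource is free first and then waits for the other one.
    class Resource {

        private boolean busy;

        // Take the resource only if it is currently free.
        synchronized boolean tryTake() {
            if (busy) {
                return false;
            }
            busy = true;
            return true;
        }

        // Park in WAITING status until the resource is released, then take it.
        synchronized void getSharedObject() throws InterruptedException {
            while (busy) {
                wait();               // never returns if nobody ever calls release()
            }
            busy = true;
        }

        synchronized void release() {
            busy = false;
            notifyAll();              // wake up any thread waiting for this resource
        }
    }

    class Eater2 implements Runnable {

        private final Resource fork;
        private final Resource knife;

        Eater2(Resource fork, Resource knife) {
            this.fork = fork;
            this.knife = knife;
        }

        @Override
        public void run() {
            try {
                eat();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void eat() throws InterruptedException {
            // Try the fork first; if it is busy, settle for the knife instead.
            Resource first = fork.tryTake() ? fork : knife;
            if (first == knife) {
                knife.getSharedObject();
            }
            Resource second = (first == fork) ? knife : fork;
            second.getSharedObject();   // parks here while the other thread holds it
            // eat a piece of steak, then release everything
            second.release();
            first.release();
        }
    }

With one thread holding the fork and the other holding the knife, both park inside getSharedObject() and neither ever reaches release(), which is exactly the WAITING-style deadlock reported above.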


There are two actions I should take in order to fix this problem:

1) Edit the class “com.acme.test.Eater2” so this lock cannot happen again, following one of these two options:
   1) make the threads get the fork and the knife in the same order
   2) make the threads release the resource they hold if they have been waiting for the other one for a certain period of time (see the sketch below)
2) Remove the current lock

The first action is something I can do using the hot-swap capability of javOSize that we saw previously.
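For the second option (giving up after a bounded wait), one possible shape of the fix, sketched here with the standard java.util.concurrent locks rather than the application's own Resource class, is:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    // Hedged sketch of option 2: try the second resource for a bounded time and, if that
    // fails, release the first one and start over, so no thread waits forever.
    class TimedEater implements Runnable {

        private final ReentrantLock fork;
        private final ReentrantLock knife;

        TimedEater(ReentrantLock fork, ReentrantLock knife) {
            this.fork = fork;
            this.knife = knife;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    fork.lockInterruptibly();
                    try {
                        // Give up after one second instead of waiting forever.
                        if (knife.tryLock(1, TimeUnit.SECONDS)) {
                            try {
                                // eat a piece of steak
                                return;
                            } finally {
                                knife.unlock();
                            }
                        }
                    } finally {
                        fork.unlock();   // let the other thread make progress
                    }
                    // Could not get the knife in time: retry from scratch.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }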

For the second action I will use another capability: executing any arbitrary code in the JVM.

I can see in the previous decompilation that both the fork and the knife can be released using the method “release()”. As a consequence, I will force the release of one of these resources so the thread holding it finishes. Doing so, the other thread will be able to get the second resource and finish as well:

The first thing I have done is set the class loader to the same one our application is using. Then I execute the method “release()” on the knife, so the thread holding the knife releases it and the other one can get the resource and finish. I can finally check that, after executing this code, the deadlock has disappeared.
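To give a rough idea, the executed fragment could look something like the following; it is hypothetical, since it assumes the knife is kept in a static field called “knife” on the Eater2 class, which the application may or may not do:

    // Hypothetical fragment for a code-execution console: class and field names are guesses.
    java.lang.reflect.Field field = Class.forName("com.acme.test.Eater2").getDeclaredField("knife");
    field.setAccessible(true);
    Object knife = field.get(null);                      // read the static field
    knife.getClass().getMethod("release").invoke(knife); // force the release so the waiter can finish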

Deadlock solved.

Note that forcing the release of the knife makes the thread holding it finish in an inconsistent state, because we are forcing it to release the knife before getting the fork, so we are making that thread eat with the fork only. However, it's the only way to unblock the threads.


Memory leaks

Problem signature: This problem is usually evidenced by a progressive CPU increase related not to load but to elapsed time. There is high heap usage where memory is never released, and progressive performance degradation that is fixed only by a restart.

It is the second most impactful problem according to the article about the top 10 performance problems.

Memory leaks are basically a way of unintentionally retaining memory that cannot be released by the garbage collector. There is a lot of controversy about what a leak is in Java, and I do not want to enter that debate right now, so to keep things simple let's say that a leak is when you hold on to memory you no longer need and prevent the GC from doing its work.

The main reason this happens is that I am keeping objects reachable from the GC roots (a GC root is, roughly speaking, a reference that the garbage collector always treats as live). There are two easy ways to do that:

1) I create a static field in a class. A static field is never removed unless I explicitly clear it or the class is unloaded; for the sake of simplicity, let's assume the latter never happens.

2) I have a local variable in a method that never finishes.

The most common pattern is #1, typically when I add elements to a collection that is a static field of some class and forget to remove them.

Let's say, for instance, that I have a servlet where I declare a cache as a static collection. If I don't set a limit for the cache, it may grow indefinitely until the process runs out of memory.
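Sketched as code, the pattern looks like this (a hypothetical servlet written for illustration, not the application's actual class):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical servlet: a static collection that only ever grows, so every element
    // stays reachable from a GC root and the heap fills up over time.
    public class LeakyServlet extends HttpServlet {

        // Static field: everything added here lives until the class is unloaded.
        private static final List<String> USER_IDS = new ArrayList<>();

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            USER_IDS.add(req.getParameter("userId"));   // added on every request, never removed
            resp.getWriter().println("users seen: " + USER_IDS.size());
        }
    }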

javOSize allows me to easily find and fix the memory leaks that are related to oversized static fields.

Let’s say I start suspecting I have a memory leak in my application. I can easily confirm it by looking at the heap usage graphic. The typical pattern it will follow if I have a memory leak is:


Here I can see that my application is consuming more and more memory over time, which means it is not being released somewhere.

There is a preloaded recipe in javOSize’s repository for identifying the top consuming static variables: TOP_FAT_STATIC_VARIABLES.

So I will search for the top 10 memory consuming objects within all the classes in my system:

I can see that there is an object called “userId” in the class “com.acme.test.MemoryLeak.java” consuming nearly 1.4 GB of memory. I can also see that the type of the object is a “java.util.ArrayList”. It really looks like the root cause of my memory leak, but let’s have a look at this class.

Checking the code of “com.acme.test.MemoryLeak.java” I can see that I’m adding content to a static field every time a new user asks for this URL but I never remove it:


I will make a small change to the class so that, before adding any new element to the ArrayList, I check its size and, if it is above a threshold, delete the oldest entries:
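Applied to the hypothetical servlet sketched earlier, the change could look like this (the threshold of 10,000 entries is illustrative only):

    // Cap the list before adding, evicting the oldest entries first.
    private static final int MAX_ENTRIES = 10_000;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        synchronized (USER_IDS) {
            while (USER_IDS.size() >= MAX_ENTRIES) {
                USER_IDS.remove(0);                     // drop the oldest entry
            }
            USER_IDS.add(req.getParameter("userId"));
        }
        resp.getWriter().println("users seen: " + USER_IDS.size());
    }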


If I check the heap usage after the change, I can see that the memory is now released when the JVM runs a garbage collection, so I have fixed the memory leak:

Memory leak solved.


DB pool contention & gridlocks

Problem signature: This problem is evidenced by some user transactions taking longer than usual to complete, or not completing at all, usually together with a drop in CPU usage. It is also common to see the system behaving in bursts.

DB pool contention and gridlocks are quite similar in nature, and similar to deadlocks too. The main difference is that after some time they tend to recover on their own. In the case of DB pool contention this happens for one of the following reasons:

1) The application is not returning the connection to the pool. This is very common when the close call on the connection is not placed inside a finally block (see the sketch after this list).

2) There are too many threads and concurrent users connecting to the database, and the threads take a long time to return their connections.
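For reason #1, the safe pattern is to make sure the connection always goes back to the pool, whatever happens to the query. Here is a hedged sketch; the data source and query are placeholders, not the application's actual code:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import javax.sql.DataSource;

    // try-with-resources closes the connection (i.e. returns it to the pool) on both
    // the success and the failure path, which a bare close() call cannot guarantee.
    public class SafeDbAccess {

        private final DataSource dataSource;

        public SafeDbAccess(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public int check() throws SQLException {
            try (Connection con = dataSource.getConnection();
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT 1")) {
                return rs.next() ? rs.getInt(1) : -1;
            }   // the connection is returned to the pool here, even if the query throws
        }
    }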

Let's say I have some transactions running very slowly. I will look for transactions slower than 500 milliseconds, during a sample period of 10 seconds, that execute any method inside the “com.acme.*” packages:

I can see that when my users ask for “http://localhost:8080/sample/DbAccess” there is a bottleneck: the thread is waiting in “org.apache.tomcat.dbcp.pool.impl.GenericObjectPool.borrowObject”, which is called from “org.apache.tomcat.dbcp.dbcp.BasicDataSource.getConnection”. My application calls this method from “com.acme.test.BBDDutils.check”.

So what I see here is that every time I try to get a connection to the DB I am querying from “com.acme.test.BBDDutils.check”, I have to wait a long time. This is the consequence of a non-existent or undersized DB pool. This is problem #2, which I described at the beginning of this section.

The first thing I can do to fix this problem is check the configuration of the pool. I can do that by inspecting the JMX MBean that stores its configuration:

I can see I have a pool with just one active connection. I can change the maximum number of connections to the DB just by executing a “set” on the attribute I want to change. After doing so, I will check that the configuration of the pool has changed:
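For reference, the same “set” can also be done by hand through plain JMX. The sketch below assumes a Tomcat-style ObjectName for the pool's DataSource MBean and that the attribute is writable, as the “set” above suggests; check the real MBean name in your own server:

    import java.lang.management.ManagementFactory;

    import javax.management.Attribute;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Hedged sketch: the ObjectName below is a guess at how Tomcat registers the pool;
    // "maxActive" is the attribute name used by the Tomcat DBCP 1.x pool shown above.
    public class ResizePool {

        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName pool = new ObjectName(
                    "Catalina:type=DataSource,context=/sample,class=javax.sql.DataSource,name=\"jdbc/sampleDB\"");

            // Raise the cap on concurrent connections to the database.
            server.setAttribute(pool, new Attribute("maxActive", 20));
            System.out.println("maxActive is now " + server.getAttribute(pool, "maxActive"));
        }
    }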


If I access the application after this change, I will see the response time is now much lower.

DB pool contention solved.

Although the article about the top 10 performance problems does not mention slow methods explicitly, using the same methodology I can also identify and fix slow methods caused by the death by a thousand cuts or by thread gridlocks.


Summary

Nowadays, any Java application may easily suffer from thread deadlocks, memory leaks or slow methods (caused, among other things, by database pool contention, thread gridlocks or the death by a thousand cuts). These problems belong to the top 10 performance problems because of the impact they have on your business and the difficulty of troubleshooting them.

And this article has shown, as promised in the title, not only how to find them but also how to fix them, in an easy and fast way.

- javOSize Evangelist -
