Friday, June 15, 2012

Java – Threading & Synchronization Issues

Of the many issues affecting the performance of Java/.NET applications, synchronization ranks near the top. Issues arising from synchronization are often hard to recognize, and their impact on performance can become significant. What’s more, they are often, at least in principle, avoidable.
The fundamental need to synchronize stems from Java’s support for concurrency, which is implemented by allowing code to execute on separate threads within the same process. Separate threads can share the same resources: objects in memory. This is a very efficient way to get more work done (while one thread waits for an IO operation to complete, another thread gets the CPU to run a computation), but it also exposes the application to interference and consistency problems.
The JVM/CLR does not guarantee an execution order for code running in concurrent threads. If multiple threads reference the same object, there is no telling what state that object will be in at any given moment. The repercussions of that simple fact can be enormous: one thread may run a calculation and return wrong results because a concurrent thread is accessing and modifying the same shared data at the same time.
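To make the hazard concrete, here is a minimal sketch (the class and field names are illustrative, not from any particular application): two threads increment a shared counter without any coordination, and the final value is unpredictable because their read-modify-write steps interleave.

```java
public class RaceConditionDemo {
    private static int counter = 0; // shared state, deliberately unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    counter++; // read-modify-write: not atomic, so updates can be lost
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but the printed value varies from run to run
        // because the two threads interleave their reads and writes.
        System.out.println("counter = " + counter);
    }
}
```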
To prevent such a scenario (a program needs to execute correctly, after all), a programmer uses the “synchronized” keyword to force order on concurrent thread execution. A synchronized method or block prevents multiple threads from holding the same object’s lock at the same time, so only one thread at a time can execute the protected code.
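Applied to the counter example above, a sketch of the fix might look like this: marking the methods synchronized means a thread must acquire the object’s intrinsic lock before running them, so increments can no longer interleave.

```java
public class SynchronizedCounter {
    private int count = 0;

    // A thread must acquire the intrinsic lock on "this" before the method body
    // runs and releases it afterwards, so increments can no longer interleave.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}
```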
In practice, however, this simple mechanism comes with substantial side effects. Modern business applications are typically highly multi-threaded. Many threads execute concurrently and consequently “contend” heavily for shared objects. Contention occurs when a thread wants to access a synchronized object that is already held by another thread. Contending threads effectively “block,” halting their execution until they can acquire the object’s lock. Synchronization thus forces concurrent processing back into sequential execution.
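A hedged illustration of that effect (the service and its 100 ms of “work” are hypothetical): because the whole method is synchronized, ten threads calling it concurrently queue up on the same lock and complete in roughly ten times the single-call duration.

```java
public class ContendedService {
    // The whole method holds the object's lock for the full duration of the "work".
    public synchronized void process() {
        try {
            Thread.sleep(100); // stands in for real work done while the lock is held
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        final ContendedService service = new ContendedService();
        for (int i = 0; i < 10; i++) {
            new Thread(new Runnable() {
                public void run() {
                    // Each thread blocks here until the previous caller releases the lock.
                    service.process();
                }
            }).start();
        }
        // Wall-clock time approaches 10 x 100 ms rather than ~100 ms:
        // synchronization has turned the concurrent calls back into a sequence.
    }
}
```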
With just a few metrics we can show the effects of synchronization on an application’s performance. For instance, take a look at the graph below.


As load increases (number of users = blue), at some point midway the response time (yellow) curves sharply upward, while resource usage (CPU = red) increases somewhat, eventually plateaus, and even recedes. It almost looks like the application is running with the “handbrake on,” a classic, albeit high-level, symptom of an application that has been “over-synchronized.”
With every new version of the JVM/CLR, improvements are made to mitigate this issue. While helpful, however, these improvements cannot fully resolve it or undo the performance penalty the application pays.
Also, developers have come to adopt “defensive” coding practices, synchronizing large pieces of code to prevent possible problems. In large development organizations this problem is magnified further, as no single developer or team has full ownership of an application’s entire code base. The tendency to err on the side of safety quickly escalates, with large portions of synchronized code significantly limiting an application’s potential throughput, as the sketch below illustrates.
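The class, method, and field names here are illustrative. The “defensive” version synchronizes the entire method, so even work that touches no shared state blocks other callers; the narrower version locks only the access to the shared map.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderService {
    private final Map<String, Integer> inventory = new HashMap<String, Integer>();

    // "Defensive" version: the entire method is synchronized, so the validation,
    // which touches no shared state, still blocks every other caller.
    public synchronized void placeOrderCoarse(String sku) {
        validate(sku);
        Integer stock = inventory.get(sku);
        inventory.put(sku, stock == null ? 0 : stock - 1);
    }

    // Narrower version: only the access to the shared map is synchronized,
    // so the validation can run concurrently across threads.
    public void placeOrderFine(String sku) {
        validate(sku);
        synchronized (inventory) {
            Integer stock = inventory.get(sku);
            inventory.put(sku, stock == null ? 0 : stock - 1);
        }
    }

    private void validate(String sku) {
        // placeholder for CPU-bound validation logic
    }
}
```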
It is often too arduous a task to maintain a locking strategy fine-grained enough to ensure that only the necessary minimum of execution paths is synchronized. Newer versions of Java offer better mechanisms for managing state in a concurrent environment, such as ReadWriteLock, but they are not yet widely adopted. These approaches promise a higher degree of concurrency, but it will always be up to the developer to implement and use the mechanism correctly.
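As a rough sketch of how ReadWriteLock from java.util.concurrent.locks can raise concurrency (the cache wrapper itself is illustrative): many readers may hold the read lock simultaneously, while a writer takes the write lock exclusively.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new HashMap<String, String>();

    public String get(String key) {
        lock.readLock().lock();   // shared lock: concurrent readers do not block each other
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();  // exclusive lock: blocks readers and other writers
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```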
Is synchronization, then, always going to result in a high MTTR?
New technologies on the horizon may lend some relief. Software Transactional Memory (STM) systems, for example, might become a powerful weapon for dealing with synchronization issues. They may not be ready for prime time yet, but given what we’ve seen with database systems, they might be the key to taming the concurrency challenges affecting applications today. Check out JVSTM, Multiverse, and Clojure for examples of STMs.
For now, the best development organizations are the ones that can walk the fine line between code review/rewrite burdens and concessions to performance. APM tools can help quite a lot in such scenarios, allowing teams to monitor application execution under high load (aka “in production”) and quickly pinpoint the execution times of highly contended objects, database connections being a prime example. With the right APM in place, the ability to identify thread synchronization issues increases greatly, and overall MTTR drops dramatically.

