23 June, 2011

Beware debugging Java 1.3 synchronised sections!


Pinch point hazard

Single-stepping through interlocking Java threads in JDK 1.3 can result in an irresolvable deadlock. Specifically, it happens when one thread is blocked stepping into a section synchronized on an object while another thread that already holds the lock steps over a wait() on that object.

As a bit of background, I became aware of this problem over the past couple of days while debugging a data-comms protocol - that is, I've been bashing my head against it and wondering why it's not working. The reason for using the Precambrian JDK 1.3 in this age of steam is that some embedded Java environments require it. Java 6 (the latest release) doesn't seem to exhibit this problem (and Java 7 is just around the corner!).

There are possibly other interlock conditions that can cause this problem, and it may well happen in other JDK versions (as I said above, 1.6 seems OK though), but this scenario "works for me". Note that so far I've only tested this on 64-bit Windows 7, in two different variants of the Eclipse IDE.



Here's a demonstration program:

package Scratch;
public class Threadly {

 public static void main(String[] args) throws InterruptedException {
  Worker worker1 = new Worker( "One" );
  Worker worker2 = new Worker( "Two" );
  worker1.start();
  worker2.start();
  worker1.join();
  worker2.join();
 }

 static class Worker extends Thread {
  public Worker( String name ) {
   super( name );
  }
  
  public void run() {
   while (true) {
    synchronized (lock) { // Breakpoint here
     System.out.println(getName() + " inner");
     try {
      lock.wait(1); // Breakpoint here
     } catch (InterruptedException e) {
      // Ignored
     }
    }
    System.out.println(getName() + " outer");
   }
  }
 }

 // Shared monitor that both workers synchronise and wait on
 static final Object lock = new Object();
}

Load this into Eclipse (make sure you have a 1.3 JDK installed and that it is the one used for the project containing this code), put breakpoints in as indicated and "debug as Java application". Step thread "One" into the synchronised section and over the wait() call; this should block, and the debug focus moves to thread "Two". Try to step thread "Two" over the wait() and it should block as well. You should now end up with a debug pane something like this:


The code pane should indicate the program is inside the wait() call:


You can try resuming thread "Two" and the debugger will display the thread as "Running"; however, thread "One" will still be "Stepping". At this stage I can find no way to recover the debug session, and it has to be stopped and restarted to resume testing.

Note that this problem can also occur when stepping into a synchronised section whose lock is held by something else; however, this is not exercised in the procedure above.
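
For anyone who wants to poke at that second case, here is a rough sketch of how it might be set up. It is an assumption on my part that this arrangement triggers the hang; I have not verified it. The names LockHeldDemo, Holder, Entrant and runDemo are made up for illustration, and the syntax is kept to what JDK 1.3 accepts. One thread takes the lock and sleeps while holding it (sleep() does not release the monitor, unlike wait()), and a second thread then tries to enter a section synchronised on the same object.

```java
class LockHeldDemo {

    // Shared monitor, as in the demonstration program
    static final Object lock = new Object();

    // Takes the lock and keeps it for a while, standing in for "something else"
    static class Holder extends Thread {
        private final long holdMillis;
        volatile boolean holding = false;

        Holder(long holdMillis) {
            super("Holder");
            this.holdMillis = holdMillis;
        }

        public void run() {
            synchronized (lock) {
                holding = true;
                try {
                    Thread.sleep(holdMillis); // sleep() keeps the lock held
                } catch (InterruptedException e) {
                    // Ignored
                }
            }
        }
    }

    // Tries to enter the synchronised section while Holder owns the lock
    static class Entrant extends Thread {
        volatile boolean entered = false;

        Entrant() {
            super("Entrant");
        }

        public void run() {
            synchronized (lock) { // Breakpoint here; stepping in is the risky case
                entered = true;
            }
        }
    }

    // Runs both threads and reports whether Entrant eventually got the lock
    static boolean runDemo(long holdMillis) {
        Holder holder = new Holder(holdMillis);
        Entrant entrant = new Entrant();
        try {
            holder.start();
            while (!holder.holding) {
                Thread.sleep(1); // wait until Holder really owns the lock
            }
            entrant.start();
            holder.join();
            entrant.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return entrant.entered;
    }

    public static void main(String[] args) {
        System.out.println("Entrant got the lock: " + runDemo(200));
    }
}
```

Run normally, Entrant simply blocks until Holder lets go. To try it under the debugger, pick a hold time long enough to step by hand (several seconds, say) and step Entrant into the synchronised section while Holder is still asleep.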

Workaround:
The trick is not to step into a synchronised section or step over a wait(). Instead, place a new breakpoint after the synchronisation edge and just run through the critical synchronisation (i.e. "Resume" a thread stopped ahead of the synch operation). In the case of the example above, this would be a breakpoint on the second println() call.
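
To make the breakpoint placement concrete, here is a minimal sketch of the same worker with the safe stopping points marked in comments. It is a variation on the demonstration program rather than anything new: the class name ThreadlyWorkaround, the bounded loop, the rounds parameter and the runWorkers helper are my own additions so that the program terminates, and the syntax is kept to what JDK 1.3 accepts.

```java
class ThreadlyWorkaround {

    // Shared monitor, as in the demonstration program
    static final Object lock = new Object();

    static class Worker extends Thread {
        private final int rounds;
        private int completed = 0;

        Worker(String name, int rounds) {
            super(name);
            this.rounds = rounds;
        }

        public void run() {
            for (int i = 0; i < rounds; i++) {
                // If stopped ahead of this point, use "Resume", not step
                synchronized (lock) {
                    System.out.println(getName() + " inner");
                    try {
                        lock.wait(1); // Do not step over this in the debugger
                    } catch (InterruptedException e) {
                        // Ignored
                    }
                }
                // Breakpoint here: the lock has been released by now
                System.out.println(getName() + " outer");
                completed++;
            }
        }

        int getCompleted() {
            return completed;
        }
    }

    // Runs two workers to completion and returns the total rounds finished
    static int runWorkers(int rounds) {
        Worker one = new Worker("One", rounds);
        Worker two = new Worker("Two", rounds);
        one.start();
        two.start();
        try {
            one.join();
            two.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return one.getCompleted() + two.getCompleted();
    }

    public static void main(String[] args) {
        System.out.println("Completed rounds: " + runWorkers(3));
    }
}
```

With the only breakpoint on the "outer" println(), each "Resume" carries a thread through the synchronised section and the wait() in one go, so the debugger never has to step across the lock.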


Hopefully this can save you some hair-pulling in the future.