Java – Eclipse remote debugging does not work with Hadoop in pseudo-distributed mode

When I run Hadoop in standalone mode, remote debugging from Eclipse works without any problem. When I run Hadoop in pseudo-distributed mode, however, it doesn’t. Here’s how I tried Eclipse remote debugging with Hadoop in pseudo-distributed mode:

I added a line to my hadoop script like this:

#added this line to enable remote debugging
HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000"

# run it
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
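
The line above enables the debug agent for every invocation of the script, and with suspend=y each invocation blocks until a debugger attaches. A minimal sketch of a guarded variant follows. The HADOOP_DEBUG_PORT variable and the jdwp_opts helper are hypothetical names introduced here for illustration; they are not part of the stock hadoop script.

```shell
# Hypothetical helper: print the JDWP agent option for the given port,
# or print nothing when the port argument is empty.
jdwp_opts() {
  if [ -n "$1" ]; then
    printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=$1"
  fi
}

# HADOOP_DEBUG_PORT is an assumed, user-defined environment variable.
# When it is unset or empty, HADOOP_OPTS is left unchanged.
debug_opts=$(jdwp_opts "${HADOOP_DEBUG_PORT:-}")
if [ -n "$debug_opts" ]; then
  HADOOP_OPTS="${HADOOP_OPTS:-} $debug_opts"
fi
```

With something like this in the script, setting HADOOP_DEBUG_PORT=5000 for a single command would make only that run wait for a debugger. As with the original line, it affects only the JVM that the script itself launches.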

Then I created a remote debug configuration in Eclipse:

[Screenshot: creating a remote debugging configuration]

I ran the job from the command line, and it printed the expected message:

Listening for transport dt_socket at address: 5000

Then I went back to Eclipse and ran the debug configuration. It stopped in my main() function, as it should.

However, it didn’t hit any breakpoints I set in the mapper or reducer.

What’s the problem here? Why does it work with Hadoop in standalone mode but not in pseudo-distributed mode? Can pseudo-distributed Hadoop be debugged remotely? If not, what is the “correct” way to debug MapReduce code in Eclipse?

Solution

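See Lorand’s comment above. Remote debugging set up this way works only in standalone mode. There, the mapper and reducer run inside the same JVM that the hadoop script launches, so the debugger attached to that JVM sees them. In pseudo-distributed mode the map and reduce tasks run in separate JVMs started by the Hadoop daemons, so a debugger attached to the client JVM stops in main() but never reaches breakpoints in the mapper or reducer.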
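
One possible way to get standalone behaviour for a single run, without editing the cluster configuration files, is to override the job tracker and file system settings on the command line. The sketch below assumes a Hadoop 1.x-style installation, where the relevant properties are mapred.job.tracker and fs.default.name (newer releases use mapreduce.framework.name and fs.defaultFS instead). It also assumes the job’s driver parses generic options through ToolRunner, so that -D is honoured. The local_mode_flags helper, myjob.jar, and MyDriver are hypothetical names.

```shell
# Hypothetical helper: print -D overrides that select the local job runner
# and the local file system (Hadoop 1.x property names, an assumption here).
local_mode_flags() {
  printf '%s %s' "-D mapred.job.tracker=local" "-D fs.default.name=file:///"
}

# Example invocation, left commented out because myjob.jar and MyDriver
# are placeholders for your own job:
# hadoop jar myjob.jar MyDriver $(local_mode_flags) input output
```

With the local job runner, the map and reduce code runs in the same JVM as the driver, so the debug configuration described in the question should reach breakpoints there.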
