Eclipse remote debugging does not work for Hadoop in pseudo-distributed mode
When running Hadoop in standalone mode, I had no problem debugging it remotely from Eclipse. However, when I run Hadoop in pseudo-distributed mode, it doesn’t work. Here’s what I tried in pseudo-distributed mode:
I added a line to my hadoop script like this:
#added this line to enable remote debugging
HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000"
# run it
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
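With the line above in place, every invocation of the hadoop script waits for a debugger because of suspend=y. A minimal sketch of a conditional variant follows, assuming an environment variable named HADOOP_DEBUG_PORT. That name is hypothetical and is not something Hadoop defines.

```shell
# HADOOP_DEBUG_PORT is a hypothetical variable introduced for this sketch;
# it is not a setting that Hadoop itself defines.
jdwp_opts() {
  # Print the JDWP agent option for the given port, or nothing if no port is given.
  if [ -n "${1:-}" ]; then
    printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=$1"
  fi
}

# Append the agent option only when a debug port was requested.
if [ -n "${HADOOP_DEBUG_PORT:-}" ]; then
  HADOOP_OPTS="${HADOOP_OPTS:-} $(jdwp_opts "$HADOOP_DEBUG_PORT")"
fi
```

Under that assumption, setting HADOOP_DEBUG_PORT=5000 in front of the hadoop command would enable the agent for that one run, and leaving it unset would start the JVM normally.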
Then I created a remote debug configuration in Eclipse like this:
I ran the job from the command line and it printed what it should:
Listening for transport dt_socket at address: 5000
Then I went back to Eclipse and ran the debug configuration. It stopped in my main() function as it should:
However, it didn’t hit any breakpoints I set in the mapper or reducer.
What’s the problem here? Why does it work with Hadoop in standalone mode but not in pseudo-distributed mode? Can pseudo-distributed Hadoop be debugged remotely? If not, what is the “correct” way to debug MapReduce code in Eclipse?
Solution
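See Lorand’s comment above. Remote debugging set up this way only works in standalone mode. The debugger attaches to the JVM started by the hadoop script, and in standalone mode that same JVM also runs the mapper and reducer. In pseudo-distributed mode the map and reduce tasks run in separate JVMs, so breakpoints set in them are never reached from this debug session.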
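One way to get standalone behavior for a single run, without editing the configuration files of a pseudo-distributed install, is to override the job tracker and default file system on the command line. The sketch below is a dry run that only prints the command. It assumes the driver goes through ToolRunner (so that -D options are parsed), and the jar name, driver class, and input/output paths are placeholders.

```shell
# Dry-run sketch: prints the command instead of running it.
# JOB_JAR, DRIVER_CLASS, and the input/output paths are placeholders.
JOB_JAR="my-job.jar"
DRIVER_CLASS="MyDriver"

# Hadoop 1.x property names; on Hadoop 2 and later the equivalents are
# mapreduce.framework.name=local and fs.defaultFS=file:///
LOCAL_MODE_ARGS="-D mapred.job.tracker=local -D fs.default.name=file:///"

# Generic -D options go after the driver class and before the job's own arguments.
CMD="hadoop jar $JOB_JAR $DRIVER_CLASS $LOCAL_MODE_ARGS input output"
echo "$CMD"
```

Running the printed command with the JDWP agent enabled in HADOOP_OPTS should keep the mapper and reducer in the same JVM that Eclipse attaches to, which is the situation in which breakpoints are hit.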