Hadoop and Python: Viewing Errors

I’m using Hadoop Streaming to run some Python code. I noticed that if there is a bug in my Python code (e.g. in mapper.py), I am never notified of the actual error. Instead, the mapper never runs and the job is killed after a few seconds. The only error in the logs says that mapper.py failed to run or could not be found, which is obviously not the case.

My question is: is there a specific log file I can examine to see the actual errors raised by the mapper.py code? (For example, one that would tell me if an import fails.)

Thanks!

Edit: Command used:

```shell
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -file /hadoop/mapper.py -mapper /hadoop/mapper.py \
    -file /hadoop/reducer.py -reducer /hadoop/reducer.py \
    -input /hadoop/input.txt -output /hadoop/output
```
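One way to surface such errors before the job ever reaches the cluster is to dry-run the streaming pipeline locally in the shell: map, then sort (simulating the shuffle), then reduce. Any Python traceback then prints straight to your terminal instead of a remote task log. A minimal sketch, assuming a word-count-style job; the mapper.py and reducer.py below are illustrative stand-ins, not the files from the question:

```shell
# Create a tiny sample input.
printf 'hello world\nhello hadoop\n' > input.txt

# Stand-in mapper: emit "word<TAB>1" per word.
cat > mapper.py <<'EOF'
import sys
for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
EOF

# Stand-in reducer: sum counts per word.
cat > reducer.py <<'EOF'
import sys
counts = {}
for line in sys.stdin:
    word, n = line.rstrip("\n").split("\t")
    counts[word] = counts.get(word, 0) + int(n)
for word in sorted(counts):
    print(f"{word}\t{counts[word]}")
EOF

# map -> shuffle/sort -> reduce, all locally; a buggy import or
# syntax error in either script shows its full traceback here.
cat input.txt | python3 mapper.py | sort | python3 reducer.py
```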

For reference, here is a related question about the same kind of error:
Hadoop and NLTK: Fails with stopwords

Solution

Regarding the log question, this post is helpful:

MapReduce: Log file locations for stdout and stderr

I’d guess that if the Python file fails to run, the interpreter prints its traceback to standard error, so you should see it in the stderr log for that task on the node.
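To make sure a failure inside the mapper itself also lands in that stderr log, the mapper can catch any exception, print the full traceback to stderr, and exit non-zero so streaming marks the task as failed. A minimal sketch, assuming a word-count-style mapper; `map_line` is a hypothetical placeholder for your own logic:

```python
import sys
import traceback

def map_line(line):
    # Hypothetical mapper logic; replace with your own.
    for word in line.split():
        print(f"{word}\t1")

def main():
    try:
        for line in sys.stdin:
            map_line(line)
    except Exception:
        # Hadoop Streaming captures each task's stderr, so printing the
        # traceback here makes it visible in that task's stderr log.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

The same pattern applies unchanged to reducer.py.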