The Hadoop MapReduce task fails with exit code 143; here is a solution to the problem.
The Hadoop MapReduce task fails with 143
I’m currently learning to use Hadoop MapReduce (streaming), but I’m getting this error:
packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /tmp/hadoop-unjar4635332780289131423/] [] /tmp/streamjob8641038855230304864.jar tmpDir=null
16/10/31 17:41:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:15 INFO mapred.FileInputFormat: Total input paths to process : 1
16/10/31 17:41:17 INFO mapreduce.JobSubmitter: number of splits:2
16/10/31 17:41:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477933345919_0004
16/10/31 17:41:19 INFO impl.YarnClientImpl: Submitted application application_1477933345919_0004
16/10/31 17:41:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1477933345919_0004/
16/10/31 17:41:19 INFO mapreduce.Job: Running job: job_1477933345919_0004
16/10/31 17:41:38 INFO mapreduce.Job: Job job_1477933345919_0004 running in uber mode : false
16/10/31 17:41:38 INFO mapreduce.Job:  map 0% reduce 0%
16/10/31 17:41:56 INFO mapreduce.Job:  map 100% reduce 0%
16/10/31 17:42:19 INFO mapreduce.Job: Task Id : attempt_1477933345919_0004_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I can’t figure out how to fix this error, and searching online hasn’t helped. The code I used for the mapper is:
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, 1)
The code for the reducer is:
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

if current_word == word:
    print '%s\t%s' % (current_word, current_count)
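As a side note, the mapper/reducer logic above can be sanity-checked without a cluster. The following is a minimal, self-contained Python 3 sketch (the helper names `map_words` and `reduce_counts` are illustrative, not Hadoop APIs) that mirrors the map, shuffle/sort, and reduce steps in memory:

```python
# Local sanity check of the streaming word-count logic (Python 3 sketch).
# map_words / reduce_counts are illustrative helpers, not Hadoop APIs.

def map_words(lines):
    """Emit (word, 1) pairs, like the streaming mapper."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_counts(pairs):
    """Sum counts per word from pairs sorted by word,
    mirroring the streaming reducer's grouping logic."""
    current_word, current_count = None, 0
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield (current_word, current_count)
            current_word, current_count = word, count
    if current_word is not None:
        yield (current_word, current_count)

lines = ["the quick brown fox", "the lazy dog"]
# sorted() stands in for the shuffle/sort phase between map and reduce
result = dict(reduce_counts(sorted(map_words(lines))))
print(result)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

If this runs cleanly but the streaming job still fails, the problem is in the cluster environment (for example, the Python interpreter on the nodes) rather than in the logic itself.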
In order to run the task I’m using:
hduser@master:/opt/hadoop-2.7.3/share/hadoop/tools/lib $ hadoop jar hadoop-streaming-2.7.3.jar -file /home/hduser/mapper.py -mapper "python mapper.py" -file /home/hduser/reducer.py -reducer "python reducer.py" -input ~/testDocument -output ~/results1
Any help would be greatly appreciated as I am new to Hadoop. If you need more logs or information, feel free to ask.
Solution
Review the logs to find the actual error in your Python code. Exit code 143 only means the container was killed with SIGTERM (128 + 15) by the ApplicationMaster; the real failure is the subprocess exiting with code 1, i.e. your Python script crashing. On EMR/YARN you can get the logs from the web UI or from a shell on the cluster's master node, as shown below (your application ID will differ from the one printed at the start of the job). There is a lot of output, so redirect it to a file and search for the Python stack trace.
$ yarn logs -applicationId application_1503951120983_0031 > /tmp/log
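Once the log is dumped, you can narrow it down by scanning for Python error markers instead of reading it all. A small sketch (the function name and marker list are my own, not part of Hadoop or YARN):

```python
def find_python_errors(lines):
    """Return (line_number, text) for lines that look like Python errors.
    The marker list is illustrative; extend it as needed."""
    markers = ("Traceback (most recent call last)", "SyntaxError",
               "ImportError", "ValueError")
    return [(n, line.rstrip()) for n, line in enumerate(lines, 1)
            if any(m in line for m in markers)]

# Usage against the dumped log:
# with open("/tmp/log") as f:
#     for n, text in find_python_errors(f):
#         print(n, text)
```

The first traceback you find usually points at the exact line in mapper.py or reducer.py that made the subprocess exit with code 1.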