The Hadoop MapReduce task fails with exit code 143; here is a solution to the problem.
The Hadoop MapReduce task fails with 143
I’m currently learning to use Hadoop MapReduce (streaming), but I’m getting this error:
packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /tmp/hadoop-unjar4635332780289131423/] [] /tmp/streamjob8641038855230304864.jar tmpDir=null
16/10/31 17:41:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:15 INFO mapred.FileInputFormat: Total input paths to process : 1
16/10/31 17:41:17 INFO mapreduce.JobSubmitter: number of splits:2
16/10/31 17:41:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477933345919_0004
16/10/31 17:41:19 INFO impl.YarnClientImpl: Submitted application application_1477933345919_0004
16/10/31 17:41:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1477933345919_0004/
16/10/31 17:41:19 INFO mapreduce.Job: Running job: job_1477933345919_0004
16/10/31 17:41:38 INFO mapreduce.Job: Job job_1477933345919_0004 running in uber mode : false
16/10/31 17:41:38 INFO mapreduce.Job:  map 0% reduce 0%
16/10/31 17:41:56 INFO mapreduce.Job:  map 100% reduce 0%
16/10/31 17:42:19 INFO mapreduce.Job: Task Id : attempt_1477933345919_0004_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I can’t figure out how to fix this error, and searching online hasn’t helped. The code I used for the mapper is:
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, 1)
The code for the reducer is:
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

if current_word == word:
    print '%s\t%s' % (current_word, current_count)
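As a side note, the mapper/reducer logic above can be sanity-checked without a cluster. The following is a minimal, self-contained Python 3 sketch (the helper names `map_words` and `reduce_counts` are illustrative, not Hadoop APIs) that mirrors the map, shuffle/sort, and reduce steps in memory:

```python
# Local sanity check of the streaming word-count logic (Python 3 sketch).
# map_words / reduce_counts are illustrative helpers, not Hadoop APIs.

def map_words(lines):
    """Emit (word, 1) pairs, like the streaming mapper."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_counts(pairs):
    """Sum counts per word from pairs sorted by word,
    mirroring the streaming reducer's grouping logic."""
    current_word, current_count = None, 0
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield (current_word, current_count)
            current_word, current_count = word, count
    if current_word is not None:
        yield (current_word, current_count)

lines = ["the quick brown fox", "the lazy dog"]
# sorted() stands in for the shuffle/sort phase between map and reduce
result = dict(reduce_counts(sorted(map_words(lines))))
print(result)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

If this runs cleanly but the streaming job still fails, the problem is in the cluster environment (for example, the Python interpreter on the nodes) rather than in the logic itself.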
In order to run the task I’m using:
hduser@master:/opt/hadoop-2.7.3/share/hadoop/tools/lib $ hadoop jar hadoop-streaming-2.7.3.jar -file /home/hduser/mapper.py -mapper "python mapper.py" -file /home/hduser/reducer.py -reducer "python reducer.py" -input ~/testDocument -output ~/results1
Any help would be greatly appreciated as I am new to Hadoop. If you need more logs or information, feel free to ask.
Solution
Review the logs to find the actual error in your Python code. Exit code 143 only means the container was killed with SIGTERM (128 + 15) by the ApplicationMaster; the real failure is the subprocess exiting with code 1, i.e. your Python script crashing. On EMR/YARN you can get the logs from the web UI or from a shell on the cluster's master node, as shown below (your application ID will differ from the one printed at the start of the job). There is a lot of output, so redirect it to a file and search for the Python stack trace.
$ yarn logs -applicationId application_1503951120983_0031 > /tmp/log
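Once the log is dumped, you can narrow it down by scanning for Python error markers instead of reading it all. A small sketch (the function name and marker list are my own, not part of Hadoop or YARN):

```python
def find_python_errors(lines):
    """Return (line_number, text) for lines that look like Python errors.
    The marker list is illustrative; extend it as needed."""
    markers = ("Traceback (most recent call last)", "SyntaxError",
               "ImportError", "ValueError")
    return [(n, line.rstrip()) for n, line in enumerate(lines, 1)
            if any(m in line for m in markers)]

# Usage against the dumped log:
# with open("/tmp/log") as f:
#     for n, text in find_python_errors(f):
#         print(n, text)
```

The first traceback you find usually points at the exact line in mapper.py or reducer.py that made the subprocess exit with code 1.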