Python – Example Hive map-reduce script in Python throws exceptions



I’m learning Hive. I set up a table called records. The schema is as follows:

year        : string
temperature : int
quality     : int

Here are some example rows:

1999 28 3
2000 28 3
2001 30 2

Now I’ve written a sample map-reduce script in Python, exactly as described in the book Hadoop: The Definitive Guide:

import re
import sys

for line in sys.stdin:
    (year,tmp,q) = line.strip().split()
    if (tmp != '9999' and re.match("[01459]",q)):
        print "%s\t%s" % (year,tmp)

I ran it with the following command:

ADD FILE /usr/local/hadoop/programs/sample_mapreduce.py;
SELECT TRANSFORM(year, temperature, quality)
USING 'sample_mapreduce.py'
AS year,temperature;

Execution failed. This is the terminal output:

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-08-23 18:30:28,506 Stage-1 map = 0%,  reduce = 0%
2012-08-23 18:30:59,647 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201208231754_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201208231754_0005_m_000002 (and more) from job job_201208231754_0005
Exception in thread "Thread-103" java.lang.RuntimeException: Error while reading from task log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://master:50060/tasklog?taskid=attempt_201208231754_0005_m_000000_2&start=-8193
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more

I went to the list of failed jobs, and this is the stack trace:

java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    ... 8 more

The same stack trace was repeated more than three times.

Please, can someone help me with this? What’s the problem here? I followed the book exactly, yet something seems wrong. There appear to be two different errors: on the terminal it says it cannot read from the task log URL, while the exception in the failed-jobs list says something different. Please help.

Solution

I went to the stderr log in the Hadoop admin interface and found that the Python script was throwing an error. Then I realized that when I created the Hive table, the field delimiter was a tab character, but I hadn’t specified it in split(). Once I changed it to split('\t'), it worked fine!
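The corrected script can be sketched as below (updated to Python 3 syntax; the process helper and the malformed-row guard are additions for illustration, not part of the original):

```python
import re
import sys


def process(stream, out):
    """Read tab-delimited (year, temperature, quality) rows and emit
    year/temperature pairs for valid readings."""
    for line in stream:
        # Split on the table's tab delimiter explicitly, not on
        # arbitrary whitespace.
        fields = line.strip().split('\t')
        if len(fields) != 3:
            continue  # skip malformed rows instead of raising ValueError
        year, tmp, q = fields
        # Drop missing readings (9999) and bad quality codes.
        if tmp != '9999' and re.match('[01459]', q):
            out.write('%s\t%s\n' % (year, tmp))


if __name__ == '__main__':
    process(sys.stdin, sys.stdout)
```

Writing the loop as a function also makes the filtering logic easy to test outside of Hive by feeding it an in-memory stream.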
