Java – Error in Apache Pig when running on YARN “org.apache.hadoop.ipc.Client – Retrying connect to server: tasktracker3/10.201.2.169:50000”

Error in Apache Pig when running on YARN “org.apache.hadoop.ipc.Client – Retrying connect to server: tasktracker3/10.201.2.169:50000”… here is a solution to the problem.


I’m running Apache Pig 0.11.2 and Hadoop 2.2.0.

Most of the simple jobs I run in Pig run fine.

However, whenever I try to use the GROUP BY or LIMIT operator on a large dataset, I get the following connection errors:

    2013-12-18 11:21:28,400 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:29,402 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:30,403 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:30,507 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2013-12-18 11:21:31,703 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:32,704 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:33,705 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:33,809 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2013-12-18 11:21:34,890 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:35,891 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:36,893 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
    2013-12-18 11:21:36,996 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2013-12-18 11:21:37,152 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

Strangely, after these messages persist for a few minutes, they stop and the correct output appears at the bottom.

So Hadoop itself works and computes the correct output; the problem is just that these connection errors keep popping up, which increases the execution time of the script.

One thing I have noticed is that whenever this error appears, the job has created and run multiple JAR files during its execution. A few minutes after these messages pop up, the correct output finally appears.

I have a 5-node cluster: 1 NameNode and 4 DataNodes. All daemons are running fine.

Any suggestions on how to get rid of these messages?

Solution

It looks like your job history server is not running.
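
A quick way to confirm this is to look at the node that should be hosting it. This is only a rough check, assuming the JDK's jps is on the PATH and the history server uses its default ports:

    # Is a JobHistoryServer JVM running on this node?
    jps | grep JobHistoryServer

    # Is anything listening on the default history server ports
    # (10020 for the RPC service, 19888 for the web UI)?
    netstat -tln | grep -E '10020|19888'

If neither command returns anything, the steps below should sort it out.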

  1. Turn on log aggregation (you may have done this already and are just missing the history server). Put the following into your yarn-site.xml:

    <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
    </property>
    
  2. Run the job history server (if it runs on a different host from your Pig client, see the mapred-site.xml note after this list):

    $HADOOP_INSTALL/sbin/mr-jobhistory-daemon.sh start historyserver
    
  3. Try running the Pig script again
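
If the history server runs on a different machine from the one where you submit the Pig script, the MapReduce clients also need to be told where to reach it; otherwise they fall back to the default value of mapreduce.jobhistory.address (0.0.0.0:10020). A minimal sketch for mapred-site.xml, assuming the default ports and using historyserver-host as a placeholder for the node where you started the history server:

    <!-- Where MapReduce clients look for the job history server (RPC) -->
    <property>
       <name>mapreduce.jobhistory.address</name>
       <value>historyserver-host:10020</value>
    </property>
    <!-- Web UI of the job history server -->
    <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>historyserver-host:19888</value>
    </property>

Restart the YARN daemons (and the history server) after changing these files so the new values are picked up.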
