Java – Hadoop second reducer is not called when linking two jobs

I have a Hadoop program in which I want to chain two jobs, i.e. input -> mapper1 -> reducer1 -> mapper2 -> reducer2 -> output. The first job works fine and produces the correct intermediate output. The problem lies in the second job: it seems the second mapper's output never reaches the correct reducer, because I get a type mismatch.
This is the main code I use to set up the two jobs:

    // JOB 1
    Path input1 = new Path(otherArgs.get(0));
    Path output1 = new Path("/tempBinaryPath");

    Job job1 = Job.getInstance(conf);
    job1.setJarByClass(BinaryPathRefined.class);
    job1.setJobName("BinaryPathR1");

    FileInputFormat.addInputPath(job1, input1);
    FileOutputFormat.setOutputPath(job1, output1);

    job1.setMapperClass(MyMapper.class);
    job1.setCombinerClass(MyReducer.class);
    job1.setReducerClass(MyReducer.class);

    job1.setInputFormatClass(TextInputFormat.class);

    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(Text.class);

    job1.waitForCompletion(true);

    // JOB 2
    Path input2 = new Path("/tempBinaryPath/part-r-00000");
    Path output2 = new Path(otherArgs.get(1));

    Job job2 = Job.getInstance(conf2);
    job2.setJarByClass(BinaryPathRefined.class);
    job2.setJobName("BinaryPathR2");

    FileInputFormat.addInputPath(job2, input2);
    FileOutputFormat.setOutputPath(job2, output2);

    job2.setMapperClass(MyMapper2.class);
    job2.setCombinerClass(MyReducer.class);
    job2.setReducerClass(MyReducer2.class);

    job2.setInputFormatClass(TextInputFormat.class);

    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);

    job2.waitForCompletion(true);

The mappers and reducers have the following signatures:

    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    ...
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    ...
    }

    public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable> {
    ...
    }

    public static class MyReducer2 extends Reducer<Text, IntWritable, Text, Text> {
    ...
    }
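(The class bodies are elided in the question. Purely for illustration, a hypothetical pair consistent with the second job's declared type parameters, assuming a simple count-style second pass and the usual java.io / org.apache.hadoop.io / org.apache.hadoop.mapreduce imports, might look like this; it is not the asker's code:)

    // Hypothetical bodies matching the declared generics (illustration only):
    public static class MyMapper2 extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is "key \t value" produced by job 1; emit (key, 1).
            word.set(value.toString().split("\t")[0]);
            context.write(word, ONE);
        }
    }

    public static class MyReducer2 extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each key and write the total as text.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new Text(Integer.toString(sum)));
        }
    }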

The first job runs fine, while the second job has an error:

    Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable

Any ideas?

Solution

When you call only setOutputKeyClass and setOutputValueClass, Hadoop assumes that the mapper and the reducer produce the same output types. Here MyMapper2 emits IntWritable values while the job's output value class is Text, hence the mismatch. You should explicitly set the output types produced by the mapper:

    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
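Put together, a minimal sketch of the corrected job 2 setup might look as follows (variable and class names are taken from the question). Note that the combiner line from the question is dropped here: MyReducer consumes Text values, so it cannot legally combine the IntWritable values emitted by MyMapper2 and would trigger the same mismatch:

    // Sketch of a corrected driver for job 2 (names follow the question's code).
    Job job2 = Job.getInstance(conf2);
    job2.setJarByClass(BinaryPathRefined.class);
    job2.setJobName("BinaryPathR2");

    FileInputFormat.addInputPath(job2, input2);
    FileOutputFormat.setOutputPath(job2, output2);

    job2.setMapperClass(MyMapper2.class);
    // No combiner: MyReducer expects Text values, but MyMapper2 emits IntWritable.
    job2.setReducerClass(MyReducer2.class);

    job2.setInputFormatClass(TextInputFormat.class);

    // Intermediate (map) output types, which differ from the final output types.
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);

    // Final (reducer) output types.
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);

    job2.waitForCompletion(true);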
