Unable to access counters in MapReduce’s reducer class
I increment a counter in my mapper like this:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, FloatWritable> {

    public static enum MyCounters { TOTAL };

    // inside map(), for each record:
    context.getCounter(MyCounters.TOTAL).increment(1);
}
I’m trying to read the value of this counter in the reducer’s setup() method, in the following way:
@Override
public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Cluster cluster = new Cluster(conf);
    Job currentJob = cluster.getJob(context.getJobID());
    Counters counters = currentJob.getCounters();
    Counter counter = counters.findCounter(TokenizerMapper.MyCounters.TOTAL);
}
But when I run the code, it always throws a java.lang.NullPointerException, because
cluster.getJob(context.getJobID())
always returns null.
I tried other ways to access, from the reducer, the counter incremented in the mapper, but without success.
Can someone explain what exactly the problem is and how I can access the counter from the reducer? I need the total count to calculate the percentage of each word.
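For context, the percentage computation itself is just a ratio. A minimal self-contained sketch (hypothetical numbers, no Hadoop dependency) of what would be done once the total is available:

```java
public class PercentageSketch {
    // Hypothetical counts: occurrences of one word, and the global total.
    static float percentage(long wordCount, long totalWords) {
        // Multiply by 100f first so the division happens in floating point.
        return 100f * wordCount / totalWords;
    }

    public static void main(String[] args) {
        System.out.println(percentage(25, 200)); // prints 12.5
    }
}
```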
Here is my driver code.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(FloatWritable.class);
job.setNumReduceTasks(1);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
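As an aside, once waitForCompletion() has returned, the aggregated counter value is reliably available in the driver via Job.getCounters(). The last line of the driver could be restructured along these lines (a sketch, assuming the MyCounters enum from the mapper above):

```java
// Run the job, then read the aggregated counter in the driver.
boolean ok = job.waitForCompletion(true);
if (ok) {
    long total = job.getCounters()
                    .findCounter(TokenizerMapper.MyCounters.TOTAL)
                    .getValue();
    System.out.println("Total words counted: " + total);
}
System.exit(ok ? 0 : 1);
```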
Solution
I’m using Hadoop 2.7.0.
You do not need to instantiate a Cluster to access counters in the reducer.
In my mapper I have the following code:
// Define an enum in the Mapper class
enum CustomCounter { Total };

In the map() method, I increment the counter for each record:

context.getCounter(CustomCounter.Total).increment(1);
In my reducer, I access the counter as follows:

Counter counter = context.getCounter(CustomCounter.Total);

This works perfectly for me.
Here are my maven dependencies:
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
</dependencies>