Java – Pass custom values to reducer

Passing custom values to the reducer: here is a solution to the problem.

Pass custom values to reducer

I need to pass a rowkey to the Reducer, because the rowkey is precomputed and that information is no longer available at that stage (the Reducer executes a Put).

First I tried to use only inner classes, e.g.:

public class MRMine {
  private byte[] rowkey;
  public void start(Configuration c, Date d) {
    // calculate the rowkey based on the date
    TableMapReduceUtil.initTableMapperJob(...);
    TableMapReduceUtil.initTableReducerJob(...);
  }
  public class MyMapper extends TableMapper<Text, IntWritable> {...}
  public class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {...}
}

Both MyMapper and MyReducer define default constructors. However, this approach results in the following exception:

java.lang.RuntimeException: java.lang.NoSuchMethodException: com.mycompany.MRMine$MyMapper.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NoSuchMethodException: com.company.MRMine$MyMapper.<init>()
    at java.lang.Class.getConstructor0(Class.java:2730)
    at java.lang.Class.getDeclaredConstructor(Class.java:2004)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)

I declared the inner classes as static to get rid of the exception (Hadoop instantiates the Mapper and Reducer through reflection, which needs a no-arg constructor, and a non-static inner class implicitly requires an instance of the enclosing class). But then I also have to make rowkey static, and I'm running multiple jobs in parallel.
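For reference, the static variant looks roughly like this (just a sketch; the shared static field is exactly what becomes a problem when several jobs run in parallel in the same JVM):

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class MRMine {
  // static so the static nested classes can read it,
  // but then it is shared by every job running in the same JVM
  private static byte[] rowkey;

  // static nested classes have a real no-arg constructor,
  // so Hadoop's ReflectionUtils.newInstance can create them
  public static class MyMapper extends TableMapper<Text, IntWritable> { /* ... */ }

  public static class MyReducer
      extends TableReducer<Text, IntWritable, ImmutableBytesWritable> { /* ... */ }
}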

I found https://stackoverflow.com/a/6739905/1338732, where the Reducer's configure method is overridden, but that method doesn't seem to be available anymore, and it wouldn't let me pass a value anyway.

I was thinking of (mis)using the Configuration and just adding a new key-value pair: would this be feasible, and is it the right way?

Is there a way to pass any custom values to the reducer?

The versions I'm using: HBase 0.94.6.1, Hadoop 1.0.4.

Solution

Your problem statement is a bit unclear, but I think you’re looking for something like this.

The way I currently pass information to the reducer is through the Configuration.

In the job setup, do the following:

conf.set("someName","someValue");

This will create an entry in the configuration with the name someName and the value someValue. You can retrieve it later in the Mapper or Reducer by doing the following:

Configuration conf = context.getConfiguration();
String someVariable = conf.get("someName");

The code above sets someVariable to "someValue", so information set when configuring the job is available in the reducer.
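Applied to the rowkey from your question, a minimal sketch could look like the following (the key name "my.job.rowkey" and the table/column names are made up; since a Configuration only stores strings, the byte[] rowkey is encoded with HBase's Bytes.toStringBinary on the driver side and decoded with Bytes.toBytesBinary in the reducer's setup()):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Driver side, before the Job is created (the Job copies the Configuration,
// so values set afterwards are not visible to the tasks):
//   conf.set("my.job.rowkey", Bytes.toStringBinary(rowkey));
//   Job job = new Job(conf, "my job");
//   TableMapReduceUtil.initTableReducerJob("myTable", MyReducer.class, job);

// Top-level (or static nested) class, so Hadoop can instantiate it via reflection.
public class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  private byte[] rowkey;

  @Override
  protected void setup(Context context) {
    // Read back the value the driver put into the configuration.
    rowkey = Bytes.toBytesBinary(context.getConfiguration().get("my.job.rowkey"));
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    // The precomputed rowkey is now available for the Put.
    Put put = new Put(rowkey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes(key.toString()), Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(rowkey), put);
  }
}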

To pass multiple values, use setStrings(). I haven't tested this yet, but according to the documentation you should use one of the following two forms (the documentation is a bit unclear, so try both and use whichever works):

conf.setStrings("someName","value1,value2,value3");
conf.setStrings("someName","value1","value2","value3");

Retrieve them using:

Configuration conf = context.getConfiguration();
String[] someVariable = conf.getStrings("someName");
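
Note that getStrings() returns a String[] (the stored value is split on commas), so a small retrieval sketch could look like this:

Configuration conf = context.getConfiguration();
String[] someValues = conf.getStrings("someName"); // e.g. {"value1", "value2", "value3"}
if (someValues != null) {
  for (String value : someValues) {
    // use each value as needed in the mapper/reducer
    System.out.println(value);
  }
}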

Hope that helps
