Pass custom values to reducer
I need to pass a rowkey to the Reducer, because the rowkey is precomputed and the information it is based on is no longer available at that stage. (The Reducer executes a Put.)
First I tried using plain inner classes, e.g.:
public class MRMine {
    private byte[] rowkey;
    public void start(Configuration c, Date d) {
        // calc rowkey based on date
        TableMapReduceUtil.initTableMapperJob(...);
        TableMapReduceUtil.initTableReducerJob(...);
    }
    public class MyMapper extends TableMapper<Text, IntWritable> {...}
    public class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {...}
}
Both MyMapper and MyReducer define default constructors. However, this approach results in the following exception:
java.lang.RuntimeException: java.lang.NoSuchMethodException: com.mycompany.MRMine$MyMapper.<init>()
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NoSuchMethodException: com.company.MRMine$MyMapper.<init>()
at java.lang.Class.getConstructor0(Class.java:2730)
at java.lang.Class.getDeclaredConstructor(Class.java:2004)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
I can declare the inner classes as static to get rid of the exception. But then I also have to make rowkey static, which is a problem because I'm running multiple jobs in parallel.
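For reference, the exception can be reproduced without Hadoop, because a non-static inner class has no true zero-argument constructor to look up reflectively. A minimal sketch, where InnerClassDemo, InnerMapper and NestedMapper are made-up names standing in for the classes above:

```java
// Hypothetical stand-in for MRMine; plain Java, no Hadoop involved.
class InnerClassDemo {
    // Non-static inner class: the compiler adds a hidden InnerClassDemo
    // parameter to every constructor, so no zero-argument constructor exists.
    class InnerMapper {
        InnerMapper() {}
    }

    // Static nested class: the constructor really takes no arguments.
    static class NestedMapper {
        NestedMapper() {}
    }

    // Returns true if clazz declares a zero-argument constructor, which is
    // what getDeclaredConstructor() in the stack trace above is looking for.
    static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("inner:  " + hasNoArgConstructor(InnerMapper.class));  // inner:  false
        System.out.println("nested: " + hasNoArgConstructor(NestedMapper.class)); // nested: true
    }
}
```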
I found https://stackoverflow.com/a/6739905/1338732, where the Reducer's configure method is overridden, but that method doesn't seem to be available anymore. In any case, I wouldn't be able to pass a value that way.
I'm considering (mis)using the Configuration by just adding a new key-value pair. Is this feasible, and is it the right approach?
Is there a way to pass any custom values to the reducer?
The versions I'm using are hbase 0.94.6.1 and hadoop 1.0.4.
Solution
Your problem statement is a bit unclear, but I think you’re looking for something like this.
The way I currently pass information to the reducer is through the configuration.
In the job setup, do the following:
conf.set("someName","someValue");
This creates a property in the configuration with the name someName and the value someValue. You can retrieve it later in the Mapper/Reducer by doing the following:
Configuration conf = context.getConfiguration();
String someVariable = conf.get("someName");
This sets someVariable to "someValue", which is how the information gets passed to the reducer.
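Since the value in the question is a byte[] rowkey and conf.set takes strings, the rowkey has to be encoded as text first. Here is a minimal sketch of one way to do that, assuming java.util.Base64 (Java 8+) is available. It uses a plain Map as a stand-in for the Hadoop Configuration so that it runs without Hadoop. RowkeyCodec and the property name "mrmine.rowkey" are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a binary rowkey into a String and back.
class RowkeyCodec {
    // Hypothetical property name; any key that does not clash with others works.
    static final String ROWKEY_PROPERTY = "mrmine.rowkey";

    static String encode(byte[] rowkey) {
        return Base64.getEncoder().encodeToString(rowkey);
    }

    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Stand-in for the Hadoop Configuration (assumption: only string set/get is needed).
        Map<String, String> conf = new HashMap<>();

        // Job setup side: corresponds to conf.set(name, value) before the job is submitted.
        byte[] rowkey = {0x00, (byte) 0xFF, 0x2A};
        conf.put(ROWKEY_PROPERTY, encode(rowkey));

        // Reducer side: corresponds to context.getConfiguration().get(name).
        byte[] restored = decode(conf.get(ROWKEY_PROPERTY));
        System.out.println(Arrays.equals(rowkey, restored)); // prints "true"
    }
}
```

As long as each job is set up with its own Configuration, jobs running in parallel each carry their own rowkey, which avoids the problem with the static field.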
To pass multiple values, use setStrings(). I haven't tested this yet, but according to the documentation, one of the following two options should work (the documentation is a bit unclear, so try both and use whichever works):
conf.setStrings("someName","value1,value2,value3");
conf.setStrings("someName","value1","value2","value3");
Retrieve them using:
Configuration conf = context.getConfiguration();
String[] someVariables = conf.getStrings("someName");
Hope that helps