How to reduce over all key/value pairs in Hadoop
I’m new to Hadoop and I’m trying to write some map/reduce jobs in Java. I would like to know how to run a reduce operation over all key/value pairs.
For example, let’s say we have the highest temperature for each day of a month, with the date as key and the temperature as value. I want to get the key/value pair with the highest temperature for the whole month.
I hope my question is clear!
Thanks for your help.
Solution
Yes, it is possible. Simply configure your job to use a single reducer (for example with job.setNumReduceTasks(1)), which will then iterate over all key/value pairs. In the reduce() method, you only keep track of the maximum value seen so far, while in the cleanup() method, you output the final result. In the example below, (k, v) = (year, temperature).
Example:
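import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {
    // Running maximum across every key seen by this (single) reducer.
    private int year = 0;
    // Start below any real reading; starting at 0.0 would miss all-negative data.
    private double maxTemp = Double.NEGATIVE_INFINITY;

    @Override
    public void reduce(IntWritable key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        for (DoubleWritable value : values) {
            if (value.get() > maxTemp) {
                year = key.get();
                maxTemp = value.get();
            }
        }
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Runs once after the last reduce() call; skip the write if there was no input.
        if (maxTemp != Double.NEGATIVE_INFINITY) {
            context.write(new IntWritable(year), new DoubleWritable(maxTemp));
        }
    }
}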
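To see why carrying the maximum across reduce() calls and writing it in cleanup() gives a global result, here is a minimal sketch in plain Java that mimics that lifecycle without a cluster. It is not Hadoop code: the class MaxTemperatureSketch, its run() helper, and the sample readings are all made up for illustration, and the map stands in for the grouped, key-sorted input that a single reduce task would receive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureSketch {
    // State carried across reduce() calls, as in the reducer above.
    private int bestKey = 0;
    private double maxTemp = 0.0;
    private boolean seenAny = false;

    // Stand-in for Reducer.reduce(): called once per key with that key's values.
    void reduce(int key, List<Double> values) {
        for (double value : values) {
            if (!seenAny || value > maxTemp) {
                bestKey = key;
                maxTemp = value;
                seenAny = true;
            }
        }
    }

    // Stand-in for Reducer.cleanup(): called once, after the last key.
    // Returns the single output record, or an empty string if there was no input.
    String cleanup() {
        return seenAny ? bestKey + "\t" + maxTemp : "";
    }

    // Hypothetical helper: drives both methods the way one reduce task would.
    static String run(Map<Integer, List<Double>> grouped) {
        MaxTemperatureSketch reducer = new MaxTemperatureSketch();
        for (Map.Entry<Integer, List<Double>> entry : grouped.entrySet()) {
            reducer.reduce(entry.getKey(), entry.getValue());
        }
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        // Invented sample data: key = day of month, values = readings for that day.
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        grouped.put(1, Arrays.asList(-3.5, -1.0));
        grouped.put(2, Arrays.asList(4.2));
        grouped.put(3, Arrays.asList(2.0, 3.9));
        System.out.println(run(grouped));
    }
}
```

The sketch only works because one object sees every key. With more than one reducer, each task would emit its own local maximum and a further pass would be needed to combine them.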