Java – Java gets the number of inputs and outputs for MapReduce

I want to get the number of input and output records for the map and reduce phases, as well as the total run time of the MapReduce job, from Java. These statistics are printed to the terminal, but I need to read them in Java code and display them in my own interface, right after this line:

job_blocking.waitForCompletion(true);

Solution

After this line, you can read MAP_INPUT_RECORDS and REDUCE_OUTPUT_RECORDS (as well as MAP_OUTPUT_RECORDS and REDUCE_INPUT_RECORDS) from the job's counters:

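// Requires: import org.apache.hadoop.mapreduce.Counters;
//           import org.apache.hadoop.mapreduce.TaskCounter;
// "job" is the same Job instance that waitForCompletion(true) was called on
// (job_blocking in the question).
// The built-in task counters are looked up through the TaskCounter enum
// (assuming Hadoop 2.x or later), which avoids depending on a group-name
// string that differs between Hadoop versions.
Counters counters = job.getCounters();
long map_input_records = counters
    .findCounter(TaskCounter.MAP_INPUT_RECORDS)
    .getValue();
long map_output_records = counters
    .findCounter(TaskCounter.MAP_OUTPUT_RECORDS)
    .getValue();
long reduce_input_records = counters
    .findCounter(TaskCounter.REDUCE_INPUT_RECORDS)
    .getValue();
long reduce_output_records = counters
    .findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS)
    .getValue();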

As for the time the job takes to run, I don’t know of a simpler way than storing the current time in a long variable before and after execution and taking the difference.
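
The before/after timing described above could look roughly like the sketch below. It is only an illustration, not part of the Hadoop API: JobTimer, Timed, and time are hypothetical names, and the sleeping lambda in main is a stand-in for the blocking call so that the example runs without Hadoop on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Hadoop): times any blocking call. */
final class JobTimer {

    /** Pairs the value returned by the timed call with its elapsed wall-clock time. */
    static final class Timed<T> {
        final T result;
        final long elapsedMillis;

        Timed(T result, long elapsedMillis) {
            this.result = result;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the task once and measures how long it blocks. */
    static <T> Timed<T> time(Callable<T> task) throws Exception {
        long start = System.nanoTime();   // monotonic clock, unaffected by system clock changes
        T result = task.call();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<>(result, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for job.waitForCompletion(true): sleeps briefly and reports success.
        Timed<Boolean> run = time(() -> {
            Thread.sleep(200);
            return true;
        });
        System.out.println("succeeded=" + run.result + ", elapsed=" + run.elapsedMillis + " ms");
    }
}
```

In a real driver, the stand-in lambda would presumably be replaced by something like JobTimer.time(() -> job.waitForCompletion(true)), since waitForCompletion returns a boolean and its checked exceptions fit Callable. System.nanoTime() is used here instead of System.currentTimeMillis() because it is meant for measuring elapsed time; either works for a rough figure.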
