Java – How to turn off hadoop speculative execution from Java

After reading Hadoop speculative task execution, I’m trying to turn off speculative execution using the new Java API, but it has no effect.

This is my main class:

public class Main {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // old API:
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        // new API:
        conf.setBoolean("mapreduce.map.speculative", false);

        int res = ToolRunner.run(conf, new LogParserMapReduce(), args);
        System.exit(res);
    }
}

Here’s how my MapReduce job’s run() method starts:

@Override
public int run(String[] args) throws Exception {
    Configuration conf = super.getConf();

    /*
     * Instantiate a Job object for your job's configuration.
     */
    Job job = Job.getInstance(conf);

But when I look at the logs, I see:

2014-04-24 10:06:21,418 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat (main): Total input paths to process : 16
2014-04-24 10:06:21,574 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): number of splits:26

If I understand correctly, this means speculative execution is still running; otherwise, with only 16 input files, why would there be 26 splits? Am I wrong?

Note: I believe I’m using the new API, because I see warnings like this in the logs:

2014-04-24 10:06:21,590 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files

Solution

“16 files = 16 mappers” — this is a false assumption.

“16 files = at least 16 mappers” — this is correct.

If some of the 16 files are larger than the HDFS block size, each of those files is divided into multiple input splits, and each split gets its own mapper. So 16 files producing 26 splits is most likely the result of large files being split, not of speculative execution.
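To see how 16 files can yield 26 splits, here is a minimal, self-contained sketch of the arithmetic. The file sizes and the 128 MB block size are hypothetical (not from the original post), and it ignores the small slop factor FileInputFormat applies before splitting the last partial block:

```java
public class SplitCount {
    // Minimum number of map tasks: each file contributes roughly
    // ceil(fileSize / blockSize) input splits.
    static long countSplits(long[] fileSizes, long blockSize) {
        long splits = 0;
        for (long size : fileSizes) {
            splits += (size + blockSize - 1) / blockSize; // ceiling division
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // hypothetical 128 MB block size
        long[] sizes = new long[16];     // 16 hypothetical input files
        for (int i = 0; i < 10; i++) sizes[i] = 64L * 1024 * 1024;   // 1 split each
        for (int i = 10; i < 14; i++) sizes[i] = 300L * 1024 * 1024; // 3 splits each
        sizes[14] = 200L * 1024 * 1024; // 2 splits
        sizes[15] = 256L * 1024 * 1024; // 2 splits
        System.out.println(countSplits(sizes, block)); // prints 26
    }
}
```

With this mix of sizes, 16 files produce 10 + 12 + 2 + 2 = 26 splits, matching the number in the log without any speculative tasks being involved.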

Setting the value in the Configuration does work; you can verify it by checking the job.xml of the submitted job.
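As a side note (not from the original answer), the new API also exposes typed setters on Job, which avoids hard-coding property names, and you can read the value back from the job’s Configuration — the same value job.xml would show. A configuration sketch, assuming a Hadoop 2.x classpath; the class name here is illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationOff {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setMapSpeculativeExecution(false);    // sets mapreduce.map.speculative
        job.setReduceSpeculativeExecution(false); // sets mapreduce.reduce.speculative

        // Read the value back to confirm it took effect:
        boolean mapSpec = job.getConfiguration()
                             .getBoolean("mapreduce.map.speculative", true);
        System.out.println("map speculation enabled: " + mapSpec);
    }
}
```

Note that Job.getInstance(conf) copies the Configuration, so any settings must be made before creating the Job (or via the Job object itself, as above).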
