Java – Hadoop options have no effect (mapreduce.input.lineinputformat.linespermap, mapred.max.map.failures.percent)

I’m trying to implement a MapReduce job in which each mapper takes 150 lines of a text file and all mappers run at the same time. Also, the job should not fail no matter how many map tasks fail.

Here is the configuration part:

        JobConf conf = new JobConf(Main.class);
        conf.setJobName("My mapreduce");

        conf.set("mapreduce.input.lineinputformat.linespermap", "150");
        conf.set("mapred.max.map.failures.percent", "100");

        conf.setInputFormat(NLineInputFormat.class);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

The problem is that Hadoop creates a mapper for each line of text, the mappers seem to run sequentially, and if one fails, the job fails.

From this, I infer that the settings I applied have no effect.

What am I doing wrong?

Solution

I’m assuming you’re using Hadoop 0.20. In 0.20, NLineInputFormat reads the configuration parameter "mapred.line.input.format.linespermap", but you are setting "mapreduce.input.lineinputformat.linespermap". When the expected parameter is not set, it defaults to 1, which is why you see one mapper per line, as described in the question.

This is the code snippet for 0.20 NLineInputFormat.

        public void configure(JobConf conf) {
            N = conf.getInt("mapred.line.input.format.linespermap", 1);
        }
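To see why the setting silently has no effect, here is a minimal sketch of the lookup, runnable without Hadoop. It assumes a plain Map in place of JobConf. The class LinesPerMapLookup and its getInt helper are hypothetical stand-ins written for illustration, not Hadoop code; only the two key names come from the question and the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

final class LinesPerMapLookup {
    // Key read by the 0.20 NLineInputFormat, per the snippet above.
    static final String KEY_0_20 = "mapred.line.input.format.linespermap";
    // Key set in the question; the 0.20 class never reads it.
    static final String KEY_IN_QUESTION = "mapreduce.input.lineinputformat.linespermap";

    // Hypothetical stand-in for JobConf.getInt(name, defaultValue):
    // returns the default when the key is absent.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Mirrors the lookup in configure(): N = getInt(KEY_0_20, 1).
    static int linesPerMap(Map<String, String> conf) {
        return getInt(conf, KEY_0_20, 1);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Setting only the key from the question: the lookup misses it.
        conf.put(KEY_IN_QUESTION, "150");
        System.out.println(linesPerMap(conf)); // prints 1

        // Setting the key the 0.20 class actually reads.
        conf.put(KEY_0_20, "150");
        System.out.println(linesPerMap(conf)); // prints 150
    }
}
```

An unrecognized key is not an error. It is stored and then ignored, so the job falls back to the default of one line per mapper without any warning.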

Hadoop configuration can be painful and is not always properly documented, and parameter names sometimes change between versions. When you are unsure about a configuration parameter, the best thing to do is to look at the source code for the version you are running.
