Java – With Hadoop, how do I change the number of mappers for a given job?

So I have two jobs, Job A and Job B. For Job A, I want up to 6 mappers per node. Job B is a bit different: I can only run one mapper per node. The reason for this isn't important; let's just say the requirement is non-negotiable. I want to tell Hadoop, “For Job A, schedule up to 6 mappers per node. For Job B, schedule at most 1 mapper per node.” Is this possible?

The only solution I can think of is:

1) Create two folders outside the Hadoop home folder, conf.JobA and conf.JobB, each with its own copy of mapred-site.xml. conf.JobA/mapred-site.xml sets mapred.tasktracker.map.tasks.maximum to 6; conf.JobB/mapred-site.xml sets it to 1.

2) Before I run Job A:

2a) Shut down my tasktrackers

2b) Copy conf.JobA/mapred-site.xml into the Hadoop conf folder, replacing the existing mapred-site.xml there

2c) Restart my tasktrackers

2d) Wait for the tasktrackers to finish starting up

3) Run Job A

Then do the same with conf.JobB when I need to run Job B.
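The file-handling part of this workaround (steps 1 and 2b) can at least be scripted. Below is a minimal JDK-only sketch; the class and method names are hypothetical, while the folder names conf.JobA and conf.JobB come from the question. In classic MapReduce (MRv1) each TaskTracker reads mapred.tasktracker.map.tasks.maximum from its local configuration when the daemon starts, which is why the tasktrackers have to be restarted after the swap.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for the manual workaround: covers steps 1 and 2b only.
// Stopping and restarting the tasktrackers (2a, 2c, 2d) must still be done
// separately, on every tasktracker node.
public class JobConfSwitcher {

    static final String FILE = "mapred-site.xml";

    // Step 1: a minimal mapred-site.xml overriding only the map-slot maximum.
    // A real file would also carry the cluster's other settings.
    static String mapredSiteXml(int maxMapsPerNode) {
        return "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property>\n"
            + "    <name>mapred.tasktracker.map.tasks.maximum</name>\n"
            + "    <value>" + maxMapsPerNode + "</value>\n"
            + "  </property>\n"
            + "</configuration>\n";
    }

    // Step 1: writes <baseDir>/<folder>/mapred-site.xml and returns the folder.
    static Path writeJobConf(Path baseDir, String folder, int maxMapsPerNode) throws IOException {
        Path dir = Files.createDirectories(baseDir.resolve(folder));
        Files.write(dir.resolve(FILE), mapredSiteXml(maxMapsPerNode).getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // Step 2b: copies <jobConfDir>/mapred-site.xml over the active one in hadoopConfDir.
    static Path activate(Path jobConfDir, Path hadoopConfDir) throws IOException {
        Path source = jobConfDir.resolve(FILE);
        if (!Files.isRegularFile(source)) {
            throw new IOException("missing " + source);
        }
        // REPLACE_EXISTING overwrites the currently active file, as in step 2b.
        return Files.copy(source, hadoopConfDir.resolve(FILE), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Usage (paths are assumptions): java JobConfSwitcher <baseDir> <hadoopConfDir> A|B
        Path base = Paths.get(args[0]);
        Path jobA = writeJobConf(base, "conf.JobA", 6); // up to 6 mappers per node
        Path jobB = writeJobConf(base, "conf.JobB", 1); // at most 1 mapper per node
        Path active = activate("B".equals(args[2]) ? jobB : jobA, Paths.get(args[1]));
        System.out.println("Activated " + active + "; now restart the tasktrackers.");
    }
}
```

Even scripted, the copy has to be repeated on every tasktracker node, and steps 2a, 2c and 2d remain manual.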

I don’t really like this solution; it seems clumsy and error-prone. Is there a better way to do this?

Solution

In the Java code of your custom jar, you can set the configuration property mapred.tasktracker.map.tasks.maximum separately for each of the two jobs.

Do something like this:

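import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Inside run(String[] args) of a Tool implementation, so getConf() is available
Configuration conf = getConf();

// Set the maximum number of concurrent map tasks per tasktracker for this job
conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);

Job job = new Job(conf);

job.setJarByClass(MyMapRed.class);
job.setJobName(JOB_NAME);

job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(MapJob.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setReducerClass(ReduceJob.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.setInputPaths(job, args[0]);
// An output directory is required; without it the job is rejected at submission
FileOutputFormat.setOutputPath(job, new Path(args[1]));

boolean success = job.waitForCompletion(true);
return success ? 0 : 1;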

Edit:

You also need to set the property mapred.map.tasks to a value derived from the following formula: mapred.tasktracker.map.tasks.maximum * (number of tasktracker nodes in your cluster).
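With the per-node limits from the question, the formula works out as in the sketch below. The class name, the method name and the 10-node cluster are assumptions for illustration. Note that mapred.map.tasks is generally treated as a hint: the number of map tasks actually launched follows the number of input splits the InputFormat produces.

```java
// Hypothetical helper applying the formula from the edit above:
// mapred.map.tasks = mapred.tasktracker.map.tasks.maximum * number of tasktracker nodes.
public class MapTaskHint {

    static int mapTasksHint(int maxMapsPerTaskTracker, int taskTrackerNodes) {
        if (maxMapsPerTaskTracker < 1 || taskTrackerNodes < 1) {
            throw new IllegalArgumentException("both values must be positive");
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact(maxMapsPerTaskTracker, taskTrackerNodes);
    }

    public static void main(String[] args) {
        int nodes = 10; // assumption: a 10-node cluster
        System.out.println("Job A: " + mapTasksHint(6, nodes)); // prints "Job A: 60"
        System.out.println("Job B: " + mapTasksHint(1, nodes)); // prints "Job B: 10"
        // In the driver, the result would then be applied with something like:
        // conf.setInt("mapred.map.tasks", mapTasksHint(6, nodes));
    }
}
```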
