Java – Hadoop: Can you silently discard a failed map task?

Hadoop: Can you silently discard a failed map task?

I’m using Hadoop MapReduce to process large amounts of data. The problem is that corrupted input files occasionally cause a map task to throw Java heap space errors or similar exceptions.

Ideally, I’d like to discard whatever that map task was working on, kill it, and carry on with the rest of the job; the lost data doesn’t matter. I don’t want the whole MapReduce job to fail because of this.

Is this feasible in Hadoop? If so, how can I achieve it?

Solution

You can raise the maximum percentage of map tasks that are allowed to fail. On Hadoop 2.x and later the property is mapreduce.map.failures.maxpercent (older releases used mapred.max.map.failures.percent). Its default value is 0, so a single failed map task (after exhausting its retries) fails the whole job; increasing it allows up to that percentage of map tasks to fail without failing the job.
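For example, you can set the property programmatically on a job’s Configuration. This is only a minimal sketch assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; the 5% threshold and the class/job names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FailureTolerantJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Tolerate up to 5% of failed map tasks before the whole job is marked failed.
        // Property name assumes Hadoop 2.x+; older releases used
        // mapred.max.map.failures.percent.
        conf.setInt("mapreduce.map.failures.maxpercent", 5);

        Job job = Job.getInstance(conf, "failure-tolerant-job");
        // ... configure mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```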

You can set this property cluster-wide in mapred-site.xml, where it applies to all jobs, or on a job-by-job basis, which is usually the safer choice.
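For the job-by-job route with the older org.apache.hadoop.mapred API, JobConf also exposes a typed setter that writes the same property, which avoids mistyping the key. Again a sketch only; the 5% value and class name are illustrative.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FailureTolerantOldApiJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FailureTolerantOldApiJob.class);

        // Typed equivalent of setting the failure-percentage property by hand:
        // allow up to 5% of map tasks to fail without failing the job.
        conf.setMaxMapTaskFailuresPercent(5);

        // ... configure mapper, reducer, input and output paths as usual ...

        JobClient.runJob(conf);
    }
}
```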
