Java – Submit multiple Hadoop jobs via Java


I need to submit multiple jobs to Hadoop that are all related (which is why they are started by the same driver class) but completely independent of each other. Right now I launch each job like this:

int res = ToolRunner.run(new Configuration(), new MapReduceClass(params), args);

It runs a job, waits for it to finish, gets the return code, and continues.

What I want to do is submit several of these jobs to run in parallel, retrieving the return code for each job.

The obvious idea (to me) is to start multiple threads, each responsible for one Hadoop job, but I wonder whether Hadoop offers a better way to accomplish this. I have no experience writing concurrent code, so I'd rather not spend a lot of time learning its intricacies unless necessary.
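The thread-per-job idea needs very little custom concurrency code if you hand it to an `ExecutorService`. A minimal sketch (the class name `ParallelJobs` and the placeholder exit codes are mine; in a real driver each task body would be the `ToolRunner.run(...)` call shown above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelJobs {
    /** Runs every task on its own thread and returns their exit codes in submission order. */
    public static List<Integer> runAll(List<Callable<Integer>> jobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(jobs.size());
        try {
            List<Integer> codes = new ArrayList<>();
            // invokeAll blocks until every task has completed
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                codes.add(f.get()); // f.get() rethrows if the task threw
            }
            return codes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            jobs.add(() -> {
                // In the real driver this body would be:
                // return ToolRunner.run(new Configuration(), new MapReduceClass(params), args);
                return 0; // placeholder exit code standing in for the Hadoop call
            });
        }
        System.out.println("exit codes: " + runAll(jobs)); // prints "exit codes: [0, 0, 0]"
    }
}
```

Each `Callable<Integer>` wraps one independent job, and the `Future` results give you exactly the per-job return codes the question asks for, without writing any locking code yourself.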

Solution

This is more of a suggestion, but it involves code, so I'll post it as an answer.

In the code below (personal code), I iterate over some variables and submit the same job multiple times.

Note that job.waitForCompletion(verbose) blocks until the job finishes no matter what the boolean is; it only controls whether progress is logged. If you want to submit several jobs and let them run in parallel, use job.submit() instead, then check job.isComplete() and job.isSuccessful() later.

while (processedInputPaths < inputPaths.length) {
    // Take at most inputPathsLimit input paths per cycle
    if (processedInputPaths + inputPathsLimit < inputPaths.length) {
        end = processedInputPaths + inputPathsLimit - 1;
    } else {
        end = inputPaths.length - 1;
    }
    start = processedInputPaths;

    Job job = this.createJob(configuration, inputPaths, cycle, start, end,
            outputPath + "/" + cycle);

    // Blocks until this cycle's job finishes; 'true' only enables verbose logging
    boolean success = job.waitForCompletion(true);

    if (success) {
        cycle++;
        processedInputPaths = end + 1;
    } else {
        LOG.info("Cycle did not end successfully: " + cycle);
        return -1;
    }
}
