Java – Distributed job scheduling, management, and reporting

I recently had a play with Hadoop and was impressed with the scheduling, management, and reporting of its MapReduce jobs. It makes distributing and executing new jobs fairly seamless, leaving the developer free to concentrate on implementing the jobs themselves.

I'm wondering whether there is anything in the Java world for the distributed execution of jobs that are not easily expressed as MapReduce problems. For example:

  • Jobs that require task coordination and synchronization. For example, they may involve sequential execution of tasks, yet some of the tasks could run concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU-intensive tasks that you would like to distribute but that produce no output to reduce, such as image conversion/resizing.
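
To make the dependency graph in the first bullet concrete, here is a minimal sketch of it using only `java.util.concurrent.CompletableFuture` (Java 8+). It runs inside a single JVM, so it only illustrates the coordination I have in mind (B and C wait for A, D is independent, Done waits for everything) and not the distribution across machines that I am actually asking about. The class name `TaskGraphSketch` and the method `runGraph` are made up for this illustration, and each task just records its name instead of doing real work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskGraphSketch {

    // Runs the graph once and returns the task names in the order they executed.
    // Hypothetical helper: real tasks would do actual work instead of logging.
    static List<String> runGraph() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> log = new CopyOnWriteArrayList<>();
        try {
            // A and D have no prerequisites, so both may start immediately.
            CompletableFuture<Void> a = CompletableFuture.runAsync(() -> log.add("A"), pool);
            CompletableFuture<Void> d = CompletableFuture.runAsync(() -> log.add("D"), pool);

            // B and C each depend on A, but may run concurrently with each other.
            CompletableFuture<Void> b = a.thenRunAsync(() -> log.add("B"), pool);
            CompletableFuture<Void> c = a.thenRunAsync(() -> log.add("C"), pool);

            // Done runs only after B, C, and D have all completed.
            CompletableFuture.allOf(b, c, d)
                    .thenRun(() -> log.add("Done"))
                    .join();
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        // The relative order of B, C, and D can differ between runs.
        System.out.println(runGraph());
    }
}
```

The only guarantees are that A is recorded before B and C, and that Done is recorded last. What I am looking for is a framework that offers this kind of ordering while also handling the scheduling, management, and reporting across several nodes.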

So is there a Java framework/platform that provides such a distributed computing environment? Or is this kind of thing acceptable/achievable with Hadoop – and if so, are there any patterns/guidelines for this type of work?

Solution

I’ve found Spring Batch and Spring Batch Integration, which seem to satisfy many of my requirements. I’ll report back on how I get on.
