I wonder if it is possible to define a hierarchical MapReduce job.
In other words, I want a MapReduce job that launches a different MapReduce job during its mapper phase. Is this possible? Do you have any suggestions for how to do it?
I want to do this in order to have a higher level of parallelism/distribution in my program.
The book "Hadoop: The Definitive Guide" contains many recipes for chaining MapReduce jobs, with sample code and detailed instructions. In particular, look at the section called “Advanced API Usage” or something similar.
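To make the chaining idea concrete, here is a minimal sketch in plain Java, with no Hadoop API involved. It only models the data flow: the output of the first job becomes the input of the second. The class name `ChainedJobsSketch`, the word-count example, and all method names are made up for illustration. In a real Hadoop setup each stage would be a separate job, with the first job's output path used as the second job's input path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of two chained map-reduce stages (not Hadoop code).
final class ChainedJobsSketch {

    // Job 1: "map" each line to (word, 1) pairs, "reduce" by summing per word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Job 2: "map" each (word, count) to (count, word), "reduce" by collecting words per count.
    static Map<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    // Driver: runs the jobs one after another, feeding job 1's output into job 2.
    static Map<Integer, List<String>> runChain(List<String> lines) {
        return groupByCount(countWords(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c a");
        System.out.println(runChain(lines)); // prints {1=[c], 2=[b], 3=[a]}
    }
}
```

Note that the second job is started by the driver once the first has finished, not from inside a mapper. That sequential pattern is what job chaining usually means, and it is generally easier to reason about than nesting one job inside another.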
I personally managed to replace a complex MapReduce job, which used several HBase tables as its source, with a handcrafted
TableInputFormat extension. The result is an input format that combines the source data with minimal reduction, so the whole job becomes a single mapper step. I suggest you look in this direction as well.