Using multiple mapper inputs in one stream job on hadoop?
In Java I would use:
MultipleInputs.addInputPath(conf, path, inputFormatClass, mapperClass)
which adds a different mapper for each input.
I'm now writing a streaming job in Hadoop in Python. Can I do something similar?
Solution
You can specify the -input option multiple times to add multiple input paths:
hadoop jar hadoop-streaming.jar -input foo.txt -input bar.txt ...
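Streaming gives you only one mapper script, so to emulate per-input mappers you can branch on the source file inside the script. Hadoop streaming exposes the current split's path in an environment variable: mapreduce_map_input_file in newer releases, map_input_file in older ones. A minimal sketch, where the foo/bar branching and the output format are just illustrative:

```python
import os
import sys


def mapper(lines, input_file):
    """Emit (key, value) pairs, dispatching on the name of the source file."""
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if "foo" in input_file:
            # logic for records coming from foo.txt
            results.append(("foo", line))
        else:
            # logic for records coming from bar.txt
            results.append(("bar", line))
    return results


if __name__ == "__main__":
    # Newer Hadoop sets mapreduce_map_input_file; fall back to the old name.
    input_file = os.environ.get(
        "mapreduce_map_input_file", os.environ.get("map_input_file", "")
    )
    for key, value in mapper(sys.stdin, input_file):
        print(f"{key}\t{value}")
```

Run it with both inputs and this single script as the mapper, e.g. `hadoop jar hadoop-streaming.jar -input foo.txt -input bar.txt -mapper mapper.py ...`; each mapper task sees only one split, so the environment variable identifies which file its records came from.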