Python – Using multiple mapper inputs in one streaming job on Hadoop?

In Java I would use:

MultipleInputs.addInputPath(conf, path, inputFormatClass, mapperClass)

This adds a different mapper for each input.

I am now writing a Hadoop streaming job in Python. Can I do something similar there?

Solution

You can pass the -input option more than once to specify multiple input paths:

hadoop jar hadoop-streaming.jar -input foo.txt -input bar.txt ...
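Unlike MultipleInputs, a streaming job takes a single -mapper command, so every input path is fed to the same script. One way to get per-input behaviour is to have that script check which file the current split comes from. Below is a minimal sketch of the idea, assuming the job exposes the input file name through the mapreduce_map_input_file environment variable (map_input_file on older releases). The handler functions and the "foo" file-name test are hypothetical placeholders, not part of the original answer.

```python
#!/usr/bin/env python3
"""Single streaming mapper that branches on the input file it is reading."""
import os
import sys


def map_foo(line):
    # Hypothetical logic for records from foo.txt: tag each line with "foo".
    return "foo\t" + line


def map_bar(line):
    # Hypothetical logic for records from bar.txt: tag each line with "bar".
    return "bar\t" + line


def pick_handler(input_path):
    # Choose the per-input logic from the file name of the current split.
    name = os.path.basename(input_path)
    if name.startswith("foo"):
        return map_foo
    return map_bar


def current_input_path(environ=os.environ):
    # Assumption: streaming exports job properties as environment variables
    # with dots replaced by underscores; older releases used map_input_file.
    return environ.get("mapreduce_map_input_file") or environ.get("map_input_file", "")


def main():
    handler = pick_handler(current_input_path())
    for line in sys.stdin:
        print(handler(line.rstrip("\n")))


if __name__ == "__main__":
    main()
```

If this were saved as, say, mapper.py and passed as the -mapper command together with the two -input options, lines from foo.txt would be emitted under the key "foo" and all other lines under "bar", so a reducer could tell the sources apart.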
