Hadoop cluster: map tasks run on only one machine, not all
I have a Hadoop cluster of three machines, one of which acts as both master and slave. When I run the wordcount example, it runs map tasks on two machines, worker1 and worker2. But when I run my own code, it only runs on one machine, worker1. How can I get map tasks to run on all machines?
Input Split Locations
/default-rack/master
/default-rack/worker1
/default-rack/worker2
Fixed !!!
I added the following to mapred-site.xml, and that fixed it:
<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
Solution
How big is your input? Hadoop divides the input into input splits, and if your file is too small, there will be only one split and therefore only one map task.
Try a larger file – say about 1 GB in size – and see how many mappers you get.
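As a rough sketch of why file size matters: the split count follows from the file size, the HDFS block size, and the configured min/max split sizes. The class and method names below are illustrative, not Hadoop API, and the 64 MB block size is an assumption (the old default `dfs.block.size`); the real `FileInputFormat` logic also applies a small slop factor, which is omitted here.

```java
// Simplified sketch of FileInputFormat's split-count arithmetic.
// All sizes are in bytes; values are illustrative assumptions.
public class SplitMath {
    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // number of splits = ceil(fileSize / splitSize)
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long splitSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);

        // A 10 MB file fits in one block -> one split -> one map task:
        System.out.println(numSplits(10L * 1024 * 1024, splitSize));   // 1

        // A 1 GB file spans 16 such blocks -> 16 splits -> 16 map tasks:
        System.out.println(numSplits(1024L * 1024 * 1024, splitSize)); // 16
    }
}
```

With only one split, there is nothing to distribute, so the single map task runs on whichever node gets it; more splits give the scheduler map tasks to hand out across worker1, worker2, and the master.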
You can also check that each TaskTracker is correctly reporting to the JobTracker. A TaskTracker that is not properly connected will not be assigned any tasks:
$ hadoop job -list-active-trackers
This command should output all 3 hosts.