Do Mappers and Reducers in Hadoop have to be static classes?
I was trying out something simple in Hadoop and noticed that in every example, mappers and reducers are defined as static classes. My task will be broken into several map phases and a final reduce. If I define a mapper as a static nested class, can I reuse it in other jobs? Also, real problems tend to need more and more complex mappers, so putting them all into one huge file becomes a maintenance burden.
Is there any way to make the mappers and reducers regular top-level classes, maybe even in a separate JAR from the job itself?
Solution
There are really four questions here: does the class have to be static, can it be static, can it be a nested class, and should it be?
Hadoop itself needs to be able to instantiate your Mapper or Reducer through reflection, given the class reference/name configured in your job. If it is a non-static inner class, this will fail, because an instance can only be created in the context of an instance of the enclosing class, which Hadoop has no way of knowing about. (Unless, I suppose, the inner class extends its enclosing class.)
So, to answer the first question: it must not be a non-static inner class, since that would almost certainly make it unusable. To answer the second and third: yes, it can be a static nested class.
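A minimal sketch of why reflection draws this distinction (the class names here are hypothetical, but the mechanism is the same one Hadoop relies on): a static nested class has an implicit no-arg constructor that reflection can call, while a non-static inner class's constructor secretly takes an instance of the enclosing class.

```java
public class Outer {
    // Static nested class: reflection can instantiate it with no
    // enclosing instance, just like a top-level class.
    public static class StaticNested { }

    // Non-static inner class: its real constructor signature is
    // Inner(Outer), so a no-arg reflective lookup fails.
    public class Inner { }

    public static void main(String[] args) throws Exception {
        // Works -- this is essentially what Hadoop does with the
        // class name you configure on the job.
        Object ok = Class.forName("Outer$StaticNested")
                         .getDeclaredConstructor().newInstance();
        System.out.println("instantiated: " + ok.getClass().getSimpleName());

        try {
            Class.forName("Outer$Inner").getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            // There is no no-arg constructor on a non-static inner class.
            System.out.println("inner class failed: no no-arg constructor");
        }
    }
}
```

Running this prints that the static nested class was instantiated while the inner class lookup throws NoSuchMethodException.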
To me, a Mapper or Reducer is clearly a top-level concept that deserves a top-level class. Some people like making them static nested classes to pair them with the "runner" class that uses them. I don't care for this, since that is really what sub-packages are for. And you have already noted another design reason to avoid it. So for the fourth question: no, I don't think nested classes are good practice here.
One final point: yes, the Mapper and Reducer classes can be in a separate JAR file. You tell Hadoop which JAR contains all of this code, and that is what it ships to the workers. The workers don't need your job (driver) class. They do, however, need everything the Mapper and Reducer depend on, in that same JAR.
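A sketch of what that looks like in a driver, assuming the org.apache.hadoop.mapreduce API; the com.example.etl.TokenMapper and com.example.etl.SumReducer classes are hypothetical stand-ins for your mapper and reducer living in a different JAR:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");

        // Tells Hadoop to ship the JAR that contains TokenMapper to the
        // workers -- it need not be the JAR containing this driver class.
        job.setJarByClass(com.example.etl.TokenMapper.class);

        // Mapper and Reducer configured by class reference; Hadoop will
        // instantiate them reflectively on the workers.
        job.setMapperClass(com.example.etl.TokenMapper.class);
        job.setReducerClass(com.example.etl.SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Anything else the mapper or reducer needs at runtime must be bundled into that same JAR or otherwise placed on the workers' classpath (for example via the -libjars option of the hadoop jar command).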