Do Mappers and Reducers in Hadoop have to be static classes?
I tried to do something simple in Hadoop and noticed that in every example I looked at, the mappers and reducers are defined as static nested classes. My task will be broken down into several
map phases and a final
reduce. If I define a mapper as a static nested
class, can I reuse it in other jobs? Also, real-world problems may require more and more complex mappers, so cramming them all into one huge file would be bad for maintainability.
Is there any way to make the mappers and reducers regular top-level classes (maybe even in a separate JAR from the job itself)?
Your question is really several questions: must the class be static, can it be static, can it be a nested class, and should it be a nested class?
Hadoop itself needs to be able to instantiate your
Mapper or Reducer through reflection, given the class reference/name configured in your
job. If it is a non-static inner class, this will fail, because an instance of such a class can only be created in the context of an instance of its enclosing class, which Hadoop knows nothing about. (Unless the inner class extends its enclosing class, I suppose.)
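You can see this constraint in plain Java, without Hadoop at all. Hadoop's reflective instantiation ultimately needs a no-argument constructor, and a non-static inner class simply doesn't have one: all of its constructors take a hidden reference to the enclosing instance. A minimal sketch (class names are made up for illustration):

```java
public class ReflectionDemo {
    // Static nested class: has a normal no-arg constructor,
    // so a framework can instantiate it from the class name alone.
    static class StaticNested {}

    // Non-static inner class: its only constructor is effectively
    // Inner(ReflectionDemo), taking the enclosing instance.
    class Inner {}

    public static void main(String[] args) throws Exception {
        // Works: the no-arg constructor exists.
        Object nested = StaticNested.class.getDeclaredConstructor().newInstance();
        System.out.println("static nested: " + nested.getClass().getSimpleName());

        // Fails: there is no no-arg constructor to call, which is
        // exactly why Hadoop-style instantiation breaks.
        try {
            Inner.class.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            System.out.println("inner class has no no-arg constructor");
        }
    }
}
```

Declaring the nested class `static` removes the hidden constructor parameter, which is why the examples you saw all do it.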
So, to answer the first question: it must not be a non-static inner class, as that would almost certainly make it unusable. To answer the second and third: it can be a static nested class.
A Reducer is clearly a top-level concept that deserves a top-level class. Some people like to make them static nested classes to pair them with the "runner" class. I don't like this, because that's really what sub-packages are for. And you've already noticed another design reason to avoid it: reuse across jobs. So for the fourth question: no, I don't think nested classes are good practice here.
One final point: yes,
the Mapper and
Reducer classes can be in separate JAR files. You tell Hadoop which JAR contains all this code, and that is what it ships to the workers. The workers don't need your
job driver class. However, they do need everything
that the Mapper and
Reducer depend on, in that same JAR.
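A driver wired up this way might look like the sketch below (using Hadoop's `org.apache.hadoop.mapreduce` API; `MyMapper`, `MyReducer`, and the key/value types are hypothetical and could live in a different JAR on the classpath). The key call is `setJarByClass`, which tells Hadoop which JAR to ship to the workers:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my-job");

        // Ship the JAR that contains MyMapper — which need not be
        // the JAR containing this driver — to the workers.
        job.setJarByClass(MyMapper.class);

        // Top-level mapper/reducer classes, reusable across jobs.
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because `setMapperClass` and `setReducerClass` take plain class references, nothing forces those classes to be nested in the driver; they just have to be on the classpath and inside (or bundled with) the JAR you point Hadoop at.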