Run a MapReduce Java program on a Hadoop cluster
I’m learning to work with a Hadoop cluster. I used Hadoop Streaming for a while, writing map-reduce scripts in Perl/Python and running jobs that way.
However, I haven’t found a good explanation of how to run a Java MapReduce job.
For example:
I have the following program-
http://www.infosci.cornell.edu/hadoop/wordcount.html
Can someone tell me how to actually compile this program and run this job?
Solution
Create a directory to hold the compiled class:
mkdir WordCount_classes
Compile your class:
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d WordCount_classes WordCount.java
Create a jar file from the compiled class:
jar -cvf $HOME/code/hadoop/WordCount.jar -C WordCount_classes/ .
Create a directory for your input, copy all input files to it, and run your job as follows:
bin/hadoop jar $HOME/code/hadoop/WordCount.jar WordCount ${INPUTDIR} ${OUTPUTDIR}
The output of the job is placed in the ${OUTPUTDIR} directory. This directory is created by the Hadoop job itself, so make sure it doesn’t exist before you run the job.
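In case the linked WordCount page is unavailable: the program's map phase tokenizes each input line into words, and the reduce phase sums the counts per word. Here is a plain-Java sketch of that core logic only (no Hadoop dependencies; the class and method names are illustrative, not taken from the linked example):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountCore {
    // Tokenize the text (the "map" step emits (word, 1) pairs) and
    // sum the counts per word (what the "reduce" step does in Hadoop).
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```

In the real Hadoop job, the tokenizing loop lives in the Mapper and the summing in the Reducer, with the framework shuffling the (word, count) pairs between them; this sketch just collapses both steps into one in-memory map.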
See here for a complete example.