Java – Run a MapReduce Java program on a Hadoop cluster

Run a MapReduce Java program on a Hadoop cluster

I’m learning to work with a Hadoop cluster. I’ve used Hadoop Streaming for a while, writing map-reduce scripts in Perl/Python and running jobs that way.
However, I haven’t found a good explanation of how to run a Java MapReduce job.
For example:
I have the following program:

http://www.infosci.cornell.edu/hadoop/wordcount.html

Can someone tell me how to actually compile this program and run this job?

Solution
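
The linked page is the classic WordCount example. For reference, a minimal sketch of such a program, written against the old org.apache.hadoop.mapred API that matches the hadoop-*-core.jar era used below (details may differ slightly from the Cornell page), looks like this:

// Sketch of the classic WordCount; the linked version may differ in detail.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer: sum the counts emitted for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Save it as WordCount.java; the steps below compile and package exactly this file.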

Create a directory to hold the compiled classes:

mkdir WordCount_classes

Compile your class:

javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d WordCount_classes WordCount.java
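
For example, with Hadoop 0.20.2 installed under /usr/local/hadoop (both the path and the version here are assumptions; substitute your own):

javac -classpath /usr/local/hadoop/hadoop-0.20.2-core.jar -d WordCount_classes WordCount.java

On Hadoop 2.x and later there is no single core jar any more; there you can let the hadoop command assemble the classpath for you:

javac -classpath "$(hadoop classpath)" -d WordCount_classes WordCount.java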

Create a jar file from the compiled class:

jar -cvf $HOME/code/hadoop/WordCount.jar -C WordCount_classes/ .
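
You can sanity-check the jar before submitting it; the listing should show WordCount.class plus the nested mapper and reducer classes:

jar -tf $HOME/code/hadoop/WordCount.jar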

Create a directory for your input, copy all input files to it, and run your job as follows:

bin/hadoop jar $HOME/code/hadoop/WordCount.jar WordCount ${INPUTDIR} ${OUTPUTDIR}
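
Concretely, if your input lives on HDFS, the sequence could look like this (the paths are assumptions for illustration; on older releases plain -mkdir already creates parent directories, so -p can be dropped):

bin/hadoop fs -mkdir -p /user/$USER/wordcount/input

bin/hadoop fs -put mytext.txt /user/$USER/wordcount/input/

bin/hadoop jar $HOME/code/hadoop/WordCount.jar WordCount /user/$USER/wordcount/input /user/$USER/wordcount/output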

The output of the job is placed in the ${OUTPUTDIR} directory. This directory is created by the Hadoop job itself, so make sure it does not exist before you run the job.
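
Once the job finishes you can inspect the results (part-00000 is the default output file name for a single reducer under the old API) and remove the output directory before a re-run; on newer Hadoop versions -rmr is replaced by -rm -r:

bin/hadoop fs -cat ${OUTPUTDIR}/part-00000

bin/hadoop fs -rmr ${OUTPUTDIR}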

See here for a complete example.
