How do I compile Hadoop for a 64-bit Linux machine?
I have downloaded the latest stable binaries for Hadoop (2.2.0). Just as I was initializing HDFS, I got this warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
I knew I could fix this by compiling from source, so I downloaded the source package from Hadoop. I know the basic process of compiling, but I got confused after reading the README. A quick Google search shows that I have to use Maven, a build tool for Java-based projects.
So my question is: how do I compile Hadoop from source using Maven? Should I go into each directory and compile each module? A step-by-step guide would be very helpful and much appreciated.
Solution
After extracting the source code, you will find the top-level (parent) POM at the following location.
\hadoop-2.2.0-src.tar\hadoop-2.2.0-src\hadoop-2.2.0-src\pom.xml
Building from this POM builds all modules, so you do not need to go into each directory and compile each module separately.
You can build using the command: mvn clean install
You should see the following in the log.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache Hadoop Main
[INFO] Apache Hadoop Project POM
[INFO] Apache Hadoop Annotations
[INFO] Apache Hadoop Project Dist POM
[INFO] Apache Hadoop Assemblies
[INFO] Apache Hadoop Maven Plugins
[INFO] Apache Hadoop Auth
[INFO] Apache Hadoop Auth Examples
[INFO] Apache Hadoop Common
[INFO] Apache Hadoop NFS
[INFO] Apache Hadoop Common Project
[INFO] Apache Hadoop HDFS
[INFO] Apache Hadoop HttpFS
[INFO] Apache Hadoop HDFS BookKeeper Journal
[INFO] Apache Hadoop HDFS-NFS
[INFO] Apache Hadoop HDFS Project
[INFO] hadoop-yarn
[INFO] hadoop-yarn-api
[INFO] hadoop-yarn-common
[INFO] hadoop-yarn-server
[INFO] hadoop-yarn-server-common
[INFO] hadoop-yarn-server-nodemanager
[INFO] hadoop-yarn-server-web-proxy
[INFO] hadoop-yarn-server-resourcemanager
[INFO] hadoop-yarn-server-tests
[INFO] hadoop-yarn-client
[INFO] hadoop-yarn-applications
[INFO] hadoop-yarn-applications-distributedshell
[INFO] hadoop-mapreduce-client
[INFO] hadoop-mapreduce-client-core
[INFO] hadoop-yarn-applications-unmanaged-am-launcher
[INFO] hadoop-yarn-site
[INFO] hadoop-yarn-project
[INFO] hadoop-mapreduce-client-common
[INFO] hadoop-mapreduce-client-shuffle
[INFO] hadoop-mapreduce-client-app
[INFO] hadoop-mapreduce-client-hs
[INFO] hadoop-mapreduce-client-jobclient
[INFO] hadoop-mapreduce-client-hs-plugins
[INFO] Apache Hadoop MapReduce Examples
[INFO] hadoop-mapreduce
The list continues with more modules.
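Since the warning in the question is about the native library, a plain mvn clean install may not be enough on its own, because the native code is only compiled when the native profile is enabled. The sketch below wraps the invocation described in BUILDING.txt in the source root. The function name build_hadoop_native and its argument are hypothetical, and it assumes Maven and the native prerequisites listed in BUILDING.txt (a C toolchain, CMake, Protocol Buffers) are already installed.

```shell
#!/bin/sh
# Sketch: build Hadoop, including the native libraries, from the source root.
# build_hadoop_native is a hypothetical helper name, not part of Hadoop.
# Assumes Maven and the prerequisites from BUILDING.txt are installed.

build_hadoop_native() {
    src_root=$1   # directory that contains the top-level pom.xml

    if [ ! -f "$src_root/pom.xml" ]; then
        echo "no pom.xml found in $src_root" >&2
        return 1
    fi

    # -Pdist,native : enable the distribution and native-code profiles
    # -DskipTests   : skip the unit tests to shorten the build
    # -Dtar         : also produce a .tar.gz of the distribution
    (cd "$src_root" && mvn package -Pdist,native -DskipTests -Dtar)
}
```

Usage would look like build_hadoop_native /path/to/hadoop-2.2.0-src. If the build succeeds, the packaged distribution should appear under hadoop-dist/target inside the source tree.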
However, if you just want to use Hadoop, building from source is a long process.
You should be able to use the existing native library instead.
If the warning still appears, there could be a configuration issue.
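Before deciding whether to rebuild, it can help to check whether the existing native library matches the machine. One common cause of this warning is a 32-bit library on a 64-bit system. The sketch below reads the ELF header byte that records the word size. The function name elf_class is hypothetical, and the library path in the usage note is an assumption about a typical 2.2.0 layout.

```shell
#!/bin/sh
# Sketch: report whether a shared library is a 32-bit or 64-bit ELF file.
# elf_class is a hypothetical helper name.

elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi

    # The byte at offset 4 is the ELF class: 1 = 32-bit, 2 = 64-bit.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

For example, elf_class "$HADOOP_HOME/lib/native/libhadoop.so.1.0.0" (assuming HADOOP_HOME points at the installation directory) can be compared with the output of uname -m. If the library is 32-bit and the machine reports x86_64, rebuilding the native code is the likely fix. Otherwise the problem is more likely in the configuration.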
The other option is Cloudera's Hadoop distribution. I have installed it on Red Hat Linux.
Good luck.