Unable to load native libraries from the Hadoop Distributed Cache
I’m doing this:
DistributedCache.createSymlink(job.getConfiguration());
DistributedCache.addCacheFile(
        new URI("hdfs:/user/hadoop/harsh/libnative1.so"), job.getConfiguration());
In the mapper:
System.loadLibrary("libnative1.so");
(I tried these too:)
System.loadLibrary("libnative1");
System.loadLibrary("native1");
But I got this error:
java.lang.UnsatisfiedLinkError: no libnative1.so in java.library.path
I have absolutely no idea what I should set java.library.path to. I tried setting it to /home and copying every .so from the distributed cache to /home/, but it still doesn’t work 🙁
Any suggestions/solutions?
Solution
Use the Hadoop ToolRunner interface. This lets you add shared libraries to the distributed cache via command-line arguments, and Hadoop sets the Java library path correctly on the task nodes before the mapper starts. This is how I set up a job to use a shared library:
Have the job class (the one that contains the main() method) implement the org.apache.hadoop.util.Tool interface, like this:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Job extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        /* create the Hadoop Job here */
        return 0;
    }

    public static void main(String[] args) {
        int ret;
        try {
            // ToolRunner parses generic options such as -files before
            // passing the remaining arguments on to run()
            ret = ToolRunner.run(new Job(), args);
        } catch (Exception e) {
            e.printStackTrace();
            ret = -1;
        }
        System.exit(ret);
    }
}
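For concreteness, here is a minimal sketch of what run() might contain; this is not part of the original answer. The job name, the mapper class (NativeMapper, sketched at the end of this answer), and the positions of the input/output arguments are illustrative assumptions, and Job.getInstance() assumes Hadoop 2.x (older versions use the Job constructor instead). Note that because the outer class is itself named Job, the Hadoop job class has to be referenced by its fully qualified name:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

@Override
public int run(String[] args) throws Exception {
    // getConf() already reflects the generic options (-files etc.)
    // that ToolRunner parsed from the command line
    org.apache.hadoop.mapreduce.Job job =
            org.apache.hadoop.mapreduce.Job.getInstance(getConf(), "native-lib-job");
    job.setJarByClass(getClass());
    job.setMapperClass(NativeMapper.class); // hypothetical mapper class
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}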
When you run the Hadoop job, pass all shared libraries (local copies) via the -files command-line option. If any of these are symbolic links, make sure to list the actual files as well. Before starting the job, Hadoop copies all files given in the -files parameter to the distributed cache.
hadoop jar Job.jar -files libnative1.so,libnative1.so.0,libnative1.so.0.1
The mapper does not require any special calls to set up java.library.path; Hadoop handles that before the task starts. You only need to load the library itself.
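For completeness, here is a minimal sketch of the mapper side, assuming the library is loaded once in a static initializer. Note that System.loadLibrary() takes the bare library name, so libnative1.so is loaded as "native1" (this was the problem with the loadLibrary("libnative1.so") call in the question). The NativeMapper class, its type parameters, and the native process() method are illustrative assumptions:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NativeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    static {
        // Bare name: "native1" resolves to libnative1.so on Linux;
        // java.library.path has already been set up by Hadoop.
        System.loadLibrary("native1");
    }

    // hypothetical native method exposed by libnative1.so
    private native int process(String record);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, new IntWritable(process(value.toString())));
    }
}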