Hadoop: get a relative path from an absolute path and an absolute base path
I want to get a relative path from an absolute path given an absolute base path. Are there any Hadoop Java APIs that can do this?
For example, if my absolute HDFS path is abs_path = hdfs://name-node/level1/level2/level3 and my absolute base path is abs_base_path = hdfs://name-node/level1, I want to extract the relative path from abs_path, i.e. rel_path = level2/level3.
I’m familiar with using the Path constructor to combine two paths. For example, if I have rel_path and abs_base_path, I can build abs_path using one of the overloaded constructors in the Path class (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path). But I can’t find an API to do the opposite.
Solution
This is actually done in the source code of FileOutputCommitter. The relevant function is:
    /**
     * Find the final name of a given output file, given the job output directory
     * and the work directory.
     * @param jobOutputDir the job's output directory
     * @param taskOutput the specific task output file
     * @param taskOutputPath the job's work directory
     * @return the final path for the specific output file
     * @throws IOException
     */
    private Path getFinalPath(Path jobOutputDir, Path taskOutput,
                              Path taskOutputPath) throws IOException {
      URI taskOutputUri = taskOutput.toUri();
      URI relativePath = taskOutputPath.toUri().relativize(taskOutputUri);
      // URI.relativize returns the given URI unchanged when it is not
      // located under the base, so this check detects that failure case.
      if (taskOutputUri == relativePath) {
        throw new IOException("Can not get the relative path: base = " +
            taskOutputPath + " child = " + taskOutput);
      }
      if (relativePath.getPath().length() > 0) {
        return new Path(jobOutputDir, relativePath.getPath());
      } else {
        return jobOutputDir;
      }
    }
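The idea is to convert both paths to java.net.URI with Path.toUri(), call relativize on the base URI with the child URI as the argument, and then build a new Path from the path component of the resulting relative URI.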
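If you only need the relative part, the same trick can be sketched with plain java.net.URI, without any Hadoop classes. This is just a minimal sketch: the class name RelativePathSketch and the helper relativePath are made up for illustration and are not part of Hadoop, and it takes the paths as strings (with real Path objects you would presumably start from Path.toUri() instead of URI.create).

```java
import java.net.URI;

// Hypothetical helper class for illustration; not part of Hadoop.
final class RelativePathSketch {

    /**
     * Returns the path of child relative to base, e.g. "level2/level3".
     * Throws IllegalArgumentException when child is not located under base
     * (different scheme or authority, or base path is not a prefix).
     */
    static String relativePath(String base, String child) {
        URI baseUri = URI.create(base);
        URI childUri = URI.create(child);
        URI relative = baseUri.relativize(childUri);
        // relativize hands back the child URI itself when it cannot
        // be expressed relative to the base.
        if (relative.equals(childUri)) {
            throw new IllegalArgumentException(
                "Cannot get the relative path: base = " + base
                    + " child = " + child);
        }
        // An empty string here means child and base are the same location.
        return relative.getPath();
    }

    public static void main(String[] args) {
        String absBasePath = "hdfs://name-node/level1";
        String absPath = "hdfs://name-node/level1/level2/level3";
        System.out.println(relativePath(absBasePath, absPath));
    }
}
```

With the paths from the question this should give level2/level3, which can then be passed back to the Path(parent, child) constructor to rebuild the absolute path under a different base.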
Hope this helps.