Java – Getting a relative path from absolute and base paths in Hadoop



I want to get a relative path from an absolute path given an absolute base path. Are there any Hadoop Java APIs that can do this?

For example, if my absolute HDFS path is abs_path = hdfs://name-node/level1/level2/level3 and my absolute base path is abs_base_path = hdfs://name-node/level1, I want to extract the relative path from abs_path, i.e. rel_path = level2/level3. I'm familiar with using the Path constructor to combine two paths.

For example, given rel_path and abs_base_path, I can build abs_path using one of the overloaded constructors of the Path class (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path), but I can't find an API that does the opposite.
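For context, Hadoop's Path is backed by a java.net.URI (Path.toUri() returns it), and java.net.URI itself offers both directions: resolve() combines a base with a relative reference, and relativize() undoes that. The sketch below uses only java.net.URI so it runs without Hadoop on the classpath; the class and method names are hypothetical. One caveat it illustrates: resolve() treats a base without a trailing slash as a file and replaces its last segment, so the base here ends in "/".

```java
import java.net.URI;

// Hypothetical helper showing the two inverse operations on java.net.URI.
class UriRoundTrip {
    // Combine: roughly what new Path(parent, child) does for the question's example.
    static URI combine(URI baseDir, String relative) {
        return baseDir.resolve(relative);
    }

    // Split: strip baseDir from an absolute URI, leaving a relative URI.
    static URI split(URI baseDir, URI absolute) {
        return baseDir.relativize(absolute);
    }

    public static void main(String[] args) {
        // The trailing slash marks level1 as a directory for resolve().
        URI base = URI.create("hdfs://name-node/level1/");

        URI abs = combine(base, "level2/level3");
        System.out.println(abs);                 // hdfs://name-node/level1/level2/level3
        System.out.println(split(base, abs));    // level2/level3

        // Without the trailing slash, resolve() replaces "level1".
        URI noSlash = URI.create("hdfs://name-node/level1");
        System.out.println(combine(noSlash, "level2/level3")); // hdfs://name-node/level2/level3
    }
}
```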

Solution

Hadoop does exactly this internally, in the source code of FileOutputCommitter. The relevant method is:

   /**
   * Find the final name of a given output file, given the job output directory
   * and the work directory.
   * @param jobOutputDir the job's output directory
   * @param taskOutput the specific task output file
   * @param taskOutputPath the job's work directory
   * @return the final path for the specific output file
   * @throws IOException
   */
  private Path getFinalPath(Path jobOutputDir, Path taskOutput, 
                            Path taskOutputPath) throws IOException {
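    URI taskOutputUri = taskOutput.toUri();
    URI relativePath = taskOutputPath.toUri().relativize(taskOutputUri);
    // Reference comparison is intentional: URI.relativize() returns its
    // argument unchanged when taskOutput is not under taskOutputPath.
    if (taskOutputUri == relativePath) {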
      throw new IOException("Can not get the relative path: base = " + 
          taskOutputPath + " child = " + taskOutput);
    }
    if (relativePath.getPath().length() > 0) {
      return new Path(jobOutputDir, relativePath.getPath());
    } else {
      return jobOutputDir;
    }
  }

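The idea is to convert both paths to URIs, call relativize() on the base directory's URI with the child's URI, and then build a new Path from the path component of the resulting relative URI. If the child is not under the base, relativize() hands back the child URI unchanged, which is what the == check detects.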
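Following the same idea, here is a minimal standalone helper, a sketch that again relies only on java.net.URI so it can run without Hadoop; the names RelativePaths and relativePath are hypothetical. With Hadoop you would pass absBasePath.toUri() and absPath.toUri(). Both URIs need the same scheme and authority, so qualify them consistently first (for example with FileSystem#makeQualified); otherwise relativize() fails and the helper throws, just as getFinalPath does.

```java
import java.io.IOException;
import java.net.URI;

// Hypothetical helper mirroring the check in FileOutputCommitter.getFinalPath.
class RelativePaths {
    /** Returns child's path relative to base, e.g. "level2/level3"; "" if they are equal. */
    static String relativePath(URI base, URI child) throws IOException {
        URI rel = base.relativize(child);
        // relativize() returns the very same object when child is not under base.
        if (rel == child) {
            throw new IOException("Can not get the relative path: base = "
                    + base + " child = " + child);
        }
        return rel.getPath();
    }

    public static void main(String[] args) throws IOException {
        URI absBase = URI.create("hdfs://name-node/level1");
        URI abs = URI.create("hdfs://name-node/level1/level2/level3");
        System.out.println(relativePath(absBase, abs)); // level2/level3
    }
}
```

Because relativize() appends a slash to the base path before its prefix check, a base of hdfs://name-node/level1 does not match hdfs://name-node/level10/x, so a sibling directory that merely shares a name prefix is rejected instead of producing a bogus relative path.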

Hope this helps.
