Java – When copying a file to HDFS, how do I control which nodes the file resides on?

I’m working on a weird use case where I need to make sure file A is local to machine A, file B is local to machine B, etc. When copying a file to HDFS, is there a way to control which machines the file will reside on? I know that any given file will be copied to three machines, but I need to be able to say “file A definitely exists on machine A”. I don’t care much about the other two machines; they can be any machines on my cluster.

Thank you.

Solution

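I don’t think so. When a file is larger than the block size (64 MB by default in Hadoop 1.x), HDFS splits it into several blocks and places each block independently. As a result, even the first copy of the file is normally spread across multiple servers rather than stored whole on a single machine.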
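The arithmetic behind this is easy to sketch. The Java below is not Hadoop code and does not talk to a cluster. It only splits a file length into fixed-size blocks the way HDFS lays a file out, assuming the 64 MB block size mentioned above. The class and method names are hypothetical. A 200 MB file comes out as four blocks (three of 64 MB and one of 8 MB). Because the NameNode chooses target DataNodes for each block separately, nothing forces those four blocks onto the same machine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: how a file of a given length maps onto fixed-size blocks.
// This is plain arithmetic, not a call into any Hadoop API.
class BlockSplitSketch {
    // Assumption: 64 MB, the Hadoop 1.x default block size mentioned in the answer.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Returns the [start, end) byte range of every block for a file of fileLength bytes.
    static List<long[]> blockRanges(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            // Every block is full-size except possibly the final one.
            ranges.add(new long[] {start, Math.min(start + blockSize, fileLength)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long fileLength = 200 * mb; // a hypothetical 200 MB file
        List<long[]> ranges = blockRanges(fileLength, BLOCK_SIZE);
        System.out.println(ranges.size() + " blocks");
        for (int i = 0; i < ranges.size(); i++) {
            long[] r = ranges.get(i);
            // Each of these blocks gets its own replica placement decision.
            System.out.println("block " + i + ": " + (r[1] - r[0]) / mb + " MB");
        }
    }
}
```

On a real cluster the block size comes from the `dfs.blocksize` setting (`dfs.block.size` in Hadoop 1.x), so the numbers will differ if your cluster is configured differently.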
