Java – Hadoop directory with spaces

I’m having trouble providing Hadoop with directories that contain spaces.

For example:

inputDir = /abc/xyz/folder name/abc.txt

Hadoop does not seem to recognize that “folder name” is a single folder whose name contains a space.

When I do this, I get the following error:

java.io.FileNotFoundException: File does not exist: /abc/xyz/folder

I also tried passing the URL-encoded path instead:

java.io.FileNotFoundException: File does not exist: /abc/xyz/folder%20name/abc.txt

It still throws the same FileNotFoundException.

Does anyone know a solution to this problem?

Thanks for any help.

Solution

Replacing spaces with %20 works for the Hadoop shell, for example with:

sed 's/ /%20/g'

and then quoting the variables in the actual put command:

hadoop fs -put "$inputDir" "$putDest"

Without the %20, you get a URI exception (which is what gave me the clue to use %20 rather than escaping the space with a backslash).
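If a Java program shells out to the Hadoop CLI, the same two steps might look roughly like the sketch below. It assumes the `hadoop` binary is on the PATH, and the destination `/user/me/dest` is purely hypothetical. `encodeSpaces` mirrors the sed substitution, and building the command as a list for `ProcessBuilder` plays the role of the double quotes: each element reaches `hadoop` as exactly one argument, with no shell splitting it at the space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HadoopPutWithSpaces {
    // Same substitution as sed 's/ /%20/g': only spaces change, '/' stays as-is.
    static String encodeSpaces(String path) {
        return path.replace(" ", "%20");
    }

    // Each list element is passed to the process as one argument and no shell is
    // involved, which is what the quotes around "$inputDir" achieve in the shell.
    static List<String> putCommand(String localSrc, String dest) {
        return List.of("hadoop", "fs", "-put", encodeSpaces(localSrc), dest);
    }

    public static void main(String[] args) {
        String inputDir = "/abc/xyz/folder name/abc.txt";
        System.out.println(encodeSpaces(inputDir)); // /abc/xyz/folder%20name/abc.txt

        // Not a substitute: URLEncoder targets form data, so it turns spaces into '+'
        // and also escapes every '/'.
        System.out.println(URLEncoder.encode(inputDir, StandardCharsets.UTF_8));

        // Hypothetical destination; assumes `hadoop` is on the PATH.
        ProcessBuilder pb = new ProcessBuilder(putCommand(inputDir, "/user/me/dest")).inheritIO();
        System.out.println(String.join(" ", pb.command()));
        // To actually run the upload (main would then need `throws Exception`):
        // int exitCode = pb.start().waitFor();
    }
}
```

The `URLEncoder` line is there as a warning: it produces `%2Fabc%2Fxyz%2Ffolder+name%2Fabc.txt`, which is not the same as the sed output and is not a usable path.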

I know you’re doing this from Java. The fact that you get a java.io.FileNotFoundException makes me wonder whether the code does something with inputDir other than passing it as a parameter to hadoop put (or the Java equivalent of put). If it does any kind of check on inputDir outside the Hadoop command, that check will fail: Java treats the string as a file path, while Hadoop treats it as a URI.
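The path-versus-URI difference can be seen with plain JDK classes, without Hadoop. The sketch below only illustrates `java.io.File` and `java.net.URI` behaviour; it does not claim to show what Hadoop does internally, and the method names are made up for the example. `java.io.File` keeps `%20` as three literal characters (so it looks for a directory called `folder%20name`), single-argument URI parsing rejects a raw space, and as a URI the `%20` decodes back to a space.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class PathVsUri {
    // java.io.File takes the string literally: "%20" is three ordinary characters.
    static String parentDirName(String path) {
        return new File(path).getParentFile().getName();
    }

    // Single-argument URI parsing rejects an unescaped space as an illegal character.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // As a URI, "%20" is an escape sequence that getPath() decodes back to a space.
    static String decodedPath(String s) {
        return URI.create(s).getPath();
    }

    // The multi-argument constructor quotes illegal characters (such as spaces) itself.
    static String escapedPath(String raw) {
        try {
            return new URI(null, null, raw, null, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "/abc/xyz/folder name/abc.txt";
        String encoded = "/abc/xyz/folder%20name/abc.txt";

        System.out.println(parentDirName(encoded)); // folder%20name
        System.out.println(isValidUri(raw));        // false
        System.out.println(decodedPath(encoded));   // /abc/xyz/folder name/abc.txt
        System.out.println(escapedPath(raw));       // /abc/xyz/folder%20name/abc.txt
    }
}
```

So the same string can be right for one side and wrong for the other: any `File`-based existence check should get the path with the real space, while anything parsed as a URI needs the space escaped.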
