Java – MalformedURLException appears when reading files from HDFS

I have the following test program to read files from HDFS.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class FileReader {
    public static final String NAMENODE_IP = "172.32.17.209";
    public static final String FILE_PATH = "/notice.html";

    public static void main(String[] args) throws MalformedURLException,
            IOException {
        String url = "hdfs://" + NAMENODE_IP + FILE_PATH;

        InputStream is = new URL(url).openStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);
        String line = br.readLine();
        while (line != null) {
            System.out.println(line);
            line = br.readLine();
        }
    }
}

Running it throws java.net.MalformedURLException:

Exception in thread "main" java.net.MalformedURLException: unknown protocol: hdfs
    at java.net.URL.<init>(URL.java:592)
    at java.net.URL.<init>(URL.java:482)
    at java.net.URL.<init>(URL.java:431)
    at in.ksharma.hdfs.FileReader.main(FileReader.java:29)
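The failure happens inside the URL constructor itself, before any connection to the NameNode is attempted. A minimal JDK-only sketch reproduces it; the class name UnknownProtocolDemo and its failureMessage helper are hypothetical, and no Hadoop jars are assumed to be on the classpath.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UnknownProtocolDemo {

    // Returns the MalformedURLException message for spec, or null if the
    // JDK found a handler for the scheme and created the URL object.
    static String failureMessage(String spec) {
        try {
            new URL(spec); // only the handler lookup happens here, no network I/O
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same address as in the question; only the scheme differs.
        System.out.println(failureMessage("hdfs://172.32.17.209/notice.html"));
        System.out.println(failureMessage("http://172.32.17.209/notice.html"));
    }
}
```

Without Hadoop on the classpath, the hdfs:// call should return the same "unknown protocol: hdfs" message as the stack trace, while the http:// variant returns null because the JDK ships a built-in http handler.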

Solution

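Register Hadoop's URL stream handler factory. The JDK's built-in URL handlers only cover standard schemes such as http, https, file and jar, so new URL(...) rejects hdfs:// with "unknown protocol: hdfs" before any connection is attempted. Hadoop provides org.apache.hadoop.fs.FsUrlStreamHandlerFactory, which resolves hdfs:// URLs through Hadoop's FileSystem classes (the Hadoop client libraries must be on the classpath). Install it with URL.setURLStreamHandlerFactory before the first hdfs:// URL is created. The JDK allows this factory to be set only once per JVM; a second call throws java.lang.Error.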
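The same mechanism can be seen with nothing but the JDK: a made-up mem:// scheme is rejected until a URLStreamHandlerFactory that knows it is installed, and a second installation is refused. The mem scheme, MemHandler and HandlerFactoryDemo are hypothetical stand-ins for what FsUrlStreamHandlerFactory provides for hdfs://; this is an illustration of java.net.URL's extension point, not Hadoop's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

public class HandlerFactoryDemo {

    // Hypothetical handler for a made-up "mem" scheme that serves a tiny
    // in-memory document built from the URL path.
    static class MemHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    connected = true;
                }

                @Override
                public InputStream getInputStream() {
                    byte[] body = ("contents of " + getURL().getPath())
                            .getBytes(StandardCharsets.UTF_8);
                    return new ByteArrayInputStream(body);
                }
            };
        }
    }

    // Opens the URL and returns its body, or a description of the failure.
    static String tryOpen(String spec) {
        try (InputStream in = new URL(spec).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (MalformedURLException e) {
            return "MalformedURLException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    // Runs the three steps; call once per JVM, because step 2 uses up the
    // single factory slot that java.net.URL offers.
    static String[] run() {
        String[] results = new String[3];
        // 1. No handler for "mem" yet: the URL constructor rejects it.
        results[0] = tryOpen("mem://host/notice.html");

        // 2. Install a factory that knows "mem". Returning null for any other
        //    scheme lets the JDK fall back to its built-in handlers.
        URL.setURLStreamHandlerFactory(
                protocol -> "mem".equals(protocol) ? new MemHandler() : null);
        results[1] = tryOpen("mem://host/notice.html");

        // 3. A second installation attempt is refused with java.lang.Error.
        try {
            URL.setURLStreamHandlerFactory(protocol -> null);
            results[2] = "second factory accepted";
        } catch (Error e) {
            results[2] = "second factory rejected: " + e.getMessage();
        }
        return results;
    }

    public static void main(String[] args) {
        for (String r : run()) {
            System.out.println(r);
        }
    }
}
```

Step 1 fails exactly like the hdfs:// case in the question, step 2 succeeds once the factory is in place, and step 3 shows why the registration belongs in a spot that runs only once.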

Try this:

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public static void main(String[] args) throws MalformedURLException,
        IOException {
    // Teach java.net.URL the hdfs:// scheme.
    // setURLStreamHandlerFactory may be called only once per JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

    String url = "hdfs://" + NAMENODE_IP + FILE_PATH;

    // try-with-resources closes the HDFS stream when reading finishes.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new URL(url).openStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}
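Because the factory slot is global, calling URL.setURLStreamHandlerFactory from code that may run more than once (a test suite, a reusable library) fails on the second call. One option, sketched here as an assumption rather than an established Hadoop idiom, is a small guard; UrlHandlers and installOnce are hypothetical names, and catching java.lang.Error is the only signal the JDK gives that some other code already set a factory.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

public final class UrlHandlers {
    // Becomes true after the first installOnce call, whether or not it succeeded.
    private static boolean attempted = false;

    private UrlHandlers() {}

    // Hypothetical helper: calls URL.setURLStreamHandlerFactory at most once
    // from this class and returns true only if this call installed the factory.
    static synchronized boolean installOnce(URLStreamHandlerFactory factory) {
        if (attempted) {
            return false;
        }
        attempted = true;
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // The JDK throws java.lang.Error when another component (for
            // example a servlet container) has already set a factory.
            return false;
        }
    }

    public static void main(String[] args) {
        // A factory that returns null defers every scheme to the JDK defaults.
        URLStreamHandlerFactory deferToJdk = protocol -> null;
        System.out.println(installOnce(deferToJdk));
        System.out.println(installOnce(deferToJdk));
    }
}
```

In the program above, the direct call could become UrlHandlers.installOnce(new FsUrlStreamHandlerFactory()). If it returns false because another component installed its own factory first, hdfs:// URLs may still be rejected; in that case Hadoop's FileSystem API (FileSystem.get(URI, Configuration) followed by open(Path)) reads the file without going through java.net.URL at all.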
