MalformedURLException appears when reading files from HDFS
I have the following test program to read files from HDFS.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class FileReader {
        public static final String NAMENODE_IP = "172.32.17.209";
        public static final String FILE_PATH = "/notice.html";

        public static void main(String[] args) throws MalformedURLException,
                IOException {
            String url = "hdfs://" + NAMENODE_IP + FILE_PATH;
            // Throws MalformedURLException: the JDK has no handler for "hdfs".
            InputStream is = new URL(url).openStream();
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        }
    }
It fails with a java.net.MalformedURLException:
Exception in thread "main" java.net.MalformedURLException: unknown protocol: hdfs
at java.net.URL.<init>(URL.java:592)
at java.net.URL.<init>(URL.java:482)
at java.net.URL.<init>(URL.java:431)
at in.ksharma.hdfs.FileReader.main(FileReader.java:29)
Solution
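Register Hadoop's URL stream handler factory before you create the URL. The JDK's built-in URL handlers do not recognize the hdfs:// scheme, which is why the URL constructor throws. Note that URL.setURLStreamHandlerFactory can be called at most once per JVM, so do it once at startup.
Try this: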
    // Requires: import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    public static void main(String[] args) throws MalformedURLException,
            IOException {
        // Teach java.net.URL about hdfs://; this call is allowed only once per JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        String url = "hdfs://" + NAMENODE_IP + FILE_PATH;
        // try-with-resources closes the stream even if reading fails.
        try (InputStream is = new URL(url).openStream();
                BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        }
    }
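
If you want to see why the registration matters without a Hadoop cluster, the same mechanism can be reproduced with the JDK alone. The following is only a minimal sketch, not Hadoop's implementation: the "demo" scheme, the class SchemeRegistrationDemo, and its helper methods are all made up for illustration, and the handler just returns a canned string instead of talking to a file system. It shows that java.net.URL rejects a scheme until a factory that knows it has been registered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

public class SchemeRegistrationDemo {

    /** Hypothetical scheme name, used only for this illustration. */
    static final String SCHEME = "demo";

    private static boolean registered = false;

    /** Returns true if java.net.URL accepts the given URL string. */
    static boolean isAccepted(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol: demo"
        }
    }

    /** Serves SCHEME and returns null otherwise so the JDK handlers still apply. */
    static class DemoHandlerFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!SCHEME.equals(protocol)) {
                return null;
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) {
                    return new URLConnection(u) {
                        @Override
                        public void connect() {
                            // Nothing to connect to in this sketch.
                        }

                        @Override
                        public InputStream getInputStream() {
                            // Canned content standing in for a real file.
                            String body = "contents of " + u.getPath();
                            return new ByteArrayInputStream(
                                    body.getBytes(StandardCharsets.UTF_8));
                        }
                    };
                }
            };
        }
    }

    /** Registers the factory; guarded because the JDK allows it only once per JVM. */
    static synchronized void register() {
        if (!registered) {
            URL.setURLStreamHandlerFactory(new DemoHandlerFactory());
            registered = true;
        }
    }

    /** Reads the whole stream behind the URL as UTF-8 text. */
    static String readAll(String spec) {
        try (InputStream in = new URL(spec).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String spec = SCHEME + "://host/notice.html";
        System.out.println("accepted before registration: " + isAccepted(spec));
        register();
        System.out.println("accepted after registration: " + isAccepted(spec));
        System.out.println(readAll(spec));
    }
}
```

FsUrlStreamHandlerFactory plays the role of DemoHandlerFactory in the answer above. Because the JDK accepts only one factory per JVM, the sketch guards the call with a flag; in a real application the registration usually goes in a static initializer or at the very start of main.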