Java – How do I get metadata information for an HDFS server in the java client?

How do I get metadata information for an HDFS server in the java client?

I need to build a utility class to test the connection to HDFS. The test should display the server-side version of HDFS and any additional metadata. There are many client-side demos available, but none show how to extract server metadata. Can someone help?

Note that my client is a remote Java client and does not have Hadoop or HDFS configuration files available to initialize the configuration. I need to connect dynamically to the HDFS NameNode service using its URL, as in the sketch below.
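For illustration, the kind of configuration-free connection I mean would look roughly like this sketch (the host name and port 8020 are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectSketch {
    public static void main(String[] args) throws Exception {
        // No core-site.xml / hdfs-site.xml on the client: the NameNode is
        // resolved entirely from the URI passed to FileSystem.get().
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://myhost:8020"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}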

Solution

Hadoop exposes some information over HTTP that you can use; see Cloudera's article.
Probably the easiest way is to connect to the NameNode web UI and parse the content the server returns:

// Fetch the NameNode web UI (port 50070 by default) and parse the returned HTML
URL url = new URL("http://myhost:50070/dfshealth.jsp");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
...
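For a simple connectivity test, a rough sketch along these lines (same placeholder host, port, and page as above) also checks the HTTP status code before dumping the page:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class NNHttpCheck {
    public static void main(String[] args) throws Exception {
        // "myhost" and port 50070 are placeholders for your NameNode web UI address
        URL url = new URL("http://myhost:50070/dfshealth.jsp");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // 200 means the NameNode web UI is reachable
        System.out.println("HTTP status: " + conn.getResponseCode());
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // raw HTML; grep for version/capacity as needed
        }
        in.close();
        conn.disconnect();
    }
}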

On the other hand, if you know the addresses of the NameNode and the JobTracker, you can connect to them directly over Hadoop RPC.
Use a simple client like the following (Hadoop 0.20.0-r1056497):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.util.VersionInfo;

public class NNConnTest {

    private enum NNStats {

        STATS_CAPACITY_IDX(0,
                "Total storage capacity of the system, in bytes: ");
        // ... see org.apache.hadoop.hdfs.protocol.ClientProtocol for further indices

        private int id;
        private String desc;

        private NNStats(int id, String desc) {
            this.id = id;
            this.desc = desc;
        }

        public String getDesc() {
            return desc;
        }

        public int getId() {
            return id;
        }
    }

    private enum ClusterStats {

        // see the org.apache.hadoop.mapred.ClusterStatus API docs for further fields
        USED_MEM {
            @Override
            public String getDesc() {
                String desc = "Total heap memory used by the JobTracker: ";
                return desc + clusterStatus.getUsedMemory();
            }
        };

        private static ClusterStatus clusterStatus;

        public static void setClusterStatus(ClusterStatus stat) {
            clusterStatus = stat;
        }

        public abstract String getDesc();
    }

    public static void main(String[] args) throws Exception {

        // replace "myhost" and the ports with your NameNode / JobTracker addresses
        InetSocketAddress namenodeAddr = new InetSocketAddress("myhost", 8020);
        InetSocketAddress jobtrackerAddr = new InetSocketAddress("myhost", 8021);

        Configuration conf = new Configuration();

        // query the NameNode over RPC
        DFSClient client = new DFSClient(namenodeAddr, conf);
        ClientProtocol namenode = client.namenode;
        long[] stats = namenode.getStats();

        System.out.println("NameNode info: ");
        for (NNStats sf : NNStats.values()) {
            System.out.println(sf.getDesc() + stats[sf.getId()]);
        }

        // query the JobTracker
        JobClient jobClient = new JobClient(jobtrackerAddr, conf);
        ClusterStatus clusterStatus = jobClient.getClusterStatus(true);

        System.out.println("\nJobTracker info: ");
        System.out.println("State: "
                + clusterStatus.getJobTrackerState().toString());

        ClusterStats.setClusterStatus(clusterStatus);
        for (ClusterStats cs : ClusterStats.values()) {
            System.out.println(cs.getDesc());
        }

        // note: this is the version of the Hadoop libraries on the client side
        System.out.println("\nHadoop build version: "
                + VersionInfo.getBuildVersion());

        // query the DataNodes through the NameNode
        System.out.println("\nDataNode info: ");
        DatanodeInfo[] datanodeReport = namenode.getDatanodeReport(
                DatanodeReportType.ALL);
        for (DatanodeInfo di : datanodeReport) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }
    }
}
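If you prefer the higher-level FileSystem API, the same DataNode information is also reachable through DistributedFileSystem. This is a sketch under the same assumptions (placeholder host and port, and an hdfs:// URI that resolves to a DistributedFileSystem instance):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DfsReportSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // an hdfs:// URI returns a DistributedFileSystem, so the cast holds here
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(
                URI.create("hdfs://myhost:8020"), conf);
        for (DatanodeInfo di : dfs.getDataNodeStats()) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }
        dfs.close();
    }
}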

Ensure that your client uses the same version of Hadoop as your cluster; otherwise an EOFException can occur.
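To surface that failure in a connection-test utility, one rough sketch (same placeholder NameNode address as above) is to catch the exception around an RPC probe and print the client-side version for comparison:

import java.io.EOFException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.util.VersionInfo;

public class HdfsVersionProbe {
    public static void main(String[] args) {
        System.out.println("Client-side Hadoop version: " + VersionInfo.getVersion());
        try {
            DFSClient client = new DFSClient(
                    new InetSocketAddress("myhost", 8020), new Configuration());
            client.namenode.getStats();   // any RPC call works as a probe
            System.out.println("Connection OK");
        } catch (EOFException e) {
            // typically indicates a client/cluster version mismatch
            System.out.println("EOFException: client and cluster versions likely differ");
        } catch (Exception e) {
            System.out.println("Connection failed: " + e);
        }
    }
}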
