Java – Access Kerberos-protected WebHDFS without SPNEGO

Access Kerberos-protected WebHDFS without SPNEGO

I have a working application that manages HDFS using WebHDFS.
I need to be able to do the same on a Kerberos-secured cluster.

The problem is that there is no library or extension that can negotiate the Kerberos tickets for my application; I only have a basic HTTP client.

Is it possible to create a Java service that handles the ticket exchange and, once it has a service ticket, passes it to my application to use in its HTTP requests?
In other words, my application would ask the Java service to negotiate the tickets, the service would return the service ticket to my application as a string (or in some raw form), and my application would then attach it to its HTTP requests.

EDIT: Is there a similarly elegant solution, like the one @SamsonScharfrichter describes, for HttpFS? (As far as I know, HttpFS does not support delegation tokens.)

EDIT2: Hi everyone, I’m still completely lost. I have spent hours trying to figure out the hadoop-auth client and reading its documentation, without any luck. Can you help me again?
The example says to do this:

    // establishing an initial connection
    URL url = new URL("http://foo:8080/bar");
    AuthenticatedURL.Token token = new AuthenticatedURL.Token();
    AuthenticatedURL aUrl = new AuthenticatedURL();
    HttpURLConnection conn = new AuthenticatedURL(url, token).openConnection();
    ....
    // use the 'conn' instance
    ....

I’m already lost here. What do I need for the initial connection? And how can

new AuthenticatedURL(url, token).openConnection();

take two parameters? There is no such constructor (hence the error I get). And shouldn’t the principal be specified somewhere? It probably isn’t that easy. Here is what I tried:

    URL url = new URL("http://<host>:14000/webhdfs/v1/?op=liststatus");
    AuthenticatedURL.Token token = new AuthenticatedURL.Token();

    HttpURLConnection conn = new AuthenticatedURL(url, token).openConnection(url, token);
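
For reference: in the hadoop-auth API, AuthenticatedURL has a no-argument constructor (or one taking an Authenticator), and it is openConnection(URL, Token) that takes the two parameters; the principal is never passed to this call, it comes from the caller's Kerberos ticket cache or JAAS login. A minimal sketch, with a placeholder host and port, might look like this:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
    import org.apache.hadoop.security.authentication.client.KerberosAuthenticator;

    public class HadoopAuthSketch {
        public static void main(String[] args) throws Exception {
            // Empty token; hadoop-auth fills it in once the SPNEGO handshake succeeds.
            AuthenticatedURL.Token token = new AuthenticatedURL.Token();
            // The no-arg constructor already defaults to Kerberos/SPNEGO; the explicit
            // KerberosAuthenticator is shown only for clarity.
            AuthenticatedURL aUrl = new AuthenticatedURL(new KerberosAuthenticator());
            URL url = new URL("http://<host>:14000/webhdfs/v1/?op=LISTSTATUS");
            HttpURLConnection conn = aUrl.openConnection(url, token);
            System.out.println("HTTP status: " + conn.getResponseCode());
            // 'token' now carries the hadoop.auth cookie and can be reused on later calls.
        }
    }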

Solution

Using Java code plus the Hadoop Java API to open a Kerberized session, get a delegation token for that session, and pass the token to your other application (as @tellisnz suggests) has one downside: the Java API pulls in quite a few dependencies (a lot of JARs, plus the Hadoop native libraries). If you run your app on Windows, it is a tough journey.
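For completeness, here is a minimal sketch of that first option. The principal, keytab path, NameNode address and renewer below are placeholders, not values from the original question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class DelegationTokenFetcher {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster settings -- replace with your own values.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");

        // Ask the NameNode for a delegation token; the argument names who may renew it.
        FileSystem fs = FileSystem.get(conf);
        Token<?> token = fs.getDelegationToken("user");
        // encodeToUrlString() yields the compact form WebHDFS accepts in '?delegation='.
        System.out.println(token.encodeToUrlString());
        fs.close();
    }
}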

Another option is to use Java code plus WebHDFS to run a single SPNEGO-ed query, fetch a delegation token, and pass it to the other application; this option requires no Hadoop libraries at all on your server. A bare-bones version would look something like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Run one SPNEGO-ed request against WebHDFS to obtain a delegation token.
URL urlGetToken = new URL("http://<host>:<port>/webhdfs/v1/?op=GETDELEGATIONTOKEN");
HttpURLConnection cnxGetToken = (HttpURLConnection) urlGetToken.openConnection();
BufferedReader httpMessage = new BufferedReader(new InputStreamReader(cnxGetToken.getInputStream()), 1024);
// Pull the "urlString" value out of the JSON response without a JSON library.
Pattern regexHasToken = Pattern.compile("urlString[\": ]+([^\" ]+)");
String httpMessageLine;
while ((httpMessageLine = httpMessage.readLine()) != null) {
    Matcher regexToken = regexHasToken.matcher(httpMessageLine);
    if (regexToken.find()) {
        System.out.println("Use that template: http://<Host>:<Port>/webhdfs/v1%AbsPath%?delegation=" + regexToken.group(1) + "&op=...");
    }
}
httpMessage.close();

That is what I use to access HDFS from Windows PowerShell scripts (and even Excel macros). Caveat: on Windows you must create the Kerberos TGT on the fly, by passing the JVM a JAAS configuration that points to the appropriate keytab file. That caveat also applies to the Java API option, by the way.
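
As a rough illustration of that caveat, the JVM can be pointed at a Kerberos configuration and a keytab-based JAAS login through standard system properties (or the equivalent -D options on the command line). The file paths, principal and JAAS entry contents below are placeholders, not values from the original answer:

// Hypothetical JAAS entry referenced below (assumption -- adjust to your environment):
//   com.sun.security.jgss.krb5.initiate {
//     com.sun.security.auth.module.Krb5LoginModule required
//     useKeyTab=true
//     keyTab="C:/security/user.keytab"
//     principal="user@EXAMPLE.COM"
//     storeKey=true
//     doNotPrompt=true;
//   };
public final class KerberosJvmSetup {
    public static void configure() {
        // Kerberos realm/KDC settings (krb5.ini on Windows, krb5.conf elsewhere).
        System.setProperty("java.security.krb5.conf", "C:/security/krb5.ini");
        // JAAS login configuration file containing the entry shown above.
        System.setProperty("java.security.auth.login.config", "C:/security/jaas.conf");
        // Let the GSS/SPNEGO layer log in via JAAS (keytab) instead of requiring
        // an existing ticket cache.
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
    }
}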
