Java – HBase: MiniDFSCluster.java fails in some environments

I’m writing some code to access HBase, and I’m writing unit tests that create a MiniDFSCluster as part of the test setup.

(defn test-config [& options]
    (let [testing-utility (HBaseTestingUtility.)]
        (.startMiniCluster testing-utility 1)
        (let [config (.getConfiguration testing-utility)]
            (if (not= options nil)
                (doseq [[key value] options]
                    (.set config key value)))
            config)))

;; For those who don't read Clojure, lines 2 and 3 cause 
;; the failure and are equivalent to the following Java
;; 
;; HBaseTestingUtility testingUtility = new HBaseTestingUtility();
;; testingUtility.startMiniCluster(1);   blows up on Linux but not Mac OSX
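
For reference, test-config takes zero or more [key value] pairs that are applied to the cluster’s configuration after startup; a hypothetical call (the config key here is only an illustration) looks like this:

;; Hypothetical usage: start the mini cluster and override one setting
(def config (test-config ["hbase.client.retries.number" "3"]))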

This works fine on Mac OSX with Java HotSpot:

$ java -version
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)

$ lein test

lein test hbase.config-test

lein test hbase.table-test
2013-07-12 17:44:13.488 java[27384:1203] Unable to load realm info from SCDynamicStore
Starting DataNode 0 with dfs.data.dir: /Users/dwilliams/Desktop/Repos/mobiusinversion/hbase/target/test-data/fe0199fd-0168-48d9-98ce-b4a5e62d3257/dfscluster_bbad1095-58d1-4571-ba12-4d4f1c24203f/dfs/data/data1, /Users/dwilliams/Desktop/Repos/mobiusinversion/hbase/target/test-data/fe0199fd-0168-48d9-98ce-b4a5e62d3257/dfscluster_bbad1095-58d1-4571-ba12-4d4f1c24203f/dfs/data/data2
Cluster is active

Ran 11 tests containing 14 assertions.
0 failures, 0 errors.

However, when running in a Linux environment, the following error occurs:

ERROR in (create-table) (MiniDFSCluster.java:426)
Uncaught exception, not in assertion.
expected: nil
  actual: java.lang.NullPointerException: null
 at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes (MiniDFSCluster.java:426)
    org.apache.hadoop.hdfs.MiniDFSCluster.<init> (MiniDFSCluster.java:284)
    org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster (HBaseTestingUtility.java:444)
    org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster (HBaseTestingUtility.java:612)
    org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster (HBaseTestingUtility.java:568)
    org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster (HBaseTestingUtility.java:555)

I submitted a travis-ci ticket because the problem first showed up there, and I thought it might be specific to their environment.

https://github.com/travis-ci/travis-ci/issues/1240

However, after discussing it with Travis support, I was able to reproduce the bug on CentOS. I tried both the Sun JDK and OpenJDK on Linux, and both produced the same error. What’s going on here? Is this a trivial configuration issue? Perhaps something is set in the Mac OSX environment that is not set in the Linux environment?

If you want to run the tests, clone the repo:

https://github.com/mobiusinversion/hbase

Then run lein test. Thank you very much for your help!

Update:

I filed this HBase JIRA ticket:

https://issues.apache.org/jira/browse/HBASE-8944

Solution

Short answer: Set “umask 022” before running the test.
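
For example, in the shell session (or CI script) that runs the tests:

$ umask 022
$ lein test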

Long answer: This is a common environmental problem when running MiniDFSCluster from the Hadoop 1.x releases, which HBaseTestingUtility uses internally. It has effectively been fixed in Hadoop 0.22+ (including 2.0+, but not 1.x at the moment).

The fundamental problem is https://issues.apache.org/jira/browse/HDFS-2556

When MiniDFSCluster starts, it creates the temporary storage directories for the data node processes (configured as “dfs.data.dir”). These are created with your currently set umask. When each data node starts, it checks both that the directories configured in “dfs.data.dir” exist and that their permissions match the expected value (set via “dfs.datanode.data.dir.perm”). If the directory permissions do not match the expected value (“755” by default), the data node process exits.
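
To see how the umask determines those directory permissions, compare the two cases in a Linux shell (illustrative commands using GNU stat):

$ umask 0022 && mkdir data1 && stat -c %a data1
755
$ umask 0002 && mkdir data2 && stat -c %a data2
775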

In Hadoop 1.x this value defaults to “755”, so if you set your umask to “022”, the data directories end up with the correct permissions. If the permissions do not match the expected value, however, the data node aborts and you will see errors like the following in the test log file:

WARN  [main] datanode.DataNode(1577): Invalid directory in dfs.data.dir: Incorrect permission for /.../dfs/data/data2, expected: rwxr-xr-x, while actual: rwxrwxr-x

In later versions of Hadoop, the data node attempts to change the directory permissions to the expected value if they do not match, and it aborts only if that operation fails. HDFS-2556 proposes backporting this change to the 1.x releases, but it has not yet been fixed there.
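
If changing the umask is inconvenient, a possible alternative is to relax the expected permission itself before the cluster starts. The following is an untested sketch: it assumes your umask is 002 (hence “775”) and that the 1.x data node honors a “dfs.datanode.data.dir.perm” value set on the configuration before startMiniCluster is called:

;; Untested sketch: tell the data node to expect group-writable
;; directories ("775" corresponds to a umask of 002; adjust to match
;; your umask).
(defn test-config-relaxed-perms []
    (let [testing-utility (HBaseTestingUtility.)]
        (.set (.getConfiguration testing-utility)
              "dfs.datanode.data.dir.perm" "775")
        (.startMiniCluster testing-utility 1)
        (.getConfiguration testing-utility)))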
