Java: How to flush tables in HBase

I'm using HBase (0.98 client, HBase 1.1.2 on the server), and the underlying data is stored in HDFS.

I tried flushing the table with the following code and could see the data flushed to the HFile location in Hadoop.

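    // Send the batch of puts to the table.
    htable.put(puts);
    // close() also pushes out anything left in the client-side write buffer.
    htable.close();
    // Ask HBase to flush the table's in-memory data to HFiles.
    admin.flush(tableName);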

Data location in Hadoop:

    ./hadoop fs -du /hbase/data/default/tableName/

When I power off the node and then restart it, along with Hadoop and HBase, I can see that the data in HDFS is corrupted.

If the data was properly flushed to the HFile, why does it become corrupted during a power-off?

Do I need to change anything in the code that flushes the table?

Thank you

Solution

I ran into something similar a few years ago, and it was caused by a sync problem. You can see the resolution here. Here is another description, which includes a timing diagram of the put operation.

What happens in your case? Maybe the put is very small and ends up in the memstore rather than in the HFile, which is where you are checking whether the data is “corrupted”.

Try writing 25 MB or more, since that should be the page size of Hadoop and will trigger all writes. That way you can rule out other problems. If it works, you can then tune the storage policy or simply wait longer. It may sound like silly advice, but note that a normal system would see more writes, so a full write to the HFile would be triggered anyway. Another option is to force the flush, but then production could suffer from too many writes.
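To make the idea concrete, here is a toy model of a size-triggered write buffer. It is not HBase code and does not use the HBase API: the class name `ToyMemstore`, its methods, and the threshold are all made up for illustration, and the 25 MB value is just the figure mentioned above. It only shows why a small write can sit in memory with no file on disk until either enough data arrives or a flush is forced.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a buffer that spills to "files" at a size threshold. Not HBase code. */
class ToyMemstore {
    private final long flushThresholdBytes;                      // hypothetical threshold
    private long bufferedBytes = 0;                              // data held only in memory
    private final List<Long> flushedFiles = new ArrayList<>();   // sizes of the "files" written

    ToyMemstore(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Buffer a write; spill to a new "file" once the threshold is reached. */
    void put(long sizeBytes) {
        bufferedBytes += sizeBytes;
        if (bufferedBytes >= flushThresholdBytes) {
            flush();
        }
    }

    /** Forced flush, similar in spirit to the admin.flush(tableName) call in the question. */
    void flush() {
        if (bufferedBytes > 0) {
            flushedFiles.add(bufferedBytes);
            bufferedBytes = 0;
        }
    }

    long bufferedBytes() { return bufferedBytes; }

    int fileCount() { return flushedFiles.size(); }

    public static void main(String[] args) {
        ToyMemstore store = new ToyMemstore(25L * 1024 * 1024);  // 25 MB, figure from the answer
        store.put(1024);                                         // small write: stays in memory
        System.out.println("files after small put: " + store.fileCount());
        store.put(25L * 1024 * 1024);                            // large write: crosses the threshold
        System.out.println("files after large put: " + store.fileCount());
    }
}
```

In this model the first put leaves zero files and the second produces one, which is the effect the 25 MB test is meant to provoke. Anything still in the buffer at power-off is lost in the toy; a real HBase setup also has a write-ahead log for that case, which this sketch leaves out.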
