Java – Hbase CopyTable in Java

Hbase CopyTable in Java… here is a solution to the problem.

Hbase CopyTable in Java

I want to copy an HBase table to another location, and I want the copy to perform well.

I want to reuse the code from CopyTable.java on the hbase-server GitHub page.

I’ve been looking at the HBase documentation, but it doesn’t help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html

After checking out this post from StackOverflow: Can a main() method of a class be invoked from another class in Java

I guess I can call its main class directly.
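I’m thinking of something along these lines (an untested sketch on my side; it assumes a recent HBase version where CopyTable implements the Hadoop Tool interface, and the table names are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.CopyTable;
import org.apache.hadoop.util.ToolRunner;

public class CopyTableDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same arguments as on the command line; table names are placeholders.
        String[] copyArgs = new String[] { "--new.name=targetTable", "sourceTable" };
        int exitCode = ToolRunner.run(conf, new CopyTable(), copyArgs);
        System.exit(exitCode);
    }
}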

Question: Do you think there is a better way to get this copy done than using CopyTable from hbase-server? Do you see any inconvenience in using this CopyTable?

Solution

Question: Do you think there is any better way to get this copy done rather than
using CopyTable from hbase-server? Do you see any inconvenience in using
this CopyTable?

First of all, a snapshot is a better approach than CopyTable.

  • HBase snapshots allow you to take a snapshot of a table without much impact on region servers. Snapshot, clone, and restore operations do not involve copying data. In addition, exporting a snapshot to another cluster has little impact on the region servers (a short Java sketch follows below).

Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the HFiles in HDFS after disabling the table. The disadvantage of these methods is that you either degrade region server performance (CopyTable/ExportTable) or you need to disable the table, which means no reads or writes; this is usually not acceptable.

See also: Snapshots and Repeatable reads for HBase Tables

Snapshot Internals
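
For example, with the Admin API you can take a snapshot and clone it into a new table without touching the data files. A minimal sketch (using the newer Connection/Admin client API; the table and snapshot names are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotCopy {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            // Take a snapshot of the source table (metadata only, no data is copied).
            admin.snapshot("sourceTable-snapshot", TableName.valueOf("sourceTable"));
            // Clone the snapshot into a new table on the same cluster.
            admin.cloneSnapshot("sourceTable-snapshot", TableName.valueOf("targetTable"));
        }
    }
}

To copy to another cluster, the snapshot can instead be shipped with the ExportSnapshot MapReduce tool rather than cloned locally.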


Another option is a MapReduce-style copy, similar to CopyTable:

You can implement something like the following in your code. This example is for a standalone program; in the same way, you can write a MapReduce job that bulk-inserts put records in batches (of perhaps 100,000).

This improves the performance of puts in the standalone HBase client; you can try the same approach in MapReduce.

public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
        HTable table = null;
        try {
            table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
            // A single put(List<Put>) call sends the whole batch in one round trip.
            table.put(puts);
            LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
        } catch (final Throwable e) {
            e.printStackTrace();
        } finally {
            LOG.info("Processed ---> " + puts.size());
            puts.clear();
            if (table != null) {
                table.close(); // release the table's resources
            }
        }
    }
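
A hypothetical usage of this helper (the table name, column family "cf", and qualifier "q" are placeholders, and the ArrayList/Put/Bytes imports are assumed):

final ArrayList<Put> puts = new ArrayList<Put>();
for (int i = 0; i < 100000; i++) {
    final Put put = new Put(Bytes.toBytes("row-" + i));
    // "cf" and "q" are placeholder column family and qualifier names
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
    puts.add(put);
}
addMultipleRecordsAtaShot(puts, "targetTable");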

In addition to this, you can also consider the following….

Enable a write buffer that is larger than the default

1) table.setAutoFlush(false)

2) Set the buffer size

<property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value> <!-- you can double this for better performance: 2 x 20971520 = 41943040 -->
</property>
             OR

void setWriteBufferSize(long writeBufferSize) throws IOException

The buffer is flushed in only two situations:

Explicit flush
Call flushCommits() to send the data to the servers for permanent storage.

Implicit flush
This is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size against the configured limit and, if the limit is exceeded, invoke flushCommits() for you.

Finally, calling setAutoFlush(true) effectively disables the write buffer: it forces the client to flush on every single put().
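
Putting the pieces together, a minimal sketch using the same old-style HTable client as above (the table name, buffer size, and the puts list are placeholders):

final HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), "targetTable");
table.setAutoFlush(false);               // buffer puts on the client side
table.setWriteBufferSize(41943040L);     // 2 x 20971520, as suggested above
try {
    for (final Put put : puts) {
        table.put(put);                  // may trigger an implicit flush
    }
    table.flushCommits();                // explicit flush of whatever is left
} finally {
    table.close();
}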
