HBase CopyTable in Java
I want to copy an HBase table to another location, and the copy needs to perform well.
I would like to reuse the code from CopyTable.java in the hbase-server GitHub repository.
I have been looking through the HBase documentation, but it doesn't help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html
After reading this StackOverflow post, "Can a main() method of class be invoked in another class in java", I guess I can call CopyTable's main method directly.
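A rough sketch of what that call might look like, assuming only the argument format that CopyTable's usage text describes. The class name CopyTableArgs and its build helper are hypothetical; --new.name and --peer.adr are CopyTable command-line options. The actual CopyTable.main call is left as a comment so the sketch compiles without hbase-server on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assembles a CopyTable-style argument array. */
final class CopyTableArgs {

    private CopyTableArgs() {}

    /**
     * @param sourceTable table to copy from (required, passed last)
     * @param newName     destination table name, or null to keep the same name
     * @param peerAddress destination cluster address, or null for the same cluster
     */
    static String[] build(String sourceTable, String newName, String peerAddress) {
        if (sourceTable == null || sourceTable.isEmpty()) {
            throw new IllegalArgumentException("sourceTable is required");
        }
        List<String> args = new ArrayList<>();
        if (newName != null) {
            args.add("--new.name=" + newName);
        }
        if (peerAddress != null) {
            args.add("--peer.adr=" + peerAddress);
        }
        // The source table name is the final positional argument.
        args.add(sourceTable);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = build("sourceTable", "targetTable", null);
        System.out.println(String.join(" ", args));
        // Assumption: with hbase-server on the classpath, the call would be
        // org.apache.hadoop.hbase.mapreduce.CopyTable.main(args);
    }
}
```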
Question: Do you think there is a better way to do this copy than using CopyTable from hbase-server? Do you see any drawbacks to using CopyTable?
Solution
Question: Do you think there is a better way to do this copy than using CopyTable from hbase-server? Do you see any drawbacks to using CopyTable?
First, a snapshot is a better option than CopyTable.
- HBase snapshots allow you to take a snapshot of a table without much impact on the region servers. Snapshot, clone, and restore operations do not involve copying data. Also, exporting a snapshot to another cluster has no impact on the region servers.
Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table. The disadvantage of these methods is that you either degrade region server performance (Copy/Export Table) or need to disable the table, which means no reads or writes; this is usually unacceptable.
- A snapshot is more than a rename: if you want to be able to restore the table to one particular point between multiple operations, this is the right tool to use:
A snapshot is a set of metadata information that allows an administrator to get back to a previous state of the table. A snapshot is not a copy of the table; it is just a list of file names and does not copy the data. A full snapshot restore means that you get back the previous table schema and the previous data, losing any changes made since the snapshot was taken.
See also: Snapshots and Repeatable reads for HBase Tables
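To make the "list of file names" idea concrete, here is a toy model in plain Java. It is only an illustration of the description above and does not use or mirror the HBase API; ToyTable and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only: a "table" is just a list of immutable file names. */
final class ToyTable {
    private List<String> files = new ArrayList<>();

    void addFile(String name) {
        files.add(name);
    }

    /** A snapshot records the file names; no file contents are copied. */
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(files));
    }

    /** Restoring swaps the file list back, dropping anything added since. */
    void restore(List<String> snapshot) {
        files = new ArrayList<>(snapshot);
    }

    List<String> currentFiles() {
        return Collections.unmodifiableList(files);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.addFile("hfile-1");
        table.addFile("hfile-2");
        List<String> snap = table.snapshot();
        table.addFile("hfile-3");   // a change made after the snapshot
        table.restore(snap);        // hfile-3 is no longer referenced
        System.out.println(table.currentFiles());
    }
}
```

Taking the snapshot costs only a copy of the name list, which is why it is cheap compared with copying the data itself.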
Another MapReduce-based way, besides CopyTable:
You can implement something like the following in your code. This version is for a standalone program; in your case you would write a MapReduce job that inserts many Put records as one batch (perhaps 100,000 at a time).
This improved performance for standalone inserts through the HBase client, and you can try the same approach in MapReduce.
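public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
    // Guard against a null list before touching it.
    if (puts == null) {
        return;
    }
    final int count = puts.size();
    HTable table = null;
    try {
        table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
        // Send the whole batch in one call instead of one put at a time.
        table.put(puts);
        LOG.info("INSERT record[s] " + count + " to table " + tableName + " OK.");
    } catch (final Throwable e) {
        LOG.error("Batch insert into table " + tableName + " failed", e);
    } finally {
        LOG.info("Processed ---> " + count);
        puts.clear();
        // Release the table's resources once the batch is done.
        if (table != null) {
            table.close();
        }
    }
}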
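If the records arrive as one large list, they can be cut into batches of roughly that size before each call. The following is a minimal sketch in plain Java; Batches and partition are hypothetical names, and Integer stands in for Put so the example runs without HBase.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for cutting a large list into fixed-size batches. */
final class Batches {
    private Batches() {}

    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<ArrayList<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<ArrayList<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch can be cleared independently.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 250_000; i++) {
            records.add(i);
        }
        List<ArrayList<Integer>> batches = partition(records, 100_000);
        System.out.println(batches.size() + " batches");
    }
}
```

Each batch is its own ArrayList, so it could be handed to a method like addMultipleRecordsAtaShot, which clears the list it receives, without affecting the other batches.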
In addition to this, you can also consider the following.
Enable a write buffer that is larger than the default:
1) table.setAutoFlush(false)
2) Set the buffer size
<property>
  <name>hbase.client.write.buffer</name>
  <!-- You can try doubling this value: 2 x 20971520 = 41943040 -->
  <value>20971520</value>
</property>
OR
void setWriteBufferSize(long writeBufferSize) throws IOException
The buffer is flushed on only two occasions:
Explicit flush
Use the flushCommits() call to send the data to the servers for permanent storage.
Implicit flush
This is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.
If the write buffer is disabled entirely by calling setAutoFlush(true), the client calls the flush method on every invocation of put().
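The flush rules above can be illustrated with a small toy model in plain Java. This is not the HBase client: ToyWriteBuffer is an invented class that borrows the method names from the text, and byte arrays stand in for Put objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer; not the HBase client itself. */
final class ToyWriteBuffer {
    private final List<byte[]> pending = new ArrayList<>();
    private long bufferedBytes = 0;
    private long limitBytes;
    private boolean autoFlush = true;
    private int flushCount = 0;

    ToyWriteBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    /** Implicit flush: changing the limit re-checks the buffer. */
    void setWriteBufferSize(long limitBytes) {
        this.limitBytes = limitBytes;
        flushIfOverLimit();
    }

    /** Implicit flush: every put re-checks the buffer, or flushes at once. */
    void put(byte[] record) {
        pending.add(record);
        bufferedBytes += record.length;
        if (autoFlush) {
            flushCommits();
        } else {
            flushIfOverLimit();
        }
    }

    /** Explicit flush: "sends" everything that is pending. */
    void flushCommits() {
        if (pending.isEmpty()) {
            return;
        }
        pending.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    private void flushIfOverLimit() {
        if (bufferedBytes > limitBytes) {
            flushCommits();
        }
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        ToyWriteBuffer buffer = new ToyWriteBuffer(10);
        buffer.setAutoFlush(false);
        buffer.put(new byte[4]);   // 4 bytes buffered, no flush
        buffer.put(new byte[4]);   // 8 bytes buffered, no flush
        buffer.put(new byte[4]);   // 12 bytes exceeds 10, implicit flush
        buffer.put(new byte[4]);   // 4 bytes buffered again
        buffer.flushCommits();     // explicit flush of the remainder
        System.out.println("flushes: " + buffer.flushCount());
    }
}
```

With auto-flush off, the four puts above result in two flushes instead of four, which is the saving a larger write buffer is meant to provide.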