Java – HBase scan operation cache

What is the difference between setCaching and setBatch in the HBase scanning mechanism?
Which one should I use to get the best performance when scanning large amounts of data?

Solution

Unless you have very wide tables with many columns (or very large columns), you should forget about setBatch() altogether and focus on setCaching():


setCaching(int caching)

Sets the number of rows that are fetched from the server and cached on the client for the scanner per round trip. If not set, the configuration setting HConstants.HBASE_CLIENT_SCANNER_CACHING applies. A higher caching value enables a faster scanner but uses more memory.
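To make the effect of the caching value concrete, here is a minimal, self-contained sketch. It does not use the HBase client at all: the class CachingSketch and the method roundTrips are hypothetical names, and the model assumes that each round trip returns exactly up to `caching` rows. It ignores the extra calls a real scanner makes to open and close, and any server-side size limits.

```java
// Simplified model (an assumption, not HBase code): counts how many
// fetch round trips are needed to stream a number of rows when each
// round trip delivers at most `caching` rows to the client.
final class CachingSketch {

    static int roundTrips(int totalRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        int trips = 0;
        int delivered = 0;
        while (delivered < totalRows) {
            // One simulated round trip: take up to `caching` rows.
            int fetched = Math.min(caching, totalRows - delivered);
            delivered += fetched;
            trips++;
        }
        return trips;
    }

    public static void main(String[] args) {
        int totalRows = 10_000;
        for (int caching : new int[] {1, 100, 1_000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(totalRows, caching) + " round trips");
        }
    }
}
```

Under this model, 10,000 rows need 10,000 round trips with a caching value of 1, 100 with a value of 100, and 10 with a value of 1,000. The price is that the client holds up to that many rows in memory at a time.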

setBatch(int batch)

Sets the maximum number of values (cells) returned for each call to next().


setBatch limits how many values (columns) of a single row are returned per call/iteration, so a very wide row is delivered in several chunks instead of all at once. Here’s a good post about it: http://blog.jdwyah.com/2013/08/hbase-scan-batch-vs-cache.html
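The chunking that a batch limit causes can be illustrated with a small, self-contained sketch. Again, this is not HBase code: BatchSketch and splitRow are hypothetical names, and cells are modelled as plain strings. It only shows how one wide row would be split into several pieces of at most `batch` values each.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not HBase code): splits the cells of
// one wide row into chunks of at most `batch` cells, the way a batch
// limit makes a single row arrive as several partial results.
final class BatchSketch {

    static List<List<String>> splitRow(List<String> cells, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < cells.size(); start += batch) {
            int end = Math.min(start + batch, cells.size());
            // Copy the slice so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(cells.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            row.add("col" + i);
        }
        for (List<String> chunk : splitRow(row, 4)) {
            System.out.println(chunk.size() + " cells: " + chunk);
        }
    }
}
```

With 10 columns and a batch of 4, the row arrives as three chunks of 4, 4, and 2 cells. For rows with only a handful of columns this splitting never happens, which is why the batch setting matters only for very wide rows.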
