Java – Why is scan.setCacheBlocks(false) recommended for MapReduce jobs?


Why is scan.setCacheBlocks(false) recommended for MapReduce jobs?

I understand why scan.setCaching is good for a MapReduce job, but I don't understand why setCacheBlocks(true) would be bad. Does it overburden the server?
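For reference, the two settings in question are configured on the Scan object roughly like this (a minimal sketch; the caching value of 500 is an arbitrary example):

```java
import org.apache.hadoop.hbase.client.Scan;

public class ScanSettings {
    static Scan buildScan() {
        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC round trip; 500 is an arbitrary example
        scan.setCacheBlocks(false);  // keep scanned blocks out of the RegionServer BlockCache
        return scan;
    }
}
```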

Solution

In short – yes: if you leave block caching set to true in your MapReduce job, it burdens the RegionServer.
A MapReduce job driven by a scan typically reads each block exactly once and never returns to it. The BlockCache is an LRU cache: a block is inserted on its first read, then evicted to make room for the next blocks even though it will never be requested again, and so on. The RegionServer therefore keeps churning data through the BlockCache with no gain at all; it is just unnecessary I/O and CPU overhead, and it evicts blocks that normal reads could actually have reused.
For normal reads, however, it is recommended to leave block caching set to true, so that repeated reads of the same data benefit from locality and are served from memory.
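Putting it together, a table-input job wires this scan into its mapper via TableMapReduceUtil. Below is a minimal sketch, assuming a hypothetical table name "my_table" and a map-only job that just emits row keys:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class FullScanJob {

    // Placeholder mapper: emits each row key and ignores the cell data.
    static class RowKeyMapper extends TableMapper<ImmutableBytesWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result columns, Context context)
                throws IOException, InterruptedException {
            context.write(row, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "full-scan-example");
        job.setJarByClass(FullScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // more rows per RPC; 500 is an arbitrary example
        scan.setCacheBlocks(false);  // recommended for MapReduce scans, as explained above

        TableMapReduceUtil.initTableMapperJob(
                "my_table",                    // hypothetical table name
                scan,
                RowKeyMapper.class,
                ImmutableBytesWritable.class,  // mapper output key
                NullWritable.class,            // mapper output value
                job);
        job.setNumReduceTasks(0);              // map-only job

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```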
