How to adjust the “DataNode maximum Java heap size” in a Hadoop cluster
I searched Google for information on how to adjust the DataNode maximum Java heap size, but could not find a formula for calculating the correct value.
The default DataNode maximum Java heap size is 1 GB.
We increased it to 5 GB because in some cases we saw heap-size errors in the DataNode logs.
But picking a number this way is not the right approach.
So, any suggestions or good articles on how to set the correct value when the DataNode logs heap-size errors?
Suppose we have the following Hadoop cluster size:
10 DataNode machines, each with 5 data disks, 1 TB per disk
Each DataNode has 32 CPU cores
Each DataNode has 256 GB of memory
Based on this information, is there a formula for the correct DataNode heap size, i.e., one that avoids the heap-size errors in the DataNode logs?
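For a sense of scale, here is my back-of-envelope count of block replicas, assuming the default 128 MB HDFS block size and the default replication factor of 3 (assumptions; we have not changed dfs.blocksize or dfs.replication):

    # maximum block replicas per DataNode: 5 disks x 1 TB, 128 MB blocks
    echo $(( 5 * 1024 * 1024 / 128 ))            # 40960 replicas per node
    # cluster-wide replicas, and unique blocks at replication factor 3
    echo $(( 10 * 5 * 1024 * 1024 / 128 ))       # ~409600 replicas
    echo $(( 10 * 5 * 1024 * 1024 / 128 / 3 ))   # ~136533 unique blocks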
One article I found recommends setting the DataNode Java heap to 4 GB, but I am not sure that covers all scenarios. Here is the relevant excerpt:
Root cause: DataNode operations are I/O-intensive and do not require a 16 GB heap.
RESOLUTION: Tuning the GC parameters resolved the issue. The 4 GB heap recommendation:

    -Xms4096m -Xmx4096m -XX:NewSize=800m -XX:MaxNewSize=800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:ParallelGCThreads=8
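To sanity-check the 4 GB figure against the cluster in the question: vendor sizing guides often suggest on the order of 1 GB of DataNode heap per million block replicas stored on the node, with a few GB as the floor. Treat that ratio as a rule of thumb I am bringing in, not something stated in the excerpt above:

    # rule-of-thumb check: ~1 GB heap per million replicas, 4 GB floor
    replicas=40960                                            # per-node estimate from the question
    echo $(( replicas/1000000 > 4 ? replicas/1000000 : 4 ))   # prints 4 -> 4 GB is ample

At roughly 41,000 replicas per node, this cluster is two orders of magnitude below the point where 4 GB would become tight, which suggests the heap-size errors are a GC-tuning problem rather than a capacity problem, as the excerpt says.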
In hadoop-env.sh (there are also corresponding fields in Ambari; just search for "heap") there is an option to set this value. In the shell file it is usually referred to as
HADOOP_DATANODE_OPTS
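A minimal sketch of what that could look like, reusing the 4 GB flags quoted above (note: on Hadoop 3.x the variable is named HDFS_DATANODE_OPTS instead; adjust to your version):

    # hadoop-env.sh - DataNode JVM options (4 GB heap plus the CMS GC flags above)
    export HADOOP_DATANODE_OPTS="-Xms4096m -Xmx4096m \
      -XX:NewSize=800m -XX:MaxNewSize=800m \
      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
      -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:ParallelGCThreads=8 ${HADOOP_DATANODE_OPTS}"

Appending ${HADOOP_DATANODE_OPTS} at the end preserves any options the distribution already sets.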
8 GB is usually a good value for most servers. You have plenty of memory, though, so I would start there and proactively monitor usage, for example with JMX metrics graphed in Grafana.
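One way to watch heap usage without any extra agents is the JMX JSON servlet that every Hadoop daemon exposes. The hostname below is a placeholder, and the DataNode web UI port is 9864 by default on Hadoop 3.x (50075 on 2.x); adjust both for your cluster:

    # query the DataNode's built-in JMX servlet for current heap usage
    curl -s 'http://dn1.example.com:9864/jmx?qry=java.lang:type=Memory' \
      | jq '.beans[0].HeapMemoryUsage
            | {used_mb: (.used/1048576|floor), max_mb: (.max/1048576|floor)}'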
The NameNode heap may also need adjusting; see https://community.hortonworks.com/articles/43838/scaling-the-hdfs-namenode-part-1.html. A commonly cited guideline is on the order of 1 GB of NameNode heap per million file-system objects, so the ~137,000 blocks estimated above are nowhere near NameNode heap pressure.