Java - Hadoop spawns multiple VMs

When I start Hadoop using the bin/start-all.sh script, it seems to start a separate JVM for each of the NameNode, DataNode, JobTracker, and TaskTracker.

Also, when I start a job, it seems to create another JVM for each job.

Is there any specific reason for Hadoop to do this? I understand that it is necessary in a multi-node cluster, but it happens even on a single-node cluster.

Is there a configuration parameter I can set so that everything runs in the same JVM?

Solution

I haven’t read anything that specifically explains why they do this, but with multiple JVMs you can potentially use more physical RAM (depending on the operating system). You also get some isolation: if you want to change the configuration of one component, you only need to restart that component. Not that this is necessarily a huge benefit. Also, from an implementation perspective, it is probably simpler to do things the same way everywhere, rather than have different logic for different setups.

OTOH, why not spawn multiple JVMs?
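
To make the isolation point concrete, here is a minimal sketch in plain Java. It is not Hadoop code. The class names `JvmIsolationDemo` and `CrashingDaemon` are hypothetical, and the sketch assumes that the `java` launcher lives under `java.home/bin` and that the compiled classes are on the current classpath. The parent starts a child JVM that fails on purpose, then carries on and reports the child's exit code.

```java
import java.io.File;
import java.io.IOException;

public class JvmIsolationDemo {

    /** Hypothetical stand-in for one component; it fails on purpose. */
    public static class CrashingDaemon {
        public static void main(String[] args) {
            // Simulate a fatal failure inside this component only.
            System.exit(42);
        }
    }

    /** Launches the given main class in a separate JVM and returns its exit code. */
    public static int runInChildJvm(Class<?> mainClass)
            throws IOException, InterruptedException {
        // Assumption: the launcher of the running JVM is at <java.home>/bin/java.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        // Reuse the parent's classpath so the child can find mainClass.
        String classpath = System.getProperty("java.class.path");

        ProcessBuilder builder =
                new ProcessBuilder(javaBin, "-cp", classpath, mainClass.getName());
        builder.inheritIO(); // show the child's output on the parent's console
        Process child = builder.start();
        return child.waitFor(); // block until the child JVM has exited
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runInChildJvm(CrashingDaemon.class);
        // The child's failure did not take the parent down with it.
        System.out.println("Child JVM exited with code " + exitCode
                + "; the parent JVM is still running.");
    }
}
```

If both pieces lived in a single JVM, the `System.exit` call would have terminated everything. With separate processes, only the failing component goes away and can be restarted on its own.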
