Java – Yarn container understanding and adjustment

Hello, we recently upgraded from MR1 to YARN. I know containers are an abstraction, but I don’t understand how many JVM tasks (map, reduce, filter, etc.) a single container can spawn, or whether a container can be reused across multiple map or reduce tasks. I read the following in the blog post “What is a container in YARN?”:

"To be accurate, each mapper and reducer runs on its own container!" This means that if I look at the AM logs, I should see that the number of containers allocated is equal to the number of mapped tasks (failed | succeeded) Is it correct to add the number of reduce tasks?

I know that the number of containers changes over the application’s lifecycle, based on AM requests, input splits, the scheduler, and so on.

But is there a way to request a minimum initial number of containers for a given application? I think one way is to configure a fair scheduler queue, but is there anything else that can determine this?

In the case of MapReduce, suppose I have mapreduce.map.memory.mb = 3072 (3 GB) and
mapreduce.map.cpu.vcores = 4. I also have yarn.scheduler.minimum-allocation-mb = 1024 and yarn.scheduler.minimum-allocation-vcores = 1.

Does that mean each map task gets one container with 4 vCores, or 4 containers with 1 vCore each?

It’s also not clear where mapreduce.map.memory.mb and mapreduce.map.cpu.vcores should be specified. Should they be set on the client node, or can they also be set per application?

Can I also view the containers currently assigned for a given application from the RM UI or AM UI?

Solution

  1. A container is a logical entity. It allows an application to use a specific amount of resources (memory, CPU, etc.) on a specific host (node manager).
    Map and Reduce tasks for the same application cannot reuse containers.

For example, I have a MapReduce application that spawns 10 mappers:
[Image: number of mappers]

I ran it on a single host with 8 vCores (the value is determined by the configuration parameter yarn.nodemanager.resource.cpu-vcores, whose default is 8; see YarnConfiguration.java):

  /** Number of Virtual CPU Cores which can be allocated for containers.*/
  public static final String NM_VCORES = NM_PREFIX + "resource.cpu-vcores";
  public static final int DEFAULT_NM_VCORES = 8;

Because there are 10 mappers plus 1 ApplicationMaster, the total number of containers allocated is 11.
[Image: RM UI application page showing the 11 allocated containers]

Therefore, for each map/reduce task, a different container is started.

However, for MapReduce jobs YARN has the concept of an uber job, which lets multiple mappers and at most one reducer run inside a single container (see https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml: the current code cannot support more than one reducer and will ignore larger values).
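
If you want to experiment with uber mode, a minimal client-side sketch might look like the following; the mapreduce.job.ubertask.* properties are the standard ones, while the job name, class, and input/output paths are placeholders:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class UberJobExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Allow small jobs to run all of their tasks inside the AM's own container ("uber" mode).
          conf.setBoolean("mapreduce.job.ubertask.enable", true);
          // Upper bounds for a job to still qualify as "uber":
          conf.setInt("mapreduce.job.ubertask.maxmaps", 9);     // at most 9 map tasks
          conf.setInt("mapreduce.job.ubertask.maxreduces", 1);  // more than 1 reduce is not supported

          Job job = Job.getInstance(conf, "uber-example");
          job.setJarByClass(UberJobExample.class);
          // Mapper/Reducer classes and key/value types would be set here as usual.

          FileInputFormat.addInputPath(job, new Path(args[0]));   // placeholder input path
          FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder output path

          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }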

  2. There is no configuration parameter for specifying a minimum number of containers. The ApplicationMaster is responsible for requesting however many containers it needs (see the AMRMClient sketch after this list).

  3. yarn.scheduler.minimum-allocation-mb – determines the minimum memory allocation per container request (yarn.scheduler.maximum-allocation-mb determines the maximum per container request).

    yarn.scheduler.minimum-allocation-vcores – determines the minimum vCore allocation per container request (yarn.scheduler.maximum-allocation-vcores determines the maximum per container request).

    In your case, you request mapreduce.map.memory.mb = 3072 (3 GB) and mapreduce.map.cpu.vcores = 4.

    So each mapper gets 1 container with 3 GB of memory and 4 vCores (assuming yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores are at least that large), not 4 containers with 1 vCore each; the minimum-allocation values are lower bounds on a container's size, not the size every container gets.

  4. The parameters “mapreduce.map.memory.mb” and “mapreduce.map.cpu.vcores” are set in mapred-site.xml. If these properties are not marked final there, you can override them on the client before submitting the job (see the client-side sketch after this list).

  5. Yes. From the application’s Application Attempts page in the RM UI, you can see the containers that have been allocated; check the image above. (A programmatic way to list them is sketched after this list.)
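
For item 2: with a plain MapReduce job you do not request containers yourself, but if you were writing a custom YARN ApplicationMaster, the AMRMClient API is what asks the ResourceManager for containers. This is only a minimal sketch under that assumption; the resource sizes and request count are placeholders:

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class ContainerRequestSketch {

      // Meant to be called from inside a running ApplicationMaster.
      public static void requestContainers(YarnConfiguration conf, int howMany) throws Exception {
          AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
          rmClient.init(conf);
          rmClient.start();
          rmClient.registerApplicationMaster("", 0, "");

          // Each request asks for one container of 1 GB / 1 vCore (placeholder sizes).
          Resource capability = Resource.newInstance(1024, 1);
          Priority priority = Priority.newInstance(0);
          for (int i = 0; i < howMany; i++) {
              rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
          }
          // Allocated containers are returned by subsequent rmClient.allocate(progress) calls.
      }
  }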
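
For item 4: a minimal sketch of overriding the map resource settings on the client before submitting a job; the property names are the standard MapReduce ones, and the job setup itself is a placeholder:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class MapResourcesSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Override the values from mapred-site.xml for this submission only
          // (works as long as the cluster does not mark these properties as final).
          conf.setInt("mapreduce.map.memory.mb", 3072);   // 3 GB per map container
          conf.setInt("mapreduce.map.cpu.vcores", 4);     // 4 vCores per map container
          conf.setInt("mapreduce.reduce.memory.mb", 3072);
          conf.setInt("mapreduce.reduce.cpu.vcores", 4);

          Job job = Job.getInstance(conf, "per-job-resource-override");
          // Mapper/Reducer classes, input and output paths would be configured here as usual
          // before calling job.waitForCompletion(true).
      }
  }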
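
For item 5: besides the RM UI, the allocated containers can also be queried programmatically. A sketch using the YarnClient API (available from around Hadoop 2.4; listing all applications here is just for illustration):

  import java.util.List;

  import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
  import org.apache.hadoop.yarn.api.records.ApplicationReport;
  import org.apache.hadoop.yarn.api.records.ContainerReport;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class ListContainersSketch {
      public static void main(String[] args) throws Exception {
          YarnClient yarnClient = YarnClient.createYarnClient();
          yarnClient.init(new YarnConfiguration());
          yarnClient.start();

          // Walk every application, its attempts, and the containers reported for each attempt.
          for (ApplicationReport app : yarnClient.getApplications()) {
              for (ApplicationAttemptReport attempt :
                      yarnClient.getApplicationAttempts(app.getApplicationId())) {
                  List<ContainerReport> containers =
                      yarnClient.getContainers(attempt.getApplicationAttemptId());
                  System.out.println(attempt.getApplicationAttemptId()
                      + " has " + containers.size() + " container(s)");
              }
          }

          yarnClient.stop();
      }
  }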
