Java – MapReduce: How to pass a HashMap to the mapper

Here is a solution to the following problem.

MapReduce: How to pass a HashMap to the mapper

I’m designing a next-generation analytics system that needs to process multiple events from multiple sensors in near real time. To do this, I want to use a big data analytics platform such as Hadoop, Spark Streaming, or Flink.

In order to parse each event, I need some metadata from a database table, or at least to have that metadata loaded into a cached map.

The problem is that the mappers run in parallel across multiple nodes.

So I have two things to deal with:

  • First, how do I load/pass the HashMap to each mapper?
  • Second, is there any way to keep the HashMap consistent across mappers?

Solution

Serialize the HashMap to a file and store it in HDFS. Then, during the MapReduce job configuration phase, use the DistributedCache to ship that serialized file to every mapper. In the map phase (typically in setup()), each mapper reads the file, deserializes it, and can then look up entries in the HashMap. Because every mapper deserializes the same immutable snapshot, the map stays consistent across all tasks.
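Below is a minimal sketch of this approach using the Hadoop 2.x API, where Job.addCacheFile() replaces the older DistributedCache class. The HDFS cache path, the class names, the metadata contents, and the CSV record layout are assumptions for illustration only.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URI;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MetadataCacheExample {

    /** Driver: serialize the map into HDFS and register it with the distributed cache. */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical metadata loaded from the DB; key = sensor id, value = metadata string.
        HashMap<String, String> metadata = new HashMap<>();
        metadata.put("sensor-1", "building-A,floor-2");

        // Assumed HDFS location for the serialized map.
        Path cachePath = new Path("/cache/sensor-metadata.ser");
        try (ObjectOutputStream out = new ObjectOutputStream(fs.create(cachePath, true))) {
            out.writeObject(metadata);
        }

        Job job = Job.getInstance(conf, "parse-sensor-events");
        job.setJarByClass(MetadataCacheExample.class);
        job.setMapperClass(EventMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // The "#metadata.ser" fragment symlinks the file into each task's working directory.
        job.addCacheFile(new URI(cachePath.toUri() + "#metadata.ser"));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    /** Mapper: deserialize the cached map once in setup(), then use it for every record. */
    public static class EventMapper extends Mapper<LongWritable, Text, Text, Text> {

        private HashMap<String, String> metadata;

        @Override
        @SuppressWarnings("unchecked")
        protected void setup(Context context) throws IOException, InterruptedException {
            // Read the symlinked cache file from the task's working directory.
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("metadata.ser"))) {
                metadata = (HashMap<String, String>) in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException("Cannot deserialize cached metadata", e);
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed CSV input where the first field is the sensor id.
            String[] fields = value.toString().split(",", 2);
            String meta = metadata.getOrDefault(fields[0], "unknown");
            context.write(new Text(fields[0]), new Text(meta));
        }
    }
}
```

Deserializing in setup() rather than map() means the file is read once per task, not once per record, which keeps the per-event overhead negligible.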

Related Problems and Solutions