Java – NLineInputFormat usage exceeds GC overhead limit

NLineInputFormat usage exceeds GC overhead limit… here is a solution to the problem.

I’m trying to read multiple rows in the mapper. To do this, I started using the NLineInputFormat class. When using it, I get a GC limit error. As a reference, the error code is:

16/02/21 01:37:13 INFO mapreduce.Job:  map 0% reduce 0%
16/02/21 01:37:38 WARN mapred.LocalJobRunner: job_local726191039_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)
    at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)
    at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:713)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:442)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)
16/02/21 01:37:39 INFO mapreduce.Job: Job job_local726191039_0001 failed with state FAILED due to: NA

For reference, here is the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobLauncher {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestDemo");
        job.setJarByClass(JobLauncher.class);

        job.setMapperClass(CSVMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        conf.setInt(NLineInputFormat.LINES_PER_MAP, 3);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));

        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I only have a simple CSVMapper. Why am I getting this error? Please help me resolve it.

Thanks in advance.

Solution

Why am I getting this error?

In general, the most likely explanation for an OOME is that you are short of memory, because

  • your code has a memory leak, or
  • you simply don’t have enough memory for what you are trying to do / the way you are trying to do it.

With this particular “flavor” of OOME you have not completely run out of memory yet, but you are probably close: the JVM has been spending so much CPU time on garbage collection, while reclaiming very little, that it has crossed the GC overhead threshold. This detail does not change how you should go about fixing the problem.
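To make that failure mode concrete, here is a minimal, self-contained sketch (unrelated to the asker’s job) that provokes the same kind of error when run with a small heap such as -Xmx64m. Because every allocated array stays reachable, the collector runs constantly but can reclaim almost nothing; depending on the JVM and heap size, the eventual error message is either “GC overhead limit exceeded” or “Java heap space”.

import java.util.ArrayList;
import java.util.List;

// Run with a small heap, e.g.: java -Xmx64m GcOverheadDemo
public class GcOverheadDemo {
    public static void main(String[] args) {
        List<long[]> retained = new ArrayList<>();
        while (true) {
            // Every array stays reachable, so each GC cycle frees almost nothing.
            retained.add(new long[1024]);
        }
    }
}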

In your case, the stack trace shows the error being thrown while the job is being set up: the LocalJobRunner builds one MapTaskRunnable per input split, each with its own JobConf, and the OOME occurs while that configuration data is being copied into a map (a ConcurrentHashMap). So it can be inferred that you have told Hadoop to take on more at one time than memory can hold: with only a few lines per split, a large CSV file produces a very large number of splits, and hence a very large number of per-task configuration copies.

Please help me resolve this error.

Solution:

  • Reduce the size of the input file; for example, break your problem down into several smaller ones.
  • Increase the memory available to the affected JVM (specifically, the Java heap size); see the configuration sketch after this list.
  • Change your application so that the job streams the data from the file (or from HDFS) instead of loading the whole CSV file into memory; a hypothetical mapper written in that style is sketched at the end of this answer.
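As a concrete illustration of the first two points, the driver below is a minimal sketch, not the asker’s actual fix: it raises LINES_PER_MAP so far fewer splits (and far fewer per-task configuration copies) are created, and it requests a larger heap for the map task JVMs. The property names are standard Hadoop 2.x keys; the numbers (500 lines per split, 1 GB heap) are illustrative assumptions, not recommendations.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobLauncher {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();

        // Set configuration values BEFORE creating the Job: Job.getInstance(conf)
        // takes a copy of conf, so changes made to conf afterwards are not seen by the job.
        conf.setInt(NLineInputFormat.LINES_PER_MAP, 500);   // assumption: 500 lines per split
        conf.set("mapreduce.map.java.opts", "-Xmx1024m");    // heap for each map task JVM
        conf.set("mapreduce.map.memory.mb", "1536");         // container size that fits that heap

        Job job = Job.getInstance(conf, "TestDemo");
        job.setJarByClass(JobLauncher.class);
        job.setMapperClass(CSVMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));
        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that the stack trace above comes from the LocalJobRunner, where the whole job runs inside the client JVM; in that mode the per-task memory settings do not create separate containers, and the heap that matters is the client JVM’s own, which for the hadoop command line is usually raised via the HADOOP_CLIENT_OPTS environment variable (e.g. -Xmx2g).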

If you need a more specific answer, you will need to provide more details.
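To illustrate the third bullet above: the asker’s CSVMapper is not shown, so the class below is a hypothetical example rather than their code. It handles each CSV line as it is delivered to map() and never accumulates records in an in-memory collection, which is the “streaming” style that suggestion refers to. (With NLineInputFormat the input key is the byte offset and the value is the line of text.)

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: processes one CSV line per call and keeps nothing
// between calls, so its memory use stays flat regardless of input size.
public class CSVMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        // ... process 'fields' here; do not add them to a growing List or Map ...
    }
}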
