Java – Use MapReduce to read large files in Hadoop

Use MapReduce to read large files in Hadoop… here is a solution to the problem.

Use MapReduce to read large files in Hadoop

I have code that reads a file from an FTP server and writes it to HDFS. I’ve implemented a custom InputFormatReader that sets the isSplitable property of the input to false. But this gives me the following error.

INFO mapred.MapTask: Record too large for in-memory buffer

The code I use to read the data is

Path file = fileSplit.getPath();
FileSystem fs = file.getFileSystem(conf);
FSDataInputStream in = null;
try {
    in = fs.open(file);
    // contents is a byte[] sized to the whole file; this reads it all into memory
    IOUtils.readFully(in, contents, 0, contents.length);
    // value is the writable handed to the mapper
    value.set(contents, 0, contents.length);
} finally {
    IOUtils.closeStream(in);
}
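For context, disabling splitting in the new org.apache.hadoop.mapreduce API typically looks like the short sketch below. NonSplittableTextInputFormat is an illustrative name, and TextInputFormat is used here only to keep the example self-contained; the question’s own format presumably extends FileInputFormat and wraps the reader shown above.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class NonSplittableTextInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // every file becomes exactly one split, so a single mapper reads it whole
    }
}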

Any ideas on how to avoid the Java heap space error without splitting the input file? Or how do I read the file if I make isSplitable true?

Solution

If I understand you correctly, you load the entire file into memory. That has nothing to do with Hadoop: you can’t do it in Java unless you can be sure there is enough memory for the whole file.
I suggest defining some reasonably sized chunks and making each chunk a “record”.
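A minimal sketch of that suggestion, assuming the new org.apache.hadoop.mapreduce API: a RecordReader that hands the mapper the file in fixed-size chunks instead of one huge record. The class name, the 64 MB chunk size, and the LongWritable/BytesWritable key/value types are illustrative choices, not anything prescribed by the answer.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ChunkRecordReader extends RecordReader<LongWritable, BytesWritable> {

    private static final int CHUNK_SIZE = 64 * 1024 * 1024; // 64 MB per record; tune to the available heap

    private FSDataInputStream in;
    private long start;
    private long end;
    private long pos;
    private final LongWritable key = new LongWritable();
    private final BytesWritable value = new BytesWritable();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
        FileSplit fileSplit = (FileSplit) split;
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        in = fs.open(file);
        start = fileSplit.getStart();          // 0 when the format is not splittable
        end = start + fileSplit.getLength();   // end of this split (= file length here)
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (pos >= end) {
            return false;                      // no more chunks in this split
        }
        int toRead = (int) Math.min(CHUNK_SIZE, end - pos);
        byte[] buffer = new byte[toRead];
        in.readFully(pos, buffer, 0, toRead);  // positional read of one chunk
        key.set(pos);                          // key = byte offset of the chunk
        value.set(buffer, 0, toRead);          // value = the chunk itself
        pos += toRead;
        return true;
    }

    @Override
    public LongWritable getCurrentKey() { return key; }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() {
        return end == start ? 1.0f : (float) (pos - start) / (end - start);
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}

A matching InputFormat would keep isSplitable returning false and return this reader from createRecordReader. Each map() call then sees one chunk, so no single record has to fit the whole file into the heap or the map task’s in-memory buffer.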
