Java – Data does not appear in files written using Hadoop LocalFileSystem

Data does not appear in files written using Hadoop LocalFileSystem… here is a solution to the problem.

Data does not appear in files written using Hadoop LocalFileSystem

I wrote the following code to write a few bytes to a local file using Hadoop's LocalFileSystem. I used flush(), which as far as I can tell flushes the JVM buffer, while hsync() flushes the OS buffer, causing the data to be written to disk. But in my case the data does not appear in the file "1.txt". When I close the output stream with close() [for now I've commented it out in my code], the data shows up perfectly. Is my understanding of flush() and hsync() correct? If so, why doesn't the data appear?

package hdfsTrying.hdfstrying;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFileAccess {
    public static void main(String[] args) {
        Path p = new Path("/home/priya/1.txt");
        Configuration cfg = new Configuration();
        try {
            FileSystem fs = FileSystem.getLocal(cfg);
            FSDataOutputStream out = fs.create(p);
            out.write("Hi This should be written to file 1.txt".getBytes());
            out.flush();
            out.hsync();
            // out.close();  // when uncommented, the data appears in 1.txt
            FileStatus[] fst = fs.listStatus(p);
            for (FileStatus g : fst) {
                System.out.println(g.getPath());
            }
        } catch (IOException io) {
            System.out.println("I am having exception");
            System.out.println(io.getMessage());
        }
    }
}

Solution

When you write to a file and call flush(), the data only reaches the disk once a full block's worth of data has accumulated. So if your data is small, as in your case, you need to call hsync() to force the buffers to be synchronized to disk. However, this only works on Hadoop versions above 1.x, because in earlier releases hsync() simply called hflush(). If you have an older version of Hadoop, try calling sync() instead of hsync().
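Here is a minimal sketch of what that looks like, assuming a Hadoop 2.x+ release where FSDataOutputStream exposes hflush() and hsync(); the class name HsyncExample is just illustrative, and the path and message are taken from the question:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncExample {
    public static void main(String[] args) throws IOException {
        Configuration cfg = new Configuration();
        FileSystem fs = FileSystem.getLocal(cfg);
        Path p = new Path("/home/priya/1.txt");

        FSDataOutputStream out = fs.create(p);
        out.write("Hi This should be written to file 1.txt".getBytes());
        out.hflush(); // flush client-side buffers (Hadoop 2.x+)
        out.hsync();  // ask the OS to push the data to disk
        // The stream is still open here; on a 2.x+ release the bytes
        // should already be visible in 1.txt at this point.
        // On Hadoop 1.x, call out.sync() instead of hflush()/hsync().
        out.close();
    }
}

Per the Syncable contract, hflush() guarantees the bytes have left the client's buffers, while hsync() additionally asks the OS to sync them to the storage device, which is what makes them visible in the file before close().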
