Java – How do I put data (in Hadoop) into map and reduce functions in the correct type?

I'm having a hard time understanding how data in Hadoop gets into the map and reduce functions. I know we can define the input format and the output format, and then define the key types for the input and output. But if, for example, we want an object as an input type, how does Hadoop handle that internally?

Thanks.

Solution

You can extend Hadoop's InputFormat and OutputFormat classes to create your own custom formats. For example, to write the output of a MapReduce job as JSON, you could do something like this:

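import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {
    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException,
                  InterruptedException {
        Configuration conf = context.getConfiguration();
        Path path = getOutputPath(context);
        FileSystem fs = path.getFileSystem(conf);
        // One file named after the job; with several reduce tasks each
        // task would need its own file name.
        FSDataOutputStream out =
                fs.create(new Path(path, context.getJobName()));
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends
          LineRecordWriter<Text, IntWritable> {
        boolean firstRecord = true;

        public JsonRecordWriter(DataOutputStream out)
                throws IOException {
            super(out);
            // Open the JSON object before any record is written.
            out.writeBytes("{");
        }

        @Override
        public synchronized void write(Text key, IntWritable value)
                throws IOException {
            if (!firstRecord) {
                // Separate records with a comma, except before the first one.
                out.writeBytes(",\r\n");
            }
            firstRecord = false;
            // Keys are written as-is; quotes or backslashes are not escaped.
            String record = "\"" + key + "\":\"" + value + "\"";
            out.write(record.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized void close(TaskAttemptContext context)
                throws IOException {
            // Close the JSON object, then let the parent close the stream.
            out.writeBytes("}");
            super.close(context);
        }
    }
}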
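
As for using your own object as a key or value type: by default Hadoop serializes keys and values through the Writable interface, which has two methods, write(DataOutput) and readFields(DataInput). Key types additionally implement WritableComparable so they can be sorted. Below is a minimal sketch of that contract. PageVisit and WritableSketch are made-up names for illustration, and the class deliberately does not import Hadoop so that it runs on its own; in a real job it would be declared as "implements org.apache.hadoop.io.Writable".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    // Hypothetical value type. The two methods below have the same
    // signatures as org.apache.hadoop.io.Writable, which a real job
    // would implement.
    static class PageVisit {
        String url = "";
        int hits;

        // Hadoop creates instances reflectively, so a no-arg
        // constructor is required.
        PageVisit() { }

        PageVisit(String url, int hits) {
            this.url = url;
            this.hits = hits;
        }

        // Serialize the fields in a fixed order.
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeInt(hits);
        }

        // Read the fields back in exactly the same order.
        public void readFields(DataInput in) throws IOException {
            url = in.readUTF();
            hits = in.readInt();
        }
    }

    // Round trip through a byte array, standing in for the
    // serialization the framework performs between map and reduce.
    static PageVisit roundTrip(PageVisit original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PageVisit copy = new PageVisit();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PageVisit copy = roundTrip(new PageVisit("/index.html", 3));
        System.out.println(copy.url + " " + copy.hits); // prints "/index.html 3"
    }
}
```

With the Hadoop interface added, such a class can be used as a type parameter of your Mapper and Reducer. The input key and value types the mapper receives are determined by the InputFormat you configure for the job.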
