Java – How do I read Hadoop files using Apache Beam?

I’m trying to use Apache Beam to read files from a remote Hadoop server (not a local one). How should I go about it? I read the documentation on Beam’s Hadoop Input/Output Format IO:

https://beam.apache.org/documentation/io/built-in/hadoop/

I don’t quite understand this part:

Configuration myHadoopConfiguration = new Configuration(false);
THIS --> // Set Hadoop InputFormat, key and value class in configuration <-- THIS
myHadoopConfiguration.setClass("mapreduce.job.inputformat.class",
    InputFormatClass,
    InputFormat.class);
myHadoopConfiguration.setClass("key.class", InputFormatKeyClass, Object.class);
myHadoopConfiguration.setClass("value.class", InputFormatValueClass, Object.class);

How do I set this format? Do I need to create a class? If I copy and paste this code, it doesn’t work. Thanks.

Solution

The standard default InputFormat is TextInputFormat, which extends FileInputFormat<LongWritable, Text>.

The key is a LongWritable (import org.apache.hadoop.io.LongWritable) holding the byte offset of the line within the file.

The value is a Text (import org.apache.hadoop.io.Text) holding a single line of the file.

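The code does not work because InputFormatClass, InputFormatKeyClass, and InputFormatValueClass are not actual variables. They are placeholders in the documentation that you must replace with concrete classes, for example TextInputFormat.class, LongWritable.class, and Text.class.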
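
As a minimal sketch of what the filled-in configuration might look like for plain-text files, assuming the Hadoop client libraries are on the classpath. The class name TextInputConfig, the method build, and the inputPath argument are hypothetical names used only for illustration; the input directory is set through Hadoop's standard "mapreduce.input.fileinputformat.inputdir" property.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical helper class, not part of Beam or Hadoop.
public class TextInputConfig {

    // Builds a Hadoop configuration that reads text files line by line.
    // inputPath is a placeholder, e.g. a full hdfs:// URI on your cluster.
    static Configuration build(String inputPath) {
        // 'false' means no default resources (core-site.xml etc.) are loaded.
        Configuration conf = new Configuration(false);

        // Concrete classes replace the documentation's placeholders.
        conf.setClass("mapreduce.job.inputformat.class",
                TextInputFormat.class, InputFormat.class);
        conf.setClass("key.class", LongWritable.class, Object.class);   // byte offset
        conf.setClass("value.class", Text.class, Object.class);         // one line

        // Directory or file that TextInputFormat should read.
        conf.set("mapreduce.input.fileinputformat.inputdir", inputPath);
        return conf;
    }
}
```

The resulting Configuration would then be passed to the read transform described on the documentation page linked in the question, with LongWritable and Text as the key and value type parameters, for example HadoopFormatIO.<LongWritable, Text>read().withConfiguration(conf) in Beam versions that ship HadoopFormatIO. Because the configuration is created without default resources, the input path likely needs to be a fully qualified URI.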