How do I use avro files as input to MRJob jobs?
I need to use Avro files as input to an mrjob Hadoop job. The only approach I can find is to pass additional options to the Hadoop streaming jar; I can't find any documentation on another way to do this. This complicates development because I've been testing locally with the inline runner.
Can I use the inline runner to read Avro files via MRJob?
Solution
You need to tell Hadoop which “input format” your job uses:
hadoop jar hadoop-streaming.jar \
    <other params go here> \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
I’m not sure how you run your MRJob jobs, though. If you’re using plain Hadoop streaming, the command above will work.
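For the mapper side, here is a minimal sketch of how a Python streaming mapper might consume that input. It assumes AvroAsTextInputFormat hands each Avro record to the mapper as one line of JSON text in the key position, followed by a tab and an empty value; check this against your own job's input before relying on it. The field name user_id and the function names are hypothetical, chosen only for illustration.

```python
import json
import sys


def parse_avro_text_line(line):
    """Decode one input line into a Python object.

    Assumption: the line is a JSON record (the key), optionally followed
    by a tab and an empty value, as Hadoop streaming would write it.
    """
    json_text = line.rstrip("\r\n").split("\t", 1)[0]
    return json.loads(json_text)


def map_lines(lines, field="user_id"):
    """Yield (field value, 1) per record; `field` is a hypothetical name."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_avro_text_line(line)
        yield record.get(field), 1


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write tab-separated key/count pairs."""
    for key, count in map_lines(stdin):
        stdout.write(f"{key}\t{count}\n")
```

A mapper script would call run() at the bottom. Because the functions only see lines of text, the same logic can be exercised locally on a file of JSON lines, with no Avro reader involved. If the job is written with mrjob, the library documents a HADOOP_INPUT_FORMAT class attribute on MRJob for naming the input format class; as far as I know it only takes effect when the job runs on Hadoop, so the inline runner would still need input that is already JSON lines.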