How to parse multi-line JSON into a Dataset with Apache Spark in Java
Is there a way to parse multi-line JSON files using the Dataset API?
Here is the sample code:
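import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {
    // Create the Spark session
    SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example")
            .config("spark.some.config.option", "some-value").getOrCreate();
    // Read the JSON file; by default Spark expects one complete JSON object per line
    Dataset<Row> df = spark.read().json("D:/sparktestio/input.json");
    df.show();
}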
If the JSON is on a single line, this works fine, but I need it to work when the object spans multiple lines.
My JSON file:
{
"name": "superman",
"age": "unknown",
"height": "6.2",
"weight": "flexible"
}
Solution
The Apache Spark documentation addresses this directly:
For regular multiline JSON files, set the multiLine option to true.
So the solution is to set that option on the reader:
Dataset<Row> df = spark.read().option("multiLine", true).json("file:/a/b/c.json");
df.show();
I tested this with JSON in the same format (a JSON object that spans multiple lines). After adding the option, the output no longer contains the _corrupt_record column.
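For context, the single-line case works because, without the multiLine option, Spark treats every line of the file as a separate JSON record. The sketch below is not part of the original answer: it is a possible fallback, using only the JDK, for situations where the multiLine option cannot be used. It collapses one pretty-printed JSON document into a single line. The class name JsonOneLiner and the method toSingleLine are hypothetical. It assumes the input is a single valid JSON document, and relies on the fact that valid JSON cannot contain raw line breaks inside string values, so line breaks and indentation can be dropped safely.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: collapses a pretty-printed JSON
// document into one line so it matches the one-record-per-line layout.
public class JsonOneLiner {

    // Assumes the input is ONE valid JSON document. Raw line breaks are not
    // allowed inside JSON strings, so every line break and the whitespace
    // around it is insignificant and can be removed.
    public static String toSingleLine(String prettyJson) {
        return Arrays.stream(prettyJson.split("\\R"))
                .map(String::trim)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The sample document from the question
        String pretty = "{\n"
                + "  \"name\": \"superman\",\n"
                + "  \"age\": \"unknown\",\n"
                + "  \"height\": \"6.2\",\n"
                + "  \"weight\": \"flexible\"\n"
                + "}";
        System.out.println(toSingleLine(pretty));
    }
}
```

The resulting line could be written to a file and read with the plain spark.read().json(...) call from the question. Where the multiLine option is available, it remains the simpler choice because it needs no preprocessing.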