How do I write a dataset that contains only headers (no rows) to an hdfs location (csv format) so that headers are included on download?
I have a dataset with only a header (id, name, age) and 0 rows.
I want to write it as a CSV file to an HDFS location using:
DataFrameWriter dataFrameWriter = dataset.write();
Map<String, String> csvOptions = new HashMap<>();
csvOptions.put("header", "true");
dataFrameWriter = dataFrameWriter.options(csvOptions);
dataFrameWriter.mode(SaveMode.Overwrite).csv(location);
At the HDFS location, the resulting files are:
1. _SUCCESS
2. tempFile.csv
If I go to that location and download the file (tempFile.csv), I get an empty csv file.
I tried setting the header option to both true and false.
How do I write a header as the contents of a csv file?
Solution
Well, here’s a workaround. In Scala, you can do this:
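df.take(1).isEmpty match {
  // No rows: write the column names as a single comma-separated line
  case true => sc.parallelize(Array(df.schema.map(_.name).mkString(",")))
    .saveAsTextFile("temp")
  // Rows present: write a regular CSV with a header line
  case false => df.write.option("header", "true").csv("temp")
}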
df.schema returns the schema of the data frame df as a StructType.
_.name returns the name of each column in the schema.
mkString(",") joins the resulting sequence of names into a single comma-separated string.
I guess Java can do something similar.
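As a rough Java counterpart to the df.schema.map(_.name).mkString(",") step, here is a minimal sketch that builds the header line from an array of column names. The class name HeaderLine and the method headerLine are hypothetical, and the quoting rule for names that contain commas or quotes is an assumption, not something from the original answer. The sketch uses only the Java standard library; in Spark the array would typically come from dataset.columns().

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark: builds a CSV header line from column names.
class HeaderLine {

    // Joins the names with commas, e.g. {"id", "name", "age"} -> "id,name,age".
    // Assumption: names containing a comma, double quote, or newline are wrapped
    // in double quotes, with embedded double quotes doubled.
    static String headerLine(String[] columnNames) {
        StringJoiner joiner = new StringJoiner(",");
        for (String name : columnNames) {
            boolean needsQuoting =
                name.contains(",") || name.contains("\"") || name.contains("\n");
            joiner.add(needsQuoting ? "\"" + name.replace("\"", "\"\"") + "\"" : name);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // With Spark, this array would typically come from dataset.columns().
        System.out.println(headerLine(new String[] {"id", "name", "age"}));
    }
}
```

When the dataset has no rows, the resulting string could then be written out as a one-line text file, for example by wrapping it in a single-element Dataset<String> and using the text writer instead of the CSV writer. That wiring is left out here because it depends on the Spark version and session setup.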