Java / Spring: How to Figure Out the MIME Type of an InputStream Without Consuming It
Basics
This is a Java 1.8 Spring Boot 1.5 application. It currently uses Apache Tika 1.22 to read MIME type information, but this could easily change.
Summary
There is a request mapping that users call to download files. These files come from another URL, separate from the application. A file can be one of several types (Excel, PDF, text, etc.), and the application has no way of knowing which until it pulls the file down.
Question
To return the download to the user with the appropriate title, extension, and Content-Type, the application uses Apache Tika to extract that information. Unfortunately, once Tika has read the header of the InputStream, those bytes have been consumed, so the file is incomplete when the application writes the InputStream to the HttpServletResponse.
As a workaround, the application currently closes the first InputStream and opens a second InputStream to return to the user. This is bad because the URL is called twice for every download, wasting system resources.
What is the correct way to implement this feature?
Code example
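import java.io.InputStream;
import java.net.URL;
import javax.servlet.http.HttpServletResponse;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.springframework.util.FileCopyUtils;
import org.springframework.web.bind.annotation.GetMapping;

@GetMapping("/My/Download/")
public void doDownload(HttpServletResponse httpServletResponse) {
    String externalFileURL = "http://www.pdf995.com/samples/pdf.pdf";
    try {
        // First request: used only to detect the media type
        InputStream firstStream = new URL(externalFileURL).openStream();
        TikaConfig tikaConfig = new TikaConfig();
        MediaType mediaType = tikaConfig.getDetector().detect(TikaInputStream.get(firstStream), new Metadata());
        firstStream.close();
        // Second request: fetches the same file again for the actual download
        InputStream secondStream = new URL(externalFileURL).openStream();
        httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
        httpServletResponse.setContentType(mediaType.getBaseType().toString());
        FileCopyUtils.copy(secondStream, httpServletResponse.getOutputStream());
        httpServletResponse.flushBuffer();
    } catch (Exception e) {
        // Don't swallow failures silently
        throw new IllegalStateException("Download failed: " + externalFileURL, e);
    }
}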
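Why can't the first stream simply be written to the response after detection? A plausible explanation, sketched here with JDK classes only: when the raw stream cannot mark/reset itself, a buffering wrapper pulls a chunk of bytes out of the raw stream into its own buffer. reset() rewinds the wrapper, not the raw stream, so anything read from the raw stream afterwards is missing its leading bytes. In this sketch BufferedInputStream stands in for TikaInputStream and an in-memory ByteArrayInputStream stands in for the URL connection; both substitutions, and the helper names, are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class WrapperBufferDemo {

    // Reads all remaining bytes (Java 8 has no InputStream.readAllBytes()).
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Simulates a detector: mark the wrapper, peek at the header, reset the wrapper.
    static void peekHeader(InputStream wrapper) throws IOException {
        wrapper.mark(16);
        byte[] header = new byte[5];
        wrapper.read(header);
        wrapper.reset();
    }

    // What is left in the RAW stream after a detector peeked through the wrapper.
    static byte[] readRawAfterPeek(byte[] file) throws IOException {
        InputStream raw = new ByteArrayInputStream(file);   // stand-in for URL.openStream()
        InputStream wrapper = new BufferedInputStream(raw); // stand-in for TikaInputStream.get(raw)
        peekHeader(wrapper);
        return drain(raw); // the wrapper's buffer already holds the leading bytes
    }

    // What the WRAPPER returns after the same peek.
    static byte[] readWrapperAfterPeek(byte[] file) throws IOException {
        InputStream raw = new ByteArrayInputStream(file);
        InputStream wrapper = new BufferedInputStream(raw);
        peekHeader(wrapper);
        return drain(wrapper); // reset() rewound the wrapper, so nothing is lost
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "%PDF-1.4 example body".getBytes(StandardCharsets.US_ASCII);
        System.out.println("raw:     " + readRawAfterPeek(file).length + " of " + file.length + " bytes");
        System.out.println("wrapper: " + readWrapperAfterPeek(file).length + " of " + file.length + " bytes");
    }
}
```

The raw stream comes back short while the wrapper returns every byte, which suggests the fix: keep reading from the wrapper that did the detection instead of opening the URL again.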
Solution
The Javadoc of Detector.detect() says:
"The given stream is guaranteed to support the mark feature and the detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning."
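To make that contract concrete, here is a minimal sketch of a hypothetical detector (looksLikePdf is an illustrative name, not a Tika API) that behaves the way the Javadoc asks: it marks the stream, reads only the "%PDF-" magic bytes, and resets before returning. The caller can then read the complete content from the same stream. This hand-written check is an assumption for illustration and is far cruder than Tika's real detection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PdfSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical detector following the same contract as Detector.detect():
    // mark before reading, look only at the header, reset before returning.
    static boolean looksLikePdf(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length);
        try {
            byte[] header = new byte[PDF_MAGIC.length];
            int total = 0;
            // read() may return fewer bytes than asked for, so loop until full or EOF
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n == -1) {
                    break;
                }
                total += n;
            }
            return total == header.length && Arrays.equals(header, PDF_MAGIC);
        } finally {
            in.reset(); // rewind so the caller still sees the complete stream
        }
    }

    // Reads all remaining bytes (Java 8 has no InputStream.readAllBytes()).
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "%PDF-1.4 example body".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(file));
        boolean pdf = looksLikePdf(in);
        // The detector only peeked, so every byte is still available for the download
        byte[] afterDetection = drain(in);
        System.out.println(pdf + ", " + afterDetection.length + " of " + file.length + " bytes");
    }
}
```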
The TikaInputStream Javadoc says:
"The created TikaInputStream instance keeps track of the original resource used to create it, while behaving otherwise just like a normal, buffered InputStream. A TikaInputStream instance is also guaranteed to support the mark(int) feature."
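A stream opened from a URL is not guaranteed to support mark/reset, which is why the detector needs a wrapper that does. The sketch below shows the same kind of guarantee using only the JDK: ensureMarkSupported is a hypothetical helper that wraps the source in a BufferedInputStream only when the source cannot mark on its own. The non-markable stand-in for a network stream is also an assumption for illustration; it relies on InputStream's default markSupported() returning false.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class MarkSupport {

    // Hypothetical helper: returns a stream that supports mark/reset,
    // wrapping the source only when it cannot do so itself.
    static InputStream ensureMarkSupported(InputStream source) {
        return source.markSupported() ? source : new BufferedInputStream(source);
    }

    // Hypothetical stand-in for a network stream: serves bytes from an array
    // but keeps InputStream's default markSupported() == false.
    static InputStream nonMarkableStream(byte[] data) {
        ByteArrayInputStream source = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return source.read();
            }
        };
    }

    public static void main(String[] args) {
        InputStream raw = nonMarkableStream(new byte[] {1, 2, 3});
        InputStream usable = ensureMarkSupported(raw);
        System.out.println(raw.markSupported() + " -> " + usable.markSupported());
    }
}
```

Wrapping only when needed avoids stacking a second buffer on streams that already support mark, such as ByteArrayInputStream.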
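This means you can detect the type and copy the content from the same TikaInputStream: the detector resets the stream to its start after reading the header, so the copy still sends the whole file and the URL is fetched only once. Open the TikaInputStream directly from the URL and close it with try-with-resources: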
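@GetMapping("/My/Download/")
public void doDownload(HttpServletResponse httpServletResponse) throws IOException, TikaException {
    String externalFileURL = "http://www.pdf995.com/samples/pdf.pdf";
    // One request: TikaInputStream buffers the content and supports mark/reset
    try (InputStream tikaStream = TikaInputStream.get(new URL(externalFileURL))) {
        TikaConfig tikaConfig = new TikaConfig();
        // Per its contract, detect() marks the stream, reads the header, and resets it
        MediaType mediaType = tikaConfig.getDetector().detect(tikaStream, new Metadata());
        httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
        httpServletResponse.setContentType(mediaType.getBaseType().toString());
        // The same stream still holds the complete file
        FileCopyUtils.copy(tikaStream, httpServletResponse.getOutputStream());
        httpServletResponse.flushBuffer();
    }
}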
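Since the question notes that Tika could be swapped out, it may help to see that the single-fetch pattern does not depend on Tika: any detector that honours mark/reset works. As a JDK-only sketch, java.net.URLConnection.guessContentTypeFromStream follows the same mark/read/reset approach and returns null when it does not recognise the format. It recognises far fewer formats than Tika, so treat it as an illustration of the pattern rather than a drop-in replacement. The helper names detectType and copy, and the in-memory GIF bytes standing in for the downloaded file, are assumptions for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class SingleFetchDownload {

    // Peeks at the leading bytes of a mark-supporting stream; the JDK method
    // marks and resets the stream itself and returns null for unknown formats.
    static String detectType(InputStream markable) throws IOException {
        String type = URLConnection.guessContentTypeFromStream(markable);
        return type != null ? type : "application/octet-stream";
    }

    // Copies everything left in the stream; after detectType() that is the whole file.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new URL(externalFileURL).openStream(): fetched only once
        byte[] file = "GIF89a placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(file))) {
            String type = detectType(in);
            // In the controller, set Content-Type and Content-Disposition here,
            // before anything is written to the response body
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            long copied = copy(in, body);
            System.out.println(type + ": " + copied + " of " + file.length + " bytes");
        }
    }
}
```

Detection and copying are kept as separate steps on purpose: in a servlet, the response headers have to be set after the type is known but before the first byte of the body is written.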