Java/Spring: How to Determine the MIME Type of an InputStream Without Consuming It

Basics

This is a Java 1.8 application built on Spring Boot 1.5.

It currently uses Apache Tika 1.22 to detect the MIME type, but the library could easily be swapped for another.

Summary

The application exposes a request mapping that users call to download files. The files come from an external URL, separate from the application. A file can be one of several types (Excel, PDF, text, etc.), and the application has no way of knowing which until it has pulled the file down.

Question

To return the download to the user with the appropriate filename, extension, and Content-Type, the application uses Apache Tika to extract that information. Unfortunately, detection consumes the first bytes of the InputStream, so the file is incomplete when the application later writes the same InputStream to the HttpServletResponse.

As a workaround, the application currently closes the first InputStream and opens a second one to return to the user.

This is bad because the URL is requested twice, which wastes system resources.

What is the correct way to implement this?

Code example

    @GetMapping("/My/Download/")
    public void doDownload(HttpServletResponse httpServletResponse) {
        String externalFileURL = "http://www.pdf995.com/samples/pdf.pdf";

        try {
            // First request: used only to detect the media type.
            InputStream firstStream = new URL(externalFileURL).openStream();
            TikaConfig tikaConfig = new TikaConfig();
            MediaType mediaType = tikaConfig.getDetector().detect(TikaInputStream.get(firstStream), new Metadata());
            firstStream.close();

            // Second request: the same URL is fetched again to send the full content.
            InputStream secondStream = new URL(externalFileURL).openStream();
            httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
            httpServletResponse.setContentType(mediaType.getBaseType().toString());
            FileCopyUtils.copy(secondStream, httpServletResponse.getOutputStream());
            httpServletResponse.flushBuffer();
        } catch (Exception e) {
            // Exceptions are silently ignored in this example.
        }
    }

Solution

The Javadoc for Detector.detect() says:

The given stream is guaranteed to support the mark feature and the detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning.

TikaInputStream's Javadoc says:

The created TikaInputStream instance keeps track of the original resource used to create it, while behaving otherwise just like a normal, buffered InputStream . A TikaInputStream instance is also guaranteed to support the mark(int) feature.

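Because the detector resets the stream before returning, the same TikaInputStream can be used first for detection and then to copy the complete content to the response. This means that you should use TikaInputStream to read the content and try-with-resources to close it: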

    // Single request: the stream is opened once and reused after detection.
    try (InputStream tikaStream = TikaInputStream.get(new URL(externalFileURL))) {
        TikaConfig tikaConfig = new TikaConfig();
        // detect() marks the stream, reads the leading bytes, and resets it.
        MediaType mediaType = tikaConfig.getDetector().detect(tikaStream, new Metadata());

        httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
        httpServletResponse.setContentType(mediaType.getBaseType().toString());
        // The copy starts from the first byte because the stream was reset.
        FileCopyUtils.copy(tikaStream, httpServletResponse.getOutputStream());
        httpServletResponse.flushBuffer();
    }
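
To see why this works, the mark/reset contract quoted above can be reproduced with plain JDK classes. The following is only a minimal sketch and not Tika's implementation: the class name MarkResetDemo and the helpers peek and sniff are hypothetical names made up for illustration, and sniff recognises nothing but the PDF magic number. It shows that a stream which supports mark can be inspected and then still copied in full.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it only illustrates the mark/reset idea.
class MarkResetDemo {

    /** Reads up to `limit` leading bytes, then rewinds the stream to where it was. */
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit); // remember the current position
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            int n;
            while (total < limit && (n = in.read(buf, total, limit - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind so later readers see the full content
        }
    }

    /** Deliberately crude detection: only the PDF magic number is recognised. */
    static String sniff(byte[] header) {
        String start = new String(header, StandardCharsets.ISO_8859_1);
        return start.startsWith("%PDF-") ? "application/pdf" : "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 fake body".getBytes(StandardCharsets.ISO_8859_1);
        // BufferedInputStream adds mark support, similar in spirit to TikaInputStream.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            String type = sniff(peek(in, 8));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            System.out.println(type + ", copied " + out.size() + " of " + data.length + " bytes");
        }
    }
}
```

In the real controller, Tika's detector plays the role of peek and sniff, and TikaInputStream provides the mark support, so no hand-written buffering is needed.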
