How do I get the MIME type of the InputStream of a file that is being uploaded?

Posted on 2024-10-10 13:30:18

Simple question: how can I get the MIME type (or content type) of an InputStream, without saving the file, for a file that a user is uploading to my servlet?

Comments (8)

夜司空 2024-10-17 13:30:18

I wrote my own content-type detector for a byte[] because the libraries above weren't suitable or I didn't have access to them. Hopefully this helps someone out.

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

// retrieve file as byte[]
byte[] b = odHit.retrieve( "" );

// copy the first 32 bytes (fewer if the file is shorter) and pass them to guessMimeType(byte[])
byte[] topOfStream = Arrays.copyOf(b, Math.min(b.length, 32));
String mimeGuess = guessMimeType(topOfStream);

...

private static String guessMimeType(byte[] topOfStream) {

    Properties magicmimes = new Properties();

    // Read in the magicmimes.properties file (an example is listed below);
    // try-with-resources closes the stream even if load() fails
    try (FileInputStream in = new FileInputStream( "magicmimes.properties" )) {
        magicmimes.load(in);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Decode as ISO-8859-1 so that every byte maps to exactly one char (\u0000..\u00ff),
    // independent of the platform charset; signature keys must stay within that range
    String head = new String(topOfStream, StandardCharsets.ISO_8859_1);

    // loop over each file signature; if a match is found, return its mime type
    for ( String key : magicmimes.stringPropertyNames() ) {
        if ( head.startsWith(key) ) {
            return magicmimes.getProperty(key);
        }
    }

    // no signature matched
    return null;
}

magicmimes.properties file example (not sure these signatures are correct, but they worked for my uses)

# SignatureKey                  content/type
\u0000\u201E\u00f1\u00d9        text/plain
\u0025\u0050\u0044\u0046        application/pdf
%PDF                            application/pdf
\u0042\u004d                    image/bmp
GIF8                            image/gif
\u0047\u0049\u0046\u0038        image/gif
\u0049\u0049\u002a\u0000        image/tiff
\u004d\u004d\u0000\u002a        image/tiff
\u0089\u0050\u004e\u0047        image/png
\u00ff\u00d8\u00ff              image/jpeg

梦里°也失望 2024-10-17 13:30:18

According to Real Gagnon's excellent site, the better solution for your case would be to use Apache Tika.

南城追梦 2024-10-17 13:30:18

It depends on where you are getting the input stream from. If you are getting it from a servlet, then the content type is accessible through the HttpServletRequest object that is an argument of doPost. If you are using some sort of REST API like Jersey, then the request can be injected by using @Context. If you are uploading the file through a socket, it will be your responsibility to specify the MIME type as part of your protocol, as you will not inherit the HTTP headers.

命比纸薄 2024-10-17 13:30:18

I'm a big proponent of "do it yourself first, then look for a library solution". Luckily, this case is just that.

You have to know the file's "magic number", i.e. its signature.
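Let me give an example of detecting whether the InputStream represents a PNG file.

The PNG signature is composed by appending together the following bytes (in hex):

1) a non-ASCII error-checking byte, 0x89 (its high bit is set, so it exposes transfers that are not 8-bit clean)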

2) string "PNG" as in ASCII:

     P - 0x50
     N - 0x4E
     G - 0x47

3) CR (carriage return) - 0x0D

4) LF (line feed) - 0xA

5) SUB (substitute) - 0x1A

6) LF (line feed) - 0xA

So, the magic number is

89   50 4E 47 0D 0A 1A 0A

137  80 78 71 13 10 26 10 (decimal)
-119 80 78 71 13 10 26 10 (in Java)

Explanation of 137 -> -119 conversion

N bit number can be used to represent 2^N different values.
For a byte (8 bits) that is 2^8=256, or 0..255 range.
Java considers byte primitives to be signed, so that range is -128..127.
Thus, the bit pattern of 137 is read as a signed value and represents -119 = 137 - 256.
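
The conversion can be checked directly in Java. This is a minimal sketch; the class name `SignedByteDemo` and its two helpers are made up for illustration:

```java
final class SignedByteDemo {
    // Narrowing an int to byte keeps the low 8 bits and reinterprets them as signed.
    static byte toSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    // Masking with 0xFF recovers the unsigned value (0..255) from a signed byte.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte first = toSignedByte(0x89);
        System.out.println(first);             // -119
        System.out.println(toUnsigned(first)); // 137
    }
}
```

Masking with `& 0xFF` is usually the safer way to compare a byte read from a stream against a hex signature, because the signature can then be written exactly as it appears in the format's documentation.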

Example in Kotlin

import java.io.InputStream

private fun InputStream.isPng(): Boolean {
    val magicNumbers = byteArrayOf(-119, 80, 78, 71, 13, 10, 26, 10)
    // readNBytes (Java 11+) keeps reading until it has all 8 bytes or the stream ends,
    // whereas a single read() call may return fewer bytes than requested
    val signatureBytes = readNBytes(magicNumbers.size)
    return signatureBytes.contentEquals(magicNumbers)
}

Of course, in order to support many MIME types, you have to scale this solution somehow, and if you are not happy with the result, consider some library.
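
One way this could be scaled to several types is a small signature table, sketched here in Java. The class name `SignatureSniffer`, the method `sniff`, and the four-entry table are assumptions chosen for illustration, not a complete or authoritative list; the stream is expected to support mark/reset so that the bytes remain readable afterwards.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class SignatureSniffer {
    // Hypothetical, deliberately small table: MIME type -> leading bytes (unsigned).
    private static final Map<String, int[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("image/png", new int[] {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/gif", new int[] {0x47, 0x49, 0x46, 0x38});
        SIGNATURES.put("image/jpeg", new int[] {0xFF, 0xD8, 0xFF});
        SIGNATURES.put("application/pdf", new int[] {0x25, 0x50, 0x44, 0x46});
    }
    private static final int MAX_SIGNATURE_LENGTH = 8;

    /** Returns the guessed MIME type, or null if nothing matches. Leaves the stream rewound. */
    public static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_SIGNATURE_LENGTH);
        byte[] head = new byte[MAX_SIGNATURE_LENGTH];
        int total = 0;
        try {
            int n;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        } finally {
            in.reset();
        }
        for (Map.Entry<String, int[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, total, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] head, int available, int[] signature) {
        if (available < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((head[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, 0, 0};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(png));
        System.out.println(sniff(in)); // image/png
        System.out.println(in.read()); // 137: the stream was rewound
    }
}
```

Keeping the signatures as unsigned ints and masking each byte with `0xFF` avoids the signed-byte conversion described above. Formats whose signature does not sit at offset 0 would need a richer table than this one.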

我们的影子 2024-10-17 13:30:18

You can check the Content-Type header field and have a look at the extension of the filename used. For everything else, you have to run more complex routines, such as checking with Tika, etc.
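
As a rough Java sketch of that order of preference: the class `DeclaredTypeFallback` and the method `declaredType` are hypothetical names, and the two string arguments are assumed to have been extracted from the request already. Both values come from the client, so they can be wrong or forged.

```java
import java.net.URLConnection;
import java.util.Locale;

final class DeclaredTypeFallback {
    /**
     * Prefers the Content-Type declared by the client and falls back to the
     * JDK's built-in file-name table. Returns null when neither gives an answer.
     */
    public static String declaredType(String contentTypeHeader, String fileName) {
        if (contentTypeHeader != null && !contentTypeHeader.trim().isEmpty()) {
            // drop parameters such as "; charset=UTF-8"
            int semicolon = contentTypeHeader.indexOf(';');
            String type = semicolon >= 0
                    ? contentTypeHeader.substring(0, semicolon)
                    : contentTypeHeader;
            return type.trim().toLowerCase(Locale.ROOT);
        }
        if (fileName == null) {
            return null;
        }
        // guesses from the extension only; the content is never inspected
        return URLConnection.guessContentTypeFromName(fileName);
    }

    public static void main(String[] args) {
        System.out.println(declaredType("Text/HTML; charset=UTF-8", "page.bin")); // text/html
        System.out.println(declaredType(null, "photo.png"));
    }
}
```

The second call depends on the extension table shipped with the JDK, so its result may vary between runtimes.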

胡大本事 2024-10-17 13:30:18

You can just add the tika-app-1.x.jar to your classpath, as long as you don't use slf4j logging anywhere else, because it will cause a collision. If you use Tika to detect the type of an InputStream, the stream has to support mark/reset. Otherwise, calling Tika will consume your input stream. However, you can get around this by using the Apache Commons IO library to copy the InputStream into a temporary file and letting Tika inspect that file.

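import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.apache.tika.Tika;

Tika tika = new Tika();
InputStream in = null; // placeholder: assign the upload's InputStream here
File tmp = new File("c:/tmp.tmp");
try (FileOutputStream out = new FileOutputStream(tmp)) {
   // copy the upload to the temporary file, then let Tika inspect the file
   IOUtils.copy(in, out);
   out.flush();
   String mimeType = tika.detect(tmp);
} catch (Exception e) {
   System.err.println(e);
} finally {
   if (null != in)
       in.close();
}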
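
The mark/reset requirement can be illustrated without extra dependencies. The sketch below is an assumption-laden stand-in: it uses the JDK's own `URLConnection.guessContentTypeFromStream` instead of Tika, and the class `PeekWithoutConsuming` and helper `rewindable` are hypothetical names. The idea is to wrap a stream that cannot rewind in a `BufferedInputStream` before any detector peeks at it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class PeekWithoutConsuming {
    // Wraps the stream only when it cannot already rewind itself.
    public static InputStream rewindable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] gif = "GIF89a".getBytes(StandardCharsets.US_ASCII);

        // PushbackInputStream reports markSupported() == false,
        // so it stands in for a raw upload stream here.
        InputStream raw = new PushbackInputStream(new ByteArrayInputStream(gif));
        InputStream upload = rewindable(raw);

        // The JDK sniffer marks the stream, reads a few bytes, and resets it.
        String type = URLConnection.guessContentTypeFromStream(upload);
        System.out.println(type);                 // image/gif
        System.out.println((char) upload.read()); // G: the first byte is still available
    }
}
```

After detection, keep reading from the wrapper (`upload`) rather than the original stream, since the wrapper holds the bytes that were buffered during the peek.
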
病毒体 2024-10-17 13:30:18

If using a JAX-RS rest service you can get it from the MultipartBody.

@POST
@Path( "/<service_path>" )
@Consumes( "multipart/form-data" )
public Response importShapeFile( final MultipartBody body ) {
    String filename = null;
    InputStream stream = null;
    for ( Attachment attachment : body.getAllAttachments() )
    {
        ContentDisposition disposition = attachment.getContentDisposition();
        if ( disposition != null && PARAM_NAME.equals( disposition.getParameter( "name" ) ) )
        {
            filename = disposition.getParameter( "filename" );
            stream = attachment.getDataHandler().getInputStream();
            break;
        }
    }

    // Read extension from filename to get the file's type and
    // read the stream accordingly.
}

Where PARAM_NAME is a string representing the name of the parameter holding the file stream.

£烟消云散 2024-10-17 13:30:18

I think this solves the problem:

public String readIt(InputStream is) throws IOException {
    if (is != null) {
        // decode the stream as UTF-8 text, line by line
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"), 8);

        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
        is.close();
        return sb.toString();
    }
    return "error: ";
}

What does it return? For example, for a PNG: "♦PNG\n\n♦♦♦.....", for XML:

Quite useful. You can try string.contains() to check what it is.
