How do I get the MIME type of the InputStream of a file being uploaded?
Simple question: how can I get the MIME type (or content type) of an InputStream, without saving the file, for a file that a user is uploading to my servlet?
Comments (8)
I wrote my own content-type detector for a byte[] because the libraries above weren't suitable or I didn't have access to them. Hopefully this helps someone out.
...
magicmimes.properties file example (not sure these signatures are correct, but they worked for my uses)
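The detector code and the properties file are elided in this answer, so the following is only a hypothetical Java sketch of the general idea, not the answerer's implementation. The class name `MagicMimeDetector`, the `hexSignature=mimeType` line format, and the sample signatures are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: property keys are hex signatures, values are MIME types.
final class MagicMimeDetector {
    // Assumed file format ("hexSignature=mimeType"); not the answerer's actual file.
    private static final String SAMPLE_PROPERTIES =
            "89504E470D0A1A0A=image/png\n"
          + "47494638=image/gif\n"
          + "FFD8FF=image/jpeg\n"
          + "25504446=application/pdf\n";

    private final Properties signatures = new Properties();

    MagicMimeDetector(String propertiesText) throws IOException {
        signatures.load(new StringReader(propertiesText));
    }

    static MagicMimeDetector withSampleSignatures() {
        try {
            return new MagicMimeDetector(SAMPLE_PROPERTIES);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the MIME type of the longest signature that prefixes data, or null. */
    String detect(byte[] data) {
        // Render the first bytes as upper-case hex so they can be compared to the keys.
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < data.length && i < 16; i++) {
            hex.append(String.format("%02X", data[i] & 0xFF));
        }
        String head = hex.toString();
        String bestKey = null;
        for (String key : signatures.stringPropertyNames()) {
            boolean matches = head.startsWith(key.toUpperCase());
            if (matches && (bestKey == null || key.length() > bestKey.length())) {
                bestKey = key;
            }
        }
        return bestKey == null ? null : signatures.getProperty(bestKey);
    }
}
```

Under these assumptions, `MagicMimeDetector.withSampleSignatures().detect(bytes)` returns `image/png` for data that starts with the PNG signature and `null` when no listed signature matches.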
According to Real Gagnon's excellent site, the better solution for your case would be to use Apache Tika.
It depends on where you are getting the input stream from. If you are getting it from a servlet, then it is accessible through the HttpServletRequest object that is an argument of doPost. If you are using some sort of REST API like Jersey, then the request can be injected by using @Context. If you are uploading the file through a socket, it is your responsibility to specify the MIME type as part of your protocol, as you will not inherit the HTTP headers.
I'm a big proponent of "do it yourself first, then look for a library solution". Luckily, this case is just that.
You have to know the file's "magic number", i.e. its signature.
Let me give an example of detecting whether the InputStream represents a PNG file. The PNG signature is composed by appending together the following bytes (in hex):
1) error-checking byte - 0x89 (137 in decimal)
2) string "PNG" in ASCII - 0x50 0x4E 0x47
3) CR (carriage return) - 0x0D
4) LF (line feed) - 0x0A
5) SUB (substitute) - 0x1A
6) LF (line feed) - 0x0A
So, the magic number is 89 50 4E 47 0D 0A 1A 0A in hex, or -119, 80, 78, 71, 13, 10, 26, 10 as signed Java bytes.
Explanation of the 137 -> -119 conversion
An N-bit number can represent 2^N different values. For a byte (8 bits) that is 2^8 = 256 values, or the range 0..255. Java considers byte primitives to be signed, so the range is -128..127. Thus, 137 is treated as signed and represents -119 = 137 - 256.
Example in Kotlin
Of course, in order to support many MIME types, you have to scale this solution somehow, and if you are not happy with the result, consider using a library.
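The signature check described in this answer can be sketched in Java roughly as follows. This is not the answerer's Kotlin code; the class name `PngSniffer` and its helper methods are hypothetical, and the signature bytes are the standard PNG ones.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class PngSniffer {
    // 0x89 is above 127, so it needs a cast: Java bytes are signed.
    static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** Converts a signed Java byte back to its 0..255 value. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    /** Reads up to 8 bytes from the stream and compares them with the PNG signature. */
    static boolean isPng(InputStream in) {
        byte[] head = new byte[PNG_SIGNATURE.length];
        int off = 0;
        try {
            // A single read() may return fewer bytes than requested, so loop.
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break; // stream ended before 8 bytes were available
                }
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return off == head.length && Arrays.equals(head, PNG_SIGNATURE);
    }
}
```

Here `(byte) 137` evaluates to -119 and `PngSniffer.unsigned((byte) 137)` gives 137 again. Note that `isPng` consumes the bytes it reads, so the caller needs a way to re-read them (for example a buffered stream with mark/reset) before processing the upload further.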
You can check the Content-Type header field and have a look at the extension of the filename used. For everything else, you have to run more complex routines, such as checking with Tika.
You can just add tika-app-1.x.jar to your classpath, as long as you don't use slf4j logging anywhere else, because it will cause a collision. If you use Tika to detect the type of an InputStream, the stream has to support mark. Otherwise, calling Tika will consume your input stream. However, you can use the Apache IO library to get around this by turning the InputStream into a File in memory.
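The mark/reset requirement can be illustrated without Tika. The sketch below swaps in the JDK's own `URLConnection.guessContentTypeFromStream`, which also peeks at the leading bytes and relies on mark support; it recognizes far fewer formats than Tika. The class `PeekingDetector` and its method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper, for illustration only.
final class PeekingDetector {
    /** Wraps the stream in a BufferedInputStream if it does not support mark/reset. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /**
     * Guesses the content type from the first bytes of a markable stream.
     * The JDK method marks the stream, reads a few bytes, and resets it,
     * so the stream can still be read from the beginning afterwards.
     * Returns null when the type is not recognized.
     */
    static String guessType(InputStream markableIn) {
        try {
            return URLConnection.guessContentTypeFromStream(markableIn);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important detail is to keep using the stream returned by `markable(...)` for all later reads. Reading from the original, unwrapped stream after detection would skip whatever the buffer already pulled in.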
If you are using a JAX-RS REST service, you can get it from the MultipartBody.
Where PARAM_NAME is a string representing the name of the parameter holding the file stream.
I think this solves the problem:
What does it return? For example, for PNG: "♦PNG\n\n♦♦♦.....", for XML:
Quite useful. You can try string.contains() to check what it is.
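The code for this answer is missing, so the following Java sketch only illustrates the idea of turning the first bytes into a string and testing it with `contains()`. The class `HeadSniffer`, its methods, and the XML check are assumptions, not the answerer's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class HeadSniffer {
    /** Reads up to maxBytes and decodes them as ISO-8859-1 (one character per byte). */
    static String head(InputStream in, int maxBytes) {
        byte[] buf = new byte[maxBytes];
        int off = 0;
        try {
            while (off < maxBytes) {
                int n = in.read(buf, off, maxBytes - off);
                if (n < 0) {
                    break; // end of stream
                }
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, off, StandardCharsets.ISO_8859_1);
    }

    /** Very rough guess based on text markers near the start of the data. */
    static String roughType(String head) {
        if (head.contains("PNG")) {
            return "image/png";
        }
        if (head.startsWith("<?xml")) {
            return "application/xml";
        }
        return "application/octet-stream";
    }
}
```

A `contains()` test like this is loose: any data with the letters "PNG" among its first bytes would be reported as a PNG, so comparing the exact signature bytes is more reliable when it matters.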