Verifying file integrity
What are the steps to verify the integrity of these documents: doc, docx, docm, odt, rtf, pdf, odf, odp, xls, xlsx, xlsm, ppt, pptm?
Or at least of some of them. This is usually when they are uploaded to a content repository.
I assume the inputStream is read correctly from a multipart HTTP request 99.99% of the time; otherwise an exception would be thrown and action taken. But a user can upload a file that is already corrupted. Should I use third-party libraries to check for that? I didn't see anything like that in odftoolkit, itextpdf, pdfbox, apache poi or tika.
Answers (4)
There are many kinds of "corrupt".
Some corruptions should be easy to detect. For instance, a truncated ODF file will most likely fail when you attempt to open it, because the ZIP reader can't read it.
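To make the ZIP point concrete: ODF (odt, odp, odf) and OOXML (docx, docm, xlsx, xlsm, pptm) files are ZIP containers, so a first check needs only `java.util.zip`. The sketch below is a minimal illustration under that assumption; the class `ZipContainerCheck` and its method are hypothetical names. It opens the archive, which requires the central directory stored at the end of the file (so truncation usually fails here), then decompresses every entry and compares its CRC-32 with the value recorded in the archive. It validates only the container: a well-formed ZIP can still hold broken XML, which a format library such as odftoolkit or Apache POI would have to catch.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: checks only the ZIP container layer of ODF/OOXML files,
// not the XML parts inside it.
final class ZipContainerCheck {

    /** Returns true if the central directory is readable and every entry's CRC-32 matches. */
    static boolean isIntact(File file) {
        // Opening a ZipFile reads the central directory at the end of the file,
        // so a truncated upload typically fails right here with a ZipException.
        try (ZipFile zip = new ZipFile(file)) {
            if (zip.size() == 0) {
                return false; // an office document with no entries is not usable
            }
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                CRC32 crc = new CRC32();
                // Decompress the whole entry; damaged deflate data throws here.
                try (InputStream in = zip.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        crc.update(buffer, 0, n);
                    }
                }
                // Compare with the CRC recorded in the archive (-1 means unknown).
                if (entry.getCrc() != -1 && crc.getValue() != entry.getCrc()) {
                    return false;
                }
            }
            return true;
        } catch (IOException | RuntimeException e) {
            // ZipException is an IOException; any unexpected unchecked exception
            // from the ZIP reader is also treated as "not intact".
            return false;
        }
    }
}
```

Passing the check means "the ZIP layer is sound", which is a necessary but not sufficient condition for a usable document.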
Others will be literally impossible to detect. For instance, a one-character corruption in an RTF file will be undetectable, and so (I think) will most RTF file truncations.
I'd be surprised if you found a single (free) tool to do this job for all of those file types, even to the extent that it is technically possible. The current generation of open source libraries for reading/writing document formats tends to focus on one family of formats only. If you are serious about this, you probably need to use a commercial library.
For all of the file formats listed above there are third-party libraries that can open them. I don't know of a "verification only" tool, but I think being able to open a file without exceptions is at least a basic check that it conforms to the specified format. One such (commercial) library is Aspose (not affiliated, just a happy customer).
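As a cheap pre-filter before handing an upload to a full library, you can compare its first bytes with the signature expected for its claimed extension. This is a swapped-in technique, not what the answer above describes: it catches renamed or obviously wrong files, but says nothing about damage further into the file. The class `SignatureCheck` is a hypothetical name, and the table assumes the common modern variants (legacy doc/xls/ppt as OLE2 compound files, OOXML and ODF as ZIP archives).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: a "does the header match the extension?" pre-filter.
// Assumes common modern variants of each format; it does not verify integrity.
final class SignatureCheck {

    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    // OLE2 Compound File header, used by legacy doc/xls/ppt.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK" 03 04, used by OOXML (docx, xlsx, ...) and ODF (odt, odp, ...).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the expected leading bytes for an extension, or null if unknown. */
    static byte[] expectedSignature(String extension) {
        switch (extension.toLowerCase(Locale.ROOT)) {
            case "pdf": return PDF;
            case "rtf": return RTF;
            case "doc": case "xls": case "ppt": return OLE2;
            case "docx": case "docm": case "xlsx": case "xlsm": case "pptm":
            case "odt": case "odp": case "odf": return ZIP;
            default: return null;
        }
    }

    /** True if the first bytes of the upload start with the signature expected for its extension. */
    static boolean hasExpectedSignature(byte[] head, String extension) {
        byte[] sig = expectedSignature(extension);
        if (sig == null || head.length < sig.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, sig.length), sig);
    }
}
```

Reading the first 8 bytes of the upload is enough for every signature in this table; a file that passes should still be opened with a real parser, as the answer suggests.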
You can compute a checksum/hash (that is, a secure hash) of the file before uploading, then upload the checksum separately. If the file you later download has the same checksum, it has not been changed from the original (with a high probability that depends on the checksum/hash used).
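A minimal sketch of the server side of this idea, assuming SHA-256 as the secure hash and assuming the client sends its checksum as a hex string alongside the upload (the class `Checksums` and its methods are hypothetical names). Note what it proves: the bytes were not changed between the moment the client hashed them and the moment the server read them. A file that was already corrupt on the user's machine hashes consistently and still passes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical checksum helper: hash the bytes on both sides and compare.
final class Checksums {

    /** Streams the input through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // negative bytes format as unsigned
        }
        return hex.toString();
    }

    /** True if the received content hashes to the checksum the client sent separately. */
    static boolean matches(InputStream received, String expectedHex) throws IOException {
        return sha256Hex(received).equalsIgnoreCase(expectedHex.trim());
    }
}
```

For example, `sha256Hex` of the ASCII bytes of `"abc"` is the standard test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.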
Check out the LibreOffice project (it already handles these formats). It has parts written in Java, and you should certainly be able to find and reuse its mechanisms for detecting corrupted files.
I think you can get the code from here:
http://www.libreoffice.org/get-involved/developers/
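A lighter-weight variant of this idea, swapped in rather than taken from the answer: instead of reusing LibreOffice's Java code, drive an installed `soffice` binary in headless mode and ask it to convert the upload to PDF. If the conversion produces no output, LibreOffice could not open the file. The sketch below assumes `soffice` is installed, that the output file takes the input's base name with a `.pdf` extension, and that a missing or empty output means failure; the class `LibreOfficeProbe` is a hypothetical name. A successful conversion is only evidence that LibreOffice could read the file, since it may tolerate some damage, and each conversion is slow compared with the in-process checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical "can LibreOffice open it?" probe using the soffice command line.
final class LibreOfficeProbe {

    /** Builds: soffice --headless --convert-to pdf --outdir <outDir> <input> */
    static List<String> buildCommand(String sofficeBinary, Path input, Path outDir) {
        return Arrays.asList(sofficeBinary, "--headless", "--convert-to", "pdf",
                "--outdir", outDir.toString(), input.toString());
    }

    /** Assumed output path: the input's base name with a .pdf extension, inside outDir. */
    static Path expectedOutput(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    /** Runs the conversion; a timeout, non-zero exit, or missing/empty output counts as failure. */
    static boolean convertsCleanly(String sofficeBinary, Path input, Path outDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(sofficeBinary, input, outDir))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return false; // hung on this file: treat as suspicious
        }
        Path out = expectedOutput(input, outDir);
        return p.exitValue() == 0 && Files.isRegularFile(out) && Files.size(out) > 0;
    }
}
```

Checking for the output file rather than relying on the exit code alone is a deliberate choice here, since the exit status is not treated as a reliable signal on its own in this sketch.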