How to identify an ODF file?

Posted on 2024-08-12 19:33:45, 254 words, 10 views, 0 comments

I need to be able to identify that a given file is an ODF file based on the contents of the file, and not on the file's extension.

ODF files are really a collection of XML files in a zip container, which means that I cannot use the file's magic number as it will just indicate that it is a zip file.

So what I'm really asking is: are there any files that are required to be present in an ODF container? If so, the presence of such a file in a zip container indicates that it is likely to be an ODF file, and its absence indicates that it definitely is not an ODF file.

Comments (3)

誰ツ都不明白 2024-08-19 19:33:45

Why not check out the ODF Technical Specification? The mimetype file listed there would probably be an ideal way to check (just look for the vnd.oasis.opendocument string in the mimetype).
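A minimal sketch of that mimetype check in Python, using only the standard library. The function name `looks_like_odf` and the prefix constant are made up for illustration, and the sketch assumes the writing application actually included a `mimetype` member in the package, so a negative result may be less conclusive than a positive one.

```python
import zipfile

# Assumed prefix: ODF media types look like
# "application/vnd.oasis.opendocument.text", ".spreadsheet", and so on.
ODF_MIMETYPE_PREFIX = b"application/vnd.oasis.opendocument"


def looks_like_odf(path):
    """Return True if `path` is a zip archive whose `mimetype` member
    starts with the ODF media type prefix (hypothetical helper)."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            data = archive.read("mimetype")
    except (KeyError, zipfile.BadZipFile):
        # KeyError: the archive has no member named "mimetype".
        return False
    return data.strip().startswith(ODF_MIMETYPE_PREFIX)
```

Calling `looks_like_odf("report.odt")` on a file saved by an office suite would be expected to return True, while an ordinary zip archive without a `mimetype` member returns False.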

旧人 2024-08-19 19:33:45

As I understand it, there will always be one or more .xml files in the root of the archive, and these XML files will always contain the string <office:document very near the beginning.

All those I have seen seem to contain a file called "content.xml" in the root, which does contain this string.

There are not so many applications writing ODF documents, and in the past, there was basically just one. So it shouldn't be too difficult to install some ancient version of OpenOffice, save a few files, and check that this rule applies as it does on current ODF files.

I would test with something like this on a batch of known ODF files, to check whether it is reliable:

$ unzip -p "$FILE" content.xml | grep -q '<office:document' && echo yes || echo NO
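
The same heuristic can be sketched in Python for cases where shelling out to unzip is inconvenient. The function name `has_office_document_root`, the `member` default, and the 4096-byte probe size are assumptions for illustration; the sketch also assumes content.xml is stored in an ASCII-compatible encoding such as UTF-8, and it only tests for the substring rather than parsing the XML.

```python
import zipfile


def has_office_document_root(path, member="content.xml", probe_bytes=4096):
    """Return True if the start of `member` inside the zip at `path`
    contains the string '<office:document' (hypothetical helper)."""
    try:
        with zipfile.ZipFile(path) as archive:
            with archive.open(member) as stream:
                # Only the beginning is read, since the string is
                # expected very near the start of the file.
                head = stream.read(probe_bytes)
    except (KeyError, zipfile.BadZipFile, OSError):
        # Missing member, not a zip archive, or unreadable file.
        return False
    return b"<office:document" in head
```

As with the shell one-liner, this is worth running over a batch of known ODF files before relying on it.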

凡间太子 2024-08-19 19:33:45

Read the Build ID of the loaded document (OpenOffice Basic); if it is missing, the document is not ODF.

oDoc = ThisComponent
If oDoc.BuildID = "" Then
    bIsNotODF = TRUE
Endif