How do I write a file(1) magic test pattern that matches the end of a file?

Posted 2024-10-17 00:19:50

I am beginning to wonder if this is even possible as multiple searches on SO, Google, Bing and linuxquestions.org have turned up nothing.

I am interested in extending the magic patterns located in /usr/share/magic (used by the file(1) utility) to recognize files based on data at or near the end of the file. I have been able to do this for the beginning of a file, as well as for arbitrary offsets into the file from the beginning.

The man page does a pretty good job of illustrating some standard use cases; unfortunately, there does not seem to be a way to index from the end of the file rather than from the beginning. The only workaround I could come up with was a scripted approach using tac and/or lreverse, but I feel these may be unfriendly to binary data.

Also, I wanted to avoid any other scripted processing - I feel like this should be doable with the right file magic. Any ideas?

Comments (1)

糖果控 2024-10-24 00:19:50

It's not possible. file(1) is designed to work with pipes too, and you cannot use lseek(2) on a pipe to get to the end of the data. Reading the whole input until the end would be very slow (and file(1) tries hard to be fast), and if it is actually reading from a pipe, it may never encounter the end of the file, which would be even worse.
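
The lseek(2) limitation is easy to observe from a script. The following is a minimal Python sketch, not taken from the file(1) source; the helper name can_seek_to_end is made up for illustration. It assumes a POSIX system, where seeking on a pipe fails with ESPIPE.

```python
import errno
import os
import tempfile


def can_seek_to_end(fd: int) -> bool:
    """Return True if lseek(2) to the end of `fd` succeeds, False on ESPIPE.

    Hypothetical helper for illustration only.
    """
    try:
        os.lseek(fd, 0, os.SEEK_END)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # "Illegal seek": pipes, FIFOs, sockets
            return False
        raise


if __name__ == "__main__":
    # A regular file can be positioned at its end.
    with tempfile.TemporaryFile() as fh:
        fh.write(b"some data")
        fh.flush()
        print("regular file:", can_seek_to_end(fh.fileno()))

    # A pipe cannot be repositioned at all.
    read_end, write_end = os.pipe()
    try:
        print("pipe:", can_seek_to_end(read_end))
    finally:
        os.close(read_end)
        os.close(write_end)
```

A tool that has to behave the same on both kinds of input can therefore only rely on what it reads sequentially from the start.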

As for the documentation: with open source software, the source code itself is the ultimate documentation. If you get stuck in a case like this, it is always a good idea to have a look. The function file_or_fd() in src/magic.c gives the clue. Use the Source, Luke! ;-)

In your specific case, I would have a second look at the file format in question, and if it really cannot be parsed by file(1), then a short Perl or Python script should do the trick. Good luck!
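
As a rough idea of what such a script could look like in Python: the sketch below opens the file in binary mode, seeks to a small window before the end, and looks for a signature there. The function name tail_matches, the 64-byte window, and the b"TRAILER!" signature in the usage note are all assumptions for illustration; substitute whatever your format actually puts at its end. It only works on regular (seekable) files.

```python
import os


def tail_matches(path: str, signature: bytes, window: int = 64) -> bool:
    """Return True if `signature` occurs within the last `window` bytes of `path`.

    Hypothetical helper: name, default window and semantics are illustrative.
    """
    with open(path, "rb") as fh:             # binary mode, so no newline or
        size = fh.seek(0, os.SEEK_END)       # encoding translation is applied
        fh.seek(max(0, size - window))       # clamp for files shorter than window
        return signature in fh.read()
```

For example, tail_matches("sample.bin", b"TRAILER!") would report whether the made-up marker TRAILER! appears in the last 64 bytes of sample.bin. Unlike tac, this reads raw bytes and never reverses or splits the data on line boundaries, so it is safe for binary files.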
