How to write a file magic test pattern to match the end of a file?
I am beginning to wonder if this is even possible, as multiple searches on SO, Google, Bing and linuxquestions.org have turned up nothing.
I am interested in extending the magic patterns located in /usr/share/magic (used by the file(1) utility) to recognize files based on data at or near the end of the file. I have been able to do this for the beginning of a file, as well as for arbitrary offsets into the file from the beginning.
The man page does a pretty good job of illustrating some standard use cases; unfortunately, it does not seem like there is a way to index from the end as opposed to the beginning. The only workaround I could come up with was to adopt a scripted approach using tac and/or lreverse, but I feel these may be unfriendly to binary data.
Also, I wanted to avoid any other scripted processing. I feel like this should be doable with the right file magic. Any ideas?
Comments (1)
It's not possible.
file(1) is designed to work with pipes too. You cannot use lseek(2) on pipes to get to the end of the file. Reading the whole file until the end would be very slow (and file(1) tries hard to be fast), and if it is actually reading from a pipe, it may never encounter the end of the file, which would be even worse.
As for the documentation, in the case of open source software, the source code itself is the ultimate documentation. If you get stuck in a case like this, it is always a good idea to have a look. The function file_or_fd() in src/magic.c gives the clue. Use the Source, Luke! ;-)
In your specific case, I would have a second look at the file format in question, and if it really cannot be parsed by file(1), then a short Perl or Python script should do the trick. Good luck!
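For what it's worth, here is a rough sketch of what such a Python script might look like. It is only an illustration: the signature bytes `ENDMARK`, the 64-byte window, and the function name `has_trailer` are all made up for the example and would need to be replaced with whatever the real format puts at the end of the file. It opens the file in binary mode, so arbitrary bytes are not a problem, and it assumes a regular, seekable file (it will not work on a pipe, for the reasons given above).

```python
import os

# Hypothetical trailer signature, for illustration only; substitute the
# bytes your format actually places at or near the end of the file.
TRAILER = b"ENDMARK"


def has_trailer(path, trailer=TRAILER, window=64):
    """Return True if `trailer` occurs within the last `window` bytes of the file."""
    with open(path, "rb") as f:           # binary mode: no newline or encoding translation
        f.seek(0, os.SEEK_END)            # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(0, size - window))     # clamp for files shorter than the window
        tail = f.read()                   # read only the tail, not the whole file
    return trailer in tail
```

Because only the last `window` bytes are read, the cost does not grow with file size, which sidesteps the "reading the whole file would be slow" concern for regular files.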