Grepping for a large binary value in an even larger binary file
As the title suggests, I would like to grep a reasonably large (about 100MB) binary file for a binary string. The string itself is just under 5K.
I've tried grep with the -P option, but it only seems to return matches when the pattern is a few bytes long. Once I go up to about 100 bytes, it no longer finds any matches.
I've also tried bgrep. It worked well at first, but when I needed to extend the pattern to its current length, I just got "invalid/empty search string" errors.
The irony is that in Windows I can use HxD to search the file and it finds the string in an instant. What I really need, though, is a Linux command line tool.
Thanks for your help,
Simon
Comments (4)
Say we have a couple of big binary data files. For a big one that shouldn't match, we create a 100MB file whose contents are all NUL bytes.
For the one we want to match, create a hundred random megabytes.
Execute the script as ./mkrand >myfile.dat.
Finally, extract a known match into a file named pattern.
I assume you want only the names of the files that match (-l) and want your pattern to be treated literally (-F or --fixed-strings). I suspect you may have been running into a length limit with -P.
You may be tempted to use the --file=PATTERN-FILE option, but grep interprets the contents of PATTERN-FILE as newline-separated patterns, so in the likely case that your 5KB pattern contains newlines, you'll hit an encoding problem.
So hope your system's ARG_MAX is big enough and go for it. Be sure to quote the contents of pattern when you pass them to grep.
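The test setup described in this answer (one file that must not match, one file that must, and a pattern cut out of the second) could be reproduced roughly as follows. This is only a sketch: the name zeros.dat, the helper functions, and the default sizes and offset are assumptions made for illustration, and the files are scaled down from 100MB. A random 5KB pattern will almost certainly contain NUL bytes, which a command-line argument cannot carry, so the sketch checks the fixtures with an in-memory search instead of invoking grep.

```python
import os


def make_fixtures(directory, size=1024 * 1024, offset=42, length=5000):
    """Create a non-matching file, a matching file, and a pattern file.

    All names and default sizes here are illustrative assumptions.
    """
    zeros = os.path.join(directory, "zeros.dat")    # hypothetical name
    rand = os.path.join(directory, "myfile.dat")
    pattern = os.path.join(directory, "pattern")

    # File that should not match: nothing but NUL bytes.
    with open(zeros, "wb") as f:
        f.write(b"\x00" * size)

    # File that should match: random bytes.
    data = os.urandom(size)
    with open(rand, "wb") as f:
        f.write(data)

    # Known match: a slice cut out of the random file.
    with open(pattern, "wb") as f:
        f.write(data[offset:offset + length])

    return zeros, rand, pattern


def contains(path, pattern_path):
    """Return True if the bytes in pattern_path occur in the file at path."""
    with open(path, "rb") as f, open(pattern_path, "rb") as p:
        return p.read() in f.read()
```

Calling make_fixtures on a scratch directory and then contains on each data file should report a match for myfile.dat only.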
Try using grep -U, which treats files as binary.
Also, how are you specifying the search pattern? It might just need escaping to survive shell parameter expansions.
Since the string you are searching for is fairly long, you could benefit from an implementation of the Boyer-Moore search algorithm, which is very efficient when the search string is long.
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
The Wikipedia article also links to some sample code.
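To illustrate why a long pattern helps, here is a minimal sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. It is not the full Boyer-Moore algorithm, and the function name horspool_find is made up for this example. In pure Python it will be much slower than the built-in bytes.find, so treat it as an explanation of the idea rather than a tool.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the first offset of needle in haystack, or -1 if absent."""
    m, n = len(needle), len(haystack)
    if m == 0:
        raise ValueError("empty search pattern")

    # Bad-character table: for each byte value, how far the search window
    # may jump when that byte sits under the needle's last position.
    # Bytes that do not occur in the needle allow a jump of the full length.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i

    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Jump according to the byte aligned with the end of the window.
        pos += shift[haystack[pos + m - 1]]
    return -1
```

The longer the needle, the larger the possible jumps, which is where the advantage for long search strings comes from.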
You might want to look at a simple Python script.
This might work reliably under Linux as well as Windows.
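The script from this answer did not survive on this page, so the following is only a sketch of what such a script might look like, not the original. It memory-maps the large file and uses the standard mmap.find method to report every offset where the pattern occurs. The function name find_all and the command-line convention (data file first, pattern file second) are assumptions.

```python
import mmap
import sys


def find_all(haystack_path, needle):
    """Yield every byte offset at which needle occurs in the file."""
    if not needle:
        raise ValueError("empty search pattern")
    with open(haystack_path, "rb") as f:
        # Map the file read-only so the OS pages it in on demand.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            pos = data.find(needle)
            while pos != -1:
                yield pos
                pos = data.find(needle, pos + 1)


# Assumed usage: python script.py DATA_FILE PATTERN_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], "rb") as pattern_file:
        pattern = pattern_file.read()
    for offset in find_all(sys.argv[1], pattern):
        print(offset)
```

Reading the pattern from a file avoids the shell quoting and argument length concerns raised in the other answers, since the bytes never pass through the command line.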