Finding a large binary value in a larger binary file

Posted 2024-11-18 12:56:07

As the title suggests, I would like to grep a reasonably large (about 100MB) binary file for a binary string - this binary string is just under 5K.

I've tried grep using the -P option, but this only seems to return matches when the pattern is only a few bytes - when I go up to about 100 bytes it no longer finds any matches.

I've also tried bgrep. This worked well originally; however, when I extended the pattern to the length I have now, I just get "invalid/empty search string" errors.

The irony is, in Windows I can use HxD to search the file and it finds it in an instant. What I really need though is a Linux command line tool.

Thanks for your help,

Simon

Comments (4)

无可置疑 2024-11-25 12:56:07

Say we have a couple of big binary data files. For a big one that shouldn't match, we create a 100MB file whose contents are all NUL bytes.

dd ibs=1 count=100M if=/dev/zero of=allzero.dat

For the one we want to match, create a hundred random megabytes.

#! /usr/bin/env perl

use warnings;

binmode STDOUT or die "$0: binmode: $!";

for (1 .. 100 * 1024 * 1024) {
  print chr rand 256;
}

Save the script as mkrand, make it executable, and execute it as ./mkrand >myfile.dat.

Finally, extract a known match into a file named pattern.

dd skip=42 count=10 if=myfile.dat of=pattern
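(With dd's default 512-byte block size, this copies 10 x 512 = 5120 bytes starting at offset 42 x 512 = 21504, roughly the just-under-5K pattern size from the question.)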

I assume you want only the files that match (-l) and want your pattern to be treated literally (-F or --fixed-strings). I suspect you may have been running into a length limit with -P.

You may be tempted to use the --file=PATTERN-FILE option, but grep interprets the contents of PATTERN-FILE as newline-separated patterns, so in the likely case that your 5KB pattern contains newlines, you'll hit an encoding problem.

So hope your system's ARG_MAX is big enough and go for it. Be sure to quote the contents of pattern. For example:

$ grep -l --fixed-strings "$(cat pattern)" allzero.dat myfile.dat
myfile.dat
野生奥特曼 2024-11-25 12:56:07

Try using grep -U, which treats files as binary.

Also, how are you specifying the search pattern? It might just need escaping to survive shell parameter expansion.
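For example, a pattern typed directly on the command line will have spaces, tabs, newlines, $, backquotes, quotes, and backslashes interpreted by the shell before grep ever sees them, so those bytes need escaping or the whole pattern needs quoting to arrive intact.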

抹茶夏天i‖ 2024-11-25 12:56:07

As the string you are searching for is pretty long, you could benefit from an implementation of the Boyer-Moore search algorithm, which is very efficient when the search string is very long.

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The wiki also has links to some sample code.
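
For illustration, here is a minimal sketch in Python of the simplified Boyer-Moore-Horspool variant of that algorithm; it shifts the search window by an amount determined by the last byte of the current window, so a long pattern permits long jumps. The myfile.dat and pattern file names are just borrowed from the earlier answer.

def horspool_find(data, pattern):
    # Return the offset of the first occurrence of pattern in data, or -1.
    # Horspool variant of Boyer-Moore: on a mismatch, shift the window by a
    # distance determined by the byte under the pattern's last position.
    m, n = len(pattern), len(data)
    if m == 0 or n < m:
        return 0 if m == 0 else -1
    # Rightmost occurrence of each byte in pattern[:-1] decides the shift;
    # any byte not in the table allows a full shift of m.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if data[i:i + m] == pattern:
            return i
        i += shift.get(data[i + m - 1], m)
    return -1

with open("pattern", "rb") as p, open("myfile.dat", "rb") as f:
    print(horspool_find(f.read(), p.read()))

That said, bytes.find in CPython is implemented in C with an optimized substring search, so in practice it will usually beat a pure-Python loop like this.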

﹉夏雨初晴づ 2024-11-25 12:56:07

You might want to look at a simple Python script.

match = (b"..."
         b"...."
         b"...")  # Some byte string literal of immense proportions
with open("some_big_file", "rb") as source:
    block = source.read(len(match))      # prime a window as long as the pattern
    while block != match:
        byte = source.read(1)
        if not byte:                     # end of file reached without a match
            break
        block = block[1:] + byte         # slide the window forward one byte

This might work reliably under Linux as well as Windows.
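
Reading a 100 MB file one byte at a time is slow in Python, though. A chunked variant of the same idea (a sketch; the find_in_file helper and the 1 MiB chunk size are arbitrary, while some_big_file and match are the names from the script above) uses bytes.find and overlaps consecutive chunks by len(match) - 1 bytes so a hit straddling a chunk boundary is not missed:

def find_in_file(path, pattern, chunk_size=1 << 20):
    # Return the file offset of the first occurrence of pattern, or -1.
    overlap = len(pattern) - 1
    with open(path, "rb") as source:
        buf = b""
        offset = 0                        # file offset where buf begins
        while True:
            chunk = source.read(chunk_size)
            if not chunk:                 # end of file, no match
                return -1
            buf += chunk
            pos = buf.find(pattern)
            if pos != -1:
                return offset + pos
            # Keep only the tail that could start a match across the boundary.
            offset += len(buf) - overlap
            buf = buf[-overlap:] if overlap else b""

print(find_in_file("some_big_file", match))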
