Finding patterns in a binary file

Posted on 2024-10-20 17:45:56

I'm working on a small project in C where I have to parse a binary file in an undocumented file format. As I'm quite new to C, I have two questions for more experienced programmers.

The first seems to be an easy one. How do I extract all the strings from the binary file and put them into an array? Basically, I am looking for a simple implementation of the strings program in C.

When I open the binary file in any text editor I get a lot of rubbish with some readable strings mixed in. I can extract these strings using strings on the command line. Now I'd like to do something similar in C, like in the pseudocode below:

i = 0
while (!EOF) {
     if (string found) {
          put it into array[i]
          i++
     }
}
return i;

The second problem is a little bit more complicated and is, I believe, the proper way of achieving the same thing. When I look at the file in a hex editor it's easy to notice some patterns. For example, before each string there is a byte of value 02 (0x02) followed by the length of the string and the string itself. For example, 02 18 52 4F 4F 54 4B 69 57 69 4B 61 4B 69 is a string, where the string part is 52 4F 4F 54 4B 69 57 69 4B 61 4B 69 (ASCII "ROOTKiWiKaKi").

Now the function I'm trying to create would work like this:

while(!EOF) {
     for(i=0; i<buffer_size; ++i) {
          if(buffer[i] hex value == 02) {
               int n = read the next byte;
               string = read the next n bytes as char;
               put string into array;
          }
     }
}

Thanks for any pointers. :)

Comments (2)

败给现实 2024-10-27 17:45:56

The first seems to be an easy one. How do I extract all the strings from the binary file and put them into an array?

Figure out what character range represents printable ASCII characters (0x20 through 0x7E). Iterate across the file, checking whether each byte is a printable character and counting how many adjacent printable characters you have seen. By default, strings treats sequences of four or more printable characters as strings; when you hit the next non-printable byte, check whether the run reached that minimum length, and if it did, output the string. Some book-keeping is necessary.
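The book-keeping described above could look roughly like the following. This is only a sketch, not the real strings implementation: it assumes the whole file has already been read into memory (for example with fread), and the names is_printable and extract_strings are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Printable ASCII is 0x20 (space) through 0x7E (~). */
static int is_printable(unsigned char c)
{
    return c >= 0x20 && c <= 0x7E;
}

/*
 * Scan buf[0..len) and copy every run of at least min_len printable
 * characters into a newly allocated, NUL-terminated string.
 * On success, stores the malloc'd array in *out and returns the number of
 * strings (the caller frees each string and the array itself).
 * Returns -1 if memory runs out.
 */
static long extract_strings(const unsigned char *buf, size_t len,
                            size_t min_len, char ***out)
{
    char **list = NULL;
    size_t count = 0, cap = 0, start = 0;

    assert(min_len > 0);
    for (size_t i = 0; i <= len; i++) {
        if (i < len && is_printable(buf[i]))
            continue;                 /* still inside a run */
        size_t run = i - start;       /* run ended at i (or at end of data) */
        if (run >= min_len) {
            if (count == cap) {       /* grow the array of pointers */
                size_t ncap = cap ? cap * 2 : 16;
                char **tmp = realloc(list, ncap * sizeof *tmp);
                if (!tmp) goto fail;
                list = tmp;
                cap = ncap;
            }
            char *s = malloc(run + 1);
            if (!s) goto fail;
            memcpy(s, buf + start, run);
            s[run] = '\0';
            list[count++] = s;
        }
        start = i + 1;                /* the next run starts after this byte */
    }
    *out = list;
    return (long)count;

fail:
    for (size_t k = 0; k < count; k++)
        free(list[k]);
    free(list);
    return -1;
}
```

Calling extract_strings(data, size, 4, &list) mimics the default minimum of four characters; the array grows on demand because the number of strings is not known in advance.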

The second problem is a little bit more complicated and is, I believe, the proper way of achieving the same thing.

Your pseudocode is essentially correct. You can compare the contents of buffer[i] directly with an integer (e.g. 2). Reading the next byte is as simple as incrementing i. Make sure you don't overrun the buffer, and make sure the array you're reading the string into is big enough (if the size field is only one byte, a 256-byte buffer is enough: up to 255 characters plus a terminating NUL).
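A minimal sketch of that loop, with the bounds checks spelled out, might look like this. It assumes the length byte counts the data bytes that follow it, which has to be verified against the real file; MARKER and next_record are names invented for this example. Since the byte 0x02 can also occur in non-string data, some matches may be false positives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER 0x02   /* tag byte the question observed before each string */

/*
 * Look for the next MARKER at or after *pos. If a complete record
 * (marker, one length byte, that many data bytes) fits in the buffer, copy
 * the data into dst as a NUL-terminated string, advance *pos past the
 * record and return 1. Return 0 when no further complete record exists.
 * dst must hold at least 256 bytes: 255 data bytes plus the terminator.
 */
static int next_record(const unsigned char *buf, size_t len,
                       size_t *pos, char dst[256])
{
    assert(buf && pos && dst);
    for (size_t i = *pos; i + 1 < len; i++) {
        if (buf[i] != MARKER)
            continue;
        size_t n = buf[i + 1];        /* length byte, 0..255 */
        if (n > len - (i + 2))
            continue;                 /* would overrun the buffer: skip it */
        memcpy(dst, buf + i + 2, n);
        dst[n] = '\0';
        *pos = i + 2 + n;             /* resume scanning after this record */
        return 1;
    }
    *pos = len;
    return 0;
}
```

Start with a position of 0 and call next_record in a loop until it returns 0, copying dst into your array on each successful call.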

謸气贵蔟 2024-10-27 17:45:56

I'm not sure your solution will work: what if you find a string that is 350 characters long?
Can numbers be part of a string, or do you consider them "rubbish"?

I think the safest way is:

  1. Define what you consider a string and what you consider "rubbish". For instance, are ":,!?" "string" or "rubbish"?
  2. Define a minimum length for a string to be considered "readable".
  3. Parse the file, looking for every group of characters with length >= the minimum.

I know, it's boring, but I think it's the only safe way. Good luck!
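To illustrate how much the first step matters, here is a small sketch in which the definition of a "string" character is passed in as a function. The names char_class, alnum_only, any_printable and count_runs are hypothetical, and the two predicates are just two possible definitions among many.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef int (*char_class)(unsigned char c);

/* Strict definition: only letters and digits belong to a string. */
static int alnum_only(unsigned char c) { return isalnum(c) != 0; }

/* Loose definition: letters, digits, punctuation and the space character. */
static int any_printable(unsigned char c) { return isprint(c) != 0; }

/* Count runs of at least min_len consecutive bytes accepted by in_string. */
static size_t count_runs(const unsigned char *buf, size_t len,
                         size_t min_len, char_class in_string)
{
    size_t runs = 0, run = 0;

    assert(min_len > 0 && in_string);
    for (size_t i = 0; i < len; i++) {
        if (in_string(buf[i])) {
            run++;
        } else {
            if (run >= min_len)
                runs++;
            run = 0;
        }
    }
    if (run >= min_len)
        runs++;                   /* a run may end at the end of the data */
    return runs;
}
```

With a minimum length of 4, the bytes "ab,cd!ef" followed by a zero byte and "wxyz" contain two strings under any_printable but only one under alnum_only, because the punctuation splits the first run into pieces that are too short.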