在文件中搜索字节序列 (C#)

发布于 2024-10-04 09:37:56 字数 125 浏览 4 评论 0原文

我正在编写一个 C# 应用程序,需要在文件(可能非常大)中搜索字节序列,但我无法使用任何库来执行此操作。因此,我需要一个函数,它将字节数组作为参数并返回给定序列后面的字节的位置。该功能不必非常快,只需能够工作即可。任何帮助将不胜感激:)

I'm writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can't use any libraries to do so. So, I need a function that takes a byte array as an argument and returns the position of the byte following the given sequence. The function doesn't have to be fast, it simply has to work. Any help would be greatly appreciated :)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

孤城病女 2024-10-11 09:37:56

如果不需要很快,你可以使用这个:

int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
  for (int i = 0; i < data.Length - pattern.Length; i++)
  {
    bool match = true;
    for (int k = 0; k < pattern.Length; k++)
    {
      if (data[i + k] != pattern[k])
      {
        match = false;
        break;
      }
    }
    if (match)
    {
      return i + pattern.Length;
    }
  }
}

但我真的建议你使用 Knuth-Morris-Pratt 算法,该算法主要用作字符串 IndexOf 方法的基础。除了小数组和小模式之外,上面的算法执行速度非常慢。

If it doesn't have to be fast you could use this:

int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
  for (int i = 0; i < data.Length - pattern.Length; i++)
  {
    bool match = true;
    for (int k = 0; k < pattern.Length; k++)
    {
      if (data[i + k] != pattern[k])
      {
        match = false;
        break;
      }
    }
    if (match)
    {
      return i + pattern.Length;
    }
  }
}

But I really would recommend you to use Knuth-Morris-Pratt algorithm, it's the algorithm mostly used as a base of IndexOf methods for strings. The algorithm above will perform really slow, exept for small arrays and small patterns.

撧情箌佬 2024-10-11 09:37:56

Turrau 指出的直接方法是有效的,并且对于您的目的来说可能已经足够好了,因为您说它不必很快 - 特别是因为对于大多数实际目的,该算法比最坏情况 O( n*m)。 (我猜这取决于你的模式)。

要获得最佳解决方案,您还可以查看 Knuth-Morris -Pratt 算法,它利用部分匹配,最终的复杂度为 O(n+m)。

The straight-forward approach as pointed out by Turrau works, and for your purposes is probably good enough, since you say it doesn't have to be fast - especially since for most practical purposes the algorithm is much faster than the worst case O(n*m). (Depending on your pattern I guess).

For an optimal solution you can also check out the Knuth-Morris-Pratt algorithm, which makes use of partial matches which in the end is O(n+m).

是你 2024-10-11 09:37:56

这是我用来进行 boyer-moore 类型搜索的一些代码的摘录。它适用于 pcap 文件,因此它会逐条记录地进行操作,但应该很容易修改以适合搜索长二进制文件。它是从一些测试代码中提取的,所以我希望我已经掌握了所有内容供您遵循。还要在维基百科上查找 boyer-moore 搜索,因为这就是它的基础。

int[] badMatch = new int[256];
byte[] pattern;  //the pattern we are searching for

//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
  badMatch[i] = pattern.Length;
}

//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
  badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>

// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
  // if our payload character equals our search string character, continue matching counting backwards
  for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
  {
    k--;
  }
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
  if (j == -1)
  {
    //we MATCHED!!!
    //i = end;
    cont = false;
  }
}

Here's an extract of some code I used to do a boyer-moore type search. It's mean to work on pcap files, so it operates record by record, but should be easy enough to modify to suit just searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up boyer-moore searching on wikipedia, since that is what it's based off of.

int[] badMatch = new int[256];
byte[] pattern;  //the pattern we are searching for

//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
  badMatch[i] = pattern.Length;
}

//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
  badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>

// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
  // if our payload character equals our search string character, continue matching counting backwards
  for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
  {
    k--;
  }
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
  if (j == -1)
  {
    //we MATCHED!!!
    //i = end;
    cont = false;
  }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文