当前位置：文江博客话题详情

在文件中搜索字节序列 (C#)

发布于 2024-10-04 09:37:56 字数 125 浏览 4 评论 0原文

我正在编写一个 C# 应用程序，需要在文件（可能非常大）中搜索字节序列，但我无法使用任何库来执行此操作。因此，我需要一个函数，它将字节数组作为参数并返回给定序列后面的字节的位置。该功能不必非常快，只需能够工作即可。任何帮助将不胜感激:)

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

孤城病女 2024-10-11 09:37:56

如果不需要很快，你可以使用这个：

int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
  for (int i = 0; i < data.Length - pattern.Length; i++)
  {
    bool match = true;
    for (int k = 0; k < pattern.Length; k++)
    {
      if (data[i + k] != pattern[k])
      {
        match = false;
        break;
      }
    }
    if (match)
    {
      return i + pattern.Length;
    }
  }
}

但我真的建议你使用 Knuth-Morris-Pratt 算法，该算法主要用作字符串 IndexOf 方法的基础。除了小数组和小模式之外，上面的算法执行速度非常慢。

If it doesn't have to be fast you could use this:

int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
  for (int i = 0; i < data.Length - pattern.Length; i++)
  {
    bool match = true;
    for (int k = 0; k < pattern.Length; k++)
    {
      if (data[i + k] != pattern[k])
      {
        match = false;
        break;
      }
    }
    if (match)
    {
      return i + pattern.Length;
    }
  }
}

But I really would recommend you to use Knuth-Morris-Pratt algorithm, it's the algorithm mostly used as a base of IndexOf methods for strings. The algorithm above will perform really slow, exept for small arrays and small patterns.

回复收藏 0 原文

撧情箌佬 2024-10-11 09:37:56

Turrau 指出的直接方法是有效的，并且对于您的目的来说可能已经足够好了，因为您说它不必很快 - 特别是因为对于大多数实际目的，该算法比最坏情况 O( n*m）。（我猜这取决于你的模式）。

要获得最佳解决方案，您还可以查看 Knuth-Morris -Pratt 算法，它利用部分匹配，最终的复杂度为 O(n+m)。

回复收藏 0 原文

是你 2024-10-11 09:37:56

这是我用来进行 boyer-moore 类型搜索的一些代码的摘录。它适用于 pcap 文件，因此它会逐条记录地进行操作，但应该很容易修改以适合搜索长二进制文件。它是从一些测试代码中提取的，所以我希望我已经掌握了所有内容供您遵循。还要在维基百科上查找 boyer-moore 搜索，因为这就是它的基础。

int[] badMatch = new int[256];
byte[] pattern;  //the pattern we are searching for

//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
  badMatch[i] = pattern.Length;
}

//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
  badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>

// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
  // if our payload character equals our search string character, continue matching counting backwards
  for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
  {
    k--;
  }
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
  if (j == -1)
  {
    //we MATCHED!!!
    //i = end;
    cont = false;
  }
}

Here's an extract of some code I used to do a boyer-moore type search. It's mean to work on pcap files, so it operates record by record, but should be easy enough to modify to suit just searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up boyer-moore searching on wikipedia, since that is what it's based off of.

int[] badMatch = new int[256];
byte[] pattern;  //the pattern we are searching for

//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
  badMatch[i] = pattern.Length;
}

//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
  badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>

// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
  // if our payload character equals our search string character, continue matching counting backwards
  for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
  {
    k--;
  }
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
  if (j == -1)
  {
    //we MATCHED!!!
    //i = end;
    cont = false;
  }
}

回复收藏 0 原文

~没有更多了~