Searching a character array read from a file for two consecutive hex values

Posted on 2024-11-08 16:14:15

I've read a file into an array of characters using fread. Now I want to search that array for two consecutive hex values, namely FF followed by D9 (it's a JPEG marker signifying end of file). Here is the code I use to do that:

char* searchBuffer(char* b) {
    char* p1 = b;
    char* p2 = ++b;
    int count = 0;

    while (*p1 != (unsigned char)0xFF && *p2 != (unsigned char)0xD9) {
        p1++;
        p2++;
        count++;
    }

    count = count;
    return p1;
}

I know this code works if I search for hex values that don't include 0xFF (e.g. 4E followed by 46), but every time I try searching for 0xFF it fails. When I don't cast the hex values to unsigned char, the program doesn't enter the while loop. When I do cast them, the program goes through all the chars in the array and doesn't stop until I get an out-of-bounds error. I'm stumped, please help.

Ignore count; it's just a variable that helps me debug.

Thanks in advance.

对岸观火 2024-11-15 16:14:15

Why not use memchr() to find potential matches?

Also, make sure you're dealing with promotions of potentially signed types (char may or may not be signed). Note that while 0xff and 0xd9 have the high bit set when looked at as 8-bit values, they are non-negative integer constants, so there is no 'sign extension' that occurs for them:

#include <limits.h>   /* UINT_MAX */
#include <string.h>   /* memchr */

char* searchBuffer(char* b) {
    unsigned char* p1 = (unsigned char*) b;

    for (;;) {
        /* find the next 0xff char */
        /* note - this highlights that we really should know the size   */
        /* of the buffer we're searching, in case we don't find a match */
        /* at the moment we're making it up to be some large number     */
        p1 = memchr(p1, 0xff, UINT_MAX);
        if (!p1) {
            /* no 0xff within the made-up bound; give up */
            break;
        }
        if (*(p1 + 1) == 0xd9) {
            /* found the 0xff 0xd9 sequence */
            break;
        }

        p1 += 1;
    }

    return (char *) p1;
}

Also, note that you really should be passing in some notion of the size of the buffer being searched, in case the target isn't found.

Here's a version that takes a buffer size parameter:

#include <string.h>   /* memchr */

char* searchBuffer(char* b, size_t siz) {
    unsigned char* p1 = (unsigned char*) b;
    unsigned char* end = p1 + siz;

    for (;;) {
        /* find the next 0xff char */
        p1 = memchr(p1, 0xff, end - p1);
        if (!p1) {
            /* sequence not found, return NULL */
            break;
        }

        /* only look at the next byte if it is still inside the buffer */
        if (((p1 + 1) != end) && (*(p1 + 1) == 0xd9)) {
            /* found the 0xff 0xd9 sequence */
            break;
        }

        p1 += 1;
    }

    return (char *) p1;
}
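
Since the question mentions fread, here is a minimal sketch of where the size could come from: the return value of fread is the number of bytes actually read, and that is the bound to search within. The helper name find_eoi_offset and its parameters are made up for this illustration, and it only examines the first cap bytes of the stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to cap bytes from fp into buf and returns
   the offset of the first FF D9 pair, or -1 if the pair is not present
   in the bytes that were read. */
long find_eoi_offset(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t n = fread(buf, 1, cap, fp);   /* n = bytes actually read */
    const unsigned char *p = buf;
    const unsigned char *end = buf + n;

    while (p < end && (p = memchr(p, 0xFF, (size_t)(end - p))) != NULL) {
        /* check the following byte only if it lies inside the data */
        if (p + 1 < end && p[1] == 0xD9)
            return (long)(p - buf);
        p++;
    }
    return -1;
}
```

Returning an offset rather than a pointer is just one possible design; it makes the "not found" case (-1) hard to confuse with a valid position.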

终止放荡 2024-11-15 16:14:15

You are falling foul of integer promotions. Both operands of != (and similar operators) are promoted to int before the comparison (or to unsigned int on the rare platforms where int cannot represent every unsigned char value). So this:

*p1 != (unsigned char)0xFF

is equivalent to:

(int)*p1 != (int)(unsigned char)0xFF

On your platform, char is evidently signed, so a byte holding FF is read as -1 and is still -1 after promotion, while the right-hand side is 255. The two can never compare equal.

So try casting *p1 as follows:

(unsigned char)*p1 != 0xFF

Alternatively, you could have the function take unsigned char arguments instead of char, and avoid all the casting.

[Note that on top of all of this, your loop logic is incorrect, as pointed out in various comments.]
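
To make the promotion issue concrete, here is a small sketch that compares the same byte with and without the cast. The function names are invented for this illustration, and the result of the naive version depends on whether plain char is signed on the platform, which char_is_signed reports.

```c
#include <limits.h>

/* Naive test: c is promoted to int, so a signed char holding the bit
   pattern FF compares as -1 against 255 and never matches. */
int is_ff_naive(char c)
{
    return c == 0xFF;
}

/* Convert to unsigned char first, so the byte is compared as 0..255. */
int is_ff_cast(char c)
{
    return (unsigned char)c == 0xFF;
}

/* Nonzero when plain char is signed on this platform. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With a signed char, is_ff_naive returns 0 for a byte holding FF while is_ff_cast returns 1; with an unsigned char, both return 1.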

書生途 2024-11-15 16:14:15

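4E promotes to a positive int, but when char is signed, *p1 holding the byte FF is negative. It stays negative after promotion (or becomes a very large value if it is converted to unsigned), so it never equals 0xFF.

You need to make p1 point to unsigned char.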

墟烟 2024-11-15 16:14:15

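You can write the code much more compactly:

char* searchBuffer(const char* b) {
    /* keep going while the current pair is NOT ff d9 */
    while (*b != '\xff' || *(b+1) != '\xd9') b++;
    return (char*) b;  /* cast away const to match the return type */
}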

Also note the function will cause a segmentation fault (or worse, return invalid results) if b does not, in fact, contain the bytes FFD9.
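
A bounded variant of the same loop avoids running off the end of the buffer. This is only a sketch: the name find_pair and its length parameter are assumptions added for illustration, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first position where
   `first` is immediately followed by `second`, or NULL if the pair does
   not occur within the first len bytes of buf. */
const unsigned char *find_pair(const unsigned char *buf, size_t len,
                               unsigned char first, unsigned char second)
{
    size_t i;

    /* i + 1 < len keeps buf[i + 1] inside the buffer */
    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == first && buf[i + 1] == second)
            return buf + i;
    }
    return NULL;
}
```

Taking unsigned char pointers means the bytes are compared as values 0 to 255, so no casts are needed in the loop, and a NULL return gives the caller a way to detect that the marker is missing.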

就像说晚安 2024-11-15 16:14:15

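Use void *memmem(const void *haystack, size_t haystacklen, const void *needle, size_t needlelen);

It is declared in string.h and easy to use, but note that it is a GNU/BSD extension rather than standard C. With glibc you need to define _GNU_SOURCE before including the header.

#define _GNU_SOURCE   /* expose memmem on glibc */
#include <string.h>

char* searchBuffer(char* b, size_t len)
{
    unsigned char needle[2] = {0xFF, 0xD9};

    /* returns NULL if the sequence is not found */
    return memmem(b, len, needle, sizeof(needle));
}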
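
On platforms where memmem is not available, a fallback can be assembled from the standard memchr and memcmp. The following is a minimal sketch under that assumption; find_bytes is a made-up name, and it is not tuned for long needles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for memmem: returns a pointer to the
   first occurrence of needle in haystack, or NULL if there is none. */
void *find_bytes(const void *haystack, size_t hlen,
                 const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    const unsigned char *last;

    if (nlen == 0)
        return (void *)h;          /* empty needle matches at the start */
    if (hlen < nlen)
        return NULL;

    last = h + (hlen - nlen);      /* last position where a match can start */
    while (h <= last) {
        /* jump to the next candidate for the needle's first byte */
        h = memchr(h, n[0], (size_t)(last - h) + 1);
        if (h == NULL)
            return NULL;
        if (memcmp(h, n, nlen) == 0)
            return (void *)h;
        h++;
    }
    return NULL;
}
```

For the two-byte JPEG marker this would be called as find_bytes(buf, len, needle, 2) with needle holding FF D9.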
