Searching a character array read from a file for 2 consecutive hex values
I've read a file into an array of characters using fread. Now I want to search that array for two consecutive hex values, namely FF followed by D9 (it's a JPEG marker signifying end of file). Here is the code I use to do that:

    char* searchBuffer(char* b) {
        char* p1 = b;
        char* p2 = ++b;
        int count = 0;
        while (*p1 != (unsigned char)0xFF && *p2 != (unsigned char)0xD9) {
            p1++;
            p2++;
            count++;
        }
        count = count;
        return p1;
    }

I know this code works if I search for hex values that don't include 0xFF (e.g. 4E followed by 46), but every time I try searching for 0xFF it fails. When I don't cast the hex values to unsigned char, the program doesn't enter the while loop. When I do cast them, the program goes through all the chars in the array and doesn't stop until I get an out-of-bounds error. I'm stumped, please help.

Ignore count, it's just a variable that helps me debug.

Thanks in advance.
Comments (5)
Why not use memchr() to find potential matches?

Also, make sure you're dealing with promotions of potentially signed types (char may or may not be signed). Note that while 0xff and 0xd9 have the high bit set when looked at as 8-bit values, they are non-negative integer constants, so there is no 'sign extension' that occurs for them.

Also, note that you really should be passing in some notion of the size of the buffer being searched, in case the target isn't found.

Here's a version that takes a buffer size parameter:
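The listing that accompanied this answer is not preserved on this page. As a minimal sketch of the idea it describes (memchr() plus an explicit length), something like the following could work. The name find_eoi and its signature are hypothetical, chosen only for illustration. The buffer is treated as unsigned char so that a byte read from it compares equal to 0xFF and 0xD9 without casts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): returns a pointer to
 * the first FF D9 pair inside buf[0..len), or NULL if there is none. */
const unsigned char *find_eoi(const unsigned char *buf, size_t len)
{
    const unsigned char *p = buf;
    const unsigned char *end = buf + len;

    /* A match needs two bytes, so stop once fewer than two remain. */
    while (end - p >= 2) {
        /* Search for 0xFF everywhere except the final byte, so that
         * p[1] is always inside the buffer when a candidate is found. */
        p = memchr(p, 0xFF, (size_t)(end - p) - 1);
        if (p == NULL)
            return NULL;      /* no 0xFF left that could start a pair */
        if (p[1] == 0xD9)
            return p;         /* found FF D9 */
        p++;                  /* lone 0xFF: continue after it */
    }
    return NULL;
}
```

A caller would pass the byte count returned by fread as len and check the result for NULL before using it.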
You are falling foul of integer promotions. Both operands of != (and similar operators) are promoted to int. And if at least one of them is unsigned, then both of them are treated as unsigned (actually that isn't 100% accurate, but for this particular situation, it should suffice). So the comparison in your loop is carried out on the promoted values, not on the original char values.

On your platform, char is evidently signed, in which case it can never take on the value of (unsigned int)0xFF.

So try casting *p1 to unsigned char before comparing it. Alternatively, you could have the function take unsigned char arguments instead of char, and avoid all the casting.

[Note that on top of all of this, your loop logic is incorrect, as pointed out in various comments.]
4E will promote itself to a positive integer, but *p1 will be negative with FF, and then will be promoted to a very large unsigned value that will be far greater than FF.

You need to make p1 unsigned.
You can write the code a lot shorter as:

Also note that the function will cause a segmentation fault (or worse, return invalid results) if b does not, in fact, contain the bytes FF D9.
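The shorter version this answer refers to is not preserved on this page. One compact possibility, offered only as a sketch, is a single indexed loop that also takes the buffer length, so it cannot run off the end when the marker is absent. The name find_ffd9, the length parameter, and the choice to return an index are assumptions made for illustration. Note that a match requires both bytes to agree, which is why the test uses && on equality rather than && on inequality as in the question.

```c
#include <stddef.h>

/* Hypothetical bounded variant: returns the index of the first FF D9 pair
 * in b[0..n), or -1 if the pair does not occur. */
long find_ffd9(const unsigned char *b, size_t n)
{
    /* i + 1 < n keeps b[i + 1] inside the buffer and handles n == 0. */
    for (size_t i = 0; i + 1 < n; i++)
        if (b[i] == 0xFF && b[i + 1] == 0xD9)
            return (long)i;
    return -1;
}
```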
Use void *memmem(const void *haystack, size_t haystacklen, const void *needle, size_t needlelen); which is available in string.h and easy to use.
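A brief sketch of how memmem could be applied to this problem follows. One caveat: memmem is not part of ISO C. With glibc it is declared in string.h only when _GNU_SOURCE is defined before the first include, which the sketch assumes; other platforms may differ. The wrapper name find_marker is hypothetical.

```c
#define _GNU_SOURCE   /* assumption: glibc; must come before any #include */
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: returns a pointer to the first FF D9 pair in
 * buf[0..len), or NULL if memmem finds no occurrence. */
const unsigned char *find_marker(const unsigned char *buf, size_t len)
{
    static const unsigned char eoi[2] = {0xFF, 0xD9};
    return memmem(buf, len, eoi, sizeof eoi);
}
```

Because memmem compares raw bytes, the signedness of char does not come into play here.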