C: fscanf and character/string size
I am parsing a text (CSS) file using fscanf. The basic goal is simple: I want to pull out anything that matches this pattern:
@import "some/file/somewhere.css";
So I'm using fscanf, telling it to read and discard everything up to a '@' character and then store everything until it reaches a ';' character. Here's the function that does this:
char* readDelimitedSectionAsChar(FILE *file)
{
    char buffer[4096];
    int charsRead;
    do
    {
        fscanf(file, "%*[^@] %[^;]", buffer, &charsRead);
    } while(charsRead == 4095);
    char *ptr = buffer;
    return ptr;
}
I've created a buffer that should be able to hold 4095 characters, as I understand it. However, I'm discovering that this is not the case. If I have a file that contains a matching string that's long, like this:
@import "some/really/really/really/long/file/path/to/a/file";
That gets truncated to 31 characters using a buffer of char[4096]. (If I use printf to check the value of buffer, I find that the string is cut short.)
If I increase the buffer size, more of the string is included. I was under the impression that one character takes one byte (though I am aware this is affected by encoding). I am trying to understand what's going on here.
Ideally, I'd like to be able to set the buffer as large as it needs to be "on the fly" --- that is, have fscanf just create a buffer big enough to store the string. Can this be done? (I know of the %as flag for GNU, but this is a Mac application for OS 10.5/10.6 and I'm unsure if that will work on this platform.)
Comments (2)
The main problem you have is that you're returning a pointer to a local buffer on the stack, which is dangling (and so overwritten by the next call you make). You also have a potential buffer overflow.
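Both points above can be addressed without any dynamic allocation. The following is only a minimal sketch, not the answerer's code: the name `read_import_section` and the `SECTION_SIZE` constant are made up for illustration. It assumes the caller owns the buffer, so nothing dangles after the function returns, and it puts a field width on the scanset so fscanf cannot write past the end.

```c
#include <stdio.h>

#define SECTION_SIZE 4096   /* hypothetical size; keep the "%4095" below in sync */

/* Hypothetical helper: copies the next "@...;" section (without the ';')
 * into a buffer supplied by the caller. Returns 1 on success, 0 otherwise. */
int read_import_section(FILE *file, char out[SECTION_SIZE])
{
    /* Skip everything before the next '@'. This is a separate call because
     * the directive fails to match when the input already starts at '@',
     * and a matching failure would stop the rest of a combined format. */
    if (fscanf(file, "%*[^@]") == EOF)
        return 0;

    /* The width 4095 leaves room for the terminating NUL in a 4096-byte buffer. */
    return fscanf(file, "%4095[^;]", out) == 1;
}
```

A caller would declare `char section[SECTION_SIZE];` and call the function in a loop while it returns 1. A section longer than 4095 characters is truncated in this sketch, and the remainder is skipped by the next call.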
You mention the 'a' option, which would help a lot, but it's unfortunately a GNU extension which isn't generally available.
Second, you have this extra argument to fscanf, &charsRead, which will never be written to as there's no % conversion for it in the format string. So charsRead will always be random garbage, which means your loop will (probably) just run once, or (rarely) loop forever. Try something like:
This is still broken in that it will misbehave if you run out of memory (which can easily happen if you feed it a huge malformed file with an @ in the beginning and no ;).
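The code that followed "Try something like" is not present in this copy of the answer. As a stand-in, here is a hedged sketch of one way to get a buffer that grows as needed using only standard fscanf features: a width-limited scanset plus `%n`, which does give fscanf something to store into an int. The function name `read_delimited_section` and the `CHUNK` size are assumptions for illustration, not the answerer's code.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 256   /* hypothetical growth step; keep the "%255" below in sync */

/* Hypothetical helper: returns a malloc'd string holding the next "@...;"
 * section (without the ';'), or NULL if there is none or memory runs out.
 * The caller is responsible for calling free() on the result. */
char *read_delimited_section(FILE *file)
{
    char *buf = NULL;
    size_t len = 0;

    /* Discard everything before the next '@'. */
    if (fscanf(file, "%*[^@]") == EOF)
        return NULL;

    for (;;) {
        int n = 0;
        char *grown = realloc(buf, len + CHUNK);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown;

        /* Read at most CHUNK - 1 characters; %n stores how many were read. */
        if (fscanf(file, "%255[^;]%n", buf + len, &n) != 1)
            break;          /* hit ';' or end of file right away */
        len += (size_t)n;
        if (n < CHUNK - 1)
            break;          /* the scanset stopped early, so the section is done */
    }

    if (len == 0) {         /* nothing was stored, so buf holds no string */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The sketch handles a failed realloc by returning NULL, but it puts no upper bound on the section size, so a huge file with an @ and no ; would still be read into memory in full. A size cap would be a reasonable addition.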
Your buffer is local to the function. You assign a pointer to it, but when the caller accesses the pointer, buffer no longer exists. Anything can happen.
So, don't do that.
And scanf probably isn't the right tool for the job. I'd try getc or fgets instead.
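As a rough illustration of the getc suggestion, the sketch below reads one character at a time and doubles a heap buffer whenever it fills up. The function name `read_section_getc` and the starting capacity are assumptions made for this example. Unlike a scanset-based version, it consumes the closing ';' itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd copy of the next "@...;" section
 * (without the ';'), or NULL if no '@' is found or memory runs out.
 * The caller is responsible for calling free() on the result. */
char *read_section_getc(FILE *file)
{
    int c;
    size_t len = 0, cap = 64;
    char *buf;

    /* Skip input until the next '@' or end of file. */
    while ((c = getc(file)) != EOF && c != '@')
        ;
    if (c == EOF)
        return NULL;

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    do {
        if (len + 1 >= cap) {           /* keep room for the terminating NUL */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)c;
    } while ((c = getc(file)) != EOF && c != ';');

    /* If end of file came before ';', the partial section is still returned. */
    buf[len] = '\0';
    return buf;
}
```

Because the buffer lives on the heap, the returned pointer stays valid after the function returns, which avoids the dangling-pointer problem described above.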