Using fseek to backtrack

Posted on 2024-07-18 00:05:36

Is it reliable to use fseek to backtrack over characters read with fscanf?

For example, if I have just read 10 characters with fscanf and want to back up over them, can I simply call fseek(infile, -10, SEEK_CUR)?

In most situations it works, but I seem to have problems with the character ^M. Apparently fseek counts it as a character but fscanf does not, so in my example a 10-character block containing a ^M would need fseek(infile, -11, SEEK_CUR) instead; fseek(infile, -10, SEEK_CUR) would fall short by one character.

Why is this so?

Edit: I was using fopen in text mode

Comments (5)

痴梦一场 2024-07-25 00:05:36

You're seeing the difference between a "text" and a "binary" file. When a file is opened in text mode (no 'b' in the fopen second argument), the stdio library may (indeed, must) interpret the contents of the file according to the operating system's conventions for text files. For example, in Windows, a line ends with \r\n, and this gets translated to a single \n by stdio, since that is the C convention. When writing to a text file, a single \n gets output as \r\n.
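One way to observe this translation is to count how many characters a file yields in each mode. The following is a minimal sketch, assuming a hypothetical helper named count_chars that is not part of any library. On Windows, the text-mode count for a file with \r\n line endings should come out smaller than the binary-mode count; on Unix-like systems the two are normally equal.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` with the given fopen mode ("r" or "rb")
   and count how many characters fgetc returns before end of file.
   Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    while (fgetc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

Comparing count_chars(path, "r") with count_chars(path, "rb") on the same file shows whether the platform translates line endings in text mode.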

This makes it easier to write portable C programs that handle text files. Some details become complicated, however, and seeking is one of them. Because of this, the C standard only defines fseek on text files in a few cases: to the very beginning, to the very end, to the current position (an offset of zero), and to a previous position that was retrieved with ftell. In other words, you can't portably compute a location to seek to in a text file. Or you can, but you have to take care of all the platform-specific details yourself.

Alternatively, you can use binary files and do the line-ending transformations yourself. Again, portability suffers.
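As a rough sketch of the binary-mode route, the hypothetical helper peek_bytes below reads a block and then steps back over exactly the number of bytes it read. It assumes the stream was opened with "rb", so that every byte (including any \r) counts as one position and relative offsets are meaningful; any \r\n handling is left to the caller.

```c
#include <stdio.h>

/* Hypothetical helper: read up to n bytes into buf, then seek back over
   exactly the bytes that were read. Assumes fp was opened in binary mode
   ("rb"), where relative byte offsets are well defined.
   Returns the number of bytes read, or 0 on failure or at end of file. */
size_t peek_bytes(FILE *fp, char *buf, size_t n)
{
    size_t got = fread(buf, 1, n, fp);

    /* Step back by the real byte count, not by the requested count. */
    if (got > 0 && fseek(fp, -(long)got, SEEK_CUR) != 0)
        return 0;

    return got;
}
```

Because the offset comes from the value fread returned, a \r in the data no longer throws the arithmetic off by one.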

In your case, if you just want to go back to where you last did fscanf, the easiest would be to use ftell just before you fscanf.
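A minimal sketch of that approach follows. The helper name peek_token, the 64-byte buffer, and the "%63s" format are assumptions made for illustration; the point is only that the position comes from ftell and is restored with SEEK_SET, which is one of the cases the standard defines for text streams.

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited token (at most 63
   characters) with fscanf, then return the stream to where it was before
   the read. Returns 0 on success and -1 on failure. */
int peek_token(FILE *fp, char buf[64])
{
    long mark = ftell(fp);              /* remember the position first */
    if (mark == -1L)
        return -1;

    int matched = fscanf(fp, "%63s", buf);

    /* Restore the saved position instead of computing an offset by hand. */
    if (fseek(fp, mark, SEEK_SET) != 0)
        return -1;

    return matched == 1 ? 0 : -1;
}
```

Since no offset is ever computed, it does not matter how many bytes the platform uses for a line ending.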

若沐 2024-07-25 00:05:36

This is because fseek works with bytes, whereas fscanf, reading in text mode, treats the carriage return and line feed (two bytes) as a single character.

梦回梦里 2024-07-25 00:05:36

fseek has no understanding of the file's contents and just moves the file pointer 10 characters back.

Depending on the OS, fscanf may interpret newlines differently; it may even insert a ^M if you're on DOS and the ^M does not appear in the file. Check the manual that came with your C compiler.

不忘初心 2024-07-25 00:05:36

Just tried this with VS2008 and found that fscanf and fseek treated the CR and LF characters in the same way (as a single character).

So with two files:

0000000: 3132 3334 3558 3738 3930 3132 3334 3536 12345X7890123456

and

0000000: 3132 3334 350d 0a37 3839 3031 3233 3435 12345..789012345

If I read 15 characters, I get to the second '5'. If I then seek back 10 characters, the next character I read is the 'X' in the first case and the CRLF in the second.

This seems like a very OS/compiler specific problem.

月牙弯弯 2024-07-25 00:05:36

Did you test the return value of fscanf? Post some code.

Take a look at ungetc. You may have to run a loop over it.
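For reference, here is a small sketch of ungetc used to look at the next character without consuming it; peek_char is a hypothetical name. Note that the C standard only guarantees one character of pushback, so looping ungetc over a 10-character block may not work on every implementation.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character on the stream without
   consuming it, or EOF if there is none. Uses a single ungetc, which is
   the amount of pushback the C standard guarantees. */
int peek_char(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);
    return c;
}
```

After peek_char returns, the next fgetc or fscanf on the same stream sees the same character again.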
