Portable end of line (newline)
It has been an unpleasant surprise that '\n' is replaced with "\r\n" on Windows; I did not know that. (I am guessing it is also replaced on Mac...)
Is there an easy way to ensure that Linux, Mac and Windows users can exchange text files?
By easy way I mean: without writing the file in binary mode or testing and replacing the end-of-line chars myself (or with some third-party program/code). This issue affects my C++ program doing the text file I/O.
Apologies for the partial overlap with other answers, but for the sake of completeness:
Myth: endl is 'more portable' since it writes the line ending depending on the platform convention.
Truth: endl is defined to write \n to the stream and then call flush. So in fact you almost never want to use it. On Windows, every \n written to a text-mode stream is implicitly converted to \r\n by the CRT behind the scenes, whether you use os << endl, os << '\n', or fputs("\n", file).
Myth: You should open files in text mode to write text and in binary mode to write binary data.
Truth: Text mode exists in the first place because some time ago there were file systems that distinguished between text files and binary files. That is no longer true on any sane platform I know. You can write text to binary-opened files just as well; you just lose the automatic \n -> \r\n conversion on Windows. However, this conversion causes more harm than good. Among other things, it makes your code behave differently on different platforms, and tell/seek become problematic to use. Therefore it's best to avoid this automatic conversion. Note that POSIX does not distinguish between binary and text mode.
How to do text: Open everything in binary mode and use the plain old \n. You'll also need to worry about the encoding. Standardize on UTF-8 for Unicode correctness. Use UTF-8 encoded narrow strings internally, instead of wchar_t, whose size differs between platforms (16 bits on Windows, 32 bits on Linux and macOS). Your code will become easier to port.
Tip: You can force MSVC to open all files in binary mode by default. It should work as follows:
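The following is a minimal sketch of this approach. It is not necessarily the snippet the author had in mind. For the MSVC tip, it assumes the Microsoft CRT function _set_fmode (declared in <stdlib.h>) together with _O_BINARY (from <fcntl.h>). That call is guarded by _MSC_VER, so other compilers skip it. The sketch still passes std::ios::binary explicitly, so it does not depend on the default mode. The helpers use_binary_default_mode, write_utf8_file and read_file_bytes are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

#ifdef _MSC_VER
#include <fcntl.h>   // _O_BINARY
#include <stdlib.h>  // _set_fmode (Microsoft CRT only)
#endif

// Hypothetical helper: on MSVC, make binary the default translation mode for
// files the CRT opens without an explicit mode; a no-op on other compilers.
void use_binary_default_mode() {
#ifdef _MSC_VER
    _set_fmode(_O_BINARY);
#endif
}

// Hypothetical helper: writes UTF-8 text byte-for-byte with plain '\n' line
// endings. std::ios::binary disables the '\n' -> "\r\n" translation.
bool write_utf8_file(const std::string& path, const std::string& utf8_text) {
    std::ofstream out(path, std::ios::binary);
    out << utf8_text;
    return static_cast<bool>(out);
}

// Hypothetical helper: reads a file back unchanged (binary mode again,
// so no translation happens on the way in either).
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A UTF-8 string such as "Gr\xC3\xBC\xC3\x9F" "e\n" ("Grüße" followed by LF) written with write_utf8_file produces the same bytes on Linux, macOS and Windows, so the file can be exchanged as-is.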
EDIT: As of 2021, Windows 10 Notepad understands UNIX line endings.
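The first "Truth" in this answer can be checked with the sketch below. It writes the same two lines through std::endl, '\n' and fputs, all in text mode, and then compares the raw bytes. The file names and the helpers file_bytes and write_three_ways are hypothetical. The three files should come out identical on any platform: all contain \r\n on Windows and \n on POSIX systems.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <ostream>
#include <string>

// Hypothetical helper: reads a file back byte-for-byte, without translation.
std::string file_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: writes "a", "b" as two lines through three different
// text-mode routes. Any newline translation is applied equally to all three.
void write_three_ways() {
    {
        std::ofstream out("via_endl.txt");  // text mode: no std::ios::binary
        out << "a" << std::endl << "b" << std::endl;
    }
    {
        std::ofstream out("via_char.txt");  // text mode
        out << "a" << '\n' << "b" << '\n';
    }
    std::FILE* f = std::fopen("via_fputs.txt", "w");  // "w" without 'b' = text mode
    if (f != nullptr) {
        std::fputs("a\nb\n", f);
        std::fclose(f);
    }
}
```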
The issue isn't with endl at all; it's that text streams reformat line breaks depending on the system's standard. If you don't want that, simply don't use text streams; use binary streams. That is, open your files with the ios::binary flag.
That said, if the only issue is that users can exchange files, I wouldn't bother with the output mode at all; I'd rather make sure that your program can read different formats without choking. That is, it should accept different line endings.
This is, by the way, what any decent text editor does (but then again, the default notepad.exe on Windows is not a decent text editor, and won't correctly handle Unix line breaks).
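A reader that accepts all common line endings, as this answer suggests, could look like the sketch below. The function name split_lines_any_eol is hypothetical. It treats \n, \r\n and a lone \r (the classic Mac OS convention) as line terminators. It assumes the file was read in binary mode, so no translation has happened yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines, accepting "\n" (Unix),
// "\r\n" (Windows) and a lone "\r" (classic Mac OS) as terminators.
// A final line without a terminator is kept; a trailing terminator
// does not produce an extra empty line.
std::vector<std::string> split_lines_any_eol(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\n' || c == '\r') {
            lines.push_back(current);
            current.clear();
            // Treat "\r\n" as one terminator, not two.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

Because the split happens on the raw bytes, the result is the same no matter which platform wrote the file.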
If you really just want an ASCII LF, the easiest way is to open the file in binary mode. In non-binary mode, \n is replaced by a platform-specific end-of-line sequence (e.g. it may be replaced by an LF/CR or a CR/LF sequence; on UNIXes it typically is just LF). In binary mode this is not done. Turning off the replacement is also the only effect of binary mode.
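To see that binary mode only turns off the replacement, write the same string in both modes and compare the file sizes, as in the sketch below. The file names and the helpers file_size_bytes and sizes_text_vs_binary are hypothetical. The binary-mode file always has exactly text.size() bytes. The text-mode file has the same size on POSIX systems and is one byte larger per \n on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical helper: size of a file in bytes, counted in binary mode so the
// result reflects exactly what is stored on disk.
std::size_t file_size_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return static_cast<std::size_t>(
        std::distance(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>()));
}

// Hypothetical helper: writes `text` once in text mode and once in binary
// mode, returning the sizes as {text_mode_size, binary_mode_size}.
std::pair<std::size_t, std::size_t> sizes_text_vs_binary(const std::string& text) {
    {
        std::ofstream out("mode_text.txt");                      // text mode
        out << text;
    }  // file closed here, so its contents are on disk before measuring
    {
        std::ofstream out("mode_binary.txt", std::ios::binary);  // binary mode
        out << text;
    }
    return {file_size_bytes("mode_text.txt"), file_size_bytes("mode_binary.txt")};
}
```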
BTW, using endl is equivalent to writing a \n followed by flushing the stream. The typically unintended flush can become a major performance problem. Thus, endl should be used rarely, and only when the flush is intended.
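The unintended flush can be made visible with the sketch below. It uses a hypothetical stream buffer, CountingBuf, that counts calls to sync(). That is what std::ostream::flush() ends up invoking, through rdbuf()->pubsync(). The helper write_lines is also hypothetical. Both variants produce the same characters, but only std::endl flushes, once per line.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical string buffer that counts how often the stream flushes it.
class CountingBuf : public std::stringbuf {
public:
    int syncs = 0;

protected:
    int sync() override {
        ++syncs;                         // one call per flush request
        return std::stringbuf::sync();
    }
};

// Hypothetical helper: writes `lines` lines using either std::endl or '\n'
// and returns {characters written, number of flushes}.
std::pair<std::string, int> write_lines(int lines, bool use_endl) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) {
            os << "line" << std::endl;   // writes '\n', then calls flush()
        } else {
            os << "line" << '\n';        // writes '\n' only
        }
    }
    return {buf.str(), buf.syncs};
}
```

For a file stream, each of those flushes can mean a separate write to the operating system, which is where the performance cost comes from.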