Why did DOS/Windows and Mac decide to use \r\n and \r for line endings instead of \n? Was it just a result of trying to be "different" from Unix?

And now that Mac OS X is Unix (-like), did Apple switch to \n from \r?

It's interesting to note that CRLF is pretty much the internet standard. That is, pretty much every standard internet protocol that is line-oriented uses CRLF: SMTP, POP, IMAP, NNTP, etc. The body of an email consists of lines terminated by CRLF.
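
To make that concrete, here is a minimal Python sketch of how a client might frame lines for a line-oriented protocol. The helper name `frame_lines` is made up for this example, it assumes ASCII-only input, and it ignores real protocol details such as SMTP's dot-stuffing.

```python
CRLF = b"\r\n"


def frame_lines(lines):
    """Join text lines into wire bytes, terminating each line with CRLF.

    Hypothetical helper for illustration only; assumes ASCII-only input.
    """
    return b"".join(line.encode("ascii") + CRLF for line in lines)


wire = frame_lines(["HELO example.org", "QUIT"])
print(wire)
```

Note that every line gets the terminator, including the last one: CRLF acts as a line terminator here, not as a separator between lines.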

According to Wikipedia: in the beginning, the program had to put in extra CR characters before the LF to slow the program down so the printer had time to keep up - and CP/M and then later Windows used this method. But Multics's printer driver put in extra characters automatically so the program didn't have to - and Unix developed from that. But none of that explains why the early Mac didn't do that (they do now that they are based on Unix).


The sequence CR+LF was commonly used on many early computer systems that had adopted Teletype machines—typically a Teletype Model 33 ASR—as a console device, because this sequence was required to position those printers at the start of a new line. The separation of newline into two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in time to print the next character. Any character printed after a CR would often print as a smudge in the middle of the page while the print head was still moving the carriage back to the first position. "The solution was to make the newline two characters: CR to move the carriage to column one, and LF to move the paper up."[1] In fact, it was often necessary to send extra characters—extraneous CRs or NULs—which are ignored but give the print head time to move to the left margin. Many early video displays also required multiple character times to scroll the display.

On such systems, applications had to talk directly to the Teletype machine and follow its conventions since the concept of device drivers hiding such hardware details from the application was not yet well developed. Therefore, text was routinely composed to satisfy the needs of Teletype machines. Most minicomputer systems from DEC used this convention. CP/M also used it in order to print on the same terminals that minicomputers used. From there MS-DOS (1981) adopted CP/M's CR+LF in order to be compatible, and this convention was inherited by Microsoft's later Windows operating system.

The Multics operating system began development in 1964 and used LF alone as its newline. Multics used a device driver to translate this character to whatever sequence a printer needed (including extra padding characters), and the single byte was more convenient for programming. What seems like a more obvious[citation needed] choice—CR—was not used, as CR provided the useful function of overprinting one line with another to create boldface and strikethrough effects. Perhaps more importantly, the use of LF alone as a line terminator had already been incorporated into drafts of the eventual ISO/IEC 646 standard. Unix followed the Multics practice, and later Unix-like systems followed Unix. This created conflicts between Windows and Unix-like OSes, whereby files composed on one OS cannot be properly formatted or interpreted by another OS (for example a UNIX shell script written in a Windows text editor like Notepad).
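
The driver-side translation described above can be sketched roughly as follows. This is only an illustration in Python, not how Multics or Unix actually implemented it: the function name `driver_output` and the `pad_count` parameter are invented for the example, and it assumes ASCII text stored with LF line ends.

```python
def driver_output(text, newline=b"\r\n", pad_count=0):
    """Translate stored LF line ends to a device-specific sequence.

    Hypothetical sketch: `newline` is whatever the device needs, and
    `pad_count` NUL bytes are appended after it to give a slow print
    head time to return to the left margin.
    """
    device_eol = newline + b"\x00" * pad_count
    return text.encode("ascii").replace(b"\n", device_eol)


stored = "first line\nsecond line\n"
print(driver_output(stored))               # for a device that needs CR LF
print(driver_output(stored, pad_count=2))  # same, plus two NUL padding bytes
```

The point of the design is that the stored text and the program writing it never change; only the arguments on the device side do.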

DOS inherited CR-LF line endings (what you're calling \r\n, just making the ASCII characters explicit) from CP/M. CP/M inherited it from the various DEC operating systems which influenced CP/M designer Gary Kildall.

CR-LF was used so that the teletype machines would return the print head to the left margin (CR = carriage return), and then move to the next line (LF = line feed).

The Unix guys handled that in the device driver, and when necessary translated LF to CR-LF on output to devices that needed it.

And as you guessed, Mac OS X now uses LF.

Really adding to @Mark Harrison...

The people who tell you that Unix is "just outputting the text the programmer specified" whereas DOS is broken are plain wrong. There are also claims that it's stupid for DOS to flag EOF when it sees an EOF character, raising the question of what exactly that EOF character is for.

There is no one true convention for text file line endings - only platform-specific conventions. After all, even CR-LF, CR and LF aren't the only line end conventions to ever be used, and ASCII was never even the one and only character set. The problem is the C standard library and runtime, which didn't abstract away this platform-dependent detail. Other third generation languages (such as Pascal and even Basic) managed it, at least to some degree. Because of this, when C compilers were written for other platforms, runtime library hacks were needed to achieve compatibility with existing source code and books.

In fact, it's Unix and Multics that originally needed string translation for console I/O, since users usually sat at an ASCII terminal that required CR LF line ends. This translation was done in a device driver, though - the goal was to abstract away the device-specifics, assuming that it was better to adopt one convention and stick to it for stored text files.

The C text I/O hack is similar in principle to what CygWin does now, hacking Linux runtimes to work as well as can be expected on Windows. There's a real history of hacking things about to turn them into Unix-alikes - but then there's also Wine, turning Linux into Windows. Oddly enough, you can read some misplaced line-end criticism of Windows in the CygWin FAQ (Internet Archive link added 2013 - the page no longer exists). Maybe it's just their sense of humour, since they are basically doing what they are criticising, but on a much grander scale ;-)

The C++ standard library (whatever platform it's implemented on) avoids this issue using iostreams, which abstract away line ends. For output, that suits me fine. For input, I need more control, so I either interpret character-by-character or else use a scanner generator.

[EDIT: It turns out that the struck-out claim above isn't true, and never was. The std::endl literally translates to a \n and a flush. The \n is exactly the same \n you get in C - it tends to get called "new line", but it's actually an ASCII line feed character, which then gets translated by the runtime if necessary. Funny how false assumptions can get so ingrained you never question them - basically, C++ had no choice but to do what C did (other than adding more layers on top) for compatibility reasons, and that should always have been obvious.]
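
The same "program writes \n, the runtime translates if necessary" behaviour can be observed in Python's text I/O layer, which I'm using here only as an analogy for a C runtime's text mode, not as a description of any particular C library. The helper name `write_text` is made up for the example.

```python
import io


def write_text(text, newline):
    """Write `text` through a text-mode layer and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()  # push buffered text down to the byte stream
    return raw.getvalue()


print(write_text("one\ntwo\n", newline="\n"))    # Unix-style: bytes unchanged
print(write_text("one\ntwo\n", newline="\r\n"))  # DOS-style: LF becomes CR LF
```

The program text is identical in both calls; only the text layer's configured line-end convention differs.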

The biggest slice of blame from my POV is with C, but C isn't the only project to fail to anticipate its move to other platforms. Blaming Bill Gates is just nuts - all he did was buy and polish a variant of the then popular CP/M. Really, it's just history - the same reason why we don't know what character codes 128 to 255 refer to in most text files. Given the ease of coping with all three line end conventions, it's odd that some developers still insist on that "my platform's convention is the one true way, and I shall force it on you like it or not" attitude.
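
To show how little it takes to cope with all three conventions, here is a minimal Python sketch that normalises CR LF, lone CR and LF to LF. The function name `normalize_newlines` is just a name chosen for this example.

```python
def normalize_newlines(data):
    """Convert CR LF and lone CR line ends to LF.

    CR LF is replaced first so that it is not counted as two line ends.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos\r\nold mac\runix\n"
print(normalize_newlines(mixed))  # b'dos\nold mac\nunix\n'
```

The order of the two replacements matters: handling lone CR first would turn every CR LF into two line ends.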

Also - will the Unicode line separator codepoint U+2028 replace all these conventions in future text files? ;-)

