Historical reasons behind different line endings on different platforms
Why did DOS/Windows and Mac decide to use \r\n and \r as line endings instead of \n? Was it just a result of trying to be "different" from Unix?
And now that Mac OS X is Unix (-like), did Apple switch to \n from \r?
Comments (4)
It's interesting to note the CRLF is pretty much the internet standard. That is, pretty much every standard internet protocol that is line oriented uses CRLF. SMTP, POP, IMAP, NNTP, etc.. The body of email consists of lines terminated by CRLF.
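To make that concrete, here is a minimal C++ sketch (not from the original answer; the host names and addresses are invented for illustration) that prints a few standard SMTP commands with their CRLF terminators made visible:

    #include <iostream>
    #include <string>

    int main() {
        // Line-oriented internet protocols such as SMTP terminate every
        // command with CRLF ("\r\n"), regardless of the sender's platform.
        // The verbs are standard SMTP; the host and mailbox names are made up.
        std::string commands =
            "HELO client.example.org\r\n"
            "MAIL FROM:<alice@example.org>\r\n"
            "RCPT TO:<bob@example.net>\r\n"
            "DATA\r\n";

        // Echo the bytes, showing the CR and LF terminators explicitly.
        for (char c : commands) {
            if (c == '\r')      std::cout << "\\r";
            else if (c == '\n') std::cout << "\\n\n";
            else                std::cout << c;
        }
    }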
According to Wikipedia: in the beginning, the program had to put in extra CR characters before the LF to slow the program down so the printer had time to keep up - and CP/M and then later Windows used this method. But Multics's printer driver put in the extra characters automatically, so the program didn't have to - and Unix developers benefited from that. But none of that explains why the early Mac didn't do that (they do now that they are based on Unix).
https://en.wikipedia.org/wiki/Newline#History:
DOS inherited CR-LF line endings (what you're calling \r\n, just making the ASCII characters explicit) from CP/M. CP/M inherited it from the various DEC operating systems which influenced CP/M's designer, Gary Kildall.
CR-LF was used so that the teletype machines would return the print head to the left margin (CR = carriage return), and then move to the next line (LF = line feed).
The Unix guys handled that in the device driver, and when necessary translated LF to CR-LF on output to devices that needed it.
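That driver-level translation is still visible on modern POSIX systems through the ONLCR output flag in termios. The following is a minimal, POSIX-only sketch of my own (not part of the original answer) that simply inspects the flag:

    #include <cstdio>
    #include <termios.h>
    #include <unistd.h>

    int main() {
        // Ask the terminal driver for the settings of standard output.
        termios t{};
        if (tcgetattr(STDOUT_FILENO, &t) != 0) {
            std::puts("stdout is not a terminal (e.g. redirected to a file)");
            return 0;
        }

        // OPOST enables output post-processing; ONLCR maps each LF written by
        // the program to CR-LF on the way out, so programs can keep writing \n.
        bool translates = (t.c_oflag & OPOST) && (t.c_oflag & ONLCR);
        std::printf("LF -> CR-LF translation in the driver: %s\n",
                    translates ? "enabled" : "disabled");
    }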
And as you guessed, Mac OS X now uses LF.
Really adding to @Mark Harrison...
The people who tell you that Unix is "just outputting the text the programmer specified" whereas DOS is broken are plain wrong. There are also claims that it's stupid for DOS to flag EOF when it sees an EOF character, raising the question of what exactly that EOF character is for.
There is no one true convention for text file line endings - only platform-specific conventions. After all, even CR-LF, CR and LF aren't the only line end conventions to ever be used, and ASCII was never even the one and only character set. The problem is the C standard library and runtime, which didn't abstract away this platform-dependent detail. Other third generation languages (such as Pascal and even Basic) managed it, at least to some degree. Because of this, when C compilers were written for other platforms, runtime library hacks were needed to achieve compatibility with existing source code and books.
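As an illustration of that runtime-library difference, here is a small C++ sketch using C stdio in text versus binary mode; it assumes a Windows C runtime for the CR-LF behaviour (on Unix the two modes write identical bytes), and the file names are invented:

    #include <cstdio>

    int main() {
        // In text mode ("w"), the C runtime may translate '\n' into the
        // platform's native line ending on output: the Windows CRT writes
        // CR-LF, while Unix runtimes leave it as a single LF.
        if (std::FILE* f = std::fopen("text_mode.txt", "w")) {
            std::fputs("one line\n", f);
            std::fclose(f);
        }

        // In binary mode ("wb") no translation happens on any platform:
        // the file contains exactly the bytes the program wrote.
        if (std::FILE* f = std::fopen("binary_mode.txt", "wb")) {
            std::fputs("one line\n", f);
            std::fclose(f);
        }

        // On Windows the first file ends in 0x0D 0x0A and the second in
        // 0x0A only; on Unix both files are byte-for-byte identical.
    }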
In fact, it's Unix and Multics that originally needed string translation for console I/O, since users usually sat at an ASCII terminal that required CR LF line ends. This translation was done in a device driver, though - the goal was to abstract away the device-specifics, assuming that it was better to adopt one convention and stick to it for stored text files.
The C text I/O hack is similar in principle to what CygWin does now, hacking Linux runtimes to work as well as can be expected on Windows. There's a real history of hacking things about to turn them into Unix-alikes - but then there's also Wine, turning Linux into Windows. Oddly enough, you can read some misplaced line-end criticism of Windows in the CygWin FAQ (Internet Archive link added 2013 - the page no longer exists). Maybe it's just their sense of humour, since they are basically doing what they are criticising, but on a much grander scale ;-)
The C++ standard library (whatever platform it's implemented on) avoids this issue using iostreams, which abstract away line ends. For output, that suits me fine. For input, I need more control, so I either interpret character-by-character or else use a scanner generator.
[EDIT It turns out that the struck-out claim above isn't true, and never was. The std::endl literally translates to a \n and a flush. The \n is exactly the same \n you get in C - it tends to get called "new line", but it's actually an ASCII line feed character, which then gets translated by the runtime if necessary. Funny how false assumptions can get so ingrained you never question them - basically, C++ had no choice but to do what C did (other than adding more layers on top) for compatibility reasons, and that should always have been obvious.]
The biggest slice of blame from my POV is with C, but C isn't the only project to fail to anticipate its move to other platforms. Blaming Bill Gates is just nuts - all he did was buy and polish a variant of the then-popular CP/M. Really, it's just history - the same reason why we don't know what character codes 128 to 255 refer to in most text files. Given the ease of coping with all three line end conventions, it's odd that some developers still insist on that "my platform's convention is the one true way, and I shall force it on you like it or not" attitude.
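To back up the "ease of coping with all three line end conventions" point, here is one possible helper; a minimal sketch of my own (the read_any_line function is invented, not code from the answer):

    #include <iostream>
    #include <istream>
    #include <sstream>
    #include <string>

    // Read one line from 'in', accepting LF, CR-LF, or bare CR as the
    // terminator. Returns false once the stream is exhausted.
    bool read_any_line(std::istream& in, std::string& line) {
        line.clear();
        int c = in.get();
        if (c == std::char_traits<char>::eof())
            return false;
        for (; c != std::char_traits<char>::eof(); c = in.get()) {
            if (c == '\n')                 // Unix: LF
                break;
            if (c == '\r') {               // old Mac: CR, or DOS/Windows: CR-LF
                if (in.peek() == '\n')
                    in.get();              // swallow the LF of a CR-LF pair
                break;
            }
            line.push_back(static_cast<char>(c));
        }
        return true;
    }

    int main() {
        // Mixed conventions in a single buffer, purely for demonstration.
        std::istringstream input("unix\nwindows\r\nold mac\rlast");
        std::string line;
        while (read_any_line(input, line))
            std::cout << '[' << line << "]\n";
    }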
Also - will the Unicode line separator codepoint U+2028 replace all these conventions in future text files? ;-)