What is the difference between \r and \n?
How are \r and \n different? I think it has something to do with Unix vs. Windows vs. Mac, but I'm not sure exactly how they're different, and which to search for/match in regexes.
They're different characters. \r is carriage return, and \n is line feed.

On "old" printers, \r sent the print head back to the start of the line, and \n advanced the paper by one line. Both were therefore necessary to start printing on the next line.

Obviously that's somewhat irrelevant now, although depending on the console you may still be able to use \r to move to the start of the line and overwrite the existing text.

More importantly, Unix tends to use \n as a line separator; Windows tends to use \r\n as a line separator; and Macs (up to OS 9) used to use \r as the line separator. (Mac OS X is Unix-y, so it uses \n instead; there may be some compatibility situations where \r is used, though.)

For more information, see the Wikipedia newline article.

EDIT: This is language-sensitive. In C# and Java, for example, \n always means Unicode U+000A, which is defined as line feed. In C and C++ the water is somewhat muddier, as the meaning is platform-specific. See comments for details.
In C and C++, \n is a concept, \r is a character, and \r\n is (almost always) a portability bug.

Think of an old teletype. The print head is positioned on some line and in some column. When you send a printable character to the teletype, it prints the character at the current position and moves the head to the next column. (This is conceptually the same as a typewriter, except that typewriters typically moved the paper with respect to the print head.)
When you wanted to finish the current line and start on the next line, you had to do two separate steps:

1. Move the print head back to the beginning of the line.
2. Move it down to the next line.
ASCII encodes these actions as two distinct control characters:

- \x0D (CR) moves the print head back to the beginning of the line. (Unicode encodes this as U+000D CARRIAGE RETURN.)
- \x0A (LF) moves the print head down to the next line. (Unicode encodes this as U+000A LINE FEED.)

In the days of teletypes and early technology printers, people actually took advantage of the fact that these were two separate operations. By sending a CR without following it by a LF, you could print over the line you already printed. This allowed effects like accents, bold type, and underlining. Some systems overprinted several times to prevent passwords from being visible in hardcopy. On early serial CRT terminals, CR was one of the ways to control the cursor position in order to update text already on the screen.
But most of the time, you actually just wanted to go to the next line. Rather than requiring the pair of control characters, some systems allowed just one or the other. For example:

- Unix variants (including modern macOS) use just LF.
- Older (pre-OS X) Macs used just CR.
- Windows (and DOS before it) uses both: CR LF.
- Old IBM systems that used EBCDIC standardized on NL, a character that doesn't exist in ASCII. In Unicode, NL is U+0085 NEXT LINE, but the actual EBCDIC value is 0x15.

Why did different systems choose different methods? Simply because there was no universal standard. Where your keyboard probably says "Enter", older keyboards used to say "Return", which was short for Carriage Return. In fact, on a serial terminal, pressing Return actually sends the CR character. If you were writing a text editor, it would be tempting to just use that character as it came in from the terminal. Perhaps that's why the older Macs used just CR.
Now that we have standards, there are more ways to represent line breaks. Although extremely rare in the wild, Unicode has new characters like:

- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
Even before Unicode came along, programmers wanted simple ways to represent some of the most useful control codes without worrying about the underlying character set. C has several escape sequences for representing control codes:

- \a (for alert), which rings the teletype bell or makes the terminal beep
- \f (for form feed), which moves to the beginning of the next page
- \t (for tab), which moves the print head to the next horizontal tab position

(This list is intentionally incomplete.)
This mapping happens at compile time: the compiler sees \a and puts in whatever magic value is used to ring the bell.

Notice that most of these mnemonics have direct correlations to ASCII control codes. For example, \a would map to 0x07 BEL. A compiler could be written for a system that used something other than ASCII for the host character set (e.g., EBCDIC). Most of the control codes that had specific mnemonics could be mapped to control codes in other character sets.

Huzzah! Portability!
Well, almost. In C, I could write

printf("\aHello, World!");

which rings the bell (or beeps) and outputs a message. But if I wanted to then print something on the next line, I'd still need to know what the host platform requires to move to the next line of output. CR LF? CR? LF? NL? Something else? So much for portability.

C has two modes for I/O: binary and text. In binary mode, whatever data is sent gets transmitted as-is. But in text mode, there's a run-time translation that converts a special character to whatever the host platform needs for a new line (and vice versa).
Great, so what's the special character?

Well, that's implementation dependent, too, but there's an implementation-independent way to specify it: \n. It's typically called the "newline character".

This is a subtle but important point: \n is mapped at compile time to an implementation-defined character value which (in text mode) is then mapped again at run time to the actual character (or sequence of characters) required by the underlying platform to move to the next line.

\n is different from all the other backslash literals because there are two mappings involved. This two-step mapping makes \n significantly different from even \r, which is simply a compile-time mapping to CR (or the most similar control code in whatever the underlying character set is).

This trips up many C and C++ programmers. If you were to poll 100 of them, at least 99 would tell you that \n means line feed. This is not entirely true. Most (perhaps all) C and C++ implementations use LF as the magic intermediate value for \n, but that's an implementation detail. It's feasible for a compiler to use a different value. In fact, if the host character set is not a superset of ASCII (e.g., if it's EBCDIC), then \n will almost certainly not be LF.

So, in C and C++:
- \r is literally a carriage return.
- \n is a magic value that gets translated (in text mode) at run time to/from the host platform's newline semantics.
- \r\n is almost always a portability bug. In text mode, this gets translated to CR followed by the platform's newline sequence, which is probably not what's intended. In binary mode, this gets translated to CR followed by some magic value that might not be LF, which is possibly not what's intended.
- \x0A is the most portable way to indicate an ASCII LF, but you only want to do that in binary mode. Most text-mode implementations will treat that like \n.
"\n" => Newline or Linefeed
(semantics)
Unix-based systems use just a "\n" to end a line of text.
\r is used to point to the start of a line and can replace the text from there, e.g.

Produces this output:

\n is for new line.
In short, \r has ASCII value 13 (CR) and \n has ASCII value 10 (LF).

Mac uses CR as the line delimiter (at least, it did before; I am not sure about modern Macs), *nix uses LF, and Windows uses both (CRLF).
In addition to @Jon Skeet's answer:

Traditionally, Windows has used \r\n, Unix \n, and Mac \r; however, newer Macs use \n because they're Unix-based.
\r is Carriage Return; \n is New Line (Line Feed) ... depends on the OS as to what each means. Read this article for more on the difference between '\n' and '\r\n' ... in C.
In C#, I found they use \r\n in a string.
\r is used for carriage return (ASCII value 13).

\n is used for new line (ASCII value 10).