What is the difference between \r and \n?

Posted on 2024-07-30 18:28:52

How are \r and \n different? I think it has something to do with Unix vs. Windows vs. Mac, but I'm not sure exactly how they're different, and which to search for/match in regexes.

Comments (9)

我不在是我 2024-08-06 18:28:52

They're different characters. \r is carriage return, and \n is line feed.

On "old" printers, \r sent the print head back to the start of the line, and \n advanced the paper by one line. Both were therefore necessary to start printing on the next line.

Obviously that's somewhat irrelevant now, although depending on the console you may still be able to use \r to move to the start of the line and overwrite the existing text.

More importantly, Unix tends to use \n as a line separator; Windows tends to use \r\n as a line separator and Macs (up to OS 9) used to use \r as the line separator. (Mac OS X is Unix-y, so uses \n instead; there may be some compatibility situations where \r is used instead though.)

For more information, see the Wikipedia newline article.

EDIT: This is language-sensitive. In C# and Java, for example, \n always means Unicode U+000A, which is defined as line feed. In C and C++ the water is somewhat muddier, as the meaning is platform-specific. See comments for details.

落花随流水 2024-08-06 18:28:52

In C and C++, \n is a concept, \r is a character, and \r\n is (almost always) a portability bug.

Think of an old teletype. The print head is positioned on some line and in some column. When you send a printable character to the teletype, it prints the character at the current position and moves the head to the next column. (This is conceptually the same as a typewriter, except that typewriters typically moved the paper with respect to the print head.)

When you wanted to finish the current line and start on the next line, you had to do two separate steps:

  1. move the print head back to the beginning of the line, then
  2. move it down to the next line.

ASCII encodes these actions as two distinct control characters:

  • \x0D (CR) moves the print head back to the beginning of the line. (Unicode encodes this as U+000D CARRIAGE RETURN.)
  • \x0A (LF) moves the print head down to the next line. (Unicode encodes this as U+000A LINE FEED.)

In the days of teletypes and early technology printers, people actually took advantage of the fact that these were two separate operations. By sending a CR without following it by a LF, you could print over the line you already printed. This allowed effects like accents, bold type, and underlining. Some systems overprinted several times to prevent passwords from being visible in hardcopy. On early serial CRT terminals, CR was one of the ways to control the cursor position in order to update text already on the screen.

But most of the time, you actually just wanted to go to the next line. Rather than requiring the pair of control characters, some systems allowed just one or the other. For example:

  • Unix variants (including modern versions of Mac) use just a LF character to indicate a newline.
  • Old (pre-OSX) Macintosh files used just a CR character to indicate a newline.
  • VMS, CP/M, DOS, Windows, and many network protocols still expect both: CR LF.
  • Old IBM systems that used EBCDIC standardized on NL--a character that doesn't even exist in the ASCII character set. In Unicode, NL is U+0085 NEXT LINE, but the actual EBCDIC value is 0x15.

Why did different systems choose different methods? Simply because there was no universal standard. Where your keyboard probably says "Enter", older keyboards used to say "Return", which was short for Carriage Return. In fact, on a serial terminal, pressing Return actually sends the CR character. If you were writing a text editor, it would be tempting to just use that character as it came in from the terminal. Perhaps that's why the older Macs used just CR.

Now that we have standards, there are more ways to represent line breaks. Although extremely rare in the wild, Unicode has new characters like:

  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

Even before Unicode came along, programmers wanted simple ways to represent some of the most useful control codes without worrying about the underlying character set. C has several escape sequences for representing control codes:

  • \a (for alert) which rings the teletype bell or makes the terminal beep
  • \f (for form feed) which moves to the beginning of the next page
  • \t (for tab) which moves the print head to the next horizontal tab position

(This list is intentionally incomplete.)

This mapping happens at compile-time--the compiler sees \a and puts whatever magic value is used to ring the bell.

Notice that most of these mnemonics have direct correlations to ASCII control codes. For example, \a would map to 0x07 BEL. A compiler could be written for a system that used something other than ASCII for the host character set (e.g., EBCDIC). Most of the control codes that had specific mnemonics could be mapped to control codes in other character sets.

Huzzah! Portability!

Well, almost. In C, I could write printf("\aHello, World!"); which rings the bell (or beeps) and outputs a message. But if I wanted to then print something on the next line, I'd still need to know what the host platform requires to move to the next line of output. CR LF? CR? LF? NL? Something else? So much for portability.

C has two modes for I/O: binary and text. In binary mode, whatever data is sent gets transmitted as-is. But in text mode, there's a run-time translation that converts a special character to whatever the host platform needs for a new line (and vice versa).

Great, so what's the special character?

Well, that's implementation dependent, too, but there's an implementation-independent way to specify it: \n. It's typically called the "newline character".

This is a subtle but important point: \n is mapped at compile time to an implementation-defined character value which (in text mode) is then mapped again at run time to the actual character (or sequence of characters) required by the underlying platform to move to the next line.

\n is different than all the other backslash literals because there are two mappings involved. This two-step mapping makes \n significantly different than even \r, which is simply a compile-time mapping to CR (or the most similar control code in whatever the underlying character set is).

This trips up many C and C++ programmers. If you were to poll 100 of them, at least 99 would tell you that \n means line feed. This is not entirely true. Most (perhaps all) C and C++ implementations use LF as the magic intermediate value for \n, but that's an implementation detail. It's feasible for a compiler to use a different value. In fact, if the host character set is not a superset of ASCII (e.g., if it's EBCDIC), then \n will almost certainly not be LF.

So, in C and C++:

  • \r is literally a carriage return.
  • \n is a magic value that gets translated (in text mode) at run-time to/from the host platform's newline semantics.
  • \r\n is almost always a portability bug. In text mode, this gets translated to CR followed by the platform's newline sequence--probably not what's intended. In binary mode, this gets translated to CR followed by some magic value that might not be LF--possibly not what's intended.
  • \x0A is the most portable way to indicate an ASCII LF, but you only want to do that in binary mode. Most text-mode implementations will treat that like \n.

念三年u 2024-08-06 18:28:52

  • "\r" => Return
  • "\n" => Newline or Linefeed (semantics)
  • Unix-based systems use just a "\n" to end a line of text.
  • DOS uses "\r\n" to end a line of text.
  • Some other machines used just a "\r" (Commodore, Apple II, Mac OS prior to OS X, etc.).
孤独患者 2024-08-06 18:28:52

\r is used to point to the start of a line and can replace the text from there, e.g.

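#include <stdio.h>

int main(void)
{
    printf("\nab");   /* new line, then "ab" */
    printf("\bsi");   /* backspace over 'b', then "si" gives "asi" */
    printf("\rha");   /* back to column 0, then "ha" overwrites "as" */
    return 0;
}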

On a terminal that interprets \b and \r, this displays:

hai

\n is for new line.

夜深人未静 2024-08-06 18:28:52

In short, \r has ASCII value 13 (CR) and \n has ASCII value 10 (LF).
Mac uses CR as the line delimiter (at least, it did before; I am not sure about modern Macs), *nix uses LF, and Windows uses both (CRLF).

叫思念不要吵 2024-08-06 18:28:52

In addition to @Jon Skeet's answer:

Traditionally, Windows has used \r\n, Unix \n, and Mac \r; however, newer Macs use \n because they're Unix-based.

浮云落日 2024-08-06 18:28:52

\r is Carriage Return; \n is New Line (Line Feed) ... depends on the OS as to what each means. Read this article for more on the difference between '\n' and '\r\n' ... in C.

夏末的微笑 2024-08-06 18:28:52

In C#, I found they use \r\n in a string.

纵性 2024-08-06 18:28:52

\r is used for carriage return. (ASCII value is 13)
\n is used for new line. (ASCII value is 10)
