Unexpected result from RTF line-ending conversion

Posted 2024-11-19 11:12:55 | 896 characters | 2 views | 0 comments

If txtLog is a RichTextBox control:

Dim text = "hi" & vbCrLf
Debug.WriteLine("t:" & text.Length)        ' --> 4, as expected

txtLog.Text = text
Debug.WriteLine("tL:" & txtLog.TextLength) ' --> 3. muh?! :(

Having looked at the RTF spec, the end of a paragraph is notated as \par, which is neither CR nor LF. This makes sense since RTF is a markup language; as in HTML, line endings have little meaning on their own.

So presumably, on writing into the RichTextBox, my line ending is being encoded into \par. And then, on extraction, the \par is being translated back to a real line ending for use.

It turns out that this line ending is vbLf.

Why, since Microsoft near-consistently employ CRLF for line endings, would RichTextBox translate \par to vbLf instead of vbCrLf?


Comments (2)

美人迟暮 2024-11-26 11:12:55

The immediate reason RichTextBox is implemented this way is that the RTF specification says a carriage return by itself, or a linefeed by itself, is equivalent to \par.

. . . A carriage return (character value 13) or linefeed (character value 10) will be treated as a \par control . . .

As to why Microsoft would write the specification like this, I don't know for sure. However, I would speculate that it has to do with the fact that the first version of RTF was developed for the Mac version of Microsoft Office in the 1980s. I would guess that they developed this \par rule so that it worked well on a Mac, or worked well as a cross-platform format in general. If this is the case, then Microsoft would probably have been very hesitant to revise the spec in later years ('90s, '00s, etc.) to match standard Windows line endings, since Microsoft generally has a history of preserving backwards compatibility as much as possible for things like this.

天冷不及心凉 2024-11-26 11:12:55

Your interpretation of the spec is incorrect.

The RTF spec clearly says:

A carriage return (character value 13) or linefeed (character value
10) will be treated as a \par control if the character is preceded by
a backslash. You must include the backslash; otherwise, RTF ignores
the control word. (You may also want to insert a
carriage-return/linefeed pair without backslashes at least every 255
characters for better text transmission over communication lines.)

This makes RTF an almost format-free language. RTF content is independent of line breaks; newline characters are not part of the raw text:

Hi
\par
guys
\par<eof>

is the same as

Hi\par guys\par<eof>

i.e. your reader must treat every CR and LF that has no leading backslash as whitespace.

Hi
\
guys
\
<eof>

would, if a newline is CR+LF, have each backslash-prefixed CR handled like a \par token and each LF handled as whitespace (since the LF has no backslash prefix).

So the spec is correct, and precise.

Got it? ;)

(<eof> denotes an end-of-file character here, or the end of the file, and a newline is CR, CR LF, or LF, whatever your text editor spits out :))
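To make the rule concrete, here is a minimal sketch of a reader that follows the quoted passage. It is not how RichTextBox or any real RTF parser is implemented: RtfFragmentToText is a hypothetical helper that understands only \par, backslash+CR and backslash+LF, and it skips bare CR/LF entirely. Emitting vbLf for each paragraph break is an assumption chosen to mirror what the question observes. The one-line form is written with a space after \par so that "guys" is read as text rather than as part of a control word.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RtfLineBreakSketch

    ' Hypothetical helper: converts a *simplified* RTF body fragment to plain text.
    ' Groups, escapes and every control word other than \par are out of scope.
    Function RtfFragmentToText(rtf As String) As String
        Dim result As New StringBuilder()
        Dim i As Integer = 0
        While i < rtf.Length
            Dim c As Char = rtf(i)
            If c = "\"c AndAlso i + 1 < rtf.Length Then
                Dim nxt As Char = rtf(i + 1)
                If nxt = ControlChars.Cr OrElse nxt = ControlChars.Lf Then
                    ' Backslash followed by CR or LF is treated like \par.
                    result.Append(ControlChars.Lf)
                    i += 2
                ElseIf String.CompareOrdinal(rtf, i, "\par", 0, 4) = 0 AndAlso
                       (i + 4 >= rtf.Length OrElse Not Char.IsLetter(rtf(i + 4))) Then
                    ' The \par control word itself.
                    result.Append(ControlChars.Lf)
                    i += 4
                    ' One space after a control word is a delimiter, not text.
                    If i < rtf.Length AndAlso rtf(i) = " "c Then i += 1
                Else
                    ' Anything else is outside this sketch: copy the backslash through.
                    result.Append(c)
                    i += 1
                End If
            ElseIf c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                ' Bare CR or LF (no backslash): not part of the text, skip it.
                i += 1
            Else
                result.Append(c)
                i += 1
            End If
        End While
        Return result.ToString()
    End Function

    Sub Main()
        Dim multiLine As String = "Hi" & vbCrLf & "\par" & vbCrLf & "guys" & vbCrLf & "\par"
        Dim oneLine As String = "Hi\par guys\par"
        Dim escaped As String = "Hi" & vbCrLf & "\" & vbCrLf & "guys" & vbCrLf & "\" & vbCrLf

        ' All three fragments yield the same text under this sketch.
        Console.WriteLine(RtfFragmentToText(multiLine) = RtfFragmentToText(oneLine))  ' True
        Console.WriteLine(RtfFragmentToText(escaped) = RtfFragmentToText(oneLine))    ' True
    End Sub
End Module
```

In the third fragment only the CR directly after each backslash becomes a paragraph break; the LF that follows it is bare and is skipped, which is why a CR+LF file still produces one break per line.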

Why, since Microsoft near-consistently employ CRLF for line endings,
would RichTextBox translate \par to vbLf instead of vbCrLf?

Newlines are CRLF only on Windows. On other platforms, and in some apps, a newline is LF only. No current mainstream platform uses CR alone as the newline character (classic Mac OS did). There are platforms, though, that treat CR and LF equally, so CRLF counts as TWO newlines there. On others, a CR is ignored if it is immediately followed by LF (this usually includes Windows apps).

The behavior you see is the only way to make sure the text result produces the same number of newlines on practically all platforms.

(Of course, this is also application-specific... I'd call that newline mess one of the lesser-known compatibility nightmares.)
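If the rest of an application expects CRLF, one option is to normalize the text after reading it from the control. The sketch below is only an illustration: ToCrLf is a hypothetical helper, and the sample string stands in for what the question reports txtLog.Text returning ("hi" followed by a single vbLf).

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LineEndingSketch

    ' Hypothetical helper: rewrites every line break (CRLF, lone CR, lone LF) as CRLF.
    Function ToCrLf(text As String) As String
        ' First collapse everything to LF so existing CRLF pairs are not doubled,
        ' then expand each LF to CRLF.
        Return text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf).Replace(vbLf, vbCrLf)
    End Function

    Sub Main()
        ' Stand-in for txtLog.Text as described in the question.
        Dim fromControl As String = "hi" & vbLf
        Console.WriteLine(ToCrLf(fromControl).Length)  ' 4
    End Sub
End Module
```

Collapsing to LF first keeps the function idempotent, so text that already uses CRLF passes through unchanged.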
