Unexpected result from RTF line-ending conversion
If txtLog is a RichTextBox control:
Dim text = "hi" & vbCrLf
Debug.WriteLine("t:" & text.Length) ' --> 4, as expected
txtLog.Text = text
Debug.WriteLine("tL:" & txtLog.TextLength) ' --> 3. muh?! :(
Having looked at the RTF spec, I see that the end of a paragraph is denoted by \par, which is neither CR nor LF. This makes sense, since RTF is a markup language; as in HTML, line endings have little meaning on their own.
So presumably, when I write into the RichTextBox, my line ending is encoded as \par. Then, on extraction, the \par is translated back into a real line ending for use.
It turns out that this line ending is vbLf.
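To make the write half of that round trip concrete, here is a minimal sketch of a naive plain-text-to-RTF writer, in which every line break (CR, LF, or CR+LF) becomes a single \par. The module and function names are hypothetical, the input is assumed to be plain ASCII, and this is not what RichTextBox actually emits (its Rtf output typically also carries a font table, \pard and other control words).

```vbnet
Imports System.Text
Imports Microsoft.VisualBasic

Module RtfWriterSketch
    ' Hypothetical helper: wraps plain ASCII text in a minimal RTF document.
    ' The RTF special characters \ { } are escaped with a backslash, and each
    ' line break (CR, LF, or a CR+LF pair) is written as a single \par.
    Public Function PlainToRtf(text As String) As String
        Dim sb As New StringBuilder("{\rtf1\ansi ")
        Dim i As Integer = 0
        While i < text.Length
            Dim c As Char = text(i)
            Select Case c
                Case "\"c, "{"c, "}"c
                    sb.Append("\"c).Append(c)
                Case ControlChars.Cr
                    sb.Append("\par ")
                    ' Swallow the LF of a CR+LF pair so it is not written as a second \par.
                    If i + 1 < text.Length AndAlso text(i + 1) = ControlChars.Lf Then i += 1
                Case ControlChars.Lf
                    sb.Append("\par ")
                Case Else
                    sb.Append(c)
            End Select
            i += 1
        End While
        sb.Append("}"c)
        Return sb.ToString()
    End Function
End Module
```

For example, PlainToRtf("hi" & vbCrLf) returns {\rtf1\ansi hi\par }. Once the CR+LF has collapsed into one \par, the original line-ending style is gone, so whatever reads the text back out has to pick some convention of its own.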
Given that Microsoft near-consistently uses CRLF for line endings, why would RichTextBox translate \par to vbLf instead of vbCrLf?
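If the surrounding code needs Windows line endings, one workaround (assuming you only need CRLF in your own copy of the text, not inside the control) is to normalize the string after reading it. ToCrLf below is a hypothetical helper, not part of the RichTextBox API.

```vbnet
Imports Microsoft.VisualBasic

Module LineEndingHelpers
    ' Hypothetical helper: converts any mix of CR+LF, lone LF and lone CR to CR+LF.
    ' Collapsing everything to LF first avoids turning an existing CR+LF into CR+CR+LF.
    Public Function ToCrLf(s As String) As String
        Dim lfOnly As String = s.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        Return lfOnly.Replace(vbLf, vbCrLf)
    End Function
End Module
```

Usage would be along the lines of Dim logText As String = ToCrLf(txtLog.Text). This only changes your copy of the string; the control's own TextLength is unaffected.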
Comments (2)
The immediate reason RichTextBox is implemented this way is that the RTF specification states that a carriage return by itself, or a linefeed by itself, is equivalent to \par.
As to why Microsoft wrote the specification like this, I don't know for sure. However, I would speculate that it has to do with the fact that the first version of RTF was developed for Microsoft Word on the Mac in the 1980s. I would guess that they designed this \par rule so that it worked well on a Mac, or as a cross-platform format in general. If that is the case, Microsoft would probably have been very hesitant to revise the spec in later years ('90s, '00s, etc.) to match standard Windows line endings, since Microsoft has a history of preserving backward compatibility as much as possible for things like this.
Your interpretation of the spec is incorrect.
The RTF spec clearly says:
This makes RTF an almost format-free language, i.e. RTF content is independent of line breaks (newline characters are not part of the raw text):
is the same as
In other words, your reader must treat every CR and LF that is not preceded by a backslash as whitespace.
would, if a newline is CR+LF, have each backslash-prefixed CR handled like a \par token, and every LF handled as whitespace (since no backslash precedes the LF). So the spec is correct, and precise.
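The rule this answer describes can be sketched as a tiny reader: a backslash followed by CR or LF acts like \par, while a CR or LF with no backslash in front carries no meaning. The sketch is hypothetical and handles only a small subset of RTF (text, braces, \par, escaped \ { }, and backslash + newline); it skips every other control word and ignores fonts, Unicode escapes and destinations. It drops bare CR/LF outright rather than turning them into spaces, and its use of vbLf for paragraph breaks is an arbitrary choice.

```vbnet
Imports System.Text
Imports Microsoft.VisualBasic

Module RtfReaderSketch
    ' Hypothetical reader for a tiny subset of RTF: text, braces, \par,
    ' escaped \ { }, and a backslash followed by CR or LF.
    ' Every other control word or control symbol is skipped.
    Public Function RtfToPlain(rtf As String) As String
        Dim sb As New StringBuilder()
        Dim i As Integer = 0
        While i < rtf.Length
            Dim c As Char = rtf(i)
            If c = "\"c Then
                If i + 1 >= rtf.Length Then Exit While   ' dangling backslash: stop
                Dim nxt As Char = rtf(i + 1)
                If nxt = ControlChars.Cr OrElse nxt = ControlChars.Lf Then
                    sb.Append(vbLf)                        ' "\" + CR or "\" + LF acts like \par
                    i += 2
                ElseIf nxt = "\"c OrElse nxt = "{"c OrElse nxt = "}"c Then
                    sb.Append(nxt)                         ' escaped literal character
                    i += 2
                ElseIf Char.IsLetter(nxt) Then
                    ' Control word: letters, optional signed number, optional delimiting space.
                    Dim j As Integer = i + 1
                    While j < rtf.Length AndAlso Char.IsLetter(rtf(j))
                        j += 1
                    End While
                    Dim word As String = rtf.Substring(i + 1, j - i - 1)
                    If j + 1 < rtf.Length AndAlso rtf(j) = "-"c AndAlso Char.IsDigit(rtf(j + 1)) Then j += 1
                    While j < rtf.Length AndAlso Char.IsDigit(rtf(j))
                        j += 1
                    End While
                    If j < rtf.Length AndAlso rtf(j) = " "c Then j += 1
                    If word = "par" Then sb.Append(vbLf)
                    i = j
                Else
                    i += 2                                 ' other control symbols are skipped
                End If
            ElseIf c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                i += 1                                     ' bare CR/LF: no meaning, dropped
            ElseIf c = "{"c OrElse c = "}"c Then
                i += 1                                     ' group delimiters produce no text
            Else
                sb.Append(c)
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function
End Module
```

With the CR+LF case from the answer, RtfToPlain("{\rtf1 line1\" & vbCrLf & "line2}") yields "line1" & vbLf & "line2": the backslash-prefixed CR becomes the break and the LF after it is dropped. Without the backslash, RtfToPlain("{\rtf1 a" & vbCrLf & "b}") yields just "ab".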
Got it? ;)
(<eof> denotes an end-of-file character here, or simply the end of the file, and a newline is CR, CR LF, or LF, whatever your text editor spits out. :))
Only on Windows are newlines CRLF. On other platforms, and in some apps, they are LF only. No current mainstream platform uses CR alone as the newline character (classic Mac OS did). Some platforms, though, treat CR and LF equally, so CRLF counts as TWO newlines there. On others, a CR is ignored if it is immediately followed by LF (this usually includes Windows apps).
The behavior you see is the only way to make sure the text result produces the same number of newlines on practically all platforms.
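To see why a lone LF is the safe choice, the hypothetical function below counts newlines under the three reader conventions the answer mentions. It is a sketch of those conventions as described here, not of any particular platform's API.

```vbnet
Imports Microsoft.VisualBasic

Module NewlineCounting
    ' Hypothetical reader conventions, as the answer describes them.
    Public Enum NewlineRule
        LfOnly        ' only LF ends a line; CR is an ordinary character
        CrAndLfEqual  ' CR and LF each end a line, so CR+LF counts as two
        CrLfAware     ' a CR immediately followed by LF is ignored
    End Enum

    Public Function CountNewlines(s As String, rule As NewlineRule) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To s.Length - 1
            If s(i) = ControlChars.Lf Then
                count += 1
            ElseIf s(i) = ControlChars.Cr Then
                Dim followedByLf As Boolean = i + 1 < s.Length AndAlso s(i + 1) = ControlChars.Lf
                If rule = NewlineRule.CrAndLfEqual Then
                    count += 1
                ElseIf rule = NewlineRule.CrLfAware AndAlso Not followedByLf Then
                    count += 1                  ' a lone CR still ends a line here
                End If
            End If
        Next
        Return count
    End Function
End Module
```

For "hi" & vbCrLf it returns 1, 2 and 1 under LfOnly, CrAndLfEqual and CrLfAware respectively, while "hi" & vbLf gives 1 under all three, which is the consistency the answer is pointing at.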
(Of course, this is also application-specific... I'd call that newline mess one of the lesser-known compatibility nightmares.)