Converting rich text that contains backslashes to plain text or HTML
I am trying to convert a rich text string to plain text or HTML. I am currently using RichTextBox.Text, which works correctly in almost all cases. The exception is text that contains backslashes: some of the text is stripped out because the converter treats it as part of the RTF formatting. Does anyone have an idea how to keep the backslashes in that case?
Here is an example of a string I would have:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\fs17 Testing Export \with comments\par}
The text I need is "Testing Export \with comments", but the text I get back from the RTF converter is "Testing Export comments". Any help would be greatly appreciated. Please respond if you have further questions.
Comments (1)
I think the converter is right. A real backslash in RTF text should be escaped (e.g. as \\). What you have been given is, I believe, not valid RTF at all. Whilst you could try fixing it up with a regex replace over the input that doubles any backslash that is not part of a valid control word, this seems very fragile and will go wrong if someone adds a sequence to the text that happens to be a valid control word. The only safe approach is to fix whatever is producing the RTF so that it escapes backslashes properly.
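
As a rough illustration of what "escaping properly" could look like on the producing side, here is a minimal Python sketch. The function names escape_rtf_text and wrap_as_rtf are hypothetical, the header is a trimmed version of the one in the question, and the sketch assumes ASCII-only text (non-ASCII characters would need their own RTF escapes, which are not handled here).

```python
def escape_rtf_text(text: str) -> str:
    """Escape the characters RTF treats specially in plain text (hypothetical helper)."""
    # Backslashes are doubled first, so the backslashes added in front of
    # braces in the next two steps are not doubled again.
    return (
        text.replace("\\", "\\\\")
        .replace("{", "\\{")
        .replace("}", "\\}")
    )


def wrap_as_rtf(text: str) -> str:
    """Wrap escaped text in a minimal RTF document (ASCII text assumed)."""
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}\pard\fs17 "
    return header + escape_rtf_text(text) + r"\par}"


if __name__ == "__main__":
    # The literal backslash before "with" is emitted as a doubled backslash.
    print(wrap_as_rtf(r"Testing Export \with comments"))
```

If the producer emitted RTF like this, assigning the string to RichTextBox.Rtf and reading RichTextBox.Text should give back the text with its backslash intact, because the doubled backslash is read as a literal character rather than as the start of a control word.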