Converting a restricted subset of RTF to plain text with HTML formatting tags
I'm looking to take the output of a WPF RichTextBox which is locked down to only allow certain formatting commands (Bold, Underlined and Italic), and parse it to be plaintext with HTML tags denoting the formatting. This is so that the formatting information can be picked up and parsed by an Oracle Publishing interface.
All other information, such as font sizes and colors, is not important, as it will be handled by the Publishing template further down the line.
Ideally, then, we would end up with something like the following, with all other RTF tags stripped out:
This is <b>some bold text, with <i>this bit</i> italic as well</b>
Is there a relatively easy way to do this? I've seen some Regex strings, but they always seem to let unwanted RTF material through. I don't really want to use a commercial solution, as it's quite a small problem.
Any ideas?
You should parse the RTF and replace the relevant control codes with HTML tags. Considering the complexity of RTF, I don't think Regex alone will be enough.
Rich Text Format (RTF) Specification, version 1.6. The syntax is relatively easy; I think you just need to process control codes like \b for bold, etc.
NRTFTree - A class library for RTF processing in C#. Its SAX parser is probably what you need.
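To illustrate the "parse, then map control codes" idea, here is a minimal sketch of a hand-rolled tokenizer. It is written in Python for brevity and is not NRTFTree's API. The names `rtf_to_html`, `TOKEN`, `TAGS` and `SKIP_DESTINATIONS` are made up for this example, and the list of skipped destinations is an assumption about what a RichTextBox typically emits. It tracks bold, italic and underline state per RTF group, drops everything else, and does not handle `\uN` Unicode escapes.

```python
import html
import re

# Assumed set of destinations whose groups hold metadata, not document text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}
# RTF control word -> HTML tag for the three permitted formats.
TAGS = {"b": "b", "i": "i", "ul": "u"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped character, e.g. \'e9
    r"|\\(.)"                   # control symbol, e.g. \{ or \*
    r"|([{}])"                  # group delimiters
    r"|([^\\{}]+)",             # plain text
    re.DOTALL,
)


def rtf_to_html(rtf: str) -> str:
    out = []
    open_tags = []             # HTML tags currently open, outermost first
    active, skip = [], False   # formats switched on; inside a skipped group?
    stack = []                 # saved (active, skip) for each enclosing group

    def sync(wanted):
        # Keep the common prefix, close the rest innermost-first, open what is missing.
        keep = 0
        while (keep < len(open_tags) and keep < len(wanted)
               and open_tags[keep] == wanted[keep]):
            keep += 1
        while len(open_tags) > keep:
            out.append(f"</{open_tags.pop()}>")
        for tag in wanted[keep:]:
            out.append(f"<{tag}>")
            open_tags.append(tag)

    for m in TOKEN.finditer(rtf):
        word, param, hexchar, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append((list(active), skip))
        elif brace == "}":
            if stack:
                active, skip = stack.pop()  # formatting is scoped to its group
        elif skip:
            continue
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif word in TAGS:
                tag = TAGS[word]
                if param == "0":            # \b0, \i0, \ul0 switch the format off
                    if tag in active:
                        active.remove(tag)
                elif tag not in active:
                    active.append(tag)
            elif word == "ulnone":
                if "u" in active:
                    active.remove("u")
            elif word == "plain":
                active = []
            elif word in ("par", "line"):
                sync([])                    # close tags before the line break
                out.append("\n")
            # every other control word (fonts, sizes, colors) is ignored
        elif symbol == "*":
            skip = True                     # {\*\...} marks an ignorable destination
        else:
            if text is not None:
                chunk = text.replace("\r", "").replace("\n", "")
            elif hexchar:
                chunk = bytes.fromhex(hexchar).decode("cp1252", errors="replace")
            elif symbol in ("\\", "{", "}"):
                chunk = symbol              # escaped literal character
            else:
                chunk = ""
            if chunk:
                sync(active)
                out.append(html.escape(chunk, quote=False))

    sync([])
    return "".join(out).strip()


sample = (r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}{\colortbl;\red0\green0\blue0;}"
          r"\pard\f0\fs20 This is {\b some bold text, with {\i this bit}"
          r" italic as well}\par}")
print(rtf_to_html(sample))
```

Because the formatting state is saved and restored at each `{` and `}`, the tags come out properly nested even when a format is switched off mid-run with `\b0`. A regex that merely strips control words would not track that scoping, which is likely why such patterns let unwanted material through.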