Pasting from Outlook/Word/Office into an embedded browser
We have a great application that is going well, but some of our users like to write their text in Word first and then paste it into our application. When they do that, the HTML comes through more or less properly, but it usually contains tags from Outlook or Word that our XHTML engine doesn't like or understand.
For example, a user types a note into Word with some minor formatting and pastes it into our HTML editor (it's just a basic WebBrowser control with designMode turned on). The resulting source includes <o:p> tags, among others.
Am I going to have to write a stripper for every type of MSO HTML tag?
Comments (3)
I have had good luck pasting Word content into LibreOffice, then re-selecting the text and copying it out of LibreOffice into a web form.
It keeps the formatting and links, and removes all the Microsoft formatting code.
As a user that sometimes copies data from Word to a web form (I sometimes like to spellcheck first), I've found great success by first pasting into Notepad, then copying from there and pasting into the web form.
However, Word still sometimes has the last laugh. If you have "smart quotes" enabled, it turns
into
(Note the quotes around the word "best").
The easy way to fix this is to turn off Smart Quotes before I begin to type; I can also use Notepad to find all of the "smart quote" symbols (“ ” ‘ ’) and replace them with "normal quote" symbols (" " ' ').
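If the cleanup can happen in code rather than by hand in Notepad, the same find-and-replace can be automated. Below is a minimal Python sketch of that idea; the function name `normalize_quotes` is made up for illustration, and it only handles the four quote characters listed above, not other characters Word may substitute.

```python
# Map the four typographic ("smart") quote characters to plain ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with smart quotes replaced by plain quotes (hypothetical helper)."""
    return text.translate(SMART_QUOTES)


if __name__ == "__main__":
    print(normalize_quotes("the \u201cbest\u201d answer, isn\u2019t it"))
```

Using a translation table keeps this to a single pass over the text, and further characters can be added to the table if they turn out to cause trouble.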
The consensus seems to be that while some of the available tools are somewhat successful at automatically stripping MS Word tags, none is 100% perfect. How you parse those tags depends on the framework you are using.
A regular expression would probably be a clean fix.
Some more information about this topic can be found in this blog post, which basically documents the same struggle you seem to be having.
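To give an idea of what the regular-expression route might look like, here is a rough Python sketch. It is not taken from the blog post, and the name `strip_office_markup` and the three patterns are my own assumptions about what typically shows up in pasted Word/Outlook HTML: namespaced tags such as <o:p>, `[if ... mso]` conditional comments, and `class="Mso..."` attributes. It will not catch everything, and a real HTML parser or sanitizer is likely more robust for untrusted input.

```python
import re

# Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
_CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
# Opening, closing, or self-closing tags with a namespace prefix, e.g. <o:p>, </o:p>
_NAMESPACED_TAG = re.compile(r"</?[A-Za-z][\w-]*:[\w-]+(?:\s[^>]*)?/?>")
# class attributes whose value starts with "Mso", e.g. class="MsoNormal"
_MSO_CLASS = re.compile(
    r"""\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso[\w-]*)""", re.IGNORECASE
)


def strip_office_markup(html: str) -> str:
    """Remove common Office-specific markup from pasted HTML (hypothetical helper).

    Namespaced tags are removed but any text between them is kept.
    """
    html = _CONDITIONAL_COMMENT.sub("", html)
    html = _NAMESPACED_TAG.sub("", html)
    html = _MSO_CLASS.sub("", html)
    return html


if __name__ == "__main__":
    pasted = '<p class="MsoNormal">Hello<o:p></o:p></p>'
    print(strip_office_markup(pasted))
```

One generic pattern for any `prefix:name` tag avoids writing a separate stripper per MSO tag, at the cost of also removing any legitimate namespaced tags the editor might want to keep.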