Parsing text in a Word document: how can I make it ignore bookmarks?
I have a VSTO add-in that is able to match against specific codes in the body of a document. The codes themselves are just strings that I syntactically match for validation.
My parsing using StoryRange works fine, but of course I hit the rare exception where a user is doing something funky in their document. I've noticed that some users introduce bookmarks into the middle of the code string, and this throws off my validation match. Instead of the code reading '34-RD-345', when you reveal hidden formatting in Office 2007 you see something like '34-RID-345'. The bookmark marker looks similar to an uppercase i (I), and I can confirm that a bookmark is present using the bookmark option in the ribbon.
Does anyone know how I might be able to ignore the bookmark when I'm scanning the text?
Maybe an even better alternative would be to confine my parsing to characters in [a-zA-Z] and [0-9]. Is something like that possible?
Comments (1)
You can get all bookmarks, then delete them all, then parse the document again.
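A minimal sketch of that suggestion in C#, assuming the add-in references the Word primary interop assembly (`Microsoft.Office.Interop.Word`). The class and method names (`BookmarkCleaner`, `RemoveAllBookmarks`) are hypothetical and only for illustration; `Document.Bookmarks` and `Bookmark.Delete` are the interop members it relies on.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class BookmarkCleaner
{
    // Hypothetical helper: deletes the bookmarks that Word enumerates for the document.
    // Deleting a bookmark removes only the marker; the text it covers stays in place.
    public static void RemoveAllBookmarks(Word.Document doc)
    {
        // Take a snapshot first so the collection is not modified while it is enumerated.
        var snapshot = new List<Word.Bookmark>();
        foreach (Word.Bookmark bookmark in doc.Bookmarks)
        {
            snapshot.Add(bookmark);
        }

        foreach (Word.Bookmark bookmark in snapshot)
        {
            bookmark.Delete();
        }
    }
}
```

In a VSTO add-in this could be called as `BookmarkCleaner.RemoveAllBookmarks(Globals.ThisAddIn.Application.ActiveDocument);` before the StoryRange scan. The deletion changes the user's document, so anything that depends on those bookmarks (cross-references, for example) would stop resolving; running the scan on a copy of the document may be the safer choice.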