How to access the text of TextBox objects in a Word document with Ruby WIN32OLE
I just put together a small script for a team of users that collects all PDF and DOC* files in a directory and parses them for hyperlinks. The PDF part works as intended; however, the Word doc I was given for design (plain text) differs from the actual Word documents they are using (where the text is in a TextBox element).
I noticed that when I tried to gather sentences/words from these new files, all I received was the text for the background image of the file (normally a special character).
I have browsed through the API and tried quite a few methods listed in ole_methods, but have not yet found a way to access the TextBox to pull the required text out of it.
I know that I can convert the Word files to PDF and shortcut it that way (tested and proven), but that entails quite a bit of file management that I'd like to avoid in favor of the simpler solution: accessing the text directly.
You can replicate the element in a document using the Draw Text Box function (Word 2007+).
Does anyone know how to access this element, or better yet find ALL text in the document regardless of what element it is located in?
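require 'win32ole'

file = File.expand_path(ARGV[0]).tr('/', '\\')  # pass Word an absolute Windows path
word = WIN32OLE.new('Word.Application')
doc = word.Documents.Open(file)
doc.Sentences.each { |x| puts x.text }  # main text only; text-box content does not show up
doc.Close(false)  # close without saving
word.Quit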
- Adam
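For the broader request (finding all text regardless of which element holds it), Word's object model exposes every story (main text, text boxes, headers, footers, footnotes, comments) through Document.StoryRanges, and each story links to further ranges of the same type through NextStoryRange. The sketch below is not from the original thread: it assumes NextStoryRange comes back as nil through WIN32OLE at the end of a chain, and the helper names all_story_texts and textbox_story_texts are hypothetical. Story type 5 is wdTextFrameStory, the story that holds text-box content.

```ruby
# Sketch: read every story in a Word document, following NextStoryRange
# chains (several text boxes share the text-frame story type).
# Helper names are hypothetical, not part of WIN32OLE or Word.

# Value of Word's WdStoryType enumeration for text boxes / text frames.
WD_TEXT_FRAME_STORY = 5

# Returns [story_type, text] pairs for every story range in `doc`.
def all_story_texts(doc)
  pairs = []
  doc.StoryRanges.each do |story|
    range = story
    # Assumption: Word returns Nothing (nil via WIN32OLE) after the last range of a chain.
    while range
      pairs << [range.StoryType, range.Text]
      range = range.NextStoryRange
    end
  end
  pairs
end

# Keeps only the text that lives in text boxes.
def textbox_story_texts(doc)
  all_story_texts(doc).select { |type, _| type == WD_TEXT_FRAME_STORY }.map(&:last)
end
```

With doc opened as in the question's code, all_story_texts(doc).each { |_, text| puts text } would print the main text and the text-box content together. Text boxes anchored in headers or footers may still need separate handling.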
Assuming that something equivalent to
doc.Sentences.each { |x| puts x.text }
but for text boxes will suffice, then this should work for you:

It looks quite a bit messier than how you went through the sentences, but
x.TextFrame.TextRange.text
will return the actual text contained in the text boxes.
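A minimal sketch of that idea, assuming the text boxes are floating shapes reachable through doc.Shapes (as boxes drawn with Draw Text Box are) and that Word is driven through WIN32OLE on Windows. This is not the answerer's original code, and the helper names textbox_texts and extract_textbox_text are hypothetical. Shapes without text (pictures, lines) are skipped via TextFrame.HasText, and shapes that raise on access are rescued.

```ruby
# Sketch: collect the text of every text box (a Shape with a TextFrame)
# in a Word document. Helper names are hypothetical.

# `doc` is a Word Document (or anything exposing a compatible Shapes collection).
# Returns one string per shape that carries text.
def textbox_texts(doc)
  texts = []
  doc.Shapes.each do |shape|
    begin
      frame = shape.TextFrame
      has_text = frame.HasText
      # Word documents HasText as Boolean; also accepting -1 (VBA True) is defensive.
      next unless has_text == true || has_text == -1
      texts << frame.TextRange.Text
    rescue StandardError
      # Some shape types raise when their TextFrame is accessed; skip them.
      next
    end
  end
  texts
end

# Drives a real Word instance. win32ole is required lazily so that
# textbox_texts can be exercised without Word installed.
def extract_textbox_text(path)
  require 'win32ole'
  word = WIN32OLE.new('Word.Application')
  doc = nil
  begin
    # Word may not resolve paths relative to Ruby's working directory.
    doc = word.Documents.Open(File.expand_path(path).tr('/', '\\'))
    textbox_texts(doc)
  ensure
    doc.Close(false) if doc  # close without saving
    word.Quit
  end
end
```

For example, extract_textbox_text('report.docx').each { |t| puts t } would print each box's text. Text boxes inside grouped shapes or anchored in headers/footers may not appear in doc.Shapes and would need extra handling.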