Merging Word documents in Ruby
I have N Word documents (Office 2003) from which I want to make a single Word document by merging all the N documents together in some order. How do I go about doing this in Ruby? Thanks!
It's just the documents that are created in MS Office. I do not use Windows and would prefer non-Windows solutions.
EDIT: Will this be easy if the docs are odt files rather than doc files?
Comments (3)
There is a whole series of really good articles about Word and Ruby at http://rubyonwindows.blogspot.com/search/label/word. Word files are really complicated, at least before 2007, so you're better off automating Word to do it.
The only non-Windows solution that I know of is the Ruby bindings for POI. After that, the code would be really similar to this .NET code: Merge Word Documents As Pages Of A Single Document Using VB.NET. The key call is Selection.InsertFile, used once for each document you need, in the order you choose.
For ODT document merges, see this thread: http://cpanforum.com/threads/9938
Understand, almost any answer to this question will depend on the constraints of the doc files you are using...
That being said, in my mind the first option if you are going to do this would be to convert them to a more easily parsed format. RTF is a great example, and if you can get them into this format, the RTF Pocket Guide from O'Reilly is a GREAT resource for understanding the structure of the files. Converting the files is pretty simple if you can install abiword on the Linux machine. From a command line, you'd just run:
Of course, in Ruby you'd just wrap these commands.
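As a rough sketch of what that wrapping could look like: the exact command from this answer did not survive on the page, so the `abiword --to=FORMAT --to-name=OUTPUT INPUT` invocation below is an assumption about AbiWord's command-line converter, and `abiword_command` and `convert_all` are hypothetical helper names.

```ruby
require "shellwords"

# Build the argument list for converting one file with AbiWord.
# ASSUMPTION: the converter is invoked as
#   abiword --to=FORMAT --to-name=OUTPUT INPUT
# (the answer's original command was not preserved).
def abiword_command(input, format)
  # Swap the input's extension (if any) for the target format.
  output = input.delete_suffix(File.extname(input)) + ".#{format}"
  ["abiword", "--to=#{format}", "--to-name=#{output}", input]
end

# Convert each input in the given order and return the output paths.
def convert_all(inputs, format)
  inputs.map do |input|
    cmd = abiword_command(input, format)
    # The array form of system bypasses the shell, so file names with
    # spaces or quotes need no escaping.
    system(*cmd) or raise "conversion failed: #{cmd.shelljoin}"
    cmd[2].delete_prefix("--to-name=")
  end
end
```

Calling `convert_all(["chapter1.doc", "chapter2.doc"], "rtf")` would then produce the RTF files in the order you list them, provided abiword is installed and on the PATH.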
It's the merging that is more complicated, and it will depend on your files. You'll have to make some programmer decisions about whether you're going to combine the stylesheets in each individual doc, the font tables, and so on. The content just sits in the middle of that RTF file, but it's all the semantic and style data that you'll have to make choices about. There is no 'one way' here, simply because it depends on what you want on the other side. Here is where the RTF Pocket Guide is a great help: basically you'll want to use it to understand the structure of your RTF files, and decide what you do and don't want.
Otherwise, if you just want the content with NONE of the semantics, you could always convert them to txt files, then concat them. The command is very similar:
This is dead simple: it will just spit out the text, and you can concatenate it and be done with it. But again, you'll lose ALL formatting of any sort.
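For the plain-text route, the concatenation step might look something like the sketch below. It assumes the documents have already been converted to .txt files; `concat_text_files` and its `separator` parameter are hypothetical names used only for illustration.

```ruby
# Concatenate plain-text files, in the order given, into one output file.
# ASSUMPTION: each input has already been converted to plain text.
def concat_text_files(inputs, output, separator: "\n")
  File.open(output, "w") do |out|
    inputs.each_with_index do |path, index|
      # Put the separator between documents, not before the first one.
      out.write(separator) unless index.zero?
      out.write(File.read(path))
    end
  end
  output
end
```

The order of the `inputs` array is the order of the merged document, so sorting or hand-ordering that array is where you decide the sequence.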