How to edit a docx with nokogiri and rubyzip
I'm using a combination of rubyzip and nokogiri to edit a .docx file. I use rubyzip to unzip the .docx file and nokogiri to parse and change the body of word/document.xml, but every time I close rubyzip at the end, it corrupts the file and I can't open or repair it. When I unzip the .docx file on my desktop and check word/document.xml, the content is updated to what I changed it to, but all the other files are messed up. Could someone help me with this issue? Here is my code:
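require 'rubygems'
require 'zip/zip' # pre-1.0 rubyzip API (Zip::ZipFile)
require 'nokogiri'

# Open the archive and parse the main document part.
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)

# Replace the text of the first <w:t> run.
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"

# Write the modified XML back into the archive, then close it.
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close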
Comments (3)
I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
I stumbled across this post and know nothing about Ruby or nokogiri, but...
It looks like you are rezipping the new content incorrectly.
I don't know rubyzip, but you need a way to tell it to update the entry word/document.xml
and then resave/rezip the file.
It looks like you are just overwriting the entry with new data, which of course is going to be a different size and will totally screw up the rest of the zip file.
I give an example for Excel in this post: "Parse text file and create an excel report",
which may be of use even though I am using a different zip library and VB (I'm still doing exactly what you are trying to do; my code is about halfway down).
Here is the part that applies:
According to the official GitHub documentation, you should use write_buffer instead of open. There's also a code example at the link.