How do I correctly generate RSID attributes in a Word .docx file using Apache POI?
I have been using Apache POI to manipulate Microsoft Word .docx files, i.e. open a document that was originally created in Microsoft Word, modify it, and save it to a new document.
I notice that new paragraphs created by Apache POI are missing a Revision Save ID, often known as an RSID or rsidR. Word uses this to identify changes made to a document in one session, say between saves. It is optional (users can turn it off in Microsoft Word if they want), but in reality almost everyone has it on, so almost every document is full of RSIDs. Read this excellent explanation of RSIDs for more about that.
In a Microsoft Word document, word/document.xml contains paragraphs like this:
<w:p w:rsidR="007809A1" w:rsidRDefault="007809A1" w:rsidP="00191825">
<w:r>
<w:t>Paragraph of text here.</w:t>
</w:r>
</w:p>
However, the same paragraph created by POI will look like this in word/document.xml:
<w:p>
<w:r>
<w:t>Paragraph of text here.</w:t>
</w:r>
</w:p>
I've figured out that I can force POI to add an RSID to each paragraph using code like this:
byte[] rsid = ???;
XWPFParagraph paragraph = document.createParagraph();
paragraph.getCTP().setRsidR(rsid);
paragraph.getCTP().setRsidRDefault(rsid);
However, I don't know how I should be generating the RSIDs.
Does POI have a way to generate and/or keep track of RSIDs? If not, is there any way I can ensure that an RSID I generate doesn't conflict with one that's already in the document?
Comments (1)
It looks like the list of valid rsid entries is held in word/settings.xml in the <w:rsids> entry. XWPF should be able to give you access to that already. You'd probably want to generate an 8 hex digit long random number, check if that's in there, and re-generate if it is. Once you have a unique one, add it into that list, then tag your paragraphs with it.
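The generate-and-check loop described above could be sketched roughly as follows. This is a minimal illustration in plain Java, not part of POI: the class name RsidGenerator and both of its methods are hypothetical, and it assumes you have already collected the existing RSID values from the <w:rsids> entry into a set of upper-case 8-digit hex strings (how to read them out of word/settings.xml is not shown here).

```java
import java.security.SecureRandom;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a POI class.
final class RsidGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Formats RSID bytes as upper-case hex, e.g. {0x00,0x78,0x09,0xA1} -> "007809A1".
    static String toHex(byte[] rsid) {
        StringBuilder sb = new StringBuilder(rsid.length * 2);
        for (byte b : rsid) {
            sb.append(String.format(Locale.ROOT, "%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Draws random 4-byte values until one is not in existingHex,
    // records it in the set, and returns it.
    // Assumption: existingHex holds upper-case 8-digit hex strings.
    static byte[] newRsid(Set<String> existingHex) {
        byte[] candidate = new byte[4];
        String hex;
        do {
            RANDOM.nextBytes(candidate);
            hex = toHex(candidate);
        } while (existingHex.contains(hex));
        existingHex.add(hex);
        return candidate;
    }
}
```

The returned byte[] is the kind of value the question passes to paragraph.getCTP().setRsidR(rsid) and setRsidRDefault(rsid). The new value would still need to be written back to the <w:rsids> list in word/settings.xml, which this sketch leaves out.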
What I'd suggest is that you join the poi dev list (mailing list details), and we can give you a hand on working up a patch for it. I think the things to do are:
We should take this to the dev list though :)