How do I correctly generate RSID attributes in a Word .docx file using Apache POI?

Posted on 2024-10-17 02:14:38

I have been using Apache POI to manipulate Microsoft Word .docx files, i.e. open a document that was originally created in Microsoft Word, modify it, and save it to a new document.

I notice that new paragraphs created by Apache POI are missing a Revision Save ID, often known as an RSID or rsidR. Word uses this to identify changes made to a document in one session, say between saves. It is optional (users can turn it off in Microsoft Word if they want), but in practice almost everyone has it on, so almost every document is full of RSIDs. Read this excellent explanation of RSIDs for more about that.

In a Microsoft Word document, word/document.xml contains paragraphs like this:

<w:p w:rsidR="007809A1" w:rsidRDefault="007809A1" w:rsidP="00191825">
  <w:r>
    <w:t>Paragraph of text here.</w:t>
  </w:r>
</w:p>

However, the same paragraph created by POI looks like this in word/document.xml:

<w:p>
  <w:r>
    <w:t>Paragraph of text here.</w:t>
  </w:r>
</w:p>

I've figured out that I can force POI to add an RSID to each paragraph using code like this:

    byte[] rsid = ???;
    XWPFParagraph paragraph = document.createParagraph();
    paragraph.getCTP().setRsidR(rsid);
    paragraph.getCTP().setRsidRDefault(rsid);

However, I don't know how I should be generating the RSIDs.

Does POI have a way to generate and/or keep track of RSIDs? If not, is there any way I can ensure that an RSID I generate doesn't conflict with one that's already in the document?

Comments (1)

谁把谁当真 2024-10-24 02:14:38

It looks like the list of valid rsid entries is held in word/settings.xml in the <w:rsids> entry. XWPF should be able to give you access to that already.

You'd probably want to generate a random number 8 hex digits long, check whether it's already in that list, and regenerate if it is. Once you have a unique one, add it to the list, then tag your paragraphs with it.
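
The generate-and-check loop described above could be sketched roughly as follows. This is only an illustration, not part of POI: the `RsidGenerator` class and its method names are hypothetical, and it assumes you have already collected the existing RSID values from the `<w:rsids>` entry in word/settings.xml as 8-digit hex strings (how you read them out of POI is not shown here).

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a POI class): hands out 4-byte RSIDs that do not
// collide with a set of RSIDs already known to be in the document.
final class RsidGenerator {
    private final Set<String> known = new HashSet<>();
    private final SecureRandom random = new SecureRandom();

    // existingHexRsids: assumed to be the values already listed under
    // <w:rsids> in word/settings.xml, as 8-digit hex strings.
    RsidGenerator(Iterable<String> existingHexRsids) {
        for (String s : existingHexRsids) {
            known.add(s.toUpperCase(Locale.ROOT));
        }
    }

    // Draws random 4-byte values until one is not already known, records it
    // so later calls cannot return it again, and returns the raw bytes.
    byte[] next() {
        byte[] candidate = new byte[4];
        do {
            random.nextBytes(candidate);
        } while (!known.add(toHex(candidate))); // add() is false on a collision
        return candidate;
    }

    // Formats bytes as upper-case hex, e.g. {0x00, 0x78, 0x09, 0xA1} -> "007809A1".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }
}
```

The `byte[]` returned by `next()` is the same type that the `setRsidR` and `setRsidRDefault` calls in the question take. The new value would still have to be added to the `<w:rsids>` list in word/settings.xml, which this sketch does not do.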

What I'd suggest is that you join the POI dev list (mailing list details), and we can give you a hand working up a patch for it. I think the things to do are:

  • A wrapper around the RSids entry in word/settings.xml, to let you easily fetch the list and generate a new (unique) one
  • A wrapper around the different RSid entries on a paragraph and a run
  • Methods on paragraphs and runs to get the RSid wrapper, add a new one, or clear the existing one

We should take this to the dev list though :)
