Grabbing the entire document tree using the OpenOffice API

Posted on 2024-07-27 19:56:55

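I would like to grab the entire tree of a Writer document in OpenOffice 3.1. I need to collect data on all the elements in the tree, not only the Text elements.

Loading the XTextDocument and calling getText() gives the XText element. More specifically, using XEnumerationAccess from the XText only iterates over the TextRange objects.

From the OpenOffice documentation, /DevGuide/Text/Iterating_over_Text:

The second interface of com.sun.star.text.Text is XEnumerationAccess. A Text service enumerates all paragraphs in a text and returns objects which support com.sun.star.text.Paragraph. This includes tables, because writer sees tables as specialized paragraphs that support the com.sun.star.text.TextTable service.

Some additional documentation:

The text portion enumeration of a paragraph does not supply contents which do belong to the paragraph, but do not fuse together with the text flow. These could be text frames, graphic objects, embedded objects or drawing shapes anchored at the paragraph, characters or as character. The TextPortionType "TextContent" indicate if there is a content anchored at a character or as a character. If you have a TextContent portion type, you know that there are shape objects anchored at a character or as a character.

My test documents indicate that I do get an XTextContent, and its XTextRange can be collected via getAnchor(). But how can I determine the type of content I am collecting? The only method available is getString(). If the object is an embedded image, how do I collect its data?

I am using C++, but I believe a solution in Java would be portable.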

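Migrated from an answer

Due to poor formatting, this comment is posted as an answer.

Thanks for your response.

I intend to use the API.

I am trying the example of collecting GraphicObjects from the document. Using an XGraphicObjectsSupplier, I can get a collection via getGraphicObjects(). Each object in the collection is an Any, and printing its type via getValueTypeName() gives XTextContent.

The API documentation says the collection holds a TextGraphicObject "service". How do I grab an instance of it?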

呆° 2024-08-03 19:56:55

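The answer to your question is complex, but I will try to make it understandable.

Exporting the document to XML and processing it with SAX is the easier route. If you go the XML way, you have to implement XDocumentHandler and read the document, optionally filtering out the content you do not need. The rest of the work is either an XSLT transformation or, for large documents, SAX.

If you prefer to use only the API, you will have to work a lot with XServiceInfo and UnoRuntime.queryInterface. XServiceInfo.supportsService() tells you which service (for example com.sun.star.text.TextGraphicObject) an object implements, and queryInterface gives you the interface you need to read it.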
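As a rough illustration of the XML route, here is a minimal sketch that walks the element tree with SAX. It makes several assumptions that are not in the answer above: the document is saved as .odt (a ZIP archive whose body is stored in the content.xml entry), the JDK's own SAX parser is used instead of UNO's XDocumentHandler, and the names OdtOutline, outline and outlineOdt are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the OpenOffice API.
class OdtOutline {

    /** Parses an XML stream with SAX and returns one indented line per element. */
    static List<String> outline(InputStream xml) throws Exception {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < depth; i++) {
                    line.append("  ");
                }
                lines.add(line.append(qName).toString());
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        // The default factory is not namespace-aware, so qName is the raw
        // prefixed name as written in the file, e.g. "text:p" or "draw:image".
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return lines;
    }

    /** Opens an .odt file (a ZIP archive) and outlines its content.xml entry. */
    static List<String> outlineOdt(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalArgumentException("No content.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return outline(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines;
        if (args.length == 1) {
            lines = outlineOdt(args[0]);
        } else {
            // Tiny stand-in for content.xml so the sketch runs without a document.
            String sample = "<office:text><text:p>Hello<draw:frame><draw:image/>"
                    + "</draw:frame></text:p></office:text>";
            lines = outline(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        }
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Run with the path to an .odt file as the only argument to print its element outline; with no argument it prints the outline of the embedded sample. Because the handler sees every element, images and frames show up alongside paragraphs, which is the part the paragraph enumeration in the question leaves out.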

眸中客 2024-08-03 19:56:55

In Java:

import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.drawing.*;
import com.sun.star.frame.*;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.text.XTextDocument;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

XComponentContext xContext = Bootstrap.bootstrap();
XMultiComponentFactory xMCF = xContext.getServiceManager();
Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
XDesktop xDesktop = UnoRuntime.queryInterface(XDesktop.class, oDesktop);
XComponentLoader xCompLoader = UnoRuntime.queryInterface(XComponentLoader.class, xDesktop);
// The fourth argument is an array of load properties; an empty array uses the defaults.
XComponent xComp = xCompLoader.loadComponentFromURL("file:///C:/test.odt", "_blank", 0, new PropertyValue[0]);
XTextDocument xDoc = UnoRuntime.queryInterface(XTextDocument.class, xComp);
XModel xModel = UnoRuntime.queryInterface(XModel.class, xDoc);
XDrawPageSupplier xDPS = UnoRuntime.queryInterface(XDrawPageSupplier.class, xModel);
XDrawPage xDrawPage = xDPS.getDrawPage();
XShapes xShapes = UnoRuntime.queryInterface(XShapes.class, xDrawPage);
// Walk every shape on the document's draw page and print its type, position and size.
for (int s = 0; s < xShapes.getCount(); s++) {
   XShape xShape = UnoRuntime.queryInterface(XShape.class, xShapes.getByIndex(s));
   System.out.println(" -- sh.getShapeType: " + xShape.getShapeType());
   System.out.println(" -- sh.getPosition: " + xShape.getPosition().X + "x" + xShape.getPosition().Y);
   System.out.println(" -- sh.getSize: " + xShape.getSize().Width + "x" + xShape.getSize().Height);
}
