How to enumerate a Word document using the Office Interop API?
I want to traverse all the elements of a Word document one by one and process each element according to its type (header, sentence, table, image, text box, shape, etc.). I tried to find an enumerator or object in the Office Interop API that represents the elements of a document, but failed to find any. The API offers Sentences, Paragraphs and Shapes collections, but it doesn't provide a generic object that can point to the next element.
For example:
<header of document>
<plain text sentences>
<table with many rows,columns>
<text box>
<image>
<footer>
(Please imagine it as a Word document.)
So now I want an enumerator that will first give me <header of document>, then on the next iteration give me <plain text sentences>, then <table with many rows,columns>, and so on.
Does anyone know how we can achieve this? Is it possible?
I am using C#, Visual Studio 2005 and Word 2003.
Thanks a lot
The reason that you don't have a simple iterator is that Word documents can be far more complex than the simple structure outlined in your question.
For example, a document may have separate headers and footers for the first page and for even and odd pages, may contain more than one section with a different header and footer setup, may contain footnotes, comments and revisions, and may contain objects such as tables, text boxes, images and shapes that appear either inline with the text or floating. In short, there is no fixed sequence of elements.
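In Word's object model this shows up as *stories*. The main text, each kind of header and footer, footnotes, comments and text boxes are separate ranges. `Document.StoryRanges` returns only the first story of each type, and the remaining ones (for example the primary header of section 2) are chained through `Range.NextStoryRange`. The sketch below is a runnable model of that traversal, not the interop API itself. It is written in Python so it runs without Word, and the `Story` class is a hypothetical stand-in for a story `Range`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Story:
    """Hypothetical stand-in for a Word story Range (StoryType plus NextStoryRange)."""
    story_type: str                          # e.g. "MainText", "PrimaryHeader"
    text: str
    next_story: Optional["Story"] = None     # models Range.NextStoryRange


def iter_all_stories(story_ranges: List[Story]) -> Iterator[Story]:
    """Yield every story: the first of each type, then its linked successors."""
    for first in story_ranges:               # models foreach over Document.StoryRanges
        story: Optional[Story] = first
        while story is not None:             # follow the chain until it ends (null in interop)
            yield story
            story = story.next_story


# Two sections, each with its own primary header, plus one text box story.
header_s2 = Story("PrimaryHeader", "Header, section 2")
header_s1 = Story("PrimaryHeader", "Header, section 1", next_story=header_s2)
stories = [
    Story("MainText", "Body text"),
    header_s1,
    Story("TextFrame", "Text box contents"),
]

for s in iter_all_stories(stories):
    print(f"{s.story_type}: {s.text}")
```

Which order to visit the stories in (headers before or after the body, and so on) is your choice. Word does not define one.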
You would have to check how complex your input documents are and, based on the result of that analysis, decide how to iterate over paragraphs and the attached images, shapes, etc.
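As an illustration of such a decision, here is one possible policy for a single story. Walk `Paragraphs` in order and treat a run of paragraphs that belong to the same table as one table element. In interop, `Range.Information(WdInformation.wdWithInTable)` and `Range.Tables` tell you this. Emit inline pictures (`Range.InlineShapes`) where they occur, and emit floating shapes after the paragraph they are anchored to (`Shape.Anchor`). The Python sketch below only models this policy with hypothetical `Paragraph` and `FloatingShape` records so that it runs without Word. The rule "floating shapes come after their anchor" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Paragraph:
    """Hypothetical stand-in for a Word Paragraph within one story."""
    text: str
    table_id: Optional[int] = None                             # set when the paragraph is a table cell
    inline_pictures: List[str] = field(default_factory=list)   # models Range.InlineShapes


@dataclass
class FloatingShape:
    """Hypothetical stand-in for a floating Shape; anchor is a paragraph index."""
    name: str
    anchor: int                                                # models Shape.Anchor


def iter_elements(paragraphs: List[Paragraph],
                  shapes: List[FloatingShape]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, description) pairs in reading order under the policy above."""
    anchored: Dict[int, List[FloatingShape]] = {}
    for s in shapes:
        anchored.setdefault(s.anchor, []).append(s)

    i = 0
    while i < len(paragraphs):
        p = paragraphs[i]
        if p.table_id is not None:
            # Collapse consecutive cell paragraphs of the same table into one element.
            start = i
            while i + 1 < len(paragraphs) and paragraphs[i + 1].table_id == p.table_id:
                i += 1
            yield ("table", f"table {p.table_id} ({i - start + 1} cell paragraphs)")
            covered = range(start, i + 1)
        else:
            yield ("paragraph", p.text)
            for pic in p.inline_pictures:      # inline pictures sit in the text flow
                yield ("inline picture", pic)
            covered = range(i, i + 1)
        # Floating shapes have no position in the text flow; emit them after their anchor.
        for idx in covered:
            for s in anchored.get(idx, []):
                yield ("floating shape", s.name)
        i += 1


doc = [
    Paragraph("Plain text sentences.", inline_pictures=["logo.png"]),
    Paragraph("r1c1", table_id=1),
    Paragraph("r1c2", table_id=1),
    Paragraph("r2c1", table_id=1),
    Paragraph("Closing paragraph."),
]
floating = [FloatingShape("Text Box 1", anchor=2), FloatingShape("Picture 2", anchor=4)]

for kind, desc in iter_elements(doc, floating):
    print(kind, "->", desc)
```

Running the example prints the paragraph, its inline picture, the table as a single element, the text box anchored inside the table, the closing paragraph, and finally the picture anchored to it. A different document layout may call for a different ordering rule.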