Distinguishing the table of contents in a Word document

Posted 2024-11-18 23:19:51

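Does anyone know how, when iterating through a Word document programmatically, you can tell whether a paragraph forms part of a table of contents (or indeed of any other field)?

The reason I ask is that I have a VB program that is supposed to extract the first few paragraphs of substantive text from a document, which it does by iterating through the Word.Paragraphs collection. I don't want the results to include tables of contents or other fields; I only want what a human would recognize as a heading, title, or normal text paragraph. However, it turns out that if there is a table of contents, then not only the table of contents itself but every line within it appears as a separate item in Word.Paragraphs. I don't want these, but I haven't been able to find any property on the Paragraph object that would let me distinguish them and so ignore them. (I'm guessing I also need the solution to apply to other field types, such as tables of figures and tables of authorities, which I haven't actually encountered yet but which I imagine would cause the same problem.)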

つ低調成傷 2024-11-25 23:19:51

Because of the limitations in the Word object model I think the best way to achieve this would be to temporarily remove the TOC field code, iterate through the Word document, and then re-insert the TOC. In VBA, it would look like this:

Dim doc As Document
Dim fld As Field
Dim rng As Range

Set doc = ActiveDocument

For Each fld In doc.Fields
    If fld.Type = wdFieldTOC Then
        fld.Select
        Selection.Collapse
        Set rng = Selection.Range 'capture place to re-insert TOC later
        fld.Cut
        Exit For 'the clipboard holds only one cut TOC, so stop at the first
    End If
Next

Run your code to extract the paragraphs here, and then re-insert the TOC:

If Not rng Is Nothing Then
    rng.Paste 'Selection.Range is read-only, so paste into the saved range
End If

If you are coding in .NET this should translate pretty closely. Also, this should work as is for Word 2003 and earlier, but in Word 2007/2010 the TOC, depending on how it was created, sometimes has a Content Control-like region surrounding it, which may require you to write additional code to detect and remove it.

孤寂小茶 2024-11-25 23:19:51

This is not guaranteed to work, but if the TOC uses the standard Word styles (highly likely) and no one has added their own style prefixed with "TOC", it is fine. It is a crude approach, but workable.

Dim parCurrentParagraph As Paragraph

For Each parCurrentParagraph In ActiveDocument.Paragraphs
    'NameLocal is the localized style name; "TOC" is the English prefix
    If Left(parCurrentParagraph.Format.Style.NameLocal, 3) = "TOC" Then
        '    Do something (for example, skip this paragraph)
    End If
Next
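
Because NameLocal is localized, the "TOC" prefix test above may miss entries in non-English versions of Word. A possible variant, sketched here under that assumption, compares the paragraph's style name against the built-in TOC 1 to TOC 9 styles looked up through the wdStyleTOC1 to wdStyleTOC9 constants. IsBuiltInTocStyle is a hypothetical helper name, and the sketch still shares the original caveat: it will not catch a TOC built from custom styles.

```vba
' Hypothetical helper: True when the paragraph uses one of Word's
' built-in "TOC 1" .. "TOC 9" styles, whatever their localized names are.
Function IsBuiltInTocStyle(ByVal par As Paragraph) As Boolean
    Dim tocStyleIds As Variant
    Dim i As Long
    Dim parStyleName As String

    tocStyleIds = Array(wdStyleTOC1, wdStyleTOC2, wdStyleTOC3, _
                        wdStyleTOC4, wdStyleTOC5, wdStyleTOC6, _
                        wdStyleTOC7, wdStyleTOC8, wdStyleTOC9)
    parStyleName = par.Format.Style.NameLocal

    For i = LBound(tocStyleIds) To UBound(tocStyleIds)
        ' Ask the paragraph's own document for the localized name
        ' of each built-in TOC style and compare.
        If parStyleName = par.Range.Document.Styles(tocStyleIds(i)).NameLocal Then
            IsBuiltInTocStyle = True
            Exit Function
        End If
    Next i
    ' Falls through with the default return value, False.
End Function
```

Inside the paragraph loop, `If Not IsBuiltInTocStyle(parCurrentParagraph) Then ...` would then keep only the paragraphs that are not styled as TOC entries.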

像极了他 2024-11-25 23:19:51

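What you could do is create a custom style for each section of your document.

Custom styles in Word 2003 (I'm not sure which version of Word you are using)

Then, when iterating through your paragraph collection, you can check the .Style property and safely ignore the paragraph if it equals your TOCStyle.

I believe the same technique would work fine for tables as well.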
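
As a rough illustration of that check, the sketch below prints every paragraph whose style is not the custom TOC style. It assumes the custom style is literally named "TOCStyle" and has been applied to every TOC entry; both the style name and the procedure name PrintNonTocParagraphs are placeholders for illustration.

```vba
' Sketch only: assumes a custom paragraph style named "TOCStyle"
' (hypothetical name) has been applied to all TOC entries.
Sub PrintNonTocParagraphs()
    Const TOC_STYLE_NAME As String = "TOCStyle"
    Dim par As Paragraph

    For Each par In ActiveDocument.Paragraphs
        ' Comparing .Style with a string compares against the style's name.
        If par.Style <> TOC_STYLE_NAME Then
            Debug.Print par.Range.Text
        End If
    Next par
End Sub
```

The obvious trade-off is that this only works for documents you control, since the style has to be applied when the document is authored.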

暖心男生 2024-11-25 23:19:51

The following Function will return a Range object that begins after any Table of Contents or Table of Figures. You can then use the Paragraphs property of the returned Range:

Private Function GetMainTextRange() As Range
    Dim toc As TableOfContents
    Dim tof As TableOfFigures
    Dim mainTextStart As Long

    'Character positions are zero-based; 0 keeps the whole document when there is no TOC
    mainTextStart = 0
    'Move the start past the end of the last table of contents
    For Each toc In ActiveDocument.TablesOfContents
        If toc.Range.End > mainTextStart Then
            mainTextStart = toc.Range.End + 1
        End If
    Next
    'Do the same for any table of figures
    For Each tof In ActiveDocument.TablesOfFigures
        If tof.Range.End > mainTextStart Then
            mainTextStart = tof.Range.End + 1
        End If
    Next

    Set GetMainTextRange = ActiveDocument.Range(mainTextStart, ActiveDocument.Range.End)
End Function
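
Note that GetMainTextRange also drops anything that comes before the last TOC, such as a title page. If that matters, the same collections can be used to test each paragraph individually. The following is a minimal sketch, assuming that any paragraph overlapping a TOC or table of figures range should count as part of it. IsInsideTocOrTof and PrintBodyParagraphs are hypothetical names.

```vba
' Hypothetical helper: True when the paragraph overlaps any table of
' contents or table of figures in its document.
Function IsInsideTocOrTof(ByVal par As Paragraph) As Boolean
    Dim doc As Document
    Dim toc As TableOfContents
    Dim tof As TableOfFigures
    Dim parStart As Long
    Dim parEnd As Long

    Set doc = par.Range.Document
    parStart = par.Range.Start
    parEnd = par.Range.End

    For Each toc In doc.TablesOfContents
        ' Overlap test: the paragraph starts before the TOC ends
        ' and ends after the TOC starts.
        If parStart < toc.Range.End And parEnd > toc.Range.Start Then
            IsInsideTocOrTof = True
            Exit Function
        End If
    Next toc

    For Each tof In doc.TablesOfFigures
        If parStart < tof.Range.End And parEnd > tof.Range.Start Then
            IsInsideTocOrTof = True
            Exit Function
        End If
    Next tof
End Function

' Example use: print only the paragraphs outside those tables.
Sub PrintBodyParagraphs()
    Dim par As Paragraph
    For Each par In ActiveDocument.Paragraphs
        If Not IsInsideTocOrTof(par) Then Debug.Print par.Range.Text
    Next par
End Sub
```

An overlap test is used rather than strict containment because the paragraph mark that closes the last TOC entry may fall just outside the TOC's range. A third loop over doc.TablesOfAuthorities could be added in the same way if tables of authorities need to be skipped too.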
