Extracting blocks of text from a ReST document by :ref:?

Posted 2024-12-19 16:50:57

I have some reStructuredText documentation. I would like to use snippets from it in online help. It seems like one approach would be to 'snip' out pieces of markup by reference, e.g.

.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help

How could I use Python/Docutils/Sphinx to extract the markup for the _my_interesting_section marker?

Answer by 飘逸的'云, 2024-12-26 16:50:57

I'm not sure how you could do this other than subclassing and customising the Docutils parser. If you just need the relevant section of reStructuredText and don't mind losing some of the markup then you can try and use the following. Alternatively, the processed markup (i.e. reStructuredText converted to HTML or LaTeX) for a particular section is very easy to get. See my answer to this question for an example of extracting a part of the processed XML. Let me know if this is what you want. Anyway, here goes...

You can manipulate reStructuredText very easily using Docutils. First, publish the Docutils document tree (doctree) representation of the reStructuredText using the Docutils publish_doctree function. This doctree can be traversed easily and searched for particular document elements, e.g. sections, with particular attributes. The easiest way to search for a particular section reference is to inspect the ids attribute of the doctree itself. doctree.ids is simply a dictionary mapping every reference id to the corresponding element of the document.

from docutils.core import publish_doctree

s = """.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help
"""

# Parse the above string to a Docutils document tree:
doctree = publish_doctree(s)

# Get element in the document with the reference id `my-interesting-section`:
ids = 'my-interesting-section'

try:
    section = doctree.ids[ids]
except KeyError:
    # Do some exception handling here...
    raise KeyError('No section with ids {0}'.format(ids))

# Optionally check that the element we found is in fact a section:
import docutils.nodes
assert isinstance(section, docutils.nodes.section)

# Finally, get the section text (title and paragraphs, without markup):
print(section.astext())

# This prints:
# About this dialog
#
# talk about stuff which is relevant in contextual help

Now the markup has been lost. If there is nothing too fancy in the section, it would be easy to insert some dashes under the first line of the result above to get back to your section heading. I'm not sure what you would need to do for more complicated inline markup. Hopefully the above is a good starting point for you though.
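
As a minimal sketch of the "insert some dashes" idea above: the helper below takes the plain text returned by astext() and re-adds an underline beneath the first line. restore_heading is a hypothetical name, not part of Docutils. It assumes the first line is the section title and that the title is plain ASCII, so that its character count matches its display width.

```python
def restore_heading(section_text: str, underline_char: str = "-") -> str:
    """Re-add a reST underline beneath the first line of the text.

    Assumption: the first line is the section title (as it is in the
    output of section.astext() for a simple section).
    """
    # Split off the first line; sep is "" when there is no newline at all.
    title, sep, body = section_text.partition("\n")
    underline = underline_char * len(title)
    return f"{title}\n{underline}{sep}{body}"


text = "About this dialog\n\ntalk about stuff which is relevant in contextual help"
print(restore_heading(text))
```

Inline markup inside the body (emphasis, links, roles) is still gone at this point; the sketch only recovers the heading.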

Note: the id I pass when querying doctree.ids differs slightly from the label as written in the reStructuredText: the leading underscore has been removed and all other underscores have been replaced by hyphens. This is how Docutils normalises references. It would be straightforward to write a function to convert reStructuredText references to Docutils' internal representation. Otherwise, I'm sure that if you dig through Docutils you can find the routine that does this.
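
A minimal sketch of such a conversion function, assuming ASCII-only labels. label_to_id is a hypothetical helper name, and it only approximates the normalisation described above: lowercase everything, turn each run of other characters into a single hyphen, then drop leading hyphens or digits and trailing hyphens. Docutils' own routine for this is docutils.nodes.make_id, which also deals with non-ASCII characters, so it is probably the better choice whenever Docutils is importable.

```python
import re


def label_to_id(label: str) -> str:
    """Approximate the id Docutils derives from a target label (ASCII only).

    Hypothetical helper for illustration; not part of Docutils.
    """
    # Lowercase, then collapse every run of non [a-z0-9] characters to "-".
    ident = re.sub(r"[^a-z0-9]+", "-", label.lower())
    # An id may not start with a hyphen or digit, nor end with a hyphen.
    # This also removes the leading "_" of the ".. _label:" syntax.
    return re.sub(r"^[-0-9]+|-+$", "", ident)


print(label_to_id("_my_interesting_section"))
```

The result can then be used as the key for the doctree.ids lookup shown earlier.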
