Tags with : in their name in lxml

Posted on 2024-10-19 14:23:06

I'm trying to use lxml.etree to parse a Wordpress export document (it's XML, somewhat RSS like). I'm only interested in published posts, so I'm using the following to loop through published posts:

for item in data.findall("item"):
    if item.find("wp:post_type").text != "post":
        continue
    if item.find("wp:status").text != "publish":
        continue
    write_post(item)

where data is the element that contains all of the item tags. The item tags contain posts, pages, and drafts. My problem is that lxml can't find tags that have a : in their name (e.g. wp:post_type). When I try item.find("wp:post_type") I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "lxml.etree.pyx", line 1279, in lxml.etree._Element.find (src/lxml/lxml.etree.c:38124)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 210, in find
    it = iterfind(elem, path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 200, in iterfind
    selector = _build_path_iterator(path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
KeyError: ':'

I assume the KeyError: ':' means that the colon in the tag name is invalid. Is there some way I can escape the colon so that lxml finds the right tag? Does : have some special meaning in this context? Or am I doing something wrong? Any help would be appreciated.
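A quick way to see what the colon means to lxml is to print the tag names it actually stores. The sketch below parses a small, made-up fragment in the style of a WordPress export. The namespace URI http://wordpress.org/export/1.2/ is an assumption (a real file declares its own value in the xmlns:wp attribute), and the element layout is simplified.

```python
from lxml import etree

# A tiny, made-up fragment in the style of a WordPress export.
# Assumption: the URI bound to "wp"; a real export declares its own
# value in the xmlns:wp attribute on the root element.
SAMPLE = b"""<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello</title>
      <wp:post_type>post</wp:post_type>
      <wp:status>publish</wp:status>
    </item>
  </channel>
</rss>"""

root = etree.fromstring(SAMPLE)
item = root.find("channel/item")

# lxml stores namespaced tags as {uri}localname, so the "wp:" prefix
# never appears in .tag.
for child in item:
    print(child.tag)

# The prefix-to-URI bindings in scope for this element.
print(item.nsmap)
```

For this fragment the loop prints title, {http://wordpress.org/export/1.2/}post_type and {http://wordpress.org/export/1.2/}status, and nsmap maps wp to that URI. The wp: prefix is only shorthand in the source text, which is why a path that uses it has to be told what wp stands for.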

Comments (1)

檐上三寸雪 2024-10-26 14:23:06

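The : is the XML namespace separator. wp is a prefix that the document binds to a namespace URI with an xmlns:wp="..." declaration, usually on the root element, and lxml stores such tags under the full URI rather than the prefix. So instead of escaping the colon, replace the prefix and colon with the namespace URI in curly braces, as in item.find("{http://example.org/}status").text, where http://example.org/ stands for whatever URI your document declares for wp.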
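Here is a minimal sketch of the question's filtering loop using that approach. It assumes a made-up fragment and the URI http://wordpress.org/export/1.2/ for wp (read the real one from your file's xmlns:wp declaration, or from root.nsmap["wp"]), and it yields the matching items instead of calling the question's write_post. Both lookup forms that lxml's find methods accept are shown side by side for illustration, braces-around-URI for post_type and a namespaces mapping for status; in real code you would pick one.

```python
from lxml import etree

# Assumption: the URI the export binds to the "wp" prefix.
WP = "http://wordpress.org/export/1.2/"

SAMPLE = b"""<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>A</title><wp:post_type>post</wp:post_type><wp:status>publish</wp:status></item>
    <item><title>B</title><wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>
    <item><title>C</title><wp:post_type>post</wp:post_type><wp:status>draft</wp:status></item>
  </channel>
</rss>"""


def published_posts(channel):
    """Yield the <item> elements that are published posts."""
    for item in channel.findall("item"):
        # Form 1: spell the namespace out as {uri}localname.
        if item.findtext("{%s}post_type" % WP) != "post":
            continue
        # Form 2: keep the prefix and pass a prefix-to-URI mapping.
        if item.findtext("wp:status", namespaces={"wp": WP}) != "publish":
            continue
        yield item


root = etree.fromstring(SAMPLE)
channel = root.find("channel")
titles = [item.findtext("title") for item in published_posts(channel)]
print(titles)
```

With this fragment only item A passes both checks. Using findtext() instead of find(...).text also means an item that lacks one of the wp: children is skipped (findtext returns None) rather than raising an AttributeError.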
