Remove all namespaces in lxml?

Posted 2025-01-06 11:35:15

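I'm working with some of Google's data APIs, using the lxml library in Python. Namespaces are a huge hassle here. For a lot of the work I'm doing (mainly XPath queries), it would be nice to simply ignore them.

Is there a simple way to ignore XML namespaces in Python/lxml?

Thanks!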

简美 2025-01-13 11:35:15

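If you'd like to remove all namespaces from elements and attributes, I suggest the code shown below.

Context: in my application I obtain XML representations of SOAP response streams, but I'm not interested in building objects on the client side; I'm only interested in the XML representations themselves. I'm also not interested in anything namespace-related, which only makes things more complicated than they need to be for my purposes. So I simply strip namespaces from element names and drop every attribute whose name contains a namespace.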

import os
import lxml.etree as etree

def dropns(root):
    """Strip namespaces from element tags in place and drop namespaced attributes."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # comments and processing instructions have non-string tags
        # Handles '{uri}local' as well as 'prefix:local' (an undeclared prefix,
        # which the recover=True parser below keeps as part of the name)
        elem.tag = elem.tag.rsplit('}', 1)[-1].rsplit(':', 1)[-1]
        for attrib in list(elem.attrib):
            if attrib.startswith('{') or ':' in attrib:
                del elem.attrib[attrib]

# Test case
name = os.path.expanduser('~/tmp/mantisbt/test.xml')  # open() does not expand '~'
parser = etree.XMLParser(ns_clean=True, recover=True)
with open(name, 'rb') as f:
    root = etree.parse(f, parser=parser)
print('=====================================================================')
print(etree.tostring(root, pretty_print=True).decode())
print('=====================================================================')
dropns(root)
print(etree.tostring(root, pretty_print=True).decode())
print('=====================================================================')
which prints:

=====================================================================
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ns1:mc_issue_getResponse>
      <return xsi:type="tns:IssueData">
        <id xsi:type="xsd:integer">356</id>
        <view_state xsi:type="tns:ObjectRef">
          <id xsi:type="xsd:integer">10</id>
          <name xsi:type="xsd:string">public</name>
        </view_state>
    </return>
  </ns1:mc_issue_getResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
=====================================================================
<Envelope>
  <Body>
    <mc_issue_getResponse>
      <return>
        <id>356</id>
        <view_state>
          <id>10</id>
          <name>public</name>
        </view_state>
    </return>
  </mc_issue_getResponse>
</Body>
</Envelope>
=====================================================================
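The sample above has no xmlns declarations, so the prefixes survive only because the parser runs with recover=True. As a minimal sketch of the same idea for a document that does declare its namespaces, the snippet below strips tags with etree.QName and then calls etree.cleanup_namespaces to drop the declarations that are no longer used. The XML string and the strip_namespaces name are made up for illustration and are not from the answer.

```python
from lxml import etree

# Hypothetical sample document with properly declared namespaces.
XML = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <id xsi:type="integer">356</id>
  </soap:Body>
</soap:Envelope>"""


def strip_namespaces(root):
    """Rewrite tags to their local names and drop namespaced attributes, in place."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        elem.tag = etree.QName(elem).localname
        for key in list(elem.attrib):
            if key.startswith('{'):  # namespaced attributes look like '{uri}name'
                del elem.attrib[key]
    # Remove the xmlns declarations that are no longer referenced.
    etree.cleanup_namespaces(root)


root = etree.fromstring(XML)
strip_namespaces(root)
tags = [elem.tag for elem in root.iter()]
print(tags)
print(etree.tostring(root).decode())
```

Like dropns, this discards information (here the xsi:type attribute), so it only suits cases where the namespaces really are irrelevant.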

如梦初醒的夏天 2025-01-13 11:35:15

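In lxml, some_element.tag is a string like {namespace-uri}local-name if the element has a namespace, and just local-name otherwise. Beware that it is not a string on non-element nodes (such as comments).

Try this:

for node in some_tree.iter():
    # node.tag is not a string on comments, so look up startswith defensively
    startswith = getattr(node.tag, 'startswith', None)
    if startswith and startswith('{'):
        node.tag = node.tag.rsplit('}', 1)[-1]

On Python 2.x the tag can be either an ASCII byte string or a Unicode string. Testing for the existence of a startswith method covers both.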
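To make the tag format concrete, here is a small sketch that prints the tags of a namespaced document containing a comment. It also shows an alternative not mentioned in the answer: since the question is mostly about XPath, matching on the standard XPath local-name() function ignores namespaces without modifying the tree. The Atom sample document is an assumption chosen for illustration.

```python
from lxml import etree

# Hypothetical sample: a default namespace plus a comment node.
XML = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- a comment node -->
  <entry><title>first</title></entry>
</feed>"""

root = etree.fromstring(XML)

# Element tags carry the namespace URI in braces; the comment's tag is not a string.
for node in root.iter():
    print(repr(node.tag) if isinstance(node.tag, str) else 'non-string tag')

# A plain name does not match, because 'title' lives in the Atom namespace.
plain = root.xpath('//title')

# Matching on local-name() ignores the namespace and leaves the tree unchanged.
titles = root.xpath("//*[local-name()='title']")
print(len(plain), [t.text for t in titles])
```

The local-name() form is more verbose per query, but it keeps the namespace information available for anything else that needs it.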
