Saving XML files using ElementTree

Posted on 2024-12-28 14:59:16

I'm trying to write some simple Python (3.2) code that reads XML files, makes some corrections, and stores them back. However, when the file is written, ElementTree adds an ns0 namespace prefix to every tag. For example:

<ns0:trk>
  <ns0:name>ACTIVE LOG</ns0:name>
<ns0:trkseg>
<ns0:trkpt lat="38.5" lon="-120.2">
  <ns0:ele>6.385864</ns0:ele>
  <ns0:time>2011-12-10T17:46:30Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="40.7" lon="-120.95">
  <ns0:ele>5.905273</ns0:ele>
  <ns0:time>2011-12-10T17:46:51Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="43.252" lon="-126.453">
  <ns0:ele>7.347168</ns0:ele>
  <ns0:time>2011-12-10T17:52:28Z</ns0:time>
</ns0:trkpt>
</ns0:trkseg>
</ns0:trk>

The code snippet is below:

def parse_gpx_data(gpxdata, tzname=None, npoints=None, filter_window=None,
                   output_file_name=None):
    ET = load_xml_library()

    def find_trksegs_or_route(etree, ns):
        trksegs=etree.findall('.//'+ns+'trkseg')
        if trksegs:
            return trksegs, "trkpt"
        else: # try to display route if track is missing
            rte=etree.findall('.//'+ns+'rte')
            return rte, "rtept"

    # try GPX10 namespace first
    try:
        element = ET.XML(gpxdata)
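    except ET.ParseError as v:
        row, column = v.position
        # The format arguments must be passed as a single tuple
        print("error on row %d, column %d: %s" % (row, column, v))
        return  # nothing to process if the document did not parse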

    print ("%s" % ET.tostring(element))
    trksegs,pttag=find_trksegs_or_route(element, GPX10)
    NS=GPX10
    if not trksegs: # try GPX11 namespace otherwise
        trksegs,pttag=find_trksegs_or_route(element, GPX11)
        NS=GPX11
    if not trksegs: # try without any namespace
        trksegs,pttag=find_trksegs_or_route(element, "")
        NS=""

    # Store the results if requested
    if output_file_name:
        ET.register_namespace('', GPX11)
        ET.register_namespace('', GPX10)
        ET.ElementTree(element).write(output_file_name, xml_declaration=True)

    return

I have tried using register_namespace, but without success.
Has anything specific changed in this version, ElementTree 1.3?

梦幻之岛 2025-01-04 14:59:16

To avoid the ns0 prefix, set the default namespace before reading the XML data:

ET.register_namespace('', "http://www.topografix.com/GPX/1/1")
ET.register_namespace('', "http://www.topografix.com/GPX/1/0")
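
One caveat worth checking, as far as I can tell from the standard-library implementation: register_namespace keeps a single URI per prefix, so registering two URIs under the empty prefix leaves only the last one in effect, and the registry is consulted when the tree is serialized. The sketch below illustrates this with a made-up GPX 1.1 string (the sample document and the variable names are assumptions, not taken from the question's file):

```python
import xml.etree.ElementTree as ET

GPX10 = "http://www.topografix.com/GPX/1/0"
GPX11 = "http://www.topografix.com/GPX/1/1"

# Hypothetical GPX 1.1 sample, used only for illustration.
sample = (
    '<gpx xmlns="%s" version="1.1"><trk><name>ACTIVE LOG</name></trk></gpx>'
    % GPX11
)
root = ET.fromstring(sample)

# Registering both URIs under '' keeps only the last one (GPX10),
# so a GPX 1.1 tree is still written with a generated prefix.
ET.register_namespace("", GPX11)
ET.register_namespace("", GPX10)
with_both = ET.tostring(root, encoding="unicode")

# Registering only the URI this document actually uses gives un-prefixed tags.
ET.register_namespace("", GPX11)
with_match = ET.tostring(root, encoding="unicode")
```

If that reading is right, it may be enough to register only the namespace that matched (the NS value in the question's code) just before calling write.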

走过海棠暮 2025-01-04 14:59:16

You need to register all of your namespaces before you parse the XML file.

For example, suppose your input XML looks like this, with Capabilities as the root of your element tree:

<Capabilities xmlns="http://www.opengis.net/wmts/1.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://www.opengis.net/wmts/1.0 http://schemas.opengis.net/wmts/1.0/wmtsGetCapabilities_response.xsd"
    version="1.0.0">

Then you have to register every namespace, i.e. every xmlns attribute present, like this:

ET.register_namespace('', "http://www.opengis.net/wmts/1.0")
ET.register_namespace('ows', "http://www.opengis.net/ows/1.1")
ET.register_namespace('xlink', "http://www.w3.org/1999/xlink")
ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace('gml', "http://www.opengis.net/gml")
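
Rather than listing the namespaces by hand, they can probably be collected from the document itself. This is a minimal sketch using the "start-ns" event of ET.iterparse; the helper name register_all_namespaces and the shortened sample document are assumptions made for illustration:

```python
import io
import xml.etree.ElementTree as ET

def register_all_namespaces(source):
    """Register every prefix/URI pair declared in `source` and return them."""
    declared = {}
    # Each 'start-ns' event carries a (prefix, uri) tuple; the default
    # namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        declared[prefix] = uri
        # Note: register_namespace rejects prefixes of the form ns<digits>.
        ET.register_namespace(prefix, uri)
    return declared

# Hypothetical, shortened version of the Capabilities document above.
sample = b"""<Capabilities xmlns="http://www.opengis.net/wmts/1.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    version="1.0.0"><ows:Title>demo</ows:Title></Capabilities>"""

namespaces = register_all_namespaces(io.BytesIO(sample))
root = ET.fromstring(sample)
output = ET.tostring(root, encoding="unicode")
```

When reading from a real file, the same function can be given a file name instead of the BytesIO object.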

皇甫轩 2025-01-04 14:59:16

If you print the root element, you will see something like this:
<Element '{http://www.host.domain/path/to/your/xml/namespace}RootTag' at 0x0000000000558DB8>

So, to avoid the ns0 prefix, you have to change the default namespace before parsing the XML data, as below:

ET.register_namespace('', "http://www.host.domain/path/to/your/xml/namespace")
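
The namespace does not have to be typed in by hand; it can be read from the root tag shown above. A minimal sketch, assuming a small made-up document and a hypothetical helper called namespace_of, that registers the root's namespace and then writes the tree with an XML declaration:

```python
import io
import xml.etree.ElementTree as ET

def namespace_of(element):
    """Return the URI from a '{uri}local' tag, or '' if the tag has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

# Hypothetical sample using the placeholder namespace from this answer.
sample = (
    '<RootTag xmlns="http://www.host.domain/path/to/your/xml/namespace">'
    "<Child/></RootTag>"
)
root = ET.fromstring(sample)
uri = namespace_of(root)
if uri:
    ET.register_namespace("", uri)

# Write to an in-memory buffer here; a file name works the same way.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
```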

腻橙味 2025-01-04 14:59:16

It seems that you have to declare your namespace, meaning that you need to change the first line of your XML from:

<ns0:trk>

to something like:

<ns0:trk xmlns:ns0="uri:">

Once you have done that, you will no longer get ParseError: unbound prefix: ..., and:

elem.tag = elem.tag[len('{uri:}'):]

will remove the namespace.
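
Applied to a whole tree, the same idea might look like the sketch below. The function name strip_namespaces is made up for illustration, and it splits on the closing brace instead of relying on a fixed URI length:

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root

# Sample based on the declaration suggested above.
sample = '<ns0:trk xmlns:ns0="uri:"><ns0:name>ACTIVE LOG</ns0:name></ns0:trk>'
root = strip_namespaces(ET.fromstring(sample))
output = ET.tostring(root, encoding="unicode")
```

Keep in mind that this drops the namespace from the output entirely and leaves namespaced attributes untouched, so it only suits consumers that do not require the original namespace.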

烟酉 2025-01-04 14:59:16

Or you could regex it away:

import re

def remove_xml_namespace(xml_str: str) -> str:
    # unwrap a prefixed element that carries an xmlns declaration, keeping its content
    xml_str = re.sub(r"<([^:]+):(\w+).+(?=xmlns)[^>]+>([\s\S]*)</(\1):(\2)>", r"\3", xml_str)
    # strip the namespace prefix from end tags
    xml_str = re.sub(r"</[^:]*:", r"</", xml_str)
    # strip the namespace prefix from start tags
    xml_str = re.sub(r"<[^/][^:]*:([^/>]*)(/?)>", r"<\1\2>", xml_str)
    return xml_str
