Parsing XML by specifying a name where multiple children exist

Posted on 2025-01-22 14:41:07

I'm having some trouble extrapolating similar SO threads to a larger XML where there are multiple children with different names. For example, here is a subset of a file I'm working with:

<?xml version="1.0" encoding="UTF-8"?>
<SDDXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <RulesetFilename file="T24N_2022.bin"/>
  <Model Name="Proposed">
    ...
  <Model Name="Standard">
    <Proj>
      <Name>project0001</Name>
      <DevMode>1</DevMode>
      <BldgEngyModelVersion>16</BldgEngyModelVersion>
      <AnalysisVersion>220070</AnalysisVersion>
      <CreateDate>1650049043</CreateDate>
      <EnergyUse>
        ..
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">270.095</ProposedTDV>
        <StandardTDV index="0">99.089</StandardTDV>
...

I'm trying to get the value of 'ProposedTDV' (270.095). I've tried BeautifulSoup and ElementTree, but I'm having trouble finding the syntax to specify the name of a child. That is, I can't use a search string like:

Model/Proj/EnergyUse/ProposedTDV

I'm trying to find something more like:

Model[Name="Standard"]/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV

or something similar that I could use with BeautifulSoup (or any other XML parser).

For example, I've tried things like

from bs4 import BeautifulSoup
result = open(--xml_file_path--,'r')
contents = result.read()
soup = BeautifulSoup(contents,'xml')
test = soup.Model[Name="Proposed"].Proj.EnergyUse[Name='Efficiency Compliance'].findAll("ProposedTDV")

But I know that the syntax there is wrong.

Comments (1)

给我一枪 2025-01-29 14:41:07

Take a look at [Python.Docs]: xml.etree.ElementTree - Supported XPath syntax.
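
As a minimal sketch of the two predicate forms from that page that matter here: `[@attrib='value']` matches on an attribute, while `[tag='text']` matches on the text of a child element. The tiny inline document below is an assumption made for illustration; its element names loosely mirror the question, but the values are made up.

```python
from xml.etree import ElementTree as ET

# Hypothetical miniature document (not the asker's file); values are invented.
XML = """
<Root>
  <Model Name="A"><Item><Name>x</Name><Value>1</Value></Item></Model>
  <Model Name="B"><Item><Name>x</Name><Value>2</Value></Item>
                  <Item><Name>y</Name><Value>3</Value></Item></Model>
</Root>
"""

root = ET.fromstring(XML)

# [@Name='B'] filters on the Name ATTRIBUTE of Model.
by_attribute = [v.text for v in root.findall("./Model[@Name='B']/Item/Value")]

# [Name='x'] filters on the text of the Name CHILD ELEMENT of Item.
by_child_text = [v.text for v in root.findall("./Model/Item[Name='x']/Value")]

# Combining both predicates narrows the result to a single node.
combined = [v.text for v in root.findall("./Model[@Name='B']/Item[Name='x']/Value")]

print(by_attribute, by_child_text, combined)
```

The question's attempt used `Model[Name="Standard"]`, which would look for a child element called `Name`; since `Name` is an attribute of `Model`, the `@` prefix is what makes the difference.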

I saved your XML and enhanced it a bit (fixed errors and added some dummy nodes), in order to have a working example.

blob00.xml:

<?xml version="1.0" encoding="UTF-8"?>
<SDDXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <RulesetFilename file="T24N_2022.bin"/>
  <Model Name="Proposed">
    <Proj>
      <!-- Other nodes -->
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">1.618</ProposedTDV>
        <StandardTDV index="0">9.809</StandardTDV>
      </EnergyUse>
    </Proj>
  </Model>
  <!-- Other nodes -->
  <Model Name="Standard">
    <Proj>
      <Name>project0001</Name>
      <DevMode>1</DevMode>
      <BldgEngyModelVersion>16</BldgEngyModelVersion>
      <AnalysisVersion>220070</AnalysisVersion>
      <CreateDate>1650049043</CreateDate>
      <EnergyUse/>
      <!-- Other nodes -->
      <EnergyUse>
        <!-- Only this one should be selected! -->
        <Name>Efficiency Compliance</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">270.095</ProposedTDV>
        <StandardTDV index="0">99.089</StandardTDV>
      </EnergyUse>
      <EnergyUse>
        <Name>Some name that SHOULD NOT MATCH</Name>
        <EnduseName>Efficiency Compliance</EnduseName>
        <ProposedTDV index="0">3.141593</ProposedTDV>
        <StandardTDV index="0">2.718282</StandardTDV>
      </EnergyUse>
    </Proj>
  </Model>
</SDDXML>

code00.py:

#!/usr/bin/env python

from xml.etree import ElementTree as ET
import sys


def main(*argv):
    doc = ET.parse("./blob00.xml")
    root = doc.getroot()
    search_xpath = "./Model[@Name='Standard']/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV"

    # Below are different (less restrictive) filters. Uncomment each one to see the differences
    #search_xpath = "./Model/Proj/EnergyUse[Name='Efficiency Compliance']/ProposedTDV"
    #search_xpath = "./Model[@Name='Standard']/Proj/EnergyUse/ProposedTDV"
    #search_xpath = "./Model/Proj/EnergyUse/ProposedTDV"

    for proposedtdv_node in root.iterfind(search_xpath):
        print("{:}\nText: {:s}".format(proposedtdv_node, proposedtdv_node.text))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q071929246]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" code00.py
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32

<Element 'ProposedTDV' at 0x00000188CBBEE900>
Text: 270.095

Done.
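
If only the number is needed rather than the node, one possible variation is `Element.findtext`, which returns the text of the first match. The sketch below assumes a trimmed inline copy of blob00.xml so that it runs on its own; the helper name `proposed_tdv` is hypothetical and not part of the original answer.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of blob00.xml, inlined so the sketch is self-contained.
XML = """<SDDXML>
  <Model Name="Proposed">
    <Proj>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">1.618</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
  <Model Name="Standard">
    <Proj>
      <EnergyUse/>
      <EnergyUse>
        <Name>Efficiency Compliance</Name>
        <ProposedTDV index="0">270.095</ProposedTDV>
      </EnergyUse>
    </Proj>
  </Model>
</SDDXML>"""


def proposed_tdv(root, model_name, energy_use_name):
    """Hypothetical helper: ProposedTDV as a float, or None if nothing usable matches."""
    xpath = "./Model[@Name='{}']/Proj/EnergyUse[Name='{}']/ProposedTDV".format(
        model_name, energy_use_name)
    # findtext returns None when no element matches and "" when the match has no text.
    text = root.findtext(xpath)
    return float(text) if text else None


root = ET.fromstring(XML)
standard = proposed_tdv(root, "Standard", "Efficiency Compliance")
missing = proposed_tdv(root, "Standard", "No such name")
print(standard, missing)
```

Note that the names are pasted into the XPath string as-is, so this sketch would not cope with a name that itself contains a single quote.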