Retrieving data from XML when the tag includes a type

Posted on 2025-01-16 11:46:31

The following is part of a larger XML. The XML consists of multiple entries like this one. I want to retrieve some data from every entry.

<entry>
   <title>S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</title>
   <link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/$value"/>
   <link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/"/>
   <link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/Products('Quicklook')/$value"/>
   <id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
   <summary>Date: 2022-01-20T10:53:51.024Z, Instrument: MSI, Satellite: Sentinel-2, Size: 991.25 MB</summary>
   <ondemand>false</ondemand>
   <date name="generationdate">2022-01-20T13:17:18Z</date>
   <date name="beginposition">2022-01-20T10:53:51.024Z</date>
   <date name="endposition">2022-01-20T10:53:51.024Z</date>
   <date name="ingestiondate">2022-01-20T15:34:49.421Z</date>
   <int name="orbitnumber">34368</int>
   <int name="relativeorbitnumber">51</int>
   <double name="cloudcoverpercentage">25.768485</double>
   <str name="level1cpdiidentifier">S2A_OPER_MSI_L1C_TL_VGS2_20220120T125348_A034368_T31UFV_N03.01</str>
   <str name="gmlfootprint"><gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"> <gml:outerBoundaryIs> <gml:LinearRing> 
   <gml:coordinates>54.138373318280884,4.530701256365736 54.10530483528773,6.209260084873382 53.119880621881606,6.135320369204087 53.15178549855093,4.495379581512258 54.138373318280884,4.530701256365736</gml:coordinates> </gml:LinearRing> 
   </gml:outerBoundaryIs> </gml:Polygon></str>
   <str name="footprint">MULTIPOLYGON (((6.135320369204087 53.119880621881606, 6.209260084873382 54.10530483528773, 4.530701256365736 54.138373318280884, 4.495379581512258 53.15178549855093, 6.135320369204087 53.119880621881606)))</str>
   <str name="format">SAFE</str>
   <str name="processingbaseline">03.01</str>
   <str name="platformname">Sentinel-2</str>
   <str name="filename">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718.SAFE</str>
   <str name="producttype">S2MSI2A</str>
   <str name="platformidentifier">2015-028A</str>
   <str name="platformserialidentifier">Sentinel-2A</str>
   <str name="processinglevel">Level-2A</str>
   <str name="datastripidentifier">S2A_OPER_MSI_L2A_DS_VGS2_20220120T131718_S20220120T105350_N03.01</str>
   <str name="granuleidentifier">S2A_OPER_MSI_L2A_TL_VGS2_20220120T131718_A034368_T31UFV_N03.01</str>
   <str name="identifier">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</str>
   <str name="uuid">c5525116-f3a4-4738-830e-c68e7e7f0c1c</str>
</entry>

Using this code I'm able to retrieve some parts:

import xml.dom.minidom

doc = xml.dom.minidom.parse(xml_file)
nodes_id = doc.getElementsByTagName("id")

I know the list nodes_id consists of nodes (like <DOM Element: id at 0x13f7d92e5e8>), but I convert these to the actual data later.

I would also like to retrieve the cloudcoverpercentage and producttype values. However, because those values sit in generic tags that are distinguished only by a name attribute (str name=... and double name=...), I haven't been able to figure out how to do this. I tried the following, but neither call seems to work.

nodes_cloudcover = doc.getElementsByTagName("cloudcoverpercentage")
nodes_cloudcover = doc.getElementsByTagName("double name='cloudcoverpercentage'")

Does someone know how I can solve this problem? Thanks in advance!
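
A note on why the two attempts above come back empty: getElementsByTagName matches only on the element's tag name (here double or str), never on its attributes, so neither "cloudcoverpercentage" nor "double name='cloudcoverpercentage'" is a tag name it can find. A minimal sketch that stays with xml.dom.minidom could select by tag name first and then filter on the name attribute. SAMPLE is a trimmed-down copy of the entry above and values_by_name is a hypothetical helper, not part of minidom.

```python
import xml.dom.minidom

# Trimmed-down copy of the entry from the question (assumption: the real
# file has the same element layout).
SAMPLE = """<entry>
  <id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
  <double name="cloudcoverpercentage">25.768485</double>
  <str name="producttype">S2MSI2A</str>
  <str name="format">SAFE</str>
</entry>"""


def values_by_name(doc, tag, name):
    """Hypothetical helper: text of every <tag> whose name attribute equals `name`.

    Assumes the first child of a matching element is a text node.
    """
    return [
        node.firstChild.data
        for node in doc.getElementsByTagName(tag)
        if node.getAttribute("name") == name and node.firstChild is not None
    ]


doc = xml.dom.minidom.parseString(SAMPLE)
cloudcover = values_by_name(doc, "double", "cloudcoverpercentage")
producttype = values_by_name(doc, "str", "producttype")
print(cloudcover, producttype)
```

Both results are lists of strings, one item per matching element in the document, so the cloud cover still needs a float() conversion if it is to be used as a number.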

_畞蕅 2025-01-23 11:46:31

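I'd suggest using a module that supports XPath, which makes this task much easier. The standard library's xml.etree.ElementTree supports a limited XPath subset that is enough here.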

import xml.etree.ElementTree as ET


xml_file = "xml_imp.xml"

tree = ET.parse(xml_file)
root = tree.getroot()
cloud = root.find(".//double[@name='cloudcoverpercentage']")
# this reads as: find the first "double" element, at any depth, whose "name" attribute equals "cloudcoverpercentage"
product = root.find(".//str[@name='producttype']")


print(cloud.text)
print(product.text)

Output:

25.768485
S2MSI2A
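
Note that find() returns only the first match, while the question mentions a file with many entries. A minimal sketch of collecting the values per entry, under some assumptions: the <feed> wrapper and the second entry in SAMPLE are made up for illustration, extract_entries is a hypothetical helper, and the elements are assumed not to be in a default XML namespace (if the real file declares one, the paths would need namespace-qualified names).

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry document; the <feed> wrapper and the second
# entry's values are invented for illustration only.
SAMPLE = """<feed>
  <entry>
    <id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
    <double name="cloudcoverpercentage">25.768485</double>
    <str name="producttype">S2MSI2A</str>
  </entry>
  <entry>
    <id>00000000-0000-0000-0000-000000000000</id>
    <double name="cloudcoverpercentage">3.5</double>
    <str name="producttype">S2MSI2A</str>
  </entry>
</feed>"""


def extract_entries(root):
    """Hypothetical helper: one dict per <entry> with id, cloud cover and product type."""
    records = []
    for entry in root.iter("entry"):
        # Paths are relative to the current entry, so values never leak
        # across entries. findtext() returns None when nothing matches.
        cloud_text = entry.findtext("double[@name='cloudcoverpercentage']")
        records.append({
            "id": entry.findtext("id"),
            "cloudcover": float(cloud_text) if cloud_text is not None else None,
            "producttype": entry.findtext("str[@name='producttype']"),
        })
    return records


root = ET.fromstring(SAMPLE)
records = extract_entries(root)
for record in records:
    print(record)
```

Searching relative to each entry, rather than calling findall() on the root three times, keeps the id, cloud cover and product type of one product together even if some entry lacks one of the fields.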
