Retrieving data from XML when the tag contains a type
The following is part of a larger XML file. The file consists of multiple entries like this one, and I want to retrieve some data from every entry.
<entry>
<title>S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</title>
<link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/$value"/>
<link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/"/>
<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/Products('Quicklook')/$value"/>
<id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
<summary>Date: 2022-01-20T10:53:51.024Z, Instrument: MSI, Satellite: Sentinel-2, Size: 991.25 MB</summary>
<ondemand>false</ondemand>
<date name="generationdate">2022-01-20T13:17:18Z</date>
<date name="beginposition">2022-01-20T10:53:51.024Z</date>
<date name="endposition">2022-01-20T10:53:51.024Z</date>
<date name="ingestiondate">2022-01-20T15:34:49.421Z</date>
<int name="orbitnumber">34368</int>
<int name="relativeorbitnumber">51</int>
<double name="cloudcoverpercentage">25.768485</double>
<str name="level1cpdiidentifier">S2A_OPER_MSI_L1C_TL_VGS2_20220120T125348_A034368_T31UFV_N03.01</str>
<str name="gmlfootprint"><gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"> <gml:outerBoundaryIs> <gml:LinearRing>
<gml:coordinates>54.138373318280884,4.530701256365736 54.10530483528773,6.209260084873382 53.119880621881606,6.135320369204087 53.15178549855093,4.495379581512258 54.138373318280884,4.530701256365736</gml:coordinates> </gml:LinearRing>
</gml:outerBoundaryIs> </gml:Polygon></str>
<str name="footprint">MULTIPOLYGON (((6.135320369204087 53.119880621881606, 6.209260084873382 54.10530483528773, 4.530701256365736 54.138373318280884, 4.495379581512258 53.15178549855093, 6.135320369204087 53.119880621881606)))</str>
<str name="format">SAFE</str>
<str name="processingbaseline">03.01</str>
<str name="platformname">Sentinel-2</str>
<str name="filename">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718.SAFE</str>
<str name="producttype">S2MSI2A</str>
<str name="platformidentifier">2015-028A</str>
<str name="platformserialidentifier">Sentinel-2A</str>
<str name="processinglevel">Level-2A</str>
<str name="datastripidentifier">S2A_OPER_MSI_L2A_DS_VGS2_20220120T131718_S20220120T105350_N03.01</str>
<str name="granuleidentifier">S2A_OPER_MSI_L2A_TL_VGS2_20220120T131718_A034368_T31UFV_N03.01</str>
<str name="identifier">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</str>
<str name="uuid">c5525116-f3a4-4738-830e-c68e7e7f0c1c</str>
</entry>
Using this code I'm able to retrieve some parts:
import xml.dom.minidom
doc = xml.dom.minidom.parse(xml_file)  # xml_file: path to the XML file
nodes_id = doc.getElementsByTagName("id")  # every <id> element in the document
I know the list nodes_id consists of nodes (like <DOM Element: id at 0x13f7d92e5e8>), but I convert these to the actual data later.
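For the node-to-data conversion mentioned above, a minimal sketch: with xml.dom.minidom, an element's text is held in a child Text node, so firstChild.data returns the value. The SAMPLE string is a hypothetical stand-in for the real file.

```python
import xml.dom.minidom

# Hypothetical stand-in for the real file; in practice use xml.dom.minidom.parse(xml_file).
SAMPLE = "<feed><entry><id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id></entry></feed>"

doc = xml.dom.minidom.parseString(SAMPLE)
nodes_id = doc.getElementsByTagName("id")
# An element's text is stored in a child Text node; firstChild.data reads it.
ids = [node.firstChild.data for node in nodes_id if node.firstChild is not None]
print(ids)  # ['c5525116-f3a4-4738-830e-c68e7e7f0c1c']
```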
I would also like to retrieve cloudcoverpercentage and producttype. However, because of the str name= and double name= in the tags, I haven't been able to figure out how to do this. I tried the following, but neither attempt seems to work:
nodes_cloudcover = doc.getElementsByTagName("cloudcoverpercentage")  # empty: no element has this tag name
nodes_cloudcover = doc.getElementsByTagName("double name='cloudcoverpercentage'")  # empty: attributes can't be matched this way
Does someone know how I can solve this problem? Thanks in advance!
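A minimal sketch that stays with xml.dom.minidom: getElementsByTagName only matches the tag name (double, str), not the name attribute, so one workable approach is to fetch every element with that tag inside each entry and keep the one whose getAttribute("name") matches. The helper names text_of, find_named and extract are hypothetical, and a field missing from an entry comes back as None.

```python
import xml.dom.minidom


def text_of(node):
    """Concatenate the direct Text children of a DOM element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def find_named(entry, tag, name):
    """Return the text of the first <tag name="..."> below entry whose name matches, else None."""
    for node in entry.getElementsByTagName(tag):
        # getAttribute returns "" when the attribute is absent, so no exception here
        if node.getAttribute("name") == name:
            return text_of(node)
    return None


def extract(doc):
    """Collect id, cloudcoverpercentage and producttype from every <entry>."""
    rows = []
    for entry in doc.getElementsByTagName("entry"):
        ids = entry.getElementsByTagName("id")
        cloud = find_named(entry, "double", "cloudcoverpercentage")
        rows.append({
            "id": text_of(ids[0]) if ids else None,
            "cloudcoverpercentage": float(cloud) if cloud is not None else None,
            "producttype": find_named(entry, "str", "producttype"),
        })
    return rows
```

With the file from the question, this would be called as rows = extract(xml.dom.minidom.parse(xml_file)).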
Answer:
I'd suggest using a module that supports XPath, which makes this task much easier.
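A sketch of the XPath approach using the standard library's xml.etree.ElementTree, which supports a limited XPath subset that includes [@name='...'] attribute predicates. The answer does not name the module it had in mind, so this choice is an assumption; lxml, whose elements offer a full .xpath() method, is a common alternative. The {*} wildcard (Python 3.8+) matches a tag in any namespace or none, which matters if the full feed declares a default namespace. FIELDS and extract_entries are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Relative paths evaluated against each <entry>.
# "{*}" matches the tag in any namespace or none (Python 3.8+).
FIELDS = {
    "id": "{*}id",
    "cloudcoverpercentage": "{*}double[@name='cloudcoverpercentage']",
    "producttype": "{*}str[@name='producttype']",
}


def extract_entries(root):
    """Return one dict per <entry> below root; missing fields become None."""
    rows = []
    for entry in root.findall(".//{*}entry"):
        # findtext returns the first match's text, or None if nothing matches
        row = {key: entry.findtext(path) for key, path in FIELDS.items()}
        if row["cloudcoverpercentage"] is not None:
            row["cloudcoverpercentage"] = float(row["cloudcoverpercentage"])
        rows.append(row)
    return rows
```

Usage would be rows = extract_entries(ET.parse(xml_file).getroot()). The predicate selects on the name attribute, so the generic str and double tags are no longer an obstacle.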