How to iterate over XML with Python to test whether a child node exists (using xml.dom.minidom)

Posted on 2024-10-30 21:33:28

I am using Python, and xml.dom.minidom, to iterate over an exported Excel Spreadsheet, outputting an HTML table for our dining hall menu with various calls to .write. The difficulty lies in that the XML that Excel outputs isn't structured. To compensate for this, I have set up a number of variables (day, previousDay, meal etc.) that get set when I encounter child nodes that have a nodeValue that I am testing against. I have a bunch of if statements to determine when to start a new table (for each day of the week), or a new row (when day != previousDay) and so on.

I am having difficulty figuring out how to ignore particular nodes, though. A handful of the nodes that Excel outputs need to be ignored, and I can identify them by their child nodes having particular values, but I can't figure out how to implement it.

Basically, I need the following if statement in my main for loop:

for node in dome.getElementsByTagName('data'):  
    if node contains childNode with nodeValue == 'test':
        do something

Comments (3)

时常饿 2024-11-06 21:33:29

My quick inclination is to have a nested for-loop with a get-out-of-node-free-card (um, exception) like the following.

class BadNodeException(Exception):
    pass

for node in dome.getElementsByTagName('data'):
    try:
        for child in node.childNodes:
            # nodeValue is only set on text-like nodes; it is None for elements
            if child.nodeValue == 'test':
                raise BadNodeException
        ## process node as normal
    except BadNodeException:
        # a child matched 'test', so skip this node
        pass
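
One caveat with the loop above: in minidom an element node's nodeValue is None, so the comparison only matches when 'test' sits in a text node directly under the data element. Below is a minimal sketch of the same skip logic using a helper and `continue` instead of an exception. The sample XML and the name `has_test_child` are made up for illustration; the real Excel export will look different.

```python
from xml.dom import minidom

# Hypothetical sample; the real Excel export will look different.
SAMPLE = "<rows><data>test</data><data>Monday</data><data><b>test</b></data></rows>"


def has_test_child(node, value="test"):
    """Return True if any direct child of node has nodeValue == value."""
    return any(child.nodeValue == value for child in node.childNodes)


dome = minidom.parseString(SAMPLE)
kept = []
for node in dome.getElementsByTagName("data"):
    if has_test_child(node):
        continue  # skip this node, as the exception did above
    kept.append(node.toxml())

print(kept)
```

Note that the third data element is kept: its child is a `b` element, whose nodeValue is None, so the text 'test' one level further down is never compared.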
暮倦 2024-11-06 21:33:29

Do you have to use xml.dom.minidom? Because this is the kind of thing that XPath shines at. Using lxml.etree, for instance, this finds all of the elements you want:

my_elements = document.xpath("//data[not(*[.='test'])]")

The W3C's DOM is really hard to use for real-world problems, because it doesn't include simple things like a property that returns an element's text value. (XPath defines an element's string-value as the concatenation of all of its descendant text nodes, which is why the above pattern works.)

You'll need to implement a helper function for that sort of thing, e.g.:

from xml.dom import Node

def element_text(e):
    return "".join(t.nodeValue for t in e.childNodes if t.nodeType == Node.TEXT_NODE)

This makes it easier to build a filter function, e.g.:

def element_is_of_interest(e):
    return not any(element_text(c) == "test" for c in e.childNodes)

and get your elements like this:

# list() because filter() returns a lazy iterator in Python 3
my_elements = list(filter(element_is_of_interest, d.getElementsByTagName("data")))
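
To see the helpers working together, here is a small end-to-end sketch. The sample document and the `cell` element name are assumptions made for illustration, not the actual Excel export; the two helper functions follow the ones in this answer.

```python
from xml.dom import Node, minidom

# Hypothetical sample: each <data> row holds child elements with text.
SAMPLE = (
    "<sheet>"
    "<data><cell>Monday</cell><cell>Lunch</cell></data>"
    "<data><cell>test</cell><cell>ignored</cell></data>"
    "<data><cell>Tuesday</cell><cell>Dinner</cell></data>"
    "</sheet>"
)


def element_text(e):
    # Concatenate only the direct text-node children of e.
    return "".join(t.nodeValue for t in e.childNodes if t.nodeType == Node.TEXT_NODE)


def element_is_of_interest(e):
    # Keep e unless one of its children has the text "test".
    return not any(element_text(c) == "test" for c in e.childNodes)


d = minidom.parseString(SAMPLE)
my_elements = [e for e in d.getElementsByTagName("data") if element_is_of_interest(e)]

# Pull the cell texts out of every row that survived the filter.
rows = [
    [element_text(c) for c in e.childNodes if c.nodeType == Node.ELEMENT_NODE]
    for e in my_elements
]
print(rows)
```

Unlike the XPath string-value, `element_text` only looks at direct text children, so text nested more deeply inside a cell would need a recursive variant.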
风轻花落早 2024-11-06 21:33:29

Have you considered using a SAX parser instead? SAX parsers process the XML tree structure in the order in which the nodes appear (depth first) and allow you to handle each node's value at the point where it is parsed.

xml.sax.xmlreader.XMLReader
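
As a rough illustration of the SAX approach, the sketch below uses the standard library's `xml.sax.ContentHandler` to buffer the children of each data element and drop the element when one of them has the text 'test'. The class name `DataFilter`, the `cell` element name, and the sample input are hypothetical, and the handler assumes the children of data are one level deep.

```python
import xml.sax

# Hypothetical sample input; the real Excel export will look different.
SAMPLE = (
    b"<sheet>"
    b"<data><cell>Monday</cell><cell>Lunch</cell></data>"
    b"<data><cell>test</cell><cell>x</cell></data>"
    b"</sheet>"
)


class DataFilter(xml.sax.ContentHandler):
    """Collect child texts of each <data>, skipping rows with a 'test' child."""

    def __init__(self):
        super().__init__()
        self.rows = []      # kept rows, each a list of child texts
        self._cells = None  # child texts of the <data> currently being read
        self._buf = None    # text pieces of the current child element

    def startElement(self, name, attrs):
        if name == "data":
            self._cells = []
        elif self._cells is not None:
            self._buf = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer them.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "data":
            if "test" not in self._cells:
                self.rows.append(self._cells)
            self._cells = None
        elif self._buf is not None:
            self._cells.append("".join(self._buf))
            self._buf = None


handler = DataFilter()
xml.sax.parseString(SAMPLE, handler)
print(handler.rows)
```

Because the decision to skip is made in `endElement`, each row is only emitted once all of its children have been seen, which is what makes the "ignore this node" test possible in a streaming parser.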
