python lxml etree.iterparse: check whether the current element complies with an XPath

Posted 2025-02-13 04:27:37


I would like to read a rather large XML file as a stream, but I could not find any way to reuse my old XPath expressions to find elements.
Previously the files were of moderate size, so this was enough:

import lxml.etree as etree

root = etree.parse(file).getroot()  # parse the document once, not once per XPath
all_elements = []
for xpath in list_of_xpathes:
    all_elements.append(root.findall(xpath))

Now I am struggling with iterparse. Ideally, the solution would compare the path of the current element with the desired XPath:

import lxml.etree as et

xml_file = r"my.xml"  # a very large XML file to read
xml_paths = ['/some/arbitrary/xpath', '/another/xpath']

all_elements = []
context = et.iterparse(xml_file, events=('end',))  # avoid shadowing the built-in iter()
for event, element in context:
    for xpath in xml_paths:
        # element_complies_with_xpath is the function I am asking about
        if element_complies_with_xpath(element, xpath):
            all_elements.append(element)
            break

How can the element_complies_with_xpath function be implemented with lxml?
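One possible way to implement element_complies_with_xpath directly, offered as a sketch rather than as the answer's method: rebuild the current element's absolute path from its ancestors and compare it with the wanted XPath as a plain string. This assumes every XPath is a simple absolute child path such as /some/arbitrary/xpath, with no predicates, wildcards, // steps or namespaces; element_path is a hypothetical helper name, and the sample document stands in for the real file.

```python
import io
import lxml.etree as et


def element_path(element):
    """Build a path like '/some/arbitrary/xpath' from the element's tags.

    Hypothetical helper: assumes no namespaces (namespaced tags look like
    '{uri}name') and ignores sibling positions such as [2].
    """
    # iterancestors() yields the parent first, then up to the root
    tags = [element.tag] + [ancestor.tag for ancestor in element.iterancestors()]
    return "/" + "/".join(reversed(tags))


def element_complies_with_xpath(element, xpath):
    # Plain string comparison: only valid for simple absolute child paths
    return element_path(element) == xpath


# Small in-memory document standing in for the large file
xml = b"<some><arbitrary><xpath>1</xpath><other/></arbitrary><another><xpath>2</xpath></another></some>"
xml_paths = ["/some/arbitrary/xpath", "/some/another/xpath"]

found = []
for event, element in et.iterparse(io.BytesIO(xml), events=("end",)):
    for xpath in xml_paths:
        if element_complies_with_xpath(element, xpath):
            found.append(element.text)
            break

print(found)  # ['1', '2']
```

Walking the ancestors costs O(depth) per element; with many XPaths, putting them in a set and testing element_path(element) against it would avoid the inner loop.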


烟雨凡馨 2025-02-20 04:27:37


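If the first step of each XPath can be extracted, the rest can be tested as shown below. Instead of a list of strings, use dicts of <first element name>: <rest of the XPath, relative to that element>. A parent element could also be used as the dict key, e.g. {'arbitrary': './xpath'}.
Full XPath: /some/arbitrary/xpath
dict: {'some': './arbitrary/xpath'}
Note that the loop collects the key element itself (here some) whenever the relative XPath matches at least one node below it, not the matched descendants.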
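The extraction step could be sketched like this, assuming the original XPaths are plain absolute paths such as /some/arbitrary/xpath (no predicates, wildcards or // steps); split_xpath is a hypothetical helper name. Keeping a list of one-entry dicts, as the answer does, matters here: real absolute paths usually share the same root element, so a single dict keyed by that name would keep only one of them.

```python
def split_xpath(xpath):
    """Split '/some/arbitrary/xpath' into {'some': './arbitrary/xpath'}.

    Hypothetical helper: assumes a plain absolute path without predicates,
    wildcards or '//' steps.
    """
    first, _sep, rest = xpath.lstrip("/").partition("/")
    # A single-step path such as '/some' just tests the element itself
    relative = ("./" + rest) if rest else "."
    return {first: relative}


xml_paths = [split_xpath(p) for p in ["/some/arbitrary/xpath", "/another/xpath"]]
print(xml_paths)  # [{'some': './arbitrary/xpath'}, {'another': './xpath'}]
```

Because the loop only compares tags, a some element at any depth is tested, not only the document root.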

import lxml.etree as et

def element_complies_with_xpath(element, xpath):
    # Evaluate the XPath relative to the current element
    children = element.xpath(xpath)
    print(["child:" + x.tag for x in children])
    return len(children) > 0

xml_file = r"/home/lmc/tmp/test.xml"  # the large XML file to read
# Each dict maps an element tag to an XPath relative to that element
xml_paths = [{'membership': './users/user'}, {'entry': 'author/name'}]

all_elements = []
iter1 = et.iterparse(xml_file, events=('end',))

for event, element in iter1:
    for d in xml_paths:
        # At the 'end' event the element's subtree is fully parsed,
        # so a relative XPath into its descendants can be evaluated
        if element.tag in d and element_complies_with_xpath(element, d[element.tag]):
            all_elements.append(element)
            break

print([x.tag for x in all_elements])
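One caveat, since the point is to handle a big file: iterparse still builds the whole tree as it goes, and nothing in the loop above releases it, so memory use grows with the file. A common lxml pattern is to clear each element once it has been checked and delete its already-processed preceding siblings. The sketch below (stream_matches is a hypothetical name) assumes the key elements are never nested inside one another, and it copies out the matched text before clearing, because a cleared element loses its children.

```python
import io
import lxml.etree as et


def stream_matches(source, path_map):
    """Yield (tag, texts) for each key element whose relative XPath matches.

    Hypothetical helper. path_map uses the answer's format
    {tag: relative_xpath}; assumes key elements are not nested.
    """
    for _event, element in et.iterparse(source, events=("end",)):
        rel_xpath = path_map.get(element.tag)
        if rel_xpath is None:
            continue  # not a key element; it is freed with its key ancestor
        matches = element.xpath(rel_xpath)
        if matches:
            # Extract what is needed before the subtree is cleared
            yield element.tag, [m.text for m in matches]
        # Free this subtree and any earlier siblings already processed
        element.clear()
        while element.getprevious() is not None:
            del element.getparent()[0]


xml = (b"<root><membership><users><user>ann</user><user>bob</user></users></membership>"
       b"<entry><author><name>Zed</name></author></entry><entry><title>x</title></entry></root>")
path_map = {"membership": "./users/user", "entry": "author/name"}
result = list(stream_matches(io.BytesIO(xml), path_map))
print(result)  # [('membership', ['ann', 'bob']), ('entry', ['Zed'])]
```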

The XPath count() function can also be used; the expression then evaluates to a boolean instead of a node list:

def element_complies_with_xpath(element, xpath):
    # For a boolean expression such as 'count(...) > 0', lxml returns True or False
    result = element.xpath(xpath)
    print(f"child exists: {result}")
    return result

xml_file = r"/home/luis/tmp/test.xml"  # the large XML file to read
xml_paths = [{'membership': 'count(./users/user) > 0'}, {'entry': 'count(author/name) > 0'}]
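A quick self-contained check of what the count() variant returns, using a small in-memory sample instead of the answerer's file: because the expression is boolean, lxml's xpath() gives back a Python bool rather than a node list, which is why the function can return the result directly.

```python
import io
import lxml.etree as et

xml = b"<root><entry><author><name>Zed</name></author></entry><entry/></root>"

flags = []
for _event, element in et.iterparse(io.BytesIO(xml), events=("end",)):
    if element.tag == "entry":
        # A boolean XPath expression makes lxml return a Python bool
        flags.append(element.xpath("count(author/name) > 0"))

print(flags)  # [True, False]
```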
