在 Python 中从 amara 切换到 lxml

发布于 2024-10-04 05:42:27 字数 1029 浏览 6 评论 0原文

我正在尝试使用 lxml 库完成这样的事情： http://www.xml.com/pub/a/2005 /01/19/amara.html

from amara import binderytools

container = binderytools.bind_file('labels.xml')
for l in container.labels.label:
    print l.name, 'of', l.address.city

但我经历了最难让我感觉湿透的时候！我想要做的是：下降到名为“X”的根节点，然后下降到名为“Y”的第二个子节点，然后获取其所有子节点“名为 Z”，然后仅保留具有属性的子节点将“name”设置为“bacon”，然后对于每个剩余节点，查看其所有名为“W”的子节点，并仅保留基于某个过滤器的子集，该子集查看 W 的唯一名为 A、B 和 C 的子节点。然后我需要使用以下（未优化的）伪代码来处理它们：

result = []
X = root(doc(parse(xml_file_name)))
Y = X[1] # Second child
Zs = Y.children()
for Z in Zs:
    if Z.name != 'bacon': continue # skip
    Ws = Z.children()
    record = []
    assert(len(Ws) == 9)
    W0 = Ws[0]
    assert(W0.A == '42')
    record.append(str(W0.A) + " " + W0.B + " " + W0.C))
    ...
    W1 = Ws[1]
    assert(W1.A == '256')
    ...
    result.append(record)

这就是我想要完成的任务。在尝试使代码更清晰之前，我想让它工作。

请帮忙，因为我迷失在这个 API 中。如果您有疑问，请告诉我。

原文

I am trying to accomplish with lxml library something like this:
http://www.xml.com/pub/a/2005/01/19/amara.html

from amara import binderytools

container = binderytools.bind_file('labels.xml')
for l in container.labels.label:
    print l.name, 'of', l.address.city

but I have had the hardest time to get my feel wet! What I want to do is: descend to the root node named 'X', then descend to its second child named 'Y', then grab all of its children 'named Z', then of those keep only the children than have an attribute 'name' set to 'bacon', then for each remaining node look at all of its children named 'W', and keep only a subset based on some filter, which looks at W's only children named A, B, and C. Then I need to process them with the following (non-optimized) pseudo-code:

result = []
X = root(doc(parse(xml_file_name)))
Y = X[1] # Second child
Zs = Y.children()
for Z in Zs:
    if Z.name != 'bacon': continue # skip
    Ws = Z.children()
    record = []
    assert(len(Ws) == 9)
    W0 = Ws[0]
    assert(W0.A == '42')
    record.append(str(W0.A) + " " + W0.B + " " + W0.C))
    ...
    W1 = Ws[1]
    assert(W1.A == '256')
    ...
    result.append(record)

This is sort of what I am trying to accomplish. Before I try to make this code cleaner, I would like to make it work.

Please help, as I am lost in this API. Let me know if you have questions.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

路弥 2024-10-11 05:42:27

import lxml.etree as le
import io

content='''\
<foo><X><Y>skip this</Y><Y><Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
</Y></X></foo>
'''
doc=le.parse(io.BytesIO(content))
# print(le.tostring(doc, pretty_print=True))
result=[]
Zs=doc.xpath('//X/Y[2]/Z[@name="bacon"]')
for Z in Zs:
    Ws=Z.xpath('W')
    record=[]
    assert(len(Ws)==2)  #<--- Change to 9        
    abc=Ws[0].xpath('descendant::text()')
    # print(abc)
    # ['42', 'b', 'c']
    assert(abc[0] == '42')
    record.append(' '.join(abc))
    abc=Ws[1].xpath('descendant::text()')    
    assert(abc[0] == '256')
    result.append(record)
print(result)
# [['42 b c'], ['42 b c']]

这可能是一种收紧内部循环的方法，尽管我只是猜测您希望保留哪些记录：

for Z in Zs:
    Ws=Z.xpath('W')
    assert(len(Ws)==2)  #<--- Change to 9
    a_vals=('42','256')
    for W,a_val in zip(Ws,a_vals):
        abc=W.xpath('descendant::text()')
        assert(abc[0] == a_val)
        result.append([' '.join(abc)])
print(result)
# [['42 b c'], ['256 b c'], ['42 b c'], ['256 b c']]

import lxml.etree as le
import io

content='''\
<foo><X><Y>skip this</Y><Y><Z name="apple"><W>not here</W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
<Z name="bacon"><W><A>42</A><B>b</B><C>c</C></W><W><A>256</A><B>b</B><C>c</C></W></Z>
</Y></X></foo>
'''
doc=le.parse(io.BytesIO(content))
# print(le.tostring(doc, pretty_print=True))
result=[]
Zs=doc.xpath('//X/Y[2]/Z[@name="bacon"]')
for Z in Zs:
    Ws=Z.xpath('W')
    record=[]
    assert(len(Ws)==2)  #<--- Change to 9        
    abc=Ws[0].xpath('descendant::text()')
    # print(abc)
    # ['42', 'b', 'c']
    assert(abc[0] == '42')
    record.append(' '.join(abc))
    abc=Ws[1].xpath('descendant::text()')    
    assert(abc[0] == '256')
    result.append(record)
print(result)
# [['42 b c'], ['42 b c']]

This might be a way to tighten-up the inner loop, though I'm only guessing what records you wish to keep:

for Z in Zs:
    Ws=Z.xpath('W')
    assert(len(Ws)==2)  #<--- Change to 9
    a_vals=('42','256')
    for W,a_val in zip(Ws,a_vals):
        abc=W.xpath('descendant::text()')
        assert(abc[0] == a_val)
        result.append([' '.join(abc)])
print(result)
# [['42 b c'], ['256 b c'], ['42 b c'], ['256 b c']]

回复收藏 0 原文

~没有更多了~