Switching from amara to lxml in Python
I am trying to accomplish something like this with the lxml library:
http://www.xml.com/pub/a/2005/01/19/amara.html
from amara import binderytools
container = binderytools.bind_file('labels.xml')
for l in container.labels.label:
    print l.name, 'of', l.address.city
But I have had the hardest time getting my feet wet! What I want to do is: descend to the root node named 'X', then descend to its second child named 'Y', then grab all of its children named 'Z', and of those keep only the ones that have an attribute 'name' set to 'bacon'. Then, for each remaining node, look at all of its children named 'W' and keep only a subset based on some filter, which looks at W's only children, named A, B, and C. Then I need to process them with the following (non-optimized) pseudo-code:
result = []
X = root(doc(parse(xml_file_name)))
Y = X[1]  # Second child
Zs = Y.children()
for Z in Zs:
    if Z.name != 'bacon': continue  # skip
    Ws = Z.children()
    record = []
    assert(len(Ws) == 9)
    W0 = Ws[0]
    assert(W0.A == '42')
    record.append(str(W0.A) + " " + W0.B + " " + W0.C)
    ...
    W1 = Ws[1]
    assert(W1.A == '256')
    ...
    result.append(record)
This is sort of what I am trying to accomplish. Before I try to make this code cleaner, I would like to make it work.
Please help, as I am lost in this API. Let me know if you have questions.
Comments (1)
This might be a way to tighten up the inner loop, though I'm only guessing which records you wish to keep:
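The answer's own code did not survive on this page, so what follows is only a minimal sketch of the traversal described in the question, not the original answer. The sample document, the `keep` filter, and the `collect_records` helper are all hypothetical; the question never says how the W nodes should be filtered, so the filter here simply requires that A, B, and C are all present. The sketch sticks to the part of the element API that lxml shares with the standard library's ElementTree (`fromstring`, indexing, `findall`, `findtext`), which is why it can fall back to the latter if lxml is not installed.

```python
try:
    from lxml import etree  # the library the question asks about
except ImportError:
    # Fallback: the calls used below exist in both libraries.
    import xml.etree.ElementTree as etree

# Hypothetical sample document; the real file layout is not shown in the question.
SAMPLE_XML = """
<X>
  <Y/>
  <Y>
    <Z name="bacon">
      <W><A>42</A><B>foo</B><C>bar</C></W>
      <W><A>256</A><B>baz</B><C>qux</C></W>
    </Z>
    <Z name="eggs">
      <W><A>1</A><B>skip</B><C>me</C></W>
    </Z>
  </Y>
</X>
"""


def keep(w):
    """Hypothetical filter: keep a W only if it has A, B and C children."""
    return all(w.find(tag) is not None for tag in ("A", "B", "C"))


def collect_records(root):
    """Build one record (a list of 'A B C' strings) per Z named 'bacon'."""
    result = []
    y = root[1]  # second child of the root X, as in the pseudo-code
    # The path predicate selects only Z children whose name attribute is 'bacon'.
    for z in y.findall("Z[@name='bacon']"):
        record = [
            " ".join(w.findtext(tag) for tag in ("A", "B", "C"))
            for w in z.findall("W")
            if keep(w)
        ]
        result.append(record)
    return result


root = etree.fromstring(SAMPLE_XML.strip())
records = collect_records(root)
print(records)  # [['42 foo bar', '256 baz qux']]
```

To run this against a file rather than a string, `etree.parse(xml_file_name).getroot()` should give the same kind of root element. The per-position assertions from the pseudo-code (`len(Ws) == 9`, `W0.A == '42'`, and so on) are left out because they depend on data not shown here.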