Does anyone have an example of using element.sourceline in lxml.html?

Posted 2024-09-15 21:54:52

I hope I asked that correctly. I am trying to figure out what element.sourceline does and whether there is some way I can use it. I have tried building my elements from the HTML in a number of ways, but every time I iterate through my elements and ask for sourceline I get None. When I tried the built-in help I didn't get anything either.

I have Googled for an example but haven't found one yet.

I know it belongs to elements, not trees, but the code below is the best I have been able to come up with.

In response to Jim Garrison's request for an example:

from lxml import html

theTree = html.parse(open(r'c:\temp\testlxml.htm'))
check_source = []
the_elements = [(e, e.sourceline) for e in theTree.iter()]  # trying to get the sourceline
for each in the_elements:
    if each[1] is not None:
        check_source.append(each)

When I run this, len(check_source) == 0.

My htm file has 19,379 lines, so I am not sure you want to see it.

I tried one solution:

>>> myroot=html.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 12bb730>, None), (<Element foo at 12bb650>, None)]

When I do the same thing with etree, I get what was demonstrated:

>>> myroot=etree.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 36a6b70>, 1), (<Element foo at 277b4e0>, 2)]

But my source htm is so messy that I can't use etree to explore the tree; I get an error.

[浮城] 2024-09-22 21:54:52

sourceline will return the line number determined at the time of parsing a document. So it won't apply to an Element that was added through the API. For example:

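from lxml import etree

xml = '<doc>\n<foo>rain in spain</foo>\n</doc>'
root = etree.fromstring(xml)

# Parsed from the source text, so a line number is recorded
print(root.find('foo').sourceline)  # 2

# Created through the API, so there is no source line to report
root.append(etree.Element('bar'))
print(etree.tostring(root))
print(root.find('bar').sourceline)  # None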

I'm pretty sure the same applies to lxml.html.
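
For the messy-HTML case in the question, one thing that may be worth trying is to stay with etree but hand it an HTML parser, which recovers from malformed markup instead of raising. The sketch below is only an illustration: the helper name `elements_with_lines` and the `messy` sample string are made up for this example, and whether the HTML parser fills in line numbers may depend on the installed lxml/libxml2 version (the None results in the question suggest it did not in that environment), so no output is claimed for the HTML part.

```python
from lxml import etree


def elements_with_lines(root):
    """Return (tag, sourceline) pairs for elements that carry a line number.

    Hypothetical helper for illustration; not part of lxml.
    """
    # tag=etree.Element skips comments and processing instructions.
    return [(el.tag, el.sourceline)
            for el in root.iter(tag=etree.Element)
            if el.sourceline is not None]


# Well-formed XML, as in the answer above.
xml = '<doc>\n<foo>rain in spain</foo>\n</doc>'
root = etree.fromstring(xml)
root.append(etree.Element('bar'))  # added via the API, so it has no line number
pairs = elements_with_lines(root)
print(pairs)  # 'bar' is filtered out because its sourceline is None

# Made-up messy HTML; etree.HTMLParser tolerates the unclosed tags.
messy = '<html>\n<body>\n<p>unclosed paragraph\n<div>text</body>'
html_root = etree.fromstring(messy, etree.HTMLParser())
print(elements_with_lines(html_root))  # may be empty on older versions
```

If the second list comes back empty, the parser in use is not recording line numbers for HTML input, which would match the behaviour described in the question.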
