Does anyone have an example of using element.sourceline from lxml.html?
I hope I asked that correctly. I am trying to figure out what element.sourceline does and whether there is some way I can use it. I have tried building my elements from the HTML a number of ways, but every time I iterate through my elements and ask for sourceline I always get None. When I tried the built-in help I didn't get anything either.
I have Googled for an example but haven't found one yet.
I know sourceline is an attribute of elements, not of trees, but iterating over the tree's elements is the best I have been able to come up with.
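In response to Jim Garrison's request for an example:

```python
from lxml import html

theTree = html.parse(open(r'c:\temp\testlxml.htm'))
check_source = []  # was referenced without being created
# pair every element with its sourceline
the_elements = [(e, e.sourceline) for e in theTree.iter()]
for each in the_elements:
    if each[1] is not None:  # keep only elements that have a line number
        check_source.append(each)
```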
When I run this, len(check_source) == 0.
My .htm file has 19,379 lines, so I am not sure you want to see it.
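To see quickly whether a parser recorded any line numbers at all, the loop above can be wrapped in a small function that counts elements with and without a `sourceline`. This is only a sketch: `line_coverage` is a hypothetical helper name, and the in-memory `io.BytesIO` document is an invented stand-in for the real 19,379-line file.

```python
import io
from lxml import etree

def line_coverage(tree):
    """Return (with_line, without_line) counts for the nodes of a parsed tree.

    sourceline is an attribute, not a method; it is None when no line
    number was recorded for that node.
    """
    with_line = without_line = 0
    for element in tree.iter():
        if element.sourceline is not None:
            with_line += 1
        else:
            without_line += 1
    return with_line, without_line

# Hypothetical stand-in for open(r'c:\temp\testlxml.htm')
source = io.BytesIO(b"<doc>\n<foo>bar</foo>\n<baz/>\n</doc>")
tree = etree.parse(source)
print(line_coverage(tree))  # (3, 0)
```

If the first number comes back as 0 for the real file, the parser is not recording line numbers, and filtering the elements afterwards cannot recover them.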
I tried one solution:

```python
>>> myroot=html.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 12bb730>, None), (<Element foo at 12bb650>, None)]
```
When I do the same thing with etree, I get what was demonstrated:

```python
>>> myroot=etree.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 36a6b70>, 1), (<Element foo at 277b4e0>, 2)]
```
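The interactive session above relies on a variable `xml` whose contents are not shown. Assuming it held something like the short document below (a hypothetical reconstruction, chosen only so that `<doc>` starts on line 1 and `<foo>` on line 2, matching the output), the etree result can be reproduced as a standalone script. Printing `e.tag` instead of the element object keeps the output free of memory addresses.

```python
from lxml import etree

# Hypothetical input: <doc> starts on line 1, <foo> on line 2
xml = "<doc>\n<foo>bar</foo>\n</doc>"

myroot = etree.fromstring(xml)
elementlines = [(e.tag, e.sourceline) for e in myroot.iter()]
print(elementlines)  # [('doc', 1), ('foo', 2)]
```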
But my source .htm is so messy that I can't use etree to explore the tree; I get an error.
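If the error is a well-formedness error from etree's strict XML parser, one option that may be worth trying is a recovering parser, `etree.XMLParser(recover=True)`, which continues past errors such as undeclared entities instead of raising. This is a sketch under that assumption: the `messy` snippet is invented, the repaired tree may not match what a browser would build, and whether it helps depends on what the actual error was.

```python
from lxml import etree

# Invented fragment: &nbsp; is not declared in plain XML, so the
# default (strict) parser raises XMLSyntaxError on it.
messy = b"<doc>\n<p>a&nbsp;b</p>\n</doc>"

try:
    etree.fromstring(messy)
except etree.XMLSyntaxError as exc:
    print("strict parser failed:", exc)

# recover=True asks libxml2 to skip past errors rather than stop.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(messy, parser)
print(root.find("p").sourceline)  # line on which <p> starts
```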
`sourceline` will return the line number determined at the time of parsing a document, so it won't apply to an element that was added through the API. I'm pretty sure the same applies to `lxml.html`.
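A minimal sketch of the point made in the answer, using `lxml.etree`: elements produced by the parser carry the line they started on, while an element created afterwards with `etree.SubElement` has no recorded line, so its `sourceline` is `None`. The document string and the tag name `added` are made up for illustration.

```python
from lxml import etree

root = etree.fromstring("<doc>\n<foo>bar</foo>\n</doc>")
added = etree.SubElement(root, "added")  # created through the API, not parsed

for e in root.iter():
    print(e.tag, e.sourceline)
# doc 1
# foo 2
# added None
```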