Getting the title attribute with lxml in Python
I want to extract the oneliner texts from this website using Python. The messages in the HTML look like this:
<div class="olh_message">
<p>foobarbaz <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" /></p>
</div>
My code looks like this so far:
import lxml.html
url = "http://www.scenemusic.net/demovibes/oneliner/"
xpath = "//div[@class='olh_message']/p"
tree = lxml.html.parse(url)
texts = tree.xpath(xpath)
texts = [text.text_content() for text in texts]
print(texts)
Now, however, I only get foobarbaz. I would also like to get the title attribute of the img elements inside it, so in this example foobarbaz :necta:. It seems I need lxml's DOM parser to do it, but I have no idea how. Can anyone give me a hint?
Thanks in advance!
Comments (2)
try this
Use:
//div[@class='olh_message']/p/node()
This selects all child nodes (elements, text nodes, PIs and comment nodes) of any p element that is a child of any div element whose class attribute is 'olh_message'.
Verification using XSLT as the host of XPath:
When this transformation is applied to the following XML document:
the wanted, correct result is produced (showing that exactly the wanted nodes have been selected by the XPath expression):
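For the Python side of the question, here is a minimal sketch of how the node-selecting XPath described in this answer might be used with lxml. It assumes the markup matches the snippet in the question. The helper name `oneliner_texts` and the `SAMPLE` string are made up for illustration and are not part of lxml or of the original answer. The idea is to walk the child nodes of each p in document order, keep text nodes as they are, and substitute each img with its title attribute.

```python
import lxml.html

# Sample markup copied from the question (an assumption about the live page).
SAMPLE = """<div class="olh_message">
<p>foobarbaz <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" /></p>
</div>"""


def oneliner_texts(html):
    """Return one string per message, with each <img> replaced by its title."""
    root = lxml.html.fromstring(html)
    messages = []
    for p in root.xpath("//div[@class='olh_message']/p"):
        parts = []
        # node() returns the children of <p> in document order:
        # text nodes come back as str subclasses, elements as element objects.
        for node in p.xpath("node()"):
            if isinstance(node, str):
                parts.append(node)
            elif node.tag == "img":
                parts.append(node.get("title", ""))
            elif isinstance(node.tag, str):
                # Any other element: fall back to its plain text.
                parts.append(node.text_content())
            # Comments and processing instructions are skipped.
        messages.append("".join(parts))
    return messages


if __name__ == "__main__":
    print(oneliner_texts(SAMPLE))
```

For the live page, the same loop should work on the tree returned by `lxml.html.parse(url)` from the question, since only the XPath calls matter here.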