Getting the text associated with an href element on a given page in Scrapy
Currently, the 'yield' in my Scrapy spider looks as follows:
yield {
'hreflink':mylink,
'Parentlink':response.url
}
This returns a dict:
{
'hreflink':"https://www.southeasthealth.org/wp-content/uploads/Southeast-Health-Standard-Charges-2022.xlsx",
'Parentlink': "https://www.southeasthealth.org/financial-information-price-transparency/"
}
Now I also want the 'text' associated with this particular hreflink on that particular Parentlink. So my final output should look like this:
{
'hreflink':"https://www.southeasthealth.org/wp-content/uploads/Southeast-Health-Standard-Charges-2022.xlsx",
'Parentlink': "https://www.southeasthealth.org/financial-information-price-transparency/",
'Yourtext' : "Download Pricing Info"
}
What would be the simplest way to achieve that? I want to use an XPath expression on the parent page to get the text of the link whose @href equals my hreflink.
So far, here is what I tried:
Yourtext = response.xpath('//a[@href='+json.dumps(each)+']//text()').get()
but it's not printing anything. I tried printing my response, and it is the right page: 'https://www.southeasthealth.org/financial-information-price-transparency/'
Comments (2)
If I understand you correctly, you want to get the text belonging to the link "Download Pricing Info". I suggest you try using:
I found the answer to my question.
This is the correct expression; however, the href value in 'each' is case-sensitive and must match the page's href attribute exactly for this XPath to work.