Extract URLs whose link text matches a regular expression - using XPath 1.0
I would like to extract URLs of this kind (the link text is a number with any number of digits, and the href is arbitrary text) using XPath in Scrapy:
<a href="http://www.example.com/link_to_some_page.html">3</a>
<a href="http://www.example.com/another_link-abcd.html">45</a>
I could think of something like
HtmlXPathSelector(response).select('//a[matches(text(),"\d+")]/@href')
However, XPath 2.0 doesn't appear to be supported, so I can't use matches() or any other regular-expression function there.
The best single-line solution I could find was in this question: xpath expression for regex-like matching? Is there a better way to achieve this in Scrapy?
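One workaround that avoids XPath 2.0 entirely is to select every link and apply the regular expression in Python instead of inside the XPath expression. The sketch below assumes nothing beyond the standard library (`html.parser` and `re`) so it runs outside Scrapy; `DigitLinkParser` and `extract_digit_links` are hypothetical names used only for illustration. Inside a spider, the same `[0-9]+` filter would be applied to the text and href values the selector returns. Newer Scrapy selectors also document EXSLT regular-expression support (`re:test`), e.g. `//a[re:test(text(), "^[0-9]+$")]/@href`, which may be worth checking against the Scrapy version in use.

```python
import re
from html.parser import HTMLParser


class DigitLinkParser(HTMLParser):
    """Hypothetical helper: collect the href of every <a> whose text is only digits."""

    # ASCII digits only; Python's \d would also accept other Unicode digits
    DIGITS = re.compile(r"[0-9]+")

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._current_href = None  # href of the <a> currently open, if any
        self._text_parts = []      # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only collect text while inside an <a> that has an href
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            # fullmatch behaves like an anchored ^[0-9]+$ test: "3" passes, "p3" does not
            if self.DIGITS.fullmatch(text):
                self.hrefs.append(self._current_href)
            self._current_href = None


def extract_digit_links(html):
    """Return hrefs of links whose visible text is a whole number."""
    parser = DigitLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    sample = """
    <a href="http://www.example.com/link_to_some_page.html">3</a>
    <a href="http://www.example.com/another_link-abcd.html">45</a>
    <a href="http://www.example.com/about.html">About</a>
    """
    print(extract_digit_links(sample))
    # ['http://www.example.com/link_to_some_page.html', 'http://www.example.com/another_link-abcd.html']
```

Doing the match in Python keeps the XPath simple (just `//a`) and sidesteps the XPath 1.0 limitation, at the cost of touching every link on the page rather than letting the XPath engine filter them.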