How to get the href from a class containing specific text using CSS selectors (Scrapy)
I am working with the following website: https://inmuebles.mercadolibre.com.mx/venta/, and I am trying to get the link from the "ver todos" button in the "Inmueble" section (in red). However, the "Tour virtual" and "Publicados hoy" sections (in blue) may or may not appear when visiting the site.
As shown in the image below, each ui-search-filter-dl element contains one of the menu sections from the image above, while the ui-search-filter-container elements contain the sub-sections displayed by the site (e.g. Casas, Departamento and Terrenos for Inmueble). To get the link from the "ver todos" button of the "Inmueble" section, I was using this line of code:
ver_todos = response.css('div.ui-search-filter-dl')[2].css('a.ui-search-modal__link').attrib['href']
But since "Tour virtual" and "Publicados hoy" are not always on the page, I cannot be sure that the ui-search-filter-dl at index 2 always corresponds to the "Inmueble" section and its "ver todos" button.
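To make the problem concrete, here is a minimal sketch using lxml on hypothetical, simplified markup. Only the class names come from the question; the section order, the extra titles "Ciudad" and "Precio", the h3 title tag and all hrefs are assumptions. Taking element [2] of the positional list mimics the [2] index in the Scrapy line above, and shows how the result shifts when the optional sections are missing.

```python
from lxml import html

# Hypothetical, simplified markup: only the class names come from the question.
# Section titles after "Inmueble" and all hrefs are made up for illustration.
def build_page(sections):
    blocks = []
    for title in sections:
        slug = title.lower().replace(" ", "-")
        blocks.append(
            '<div class="ui-search-filter-dl">'
            f'<h3 class="ui-search-filter-dt-title">{title}</h3>'
            '<ul class="ui-search-filter-container"><li>...</li></ul>'
            f'<a class="ui-search-modal__link" href="/{slug}">Ver todos</a>'
            "</div>"
        )
    return "<html><body>" + "".join(blocks) + "</body></html>"


def ver_todos_by_index(page_html, index=2):
    """Mimic response.css('div.ui-search-filter-dl')[index]...attrib['href']."""
    tree = html.fromstring(page_html)
    # contains() is a substring test on @class, good enough for this mock markup.
    blocks = tree.xpath("//div[contains(@class, 'ui-search-filter-dl')]")
    return blocks[index].xpath(".//a[contains(@class, 'ui-search-modal__link')]/@href")[0]


regular = ["Inmueble", "Ciudad", "Precio"]
with_optional = build_page(["Tour virtual", "Publicados hoy"] + regular)
without_optional = build_page(regular)

print(ver_todos_by_index(with_optional))     # /inmueble
print(ver_todos_by_index(without_optional))  # /precio
```

With both optional sections present, index 2 is "Inmueble"; without them, the same index silently returns another section's link.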
I then tried to get the "ver todos" link with this line of code:
response.css(''':contains("Inmueble") ~ .ui-search-filter-dt-title
.ui-search-modal__link::attr(href)''').extract()
Basically, I was trying to get the href belonging to the ui-search-filter-dt-title element that contains the title "Inmueble". Unfortunately, the output is an empty list. I would like to find the "ver todos" link using CSS and regex, but I'm having trouble with it. How can I achieve that?
Comments (2)
I think XPath makes it easier to select the target elements in most cases:
Code:
Actually, I didn't create a Scrapy project to test your code. Instead, I implemented the following code:
Since XPath expressions should behave the same in Scrapy and lxml, I expect the code shown at the beginning to work in your Scrapy project as well.
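As an illustration of the XPath idea (a sketch, not the code referred to above), the following lxml snippet selects the filter section by its title text instead of its position, then takes the href of the modal link inside that same section. The markup is a hypothetical simplification: only the class names come from the question; the h3 title tag and the hrefs are assumptions.

```python
from lxml import html

# Hypothetical markup (assumption): each filter section is a
# div.ui-search-filter-dl holding a title element and a "Ver todos" link.
PAGE = """
<html><body>
  <div class="ui-search-filter-dl">
    <h3 class="ui-search-filter-dt-title">Tour virtual</h3>
    <a class="ui-search-modal__link" href="/tour-virtual">Ver todos</a>
  </div>
  <div class="ui-search-filter-dl">
    <h3 class="ui-search-filter-dt-title">Inmueble</h3>
    <ul class="ui-search-filter-container">
      <li>Casas</li><li>Departamento</li><li>Terrenos</li>
    </ul>
    <a class="ui-search-modal__link" href="/inmueble">Ver todos</a>
  </div>
</body></html>
"""

# Keep only the section whose title text equals $title, then take the href
# of the modal link inside that section. $title is an XPath variable.
VER_TODOS_XPATH = (
    "//div[contains(@class, 'ui-search-filter-dl')]"
    "[.//*[contains(@class, 'ui-search-filter-dt-title')]"
    "[normalize-space() = $title]]"
    "//a[contains(@class, 'ui-search-modal__link')]/@href"
)


def ver_todos_href(page_html, title="Inmueble"):
    tree = html.fromstring(page_html)
    # lxml passes keyword arguments to the query as XPath variables.
    hrefs = tree.xpath(VER_TODOS_XPATH, title=title)
    return hrefs[0] if hrefs else None


print(ver_todos_href(PAGE))  # /inmueble
```

In a spider, the same expression should work as response.xpath(VER_TODOS_XPATH, title="Inmueble").get(), since Scrapy selectors also accept XPath variables as keyword arguments. Because the section is matched by its title, the optional "Tour virtual" and "Publicados hoy" sections no longer affect the result.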
An easy way to do it is to get all the links (<a> elements) and then check whether any of their text matches "ver todos".
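A minimal sketch of this idea with lxml and Python's re module, again on hypothetical mock markup (class names from the question, hrefs made up). The regex ignores case and surrounding whitespace, which is an assumption about how the label is rendered. If several sections each have a "Ver todos" link, this returns all of them in document order, so picking the "Inmueble" one still needs another criterion such as the section title.

```python
import re
from lxml import html

# Hypothetical mock page: one unrelated link plus the "Ver todos" link of the
# Inmueble section (class names from the question, hrefs made up).
PAGE = """
<html><body>
  <a href="/ayuda">Ayuda</a>
  <div class="ui-search-filter-dl">
    <h3 class="ui-search-filter-dt-title">Inmueble</h3>
    <a class="ui-search-modal__link" href="/inmueble"> Ver todos </a>
  </div>
</body></html>
"""

# Match the label regardless of case and surrounding/inner whitespace.
VER_TODOS_RE = re.compile(r"\s*ver\s+todos\s*", re.IGNORECASE)


def ver_todos_links(page_html):
    """Return the href of every <a> whose full text matches 'ver todos'."""
    tree = html.fromstring(page_html)
    return [
        a.get("href")
        for a in tree.iter("a")
        if a.get("href") and VER_TODOS_RE.fullmatch(a.text_content())
    ]


print(ver_todos_links(PAGE))  # ['/inmueble']
```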