How to select HTML attributes inside a Selenium object

Posted on 2025-01-09 03:25:29

I am learning web scraping with Selenium and have run into a problem selecting an element inside a Selenium object. If I just print elems.text inside the loop, I get the broader data (the whole paragraph for each listing). But when I try to get the h2 title of every listing inside this broader element by XPath, only the first listing's title is appended to the titles list, whereas I want all of them. I checked the XPath and it is the same for each listing. How can I get all of the titles instead of just the first one?

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: any WebDriver works here

titles = []
driver.get("https://www.sellmytimesharenow.com/timeshare/All+Timeshare/vacation/buy-timeshare/")

results = driver.find_elements(By.CLASS_NAME, "results-list")

for elems in results:
    print(elems.text)  # this prints out the full description paragraphs
    elem_title = elems.find_element(By.XPATH, '//*[@id="search-page"]/div[3]/div/div/div[2]/div/div[2]/div/a[1]/div/div[1]/div/h2')
    titles.append(elem_title.text)

Comments (1)

北风几吹夏 2025-01-16 03:25:29

If you aren't limited to locating the elements by XPath, here is my solution:

titles = []
results = driver.find_elements(By.CLASS_NAME, "result-box")
for elems in results:
    # the title is the first line of each listing's text
    titles.append(elems.text.split("\n")[0])

In your code you call find_elements(By.CLASS_NAME, "results-list"), but on the website there is only one element with the class name "results-list". Its text is the text of every listing joined into one long string, so the loop runs only once and you can't pick out the individual headings.
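
One way to check this yourself is to count how many elements each class name matches before looping. This is only a sketch: count_matches is a hypothetical helper, root is assumed to be anything that exposes Selenium's find_elements(by, value) (a WebDriver or a WebElement), and the default "class name" is the string value behind By.CLASS_NAME.

```python
def count_matches(root, class_names, by="class name"):
    """Return {class_name: number of matching elements found under root}.

    root is assumed to expose Selenium's find_elements(by, value), e.g. a
    WebDriver. "class name" is the string value of selenium's By.CLASS_NAME.
    """
    return {name: len(root.find_elements(by, name)) for name in class_names}
```

Calling count_matches(driver, ["results-list", "result-box"]) on the page from the question should report a single match for "results-list" and several for "result-box", if the page is structured as described above.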

However, there are multiple elements with the class name "result-box", so find_elements returns each one as its own item in results. Because the title of each listing is on the first line of its text, you can split each element's text on the newline character and take the first piece.
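
If you do want to stay with XPath, keep in mind that in Selenium an expression starting with // is evaluated against the whole document even when you call find_element on an element, so it keeps returning the first match on the page. Prefixing the expression with a dot makes it relative to that element. Below is a minimal sketch, assuming each "result-box" holds its title in an h2 (suggested by the XPath in the question, not verified against the site). titles_from_boxes is a hypothetical helper, and the default "xpath" is the string value behind By.XPATH.

```python
# Relative XPath: the leading "." anchors the search at the element itself
# instead of at the document root.
TITLE_XPATH = ".//h2"


def titles_from_boxes(boxes, by="xpath"):
    """Return the text of the first <h2> found inside each listing element.

    boxes is assumed to be a list of Selenium WebElements (anything exposing
    find_element(by, value)). "xpath" is the string value of selenium's By.XPATH.
    """
    titles = []
    for box in boxes:
        heading = box.find_element(by, TITLE_XPATH)
        titles.append(heading.text.strip())
    return titles
```

With a live driver this would be used as titles = titles_from_boxes(driver.find_elements(By.CLASS_NAME, "result-box")). Unlike splitting on the first line, this does not depend on the title being the first piece of visible text, but it does depend on the h2 assumption above.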
