How to select an HTML attribute inside a Selenium object
I am learning web scraping with Selenium and have run into an issue when trying to select an attribute inside a Selenium object. If I just print elems.text inside the loop, I get the broader data (the whole paragraph for each listing). However, when I try to use the XPath of the h2 title tag for all the listings inside this broader element, only the first listing is appended to the titles list, whereas I want all of them. I checked the XPath and it is the same for each listing. How can I get all of the listings instead of just the first one?
titles = []
driver.get("https://www.sellmytimesharenow.com/timeshare/All+Timeshare/vacation/buy-timeshare/")
results = driver.find_elements(By.CLASS_NAME, "results-list")
for elems in results:
    print(elems.text)  # this prints out the full description paragraphs
    elem_title = elems.find_element(By.XPATH, '//*[@id="search-page"]/div[3]/div/div/div[2]/div/div[2]/div/a[1]/div/div[1]/div/h2')
    titles.append(elem_title.text)
Comments (1)
If you aren't limited to accessing the elements by XPath only, then here is my solution:
When you try to get the listings, you use find_elements(By.CLASS_NAME, "results-list"), but on the website there is only one element with the class name "results-list". Its text is everything in that div aggregated into one long string, so you can't pick out the individual headings. However, there are multiple elements with the class name "result-box", so find_elements will store each one as its own item in results. Because the title of each listing is on the first line, you can split the text of each element on the newline and take the first line.
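The approach described above could be sketched roughly as follows. This is a minimal illustration, not a tested scraper for the site: the helper names first_line, titles_from_texts and main are hypothetical, the use of Chrome is an assumption, and the code relies on the answer's observations that each listing has the class "result-box" and that its title is the first line of its text.

```python
URL = "https://www.sellmytimesharenow.com/timeshare/All+Timeshare/vacation/buy-timeshare/"


def first_line(text):
    """Return the first non-blank line of text, stripped (hypothetical helper)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def titles_from_texts(texts):
    """Map each listing's full text to its first line, assumed to be the title."""
    return [first_line(t) for t in texts]


def main():
    # Selenium is imported here so the helpers above work without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        driver.get(URL)
        # One element per listing, instead of the single "results-list" wrapper.
        boxes = driver.find_elements(By.CLASS_NAME, "result-box")
        titles = titles_from_texts(box.text for box in boxes)
        print(titles)
    finally:
        driver.quit()
```

Calling main() would open a browser and print the collected titles. If XPath is preferred instead, note that in Selenium an expression starting with // is evaluated against the whole document even when called on an element, which would explain why the same first h2 comes back every time; a relative expression such as './/h2' (assuming the title h2 sits inside each "result-box") restricts the search to that element.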