Web scraping ul and li tags
I am trying to scrape the ul and li tags on Capterra product pages. I want to extract three pieces of information and store each in its own variable: the "Located in [country]" text, the URL, and the product features.
Currently I only know how to print all of the text inside the ul/li elements at once, not individual items.
Code:
```python
from bs4 import BeautifulSoup as bs  # `bs` is used below, so it must be imported
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.capterra.com/p/81310/AMCS/")

# Parse the rendered page source with BeautifulSoup
companyProfile = bs(driver.page_source, 'html.parser')

# .text concatenates every string inside the element, hence the run-together output
url = companyProfile.find("ul", class_="nb-type-md nb-list-undecorated undefined").text
features = companyProfile.find("div", class_="nb-col-count-1 sm:nb-col-count-2 md:nb-col-count-3 nb-col-gap-xl nb-my-0 nb-mx-auto").text
print(url)
print(features)

driver.quit()  # quit() ends the WebDriver session; close() only closes the current window
```
Output:

```
AMCSLocated in United StatesFounded in 2004http://www.amcsgroup.com/
Billing & InvoicingBrokerage ManagementBuy / Sell TicketingContainer ManagementCustomer AccountsCustomer DatabaseDispatch ManagementElectronics RecyclingEquipment TrackingFingerprint ScanningID ScanningIntegrated CamerasInventory ManagementInventory TrackingLogistics Management
```
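The output runs together because BeautifulSoup's `.text` joins every string inside the element with no separator. A minimal sketch of the difference between `.text` and `get_text(separator=..., strip=True)`, assuming a simplified, hypothetical version of the profile list (the real Capterra markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the class names in the question;
# the live page structure may differ.
HTML = (
    '<ul class="nb-type-md nb-list-undecorated undefined">'
    '<li>AMCS</li>'
    '<li>Located in United States</li>'
    '<li>Founded in 2004</li>'
    '<li><a href="http://www.amcsgroup.com/">http://www.amcsgroup.com/</a></li>'
    '</ul>'
)

soup = BeautifulSoup(HTML, "html.parser")
profile = soup.find("ul", class_="nb-type-md nb-list-undecorated undefined")

# .text joins all descendant strings with nothing in between,
# so adjacent <li> texts run together.
mashed = profile.text

# get_text() can insert a separator and strip whitespace from each string.
lines = profile.get_text(separator="\n", strip=True).split("\n")

print(repr(mashed))
print(lines)
```

Splitting on the separator assumes no single item contains it; looping over the li elements individually avoids that assumption.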
How do I get only the URL and the country, and how do I get the features as separate, cleanly formatted items?
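One way to get only the country and the URL is to loop over the individual li elements and test each one's text, instead of taking the text of the whole ul. A sketch under stated assumptions: the HTML is a hypothetical stand-in for the page, `parse_profile` is an illustrative helper name (not a library function), and the prefixes "Located in " and "http" are taken from the output shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the profile list on the page.
HTML = (
    '<ul class="nb-type-md nb-list-undecorated undefined">'
    '<li>AMCS</li>'
    '<li>Located in United States</li>'
    '<li>Founded in 2004</li>'
    '<li><a href="http://www.amcsgroup.com/">http://www.amcsgroup.com/</a></li>'
    '</ul>'
)

LOCATION_PREFIX = "Located in "


def parse_profile(html):
    """Hypothetical helper: return the country and URL from the profile list."""
    soup = BeautifulSoup(html, "html.parser")
    profile = soup.find("ul", class_="nb-type-md nb-list-undecorated undefined")
    country, url = None, None
    if profile is None:
        return {"country": country, "url": url}
    # Inspect each <li> on its own instead of the concatenated text of the <ul>
    for li in profile.find_all("li"):
        text = li.get_text(strip=True)
        if text.startswith(LOCATION_PREFIX):
            country = text[len(LOCATION_PREFIX):]
        elif text.startswith("http"):
            url = text
    return {"country": country, "url": url}


info = parse_profile(HTML)
print(info["country"])
print(info["url"])
```

Against the live page, the same helper would be called with `driver.page_source` after `driver.get(...)`.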
I was able to get the URL and the location with:

```python
url = driver.find_element(By.XPATH, "//*[starts-with(., 'http')]").text
location = driver.find_element(By.XPATH, "//*[starts-with(., 'Located in')]").text
```
Still looking for a solution for the features.
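For the features, the same idea applies: collect each li inside the features container as its own string. A sketch assuming each feature sits in its own li under the div whose class string appears in the question; the sample HTML and the `get_features` helper name are hypothetical.

```python
from bs4 import BeautifulSoup

FEATURES_CLASS = ("nb-col-count-1 sm:nb-col-count-2 md:nb-col-count-3 "
                  "nb-col-gap-xl nb-my-0 nb-mx-auto")

# Hypothetical markup: assumes one <li> per feature inside the container div.
HTML = (
    f'<div class="{FEATURES_CLASS}"><ul>'
    '<li>Billing &amp; Invoicing</li>'
    '<li>Brokerage Management</li>'
    '<li>Buy / Sell Ticketing</li>'
    '<li>Container Management</li>'
    '</ul></div>'
)


def get_features(html):
    """Hypothetical helper: return one string per feature <li>."""
    soup = BeautifulSoup(html, "html.parser")
    # A space-separated class_ string matches the exact class attribute value
    container = soup.find("div", class_=FEATURES_CLASS)
    if container is None:
        return []
    return [li.get_text(strip=True) for li in container.find_all("li")]


features = get_features(HTML)
for feature in features:
    print(feature)
```

If the features turn out not to be li elements on the live page, `list(container.stripped_strings)` collects every non-blank string in the container regardless of tag.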
Answer:

The following code pulls the value of each text node separately from the ul > li tags.

Output:

Update:

Output:
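A minimal sketch of the ul > li approach described in the answer (not necessarily the answerer's exact code), again against hypothetical markup. `soup.select()` takes a CSS selector; `div.features` is an assumed class name used only to show how to scope the selector to one list.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real markup will differ.
PAGE = (
    '<ul class="nb-type-md nb-list-undecorated undefined">'
    '<li>Located in United States</li>'
    '<li>http://www.amcsgroup.com/</li>'
    '</ul>'
    '<div class="features"><ul>'
    '<li>Billing &amp; Invoicing</li>'
    '<li>Dispatch Management</li>'
    '</ul></div>'
)

soup = BeautifulSoup(PAGE, "html.parser")

# "ul > li" matches every <li> that is a direct child of a <ul>,
# so each item's text comes back as a separate string.
items = [li.get_text(strip=True) for li in soup.select("ul > li")]
for item in items:
    print(item)

# Prefixing the selector with a container limits it to one list
# ("div.features" is an assumed class name).
feature_items = [li.get_text(strip=True) for li in soup.select("div.features ul > li")]
print(feature_items)
```

With Selenium directly, `driver.find_elements(By.CSS_SELECTOR, "ul > li")` returns the matching elements, and each element's `.text` gives that item's visible text.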