Beautiful Soup issue - can't find the right elements
I am having trouble with Beautiful Soup. I am trying to scrape Kayak, but when I print the length of the find_all result, it returns 0. I am also using Selenium in conjunction with Beautiful Soup.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless")

## Kayak URL
origin = "PIT"
destination = "ARN"
startdate = "2022-12-18"
url = "https://www.kayak.com/flights/" + origin + "-" + destination + "/" + \
      startdate + "?sort=price_a"

## Setting Up Webdriver
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
driver.implicitly_wait(40)
driver.get(url)
soup = BeautifulSoup(driver.page_source, "lxml")
print(len(soup.find_all("span", attrs={'class': 'depart-time base-time'})))
deptimes = soup.find_all("span", attrs={'class': 'depart-time base-time'})
arrtimes = soup.find_all('span', attrs={'class': 'arrival-time base-time'})
meridies = soup.find_all('span', attrs={'class': 'time-meridiem meridiem'})
This is what I am trying to scrape from the Kayak website:
<span class="depart-time base-time">12:45 </span>
With Playwright for Python you can do something like this; it is very similar to Selenium.
You need to select each flight box and then iterate over each element, extracting the data you want.
Here you have an example:
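A minimal sketch of that approach, assuming Playwright is installed (`pip install playwright` followed by `playwright install chromium`). The result-box selector `div.resultInner` is an assumption, and the time-span classes are taken from the question's HTML; Kayak changes its markup often, so verify both in the browser's dev tools first:

```python
import re


def parse_time(text):
    """Normalize a scraped time string like '12:45 ' to 'HH:MM'."""
    match = re.search(r"(\d{1,2}):(\d{2})", text)
    if not match:
        return None
    return "{:02d}:{}".format(int(match.group(1)), match.group(2))


def scrape_kayak(url):
    # Import kept local so parse_time can be used without a browser installed.
    from playwright.sync_api import sync_playwright

    flights = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Wait until the results are actually rendered by JavaScript --
        # this is the step the Selenium version in the question skips.
        page.wait_for_selector("span.depart-time.base-time")
        # "div.resultInner" is an assumed selector for each flight box.
        for box in page.query_selector_all("div.resultInner"):
            dep = box.query_selector("span.depart-time.base-time")
            arr = box.query_selector("span.arrival-time.base-time")
            if dep and arr:
                flights.append((parse_time(dep.inner_text()),
                                parse_time(arr.inner_text())))
        browser.close()
    return flights
```

Called as `scrape_kayak("https://www.kayak.com/flights/PIT-ARN/2022-12-18?sort=price_a")`, it returns a list of `(departure, arrival)` time pairs, one per flight box found.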
Here you have the Playwright documentation.
I hope I was able to help you.