How to loop and collect data after closing the WebDriver
I am trying to take a tournament href, insert it into the URL, and cycle through the rounds before closing that driver and opening a new one to do the same thing for the next tournament. For some reason I can't figure out where to put the beginning URL.
from selenium import webdriver
import time
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

PATH = "C:\Program Files (x86)\Chrome\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.maximize_window()

ytournaments = ['/dpworld-tour/abu-dhabi-hsbc-championship-2021/']
roundids = [1, 2, 3, 4]

for tournamentid in ytournaments:
    for roundid in roundids:
        page = driver.get(f"https://www.europeantour.com{ytournaments}leaderboard?holebyhole=true&round={roundid}")
        time.sleep(5)
        html = driver.page_source
        soup = BeautifulSoup(html, 'lxml')
        Tour = 'European Tour'
        Year = '2021'
        tournamentm = soup.find('h1', class_='event-hero__title').text
        tournament = tournamentm.strip()
        coursem = soup.find('p', class_='event-hero__location').text
        course = coursem.strip()
        datem = soup.find('p', class_='event-hero__date').text
        date = datem.strip()
        dfs = pd.read_html(driver.page_source)
        df = dfs[0]
        ndf = np.squeeze(dfs)
        data = pd.DataFrame(ndf)
        data["tournament"] = tournament
        data["course"] = course
        data["date"] = date
        data["roundid"] = roundid
        data["Tour"] = Tour
        data["Year"] = Year
        filename = f'{tournament}_{roundid}_{Year}.csv'
        data.to_csv(filename)
    driver.quit()
driver.quit()
Comments (1)
You do not need to .quit() the driver on every iteration, because you are only changing its target URL. The main issue is constructing the correct URL, so change {ytournaments} to {tournamentid}.
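For reference, a minimal sketch of the corrected loop. It assumes the page structure and CSS classes from the question are still valid on europeantour.com, and a Selenium 3-style webdriver.Chrome(PATH) constructor as in the question. The f-string now interpolates tournamentid (the loop variable) rather than the whole ytournaments list, the np.squeeze step is replaced with a plain dfs[0] since pd.read_html already returns a list of DataFrames, and the single quit happens after both loops finish:

from selenium import webdriver
import time
import pandas as pd
from bs4 import BeautifulSoup

PATH = r"C:\Program Files (x86)\Chrome\chromedriver_win32\chromedriver.exe"  # raw string keeps the backslashes literal
driver = webdriver.Chrome(PATH)  # on Selenium 4 you may need webdriver.Chrome(service=Service(PATH))
driver.maximize_window()

ytournaments = ['/dpworld-tour/abu-dhabi-hsbc-championship-2021/']
roundids = [1, 2, 3, 4]

for tournamentid in ytournaments:
    for roundid in roundids:
        # interpolate the loop variable, not the whole list
        driver.get(f"https://www.europeantour.com{tournamentid}leaderboard?holebyhole=true&round={roundid}")
        time.sleep(5)  # crude wait for the page to render; WebDriverWait would be more robust

        soup = BeautifulSoup(driver.page_source, 'lxml')
        tournament = soup.find('h1', class_='event-hero__title').text.strip()
        course = soup.find('p', class_='event-hero__location').text.strip()
        date = soup.find('p', class_='event-hero__date').text.strip()

        data = pd.read_html(driver.page_source)[0]  # first table on the page (the leaderboard)
        data["tournament"] = tournament
        data["course"] = course
        data["date"] = date
        data["roundid"] = roundid
        data["Tour"] = 'European Tour'
        data["Year"] = '2021'
        data.to_csv(f'{tournament}_{roundid}_2021.csv')

driver.quit()  # quit once, after every tournament and round has been scraped

Reusing one driver avoids the cost of launching a new browser per tournament; if you really want a fresh session for each tournament, create the webdriver.Chrome inside the outer loop and move the quit to the end of that loop body instead.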