How to loop and collect data after closing the web driver

Posted on 2025-02-12 06:09:29

I want to take each tournament href, insert it into the URL, and cycle through the rounds, then close that driver and open a new one to do the same for the next tournament. For some reason I can't figure out where to put the starting URL.

from selenium import webdriver
import time
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup


PATH = r"C:\Program Files (x86)\Chrome\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.maximize_window()

ytournaments = ['/dpworld-tour/abu-dhabi-hsbc-championship-2021/']

roundids = [1, 2, 3, 4]

for tournamentid in ytournaments:
    
    for roundid in roundids:
        
        page = driver.get(f"https://www.europeantour.com{ytournaments}leaderboard?holebyhole=true&round={roundid}")
        time.sleep(5)
        html = driver.page_source
        soup = BeautifulSoup(html, 'lxml')
        
        Tour = 'European Tour'
        Year = '2021'
        
        tournamentm = soup.find('h1', class_='event-hero__title').text
        tournament = tournamentm.strip()
        
        coursem = soup.find('p', class_='event-hero__location').text
        course = coursem.strip()
        
        datem = soup.find('p', class_='event-hero__date').text
        date = datem.strip()
        
        dfs = pd.read_html(driver.page_source)
        df = dfs[0]
        ndf = np.squeeze(dfs)
        data = pd.DataFrame(ndf)
        
        data["tournament"] = tournament
        data["course"] = course
        data["date"] = date
        data["roundid"] = roundid
        data["Tour"] = Tour
        data["Year"] = Year
        
        filename = f'{tournament}_{roundid}_{Year}.csv'
        data.to_csv(filename)

    driver.quit()
    
driver.quit()

Comments (1)

谜兔 2025-02-19 06:09:31

You do not need to .quit() the driver on every iteration, because each driver.get() call just points the same browser at a new URL.
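A minimal sketch of that lifecycle, assuming a hypothetical visit_all helper and a driver_factory argument (neither is in the original post): one driver is created, reused for every tournament and round, and quit exactly once in a finally block. In real use driver_factory would be whatever creates your Selenium driver; a stand-in class is used here so the sketch runs without a browser.

```python
from typing import Callable, Iterable, List

BASE_URL = "https://www.europeantour.com"


def visit_all(driver_factory: Callable[[], object],
              tournaments: Iterable[str],
              round_ids: Iterable[int]) -> List[str]:
    """Visit every tournament/round page with a single driver."""
    round_ids = list(round_ids)        # allow reuse across tournaments
    visited = []
    driver = driver_factory()          # created once, before the loops
    try:
        for tournament_id in tournaments:
            for round_id in round_ids:
                url = (f"{BASE_URL}{tournament_id}"
                       f"leaderboard?holebyhole=true&round={round_id}")
                driver.get(url)        # navigating reuses the same session
                visited.append(url)
    finally:
        driver.quit()                  # closed once, after all pages are done
    return visited


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; it only records calls."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))

    def quit(self):
        self.calls.append(("quit",))
```

The try/finally means the browser is still closed if a page fails partway through, which is the main thing the per-iteration quit() was trying to achieve.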

The main issue is constructing the correct URL. {ytournaments} interpolates the entire list rather than the current tournament, so change {ytournaments}:

page = driver.get(f"https://www.europeantour.com{ytournaments}leaderboard?holebyhole=true&round={roundid}")

to {tournamentid}:

page = driver.get(f"https://www.europeantour.com{tournamentid}leaderboard?holebyhole=true&round={roundid}")
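
To see why the original URL failed, here is a short sketch (the helper name leaderboard_url is hypothetical, not from the post): formatting the list puts its Python repr, brackets and quotes included, into the URL, while formatting the loop variable inserts just the path.

```python
BASE_URL = "https://www.europeantour.com"
ytournaments = ['/dpworld-tour/abu-dhabi-hsbc-championship-2021/']


def leaderboard_url(tournament_path: str, round_id: int) -> str:
    """Build the hole-by-hole leaderboard URL for one tournament and round."""
    return f"{BASE_URL}{tournament_path}leaderboard?holebyhole=true&round={round_id}"


# Bug: the whole list object is formatted into the string.
broken = f"{BASE_URL}{ytournaments}leaderboard?holebyhole=true&round=1"
print(broken)

# Fix: format the current element of the list instead.
for tournamentid in ytournaments:
    print(leaderboard_url(tournamentid, 1))
```

Printing the URL before passing it to driver.get() is a quick way to catch this kind of mistake.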