Selenium and BeautifulSoup change the numbers inside an HTML element

Posted on 2025-02-08 01:19:33

I want to extract the number of yellow cards each team got from the website https://www.premierleague.com/stats/top/clubs/total_yel_card?se=20

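Here is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_driver_path = "/path/to/chromedriver"  # placeholder: adjust to your setup
driver = webdriver.Chrome(service=Service(chrome_driver_path))
driver.get("https://www.premierleague.com/stats/top/clubs/total_yel_card?se=20")
cards = {}
for i in range(1, 21):
    # Team name: the link inside the second cell of row i
    path = f'//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table/tbody/tr[{i}]/td[2]/a'
    name = driver.find_element(By.XPATH, path).text

    # Yellow-card count: the third cell of row i
    path_card = f'//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table/tbody/tr[{i}]/td[3]'
    card = driver.find_element(By.XPATH, path_card).text

    cards[name] = card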

However, the card counts are very different from the ones in the HTML. Here is the result:

cards

{'Chelsea': '1,800',
 'Everton': '1,778',
 'Arsenal': '1,739',
 'Tottenham Hotspur': '1,705',
 'Manchester United': '1,685',
 'West Ham United': '1,610',
 'Aston Villa': '1,572',
 'Newcastle United': '1,534',
 'Liverpool': '1,429',
 'Manchester City': '1,409',
 'Southampton': '1,298',
 'Blackburn Rovers': '1,111',
 'Sunderland': '1,095',
 'Middlesbrough': '973',
 'Leeds United': '952',
 'Leicester City': '885',
 'Bolton Wanderers': '845',
 'Fulham': '843',
 'Crystal Palace': '790',
 'West Bromwich Albion': '769'}

I have run into this kind of problem many times before, but in those cases the numbers were usually currency values. This time, though, there seems to be no reason for the page's numbers to be converted into different ones.
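For what it's worth, the scraped values are plain strings such as '1,800', in which the comma is only a thousands separator, so no currency-style conversion is involved. If you need them as numbers (for example to sort or compare teams), a minimal sketch using a hypothetical helper `to_int_counts` could look like this:

```python
def to_int_counts(raw: dict[str, str]) -> dict[str, int]:
    """Convert scraped strings like '1,800' into integers like 1800."""
    # The comma is only a thousands separator, so removing it is enough here.
    return {team: int(value.replace(",", "").strip()) for team, value in raw.items()}


sample = {"Chelsea": "1,800", "Middlesbrough": "973"}
print(to_int_counts(sample))  # {'Chelsea': 1800, 'Middlesbrough': 973}
```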

Answered by 三生殊途 on 2025-02-15 01:19:33

I took a quick look at the website, and it seems that your program is pulling the "All Seasons" data.

If you watch the page closely while it loads, you will see that the table first shows the "All Seasons" data and only then applies the season filter.

Adding a time.sleep(1) between creating the driver and the start of your for loop fixes this, but it creates another issue: the second half of the dictionary comes back empty.
I solved that by explicitly waiting for the "Accept all cookies" button to appear and then clicking it.
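A fixed time.sleep(1) only works if the filter happens to finish within one second. Selenium's WebDriverWait(driver, timeout).until(condition) instead calls the condition repeatedly until it returns a truthy value, and raises TimeoutException once the timeout expires. The sketch below reproduces that polling idea in plain Python so it runs without a browser, applied to this problem: keep re-reading a value until it differs from the first ("All Seasons") reading. `wait_for_change` and its `read` callable are hypothetical; with Selenium, `read` could be a function returning the `.text` of one of the count cells.

```python
import time
from typing import Callable


def wait_for_change(read: Callable[[], str], timeout: float = 5.0, poll: float = 0.1) -> str:
    """Poll read() until it returns something different from its first value.

    Mirrors the polling loop of WebDriverWait.until: re-check every `poll`
    seconds and give up after `timeout` seconds.
    """
    initial = read()  # e.g. the "All Seasons" value shown before the filter applies
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = read()
        if current != initial:
            return current  # the season filter has (presumably) been applied
        time.sleep(poll)
    raise TimeoutError(f"value stayed at {initial!r} for {timeout} s")


# Simulated cell: shows the all-time total first, then the filtered value.
readings = iter(["1,800", "1,800", "74"])
print(wait_for_change(lambda: next(readings), timeout=1.0, poll=0.01))  # 74
```

This assumes the filtered value differs from the all-time one, which holds for Chelsea here ('1,800' versus '74') but is not guaranteed for every cell. In Selenium itself, until() also accepts a plain callable that receives the driver, so the same comparison can be written as a lambda that re-reads an element's .text.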

Here's the code I ran:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

chrome_driver_path = "/path/to/chromedriver"  # placeholder: adjust to your setup
driver = webdriver.Chrome(service=Service(chrome_driver_path))
driver.get("https://www.premierleague.com/stats/top/clubs/total_yel_card?se=20")

delay = 5  # seconds

try:
    # Wait until the "Accept all cookies" button is clickable, then dismiss the banner
    cookie_button = WebDriverWait(driver, delay).until(
        EC.element_to_be_clickable((By.XPATH, '/html/body/div[1]/div/div/div[1]/div[5]/button[1]'))
    )
    cookie_button.click()
except TimeoutException:
    print("Loading took too much time!")

else:
    cards = {}
    for i in range(1, 21):
        # Team name: the link inside the second cell of row i
        path = f'//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table/tbody/tr[{i}]/td[2]/a'
        name = driver.find_element(By.XPATH, path).text

        # Yellow-card count: the third cell of row i
        path_card = f'//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table/tbody/tr[{i}]/td[3]'
        card = driver.find_element(By.XPATH, path_card).text

        cards[name] = card

    print(cards)
finally:
    driver.quit()  # close the browser whether or not the scrape succeeded

Here is the result:

{'Chelsea': '74',
 'Aston Villa': '70',
 'Newcastle United': '67',
 'Wigan Athletic': '67',
 'Blackburn Rovers': '66',
 'Arsenal': '64',
 'Wolverhampton Wanderers': '64',
 'Everton': '60',
 'Stoke City': '60',
 'Sunderland': '60',
 'Norwich City': '58',
 'Fulham': '54',
 'Queens Park Rangers': '54',
 'Liverpool': '53',
 'Manchester City': '51',
 'Manchester United': '51',
 'Bolton Wanderers': '50',
 'West Bromwich Albion': '48',
 'Tottenham Hotspur': '43',
 'Swansea City': '40'}
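If you would rather avoid twenty separate find_element calls, another option is to take driver.page_source once the filtered table has rendered and parse every row in one pass. The question's title mentions BeautifulSoup; the sketch below swaps in Python's built-in html.parser instead so it has no third-party dependency. The class name `CardTableParser` is hypothetical, and the column layout (team name in the link of the second cell, count in the third cell) is inferred from the XPaths above, so the real markup may differ.

```python
from html.parser import HTMLParser


class CardTableParser(HTMLParser):
    """Collect {team: count} from <tbody> rows: name in cell 2's <a>, count in cell 3."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: dict[str, str] = {}
        self._in_tbody = False
        self._in_td = False
        self._in_link = False
        self._cell = 0  # 1-based index of the current <td> within the row
        self._name = ""
        self._count = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif self._in_tbody and tag == "tr":
            # New row: reset the per-row state
            self._cell, self._name, self._count = 0, "", ""
        elif self._in_tbody and tag == "td":
            self._cell += 1
            self._in_td = True
        elif self._in_td and tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_tbody = False
        elif tag == "td":
            self._in_td = False
        elif tag == "a":
            self._in_link = False
        elif self._in_tbody and tag == "tr" and self._name:
            # Row finished: store the team and its (still textual) count
            self.cards[self._name.strip()] = self._count.strip()

    def handle_data(self, data):
        if self._cell == 2 and self._in_link:
            self._name += data
        elif self._cell == 3 and self._in_td:
            self._count += data


sample = """
<table><tbody>
  <tr><td>1</td><td><a href="#">Chelsea</a></td><td>74</td></tr>
  <tr><td>2</td><td><a href="#">Aston Villa</a></td><td>70</td></tr>
</tbody></table>
"""
parser = CardTableParser()
parser.feed(sample)  # with Selenium you would feed driver.page_source instead
print(parser.cards)  # {'Chelsea': '74', 'Aston Villa': '70'}
```

Parsing the page source once also means every value comes from the same snapshot of the page, so it still has to be taken after the season filter has been applied.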

Hope this helps!
