网页抓取英格兰曲棍球 Python BeautifulSoup

发布于 2025-01-11 14:38:51 字数 1019 浏览 2 评论 0原文

我试图通过 BeautifulSoup 获取此链接中找到的表格： https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007

这是一个英格兰曲棍球网站，基本上我想下载表格并将其放入 DataFrame 中，并最终获得赛程表。

每当我尝试找到正确的 div 或 table 时，它都会返回 None。

这是我尝试过的：

url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"
page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")

我尝试查找表所在的 div，但它返回 None。

bread_crumbs = soup.find("div", class_="container")
print(bread_crumbs

我再次尝试查找表，但它返回None。

bread_crumbs = soup.find("table")
print(bread_crumbs)

如果有人能建议一种访问该表的方法，我将不胜感激！ Selenium 可能会更好，但我还没有使用过 Selenium，所以我不确定它会如何开始。

正如您从链接中看到的，这是一个 php 网站，那么这可能是部分原因吗？

原文

I am trying to BeautifulSoup to get the table found in this link: https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007

It's an England Hockey website and basically I want to download the table and put it in a DataFrame, and also eventually get the fixtures as well.

Whenever I try and find the right div or table, it returns None.

Here's what I have tried:

url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"
page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")

I have tried to find the div the table is within, but it returns None.

bread_crumbs = soup.find("div", class_="container")
print(bread_crumbs

Again, I try to find the table but it returns None.

bread_crumbs = soup.find("table")
print(bread_crumbs)

If anyone can suggest a way to access the table, I would be grateful! It might be that Selenium would be better for this, but I haven't used Selenium yet so I am not sure how it would start.

As you can see from the link, it's a php website, so could this be part of the reason?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

只怪假的太真实 2025-01-18 14:38:51

因为要访问此网站，您必须同意使用 cookie，并接受其条款和条件，

用以下代码替换请求并重试

import requests
from bs4 import BeautifulSoup

url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"

headers = {
  'Cookie': 'visitor-id=vPF0YU5Q; visitor-id-2=bQJHxVCjcBs4Qlmoy72Wzw%3D%3D; ImportantCookie=0; consentCookie=1; ImportantCookie=0; visitor-id=vPF0YUxZ; visitor-id-2=ZrPCssshUkv7rwB6MVkM2A%3D%3D'
}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.text, "html.parser")

bread_crumbs = soup.find("table")
print(bread_crumbs)

Because to access this site you must agree to the use of cookies, and accept their terms and condition

replace request with below code and try again

import requests
from bs4 import BeautifulSoup

url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"

headers = {
  'Cookie': 'visitor-id=vPF0YU5Q; visitor-id-2=bQJHxVCjcBs4Qlmoy72Wzw%3D%3D; ImportantCookie=0; consentCookie=1; ImportantCookie=0; visitor-id=vPF0YUxZ; visitor-id-2=ZrPCssshUkv7rwB6MVkM2A%3D%3D'
}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.text, "html.parser")

bread_crumbs = soup.find("table")
print(bread_crumbs)

回复收藏 0 原文

成熟稳重的好男人 2025-01-18 14:38:51

您可以让 Selenium 渲染页面然后拉取，或者需要在标头中使用 Cookie。

两种解决方案都有效。我的不同之处在于它使用会话来创建 cookie，而不是对其进行硬编码。

代码：

import pandas as pd
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

# Crerate the Session and acces the site to aquire the cookies
s = requests.Session()
response = s.get('https://gms.englandhockey.co.uk/fixtures-and-results/', headers=headers)

# Create the cookie string to be used in the ehaders
cookies = response.cookies.get_dict()
cookieStr = 'consentCookie=1;'
for k, v in cookies.items():
    cookieStr += f'{k}={v};'

headers.update({'Cookie':cookieStr})

# Go get the tables
url = 'https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800'
response = s.get(url, headers=headers)
dfs = pd.read_html(response.text)

for df in dfs:
    print(df)

输出：

for df in dfs:
    print(df.to_string())
     Date   Time                         Competition               Home Team             Away Team                         Venue  Officials
0  12-Mar  12:30  South East Women's Division 2 Oaks    Royal Holloway 1 (F)      Addiscombe 1 (F)                           NaN        NaN
1     NaN  12:30    South East Men's Division 6 Oaks  Fleet And Ewshot 4 (M)  Royal Holloway 1 (M)  Army Hockey Centre - Pitch 2        NaN
2  19-Mar  10:30  South East Women's Division 2 Oaks             Oxted 2 (F)  Royal Holloway 1 (F)                  Oxted School        NaN
3     NaN  12:30    South East Men's Division 6 Oaks    Royal Holloway 1 (M)          Woking 6 (M)                           NaN        NaN
4  26-Mar  12:30  South East Women's Division 2 Oaks    Royal Holloway 1 (F)           Epsom 4 (F)                           NaN        NaN
5  02-Apr    NaN  South East Women's Division 2 Oaks         Guildford 4 (F)  Royal Holloway 1 (F)                           NaN        NaN
6     NaN  12:30    South East Men's Division 6 Oaks    Royal Holloway 1 (M)       Cranleigh 2 (M)                           NaN        NaN
      Date   Time                         Competition                               Home Team   Score                               Away Team                               Venue
0   05-Mar  14:15  South East Women's Division 2 Oaks             Aldershot And Farnham 3 (F)   1 : 0                    Royal Holloway 1 (F)                           Heath End
1      NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 2                             Epsom 7 (M)                                 NaN
2   26-Feb  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 7                            Woking 4 (F)                                 NaN
3      NaN  10:00    South East Men's Division 6 Oaks                            Kenley 3 (M)   0 : 7                    Royal Holloway 1 (M)  Warlingham County Secondary School
4   12-Feb  12:30  South East Women's Division 2 Oaks                    Reigate Priory 3 (F)     NaN                    Royal Holloway 1 (F)                     St Bedes School
5      NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   8 : 1                    Reigate Priory 6 (M)                                 NaN
6   05-Feb  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)     NaN                            Horley 1 (F)                                 NaN
7      NaN  10:30    South East Men's Division 6 Oaks                            Horley 3 (M)   0 : 3                    Royal Holloway 1 (M)    Hurstpierpoint College - Pitch 2
8   29-Jan    NaN  South East Women's Division 2 Oaks                      Kenley 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                                 NaN
9      NaN  12:30    South East Men's Division 6 Oaks              Royal Holloway 1 (M) (HWO)   5 : 0                     Old Reigatian 3 (M)                                 NaN
10  22-Jan  10:00  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   0 : 4                         Cranleigh 1 (F)          Cranleigh School - Pitch 1
11     NaN  15:30    South East Men's Division 6 Oaks  Croydon & Old Whitgiftian Rustlers (M)   1 : 4                    Royal Holloway 1 (M)            Monks Hill Sports Centre
12  15-Jan  11:00  South East Women's Division 2 Oaks         Ashford (Middlesex) 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)         Ashford (Middx) Hockey Club
13     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 0                    Sanderstead Cats (M)                                 NaN
14  11-Dec    NaN  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   0 : 7                         Guildford 4 (F)                                 NaN
15     NaN  10:00    South East Men's Division 6 Oaks                         Cranleigh 2 (M)   1 : 1                    Royal Holloway 1 (M)          Cranleigh School - Pitch 1
16  04-Dec  09:45  South East Women's Division 2 Oaks                             Epsom 4 (F)   3 : 1                    Royal Holloway 1 (F)         Epsom HC - Old Schools Lane
17  27-Nov  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 3                             Oxted 2 (F)                                 NaN
18     NaN  11:00    South East Men's Division 6 Oaks                            Woking 6 (M)   2 : 4                    Royal Holloway 1 (M)                 Woking HC - Pitch 2
19  20-Nov  15:30  South East Women's Division 2 Oaks                        Addiscombe 1 (F)   1 : 0                    Royal Holloway 1 (F)                   Woldingham School
20     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   0 : 1                  Fleet And Ewshot 4 (M)                                 NaN
21  13-Nov  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 2             Aldershot And Farnham 3 (F)                                 NaN
22     NaN  15:30    South East Men's Division 6 Oaks                             Epsom 7 (M)   0 : 2                    Royal Holloway 1 (M)                    Therfield School
23  06-Nov  12:30  South East Women's Division 2 Oaks                      Woking 4 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                 Woking HC - Pitch 2
24     NaN  12:30    South East Men's Division 6 Oaks              Royal Holloway 1 (M) (HWO)   5 : 0                            Kenley 3 (M)                                 NaN
25  30-Oct  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 7                    Reigate Priory 3 (F)                                 NaN
26     NaN  14:00    South East Men's Division 6 Oaks                    Reigate Priory 6 (M)   0 : 6                    Royal Holloway 1 (M)                     St Bedes School
27  16-Oct  12:00  South East Women's Division 2 Oaks                      Horley 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                        Worth School
28     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   5 : 0                            Horley 3 (M)                                 NaN
29  09-Oct  15:00  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   3 : 4                            Kenley 1 (F)                                 NaN
30     NaN  13:00    South East Men's Division 6 Oaks                     Old Reigatian 3 (M)   0 : 4                    Royal Holloway 1 (M)     Royal Alexandra & Albert School
31  02-Oct  14:30  South East Women's Division 2 Oaks                         Cranleigh 1 (F)   5 : 0                    Royal Holloway 1 (F)                                 NaN
32     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 3  Croydon & Old Whitgiftian Rustlers (M)                                 NaN
33  25-Sep    NaN  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)  1 : 17               Ashford (Middlesex) 1 (F)                                 NaN
34     NaN  14:30    South East Men's Division 6 Oaks                    Sanderstead Cats (M)   1 : 1                    Royal Holloway 1 (M)                     Caterham School

You could let Selenium render the page then pull, or need to use the Cookie in the headers.

Both solutions provided work. The difference in mine is it utilizes a Session to create the cookie, as opposed to hard coding it.

Code:

import pandas as pd
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

# Crerate the Session and acces the site to aquire the cookies
s = requests.Session()
response = s.get('https://gms.englandhockey.co.uk/fixtures-and-results/', headers=headers)

# Create the cookie string to be used in the ehaders
cookies = response.cookies.get_dict()
cookieStr = 'consentCookie=1;'
for k, v in cookies.items():
    cookieStr += f'{k}={v};'

headers.update({'Cookie':cookieStr})

# Go get the tables
url = 'https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800'
response = s.get(url, headers=headers)
dfs = pd.read_html(response.text)

for df in dfs:
    print(df)

Output:

for df in dfs:
    print(df.to_string())
     Date   Time                         Competition               Home Team             Away Team                         Venue  Officials
0  12-Mar  12:30  South East Women's Division 2 Oaks    Royal Holloway 1 (F)      Addiscombe 1 (F)                           NaN        NaN
1     NaN  12:30    South East Men's Division 6 Oaks  Fleet And Ewshot 4 (M)  Royal Holloway 1 (M)  Army Hockey Centre - Pitch 2        NaN
2  19-Mar  10:30  South East Women's Division 2 Oaks             Oxted 2 (F)  Royal Holloway 1 (F)                  Oxted School        NaN
3     NaN  12:30    South East Men's Division 6 Oaks    Royal Holloway 1 (M)          Woking 6 (M)                           NaN        NaN
4  26-Mar  12:30  South East Women's Division 2 Oaks    Royal Holloway 1 (F)           Epsom 4 (F)                           NaN        NaN
5  02-Apr    NaN  South East Women's Division 2 Oaks         Guildford 4 (F)  Royal Holloway 1 (F)                           NaN        NaN
6     NaN  12:30    South East Men's Division 6 Oaks    Royal Holloway 1 (M)       Cranleigh 2 (M)                           NaN        NaN
      Date   Time                         Competition                               Home Team   Score                               Away Team                               Venue
0   05-Mar  14:15  South East Women's Division 2 Oaks             Aldershot And Farnham 3 (F)   1 : 0                    Royal Holloway 1 (F)                           Heath End
1      NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 2                             Epsom 7 (M)                                 NaN
2   26-Feb  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 7                            Woking 4 (F)                                 NaN
3      NaN  10:00    South East Men's Division 6 Oaks                            Kenley 3 (M)   0 : 7                    Royal Holloway 1 (M)  Warlingham County Secondary School
4   12-Feb  12:30  South East Women's Division 2 Oaks                    Reigate Priory 3 (F)     NaN                    Royal Holloway 1 (F)                     St Bedes School
5      NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   8 : 1                    Reigate Priory 6 (M)                                 NaN
6   05-Feb  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)     NaN                            Horley 1 (F)                                 NaN
7      NaN  10:30    South East Men's Division 6 Oaks                            Horley 3 (M)   0 : 3                    Royal Holloway 1 (M)    Hurstpierpoint College - Pitch 2
8   29-Jan    NaN  South East Women's Division 2 Oaks                      Kenley 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                                 NaN
9      NaN  12:30    South East Men's Division 6 Oaks              Royal Holloway 1 (M) (HWO)   5 : 0                     Old Reigatian 3 (M)                                 NaN
10  22-Jan  10:00  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   0 : 4                         Cranleigh 1 (F)          Cranleigh School - Pitch 1
11     NaN  15:30    South East Men's Division 6 Oaks  Croydon & Old Whitgiftian Rustlers (M)   1 : 4                    Royal Holloway 1 (M)            Monks Hill Sports Centre
12  15-Jan  11:00  South East Women's Division 2 Oaks         Ashford (Middlesex) 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)         Ashford (Middx) Hockey Club
13     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 0                    Sanderstead Cats (M)                                 NaN
14  11-Dec    NaN  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   0 : 7                         Guildford 4 (F)                                 NaN
15     NaN  10:00    South East Men's Division 6 Oaks                         Cranleigh 2 (M)   1 : 1                    Royal Holloway 1 (M)          Cranleigh School - Pitch 1
16  04-Dec  09:45  South East Women's Division 2 Oaks                             Epsom 4 (F)   3 : 1                    Royal Holloway 1 (F)         Epsom HC - Old Schools Lane
17  27-Nov  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 3                             Oxted 2 (F)                                 NaN
18     NaN  11:00    South East Men's Division 6 Oaks                            Woking 6 (M)   2 : 4                    Royal Holloway 1 (M)                 Woking HC - Pitch 2
19  20-Nov  15:30  South East Women's Division 2 Oaks                        Addiscombe 1 (F)   1 : 0                    Royal Holloway 1 (F)                   Woldingham School
20     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   0 : 1                  Fleet And Ewshot 4 (M)                                 NaN
21  13-Nov  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 2             Aldershot And Farnham 3 (F)                                 NaN
22     NaN  15:30    South East Men's Division 6 Oaks                             Epsom 7 (M)   0 : 2                    Royal Holloway 1 (M)                    Therfield School
23  06-Nov  12:30  South East Women's Division 2 Oaks                      Woking 4 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                 Woking HC - Pitch 2
24     NaN  12:30    South East Men's Division 6 Oaks              Royal Holloway 1 (M) (HWO)   5 : 0                            Kenley 3 (M)                                 NaN
25  30-Oct  12:30  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   1 : 7                    Reigate Priory 3 (F)                                 NaN
26     NaN  14:00    South East Men's Division 6 Oaks                    Reigate Priory 6 (M)   0 : 6                    Royal Holloway 1 (M)                     St Bedes School
27  16-Oct  12:00  South East Women's Division 2 Oaks                      Horley 1 (F) (HWO)   5 : 0                    Royal Holloway 1 (F)                        Worth School
28     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   5 : 0                            Horley 3 (M)                                 NaN
29  09-Oct  15:00  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)   3 : 4                            Kenley 1 (F)                                 NaN
30     NaN  13:00    South East Men's Division 6 Oaks                     Old Reigatian 3 (M)   0 : 4                    Royal Holloway 1 (M)     Royal Alexandra & Albert School
31  02-Oct  14:30  South East Women's Division 2 Oaks                         Cranleigh 1 (F)   5 : 0                    Royal Holloway 1 (F)                                 NaN
32     NaN  12:30    South East Men's Division 6 Oaks                    Royal Holloway 1 (M)   2 : 3  Croydon & Old Whitgiftian Rustlers (M)                                 NaN
33  25-Sep    NaN  South East Women's Division 2 Oaks                    Royal Holloway 1 (F)  1 : 17               Ashford (Middlesex) 1 (F)                                 NaN
34     NaN  14:30    South East Men's Division 6 Oaks                    Sanderstead Cats (M)   1 : 1                    Royal Holloway 1 (M)                     Caterham School

回复收藏 0 原文

~没有更多了~