Web scraping England Hockey with Python BeautifulSoup
I am trying to use BeautifulSoup to get the table found in this link: https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007
It's an England Hockey website, and basically I want to download the table and put it in a DataFrame, and also eventually get the fixtures as well.
Whenever I try to find the right div or table, it returns None.
Here's what I have tried:
import requests
from bs4 import BeautifulSoup

url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
I have tried to find the div the table is within, but it returns None.
bread_crumbs = soup.find("div", class_="container")
print(bread_crumbs)
Again, I try to find the table, but it returns None.
bread_crumbs = soup.find("table")
print(bread_crumbs)
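A quick way to narrow this down is to count how many table elements the server actually returned: if the count is zero, the markup never contained the table at all (the site may be serving an interstitial page instead of the fixtures). A small diagnostic sketch along those lines:

```python
import requests
from bs4 import BeautifulSoup

def count_tables(html):
    # Number of <table> elements actually present in the fetched markup.
    return len(BeautifulSoup(html, "html.parser").find_all("table"))

# Usage (network required) -- a 200 status with 0 tables suggests the server
# returned some other page rather than the fixtures:
# page = requests.get("https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007")
# print(page.status_code, count_tables(page.text))
```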
If anyone can suggest a way to access the table, I would be grateful! It might be that Selenium would be better for this, but I haven't used Selenium yet, so I am not sure how to start.
As you can see from the link, it's a PHP website, so could this be part of the reason?
Comments (2)
Because to access this site you must agree to the use of cookies and accept their terms and conditions, replace the request with the code below and try again.
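The answer's original snippet was not preserved in this copy; the following is a minimal sketch of the hard-coded-cookie approach it describes. The Cookie and User-Agent values here are placeholders (assumptions), to be replaced with the real strings copied from your browser's DevTools (Network tab) after accepting the site's cookie banner:

```python
import requests
from bs4 import BeautifulSoup

# Both header values below are placeholders: copy the real Cookie string from
# your browser's DevTools after accepting the site's cookie banner.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "cookiesAccepted=true",  # hypothetical name/value
}

def first_table(html):
    # Return the first <table> element in the markup, or None if absent.
    return BeautifulSoup(html, "html.parser").find("table")

# Usage (network required):
# page = requests.get("https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007",
#                     headers=HEADERS)
# print(first_table(page.text))
```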
You could let Selenium render the page and then pull it, or you need to send the Cookie in the request headers.
Both of the solutions provided work. The difference in mine is that it uses a Session to create the cookie, as opposed to hard-coding it.
Code:
Output:
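The original code and output blocks were lost in this copy; below is a minimal sketch of the Session approach described above, under the assumption that the first response sets the consent cookie which the second request then replays. The DataFrame step is done manually with BeautifulSoup so it does not depend on lxml:

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

def fetch_with_session(url):
    """Hit the URL twice through one Session: cookies set by the first
    response (assumed to include the consent cookie) are stored on the
    Session and sent back automatically on the second request."""
    with requests.Session() as session:
        session.get(url)              # first hit: server issues cookies
        return session.get(url).text  # second hit: real page served

def table_to_df(html):
    """Parse the first <table> in the markup into a DataFrame,
    treating the first row as the header."""
    table = BeautifulSoup(html, "html.parser").find("table")
    rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")]
    return pd.DataFrame(rows[1:], columns=rows[0])

# Usage (network required):
# html = fetch_with_session("https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007")
# df = table_to_df(html)
```

If lxml or html5lib is installed, `pd.read_html(io.StringIO(html))` is a shorter way to do the parsing step.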