Python web scraping: extracting a table

Posted 2025-02-13 20:57:58, 504 words, 2 views, 0 comments

I'm trying to scrape the table of past winning numbers from njlottery.com.
Using my browser's Inspect tool, I can see the table as:

<table class="table table-striped cardcash-winning-numbers table-winning-numbers">

When I load the whole page into BeautifulSoup and print it out, the table appears as:

<table class="table table-striped cardcash-winning-numbers  table-winning-numbers'+
((__t=((portalGameCode == 'PICK3' || portalGameCode == 'PICK4') ? '-pick' : ''))==null?'':__t)+
'">

But when I try to find the table rows using table_body = soup.find_all('table'), the results do not include the table I am looking for.
Any idea what I'm doing wrong?

Comments (1)

壹場煙雨 2025-02-20 20:57:58

The table appears to be loaded via an AJAX request only after the page itself has loaded, so it is not present as a real <table> element in the HTML that BeautifulSoup receives. Using a headless browser can solve this.
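The headless-browser route could look roughly like the sketch below. It assumes Selenium 4 and a local Chrome install, neither of which is mentioned in the thread. The function name fetch_rendered_html and the default selector are hypothetical; the selector reuses the cardcash-winning-numbers class from the question and may need adjusting for the actual page.

```python
def fetch_rendered_html(url, css_selector="table.cardcash-winning-numbers", timeout=15):
    """Return the page source once an element matching css_selector exists."""
    # Imported inside the function so the module still loads where
    # selenium is not installed (an assumption of this sketch).
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript on the page has inserted the table,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned string can then be handed to BeautifulSoup in place of the raw download, for example BeautifulSoup(fetch_rendered_html("https://www.njlottery.com/en-us/drawgames/pick3.html"), "html.parser").find_all("table").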

Another option is to use an API provider such as:
https://rapidapi.com/scrapingbuddy-scrapingbuddy-default/api/scrapingbuddy/

The following code fetches the page with the tables for the Pick 3 game:

import requests

# Scrape endpoint of the ScrapingBuddy API on RapidAPI
url = "https://scrapingbuddy.p.rapidapi.com/v1/scrape"

# Page to fetch: the Pick 3 draw game
querystring = {"url": "https://www.njlottery.com/en-us/drawgames/pick3.html"}

# Replace XXX with your own RapidAPI key and host values
headers = {
    "X-RapidAPI-Key": "XXX",
    "X-RapidAPI-Host": "XXX"
}

response = requests.get(url, headers=headers, params=querystring)

print(response.text)
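Once rendered HTML is available, the table rows still have to be pulled out of it. The sketch below shows one way to do that, assuming response.text contains the rendered page including the table (the thread does not show what the API returns). It uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra packages. TableRowParser, extract_rows, and the sample HTML with its values are made up for illustration and are not real lottery data.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> found inside a <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read
        self._depth = 0     # how many <table> elements we are inside

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
        elif tag == "tr" and self._depth:
            self._close_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()         # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._depth:
            self._close_row()
            self._depth -= 1


def extract_rows(html):
    """Return all table rows in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for the rendered page; not real draw results.
SAMPLE_HTML = """
<table class="table table-striped cardcash-winning-numbers">
  <tr><th>Draw date</th><th>Numbers</th></tr>
  <tr><td>01/01/2025</td><td>1 2 3</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

In the answer's script, extract_rows(response.text) would replace the final print call, provided the returned HTML really contains the rendered table.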