Python web scraping a table

Posted 2025-02-13 20:57:58, 504 characters, 2 views, 0 comments

I'm trying to scrape the table of past winning numbers from njlottery.com.
Using my browser's Inspect tool, I can see the table: <table class="table table-striped cardcash-winning-numbers table-winning-numbers">

When I fetch the whole page, parse it with BeautifulSoup, and print it out, the table appears as

<table class="table table-striped cardcash-winning-numbers  table-winning-numbers'+
((__t=((portalGameCode == 'PICK3' || portalGameCode == 'PICK4') ? '-pick' : ''))==null?'':__t)+
'">

But when I try to find the table rows using table_body = soup.find_all('table'), the results do not include the table I am looking for.
Any idea what I'm doing wrong?

壹場煙雨 2025-02-20 20:57:58

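The reason seems to be that the tables are loaded via an AJAX request only after the page has loaded, so the HTML that BeautifulSoup receives does not yet contain the rendered table. The '+((__t=... fragment in your printout suggests that markup is a JavaScript template string rather than a real table element, which would explain why find_all('table') skips it. Using a headless browser can solve this.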
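A minimal sketch of the headless-browser approach, assuming Selenium 4 with a local Chrome install and assuming the rendered table keeps the table-winning-numbers class shown in the question. The function names fetch_rendered_html and find_winning_tables, the 15-second timeout, and the CSS selector are illustrative choices, not something the site documents.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=15):
    """Load the page in headless Chrome and return the HTML after the table appears."""
    # Imported lazily so the parsing helper below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # recent Chrome; older builds use "--headless"
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the AJAX-loaded table is present in the DOM (assumed selector).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "table.table-winning-numbers")
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def find_winning_tables(html):
    """Return every <table> element carrying the table-winning-numbers class."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select("table.table-winning-numbers")
```

Calling find_winning_tables(fetch_rendered_html("https://www.njlottery.com/en-us/drawgames/pick3.html")) should then return the table elements, provided the assumptions above hold. The same helper returns nothing for markup that sits inside a script template, which matches the behaviour described in the question.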

Another option is to use a scraping API provider such as:
https://rapidapi.com/scrapingbuddy-scrapingbuddy-default/api/scrapingbuddy/

The following code prints the page with the tables for the Pick 3 game:

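import requests

# Scraping API endpoint from the provider linked above.
url = "https://scrapingbuddy.p.rapidapi.com/v1/scrape"

# Page to scrape: the NJ Lottery Pick 3 draw game page.
querystring = {"url": "https://www.njlottery.com/en-us/drawgames/pick3.html"}

# Replace the XXX placeholders with the values from your RapidAPI account.
headers = {
    "X-RapidAPI-Key": "XXX",
    "X-RapidAPI-Host": "XXX",
}

response = requests.get(url, headers=headers, params=querystring, timeout=30)
response.raise_for_status()  # stop here on an HTTP error instead of printing an error page

print(response.text)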
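Once rendered HTML is available, the rows can be pulled out with BeautifulSoup. The sketch below assumes the response body is the rendered page HTML itself (the API may wrap it differently) and that the table carries the table-winning-numbers class from the question. The extract_rows helper and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_rows(html, table_class="table-winning-numbers"):
    """Return the table's rows as lists of cell strings, or [] if no table matches."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches when the value is any one of the element's classes.
    table = soup.find("table", class_=table_class)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows without any cells
            rows.append(cells)
    return rows


# Made-up sample markup standing in for the rendered page.
sample = """
<table class="table table-striped table-winning-numbers">
  <tr><th>Date</th><th>Numbers</th></tr>
  <tr><td>01/01/2025</td><td>1 2 3</td></tr>
</table>
"""
print(extract_rows(sample))
```

With a real response, extract_rows(response.text) would take the place of the sample call, assuming the body is plain HTML.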
