I am trying to scrape a table from the IPLT20 website, but it keeps returning an empty list []

Posted on 2025-02-12 18:42:32

from bs4 import BeautifulSoup
import requests

url = 'https://www.iplt20.com/stats/2021/most-runs'
source = requests.get(url)
soup = BeautifulSoup(source.text, 'html.parser')
soup.find_all('table', class_='np-mostruns_table')


Comments (3)

清晰传感 2025-02-19 18:42:32

The site is rendered entirely with JavaScript, and requests does not execute JavaScript, so the HTML it returns does not contain the table.

You have to use an automated browser such as Selenium or a similar tool.

I also suggest using a browser extension that toggles JavaScript on and off while you are scraping, so you can see what the page contains without it. For example:

Toggle JS
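
A minimal sketch of the automated-browser approach, under stated assumptions: Selenium 4 and Chrome are installed, and the rendered page really contains a table with the class `np-mostruns_table` from the question. The names `count_tables` and `fetch_rendered_html` are hypothetical helpers, not part of any library. The standard-library parser is used only so the check runs without extra packages; BeautifulSoup works just as well on the returned HTML.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> tags that carry a given CSS class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # The class attribute may be missing or hold several class names.
            classes = (dict(attrs).get("class") or "").split()
            if self.table_class in classes:
                self.count += 1


def count_tables(html, table_class):
    """Return how many tables with table_class appear in html."""
    parser = TableCounter(table_class)
    parser.feed(html)
    return parser.count


def fetch_rendered_html(url, css_selector, timeout=20):
    """Load url in headless Chrome and return the HTML once css_selector exists."""
    # Imported here so the parsing helper works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


# Made-up sample of what requests receives from a JavaScript-rendered page:
# an empty shell with no table in it.
static_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(count_tables(static_shell, "np-mostruns_table"))  # 0
```

To try it against the real page, call `fetch_rendered_html('https://www.iplt20.com/stats/2021/most-runs', 'table.np-mostruns_table')` and pass the returned string to `BeautifulSoup(html, 'html.parser')` as in the question. If the selector never appears, the wait raises a timeout, which would suggest the class name has changed.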

南渊 2025-02-19 18:42:32

If you are looking for a table with a given class, you should use:

soup.find("table", {"class": "np-mostruns_table"})

誰認得朕 2025-02-19 18:42:32

You can't get the table because it is loaded dynamically. You need to find the request that loads it and build your table from the response. The response has many more fields than the site shows, so you can add any others you need. The example below uses only the fields that appear on the site.

import json

import pandas as pd
import requests

# JSONP-style feed that the stats page loads in the background
url = 'https://ipl-stats-sports-mechanic.s3.ap-south-1.amazonaws.com/ipl/feeds/stats/60-toprunsscorers.js?callback=ontoprunsscorers'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error

# The body looks like ontoprunsscorers({...}); keep only the JSON between the
# first '(' and the last ')'. rfind avoids cutting the payload short if a
# value inside it contains ')'.
text = response.text
json_data = json.loads(text[text.find('(') + 1:text.rfind(')')])

results = []
for player in json_data['toprunsscorers']:
    # Map the feed's keys to the column names shown on the site
    data = {
        'Player': player['StrikerName'],
        'Mat': player['Matches'],
        'Inns': player['Innings'],
        'NO': player['NotOuts'],
        'Runs': player['TotalRuns'],
        'HS': player['HighestScore'],
        'AVG': player['BattingAverage'],
        'BF': player['Balls'],
        'SR': player['StrikeRate'],
        '100': player['Centuries'],
        '50': player['FiftyPlusRuns'],
        '4s': player['Fours'],
        '6s': player['Sixes']
    }
    results.append(data)

df = pd.DataFrame(results)
print(df)

OUTPUT:

                  Player Mat Inns NO Runs    HS  ...   BF      SR 100 50  4s  6s
0            Jos Buttler  17   17  2  863   116  ...  579  149.05   4  4  83  45
1              K L Rahul  15   15  3  616  103*  ...  455  135.38   2  4  45  30
2        Quinton De Kock  15   15  1  508  140*  ...  341  148.97   1  3  47  23
3          Hardik Pandya  15   15  4  487   87*  ...  371  131.26   0  4  49  12
4           Shubman Gill  16   16  2  483    96  ...  365  132.32   0  4  51  11
..                   ...  ..  ... ..  ...   ...  ...  ...     ...  .. ..  ..  ..
157     Fazalhaq Farooqi   3    1  1    2    2*  ...    8   25.00   0  0   0   0
158   Jagadeesha Suchith   5    2  0    2     2  ...    8   25.00   0  0   0   0
159          Tim Southee   9    5  1    2    1*  ...   12   16.66   0  0   0   0
160  Nathan Coulter-Nile   1    1  1    1    1*  ...    2   50.00   0  0   0   0
161        Anrich Nortje   6    1  1    1    1*  ...    6   16.66   0  0   0   0
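
The two steps in this answer, stripping the callback wrapper and picking fields, can be tried offline with a small sketch. It assumes the feed has the shape `ontoprunsscorers({...})` implied by the URL and the slicing in the answer. The helpers `parse_jsonp` and `to_rows`, the `COLUMNS` mapping, and the sample string are all hypothetical, and the sample values are made up.

```python
import json

# Display column -> feed key, using names from the answer's code.
# Extend this mapping to pull additional fields from the feed.
COLUMNS = {
    "Player": "StrikerName",
    "Runs": "TotalRuns",
    "HS": "HighestScore",
}


def parse_jsonp(text):
    """Return the JSON payload from a response shaped like callback({...})."""
    start = text.find("(")
    end = text.rfind(")")  # last ')' so parentheses inside values are kept
    if start == -1 or end == -1 or end < start:
        raise ValueError("response does not look like JSONP")
    return json.loads(text[start + 1:end])


def to_rows(payload, key="toprunsscorers"):
    """Build one dict per player with only the columns listed in COLUMNS."""
    return [
        {column: player.get(field) for column, field in COLUMNS.items()}
        for player in payload[key]
    ]


# Made-up sample; the parenthesis in the name would break a search for the first ')'.
sample = (
    'ontoprunsscorers({"toprunsscorers": [{"StrikerName": "Player (A)", '
    '"TotalRuns": "10", "HighestScore": "7*"}]});'
)
rows = to_rows(parse_jsonp(sample))
print(rows)  # [{'Player': 'Player (A)', 'Runs': '10', 'HS': '7*'}]
```

Passing `rows` to `pd.DataFrame` gives the same kind of table as above. Keeping the column mapping in one dictionary means that adding a field from the feed is a one-line change.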