I'm trying to scrape a table from the IPLT20 website, and it keeps returning an empty list []

Posted on 2025-02-12 18:42:32

from bs4 import BeautifulSoup
import requests

url = 'https://www.iplt20.com/stats/2021/most-runs'
source = requests.get(url)
soup = BeautifulSoup(source.text, 'html.parser')
soup.find_all('table', class_='np-mostruns_table')

清晰传感 2025-02-19 18:42:32

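The site is rendered entirely with JavaScript, and requests cannot run JavaScript, so the table never appears in the HTML it downloads.

You have to use an automated browser such as Selenium or something similar.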
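A minimal sketch of the automated-browser route, assuming Selenium 4 with a local Chrome install. The function name `fetch_rendered_html` is made up for illustration, and the selector simply reuses the class name from the question, which may not match what the rendered page actually contains.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load url in headless Chrome and return the HTML once css_selector appears."""
    # Imported here so this file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the scripts have inserted a matching element,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


URL = "https://www.iplt20.com/stats/2021/most-runs"
# Assumption: the class name from the question exists in the rendered page.
SELECTOR = "table.np-mostruns_table"

# Usage (needs Chrome and the selenium package):
# from bs4 import BeautifulSoup
# html = fetch_rendered_html(URL, SELECTOR)
# soup = BeautifulSoup(html, "html.parser")
# print(soup.find_all("table", class_="np-mostruns_table"))
```

The explicit wait matters here: reading `page_source` right after `get()` can still return the page before the scripts have filled in the table.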

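I also recommend using a browser extension that switches JavaScript on and off while you are scraping, such as Toggle JS. With JavaScript switched off, the browser shows roughly what requests receives.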
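The same check can be done without a browser by looking for the table in the raw HTML. The sketch below uses only the standard library; the two HTML strings are made-up stand-ins for what a server might send before scripts run and what the page might look like afterwards, not copies of the real site.

```python
from html.parser import HTMLParser


class TableClassFinder(HTMLParser):
    """Counts <table> tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.matches += 1


def count_tables(html, class_name):
    """Return how many tables with class_name appear in the given HTML."""
    finder = TableClassFinder(class_name)
    finder.feed(html)
    return finder.matches


# Made-up markup: a page shell before any script has run
static_html = '<html><body><div id="stats"></div><script src="app.js"></script></body></html>'
# Made-up markup: the same container after scripts have filled it in
rendered_html = '<div id="stats"><table class="np-mostruns_table"><tr><td>1</td></tr></table></div>'

print(count_tables(static_html, "np-mostruns_table"))    # 0
print(count_tables(rendered_html, "np-mostruns_table"))  # 1
```

Passing `requests.get(url).text` to `count_tables` gives a quick answer to whether a plain HTTP request can ever see the table; a count of 0 means no parser or selector will find it in that response.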

南渊 2025-02-19 18:42:32

If you want to find a table by its class, you should use:

soup.find("table",{"class":"np-mostruns_table"})

誰認得朕 2025-02-19 18:42:32

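You can't get the table because it is loaded dynamically. You need to find the request that loads it and build your table from the response. The response has many more fields than the site displays, so you can add any others you need. The example below uses only the fields shown on the site.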

import json

import pandas as pd
import requests

# JSONP feed that the stats page loads in the background
url = 'https://ipl-stats-sports-mechanic.s3.ap-south-1.amazonaws.com/ipl/feeds/stats/60-toprunsscorers.js?callback=ontoprunsscorers'

response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error

# The body is JSON wrapped in a callback call, so keep only the text
# between the first '(' and the last ')'. Using rfind avoids cutting
# the payload short if a value happens to contain ')'.
text = response.text
json_data = json.loads(text[text.find('(') + 1:text.rfind(')')])

# Map the feed's field names to the column headers used on the site
results = []
for player in json_data['toprunsscorers']:
    data = {
        'Player': player['StrikerName'],
        'Mat': player['Matches'],
        'Inns': player['Innings'],
        'NO': player['NotOuts'],
        'Runs': player['TotalRuns'],
        'HS': player['HighestScore'],
        'AVG': player['BattingAverage'],
        'BF': player['Balls'],
        'SR': player['StrikeRate'],
        '100': player['Centuries'],
        '50': player['FiftyPlusRuns'],
        '4s': player['Fours'],
        '6s': player['Sixes']
    }
    results.append(data)
df = pd.DataFrame(results)
print(df)

Output:

                  Player Mat Inns NO Runs    HS  ...   BF      SR 100 50  4s  6s
0            Jos Buttler  17   17  2  863   116  ...  579  149.05   4  4  83  45
1              K L Rahul  15   15  3  616  103*  ...  455  135.38   2  4  45  30
2        Quinton De Kock  15   15  1  508  140*  ...  341  148.97   1  3  47  23
3          Hardik Pandya  15   15  4  487   87*  ...  371  131.26   0  4  49  12
4           Shubman Gill  16   16  2  483    96  ...  365  132.32   0  4  51  11
..                   ...  ..  ... ..  ...   ...  ...  ...     ...  .. ..  ..  ..
157     Fazalhaq Farooqi   3    1  1    2    2*  ...    8   25.00   0  0   0   0
158   Jagadeesha Suchith   5    2  0    2     2  ...    8   25.00   0  0   0   0
159          Tim Southee   9    5  1    2    1*  ...   12   16.66   0  0   0   0
160  Nathan Coulter-Nile   1    1  1    1    1*  ...    2   50.00   0  0   0   0
161        Anrich Nortje   6    1  1    1    1*  ...    6   16.66   0  0   0   0
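The feed is JSONP, that is, JSON wrapped in a callback call, so the wrapper has to come off before `json.loads` can read it. One alternative to slicing by character position is a small helper that also checks the wrapper's shape. This is a sketch only: `strip_jsonp` is a hypothetical name, and the sample payload is made up to mimic the feed's structure.

```python
import json
import re

# callback name, "(", payload, ")", optional ";" and nothing else
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def strip_jsonp(text):
    """Return the decoded JSON payload of a JSONP response body."""
    match = _JSONP.match(text)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


# Made-up sample; note the ')' inside a string value
sample = 'ontoprunsscorers({"toprunsscorers": [{"StrikerName": "A. Batter (c)", "TotalRuns": "10"}]});'

data = strip_jsonp(sample)
print(data["toprunsscorers"][0]["StrikerName"])  # A. Batter (c)
```

Because the pattern is anchored at both ends and the payload group is greedy, a closing parenthesis inside a string value does not cut the payload short, and a body that is not JSONP at all raises a clear error instead of a confusing JSON decode failure.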
