Unable to get BS4 table contents in Python

Posted 2025-02-05 02:36:02

I want to fetch all the user handles present at this link: https://practice.geeksforgeeks.org/leaderboard/

This is the code I tried:

import requests
from bs4 import BeautifulSoup

URL = 'https://practice.geeksforgeeks.org/leaderboard/'

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata(URL)
soup = BeautifulSoup(htmldata, 'html.parser')
table = soup.find_all('table', {"id": "leaderboardTable"})
print(table[0].find_all('tbody')[1])
print(table[0].find_all('tbody')[1].tr)

Output:

<tbody id="overall_ranking">
</tbody>

None

The code fetches the table, but when I try to print the tr or td tags inside it, nothing is there: the tbody prints empty and .tr returns None. I also tried another approach using pandas, and the same thing happens (see the sketch below).
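
Roughly, the pandas attempt was along these lines (a sketch, assuming pandas.read_html on the fetched page); it fails for the same reason, since read_html only parses the static HTML:

import requests
import pandas as pd

# read_html only sees the static markup, so the dynamically filled leaderboard
# rows are missing from the parsed tables.
html = requests.get('https://practice.geeksforgeeks.org/leaderboard/').text
tables = pd.read_html(html)
print(tables)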

I just want all the user handles present in this table (https://practice.geeksforgeeks.org/leaderboard/)

Any solution for this problem will be highly appreciated.

2 Answers

只有影子陪我不离不弃 2025-02-12 02:36:02

The URL is dynamic and BeautifulSoup can't render JavaScript, but the data is generated from an API, meaning the website loads the leaderboard through an API call (you can find it in the browser's developer tools, under the Network tab).

import requests

# Endpoint the page itself calls; each request returns one page of rankings.
api_url = 'https://practiceapi.geeksforgeeks.org/api/v1/leaderboard/ranking/?ranking_type=overall&page={page}'

# Walk the first 10 pages and print every user's handle.
for page in range(1, 11):
    data = requests.get(api_url.format(page=page)).json()
    for handle in data:
        print(handle['user_handle'])

Output:

Ibrahim Nash
blackshadows
mb1973
Quandray
akhayrutdinov
saiujwal13083
shivendr7
kirtidee18
mantu_singh
cfwong8
harshvardhancse1934
sgupta9519
sanjay05
samiranroy0407
Maverick_H
sreerammuthyam999
gfgaccount
sushant_a
verma_ji
balkar81199
marius_valentin_dragoi
ishu2001mitra
_tony_stark_01
ta7anas17113011638
yups0608
himanshujainmalpura
yujjwal9700
parthabhunia_04
KshamaGupta
the_coder95
ayush_gupta4
khushbooguptaciv18
aditya dhiman
dilipsuthar00786
adityajain9560
dharmsharma0811
Aegon_Targeryan
1032180422
mangeshagarwal1974
naveedaamir484
raj_271
Pulkit__Sharma__
aroranayan999
surbhi_7
ruchika1004ajmera
cs845418
shadymasum
lonewolf13325
user_1_4_13_19_22
SubhankarMajumdar
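
If you want the handles in a list rather than printed, the same idea can be wrapped in a small helper (a sketch; fetch_handles and total_pages are just illustrative names):

import requests

api_url = 'https://practiceapi.geeksforgeeks.org/api/v1/leaderboard/ranking/?ranking_type=overall&page={page}'

def fetch_handles(total_pages=10):
    # Collect user handles from the first total_pages pages of the overall ranking.
    handles = []
    for page in range(1, total_pages + 1):
        data = requests.get(api_url.format(page=page)).json()
        # Each entry is assumed to carry a 'user_handle' field, as in the loop above.
        handles.extend(entry['user_handle'] for entry in data)
    return handles

print(fetch_handles())
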
晒暮凉 2025-02-12 02:36:02

You can get this using Selenium.

from selenium import webdriver

# Point executable_path at your local chromedriver binary.
driver = webdriver.Chrome(executable_path="<webdriver path>")
driver.get("https://practice.geeksforgeeks.org/leaderboard/")

# The browser runs the page's JavaScript, so the ranking rows exist in the DOM.
user_names = driver.find_elements(by="xpath", value="//tbody[@id = 'overall_ranking']/tr/td/a")
user_names = list(map(lambda name: name.text, user_names))

driver.quit()
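
In recent Selenium 4 releases the executable_path argument is no longer accepted; here is a minimal sketch of the same approach under that assumption, letting Selenium Manager resolve the driver and waiting explicitly for the JavaScript-rendered rows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # Selenium Manager locates chromedriver automatically
driver.get("https://practice.geeksforgeeks.org/leaderboard/")

# Wait until the dynamically rendered rows appear inside the overall ranking tbody.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//tbody[@id='overall_ranking']/tr/td/a"))
)

links = driver.find_elements(By.XPATH, "//tbody[@id='overall_ranking']/tr/td/a")
user_names = [link.text for link in links]

driver.quit()
print(user_names)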