Can't figure out how to scrape an ID with BeautifulSoup
Trying to scrape a site with an ID but I can't figure out how to fix it:
from bs4 import BeautifulSoup
import requests

url = "Website"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div', class_="position-relative")
for list in lists:
    Value = list.find('h5', id_="player_value")
    print(Value)
Now with that it will just print:
None
Here is what the website inspect mode looks like:
Comments (2)
Remove the _ from the id keyword argument: pass id="player_value" instead of id_="player_value".
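As a rough illustration of why the original call prints None, the sketch below runs both spellings against a small inline HTML snippet. The markup is hypothetical (the real page is not shown in the question), and the variable names card, wrong, and right are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which the question does not show.
html = """
<div class="position-relative">
  <h5 id="player_value">42</h5>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="position-relative")

# id_ is passed through as a filter on an attribute literally named "id_",
# which no tag has, so nothing matches.
wrong = card.find("h5", id_="player_value")

# id is an ordinary keyword argument and matches the id attribute.
right = card.find("h5", id="player_value")

print(wrong)                       # None
print(right.get_text(strip=True))  # 42
```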
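Why the _ is needed for class: according to the Beautiful Soup docs, class is a reserved word in Python, so using it as a keyword argument is a syntax error, and Beautiful Soup accepts class_ instead. id is not a reserved word, so it takes no underscore.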
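The difference between class and id can be checked with the standard library alone. This small sketch (the report dict is just an illustrative name) asks Python whether each name is a reserved keyword or merely a built-in:

```python
import builtins
import keyword

# For each name: (is it a reserved keyword?, is it a built-in name?)
report = {
    name: (keyword.iskeyword(name), hasattr(builtins, name))
    for name in ("class", "id", "list")
}

for name, (is_keyword, is_builtin) in report.items():
    print(f"{name}: keyword={is_keyword}, builtin={is_builtin}")
```

Only class is a keyword, which is why it alone needs the trailing underscore. id and list are built-in names: legal as identifiers, though rebinding them hides the built-in.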
Assuming the id is unique, you can get your value directly from the soup, for example with soup.find('h5', id="player_value").
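A minimal sketch of the direct lookup, assuming the id really is unique on the page. The helper name get_player_value and the sample markup are hypothetical, and the function returns None rather than raising when the element is missing:

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is not shown in the question.
SAMPLE_HTML = """
<div class="position-relative">
  <h5 id="player_value">1500</h5>
</div>
"""


def get_player_value(html: str) -> Optional[str]:
    """Return the text of the <h5 id="player_value"> element, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h5", id="player_value")
    if heading is None:
        return None
    return heading.get_text(strip=True)


print(get_player_value(SAMPLE_HTML))  # 1500
```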
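If the id of your <h5> is not unique and you want to get all of them, loop over the containers and look up the <h5> inside each one. Also avoid using built-in names such as list for your loop variable, since that shadows the built-in.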
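For the non-unique case, a loop along these lines would collect every match. The markup is again hypothetical, and card, heading, and values are illustrative names chosen so that no built-in is shadowed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a repeated id and one container without a value.
html = """
<div class="position-relative"><h5 id="player_value">10</h5></div>
<div class="position-relative"><h5 id="player_value">20</h5></div>
<div class="position-relative"><p>no value here</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

values = []
for card in soup.find_all("div", class_="position-relative"):
    heading = card.find("h5", id="player_value")
    # find() returns None when a container has no matching <h5>, so guard for it.
    if heading is not None:
        values.append(heading.get_text(strip=True))

print(values)  # ['10', '20']
```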
You can pass the class in a dict, as the attrs argument of find_all, instead of using the class_ keyword.
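The code from this comment did not survive, so the following is only a guess at what it suggested: passing attribute filters as a dict in the second positional argument (attrs). The markup and names are hypothetical. The dict form works for both class and id and sidesteps the underscore question entirely.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page is not shown in the question.
html = """
<div class="position-relative"><h5 id="player_value">10</h5></div>
<div class="position-relative"><h5 id="player_value">20</h5></div>
"""

soup = BeautifulSoup(html, "html.parser")

values = []
# The second positional argument is attrs: a dict of attribute name -> value.
for card in soup.find_all("div", {"class": "position-relative"}):
    heading = card.find("h5", {"id": "player_value"})
    if heading is not None:
        values.append(heading.get_text(strip=True))

print(values)  # ['10', '20']
```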