How to get the URL from data extracted from a website
So basically I'm stuck: I don't know how to get the URL out of the data I extracted from a website.
Here is my code:
import requests
from bs4 import BeautifulSoup
req = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1')
soup = BeautifulSoup(req.content, "html.parser")
print(soup.prettify())
I get a lot of information in the output, but the only thing I need is the URL. I hope someone can help me.
P.S.
It gives me this information:
{"response":{"items":[{"url":"https:\/\/2ch.hk\/b\/src\/262671212\/16440825183970.webm","type":"video\/webm","filesize":"20259","width":1280,"height":720,"name":"1521967932778.webm","board":"b","thread":"262671212"},{"url":"https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm","type":"video\/webm","filesize":"12055","width":1280,"height":720,"name":"1526793203110.webm","board":"b","thread":"261549765"}...
But out of all of that, I only need this part: https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm
(Not exactly this url, but just as an example)
Answers (3)
I guess if the API returns JSON data, it is better to just parse it directly. You can do it this way:
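A minimal sketch of that approach, assuming you keep `requests` from the question: `Response.json()` decodes the body as JSON, which also turns the escaped `\/` sequences back into plain `/`. The function name `get_video_urls` is illustrative, and the `data["response"]["items"][i]["url"]` layout is assumed from the output quoted in the question.

```python
import requests

# Endpoint copied from the question.
API_URL = "https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1"


def get_video_urls(api_url: str = API_URL) -> list[str]:
    """Fetch the API response and return the "url" of every item."""
    req = requests.get(api_url, timeout=10)  # timeout avoids hanging forever
    req.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    data = req.json()  # the body is JSON, so decode it as JSON (no BeautifulSoup)
    # Layout assumed from the question's output:
    # {"response": {"items": [{"url": ..., "type": ..., ...}, ...]}}
    return [item["url"] for item in data["response"]["items"]]
```

Usage: `for url in get_video_urls(): print(url)` prints one URL per line.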
The URL returns JSON data. BeautifulSoup can't extract data from JSON; to get at it, parse the response as JSON instead, as in the following example.
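A minimal sketch of that approach, run on a trimmed copy of the data quoted in the question so it works offline. `RAW` contains the two items shown there; the closing `]}}` is an assumption, because the quoted response was cut off with `...`. `urls_from_json_text` is a hypothetical helper name. With the question's code, passing `req.content` (or `req.text`) to `json.loads` would replace the hard-coded sample.

```python
import json

# The two items quoted in the question; the closing "]}}" is assumed.
RAW = r'''{"response":{"items":[
{"url":"https:\/\/2ch.hk\/b\/src\/262671212\/16440825183970.webm","type":"video\/webm","filesize":"20259","width":1280,"height":720,"name":"1521967932778.webm","board":"b","thread":"262671212"},
{"url":"https:\/\/2ch.hk\/b\/src\/261549765\/16424501976450.webm","type":"video\/webm","filesize":"12055","width":1280,"height":720,"name":"1526793203110.webm","board":"b","thread":"261549765"}
]}}'''


def urls_from_json_text(raw: str) -> list[str]:
    """Decode JSON text and pull the "url" field out of every item."""
    data = json.loads(raw)  # "\/" in the text is a JSON escape for "/"
    return [item["url"] for item in data["response"]["items"]]


for url in urls_from_json_text(RAW):
    print(url)
# Prints:
# https://2ch.hk/b/src/262671212/16440825183970.webm
# https://2ch.hk/b/src/261549765/16424501976450.webm
```

The backslashes disappear because `\/` is only how JSON happens to encode `/`; after decoding, each value is an ordinary URL string.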
The problem is that you are telling BeautifulSoup to parse JSON data as HTML. You can get the URL you need more directly with the following code:
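A minimal sketch of the direct route using only the standard library (`urllib.request.urlopen` and `json.load`), so neither `requests` nor BeautifulSoup is involved. The helper names `extract_urls` and `fetch_video_urls` are hypothetical, and the `payload["response"]["items"][i]["url"]` layout is assumed from the output quoted in the question; items without a `url` key are skipped rather than raising `KeyError`.

```python
import json
from urllib.request import urlopen

# Endpoint copied from the question.
API_URL = "https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1"


def extract_urls(payload: dict) -> list[str]:
    """Return the "url" of every entry in payload["response"]["items"].

    Items without a "url" key are skipped instead of raising KeyError.
    """
    return [item["url"] for item in payload["response"]["items"] if "url" in item]


def fetch_video_urls(api_url: str = API_URL, timeout: float = 10.0) -> list[str]:
    """Download the API response with urllib, decode it as JSON, extract the URLs."""
    with urlopen(api_url, timeout=timeout) as resp:
        payload = json.load(resp)  # json.load accepts the binary response stream
    return extract_urls(payload)
```

`fetch_video_urls()` returns a list of URL strings. Keeping the extraction in its own function means it also works on an already-decoded payload, e.g. `extract_urls(req.json())` with the question's `req`.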