How to download only unique images/media from a subreddit
I am trying to write a meme scraper script in Python.
The script is quite simple for now: it searches a subreddit and downloads media based on a filter.
Below is the script I wrote. I am using the PRAW library:
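import praw
import requests

# Placeholder credentials (assumption, not in the original post): use your own Reddit app values.
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET", user_agent="meme-scraper")
subreddit = reddit.subreddit("memes")

for submission in subreddit.top(limit=10):
    # Download the raw bytes behind the submission's URL.
    img_data = requests.get(submission.url).content
    # Use the last path segment of the URL as the local file name.
    filename = submission.url.split('/')[-1]
    with open(filename, 'wb') as handler:
        handler.write(img_data)
    print(submission.url)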
The problem is that it sometimes downloads the same media repeatedly. I would like to configure it so that each run always returns 10 unique media items. Does anyone familiar with Reddit know how to do that?
Comments (1)
The approach that was suggested to me is to check whether a given submission ID is already saved in a database before downloading it. The ID can be obtained via PRAW (each submission has an id attribute).
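The code from the original suggestion is not reproduced here, so the following is only a minimal sketch of the same idea, assuming a local SQLite file is an acceptable "database". The file name seen_submissions.db, the table name seen, and the helper names open_seen_db, is_seen, mark_seen and pick_unseen are all hypothetical, chosen for illustration. The helpers only need objects that have an id attribute, which PRAW submissions provide.

```python
import sqlite3


def open_seen_db(path="seen_submissions.db"):
    """Open (or create) the SQLite file that remembers downloaded IDs.

    The path and table name are illustrative assumptions.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def is_seen(conn, submission_id):
    """Return True if this submission ID was recorded on an earlier pick."""
    row = conn.execute(
        "SELECT 1 FROM seen WHERE id = ?", (submission_id,)
    ).fetchone()
    return row is not None


def mark_seen(conn, submission_id):
    """Record a submission ID; inserting an existing ID is a no-op."""
    conn.execute("INSERT OR IGNORE INTO seen (id) VALUES (?)", (submission_id,))
    conn.commit()


def pick_unseen(conn, submissions, wanted=10):
    """Collect up to `wanted` submissions whose IDs are not yet recorded.

    Each picked submission is marked as seen immediately, so a later
    call (or a later run of the script) skips it.
    """
    picked = []
    for submission in submissions:
        if is_seen(conn, submission.id):
            continue
        mark_seen(conn, submission.id)
        picked.append(submission)
        if len(picked) == wanted:
            break
    return picked
```

With the script from the question, one possible use is pick_unseen(open_seen_db(), subreddit.top(limit=None), wanted=10), followed by the existing download loop over the returned list. Requesting more than 10 posts matters here: with limit=10, a run in which all ten top posts are already recorded would return nothing. Because IDs are recorded when picked, a failed download is not retried; calling mark_seen only after the file has been written would change that.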