How to download only unique images/media from a subreddit

Posted 2025-02-10 18:23:14

I am trying to write a meme-scraper script in Python.

The script is quite simple for now. It searches a subreddit and downloads the media based on the filter.

Below is the script I wrote. I am using the PRAW library:

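import praw
import requests

# Placeholder credentials: replace with the values of your own Reddit "script" app
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="meme-scraper by u/YOUR_USERNAME",
)

subreddit = reddit.subreddit("memes")

for submission in subreddit.top(limit=10):
    img_data = requests.get(submission.url).content
    # Use the last path segment of the URL as the local filename
    filename = submission.url.split('/')[-1]
    with open(filename, 'wb') as handler:
        handler.write(img_data)
    print(submission.url)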

The problem is that it sometimes downloads the same media more than once. I would like each run to always give me 10 unique media items. Does anyone experienced with Reddit or PRAW know how to do that?
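Note that "unique" can mean two different things here: a submission that has not been downloaded before, or file contents that have not been seen before (the same meme is often reposted as a new submission). A minimal sketch of the second check, assuming exact byte-for-byte duplicates are what matter; is_new_media is a hypothetical helper, not part of PRAW:

```python
import hashlib


def is_new_media(data: bytes, seen_hashes: set) -> bool:
    """Return True the first time a given byte payload is seen, False afterwards.

    Identical files (e.g. the same meme reposted under a different submission)
    produce the same SHA-256 digest, so they are reported as duplicates.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True


if __name__ == "__main__":
    seen = set()
    print(is_new_media(b"cat.jpg bytes", seen))  # True: first time these bytes appear
    print(is_new_media(b"cat.jpg bytes", seen))  # False: identical bytes
    print(is_new_media(b"dog.jpg bytes", seen))  # True: different bytes
```

In the download loop one would call is_new_media(img_data, seen_hashes) before writing the file and skip it when the result is False. Re-encoded or resized copies produce different bytes and will not be caught; that would need a perceptual hash instead.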

Answer by 拥醉, posted 2025-02-17 18:23:14

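This is the code I would suggest. It checks whether a submission's ID has already been saved (for example in a database or a file) and skips that submission if so. PRAW exposes the ID on every submission; the code uses submission.fullname (of the form t3_<id>).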

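import os
import requests

SEEN_FILE = "seen_ids.txt"  # hypothetical file that stores already-downloaded IDs

def save_id(submission_id):
    # Append the ID so it persists across runs
    with open(SEEN_FILE, "a") as f:
        f.write(submission_id + "\n")

def is_known_id(submission_id):
    # Return True if this ID was saved by an earlier download
    if not os.path.exists(SEEN_FILE):
        return False
    with open(SEEN_FILE) as f:
        return any(line.strip() == submission_id for line in f)

for submission in subreddit.top(limit=10):
    if is_known_id(submission.fullname):
        continue  # already downloaded, skip it
    img_data = requests.get(submission.url).content
    filename = submission.url.split('/')[-1]
    with open(filename, 'wb') as handler:
        handler.write(img_data)
    save_id(submission.fullname)
    print(submission.url)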
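With limit = 10, the loop only ever examines the same ten top posts, so once all of them have been saved a later run downloads nothing and never reaches 10 new items. A minimal sketch of one way around this, assuming the seen IDs are kept in a JSON file: walk a longer listing and stop after n unseen submissions. The names load_seen, save_seen and first_n_unseen are hypothetical, not part of PRAW.

```python
import json
import os
from types import SimpleNamespace


def load_seen(path):
    """Load previously downloaded submission IDs from a JSON file (empty set if missing)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_seen(path, seen_ids):
    """Write the seen IDs back to disk so the next run skips them."""
    with open(path, "w") as f:
        json.dump(sorted(seen_ids), f)


def first_n_unseen(submissions, seen_ids, n=10):
    """Yield up to n submissions whose .fullname is not in seen_ids.

    Each yielded ID is added to seen_ids. Iteration stops as soon as n new
    submissions have been yielded, so a long listing is only consumed as far
    as needed.
    """
    if n <= 0:
        return
    count = 0
    for submission in submissions:
        if submission.fullname in seen_ids:
            continue  # downloaded on an earlier run (or earlier in this run)
        seen_ids.add(submission.fullname)
        yield submission
        count += 1
        if count == n:
            return


if __name__ == "__main__":
    # Stand-in objects; with PRAW you would pass subreddit.top(limit=None) instead
    fake = [SimpleNamespace(fullname=f"t3_{i}") for i in range(5)]
    seen = {"t3_0", "t3_2"}
    picked = [s.fullname for s in first_n_unseen(fake, seen, n=2)]
    print(picked)  # ['t3_1', 't3_3']
```

With PRAW this would be used roughly as: seen = load_seen("seen_ids.json"), then loop over first_n_unseen(subreddit.top(limit=None), seen, n=10) with the same download body, then save_seen("seen_ids.json", seen). Passing limit=None asks PRAW to fetch as many entries as the listing provides; since the listing is finite, a run can still return fewer than 10 once every available post has been downloaded. IDs are marked as seen when yielded, so a failed download will not be retried unless its ID is removed from the set.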