PRAW: difficulty referencing Redditor and Subreddit data/instances(?) from submission results
I currently have working code that scrapes subreddits by name and pulls out the latest 1,000 submissions, inserting their data into a DB.
Now I want to do something similar, but different: grab the last 1,000 submissions (posts, not comments) by a USER. (If I could do this by user per subreddit, that'd be better, but I don't think the API allows it.)
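On the per-subreddit wish: whether or not the API offers it directly, one minimal sketch is to fetch the user's own listing and filter it client-side by each post's subreddit.display_name. The helper name submissions_in_subreddit is hypothetical, and the demo feeds it made-up stand-in objects rather than live PRAW results. Since Reddit listings are generally capped at around 1,000 items, a filter like this only sees the user's most recent posts.

```python
from types import SimpleNamespace


def submissions_in_subreddit(submissions, subreddit_name):
    """Yield only the submissions posted to `subreddit_name` (case-insensitive).

    Hypothetical helper: the filtering happens client-side, so every
    submission in the user's listing is still fetched before being checked.
    """
    target = subreddit_name.lower()
    for submission in submissions:
        if submission.subreddit.display_name.lower() == target:
            yield submission


# Against the live API the input would be the user's listing, e.g.:
#   listing = reddit.redditor(_PRAW_REDDITOR).submissions.new(limit=None)
#   for submission in submissions_in_subreddit(listing, "python"):
#       ...
# Offline demo with made-up stand-in objects:
posts = [
    SimpleNamespace(title="a", subreddit=SimpleNamespace(display_name="Python")),
    SimpleNamespace(title="b", subreddit=SimpleNamespace(display_name="u_JoeBlowUsername")),
    SimpleNamespace(title="c", subreddit=SimpleNamespace(display_name="learnpython")),
    SimpleNamespace(title="d", subreddit=SimpleNamespace(display_name="python")),
]
matched = [p.title for p in submissions_in_subreddit(posts, "python")]
print(matched)  # ['a', 'd']
```

In PRAW, passing limit=None asks the listing for as many items as Reddit will return.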
I have mostly accomplished this, and I think I'm doing it the "right way", except for a couple of pieces of data that I think I'm accessing incorrectly. I've reviewed PRAW's docs and done my due diligence, but I can't find the right way to access them. Let me show you.
Here is my working "grab by subreddit" code:
# reddit, cursor, _PRAW_SUBREDDITS and _PRAW_LIMIT are defined elsewhere in the script
for subreddit in _PRAW_SUBREDDITS:
    for submission in reddit.subreddit(subreddit).new(limit=_PRAW_LIMIT):
        # Upsert keyed on the submission fullname: a re-scraped post gets its
        # comment count, score and text refreshed instead of a duplicate row
        cursor.execute(
            """INSERT INTO reddit (
                name,
                created_utc,
                author,
                link_flair_text,
                num_comments,
                score,
                subreddit,
                permalink,
                title,
                selftext)
            VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT(name)
            DO UPDATE SET num_comments=excluded.num_comments,
                score=excluded.score,
                selftext=excluded.selftext
            """,
            (
                submission.name,
                int(submission.created_utc),
                # str() of a Redditor / Subreddit object is its plain name
                str(submission.author),
                submission.link_flair_text,
                submission.num_comments,
                submission.score,
                str(submission.subreddit),
                submission.permalink,
                submission.title,
                submission.selftext,
            ),
        )
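The ON CONFLICT(name) DO UPDATE clause above is SQLite's upsert syntax. A minimal, self-contained illustration follows, assuming the real reddit table declares name as PRIMARY KEY or UNIQUE (ON CONFLICT(name) requires one of those) and SQLite 3.24 or newer. The table is trimmed to four columns and the values are made up; re-inserting the same fullname updates the row instead of adding a second one.

```python
import sqlite3

# In-memory stand-in for the real database (trimmed, hypothetical schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    """CREATE TABLE reddit (
        name TEXT PRIMARY KEY,
        num_comments INTEGER,
        score INTEGER,
        selftext TEXT)"""
)

UPSERT = """INSERT INTO reddit (name, num_comments, score, selftext)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(name)
            DO UPDATE SET num_comments=excluded.num_comments,
                          score=excluded.score,
                          selftext=excluded.selftext"""

# First scrape: the row is inserted.
cur.execute(UPSERT, ("t3_abc123", 10, 137, "first text"))
# Later scrape of the same post: the existing row is updated in place.
cur.execute(UPSERT, ("t3_abc123", 12, 150, "edited text"))
conn.commit()

rows = cur.execute("SELECT name, num_comments, score, selftext FROM reddit").fetchall()
print(rows)  # [('t3_abc123', 12, 150, 'edited text')]
```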
And here is a new version of the same code, which I'm trying to turn into a new function that grabs "by user":
for submission in reddit.redditor(_PRAW_REDDITOR).submissions.new(limit=1):
    # f"{expr=}" prints the expression text followed by repr() of its value
    print(
        f"{submission.name=}",
        f"{submission.created_utc=}",
        f"{submission.author=}",
        f"{submission.link_flair_text=}",
        f"{submission.num_comments=}",
        f"{submission.score=}",
        f"{submission.subreddit=}",
        f"{submission.permalink=}",
        f"{submission.title=}",
        f"{submission.selftext=}",
        sep="\n",  # one field per line
    )
Most of my results come out normal, except the references to the user and the subreddit.
# This is the output to console:
submission.author=Redditor(name='JoeBloeUsername')
submission.link_flair_text=None
submission.num_comments=10
submission.score=137
submission.subreddit=Subreddit(display_name='u_JoeBlowUsername')
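Part of what makes this output look wrong is the = specifier in the f-strings: without a conversion it formats the value with repr(), and PRAW's Redditor and Subreddit objects render their repr() as Redditor(name=...) and Subreddit(display_name=...), while their str() is the plain name. The sketch below reproduces that with hypothetical stand-in classes (FakeRedditor and FakeSubreddit are illustrations, not PRAW classes) so it runs without network access.

```python
class FakeRedditor:
    """Hypothetical stand-in mimicking how a PRAW Redditor renders itself."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Redditor(name={self.name!r})"

    def __str__(self):
        return self.name


class FakeSubreddit:
    """Hypothetical stand-in mimicking how a PRAW Subreddit renders itself."""

    def __init__(self, display_name):
        self.display_name = display_name

    def __repr__(self):
        return f"Subreddit(display_name={self.display_name!r})"

    def __str__(self):
        return self.display_name


author = FakeRedditor("JoeBloeUsername")
subreddit = FakeSubreddit("u_JoeBlowUsername")

# `=` with no conversion uses repr(): this is what the console showed.
debug_repr = f"{author=}"
# `!s` forces str(), which yields the bare name.
debug_str = f"{author=!s}"
# Reading the attributes directly is the most explicit option.
plain = f"{author.name} posted in {subreddit.display_name}"

print(debug_repr)  # author=Redditor(name='JoeBloeUsername')
print(debug_str)   # author=JoeBloeUsername
print(plain)       # JoeBloeUsername posted in u_JoeBlowUsername
```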
This is how I'm creating the reddit instance at the start of my code:
import praw

# Create Reddit instance in PRAW
reddit = praw.Reddit(
    client_id="[REDACTED]",
    client_secret="[REDACTED]",
    user_agent="Windows 10:randoapp:0.00002",
)
Clearly, I'm having a hard time wrapping my head around how to use the different Submission, Redditor, and Comment instances here. I thought I had a grasp on them, but that falls apart when I try to use them in my example.
If anyone has enough experience with PRAW to enlighten me, you'd have my gratitude.
1 Answer
submission.author is a Redditor object that has many attributes. If you want to store the Redditor's name, use submission.author.name.
submission.subreddit is very similar: you can use submission.subreddit.display_name to get the subreddit name (a name starting with u_ means the post was made to a user profile, not a subreddit).
The PRAW documentation is great and covers all of these attributes with examples; I highly recommend reading it:
https://praw.readthedocs.io/en/latest/code_overview/models/redditor.html
https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html
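Applying this advice, a minimal sketch of a hypothetical submission_to_row helper that builds the same 10-value tuple as the question's INSERT, reading author.name and subreddit.display_name explicitly. It also assumes, per PRAW's handling of deleted accounts, that submission.author can be None; in that case str(submission.author) would store the string "None", whereas this helper stores a real NULL. is_profile_post is likewise hypothetical and only checks for the u_ prefix mentioned above. The demo uses SimpleNamespace stand-ins with made-up values, not live PRAW objects.

```python
from types import SimpleNamespace


def submission_to_row(submission):
    """Build the 10-value tuple used by the question's INSERT statement.

    Assumption: `submission` behaves like a praw.models.Submission.
    """
    # author is None for deleted accounts, so guard before reading .name
    author = submission.author.name if submission.author is not None else None
    return (
        submission.name,
        int(submission.created_utc),
        author,
        submission.link_flair_text,
        submission.num_comments,
        submission.score,
        submission.subreddit.display_name,  # "u_..." for profile posts
        submission.permalink,
        submission.title,
        submission.selftext,
    )


def is_profile_post(submission):
    """True if the post went to the author's profile rather than a subreddit."""
    return submission.subreddit.display_name.startswith("u_")


# Offline demo with a made-up stand-in object (no network call):
fake = SimpleNamespace(
    name="t3_abc123",
    created_utc=1600000000.0,
    author=SimpleNamespace(name="JoeBloeUsername"),
    link_flair_text=None,
    num_comments=10,
    score=137,
    subreddit=SimpleNamespace(display_name="u_JoeBlowUsername"),
    permalink="/comments/abc123/",
    title="Example title",
    selftext="",
)
row = submission_to_row(fake)
print(row[2], row[6], is_profile_post(fake))  # JoeBloeUsername u_JoeBlowUsername True
```

In the by-user loop, each row could then be passed as the parameters of the question's existing INSERT, i.e. cursor.execute(<that SQL>, submission_to_row(submission)).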