PRAW: Difficulty referencing Redditor and Subreddit data/instances(?) from submission results
I currently have working code that scrapes subreddits by name and pulls out the latest 1,000 submissions, inserting their data into a DB.
Now I want to do something similar, but different. I want to grab the last 1,000 submissions (posts, not comments) by a USER (if I could do this by user per subreddit, that'd be better, but I don't think the API allows it).
I have mostly accomplished this, and I think I'm doing it the "right way", except for a couple of pieces of data that I think I'm accessing incorrectly. I've reviewed PRAW's docs and done my due diligence, but I can't find the right way to access these things. Let me show you.
Here is my working "grab by subreddit" code:
    for subreddit in _PRAW_SUBREDDITS:
        for submission in reddit.subreddit(subreddit).new(limit=_PRAW_LIMIT):
            cursor.execute(
                """INSERT INTO reddit (
                       name,
                       created_utc,
                       author,
                       link_flair_text,
                       num_comments,
                       score,
                       subreddit,
                       permalink,
                       title,
                       selftext)
                   VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                   ON CONFLICT(name)
                   DO UPDATE SET num_comments=excluded.num_comments,
                                 score=excluded.score,
                                 selftext=excluded.selftext
                """,
                (
                    submission.name,
                    int(submission.created_utc),
                    str(submission.author),
                    submission.link_flair_text,
                    submission.num_comments,
                    submission.score,
                    str(submission.subreddit),
                    submission.permalink,
                    submission.title,
                    submission.selftext,
                ),
            )
And here is a new version of the same code, which I'm trying to turn into a new function that will grab "by user":
    for submission in reddit.redditor(_PRAW_REDDITOR).submissions.new(limit=1):
        print(
            f"{submission.name=}"
            f"{submission.created_utc=}"
            f"{submission.author=}"
            f"{submission.link_flair_text=}"
            f"{submission.num_comments=}"
            f"{submission.score=}"
            f"{submission.subreddit=}"
            f"{submission.permalink=}"
            f"{submission.title=}"
            f"{submission.selftext=}"
        )
Most of my results come out normal, except the reference to the user and the subreddit.
    # This is the output to console:
    submission.author=Redditor(name='JoeBloeUsername')
    submission.link_flair_text=None
    submission.num_comments=10
    submission.score=137
    submission.subreddit=Subreddit(display_name='u_JoeBlowUsername')
This is how I'm creating the reddit instance at the start of my code:
    # Create Reddit instance in PRAW
    reddit = praw.Reddit(
        client_id="[REDACTED]",
        client_secret="[REDACTED]",
        user_agent="Windows 10:randoapp:0.00002",
    )
Clearly, I'm having a hard time wrapping my head around how to use the different Submission, Redditor, and Comment instances. I thought I had a grasp on them, but it falls apart when I try to use them, as in my example.
If anyone has enough experience with PRAW to enlighten me, you'd have my gratitude.
Comments (1)
submission.author is a Redditor object that has many attributes. If you want to store the Redditor's name, use submission.author.name.
submission.subreddit is very similar: use submission.subreddit.display_name to get the subreddit name. A name starting with u_ means the submission was posted to a user profile, not a subreddit.
The PRAW documentation is great and covers all of these attributes with examples. I highly recommend you read it:
https://praw.readthedocs.io/en/latest/code_overview/models/redditor.html
https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html
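As a minimal sketch of the two attributes described above, the helper below builds the same ten-value tuple the question's INSERT expects, reading submission.author.name and submission.subreddit.display_name instead of calling str() on the objects. The names submission_row, posted_to_profile, and fake are hypothetical and not part of PRAW. The None check on author is an assumption (PRAW appears to return None for deleted accounts), and the stand-in object with made-up values exists only so the sketch runs without Reddit credentials.

```python
from types import SimpleNamespace


def submission_row(submission):
    """Build the 10-value tuple for the INSERT statement in the question.

    Only reads attributes, so it accepts any object shaped like a
    PRAW Submission.
    """
    # author is a Redditor object. Assumption: it may be None when the
    # account was deleted, so guard before reading .name.
    author = submission.author.name if submission.author is not None else None
    # subreddit is a Subreddit object; display_name is its plain name.
    subreddit = submission.subreddit.display_name
    return (
        submission.name,
        int(submission.created_utc),
        author,
        submission.link_flair_text,
        submission.num_comments,
        submission.score,
        subreddit,
        submission.permalink,
        submission.title,
        submission.selftext,
    )


def posted_to_profile(submission):
    """Return True when the post went to a user profile (u_ prefix)."""
    return submission.subreddit.display_name.startswith("u_")


# Hypothetical stand-in with made-up values, in place of a real Submission.
fake = SimpleNamespace(
    name="t3_abc123",
    created_utc=1600000000.0,
    author=SimpleNamespace(name="JoeBlowUsername"),
    link_flair_text=None,
    num_comments=10,
    score=137,
    subreddit=SimpleNamespace(display_name="u_JoeBlowUsername"),
    permalink="/r/u_JoeBlowUsername/comments/abc123/example/",
    title="Example",
    selftext="",
)

row = submission_row(fake)
print(row[2], row[6], posted_to_profile(fake))
# prints: JoeBlowUsername u_JoeBlowUsername True
```

With a real reddit instance, the same helper would presumably slot into the question's loop: iterate over reddit.redditor(_PRAW_REDDITOR).submissions.new(limit=_PRAW_LIMIT) and pass submission_row(submission) as the parameter tuple to cursor.execute. Comparing display_name against a wanted subreddit name inside that loop is one way to approximate "by user per subreddit" on the client side.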