list() returns a different number of comments than the video shows

Posted on 2025-01-10 09:27:28

I'm trying to fetch the comments of a given videoId with the YouTube API, but the number of comments I get back is lower than the actual number on the video. Do you have any idea why? My code is below.

from googleapiclient.discovery import build
from typing import List

def get_comments(api, video_id: str, fields: str)-> List[List[str]]:
    comments = list()
    response = api.commentThreads().list(part='snippet', fields=fields, videoId=video_id, maxResults=50).execute()
    
    all_comment_crawled = True
    while all_comment_crawled:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([comment['textOriginal'], comment['likeCount']])

        if 'nextPageToken' in response:
            response = api.commentThreads().list(part='snippet', videoId=video_id, fields=fields, pageToken=response['nextPageToken'], maxResults=50).execute()
        else:
            all_comment_crawled = False
         
    return comments

api_key = "MY_API_KEY"
api_obj = build('youtube', 'v3', developerKey=api_key)

video_id = 'fgSvGLxanCo'
fields = 'items(snippet(totalReplyCount, topLevelComment(snippet(textOriginal, likeCount)))), nextPageToken'

comments = get_comments(api_obj, video_id, fields)
print(len(comments))  # prints 1945, but the video actually has over 2,000 comments

Comments (1)

如歌彻婉言 2025-01-17 09:27:28

There are two traps that are easy to fall into when listing comments on a YouTube video:

  1. The comment count shown on YouTube counts all (unfiltered) comments. Replies are included in that count, and your algorithm does not take them into account. Have a look at CommentThreads: list.
  2. commentThreads returns at most 5 replies per thread.
    If a comment has more than 5 replies, you have to use Comments: list to list them all.

An example Python script that processes all comments of a video is available here.
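
The two points above can be sketched roughly as follows. This is a minimal illustration, not the linked script: it assumes `api` is the service object returned by `build('youtube', 'v3', developerKey=...)` as in the question, and the helper names `list_all` and `get_all_comments` are made up for this example. The use of `part='snippet,replies'`, `parentId`, and `maxResults=100` reflects my reading of the API reference, and the total may still differ from the count shown on the video page.

```python
from typing import Any, Dict, List


def list_all(resource: Any, **params: Any) -> List[Dict[str, Any]]:
    """Collect the items from every page of a list() call (hypothetical helper)."""
    items: List[Dict[str, Any]] = []
    while True:
        response = resource.list(**params).execute()
        items.extend(response.get('items', []))
        token = response.get('nextPageToken')
        if not token:
            return items
        # Ask for the next page on the following iteration.
        params['pageToken'] = token


def get_all_comments(api: Any, video_id: str) -> List[list]:
    """Return [text, like_count] for every top-level comment and every reply."""
    comments: List[list] = []
    threads = list_all(api.commentThreads(), part='snippet,replies',
                       videoId=video_id, maxResults=100)
    for thread in threads:
        top = thread['snippet']['topLevelComment']
        comments.append([top['snippet']['textOriginal'],
                         top['snippet']['likeCount']])

        # Replies embedded in the thread; this may be only a subset.
        inline = thread.get('replies', {}).get('comments', [])
        if thread['snippet']['totalReplyCount'] > len(inline):
            # Some replies are missing, so list them all via Comments: list.
            replies = list_all(api.comments(), part='snippet',
                               parentId=top['id'], maxResults=100)
        else:
            replies = inline
        for reply in replies:
            comments.append([reply['snippet']['textOriginal'],
                             reply['snippet']['likeCount']])
    return comments


# Usage, with the objects from the question (needs a valid API key):
#   api_obj = build('youtube', 'v3', developerKey=api_key)
#   print(len(get_all_comments(api_obj, 'fgSvGLxanCo')))
```

Each extra `comments().list` call costs API quota, so the sketch only makes one when `totalReplyCount` says the thread holds more replies than were returned inline.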
