Hitting the rate limit after only a single search call to the Twitter API

Posted 2025-02-07 13:04:29

I'm performing a full-archive search for tweets containing a keyword, restricted to a list of users. I loop through a search query for each user ID and check for the keyword 'republican'. The problem is that the loop gets through a decent number of users before hitting a rate limit, but from then on every additional user search triggers a rate-limit wait instead of the limit resetting completely. My question is basically: why am I forced to wait after every single search call once the limit is first hit, and what can I do to avoid it?

import csv
import datetime

import pandas as pd
from twarc import Twarc2, ensure_flattened

df = pd.read_csv("RealIdMasterList.csv")
id_str_df = df['id_str'].tolist()
theta_df = df['theta'].tolist()
accounts_followed_df = df['accounts_followed'].tolist()

# Your bearer token here
t = Twarc2(bearer_token="<token>")
# Start and end times must be in UTC
start_time = datetime.datetime(2010, 3, 21, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2022, 3, 22, 0, 0, 0, 0, datetime.timezone.utc)

# Column names for the output CSV (the original snippet never defines header).
header = ['id', 'author_id', 'text', 'theta', 'label', 'accounts_followed', 'created_at']
i = 0
pings = 0

with open('realtweets.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    while pings <= 10:
        for x in range(len(id_str_df)):
            userid = str(int(id_str_df[i]))
            print(userid)
            q = "republican lang:en -is:retweet from:" + userid
            try:
                # search_all is a generator; max_results is tweets per page
                # (500 max for full-archive search).
                search_results = list(t.search_all(query=q, start_time=start_time,
                                                   end_time=end_time, max_results=100))
                count = 0
                if search_results:
                    pings += 1
                    for page in search_results:
                        if count < 1:
                            for tweet in ensure_flattened(page):
                                writer.writerow([tweet['id'], tweet['author_id'], tweet['text'],
                                                 theta_df[i], 0, accounts_followed_df[i],
                                                 tweet['created_at']])
                        # Only keep the first page of results.
                        count += 1
            except Exception as e:
                print(e)
            i += 1
output: 
757877990969659520
2848529739
rate limit exceeded: sleeping 909.0393960475922 secs
902416406771244928
rate limit exceeded: sleeping 909.7210428714752 secs
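
For what it's worth, the ~909-second sleeps in that output line up with the full-archive search endpoint's 15-minute rate-limit window: once the window's request quota is used up, twarc2 sleeps until the window resets, so every further call has to wait. One way to need fewer calls in the first place is to batch many user IDs into a single query. The sketch below is not from the original post; the OR and from: operators are standard Twitter API v2 query syntax, but MAX_QUERY_LEN is an assumption about the academic full-archive query length limit.

# A minimal sketch, not from the original post: pack several from: clauses
# into one query so a single request covers many users. MAX_QUERY_LEN is an
# assumption about the full-archive query length limit; adjust it for your
# access tier.
import datetime

from twarc import Twarc2, ensure_flattened

t = Twarc2(bearer_token="<token>")
start_time = datetime.datetime(2010, 3, 21, tzinfo=datetime.timezone.utc)
end_time = datetime.datetime(2022, 3, 22, tzinfo=datetime.timezone.utc)

MAX_QUERY_LEN = 1024  # assumed limit for academic full-archive search
BASE = "republican lang:en -is:retweet ({})"

def batched_queries(user_ids):
    # Yield queries like 'republican ... (from:A OR from:B OR ...)',
    # each kept under MAX_QUERY_LEN characters.
    batch = []
    for uid in user_ids:
        candidate = " OR ".join("from:" + u for u in batch + [uid])
        if batch and len(BASE.format(candidate)) > MAX_QUERY_LEN:
            yield BASE.format(" OR ".join("from:" + u for u in batch))
            batch = [uid]
        else:
            batch.append(uid)
    if batch:
        yield BASE.format(" OR ".join("from:" + u for u in batch))

for q in batched_queries(["757877990969659520", "2848529739"]):
    for page in t.search_all(query=q, start_time=start_time,
                             end_time=end_time, max_results=100):
        for tweet in ensure_flattened(page):
            print(tweet['id'], tweet['author_id'], tweet['created_at'])

Fewer, larger queries burn through the per-window request quota more slowly, at the cost of having to split the matched tweets back out by author_id afterwards.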
