Hitting the rate limit after only a single search call to the Twitter API

Posted 2025-02-07 13:04:29

I'm performing a full-archive search for tweets containing a keyword, restricted to a list of users. I loop through a search query for each user ID and check for the keyword 'republican'. The problem is that the loop gets through a decent number of users before hitting a rate limit, but from then on every additional user search triggers a rate-limit wait instead of the limit resetting completely. My question is basically: why am I forced to wait after every single search call once the limit is first hit, and what can I do to avoid it?

import csv
import datetime

import pandas as pd
from twarc import Twarc2, ensure_flattened

df = pd.read_csv("RealIdMasterList.csv")
id_str_df = df['id_str'].tolist()
theta_df = df['theta'].tolist()
accounts_followed_df = df['accounts_followed'].tolist()

# Your bearer token here
t = Twarc2(bearer_token="<token>")
# Start and end times must be in UTC
start_time = datetime.datetime(2010, 3, 21, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2022, 3, 22, 0, 0, 0, 0, datetime.timezone.utc)

# Column names for the output CSV (the original snippet never defines header).
header = ['id', 'author_id', 'text', 'theta', 'label', 'accounts_followed', 'created_at']
i = 0
pings = 0

with open('realtweets.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    while pings <= 10:
        for x in range(len(id_str_df)):
            userid = str(int(id_str_df[i]))
            print(userid)
            q = "republican lang:en -is:retweet from:" + userid
            try:
                # search_all is a generator; max_results is tweets per page
                # (500 max for full-archive search).
                search_results = list(t.search_all(query=q, start_time=start_time,
                                                   end_time=end_time, max_results=100))
                count = 0
                if search_results:
                    pings += 1
                    for page in search_results:
                        if count < 1:
                            for tweet in ensure_flattened(page):
                                writer.writerow([tweet['id'], tweet['author_id'], tweet['text'],
                                                 theta_df[i], 0, accounts_followed_df[i],
                                                 tweet['created_at']])
                        # Only keep the first page of results.
                        count += 1
            except Exception as e:
                print(e)
            i += 1
output: 
757877990969659520
2848529739
rate limit exceeded: sleeping 909.0393960475922 secs
902416406771244928
rate limit exceeded: sleeping 909.7210428714752 secs
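
For what it's worth, the ~909-second sleeps in that output line up with the full-archive search endpoint's 15-minute rate-limit window: once the window's request quota is used up, twarc2 sleeps until the window resets, so every further call has to wait. One way to need fewer calls in the first place is to batch many user IDs into a single query. The sketch below is not from the original post; the OR and from: operators are standard Twitter API v2 query syntax, but MAX_QUERY_LEN is an assumption about the academic full-archive query length limit.

# A minimal sketch, not from the original post: pack several from: clauses
# into one query so a single request covers many users. MAX_QUERY_LEN is an
# assumption about the full-archive query length limit; adjust it for your
# access tier.
import datetime

from twarc import Twarc2, ensure_flattened

t = Twarc2(bearer_token="<token>")
start_time = datetime.datetime(2010, 3, 21, tzinfo=datetime.timezone.utc)
end_time = datetime.datetime(2022, 3, 22, tzinfo=datetime.timezone.utc)

MAX_QUERY_LEN = 1024  # assumed limit for academic full-archive search
BASE = "republican lang:en -is:retweet ({})"

def batched_queries(user_ids):
    # Yield queries like 'republican ... (from:A OR from:B OR ...)',
    # each kept under MAX_QUERY_LEN characters.
    batch = []
    for uid in user_ids:
        candidate = " OR ".join("from:" + u for u in batch + [uid])
        if batch and len(BASE.format(candidate)) > MAX_QUERY_LEN:
            yield BASE.format(" OR ".join("from:" + u for u in batch))
            batch = [uid]
        else:
            batch.append(uid)
    if batch:
        yield BASE.format(" OR ".join("from:" + u for u in batch))

for q in batched_queries(["757877990969659520", "2848529739"]):
    for page in t.search_all(query=q, start_time=start_time,
                             end_time=end_time, max_results=100):
        for tweet in ensure_flattened(page):
            print(tweet['id'], tweet['author_id'], tweet['created_at'])

Fewer, larger queries burn through the per-window request quota more slowly, at the cost of having to split the matched tweets back out by author_id afterwards.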
