With Tweepy, how do I find tweets from a country and filter them by keyword?

Posted 2025-02-12 13:54:27 · 3406 characters · 0 views · 0 comments

In Python 3 with Tweepy, is it possible to search for tweets from a specific country only, combined with other search filters?

In the example below I try to search for tweets from Mexico, in Spanish, posted since 2022-01-01, excluding retweets, and containing all of the terms in the same tweet (activistas+ambientales+criminales).

But it returns nothing. Does anyone know what could be wrong?

import tweepy

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

places = api.geo_search(query="Mexico", granularity="country")
place_id = places[0].id
place_id
'25530ba03b7d90c6'

new_search = "place:%s AND activistas+ambientales+criminales -filter:retweets" % (place_id)

tweets = tweepy.Cursor(api.search,
                   q=new_search,
                   lang="es",
                   since='2022-01-01').items(100)

Edited 7/9/2022 after Mickael Martinez's reply:

Now I get tweets, thanks! I just want to know whether it is normal that so few are returned:

# Upgrade Tweepy first (notebook shell command)
!pip install --upgrade tweepy

import pandas as pd
import tweepy

api_key = ''
api_key_secret = ''
bearer_token  = ''

client = tweepy.Client(bearer_token)

# Query: both words, in Spanish, located in Mexico, and not a retweet
query = "activistas ambientales lang:es place_country:mx -is:retweet"

# Fetch from the beginning of 2022
# and request additional tweet and user fields
response = client.search_all_tweets(query,
                                    start_time = "2022-01-01T00:00:00Z",
                                    tweet_fields=["id", "author_id", "text", "created_at", "attachments", "context_annotations", "entities", "geo"],
                                    user_fields=["id", "name", "username", "created_at", "description"],
                                    expansions='author_id'
                                    )

tweets = response.data
# Save user data
users = {u["id"]: u for u in response.includes['users']}

# Create a dataframe with the data
my_demo_list = []
for tweet in tweets or []:  # response.data is None when nothing matches
    # Look up the tweet's author directly in the users dict built above
    user = users[tweet.author_id]

    my_demo_list.append({'tweet_id': str(tweet.id),
                         'text': str(tweet.text),
                         'name': str(user.name),
                         'author_id': str(tweet.author_id),
                         'username': str(user.username),
                         'user_created_at': str(user.created_at),
                         'user_description': str(user.description),
                         'attachments': str(tweet.attachments),
                         'created_at': str(tweet.created_at),
                         'context_annotations': str(tweet.context_annotations),
                         'entities': str(tweet.entities),
                         'geo': str(tweet.geo)
                         })

all_tweets_found = pd.DataFrame(my_demo_list)
all_tweets_found.shape
(10, 12)

Comments (1)

小兔几 2025-02-19 13:54:27

The place operator seems to be a premium operator in the Twitter v1.1 API (see the documentation), so I'm not sure you can use it with the standard search method.

Since you probably have elevated access, you should update to the latest version of Tweepy, where you may have access to the search_30_day method. Maybe you can use the place operator there. If that does not work, I guess you will have to pay for premium access.

Edit after your comment:

Since you have academic access, I would suggest that you:

  • Update Tweepy to its current 4.10.0 version (pip install --upgrade tweepy) because, based on your code, you seem to be using an outdated 3.X.X version.
  • Use v2 of the Twitter API, which the update makes available, and in particular the search_all_tweets method in Tweepy, where you can use the place and place_country operators.
  • Adapt your code to Tweepy 4.X and v2 of the Twitter API; a few changes will be needed. For example, the following code could be a first draft:
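import tweepy

query = ""         # your search query
bearer_token = ""  # your app's bearer token

client = tweepy.Client(bearer_token)

response = client.search_all_tweets(query)

# Inspect each part of the response
print(response.data)
print(response.meta)
print(response.includes)
print(response.errors)

tweets = response.data or []  # data is None when nothing matches

for tweet in tweets:
    print(tweet.id)
    print(tweet.text)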

You can then read the documentation on building queries to write the query itself.
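As a rough illustration of how such a query is put together: in the v2 query syntax, operators separated by a space are combined with AND, and a leading "-" negates an operator. The sketch below assembles the same kind of query used in the question. Note that build_query is a hypothetical helper written for this example, not part of Tweepy.

```python
def build_query(terms, lang=None, country=None, exclude_retweets=True):
    """Assemble a Twitter API v2 search query string (hypothetical helper)."""
    parts = list(terms)  # space-separated terms are ANDed together
    if lang:
        parts.append(f"lang:{lang}")               # language operator
    if country:
        parts.append(f"place_country:{country}")   # two-letter country code
    if exclude_retweets:
        parts.append("-is:retweet")                # "-" negates the operator
    return " ".join(parts)


query = build_query(["activistas", "ambientales"], lang="es", country="mx")
print(query)  # activistas ambientales lang:es place_country:mx -is:retweet
```

The resulting string can be passed as the query argument of client.search_all_tweets.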

But... do you really need to use Tweepy? Are you building an app for future users, or are you just gathering data for your research? If it is the latter, I would strongly suggest using Twarc, a command-line tool that is better suited to this purpose.
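If you stay with Tweepy, one likely reason for getting only a handful of tweets is that, as far as I know, a single search_all_tweets call returns one page, and max_results defaults to 10 when it is not set. In addition, place_country only matches tweets that carry place data, which is a small share of all tweets. Here is a minimal sketch of flattening several result pages into rows, assuming each page looks like a Tweepy Response (a data list or None, plus an includes dict). collect_rows and the stand-in pages are made up for this example.

```python
from types import SimpleNamespace


def collect_rows(pages):
    """Flatten response-like pages into dicts, joining each tweet to its author."""
    rows = []
    for page in pages:
        # Index this page's expanded users by id for direct lookup
        users = {u.id: u for u in (page.includes or {}).get("users", [])}
        for tweet in page.data or []:  # data is None on an empty page
            user = users.get(tweet.author_id)
            rows.append({
                "tweet_id": str(tweet.id),
                "text": tweet.text,
                "author_id": str(tweet.author_id),
                "username": user.username if user else None,
            })
    return rows


# Stand-in pages so the sketch runs without credentials (made-up data)
fake_pages = [
    SimpleNamespace(
        data=[SimpleNamespace(id=1, text="hola", author_id=10)],
        includes={"users": [SimpleNamespace(id=10, username="ana")]},
    ),
    SimpleNamespace(data=None, includes={}),
]
rows = collect_rows(fake_pages)
print(rows)
```

With a real client, the pages could come from tweepy.Paginator(client.search_all_tweets, query, start_time="2022-01-01T00:00:00Z", expansions="author_id", user_fields=["username"], max_results=500, limit=5), which yields one response per page. The resulting rows can go straight into pd.DataFrame(rows).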
