With Tweepy, how can I find tweets from a country/region and filter them by keyword?
In Python 3 with Tweepy, is it possible to search for tweets from a single country while combining other search filters?
In the example below I try to search for tweets from Mexico, in Spanish, posted since 2022-01-01, excluding retweets, and containing all of the terms in the same tweet (activistas+ambientales+criminales).
But the search returns nothing.
Does anyone know what could be wrong?
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
places = api.geo_search(query="Mexico", granularity="country")
place_id = places[0].id
place_id
'25530ba03b7d90c6'
new_search = "place:%s AND activistas+ambientales+criminales -filter:retweets" % (place_id)
tweets = tweepy.Cursor(api.search,
                       q=new_search,
                       lang="es",
                       since='2022-01-01').items(100)
Edited 7/9/2022 after Mickael Martinez's reply:
Now I get tweets back! Thanks! I just want to know whether it is normal for so few results to be returned:
# Upgrade Tweepy first (notebook cell)
!pip install --upgrade tweepy
import pandas as pd
import tweepy
api_key = ''
api_key_secret = ''
bearer_token = ''
client = tweepy.Client(bearer_token)
# Search for tweets that contain both words, are in Spanish, are tagged with a place in Mexico, and are not retweets
query = "activistas ambientales lang:es place_country:mx -is:retweet"
# Fetch from the beginning of 2022 and request additional tweet and user fields
response = client.search_all_tweets(
    query,
    start_time="2022-01-01T00:00:00Z",
    tweet_fields=["id", "author_id", "text", "created_at", "attachments", "context_annotations", "entities", "geo"],
    user_fields=["id", "name", "username", "created_at", "description"],
    expansions="author_id",
)
tweets = response.data
# Save user data
users = {u["id"]: u for u in response.includes['users']}
# Build one row per tweet, joined with its author's profile
my_demo_list = []
for tweet in tweets:
    # Look up the author of the current tweet in the users dict built above
    user = users[tweet.author_id]
    my_demo_list.append({'tweet_id': str(tweet.id),
                         'text': str(tweet.text),
                         'name': str(user.name),
                         'author_id': str(tweet.author_id),
                         'username': str(user.username),
                         'user_created_at': str(user.created_at),
                         'user_description': str(user.description),
                         'attachments': str(tweet.attachments),
                         'created_at': str(tweet.created_at),
                         'context_annotations': str(tweet.context_annotations),
                         'entities': str(tweet.entities),
                         'geo': str(tweet.geo)
                         })
all_tweets_found = pd.DataFrame(my_demo_list)
all_tweets_found.shape
(10, 12)
Comments (1)
The place operator seems to be a premium operator in the Twitter v1.1 API (see the documentation here), so I'm not sure you can use it with the standard search method.
Since you probably have elevated access, you should update to the latest version of Tweepy, where you may have access to the search_30_day method. You may be able to use the place operator there. If that does not work, I guess you will have to pay for premium access.
Edit after your comment:
Since you have academic access, I would suggest that you:
- Update Tweepy (pip install --upgrade tweepy) because, based on your code, you seem to be using an outdated 3.X.X version.
- Use the search_all_tweets method in Tweepy (see here), where you can use the place and place_country operators.
You can read the documentation here to build the query.
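As a rough illustration of building such a query, here is a minimal sketch that assembles a v2 search string from its parts. The helper name build_query and its parameters are hypothetical and not part of Tweepy. It relies only on the v2 query rules that space-separated terms are combined with an implicit AND and that a leading minus sign negates an operator.

```python
def build_query(terms, lang=None, country=None, exclude_retweets=True):
    """Assemble a Twitter API v2 search query string (hypothetical helper).

    Space-separated parts are combined with an implicit AND.
    """
    parts = list(terms)
    if lang:
        parts.append(f"lang:{lang}")
    if country:
        # ISO 3166-1 alpha-2 country code, e.g. "MX"
        parts.append(f"place_country:{country}")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


query = build_query(["activistas", "ambientales", "criminales"],
                    lang="es", country="MX")
print(query)
```

The resulting string can be passed as the first argument to search_all_tweets.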
But... do you really need to use Tweepy? Are you building an app for future users, or are you just gathering data for your research? If it is the latter, I would strongly suggest Twarc, a command-line tool that is better suited to this purpose.
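On the follow-up about getting only 10 rows: a single search_all_tweets call returns one page, and the endpoint's default max_results is 10, so the (10, 12) shape in the question is consistent with one default-sized page. As far as I understand, place_country also matches only tweets tagged with a place, which is a small share of all tweets, so a low total may still be expected. Below is a minimal sketch of paging with tweepy.Paginator. The function names rows_from_pages and fetch_pages and the max_pages parameter are hypothetical, and the choice of fields is only an example.

```python
def rows_from_pages(pages):
    """Flatten response pages into one dict per tweet, joined with its author."""
    rows = []
    for page in pages:
        if not page.data:  # a page without matches has data=None
            continue
        # Each page carries its own expanded user objects in includes["users"]
        users = {u.id: u for u in (page.includes or {}).get("users", [])}
        for tweet in page.data:
            user = users.get(tweet.author_id)
            rows.append({
                "tweet_id": str(tweet.id),
                "text": tweet.text,
                "author_id": str(tweet.author_id),
                "username": user.username if user else None,
            })
    return rows


def fetch_pages(bearer_token, query, start_time, max_pages=5):
    """Return an iterable of full-archive search pages (Academic Research access)."""
    import tweepy  # imported lazily so rows_from_pages can be used without Tweepy

    client = tweepy.Client(bearer_token, wait_on_rate_limit=True)
    return tweepy.Paginator(
        client.search_all_tweets,
        query,
        start_time=start_time,
        tweet_fields=["created_at", "geo"],
        user_fields=["username"],
        expansions="author_id",
        max_results=100,  # tweets per page; the endpoint default is 10
        limit=max_pages,  # stop after this many pages
    )
```

With a valid token, rows_from_pages(fetch_pages(bearer_token, query, "2022-01-01T00:00:00Z")) gives a list that can be passed to pd.DataFrame. Iterating over pages rather than calling flatten() keeps each page's includes available for the author lookup.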