How do I create functions for my pymongo/twitter scripts?
I'm writing scripts with Python, MongoDB, and the pymongo module that fetch data from the Twitter API and store it in a Mongo database. So far I've written scripts that do different things: access the search API, access the user_timeline, and so on. I've only just been getting to know the tools I'm working with, though, and it's time to go back and make the scripts more efficient, so right now I'm adding functions and classes to them. Here is one of my scripts without functions or classes:
    #!/usr/local/bin/python
    import twitter
    import datetime
    from datetime import date, timedelta, datetime
    import pymongo
    from pymongo import Connection
    # Twitter handle that we are scraping mentions for
    SCREEN_NAME = '@twitterapi'
    # Connect to the database
    connection = Connection()
    db = connection.test
    collection = db.twitterapi_mentions # Change the name of this database
    t = twitter.Twitter(domain='search.twitter.com')
    # Fetch the information from the API
    results = []
    for i in range(2):
        i+=1
        response = t.search(q=SCREEN_NAME, result_type='recent', rpp=100, page=i)['results']
        results.extend(response)
    # Create a document in the database for each item taken from the API
    for tweet in results:
        id_str = tweet['id_str']
        twitter_id = tweet['from_user']
        tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
        created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
        date = created_at.date().strftime("%m/%d/%y")
        time = created_at.time().strftime("%H:%M:%S")
        text = tweet['text']
        identifier = {'id' : id_str}
        entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
        collection.update(identifier, entries, upsert = True)
These scripts have been working well for me, but I have to run the same script for multiple Twitter handles. For instance, I copy the script and change the following two lines:
    SCREEN_NAME = '@cocacola'
    collection = db.cocacola_mentions
That way I get mentions for both @twitterapi and @cocacola. I've thought a lot about how to turn this into a function. The biggest problem I've run into is finding a way to change the name of the collection. For instance, consider this script:
    #!/usr/local/bin/python
    import twitter
    import datetime
    from datetime import date, timedelta, datetime
    import pymongo
    from pymongo import Connection
    def getMentions(screen_name):
        # Connect to the database
        connection = Connection()
        db = connection.test
        collection = db.screen_name # Change the name of this database
        t = twitter.Twitter(domain='search.twitter.com')
        # Fetch the information from the API
        results = []
        for i in range(2):
            i+=1
            response = t.search(q=screen_name, result_type='recent', rpp=100, page=i)['results']
            results.extend(response)
        # Create a document in the database for each item taken from the API
        for tweet in results:
            id_str = tweet['id_str']
            twitter_id = tweet['from_user']
            tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
            created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
            date = created_at.date().strftime("%m/%d/%y")
            time = created_at.time().strftime("%H:%M:%S")
            text = tweet['text']
            identifier = {'id' : id_str}
            entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
            collection.update(identifier, entries, upsert = True)
    getMentions("@twitterapi")
    getMentions("@cocacola")
If I use the above script, all of the data is stored in a collection literally named "screen_name", but I want it stored in a collection named after the screen name that is passed in. Ideally, @twitterapi mentions would go in a "twitterapi_mentions" collection and @cocacola mentions in a "cocacola_mentions" collection. I believe pymongo's Collection class might be the answer; I've read the documentation but can't seem to get it to work. Any other suggestions for making this script more efficient would be greatly appreciated. Please excuse any mistakes I've made; as I said, I'm new to this.
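A side note on why everything lands in "screen_name": in Python, the identifier written after a dot is taken literally, and a variable that happens to share that name is never consulted. The sketch below illustrates this with a hypothetical stand-in class, `FakeDatabase`, which is not part of pymongo; it only assumes that the real database object resolves unknown attributes to collections in a similar way.

```python
class FakeDatabase(object):
    """Hypothetical stand-in that reports which collection name was requested."""

    def __getattr__(self, name):
        # Python calls this for any attribute that is not already defined,
        # passing the attribute name exactly as it was written in the source.
        return "collection named %r" % name


db = FakeDatabase()
screen_name = "cocacola_mentions"

# The name after the dot is used as-is; the screen_name variable is ignored.
literal = db.screen_name
print(literal)  # collection named 'screen_name'
```

So passing a different argument to `getMentions` changes the search query but not the collection that `db.screen_name` refers to.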
Comments (2)
Use getattr to retrieve the attribute by string name:
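The code that accompanied this answer is missing here. A minimal sketch of the idea follows, assuming the collection name is built by dropping the leading "@" and appending "_mentions", as in the question. The helper name `mentions_collection` is hypothetical and is not part of pymongo.

```python
def mentions_collection(db, screen_name):
    """Return the '<handle>_mentions' collection for a handle such as '@cocacola'."""
    # Drop the leading '@' so the name follows the question's
    # 'twitterapi_mentions' / 'cocacola_mentions' convention.
    collection_name = screen_name.lstrip("@") + "_mentions"
    # getattr looks the attribute up by the *value* of the string, so for
    # '@cocacola' this behaves like writing db.cocacola_mentions by hand.
    return getattr(db, collection_name)
```

Inside `getMentions`, the line `collection = db.screen_name` would then become `collection = mentions_collection(db, screen_name)`.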
I'd go with:
I think it's more straightforward.
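The snippet that went with this answer is not preserved here. One likely reading, offered as an assumption rather than the answerer's exact code, is dictionary-style access: a pymongo database object can be subscripted with a collection name held in a string. The sketch below uses a hypothetical helper, `collections_for`, to map several handles to their collections in one pass.

```python
def collections_for(db, screen_names):
    """Map each handle (e.g. '@cocacola') to its '<handle>_mentions' collection."""
    result = {}
    for screen_name in screen_names:
        # Same naming convention as in the question: strip '@', add '_mentions'.
        collection_name = screen_name.lstrip("@") + "_mentions"
        # Subscript access takes the collection name as an ordinary string value.
        result[screen_name] = db[collection_name]
    return result
```

With this, a single script could loop over `["@twitterapi", "@cocacola"]` instead of being copied once per handle.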