How do I create a function for my pymongo/twitter script?

Posted on 2024-12-07 17:13:31

I'm writing scripts with Python, MongoDB, and the pymongo module that fetch data from the Twitter API and store it in a Mongo database. I've written several scripts that do different things: access the search API, access the user_timeline, and more. However, I have only just been getting to know the tools I'm working with, and it's time for me to go back and make the scripts more efficient. So right now I'm working on adding functions and classes to them. Here is one of my scripts without functions or classes:

#!/usr/local/bin/python

import twitter
from datetime import datetime
from pymongo import Connection

# Twitter handle that we are scraping mentions for
SCREEN_NAME = '@twitterapi'

# Connect to the database
connection = Connection()
db = connection.test    
collection = db.twitterapi_mentions  # Change the name of this collection
t = twitter.Twitter(domain='search.twitter.com')

# Fetch the information from the API
results = []
for page in range(1, 3):  # pages 1 and 2
    response = t.search(q=SCREEN_NAME, result_type='recent', rpp=100, page=page)['results']
    results.extend(response)

# Create a document in the database for each item taken from the API
for tweet in results:
    id_str = tweet['id_str']
    twitter_id = tweet['from_user']
    tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
    created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
    date = created_at.date().strftime("%m/%d/%y")
    time = created_at.time().strftime("%H:%M:%S")
    text = tweet['text']
    identifier = {'id' : id_str}
    entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
    collection.update(identifier, entries, upsert = True)
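Since the goal is to move toward functions, one natural first step might be to pull the per-tweet conversion out of the loop. The following is only a sketch: `tweet_to_document` is a hypothetical name, and the sample tweet is made-up data shaped like the fields the script above reads. It touches neither the Twitter API nor the database, so it can be tried on its own.

```python
from datetime import datetime


def tweet_to_document(tweet):
    """Build the document stored for one search result (a plain dict)."""
    created_at = datetime.strptime(tweet['created_at'],
                                   "%a, %d %b %Y %H:%M:%S +0000")
    return {
        'id': tweet['id_str'],
        'tweetlink': "http://twitter.com/#!/%s/status/%s"
                     % (tweet['from_user'], tweet['id_str']),
        'date': created_at.strftime("%m/%d/%y"),
        'time': created_at.strftime("%H:%M:%S"),
        'text': tweet['text'],
        'twitter_id': tweet['from_user'],
    }


# Made-up sample with the same keys the script reads from each result
sample = {
    'id_str': '12345',
    'from_user': 'someuser',
    'created_at': 'Mon, 05 Dec 2011 14:30:00 +0000',
    'text': 'hello @twitterapi',
}
doc = tweet_to_document(sample)
print(doc['tweetlink'])
```

With a helper like this, the body of the loop would reduce to building `doc` and passing `{'id': doc['id']}` and `doc` to the same `collection.update(..., upsert=True)` call used above.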

These scripts have been working well for me, but I have to run the same script for multiple Twitter handles. For instance, I'll copy the same script and change the following two lines:

SCREEN_NAME = '@cocacola'

collection = db.cocacola_mentions

Thus I'm getting mentions for both @twitterapi and @cocacola. I've thought a lot about how to turn this into a function. The biggest problem I've run into is finding a way to change the name of the collection. For instance, consider this script:

#!/usr/local/bin/python

import twitter
from datetime import datetime
from pymongo import Connection

def getMentions(screen_name):

    # Connect to the database
    connection = Connection()
    db = connection.test    
    collection = db.screen_name  # Change the name of this collection
    t = twitter.Twitter(domain='search.twitter.com')

    # Fetch the information from the API
    results = []
    for page in range(1, 3):  # pages 1 and 2
        response = t.search(q=screen_name, result_type='recent', rpp=100, page=page)['results']
        results.extend(response)

    # Create a document in the database for each item taken from the API
    for tweet in results:
        id_str = tweet['id_str']
        twitter_id = tweet['from_user']
        tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
        created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
        date = created_at.date().strftime("%m/%d/%y")
        time = created_at.time().strftime("%H:%M:%S")
        text = tweet['text']
        identifier = {'id' : id_str}
        entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
        collection.update(identifier, entries, upsert = True)

getMentions("@twitterapi")
getMentions("@cocacola")

If I use the above script, all of the data is stored in a collection literally named "screen_name", but I want it to be stored in a collection named after the screen name that is passed in. Ideally, I want @twitterapi mentions to go into a "twitterapi_mentions" collection and @cocacola mentions to go into a "cocacola_mentions" collection. I believe that using pymongo's Collection class might be the answer, and I've read the documentation, but I can't seem to get it to work. If you have other suggestions for making this script more efficient, they would be greatly appreciated. Otherwise, please excuse any mistakes I've made; as I said, I'm new to this.
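To make the behaviour described above concrete: attribute access such as `db.screen_name` uses the attribute's name itself, not the value of the variable `screen_name`. pymongo databases also support dictionary-style access (`db[name]`), which takes a string built at run time. The sketch below illustrates the difference without needing a MongoDB server. `FakeDatabase` is a hypothetical stand-in written for this example (it is not part of pymongo), and `mentions_collection_name` and `get_mentions_collection` are likewise illustrative names.

```python
class FakeDatabase(object):
    """Hypothetical stand-in that mimics how a pymongo database resolves names."""

    def __getattr__(self, name):
        # db.some_name -> the collection literally called "some_name"
        return self[name]

    def __getitem__(self, name):
        # db["some_name"] -> the collection named by the string
        return "collection:%s" % name


def mentions_collection_name(screen_name):
    """'@cocacola' -> 'cocacola_mentions'"""
    return screen_name.lstrip('@') + '_mentions'


def get_mentions_collection(db, screen_name):
    # Dictionary-style access lets the name be computed at run time
    return db[mentions_collection_name(screen_name)]


db = FakeDatabase()
screen_name = "@cocacola"

print(db.screen_name)                            # ignores the variable's value
print(get_mentions_collection(db, screen_name))  # uses the variable's value
```

Assuming the real database object behaves the same way for `db[...]`, the function in the script could obtain its collection with `db[screen_name.lstrip('@') + '_mentions']` and then be called once per handle, for example in a loop over `["@twitterapi", "@cocacola"]`.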
