Appending to JSON in Python (optimally, given RAM constraints)

Posted 2024-10-10 17:54:11

I'm trying to find the optimal way to append some data to a JSON file using Python. I have about 100 threads open, each storing data in an array (in the code below it is actually a dict keyed by an integer counter). When a thread is done, it writes its data to a JSON file using json.dump. However, since the array can take a few hours to build up, I eventually run out of RAM. So I'm trying to find the approach that uses the least RAM in this process. The following is what I have, which consumes too much RAM.

        i               = 0
        twitter_data    = {}
        for null in range(0,1):
            while True:
                try:
                    for friends in Cursor(api.followers_ids,screen_name=self.ip).items():
                        twitter_data[i]                     = {}
                        twitter_data[i]['fu']               = self.ip
                        twitter_data[i]['su']               = friends
                        i = i + 1
                except tweepy.TweepError, e:
                    print "ERROR on " + str(self.ip) + " Reason: ", e
                    with open('C:/Twitter/errors.txt', mode='a') as a_file:
                        new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n"
                        a_file.write(new_ii)
                break

        ## Save data

        with open('C:/Twitter/user_' + str(self.id) + '.json', mode='w') as f:
                json.dump(twitter_data, f, indent=2, encoding='utf-8')

Thanks

机场等船 2024-10-17 17:54:11

Output the individual items as an array as they're created, creating the JSON formatting for the array around it manually. JSON is a simple format, so this is trivial to do.

Here's a simple example that prints out a JSON array, without having to hold the entire contents in memory; only a single element in the array needs to be stored at once.

import json
import sys

def get_item():
    return {"a": 5, "b": 10}

def get_array():
    # Yield the JSON array piece by piece; only one element is serialized at a time.
    yield "["
    for x in range(5):
        if x > 0:
            yield ","
        yield json.dumps(get_item())
    yield "]"

if __name__ == "__main__":
    for s in get_array():
        sys.stdout.write(s)
    sys.stdout.write("\n")
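
The same idea can be pointed at a file instead of stdout. What follows is only a minimal sketch in Python 3, not the asker's actual code: `write_json_array` and `fake_followers` are hypothetical names, and `fake_followers` merely stands in for the tweepy `Cursor` loop from the question so the example runs without network access.

```python
import json


def write_json_array(path, items):
    """Stream `items` into `path` as a JSON array, one element at a time."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("[")
        for index, item in enumerate(items):
            if index > 0:
                f.write(",")
            # Only the current element is serialized and held in memory.
            f.write(json.dumps(item))
        f.write("]")


def fake_followers(user, count):
    """Hypothetical stand-in for the Cursor loop in the question."""
    for follower_id in range(count):
        yield {"fu": user, "su": follower_id}


if __name__ == "__main__":
    write_json_array("followers.json", fake_followers("some_user", 5))
```

Because `items` can be a generator, memory use stays roughly constant no matter how many elements are written. The resulting file is still a single valid JSON document that `json.load` can read back.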

傾城如夢未必闌珊 2024-10-17 17:54:11

My take, building on the idea from Glenn's answer, but serializing a big dict as requested by the OP and using the more Pythonic enumerate instead of manually incrementing i (errors can be taken into account by keeping a separate count for them and subtracting it from i before writing to f):

with open('C:/Twitter/user_' + str(self.id) + '.json', mode='w') as f:
    f.write('{')
    for i, friends in enumerate(Cursor(api.followers_ids, screen_name=self.ip).items()):
        if i > 0:
            f.write(", ")
        # JSON object keys must be strings, so convert the index before dumping it.
        f.write("%s:%s" % (json.dumps(str(i)), json.dumps(dict(fu=self.ip, su=friends))))
    f.write("}")
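
One way to read the remark about errors is to keep a counter that only advances when a record is actually written, so the keys stay contiguous even if some items are skipped. The sketch below (Python 3) assumes exactly that. `write_json_object` and `valid_records` are hypothetical helpers, and treating `None` as a failed lookup is purely an illustration, not how tweepy reports errors.

```python
import json


def write_json_object(path, records):
    """Stream `records` into `path` as a JSON object keyed "0", "1", ...

    Returns the number of records written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        f.write("{")
        for record in records:
            if written > 0:
                f.write(", ")
            # JSON object keys must be strings, so the counter is converted.
            f.write("%s: %s" % (json.dumps(str(written)), json.dumps(record)))
            written += 1
        f.write("}")
    return written


def valid_records(user, follower_ids):
    """Hypothetical filter: skip entries that failed (represented as None)."""
    for follower_id in follower_ids:
        if follower_id is None:
            continue  # a failed item does not consume a key
        yield {"fu": user, "su": follower_id}


if __name__ == "__main__":
    count = write_json_object("user.json", valid_records("some_user", [10, None, 30]))
    print(count)
```

Counting inside the writer avoids having to subtract an error count from `enumerate`'s index afterwards.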
