在 Python 中附加到 JSON(最好是由于 RAM 限制)
我正在尝试找到使用 Python 将一些数据附加到 json 文件的最佳方法。基本上发生的情况是,我有大约 100 个线程打开,将数据存储到数组中。完成后,他们使用 json.dump 将其发送到 json 文件。然而,由于建立阵列可能需要几个小时,我最终会耗尽 RAM。因此,我试图了解在此过程中使用最少 RAM 的最佳方法是什么。以下是我所拥有的消耗大量内存的内容。
i = 0
twitter_data = {}
for null in range(0,1):
while True:
try:
for friends in Cursor(api.followers_ids,screen_name=self.ip).items():
twitter_data[i] = {}
twitter_data[i]['fu'] = self.ip
twitter_data[i]['su'] = friends
i = i + 1
except tweepy.TweepError, e:
print "ERROR on " + str(self.ip) + " Reason: ", e
with open('C:/Twitter/errors.txt', mode='a') as a_file:
new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n"
a_file.write(new_ii)
break
## Save data
with open('C:/Twitter/user_' + str(self.id) + '.json', mode='w') as f:
json.dump(twitter_data, f, indent=2, encoding='utf-8')
谢谢
I'm trying to find the optimal way to append some data to a json file using Python. Basically what happens is I have about say 100 threads open storing data to an array. When they are done they send that to a json file using json.dump. However since this can take a few hours for the array to build up I end up running out of RAM eventually. So I'm trying to see what's the best way to use the least amount of RAM in this process. The following is what I have which consumes to much RAM.
i = 0
twitter_data = {}
for null in range(0,1):
while True:
try:
for friends in Cursor(api.followers_ids,screen_name=self.ip).items():
twitter_data[i] = {}
twitter_data[i]['fu'] = self.ip
twitter_data[i]['su'] = friends
i = i + 1
except tweepy.TweepError, e:
print "ERROR on " + str(self.ip) + " Reason: ", e
with open('C:/Twitter/errors.txt', mode='a') as a_file:
new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n"
a_file.write(new_ii)
break
## Save data
with open('C:/Twitter/user_' + str(self.id) + '.json', mode='w') as f:
json.dump(twitter_data, f, indent=2, encoding='utf-8')
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在创建单个项目时将其输出为数组,并手动为其周围的数组创建 JSON 格式。 JSON 是一种简单的格式,因此这很简单。
这是一个简单的示例,它打印出 JSON 数组,而无需将整个内容保存在内存中;一次只需存储数组中的一个元素。
Output the individual items as an array as they're created, creating the JSON formatting for the array around it manually. JSON is a simple format, so this is trivial to do.
Here's a simple example that prints out a JSON array, without having to hold the entire contents in memory; only a single element in the array needs to be stored at once.
我的看法是,建立在格伦回答的想法的基础上,但按照OP的要求序列化一个大字典,并使用更Pythonic的
enumerate
而不是手动递增i
(可以采取错误通过为它们保留单独的计数并在写入f
之前从i
中减去它来考虑):My take, building on the idea from Glenn's answer but serializing a big dict as requested by the OP and using the more pythonic
enumerate
instead of manually incrementingi
(errors can be taken into account by keeping a separate count for them and subtracting it fromi
before wriring tof
):