How do I write lists to a binary file one at a time in Python?

Posted on 2024-11-06 07:42:30

I have a piece of code that generates a fairly large list on each iteration. To save memory, I want to write each list to a binary file as soon as it has been generated.
I have tried this with text files (even setting the mode to "wb" on Linux), but "wb" does not seem to have any effect on whether the file is written in binary or text format. Moreover, the resulting file is huge, which I want to avoid. I am sure that if I could write these lists in binary format, the file would be much smaller.
Thanks

Comments (3)

泪眸﹌ 2024-11-13 07:42:30

Since you mentioned the need for compressibility, I'd suggest using pickle with the gzip module to compress your output. You can write your lists and read them back one at a time. Here's an example:

import gzip, pickle

# Each pickle.dump call appends one self-delimiting pickle to the stream,
# so no separator is needed between lists.
with gzip.open('pickled.gz', 'wb', compresslevel=9) as output:
    for x in range(10):
        pickle.dump(list(range(10)), output)

And then use a generator to yield the lists back one at a time:

def unpickler(input):
    # pickle.load reads exactly one object per call and raises
    # EOFError once the stream is exhausted.
    while True:
        try:
            yield pickle.load(input)
        except EOFError:
            return

with gzip.open('pickled.gz', 'rb') as input:
    for l in unpickler(input):
        print(l)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
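
To see whether gzip is actually paying off for your data, you can compare the pickled size with and without compression. The following is only a rough sketch: the helper name `pickled_sizes` and the sample data are made up for illustration, and real ratios depend heavily on what the lists contain.

```python
import gzip
import io
import pickle

def pickled_sizes(lists, compresslevel=9):
    """Return (uncompressed_bytes, gzip_bytes) for a sequence of lists."""
    raw = io.BytesIO()
    packed = io.BytesIO()
    # GzipFile does not close a fileobj it was handed, so `packed`
    # stays readable after the with-block finishes the gzip stream.
    with gzip.GzipFile(fileobj=packed, mode='wb',
                       compresslevel=compresslevel) as gz:
        for lst in lists:
            data = pickle.dumps(lst, protocol=pickle.HIGHEST_PROTOCOL)
            raw.write(data)
            gz.write(data)
    return raw.tell(), len(packed.getvalue())

# Hypothetical sample data: 50 identical lists compress very well.
raw_size, gz_size = pickled_sizes([list(range(1000)) for _ in range(50)])
print(raw_size, gz_size)
```

Highly repetitive data like this sample shrinks a lot; data that looks random will shrink much less.
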

浅暮の光 2024-11-13 07:42:30

You can use cPickle (in Python 3, simply the pickle module, which uses the C implementation automatically) to serialize your lists and dump the result to a file.
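
A minimal sketch of that idea, assuming Python 3: each iteration appends one pickled list to the same file, so only one list is in memory at a time. The helper names `append_list` and `read_lists` and the temporary file path are illustrative, not part of any library.

```python
import os
import pickle
import tempfile

def append_list(path, values):
    # 'ab' appends in binary mode, so earlier lists are preserved.
    with open(path, 'ab') as f:
        pickle.dump(values, f, protocol=pickle.HIGHEST_PROTOCOL)

def read_lists(path):
    # Yield the stored lists one by one until the file is exhausted.
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

# Hypothetical usage: three small lists stand in for the large ones.
path = os.path.join(tempfile.mkdtemp(), 'lists.pickle')
for i in range(3):
    append_list(path, list(range(i, i + 5)))

restored = list(read_lists(path))
print(restored)
```

Reopening the file on every iteration keeps the sketch simple; in a tight loop you would probably keep one file handle open instead.
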

水水月牙 2024-11-13 07:42:30

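In Python 2, the only thing the 'b' flag changes is how line-break translations are done to support Windows, which is why it made no difference on Linux. In Python 3 it also makes the file accept and return bytes instead of str. Either way, the flag does not change how compactly your data is encoded; that depends on what you write.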

import pickle
help(pickle.load)
help(pickle.dump)

# seems fairly efficient, taking about 200 bytes to store [0,1,...,99],
# 2.7 kB to store [0,1,...,999],
# and 29 kB to store [0,1,...,9999]
# (exact sizes vary with the Python version and pickle protocol):
>>> len(pickle.dumps(list(range(100))))
208
>>> len(pickle.dumps(list(range(1000))))
2752
>>> len(pickle.dumps(list(range(10000))))
29770

#create and store
data = {}
data['myList'] = [i for i in range(100)]
with open('myfile.pickle', 'wb') as f:
    pickle.dump(data, f)

# retrieve (open for reading; 'wb' here would truncate the file)
with open('myfile.pickle', 'rb') as f:
    data2 = pickle.load(f)
print(data2)

Note that it is insecure to unpickle any user-supplied data, since loading a pickle can execute arbitrary code. Also, you will want to open the file you are writing to in binary mode.
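
If the lists hold only integers, a fixed-width binary layout may be more compact and avoids pickle's security concern. This is a different technique from the pickle approach above, sketched with the standard-library array module. It assumes every value fits in a signed 64-bit integer; the helper names `write_ints` and `read_ints` and the length-prefix layout are made up for illustration.

```python
import os
import tempfile
from array import array

def write_ints(f, values):
    # 'q' is a signed 64-bit integer; each list is stored as a
    # one-item length header followed by the values themselves.
    arr = array('q', values)
    array('q', [len(arr)]).tofile(f)
    arr.tofile(f)

def read_ints(f):
    # Yield one list per stored record until the file runs out.
    while True:
        header = array('q')
        try:
            header.fromfile(f, 1)
        except EOFError:
            return
        body = array('q')
        body.fromfile(f, header[0])
        yield body.tolist()

# Hypothetical usage with small sample lists.
path = os.path.join(tempfile.mkdtemp(), 'lists.bin')
lists = [list(range(100)), [7, -3], []]
with open(path, 'wb') as f:
    for lst in lists:
        write_ints(f, lst)

with open(path, 'rb') as f:
    restored = list(read_ints(f))
print(os.path.getsize(path))
```

The file uses the machine's native byte order, so it is only portable between machines with the same endianness.
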
