How do I write lists one at a time to a binary file in Python?

Posted 2024-11-06 07:42:30

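I have a piece of code that generates a fairly large list on every iteration. To save memory, I want to write each list to a binary file as soon as it is generated. I have already tried text files (even passing "wb" as the mode on Linux). The "wb" mode does not seem to affect whether the file is written as binary or as text. The resulting file is also very large, which I do not want. I am sure the file would be much smaller if I could write these lists in binary format. Thanks.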


泪眸﹌ 2024-11-13 07:42:30

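Since you mentioned the need for compressibility, I'd suggest using pickle together with the gzip module to compress your output. You can write and read back your lists one at a time. Here is an example:

import gzip, pickle

# Each pickle.dump call appends one self-delimiting record to the gzip stream
with gzip.open('pickled.gz', 'wb', compresslevel=9) as output:
    for x in range(10):
        pickle.dump(list(range(10)), output)

Then use a generator to yield the lists back one at a time:

def unpickler(infile):
    # Load one pickled object per iteration until the stream is exhausted
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return

with gzip.open('pickled.gz', 'rb') as infile:
    for l in unpickler(infile):
        print(l)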

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
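
For the asker's case of one large list per iteration, the same write-then-stream pattern can be wrapped in two small helpers. This is a minimal sketch, not a definitive implementation: the names `write_lists` and `iter_lists`, the `make_lists` stand-in for the real per-iteration computation, and the temporary file path are all assumptions made up for illustration. It relies on pickle records being self-delimiting, so no separator is needed between them.

```python
import gzip
import os
import pickle
import tempfile


def make_lists(n_iterations, size):
    """Hypothetical stand-in for the loop that produces one large list per iteration."""
    for i in range(n_iterations):
        yield [i * size + j for j in range(size)]


def write_lists(path, lists):
    """Pickle each list as soon as it is produced, so only one is held in memory."""
    with gzip.open(path, "wb", compresslevel=9) as out:
        for lst in lists:
            pickle.dump(lst, out, protocol=pickle.HIGHEST_PROTOCOL)


def iter_lists(path):
    """Yield the stored lists back one at a time."""
    with gzip.open(path, "rb") as src:
        while True:
            try:
                yield pickle.load(src)
            except EOFError:  # raised once the stream is exhausted
                return


path = os.path.join(tempfile.mkdtemp(), "lists.pkl.gz")
write_lists(path, make_lists(n_iterations=5, size=1000))
restored = list(iter_lists(path))
print(len(restored), restored[0][:3], restored[-1][-1])
```

Because `make_lists` is a generator and `write_lists` consumes it lazily, each list can be garbage-collected right after it is written.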


浅暮の光 2024-11-13 07:42:30

You can use cPickle to serialize your lists and dump the result to a file.
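
As a rough illustration of this suggestion: on Python 3 there is no separate `cPickle` import, and the standard `pickle` module uses its C implementation automatically when it is available. The sketch below assumes Python 3; the file name `one_list.pkl` and the sample data are placeholders, not part of the original answer.

```python
import os
import pickle
import tempfile

big_list = list(range(100000))  # placeholder for one generated list

path = os.path.join(tempfile.mkdtemp(), "one_list.pkl")

# The binary protocols are more compact than the text-based protocol 0.
with open(path, "wb") as f:
    pickle.dump(big_list, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == big_list, os.path.getsize(path))
```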


水水月牙 2024-11-13 07:42:30

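In Python 2, the only thing the 'b' flag changes is how line-break translation is done, which matters only on Windows. In Python 3 it additionally makes the file accept and return bytes instead of str.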

import pickle
help(pickle.load)
help(pickle.dump)

# seems fairly efficient, taking 200bytes to store [1,2,...,100],
# 2.7kb to store [1,2,...,1000],
# and 29kb to store [1,2,...,10000]:
>>> len(pickle.dumps(list(range(100))))
208
>>> len(pickle.dumps(list(range(1000))))
2752
>>> len(pickle.dumps(list(range(10000))))
29770

#create and store
data = {}
data['myList'] = [i for i in range(100)]
with open('myfile.pickle', 'wb') as f:
    pickle.dump(data, f)

# retrieve (open for reading in binary mode, not 'wb')
with open('myfile.pickle', 'rb') as f:
    data2 = pickle.load(f)
print(data2)

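Note that unpickling untrusted or user-supplied data is insecure, because loading a pickle can execute arbitrary code. You will also want to open the file in binary mode: 'wb' when writing and 'rb' when reading.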
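
To make the binary-mode point concrete, here is a small sketch assuming Python 3, where pickle produces `bytes` and a file opened in text mode only accepts `str`. The file name `demo.pickle` and the sample data are placeholders chosen for illustration.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
data = [1, 2, 3]

# Text mode: pickle.dump hands bytes to a file object that expects str.
try:
    with open(path, "w") as f:
        pickle.dump(data, f)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary mode: write with 'wb', read back with 'rb'.
with open(path, "wb") as f:
    pickle.dump(data, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(text_mode_ok, restored)  # prints: False [1, 2, 3]
```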
