Appropriate file format for loading a list of Python objects one by one
I have a custom Python class Custom and want to dump/load a List[Custom] (let me refer to this as a "Chunk" from now on). Also, consider the following setting:
- Custom is too complex to hand-write a serialize/deserialize procedure.
- Although an instance of Custom is small, the chunk tends to be huge, e.g. 10GB.
- There are many situations where I need only a small portion of the chunk (~10MB).
Currently, I use pickle as the file format. I load the whole 10GB chunk with chunk = pickle.load() and use only a small portion of it, e.g. chunk_use = chunk[:100].
However, this is memory- and compute-inefficient when I only need a small portion of the chunk. So it would be nice if I could load the chunk object by object, like
chunk_use = []
for i in range(100):
    chunk_use.append(load_data(filename, i))
or, more concisely,
chunk_use = load_data(filename, 1, 100)
Is there an appropriate data format / file format or library to do this?