Appropriate file format for loading a list of Python objects one by one

Posted on 2025-01-10 09:28:35 | 756 words | 1 view | 0 comments

I have a custom Python class, Custom, and want to dump and load a List[Custom] (from here on I will call this list a "chunk").

Also, consider the following constraints.

Custom is too complex to write serialization/deserialization code for by hand.
Each Custom instance is small, but a chunk tends to be huge, around 10 GB.

In many situations I need only a small portion of the chunk (~10 MB). Currently I use pickle as the file format: I load the whole 10 GB chunk with chunk = pickle.load() and then use only a small part of it, for example chunk_use = chunk[:100].
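For concreteness, the current approach might look like the sketch below. The Custom dataclass, the load_first_n helper, and the temporary file are hypothetical stand-ins, not the real class or data.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class Custom:  # hypothetical stand-in for the real, more complex class
    value: int


def load_first_n(filename, n):
    """Current approach: unpickle the entire list, then slice it."""
    with open(filename, "rb") as f:
        chunk = pickle.load(f)  # every object in the file is materialized
    return chunk[:n]


# Small demo file; the real chunk would be around 10 GB.
demo_path = os.path.join(tempfile.mkdtemp(), "chunk.pkl")
with open(demo_path, "wb") as f:
    pickle.dump([Custom(i) for i in range(1000)], f)

chunk_use = load_first_n(demo_path, 100)
```

Because the whole list is written as a single pickle, there is no way to stop reading partway through it.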

However, this is inefficient in both memory and computation when only a small portion of the chunk is needed. It would be nice if I could load the chunk object by object, like

chunk_use = []
for i in range(100):
    chunk_use.append(load_data(filename, i))

or more concisely

chunk_use = load_data(filename, 1, 100)

Is there an appropriate data format, file format, or library for doing this?
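One possible direction, offered only as a minimal sketch, is to keep pickle but write each object as its own pickle record and store the byte offset of every record in a side file. The names dump_chunk, the ".idx" suffix, and the half-open (start, stop) meaning of the load_data arguments are assumptions made for illustration; Custom is again a hypothetical stand-in.

```python
import os
import pickle
import struct
import tempfile
from dataclasses import dataclass


@dataclass
class Custom:  # hypothetical stand-in for the real, more complex class
    value: int


def dump_chunk(chunk, filename):
    """Pickle each object separately and record its byte offset in a side file."""
    offsets = []
    with open(filename, "wb") as f:
        for obj in chunk:
            offsets.append(f.tell())  # where this object's pickle starts
            pickle.dump(obj, f)
    with open(filename + ".idx", "wb") as f:
        # One unsigned 64-bit little-endian integer per object.
        f.write(struct.pack(f"<{len(offsets)}Q", *offsets))


def load_data(filename, start, stop=None):
    """Load object `start`, or objects start..stop-1 when `stop` is given."""
    single = stop is None
    if single:
        stop = start + 1
    with open(filename + ".idx", "rb") as f:
        f.seek(8 * start)  # each index entry is 8 bytes
        entry = f.read(8)
    if len(entry) < 8:
        raise IndexError(f"object index {start} is out of range")
    (offset,) = struct.unpack("<Q", entry)
    result = []
    with open(filename, "rb") as f:
        f.seek(offset)
        for _ in range(stop - start):
            try:
                result.append(pickle.load(f))  # reads exactly one record
            except EOFError:
                break  # fewer objects in the file than requested
    return result[0] if single else result


path = os.path.join(tempfile.mkdtemp(), "chunk.bin")
dump_chunk([Custom(i) for i in range(1000)], path)
chunk_use = load_data(path, 0, 100)
```

With this layout only the requested records are read and unpickled, so memory use scales with the size of the slice rather than the size of the file. The cost is that the file is no longer a single pickled list, so it has to be written and read with these helpers.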

About the author

街道布景
