Lightweight pickle for basic types in Python?
All I want to do is serialize and unserialize tuples of strings or ints.
I looked at pickle.dumps() but the byte overhead is significant. Basically it looks like it takes up about 4x as much space as it needs to. Besides, all I need is basic types and have no need to serialize objects.
marshal is a little better in terms of space but the result is full of nasty \x00 bytes. Ideally I would like the result to be human readable.
I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?
This is getting stored in a db, not a file. Byte overhead matters because it could make the difference between requiring a TEXT column versus a varchar, and generally data compactness affects all areas of db performance.
Take a look at json; at least the generated dumps are readable with many other languages.
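As a rough sketch of the json suggestion, using only the standard library: the sample tuple is made up for illustration, and the `separators` argument is an optional extra to trim whitespace from the output.

```python
import json

record = ("alice", 42, "x")  # hypothetical sample tuple

# separators=(",", ":") drops the default spaces after commas and colons
encoded = json.dumps(record, separators=(",", ":"))
print(encoded)  # ["alice",42,"x"]

# JSON has no tuple type, so the value comes back as a list
decoded = tuple(json.loads(encoded))
assert decoded == record
```

Note that nested tuples would also come back as lists, so a plain `tuple()` call only restores the outermost level.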
Personally I would use YAML. It's on par with json for encoding size, but it can represent some more complex things (e.g. classes, recursive structures) when necessary.
Maybe you're not using the right protocol:
See the documentation for pickle data formats.
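The snippet that originally accompanied this answer is not preserved here, so the following is only a sketch of how one might compare protocols. The sample tuple is hypothetical, and the sizes depend on the Python version and the data, so none are quoted. In Python 2 the default was the text-based protocol 0.

```python
import pickle

data = ("alice", 42, "x")  # hypothetical sample tuple

# Serialize the same value with three protocol choices and compare sizes
default_size = len(pickle.dumps(data))
proto0_size = len(pickle.dumps(data, protocol=0))  # oldest, ASCII-oriented format
highest_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

print(proto0_size, default_size, highest_size)

# Whatever the protocol, loads() restores the original tuple
restored = pickle.loads(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
assert restored == data
```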
If you need a space-efficient solution, you can use Google Protocol Buffers.
Protocol buffers - Encoding
Protocol buffers - Python Tutorial
There are some persistence builtins mentioned in the Python documentation, but I don't think any of them is remarkably smaller in the produced file size.
You could always use configparser, but there you only get string, int, float, and bool.
"the byte overhead is significant"
Why does this matter? It does the job. If you're running low on disk space, I'd be glad to sell you a 1 TB drive for $500.
Have you run it? Is performance a problem? Can you demonstrate that the performance of serialization is the problem?
"I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?"
Nothing simpler than repr and eval.
What's wrong with eval?
Is it the "someone could insert malicious code into the file where I serialized my lists" issue?
Who -- specifically -- is going to find and edit this file to put in malicious code? Anything you do to secure this (i.e., encryption) removes "simple" from it.
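For readers who do want the repr() round trip without eval(), one option from the standard library is `ast.literal_eval`, which only accepts Python literals. This is a sketch with a made-up sample tuple, not something proposed in the answer above.

```python
import ast

record = ("alice", 42, "x")  # hypothetical sample tuple

text = repr(record)
print(text)  # ('alice', 42, 'x')

# literal_eval parses Python literals only; it does not call functions or run code
restored = ast.literal_eval(text)
assert restored == record

# Anything that is not a plain literal is refused instead of being executed
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected non-literal input")
```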
Luckily there is a solution which uses COMPRESSION and solves the general problem involving any arbitrary Python object, including new classes. Rather than micro-managing mere tuples, sometimes it's better to use a DRY tool. Your code will be crisper and more readily refactored in similar future situations.
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
[If you are still concerned, why not stick those tuples in a dictionary, then apply y_serial to the dictionary. Probably any overhead will vanish due to the transparent compression in the background by zlib.]
As to readability, the documentation also gives details on why cPickle was selected over json.
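To get a feel for the pickle-plus-zlib idea described above, here is a minimal sketch using the standard library directly. It does not use or reproduce the y_serial API, and the dictionary of tuples is hypothetical.

```python
import pickle
import zlib

# Hypothetical data: a dictionary holding many small tuples
records = {"row%d" % i: ("alice", i, "x") for i in range(100)}

raw = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
packed = zlib.compress(raw)
print(len(raw), len(packed))

# Decompress, then unpickle, to get the original dictionary back
restored = pickle.loads(zlib.decompress(packed))
assert restored == records
```

The compressed result is binary rather than human readable, and for very short inputs the zlib header and checksum can make the output larger than the input, so it is worth measuring on real data.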