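Pickle or json?
I need to save a small dict object to disk, whose keys are of type str and whose values are ints, and then recover it. Something like this:
{'juanjo': 2, 'pedro': 99, 'other': 333}
What is the best option, and why? Serialize it with pickle or with simplejson?
I am using Python 2.6.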
Comments (8)
I prefer JSON over pickle for serialization. Unpickling can run arbitrary code, so using pickle to transfer data between programs or to store data between sessions is a security hole. JSON does not introduce that security hole and is standardized, so the data can be read by programs written in other languages if you ever need that.

If you do not have any interoperability requirements (e.g. you are only going to use the data from Python) and a binary format is fine, go with cPickle, which gives you really fast Python object serialization.
If you want interoperability or a text format for your data, go with JSON (or some other appropriate format, depending on your constraints).
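For the dict in the question, the JSON route might look like the sketch below. It uses the standard-library json module (simplejson exposes the same dump/load functions). The file name scores.json and the variable names are made up for illustration.

```python
import json
import os
import tempfile

scores = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Hypothetical file name, placed in the system temp directory for the demo.
path = os.path.join(tempfile.gettempdir(), 'scores.json')

# Save: the file on disk is plain text and can be opened in any editor.
with open(path, 'w') as f:
    json.dump(scores, f)

# Restore: keys come back as strings, values as ints.
with open(path) as f:
    restored = json.load(f)

print(restored == scores)
```

Note that JSON object keys are always strings, which happens to match this dict; a dict with int keys would come back with str keys.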
You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/
If you are primarily concerned with speed and space, use cPickle, because cPickle is faster than JSON.
If you are more concerned with interoperability, security, and/or human readability, then use JSON.
The test results referenced in other answers were recorded in 2010. The tests were rerun in 2016 with cPickle protocol 2, using Python 2.7 on a decent 2015 Xeon processor.
You can reproduce this yourself with this gist, which is based on Konstantin's benchmark referenced in other answers, but uses cPickle with protocol 2 instead of pickle, and json instead of simplejson (since json is faster than simplejson).
Python 3.4 with pickle protocol 3 is even faster.
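The gist itself is not reproduced here. As a rough stand-in, a comparison along those lines could be sketched as follows; the sample data, the bench helper, and the iteration count are all assumptions for illustration, and the timings will vary by machine and Python version.

```python
import json
import timeit

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: uses the C accelerator when available

# Made-up sample data in the same shape as the question: str keys, int values.
data = dict(('key%d' % i, i) for i in range(1000))


def bench(label, dumps, loads, number=200):
    """Time `number` dumps and loads; return (label, dump_s, load_s, size)."""
    blob = dumps(data)
    assert loads(blob) == data  # sanity check: the round trip is lossless
    dump_time = timeit.timeit(lambda: dumps(data), number=number)
    load_time = timeit.timeit(lambda: loads(blob), number=number)
    return label, dump_time, load_time, len(blob)


results = [
    bench('json', json.dumps, json.loads),
    bench('pickle protocol 2',
          lambda obj: pickle.dumps(obj, protocol=2), pickle.loads),
]

for label, dump_time, load_time, size in results:
    print('%-18s dump %.4fs  load %.4fs  %d bytes'
          % (label, dump_time, load_time, size))
```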
JSON or pickle? How about JSON and pickle!
You can use jsonpickle. It is easy to use, and the file on disk is readable because it is JSON.
See the jsonpickle documentation.
I tried several methods and found that the fastest way to dump is cPickle with the protocol argument of dumps set to the highest protocol:
cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)
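A minimal sketch of that call, written so it runs on both Python 2 (cPickle) and Python 3 (where pickle picks up the C implementation on its own). The sample dict and variable names are illustrative only, and the sketch compares output sizes rather than timings.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

obj = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Protocol 0 is the original ASCII format; HIGHEST_PROTOCOL selects the
# newest binary format supported by the running interpreter.
text_blob = pickle.dumps(obj, protocol=0)
binary_blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

print('protocol 0: %d bytes' % len(text_blob))
print('protocol %d: %d bytes' % (pickle.HIGHEST_PROTOCOL, len(binary_blob)))

# Both blobs restore to the same dict.
restored = pickle.loads(binary_blob)
```

One trade-off to keep in mind: a file written with the highest protocol of a newer interpreter may not be readable by an older one.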
Personally, I generally prefer JSON because the data is human-readable. Of course, if you need to serialize something that JSON won't take, then use pickle.
But for most data storage you won't need to serialize anything weird, and JSON is much easier and always lets you pop the file open in a text editor and check the data yourself.
The speed is nice, but for most datasets the difference is negligible; Python generally isn't too fast anyway.
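As one illustration of "something that JSON won't take", the sketch below uses a set, which the standard json module rejects with a TypeError while pickle handles it. The data is invented for the example.

```python
import json
import pickle

# A set is not one of the types the json module can encode.
data = {'tags': set(['a', 'b']), 'count': 3}

try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False  # json refuses the set

# pickle round-trips the same object, set included.
restored = pickle.loads(pickle.dumps(data))

print(json_ok)
print(restored == data)
```

For a plain dict of str keys and int values, as in the question, this limitation never comes up.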
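Most answers are quite old and miss some information.
For the statement "Unpickling can run arbitrary code": a pickle can invoke a shell command such as pwd when it is loaded, and pwd can be replaced with e.g. rm to delete files.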
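The answer's original snippet is not preserved here. A common way to demonstrate the point, assumed here rather than taken from the answer, is a class whose __reduce__ method tells pickle to call os.system on load. The class name Exploit is made up, and the command is the harmless pwd.

```python
import os
import pickle


class Exploit(object):
    """Hypothetical class: pickling it records a call to make on load."""

    def __reduce__(self):
        # On unpickling, pickle calls os.system('pwd') instead of
        # rebuilding an Exploit instance.
        return (os.system, ('pwd',))


payload = pickle.dumps(Exploit())

# Loading the bytes runs the shell command; the return value is the
# result of os.system, not an Exploit object.
result = pickle.loads(payload)
print(result)
```

Whoever controls the bytes passed to pickle.loads chooses the command, which is why pickles from untrusted sources should never be loaded.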
For the "pickle speed vs json" part:
First, there is no separate cPickle module in Python 3 anymore. And with the test code borrowed from another answer, pickle beats json on every measure.