Pickle or json?


I need to save to disk a little dict object whose keys are of the type str and values are ints and then recover it. Something like this:

{'juanjo': 2, 'pedro':99, 'other': 333}

What is the best option and why? Serialize it with pickle or with simplejson?

I am using Python 2.6.

8 Answers

慕烟庭风 2024-08-27 23:38:47

I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.
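
For the dict in the question, a minimal JSON round trip might look like the sketch below (the file name data.json is just an example; on Python 2.6 simplejson exposes the same dump/load interface):

import json

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Write the dict to disk as JSON text.
with open('data.json', 'w') as f:
    json.dump(data, f)

# Read it back; on Python 2 the keys come back as unicode strings, the values as ints.
with open('data.json') as f:
    restored = json.load(f)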

爱,才寂寞 2024-08-27 23:38:47

If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
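
For comparison, a minimal cPickle round trip on Python 2 could be sketched like this (data.pkl is just an example name):

import cPickle

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Dump as a binary pickle; protocol=2 is the highest (and fastest) protocol on Python 2.
with open('data.pkl', 'wb') as f:
    cPickle.dump(data, f, protocol=2)

# Load it back into an equivalent dict.
with open('data.pkl', 'rb') as f:
    restored = cPickle.load(f)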

幻想少年梦 2024-08-27 23:38:47

You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/

_蜘蛛 2024-08-27 23:38:47

If you are primarily concerned with speed and space, use cPickle because cPickle is faster than JSON.

If you are more concerned with interoperability, security, and/or human readability, then use JSON.


The test results referenced in other answers were recorded in 2010; updated tests in 2016 with cPickle protocol 2 show:

  • cPickle 3.8x faster loading
  • cPickle 1.5x faster dumping
  • cPickle slightly smaller encoding

Reproduce this yourself with this gist, which is based on Konstantin's benchmark referenced in other answers, but uses cPickle with protocol 2 instead of pickle and json instead of simplejson (since json is faster than simplejson), e.g.

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

Results with Python 2.7 on a decent 2015 Xeon processor:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

Python 3.4 with pickle protocol 3 is even faster.
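
On Python 3 the same round trip uses the built-in pickle module directly; a minimal sketch (protocol 3 written out explicitly, pickle.HIGHEST_PROTOCOL works as well):

import pickle

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Python 3's pickle already uses the C implementation when it is available.
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=3)

with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)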

卸妝后依然美 2024-08-27 23:38:47

JSON or pickle? How about JSON and pickle!

You can use jsonpickle. It is easy to use, and the file on disk is readable because it's JSON.

See jsonpickle Documentation
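
A minimal sketch with jsonpickle (encode and decode are its documented entry points; the file name is just an example):

import jsonpickle

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# encode() returns a JSON string; for a plain dict it is ordinary JSON.
with open('data.json', 'w') as f:
    f.write(jsonpickle.encode(data))

with open('data.json') as f:
    restored = jsonpickle.decode(f.read())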

世俗缘 2024-08-27 23:38:47

I have tried several methods and found that using cPickle and setting the protocol argument of the dumps method, as in cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method. (On Python 2 the default pickle protocol is the ASCII protocol 0, while HIGHEST_PROTOCOL is the binary protocol 2, which handles a numpy array far more efficiently.)

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

# A 240x320x3 array of random floats, roughly one small RGB image worth of data.
obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

Output:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds
夕嗳→ 2024-08-27 23:38:47

Personally, I generally prefer JSON because the data is human-readable. Definitely, if you need to serialize something that JSON won't take, then use pickle.

But for most data storage, you won't need to serialize anything weird and JSON is much easier and always allows you to pop it open in a text editor and check out the data yourself.

The speed is nice, but for most datasets the difference is negligible; Python generally isn't too fast anyways.
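
For example, dumping with an indent keeps the file easy to scan in a text editor (a small sketch; indent and sort_keys are optional):

import json

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# indent and sort_keys make the file readable by eye and diff-friendly.
with open('data.json', 'w') as f:
    json.dump(data, f, indent=2, sort_keys=True)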

八巷 2024-08-27 23:38:47

Most answers are quite old and miss some info.

For the statement "Unpickling can run arbitrary code":
  1. Check the example in https://docs.python.org/3/library/pickle.html#restricting-globals
import pickle
pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
pickle.loads(b"cos\nsystem\n(S'pwd'\ntR.")

pwd can be replaced e.g. by rm to delete files. (The restricting-globals approach from the linked docs is sketched after this list.)

  2. Check https://checkoway.net/musings/pickle/ for a more sophisticated "run arbitrary code" template. The code is written in Python 2.7, but I guess with some modification it could also work in Python 3. If you make it work in Python 3, please add the Python 3 version to my answer. :)
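
For the restricting-globals link in item 1, a minimal sketch of a restricted unpickler along the lines of the one in the docs (the whitelist below is illustrative, not a recommendation):

import builtins
import io
import pickle

SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a small whitelist of builtins; refuse everything else,
        # including os.system, which the payloads above rely on.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# restricted_loads(b"cos\nsystem\n(S'pwd'\ntR.") now raises UnpicklingError instead of running a command.
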
For the "pickle speed vs json" part:

First, there is no explicit cPickle in Python 3 anymore; the pickle module uses the C implementation automatically when it is available.

And with this test code, borrowed from another answer, pickle beats json across the board:

import pickle
import json, random
from time import time
from hashlib import md5

test_runs = 100000

if __name__ == "__main__":
    payload = {
        "float": [(random.randrange(0, 99) + random.random()) for i in range(1000)],
        "int": [random.randrange(0, 9999) for i in range(1000)],
        "str": [md5(str(random.random()).encode('utf8')).hexdigest() for i in range(1000)]
    }
    modules = [json, pickle]

    for payload_type in payload:
        data = payload[payload_type]
        for module in modules:
            start = time()
            # Both modules go through the same loop: serialize the payload test_runs times.
            for i in range(test_runs):
                serialized = module.dumps(data)
            w = time() - start
            start = time()
            for i in range(test_runs):
                unserialized = module.loads(serialized)
            r = time() - start
            print("%s %s W %.3f R %.3f" % (module.__name__, payload_type, w, r))

Result:

tian@tian-B250M-Wind:~/playground/pickle_vs_json$ p3 pickle_test.py 
json float W 41.775 R 26.738
pickle float W 1.272 R 2.286
json int W 5.142 R 4.974
pickle int W 0.589 R 1.352
json str W 10.379 R 4.626
pickle str W 3.062 R 3.294