cPickle - different results pickling the same object

Posted on 2024-12-05 21:25:20

Is anyone able to explain the comment under testLookups() in this code snippet?

I've run the code and indeed what the comment says is true. However, I'd like to understand why it's true, i.e. why cPickle outputs different values for the same object depending on how it is referenced.

Does it have anything to do with reference counts? If so, isn't that some kind of bug? That is, wouldn't the pickled and deserialized object have an abnormally high reference count and, in effect, never get garbage collected?

灰色世界里的红玫瑰 2024-12-12 21:25:20

There is no guarantee that seemingly identical objects will produce identical pickle strings.

The pickle protocol is a virtual machine, and a pickle string is a program for that virtual machine. For a given object there exist multiple pickle strings (i.e., programs) that will reconstruct that object exactly.
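To make the "pickle string is a program" idea concrete, here is a small sketch using Python 3's `pickle` module (cPickle is Python 2 only; in Python 3 its C implementation sits behind `pickle`). The hand-written byte string is an illustrative assumption, not something any pickler is guaranteed to emit: it is simply the same protocol-0 program with the PUT left out.

```python
import pickle

data = [1, 2]

# Pickle produced by the standard library, using protocol 0 (the text format
# shown in the question). It includes a PUT opcode that stores the list in
# the memo.
generated = pickle.dumps(data, protocol=0)

# A hand-written protocol-0 program for the same list, without any PUT:
#   (      MARK
#   l      LIST    (build a list from the items since MARK; here, none)
#   I1\n   INT 1
#   a      APPEND
#   I2\n   INT 2
#   a      APPEND
#   .      STOP
handwritten = b"(lI1\naI2\na."

print(generated)
print(handwritten)

# Two different programs, one reconstructed value.
assert generated != handwritten
assert pickle.loads(generated) == pickle.loads(handwritten) == [1, 2]
```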

To take one of your examples:

>>> from cPickle import dumps
>>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
>>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
>>> dumps(t)
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."

The two pickle strings differ in their use of the p (PUT) opcode. This opcode takes one integer argument, and its function is as follows:

  name='PUT'    code='p'   arg=decimalnl_short

  Store the stack top into the memo.  The stack is not popped.

  The index of the memo location to write into is given by the newline-
  terminated decimal string following.  BINPUT and LONG_BINPUT are
  space-optimized versions.
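The opcode description above is the one the `pickletools` module carries for PUT. A sketch, assuming Python 3 and protocol 0, that lists which memo slots a pickle of the example tuple writes to (the indices may be numbered differently from the cPickle output, e.g. starting at 0 rather than 1):

```python
import pickle
import pickletools

t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
p = pickle.dumps(t, protocol=0)

# genops() walks the pickle program and yields (opcode, argument, position)
# for each instruction, without executing it.
put_indices = [arg for opcode, arg, pos in pickletools.genops(p) if opcode.name == "PUT"]
print("memo slots written by PUT:", put_indices)

# dis() prints the full annotated listing, PUT instructions included.
pickletools.dis(p)
```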

To cut a long story short, the two pickle strings are basically equivalent.

I haven't tried to nail down the exact cause of the differences in the generated opcodes. It could well have to do with the reference counts of the objects being serialized. What is clear, however, is that discrepancies like this have no effect on the reconstructed object.
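One way to check that the PUTs don't affect the result is `pickletools.optimize`, which rewrites a pickle without the PUT opcodes that no GET ever reads. A sketch in Python 3, reusing the tuple from the example; since nothing in it is shared, every PUT can go:

```python
import pickle
import pickletools

t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
p = pickle.dumps(t, protocol=0)

# optimize() returns an equivalent pickle with unused PUT opcodes removed.
q = pickletools.optimize(p)

print(len(p), len(q))

# Shorter program, same reconstructed object.
assert len(q) < len(p)
assert pickle.loads(q) == pickle.loads(p) == t
```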

八巷 2024-12-12 21:25:20

It is looking at the reference counts. From the cPickle source:

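if (Py_REFCNT(args) > 1) {
    /* An object with a single reference can't show up a second time in
       this dump, so the memo lookup is skipped for it. */
    if (!( py_ob_id = PyLong_FromVoidPtr(args)))   /* memo key: the object's address */
        goto finally;

    if (PyDict_GetItem(self->memo, py_ob_id)) {
        /* Already pickled: emit a GET of its memo entry instead of
           pickling the object again. */
        if (get(self, py_ob_id) < 0)
            goto finally;

        res = 0;
        goto finally;
    }
}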

The pickle protocol has to deal with pickling multiple references to the same object. To avoid duplicating the object when it is unpickled, it uses a memo. The memo basically maps indexes to the various objects, and the PUT (p) opcode in the pickle stores the current object in this memo dictionary.
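A minimal Python 3 sketch of what the memo buys you: two references to one list come back as two references to one list, while two equal but separate lists come back separate. The `uses_get` helper is hypothetical, written only for this illustration.

```python
import pickle
import pickletools

shared = [1, 2]
data = [shared, shared]      # two references to ONE list
copies = [[1, 2], [1, 2]]    # two equal but distinct lists

p_shared = pickle.dumps(data)
p_copies = pickle.dumps(copies)

# The second reference to `shared` is written as a GET from the memo,
# so the unpickled result has the same sharing structure.
restored = pickle.loads(p_shared)
assert restored[0] is restored[1]

# Distinct objects stay distinct, even though they compare equal.
restored_copies = pickle.loads(p_copies)
assert restored_copies[0] == restored_copies[1]
assert restored_copies[0] is not restored_copies[1]

def uses_get(p):
    """True if the pickle program reads from the memo (GET, BINGET or LONG_BINGET)."""
    return any("GET" in op.name for op, _, _ in pickletools.genops(p))

assert uses_get(p_shared)
assert not uses_get(p_copies)
```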

However, if there is only a single reference to an object, there is no reason to store it in the memo: with only one reference, the object cannot be encountered again during the same pickling, so nothing will ever need to refer back to it. Thus the cPickle code checks the reference count at this point as a small optimization.
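From Python, the count that `Py_REFCNT` reads is visible through `sys.getrefcount` (a CPython implementation detail; the result includes a temporary reference for the call's own argument). A sketch with hypothetical example objects, showing that binding a second name to an object raises the count cPickle's check looks at. The absolute values depend on the CPython version, so only the difference is used. As far as I know, Python 3's `pickle` no longer makes this refcount check and memoizes containers regardless.

```python
import sys

# Hypothetical example: the inner list is at first referenced only from
# inside the tuple.
outer = ([1, 2, 3],)

before = sys.getrefcount(outer[0])

alias = outer[0]   # a second, named reference to the same list

after = sys.getrefcount(outer[0])

# Absolute numbers include temporary references and vary between CPython
# versions; only the difference is meaningful here.
print(before, after)
assert after == before + 1
```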

So yes, it's the reference counts. But that's not a problem. The unpickled objects will have the correct reference counts; the check just produces a slightly shorter pickle when an object's reference count is 1.

Now, I don't know what you are doing that makes you care about this, but you really shouldn't assume that pickling the same object will always give you the same result. If nothing else, I'd expect dictionaries to give you problems, because the order of the keys is undefined. Unless you have Python documentation that guarantees the pickle is the same each time, I highly recommend you don't depend on it.
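A sketch of the dictionary caveat in current Python 3, where dicts iterate in insertion order (in Python 2 the order was arbitrary): two equal dicts built in different orders pickle to different bytes, so compare what the pickles rebuild rather than the bytes themselves. `same_value` is a hypothetical helper for this illustration.

```python
import pickle

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}   # same contents, different insertion order

pa = pickle.dumps(a)
pb = pickle.dumps(b)

assert a == b          # equal as dictionaries
assert pa != pb        # but each is pickled in its own iteration order

def same_value(p1, p2):
    """Hypothetical helper: compare two pickles by the objects they rebuild."""
    return pickle.loads(p1) == pickle.loads(p2)

assert same_value(pa, pb)
```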
