cPickle - different results pickling the same object
Is anyone able to explain the comment under testLookups() in this code snippet?
I've run the code and indeed what the comment says is true. However, I'd like to understand why it's true, i.e. why cPickle outputs different values for the same object depending on how it is referenced.
Does it have anything to do with reference count? If so, isn't that some kind of a bug - i.e. the pickled and deserialized object would have an abnormally high reference count and in effect would never get garbage collected?
Comments (2)
There is no guarantee that seemingly identical objects will produce identical pickle strings.
The pickle protocol is a virtual machine, and a pickle string is a program for that virtual machine. For a given object there exist multiple pickle strings (=programs) that will reconstruct that object exactly.
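To make the "several programs, one object" point concrete, here is a minimal sketch. It assumes Python 3's pickle module rather than the Python 2 cPickle from the question, and the sample dictionary is made up for illustration. It serializes one object with every protocol the interpreter supports and checks that the differing byte strings all load back to an equal object.

```python
import pickle

# Hypothetical sample object, not taken from the question's snippet.
data = {"name": "spam", "values": [1, 2, 3]}

# Serialize the same object once per protocol supported by this interpreter.
pickles = [pickle.dumps(data, protocol=p)
           for p in range(pickle.HIGHEST_PROTOCOL + 1)]

# The resulting byte strings are not all identical...
distinct_pickles = set(pickles)

# ...yet each of them reconstructs an object equal to the original.
restored = [pickle.loads(s) for s in pickles]
all_equal = all(obj == data for obj in restored)

print(len(distinct_pickles), "distinct pickle strings; all load equal:", all_equal)
```

Different protocols are only one source of variation; the point is just that byte-for-byte equality of pickles is a stronger condition than equality of the reconstructed objects.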
To take one of your examples:
The two pickle strings differ in their use of the p opcode. The opcode takes one integer argument and its function is as follows:
To cut a long story short, the two pickle strings are basically equivalent.
I haven't tried to nail down the exact cause of the differences in generated opcodes. This could well have to do with reference counts of the objects being serialized. What is clear, however, is that discrepancies like this will have no effect on the reconstructed object.
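If you want to see for yourself that a missing p (PUT) opcode is harmless, the following sketch may help. It assumes Python 3's pickle and pickletools modules, not cPickle, and it drops the unused PUT explicitly with pickletools.optimize() instead of reproducing whatever cPickle does internally. The helper opcode_names is a name I made up for this example.

```python
import pickle
import pickletools

data = [1, 2, 3]

# Protocol 0 is the text protocol, where PUT is written as "p".
original = pickle.dumps(data, protocol=0)

# optimize() removes PUT opcodes whose memo slot is never read back by a GET.
optimized = pickletools.optimize(original)


def opcode_names(pickled):
    """Return the names of the opcodes in a pickle string, in order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(pickled)]


print(opcode_names(original))   # includes PUT
print(opcode_names(optimized))  # same program without the unused PUT

# Both programs rebuild the same list.
same_object = pickle.loads(original) == pickle.loads(optimized) == data
print(same_object)
```

The two byte strings differ only in memo bookkeeping, which is why the reconstructed object is unaffected.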
It is looking at the reference counts, from the cPickle source:
The pickle protocol has to deal with pickling multiple references to the same object. In order to prevent duplicating the object when it is unpickled, it uses a memo. The memo basically maps indexes to the various objects. The PUT (p) opcode in the pickle stores the current object in this memo dictionary.
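Here is a small sketch of the memo at work, assuming Python 3's pickle and pickletools (the behaviour described for cPickle is the same idea). The names shared and container are just illustrative. A list holding two references to one inner list is pickled; the second reference shows up as a GET that reads the memo entry stored by an earlier PUT, so the loaded result still contains one object referenced twice.

```python
import pickle
import pickletools

shared = ["shared", "list"]
container = [shared, shared]  # two references to one and the same object

pickled = pickle.dumps(container, protocol=0)

# List the opcode names: the first occurrence of `shared` is stored with PUT,
# the second is emitted as a GET instead of being pickled again.
names = [opcode.name for opcode, arg, pos in pickletools.genops(pickled)]
print(names)

restored = pickle.loads(pickled)

# Identity is preserved: both slots point to the same reconstructed list.
identity_preserved = restored[0] is restored[1]
print(identity_preserved)
```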
However, if there is only a single reference to an object, there is no reason to store it in the memo: with only one reference, it can never need to be referenced again. Thus the cPickle code checks the reference count for a little optimization at this point.
So yes, it's the reference counts. But that's not a problem. The unpickled objects will have the correct reference counts; it just produces a slightly shorter pickle when the reference count is 1.
Now, I don't know what you are doing that makes you care about this. But you really shouldn't assume that pickling the same object will always give you the same result. If nothing else, I'd expect dictionaries to give you problems because the order of the keys is undefined. Unless you have Python documentation that guarantees the pickle is the same each time, I highly recommend you don't depend on it.
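As a rough illustration of the dictionary problem, assuming Python 3.7+ (where dicts keep insertion order, unlike the arbitrary order in the Python 2 of the question): two equal dicts built in a different key order pickle to different bytes. If you need a stable comparison, compare the loaded objects, or use a serialization with a defined key order. The json.dumps(sort_keys=True) variant below is my own suggestion and only works for JSON-serializable data.

```python
import json
import pickle

first = {"x": 1, "y": 2}
second = {"y": 2, "x": 1}  # same content, different insertion order

same_value = first == second                               # dicts compare equal
same_pickle = pickle.dumps(first) == pickle.dumps(second)  # but the bytes differ

# Safer: compare what the pickles load back to, not the byte strings.
round_trip_equal = (pickle.loads(pickle.dumps(first))
                    == pickle.loads(pickle.dumps(second)))

# Alternative (an assumption about your use case): a serialization
# with sorted keys gives the same text for equal dicts.
canonical_equal = (json.dumps(first, sort_keys=True)
                   == json.dumps(second, sort_keys=True))

print(same_value, same_pickle, round_trip_equal, canonical_equal)
```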