copy.deepcopy vs pickle
I have a tree structure of widgets, e.g. a collection contains models and each model contains widgets. I want to copy the whole collection. copy.deepcopy is faster than pickling and unpickling the object, but cPickle, being written in C, is much faster still, so:
- Why shouldn't I (we) always use cPickle instead of deepcopy?
- Is there any other copy alternative? pickle is slower than deepcopy but cPickle is faster, so maybe a C implementation of deepcopy would be the winner.
Sample test code:
    import copy
    import pickle
    import cPickle

    class A(object): pass

    d = {}
    for i in range(1000):
        d[i] = A()

    def copy1():
        return copy.deepcopy(d)

    def copy2():
        return pickle.loads(pickle.dumps(d, -1))

    def copy3():
        return cPickle.loads(cPickle.dumps(d, -1))
Timings:
>python -m timeit -s "import c" "c.copy1()"
10 loops, best of 3: 46.3 msec per loop
>python -m timeit -s "import c" "c.copy2()"
10 loops, best of 3: 93.3 msec per loop
>python -m timeit -s "import c" "c.copy3()"
100 loops, best of 3: 17.1 msec per loop
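For readers on Python 3, where the separate cPickle module no longer exists (the standard pickle module uses its C accelerator when one is available), a rough equivalent of the benchmark above might look like the sketch below. The helper name `best_msec` is hypothetical and not part of the original test; no timings are claimed here, since they depend on the interpreter and machine.

```python
import copy
import pickle
import timeit


class A:
    pass


# Same shape of data as in the question: 1000 small instances in a dict.
d = {i: A() for i in range(1000)}


def copy1():
    # Generic deep copy.
    return copy.deepcopy(d)


def copy2():
    # Round-trip through pickle with the highest available protocol
    # (the question passes -1, which selects the same protocol).
    return pickle.loads(pickle.dumps(d, pickle.HIGHEST_PROTOCOL))


def best_msec(fn, number=10, repeat=3):
    """Hypothetical helper: best per-call time of fn in milliseconds."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number * 1e3
```

Calling `best_msec(copy1)` and `best_msec(copy2)` from a script gives numbers comparable in spirit to the `python -m timeit` runs above.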
Comments (5)
Problem is, pickle+unpickle can be faster (in the C implementation) because it's less general than deepcopy: many objects can be deepcopied but not pickled. Suppose, for example, that your class A were changed to...:
now, copy1 still works fine (A's complexity slows it down but absolutely doesn't stop it); copy2 and copy3 break, and the end of the stack trace says...:
I.e., pickling always assumes that classes and functions are top-level entities in their modules, and so pickles them "by name" -- deepcopying makes absolutely no such assumptions.
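To make the "deepcopyable but not picklable" point concrete, here is a minimal sketch. It is not the class from the answer (that code is elided above); it assumes a hypothetical class `A` whose instances hold a lambda. A lambda is used because recent Python 3 versions can pickle nested classes by qualified name, whereas a lambda still has no importable name for pickle to record.

```python
import copy
import pickle


class A:
    """Hypothetical widget holding a callback that pickle cannot serialize."""

    def __init__(self):
        # A lambda has no importable top-level name, so pickle cannot
        # record it "by name".
        self.callback = lambda x: x + 1


a = A()

# deepcopy treats functions as atomic: the new instance shares the same
# function object, and the copy succeeds.
clone = copy.deepcopy(a)
print(clone is not a, clone.callback(1))

# pickle refuses; the exact exception type varies across Python versions,
# so both plausible types are caught here.
try:
    pickle.dumps(a)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    pickled_ok = False
    print("pickle failed:", type(exc).__name__)
```

The same pattern applies to any attribute that pickle cannot locate by name, which is the generality trade-off the answer describes.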
So if you have a situation where the speed of "somewhat deep-copying" is absolutely crucial, every millisecond matters, AND you want to take advantage of special limitations that you KNOW apply to the objects you're duplicating, such as those that make pickling applicable, or ones favoring yet other forms of serialization and other shortcuts, by all means go ahead - but if you do, you MUST be aware that you're constraining your system to live by those limitations forevermore, and you should document that design decision very clearly and explicitly for the benefit of future maintainers.
For the NORMAL case, where you want generality, use deepcopy !-)
You should be using deepcopy because it makes your code more readable. Using a serialization mechanism to copy objects in memory is at the very least confusing to another developer reading your code. Using deepcopy also means you get to reap the benefits of future optimizations in deepcopy.
First rule of optimization: don't.
It is not always the case that cPickle is faster than deepcopy(). While cPickle is probably always faster than pickle, whether it is faster than deepcopy depends on what is being copied.
If something can be pickled, it can obviously be deepcopied, but the opposite is not the case: in order to pickle something, it needs to be fully serialized; this is not the case for deepcopying. In particular, you can implement __deepcopy__ very efficiently by copying a structure in memory (think of extension types), without being able to save everything to disk. (Think of suspend-to-RAM vs. suspend-to-disk.)
A well-known extension type that fulfills the conditions above may be ndarray, and indeed, it serves as a good counterexample to your observation: with d = numpy.arange(100000000), your code gives different runtimes.
If __deepcopy__ is not implemented, copy and pickle share common infrastructure (cf. the copy_reg module, discussed in "Relationship between pickle and deepcopy").
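The point about __deepcopy__ can be illustrated with a small sketch. It uses only the standard library rather than numpy, and the class `LockedBuffer` is hypothetical: it holds a lock, which cannot be pickled, yet it can still be copied in memory by a hand-written __deepcopy__.

```python
import copy
import pickle
import threading


class LockedBuffer:
    """Hypothetical type: a byte buffer guarded by a lock."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def __deepcopy__(self, memo):
        # Copy the buffer directly in memory and give the copy a fresh
        # lock; nothing is serialized along the way.
        new = LockedBuffer.__new__(LockedBuffer)
        new.data = bytearray(self.data)
        new.lock = threading.Lock()
        return new


buf = LockedBuffer(16)
buf.data[0] = 7

dup = copy.deepcopy(buf)  # dispatches to LockedBuffer.__deepcopy__
dup.data[0] = 9           # does not affect the original buffer

# Pickling fails because the instance state contains a lock.
try:
    pickle.dumps(buf)
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False
```

Without the custom __deepcopy__, deepcopy would fall back to the reduce machinery it shares with pickle and fail on the lock as well, which matches the remark about common infrastructure.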
Even faster would be to avoid the copy in the first place. You mention that you are doing rendering. Why does it need to copy objects?
Short and somewhat late:
e.g. You might consider: