Common use-cases for pickle in Python
I've looked at the pickle documentation, but I don't understand where pickle is useful.
What are some common use-cases for pickle?
Some uses that I have come across:
1) saving a program's state data to disk so that it can carry on where it left off when restarted (persistence)
2) sending Python data over a TCP connection in a multi-core or distributed system (marshalling)
3) storing Python objects in a database
4) converting an arbitrary Python object to a string so that it can be used as a dictionary key (e.g. for caching and memoization).
There are some issues with the last one: two identical objects can be pickled and result in different strings, and even the same object pickled twice can have different representations. This is because the pickle can include reference count information.
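As a rough sketch of use 4, the pickled arguments can serve as the cache key for a memoizing decorator. The names `memoize_by_pickle`, `total` and `calls` are made up for this illustration. If two equal arguments ever pickle to different bytes, the only consequence here is a cache miss and a recomputation.

```python
import functools
import pickle


def memoize_by_pickle(func):
    """Cache results of func, keyed by the pickled arguments (illustrative helper)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bytes are hashable, so the pickled arguments can be a dict key
        # even when the arguments themselves (e.g. lists) are not hashable.
        key = pickle.dumps((args, sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real (non-cached) invocation


@memoize_by_pickle
def total(numbers):
    calls.append(numbers)
    return sum(numbers)


first = total([1, 2, 3])   # computed
second = total([1, 2, 3])  # looked up via the pickled key
```

A list cannot be a dictionary key directly, which is why the arguments are converted to bytes first.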
To emphasise @lunaryorn's comment - you should never unpickle a string from an untrusted source, since a carefully crafted pickle could execute arbitrary code on your system. For example see https://blog.nelhage.com/2011/03/exploiting-pickle/
Minimal roundtrip example..
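A minimal roundtrip could look like the following sketch. The contents of `original` are invented for illustration.

```python
import pickle

# Any picklable object works here; this dictionary is only sample data.
original = {"name": "example", "values": [1, 2.5, None], "nested": ("a", b"b")}

payload = pickle.dumps(original)   # object -> bytes
restored = pickle.loads(payload)   # bytes -> a new, equal object
```

`restored` compares equal to `original` but is a separate object.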
Edit: As for real-world examples of pickling, perhaps the most advanced use of pickling (you'd have to dig quite deep into the source) is ZODB:
http://svn.zope.org/
Otherwise, PyPI mentions several:
http://pypi.python.org/pypi?:action=search&term=pickle&submit=search
I have personally seen several examples of pickled objects being sent over the network as an easy-to-use network transfer protocol.
Pickle is like "Save As.." and "Open.." for your data structures and classes. Let's say I want to save my data structures so that they persist between program runs.
Saving:
Loading:
Now I don't have to build myStuff from scratch all over again, and I can just pick(le) up from where I left off.
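A minimal sketch of the save and load steps, assuming a hypothetical file name `my_stuff.pkl` in a temporary directory and made-up contents for `myStuff`:

```python
import os
import pickle
import tempfile

# Sample data standing in for whatever the program has built up.
myStuff = {"scores": [90, 85], "settings": {"verbose": True}}

# Hypothetical location; a real program would pick its own path.
path = os.path.join(tempfile.mkdtemp(), "my_stuff.pkl")

# Saving: pickle writes bytes, so the file is opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(myStuff, f)

# Loading: read the bytes back and rebuild the object.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Only load pickle files that your own program wrote, or that come from a source you trust.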
I have used it in one of my projects. If the app was terminated while it was working (it performed a lengthy task and processed lots of data), I needed to save the whole data structure and reload it when the app was run again. I used cPickle for this, as speed was crucial and the amount of data was really large.
Pickling is absolutely necessary for distributed and parallel computing.
Say you wanted to do a parallel map-reduce with multiprocessing (or across cluster nodes with pyina), then you need to make sure the function you want to have mapped across the parallel resources will pickle. If it doesn't pickle, you can't send it to the other resources on another process, computer, etc. Also see here for a good example. To do this, I use dill, which can serialize almost anything in Python. Dill also has some good tools for helping you understand what is causing your pickling to fail when your code fails.
And, yes, people use pickling to save the state of a calculation, or your ipython session, or whatever.
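To illustrate what "will pickle" means for a function, here is a small sketch using only the standard pickle module. The helper `can_pickle` is hypothetical. Standard pickle stores a function as a reference to its module and name, so a function that can be looked up by name pickles, while a lambda does not.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj (illustrative helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# math.sqrt can be found again by name in the math module, so it pickles.
importable = can_pickle(math.sqrt)

# A lambda has no importable name, so the standard pickler rejects it.
anonymous = can_pickle(lambda x: x * x)
```

A check like this, run before handing a function to a process pool, shows early whether the function can be shipped to the workers.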
For a beginner (as in my case), it's really hard to understand why you would use pickle in the first place when reading the official documentation. Maybe that's because the docs assume you already know the whole purpose of serialization. Only after reading a general description of serialization did I understand the reason for this module and its common use cases. Broad explanations of serialization that are not tied to a particular programming language may also help:
https://stackoverflow.com/a/14482962/4383472, What is serialization?,
https://stackoverflow.com/a/3984483/4383472
To add a real-world example: The Sphinx documentation tool for Python uses pickle to cache parsed documents and cross-references between documents, to speed up subsequent builds of the documentation.
I can tell you the uses I use it for and have seen it used for:
Those are the ones I use it for at least
I used pickling while scraping a website. At the time I wanted to store more than 8000k URLs and process them as fast as possible, so I used pickling because its output quality is very high.
You can easily get back to the URL where you stopped, and even the job directory keywords and URL details can be fetched very quickly to resume the process.