Pickle a class instance plus its definition?

Posted on 2024-11-24 18:31:35

I suspect this is a common problem, but I haven't found a solution for it. What I want is quite simple, and seemingly technically feasible: I have a simple Python class, and I want to store it on disk, instance and definition, in a single file. Pickle will store the data, but it doesn't store the class definition. One might argue that the class definition is already stored in my .py file, but I don't want a separate .py file; my goal is to have a self-contained single file that I can pop back into my namespace with a single line of code.

So yes, I know this is possible using two files and two lines of code, but I want it in one file and one line of code. The reason is that I often find myself in this situation: I'm working on some big dataset, manipulating it in Python, and then have to write my sliced, diced and transformed data back into some preexisting directory structure. What I don't want is to litter these data directories with ill-named Python class stubs to keep my code and data associated, and what I want even less is the hassle of separately keeping track of and organizing all these little ad hoc classes defined on the fly in a script.

So the convenience isn't so much in code readability, but in effortless and unfudgable association between code and data. That seems like a worthy goal to me, even though I understand it isn't appropriate in most situations.

So the question is: is there a package or code snippet that does such a thing? I can't seem to find one.

Comments (2)

新一帅帅 2024-12-01 18:31:35

If you use dill, you can treat __main__ as if it were a Python module (for the most part). Hence, you can serialize interactively defined classes and the like. By default, dill can also transport the class definition as part of the pickle.

>>> class MyTest(object):
...   def foo(self, x):
...     return self.x * x
...   x = 4
... 
>>> f = MyTest() 
>>> import dill
>>>
>>> with open('test.pkl', 'wb') as s:
...   dill.dump(f, s)
... 
>>> 

Then shut down the interpreter, and send the file test.pkl over TCP. On your remote machine, now you can get the class instance.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('test.pkl', 'rb') as s:
...   f = dill.load(s)
... 
>>> f
<__main__.MyTest object at 0x1069348d0>
>>> f.x
4
>>> f.foo(2)
8
>>>             
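
For comparison, the standard library's pickle records a class only by reference (its module name and class name), which is why a plain pickle file cannot be loaded without the defining .py file. The following is a minimal Python 3 sketch of that behavior; it uses fractions.Fraction purely as a stand-in for "some class defined in an importable module", which is an assumption made for illustration and not part of the original answer.

```python
import pickle
import pickletools
from fractions import Fraction

# Pickle an instance of a class that lives in an importable module.
payload = pickle.dumps(Fraction(3, 4))

# The payload names the module and the class...
has_module_name = b"fractions" in payload
has_class_name = b"Fraction" in payload

# ...but contains none of the class body, e.g. no method names.
has_method_name = b"limit_denominator" in payload

# Disassemble the pickle to see the reference opcodes for yourself.
pickletools.dis(payload)
```

On load, pickle imports the named module and looks the class up there, so the definition has to be available on the receiving side. dill avoids this for classes defined in __main__ by pickling the class itself.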

Loading the instance is not exactly what you wanted, though, because you also asked for the class definition. The following gives you that.

>>> class MyTest2(object):
...   def bar(self, x):
...     return x*x + self.x
...   x = 1
... 
>>> import dill
>>> with open('test2.pkl', 'wb') as s:
...   dill.dump(MyTest2, s)
... 
>>>

Then after sending the file… you can get the class definition.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('test2.pkl', 'rb') as s:
...   MyTest2 = dill.load(s)
... 
>>> print(dill.source.getsource(MyTest2))
class MyTest2(object):
  def bar(self, x):
    return x*x + self.x
  x = 1

>>> f = MyTest2()
>>> f.x
1
>>> f.bar(4)
17

Since you were looking for a one-liner, I can do better. I haven't yet shown that you can send the class and the instance at the same time, and maybe that's what you wanted.

>>> import dill
>>> class Foo(object): 
...   def bar(self, x):
...     return x+self.x
...   x = 1
... 
>>> b = Foo()
>>> b.x = 5
>>> 
>>> with open('blah.pkl', 'wb') as s:
...   dill.dump((Foo, b), s)
... 
>>> 

It's still not a single line, but it works.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('blah.pkl', 'rb') as s:
...   Foo, b = dill.load(s)
... 
>>> b.x  
5
>>> Foo.bar(b, 2)
7
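
If the goal is a literal one-line save and one-line load, the two with-blocks above can be wrapped in small helpers. This is only a sketch for Python 3 with dill installed; save_bundle and load_bundle are hypothetical names introduced here, not part of dill, and a temporary directory is used so the example does not touch real data.

```python
import os
import tempfile

import dill  # third-party package: pip install dill


def save_bundle(path, *objects):
    """Write any number of objects (classes, instances, ...) to one file."""
    with open(path, "wb") as stream:
        dill.dump(objects, stream)


def load_bundle(path):
    """Return the tuple of objects previously written by save_bundle."""
    with open(path, "rb") as stream:
        return dill.load(stream)


class Foo(object):
    x = 1

    def bar(self, x):
        return x + self.x


b = Foo()
b.x = 5

path = os.path.join(tempfile.mkdtemp(), "blah.pkl")

save_bundle(path, Foo, b)              # the one line that saves
LoadedFoo, loaded_b = load_bundle(path)  # the one line that loads

result = LoadedFoo.bar(loaded_b, 2)
```

As with pickle, loading a dill file can execute arbitrary code, so only load files you trust.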

Within dill, there's also dill.source, which has methods that can detect the dependencies of functions and classes and take them along with the pickle (for the most part).

>>> def foo(x):
...   return x*x
... 
>>> class Bar(object):
...   def zap(self, x):
...     return foo(x) * self.x
...   x = 3
... 
>>> print(dill.source.importable(Bar.zap, source=True))
def foo(x):
  return x*x
def zap(self, x):
  return foo(x) * self.x

So that's not "perfect" (or maybe not what's expected), but it does serialize the code for a dynamically built method and its dependencies. You just don't get the rest of the class, which is not needed in this case. Still, it doesn't seem like what you wanted.

If you want to get everything, you can just pickle the entire session, and do it in one line (two counting the import).

>>> import dill
>>> def foo(x):
...   return x*x
... 
>>> class Blah(object):
...   def bar(self, x):
...     self.x = (lambda x:foo(x)+self.x)(x)
...   x = 2
... 
>>> b = Blah()
>>> b.x
2
>>> b.bar(3)
>>> b.x
11
>>> # the one line
>>> dill.dump_session('foo.pkl')
>>> 

Then on the remote machine...

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> # the one line
>>> dill.load_session('foo.pkl')
>>> b.x
11
>>> b.bar(2)
>>> b.x
15
>>> foo(3)
9

Lastly, if you want the transport to be done for you transparently (instead of using a file), you could use pathos.pp or ppft, which provide the ability to ship objects to a second Python server (on a remote machine) or to another Python process. They use dill under the hood, and just pass the code across the wire.

>>> class More(object):
...   def squared(self, x):
...     return x*x
... 
>>> import pathos
>>> 
>>> p = pathos.pp.ParallelPythonPool(servers=('localhost,1234',))
>>> 
>>> m = More()
>>> p.map(m.squared, range(5))
[0, 1, 4, 9, 16]

The servers argument is optional, and here it just connects to the local machine on port 1234. If you use the remote machine's name and port instead (or as well), the work is sent to the remote machine "effortlessly".

Get dill, pathos, and ppft here: https://github.com/uqfoundation

墟烟 2024-12-01 18:31:35

Pickle can't pickle Python code objects, so I don't think this is possible with pickle alone.

>>> from pickle import *
>>> def A(object):
...     def __init__(self):
...             self.potato = "Hello"
...             print "Starting"
...
>>> A.__code__
<code object A at 0xb76bc0b0, file "<stdin>", line 1>
>>> dumps(A.__code__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/pickle.py", line 1366, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.6/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.6/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle code objects
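
The same limitation can be reproduced on Python 3, where the standard library's marshal module can serialize a code object even though pickle cannot. The sketch below is an illustration only and uses a hypothetical greet function; note that the marshal format is specific to the Python version, so this is not a portable storage format.

```python
import marshal
import pickle
import types


def greet(name):
    return "Hello, " + name


# pickle refuses to serialize a raw code object.
try:
    pickle.dumps(greet.__code__)
    pickle_error = None
except TypeError as exc:
    pickle_error = exc

# marshal accepts the code object and returns bytes.
payload = marshal.dumps(greet.__code__)

# Rebuild a callable function from the restored code object.
restored_code = marshal.loads(payload)
restored = types.FunctionType(restored_code, globals(), "greet")

message = restored("world")
```

This only covers a single function's bytecode. It does not capture a whole class or the globals a function depends on, which is the part dill handles.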