Design of a pickleable Python object that describes a file

Posted 2024-09-30 16:20:03

I would like to create a class that describes a file resource and then pickle it. This part is straightforward. To be concrete, let's say that I have a class "A" that has methods to operate on a file. I can pickle this object if it does not contain a file handle. I want to be able to create a file handle in order to access the resource described by "A". If I have an "open()" method in class "A" that opens and stores the file handle for later use, then "A" is no longer pickleable. (I should add that opening the file involves some non-trivial indexing, done by third-party code, that cannot be cached, so closing and reopening when needed is not without expense.) I could code class "A" as a factory that generates file handles to the described file, but that could result in multiple file handles accessing the file contents simultaneously. I could use another class "B" to handle opening the file described by class "A", including locking, etc. I am probably overthinking this, but any hints would be appreciated.
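The "class B" idea can be sketched roughly as follows. This is a minimal sketch assuming a single process with threads: `FileRegistry`, `borrow`, `read_at`, `_open_binary` and the module-level `REGISTRY` are hypothetical names, and `open(path, 'rb')` stands in for the expensive third-party open. "A" keeps only the path, so it stays picklable, while the registry keeps one shared handle per path and serializes access to it with a lock.

```python
import contextlib
import threading


def _open_binary(path):
    # Hypothetical stand-in for the expensive third-party open/indexing step.
    return open(path, 'rb')


class FileRegistry:
    """Hypothetical "class B": keeps at most one open handle per path."""

    def __init__(self, opener=_open_binary):
        self._opener = opener
        self._entries = {}              # path -> (handle, per-handle lock)
        self._lock = threading.Lock()   # guards self._entries

    @contextlib.contextmanager
    def borrow(self, path):
        # Open the file on first use, then hand out the same handle every time.
        with self._lock:
            entry = self._entries.get(path)
            if entry is None or entry[0].closed:
                entry = (self._opener(path), threading.Lock())
                self._entries[path] = entry
        handle, handle_lock = entry
        with handle_lock:               # one user of the handle at a time (not reentrant)
            yield handle

    def close_all(self):
        with self._lock:
            for handle, _ in self._entries.values():
                handle.close()
            self._entries.clear()


REGISTRY = FileRegistry()               # one registry per process


class A:
    """Describes a file resource; holds only picklable data (the path)."""

    def __init__(self, path):
        self.path = path

    def read_at(self, offset, size):
        # Borrow the shared handle; seek + read happen under its lock.
        with REGISTRY.borrow(self.path) as f:
            f.seek(offset)
            return f.read(size)
```

Because "A" never stores the handle itself, it pickles as usual. An unpickled copy in the same process reuses the handle already held by `REGISTRY`; in another process the registry starts empty, so the first access pays the opening cost again.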


攒一口袋星星 2024-10-07 16:20:03

The question isn't too clear; what it looks like is that:

  • you have a third-party module which has picklable classes
  • those classes may contain references to files, which makes the classes themselves not picklable because open files aren't picklable.

Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:

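import pickle


class PicklableFile(object):
    """Wraps an open file so it is pickled as (name, mode, position)."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Only called for attributes not found normally: delegate to the file.
        # Guard against infinite recursion while 'fileobj' is not set yet.
        if key == 'fileobj':
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        state = self.__dict__.copy()
        # Replace the unpicklable file object with what is needed to reopen it.
        state['_file_name'] = self.fileobj.name
        state['_file_mode'] = self.fileobj.mode
        state['_file_pos'] = self.fileobj.tell()
        del state['fileobj']
        return state

    def __setstate__(self, state):
        state = dict(state)  # don't mutate the dict we were given
        self.fileobj = open(state.pop('_file_name'), state.pop('_file_mode'))
        self.fileobj.seek(state.pop('_file_pos'))
        self.__dict__.update(state)


f = PicklableFile(open("/tmp/blah"))
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)
print(f2.read())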

Caveats and notes, some obvious, some less so:

  • This class should operate directly on the file object you got from open. If you're using wrapper classes on files, like gzip.GzipFile, those should go above this, not below it. Logically, treat this as a decorator class on top of file.
  • If the file doesn't exist when you unpickle, it can't be unpickled and will throw an exception.
  • If the path now refers to a different file, the behavior may or may not make sense.
  • If the file mode includes file creation ('w+') and the file doesn't exist, it will be created, but we don't know what file permissions to use, since they aren't stored with the file. If this is important (it probably shouldn't be), store the correct permissions in the class when you first create it.
  • If the file isn't seekable, trying to seek to the old position may raise IOError (an OSError in Python 3); if you're using a file like that, you'll need to decide how to handle it.
  • The file classes in Python 2 and Python 3 are different; there's no file class in Python 3. Even if you're only using Python 2 right now, don't subclass file.
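One more trap when reopening with the recorded mode: 'w' and 'w+' truncate the file, and 'x' fails because the file now exists. Below is a minimal sketch of how `__setstate__` could reopen more safely, assuming that reopening write modes as read/write is acceptable and that a non-seekable file may simply stay wherever `open()` leaves it. `reopen_mode` and `reopen` are hypothetical helpers, not part of the sample above.

```python
def reopen_mode(mode):
    """Map the mode recorded at pickling time to one that is safe to reopen with.

    'w' would truncate and 'x' would fail on an existing file, so both are
    reopened for read/write instead: 'w' -> 'r+', 'wb' -> 'rb+', 'w+' -> 'r+'.
    """
    if 'w' in mode or 'x' in mode:
        mode = mode.replace('w', 'r').replace('x', 'r')
        if '+' not in mode:
            mode += '+'
    return mode


def reopen(name, mode, pos):
    """Reopen a file described by (name, mode, pos), e.g. from __setstate__."""
    f = open(name, reopen_mode(mode))  # still raises if the file is gone
    if f.seekable():
        f.seek(pos)
    # Non-seekable files keep whatever position open() gave them (a policy choice).
    return f
```

Inside `PicklableFile.__setstate__`, the `open(...)` and `seek(...)` pair would become a single `reopen(...)` call with the same three saved values.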

I'd steer away from doing this; pickled data that depends on external files staying unchanged and in the same place is brittle. It even makes relocating the files difficult, since the pickled data will no longer make sense.

你怎么这么可爱啊 2024-10-07 16:20:03

If you open a pointer to a file, pickle it, then attempt to reconstitute it later, there is no guarantee that the file will still be available for opening.

To elaborate, the file pointer really represents a connection to the file. Just like a database connection, you can't "pickle" the other end of the connection, so this won't work.

Is it possible to keep the file pointer around in memory in its own process instead?

娇俏 2024-10-07 16:20:03

It sounds like you know you can't pickle the handle and you're OK with that; you just want to pickle the part that can be pickled. As your object stands now, it can't be pickled because it holds the handle. Do I have that right? If so, read on.

The pickle module lets your class describe its own state to pickle, for exactly these cases. Define your own __getstate__ method: the pickler calls it to get the state to be pickled, and only if the method is missing does it fall back to the default of pickling all the instance attributes.
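Applied to the question's class "A", a minimal sketch might look like this. It assumes the handle lives in a `_handle` attribute and that `open(self.path, 'rb')` stands in for the expensive third-party open (both assumptions). The handle is dropped from the pickled state, and an unpickled copy reopens it lazily on its first `open()` call.

```python
import pickle


class A:
    """Describes a file resource; the open handle is left out of pickles."""

    def __init__(self, path):
        self.path = path
        self._handle = None            # created lazily by open()

    def open(self):
        # Builtin open() here; stand-in for the costly third-party open/indexing.
        if self._handle is None:
            self._handle = open(self.path, 'rb')
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_handle'] = None        # drop the unpicklable handle
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes; the handle is reopened on demand.
        self.__dict__.update(state)
```

With this design the cost of reopening is paid once per unpickled copy, and only if that copy actually needs the file.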
