Python pickle after changing module directory

Posted on 2024-08-19 02:36:44

I've recently changed my program's directory layout: before, I had all my modules inside the "main" folder. Now, I've moved them into a directory named after the program, and placed an __init__.py there to make a package.

Now I have a single .py file in my main directory that is used to launch my program, which is much neater.

Anyway, loading pickled files from previous versions of my program now fails. I'm getting "ImportError: No module named tools", which I guess is because my module used to be in the main folder and is now whyteboard.tools rather than plain tools. However, the code that imports the tools module lives in the same directory as it, so I doubt there's any need to specify a package.

So, my program directory looks something like this:

whyteboard-0.39.4
-->whyteboard.py
-->README.txt
-->CHANGELOG.txt
---->whyteboard/
---->whyteboard/__init__.py
---->whyteboard/gui.py
---->whyteboard/tools.py

whyteboard.py runs a block of code from whyteboard/gui.py that fires up the GUI. This pickling problem definitely wasn't happening before I reorganized the directories.

Comments (8)

不必在意 2024-08-26 02:36:45

As pickle's docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:

"pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored."

whyteboard.tools is not "the same module as" tools (even though, in Python 2, other modules in the same package can import it with a plain import tools, it ends up in sys.modules as sys.modules['whyteboard.tools']: this is absolutely crucial, otherwise the same module imported by one module in the same package vs. one in another package would end up with multiple and possibly conflicting entries!).

If your pickle files are in a good/advanced format (as opposed to the old ascii format that's the default only for compatibility reasons), migrating them once you perform such changes may in fact not be quite as trivial as "editing the file" (which is binary &c...!), despite what another answer suggests. I suggest that, instead, you make a little "pickle-migrating script": let it patch sys.modules like this...:

import sys
from whyteboard import tools

sys.modules['tools'] = tools

and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to file: that temporary extra entry in sys.modules should let the pickles load successfully, then dumping them again should be using the right module-name for the instances' classes (removing that extra entry should make sure of that).
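The load, alias, re-dump round trip can be sketched in Python 3, where pickle has replaced cPickle. This is a minimal demo under stated assumptions, not the answer's own code. There is no real whyteboard package here, so the hypothetical helper make_module builds in-memory stand-ins for tools and whyteboard.tools, each holding a hypothetical Pen class. The demo checks the claim that re-dumping records the new module name, so the result loads without the temporary alias.

```python
import pickle
import sys
import types


def make_module(name, class_name="Pen"):
    """Create and register a hypothetical in-memory module holding one class."""
    module = types.ModuleType(name)
    cls = type(class_name, (), {"__module__": name})
    setattr(module, class_name, cls)
    sys.modules[name] = module
    return module


# 1. Simulate an "old" pickle written when the class lived in top-level `tools`.
old_tools = make_module("tools")
old_bytes = pickle.dumps(old_tools.Pen())
del sys.modules["tools"]  # after the reorganisation, `tools` no longer exists

# 2. The new layout: a `whyteboard` package containing `whyteboard.tools`.
whyteboard = types.ModuleType("whyteboard")
whyteboard.__path__ = []  # mark the stand-in as a package
sys.modules["whyteboard"] = whyteboard
new_tools = make_module("whyteboard.tools")
whyteboard.tools = new_tools

# 3. Migration: alias the old name, load, drop the alias, dump again.
sys.modules["tools"] = new_tools
obj = pickle.loads(old_bytes)
del sys.modules["tools"]
new_bytes = pickle.dumps(obj)

# The re-dumped pickle refers to whyteboard.tools and loads without the alias.
print(type(pickle.loads(new_bytes)) is new_tools.Pen)  # True
```

For real files, the same three steps would wrap pickle.load and pickle.dump in with open(...) blocks, one file at a time.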

奈何桥上唱咆哮 2024-08-26 02:36:45

This can be done with a custom unpickler that overrides find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
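The same override can be generalised to a table of renames. Below is a hedged sketch exercised with an in-memory round trip. RENAMES, Pen and make_module are hypothetical names chosen for the demo. In a real program whyteboard.tools would simply be importable, so the parent-package stand-in would not be needed.

```python
import io
import pickle
import sys
import types

# Hypothetical mapping from old module paths to their new locations.
RENAMES = {"tools": "whyteboard.tools"}


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect renamed modules, then defer to the normal lookup.
        return super().find_class(RENAMES.get(module, module), name)


def renamed_loads(pickled_bytes):
    return RenameUnpickler(io.BytesIO(pickled_bytes)).load()


def make_module(name):
    """Register a throwaway module containing a class named Pen (demo only)."""
    module = types.ModuleType(name)
    module.Pen = type("Pen", (), {"__module__": name})
    sys.modules[name] = module
    return module


# Simulate a pickle written before the reorganisation...
old_bytes = pickle.dumps(make_module("tools").Pen())
del sys.modules["tools"]

# ...then the new layout, with a stand-in for the parent package.
parent = types.ModuleType("whyteboard")
parent.__path__ = []
sys.modules["whyteboard"] = parent
new_tools = make_module("whyteboard.tools")

obj = renamed_loads(old_bytes)
print(type(obj) is new_tools.Pen)  # True
```

Module names that are not in the mapping fall through unchanged, so one unpickler can handle several moved modules at once.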

愿得七秒忆 2024-08-26 02:36:45

This happened to me; I solved it by adding the new location of the module to sys.path before loading the pickle:

import pickle
import sys

sys.path.append('path/to/whyteboard')
with open("pickled_file", "rb") as f:
    obj = pickle.load(f)
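One caveat applies to the sys.path trick. It makes the old pickles load, but whyteboard/tools.py then gets imported twice: once as the top-level module tools and once as whyteboard.tools. The unpickled objects are therefore instances of a different class object from the one the rest of the program uses. The sketch below illustrates this. It builds a throwaway copy of the layout in a temporary directory, and Pen is a hypothetical class. It uses insert(0, ...) rather than append so the demo cannot pick up an unrelated tools module.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Build a throwaway copy of the new layout: <root>/whyteboard/{__init__,tools}.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "whyteboard")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "tools.py"), "w") as f:
    f.write("class Pen:\n    pass\n")

sys.path.insert(0, root)     # lets `import whyteboard.tools` work
sys.path.insert(0, pkg_dir)  # the workaround: lets `import tools` work too
importlib.invalidate_caches()

tools = importlib.import_module("tools")
wb_tools = importlib.import_module("whyteboard.tools")

# Stand-in for a pickle written before the reorganisation.
old_bytes = pickle.dumps(tools.Pen())
obj = pickle.loads(old_bytes)

print(tools is wb_tools)              # False: two separate module objects
print(isinstance(obj, wb_tools.Pen))  # False: two separate Pen classes
print(type(obj).__module__)           # tools
```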
千鲤 2024-08-26 02:36:45

pickle serializes classes by reference, so if you change where the class lives, the object will not unpickle because the class will not be found. If you use dill instead of pickle, you can serialize classes either by reference or directly (serializing the class itself instead of its import path). You can simulate this pretty easily by changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class Foo(object):
...   def bar(self):
...     return 5
... 
>>> f = Foo()
>>> 
>>> _f = dill.dumps(f)
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x
... 
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4
柳若烟 2024-08-26 02:36:45

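This is normal pickle behavior: unpickled objects need their defining module to be importable.

You should be able to change the module path (i.e. from tools to whyteboard.tools) by editing the pickled files, as long as they were written with protocol 0, the old ASCII format, which produces plain-text files. Binary protocols (the default in Python 3) are not plain text and can easily be corrupted by a text editor.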
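Below is a hedged sketch of that text edit for a protocol-0 pickle, which stores each class reference as the ASCII line pair c<module>\n<name>\n (the GLOBAL opcode). The in-memory stand-in modules and the Pen class are assumptions for the demo. The sketch only covers protocol 0. With protocol 4 and later (the default since Python 3.8), module names are stored as length-prefixed strings, so a replacement that changes their length would corrupt the file.

```python
import pickle
import sys
import types


def make_module(name):
    """Register a throwaway module containing a class named Pen (demo only)."""
    module = types.ModuleType(name)
    module.Pen = type("Pen", (), {"__module__": name})
    sys.modules[name] = module
    return module


# A protocol-0 pickle written while the class lived in top-level `tools`.
old_bytes = pickle.dumps(make_module("tools").Pen(), protocol=0)
del sys.modules["tools"]

# The new layout, with a stand-in for the parent package.
parent = types.ModuleType("whyteboard")
parent.__path__ = []
sys.modules["whyteboard"] = parent
new_tools = make_module("whyteboard.tools")

# The "text edit": rewrite the module part of the GLOBAL class reference.
edited = old_bytes.replace(b"ctools\nPen\n", b"cwhyteboard.tools\nPen\n")
obj = pickle.loads(edited)
print(type(obj) is new_tools.Pen)  # True
```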

别在捏我脸啦 2024-08-26 02:36:45

For people like me needing to update lots of pickle dumps, here's a function implementing @Alex Martelli's excellent advice:

import pickle
import sys
from types import ModuleType

# import torch

def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
    # Temporarily alias the old dotted path to the new module so the
    # class references stored in the pickle can be resolved.
    sys.modules[old_module_path] = new_module
    try:
        with open(pickle_path, "rb") as f:
            dic = pickle.load(f)
        # dic = torch.load(pickle_path, map_location="cpu")
    finally:
        # Always drop the alias, even if loading fails.
        del sys.modules[old_module_path]

    # Re-dumping records the classes under their current (new) module path.
    with open(pickle_path, "wb") as f:
        pickle.dump(dic, f)
    # torch.save(dic, pickle_path)

In my case, the dumps were PyTorch model checkpoints. Hence the commented-out torch.load/save().

Example

from new.location import new_module

for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )
对岸观火 2024-08-26 02:36:45

When you load a pickle file that contains a class reference, the class must be found at the same location it had when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where that class (or other object) lives; doing the following can save the day:

import sys
sys.path.append('path/to/folder containing the python module')
生寂 2024-08-26 02:36:45

I know this has been a while, but this fixed it for me:

Essentially, use the full import path (e.g. concurrent.run_concurrent) instead of just the module name (e.g. run_concurrent).


Shared Code:

import importlib.util
import sys

module_path = "concurrent.run_concurrent"

...

module = importlib.util.module_from_spec(spec)

Original (bad):

module_name = module_path.split(".")[-1]

spec = importlib.util.spec_from_file_location(module_name, filepath)

...

sys.modules[module_name] = module

Replace with the following (remove all references to module_name):

# Remove "module_name"

# Use "module_path" instead of "module_name"
spec = importlib.util.spec_from_file_location(module_path, filepath)

...

# Use "module_path" instead of "module_name"
sys.modules[module_path] = module
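Here is a self-contained sketch of the same idea. It assumes a hypothetical file run_concurrent.py written to a temporary directory and a hypothetical dotted path myjobs.run_concurrent, chosen to avoid the standard library's concurrent package. The point is that classes defined in a module loaded this way record the full dotted path as their __module__, and that path is what pickle stores. An empty stand-in for the parent package is registered defensively, since importing the dotted name may also require the parent to be importable. The demo runs in a single process, so it only checks the recorded name; loading in a fresh process would additionally require myjobs.run_concurrent to be genuinely importable there.

```python
import importlib.util
import os
import pickle
import sys
import tempfile
import types

# Hypothetical module file standing in for the answer's `filepath`.
src_dir = tempfile.mkdtemp()
filepath = os.path.join(src_dir, "run_concurrent.py")
with open(filepath, "w") as f:
    f.write("class Job:\n    pass\n")

module_path = "myjobs.run_concurrent"  # hypothetical full dotted path

# Defensive stand-in for the parent package.
parent = types.ModuleType("myjobs")
parent.__path__ = []  # mark the stand-in as a package
sys.modules["myjobs"] = parent

spec = importlib.util.spec_from_file_location(module_path, filepath)
module = importlib.util.module_from_spec(spec)
sys.modules[module_path] = module  # register under the full dotted name
spec.loader.exec_module(module)

data = pickle.dumps(module.Job())
print(module.Job.__module__)                   # myjobs.run_concurrent
print(type(pickle.loads(data)) is module.Job)  # True
```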