How to save a dictionary with UTF-8 characters as its keys to a file using cPickle in Python?

Posted on 2024-10-21 06:03:04

I want to know how to save a dictionary that has UTF-8 characters in its keys to a file in Python with cPickle. The dictionary is very large, and I've heard that cPickle is much faster than pickle. I also suspect that having UTF-8-encoded keys might be a problem.
Any other fast solutions are also welcome.
Here is what I do, and below is the error message:

import codecs
import cPickle
from collections import defaultdict

unique_ngrams_dict = defaultdict(lambda: 0)  # just to show how I defined my dict

dict_file = codecs.open('ngram_dict', 'w', 'utf-8')
cPickle.dump(unique_ngrams_dict, dict_file)
dict_file.close()

Error message:

Traceback (most recent call last):
  File "Generate_NGram.py", line 81, in <module>
    save_ngram_dict(unique_ngrams_dict)
  File "Generate_NGram.py", line 70, in save_ngram_dict
    cPickle.dump(unique_ngrams_dict,dict_file)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects

Thanks

Comments (2)

何以畏孤独 2024-10-28 06:03:04
  1. Pickle is a binary format, so you shouldn't open the file with any codec. Just open it in binary mode:

    open('ngram_dict', 'wb')
    

    This is not why it fails. The codec layer is simply unnecessary and inefficient.

  2. The actual problem is that the object you are trying to save contains a function reference
    (the default_factory lambda: 0). Pickle cannot serialize a lambda, because it stores functions only as a reference to an importable, module-level name.

    You have three options:

    1. Use a regular dict and its .get method with a default argument, e.g. counts.get(key, 0).
    2. Set

      unique_ngrams_dict.default_factory = None
      

      before pickling and set it back to

      unique_ngrams_dict.default_factory = lambda: 0
      

      after unpickling.

    3. Define a class at module level (so that unpickling can find it by name), like:

      class NgramDefault:
          def __call__(self):
              return 0
      

      and use NgramDefault() as the default factory instead of lambda: 0.
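A minimal sketch of the three options above. It is written for Python 3, where there is no separate cPickle module and `pickle` uses its C implementation automatically. The same `dump`/`load` calls exist in Python 2's `cPickle`. The helpers `save`/`load`, the variable `counts` and the file name `ngram_dict.pkl` are hypothetical illustrations, not part of the original answer. The last case, `defaultdict(int)`, is an extra not mentioned in the answer. It works because `int()` returns 0 and the built-in `int` is picklable by name.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class NgramDefault:
    """Picklable stand-in for `lambda: 0`.

    Must be defined at module level: pickle stores the class by name,
    and unpickling looks it up again under that name.
    """

    def __call__(self):
        return 0


# save/load are hypothetical helpers, not part of any library.
def save(obj, path):
    # Pickle output is bytes: open in binary mode, no codecs wrapper.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "ngram_dict.pkl")  # hypothetical file name
counts = {"a b": 2, "b c": 1}  # made-up sample data

# The original failure: a lambda cannot be pickled, so neither can a
# defaultdict that uses it as default_factory.
lambda_failed = False
try:
    save(defaultdict(lambda: 0, counts), path)
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_failed = True

# Option 1: plain dict, with .get(key, 0) supplying the default.
save(dict(counts), path)
plain = load(path)
missing_plain = plain.get("c d", 0)

# Option 2: clear default_factory before pickling, restore it after loading.
dd = defaultdict(lambda: 0, counts)
dd.default_factory = None
save(dd, path)
dd.default_factory = lambda: 0  # restore on the in-memory copy too
restored2 = load(path)
restored2.default_factory = lambda: 0

# Option 3: a module-level callable class as the factory.
save(defaultdict(NgramDefault(), counts), path)
restored3 = load(path)

# Extra (not in the answer): the built-in int is picklable and int() == 0.
save(defaultdict(int, counts), path)
restored_int = load(path)

print(lambda_failed, missing_plain, restored2["c d"], restored3["c d"], restored_int["c d"])
# → True 0 0 0 0
```

Option 3 keeps the defaultdict behaviour across the round trip without any manual fix-up. The trade-off is that the loading program must be able to import `NgramDefault` under the same module path.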

小傻瓜 2024-10-28 06:03:04

You should just do it and trust the pickle module to do the right thing. Treat pickle output as an opaque blob that will magically re-create the exact data structure you started with when you unpickle it.

Don't try to apply any sort of encoding to the output of pickle; treat it as a binary blob. If your data contains unicode elements when you pickle it, they will still be unicode after you unpickle it.
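A small check of the claim above. It assumes Python 3, where `str` is Unicode text (the counterpart of Python 2's `unicode`). The sample keys are made up. The snippet round-trips a dict with non-ASCII keys through `pickle.dumps`/`pickle.loads` in memory, with no encoding step anywhere.

```python
import pickle

# Made-up n-gram counts whose keys contain non-ASCII characters.
ngrams = {"naïve café": 3, "日本 語": 1, "ngram": 7}

blob = pickle.dumps(ngrams, protocol=pickle.HIGHEST_PROTOCOL)
# The pickle is an opaque bytes object: store it as-is, never encode/decode it.
is_bytes = isinstance(blob, bytes)

restored = pickle.loads(blob)
keys_still_text = all(isinstance(k, str) for k in restored)

print(is_bytes, restored == ngrams, keys_still_text)
# → True True True
```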
