How do I save a dictionary with UTF-8 keys to a file using cPickle in Python?
I want to know how to save a dictionary that has UTF-8 strings as its keys to a file in Python using cPickle. The dictionary is very large, and I've heard that cPickle is much faster than pickle. I also suspect that having UTF-8 encoded keys may be a problem.
Any other fast solutions are welcome.
Here is what I do, followed by the error message:
import codecs
import cPickle
from collections import defaultdict

unique_ngrams_dict = defaultdict(lambda: 0)  # just to show how I defined my dict
dict_file = codecs.open('ngram_dict', 'w', 'utf-8')
cPickle.dump(unique_ngrams_dict, dict_file)
dict_file.close()
Error message:
Traceback (most recent call last):
  File "Generate_NGram.py", line 81, in <module>
    save_ngram_dict(unique_ngrams_dict)
  File "Generate_NGram.py", line 70, in save_ngram_dict
    cPickle.dump(unique_ngrams_dict,dict_file)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects
Thanks.
Pickle is a binary format, so you shouldn't open the file through codecs; open it in plain binary mode ('wb') instead.
That isn't why the dump fails, but it is quite inefficient.
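As a minimal sketch of the binary-mode approach described above: the sample dictionary, the temporary path, and the choice of `pickle.HIGHEST_PROTOCOL` are assumptions for illustration, not part of the original answer. The import fallback is there because `cPickle` exists only on Python 2; on Python 3 the plain `pickle` module already uses its C implementation when available.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: explicit C implementation
except ImportError:
    import pickle  # Python 3: C accelerator is picked up automatically

# Hypothetical stand-in for the real n-gram dictionary.
ngram_counts = {u"caf\u00e9 au": 3, u"na\u00efve bayes": 1}

# Hypothetical location; the question writes to 'ngram_dict' in the working directory.
path = os.path.join(tempfile.mkdtemp(), "ngram_dict")

# Binary mode for writing: no codec wraps the file object.
with open(path, "wb") as dict_file:
    pickle.dump(ngram_counts, dict_file, pickle.HIGHEST_PROTOCOL)

# Binary mode for reading as well.
with open(path, "rb") as dict_file:
    restored = pickle.load(dict_file)
```

The file is opened with `"wb"` and `"rb"`, so the pickle bytes go to disk untouched.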
The actual problem is that the object you are trying to save contains a function reference (the default factory, lambda: 0), and pickle cannot serialize it: functions are pickled by reference to an importable name, which a lambda does not have. You have three options:
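1. Use a plain dict and use its .get method with a default argument.
2. Set the defaultdict's default_factory to None before pickling and set it back to lambda: 0 after unpickling.
3. Define a class whose instances are callable and return 0, and use NgramDefault() as the default factory instead of lambda: 0.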
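The three options above could look roughly like the following sketch. The body of `NgramDefault` is an assumption (the answer names the class but its definition did not survive on this page), and the sample key and the use of `pickle.dumps` with `HIGHEST_PROTOCOL` are illustrative choices. For option 3 the class has to live at module level so that it can be found again when the data is unpickled.

```python
from collections import defaultdict

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


class NgramDefault(object):
    """Picklable stand-in for `lambda: 0` (assumed definition)."""

    def __call__(self):
        return 0


# Option 1: plain dict, with the default supplied at lookup time.
plain_counts = {}
plain_counts[u"new york"] = plain_counts.get(u"new york", 0) + 1

# Option 2: remove the lambda before pickling, restore it afterwards.
lambda_counts = defaultdict(lambda: 0)
lambda_counts[u"new york"] += 1
lambda_counts.default_factory = None
blob = pickle.dumps(lambda_counts, pickle.HIGHEST_PROTOCOL)
lambda_counts.default_factory = lambda: 0  # the in-memory dict needs it back too
restored_lambda = pickle.loads(blob)
restored_lambda.default_factory = lambda: 0

# Option 3: an instance of a callable class as the default factory.
class_counts = defaultdict(NgramDefault())
class_counts[u"new york"] += 1
restored_class = pickle.loads(pickle.dumps(class_counts, pickle.HIGHEST_PROTOCOL))
```

Option 3 keeps the defaultdict behaviour across the round trip without any manual bookkeeping, which is probably the least error-prone choice for a large dictionary.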
You should just do it and trust the pickle module to do the right thing. The best way to treat a pickle is as an opaque blob that, when you unpickle it, re-creates the exact data structure you started with.
Don't try to apply any sort of encoding to the output of pickle; treat it as a binary blob. If you have unicode elements when you pickle, they will still be unicode after you unpickle.
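A short round-trip check illustrates this point. The sample keys are hypothetical, and the `text_type` helper is only there so the same sketch runs on both Python 2 (`unicode`) and Python 3 (`str`).

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Hypothetical sample: text (unicode) keys mapped to counts.
original = {u"\u4e2d\u6587 \u5206\u8bcd": 2, u"d\u00e9j\u00e0 vu": 5}

# dumps() returns an opaque byte string; it is not UTF-8 text to be decoded.
blob = pickle.dumps(original, pickle.HIGHEST_PROTOCOL)

restored = pickle.loads(blob)

# The key type survives the round trip unchanged.
text_type = type(u"")
keys_are_text = all(isinstance(key, text_type) for key in restored)
```

No encoding or decoding step appears anywhere: the keys go in as unicode strings and come out as unicode strings.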