DjangoUnicodeDecodeError when storing pickle'd data

Posted on 2024-08-27 05:56:11

I've got a simple dict object that I'm trying to store in the database after running it through pickle. Django seems to choke while trying to encode the pickled data. I've checked MySQL, and the query never reaches it before the error is thrown, so I don't believe the database is the problem. The dict I'm storing looks like this:

{
    'ordered': [
        {   'value': u'First\xd1ame Last\xd1ame',
            'label': u'Full Name' },
        {   'value': u'123-456-7890',
            'label': u'Phone Number' },
        {   'value': u'[email protected]',
            'label': u'Email Address' } ],
    'cleaned_data': {
        u'Phone Number': u'123-456-7890',
        u'Full Name': u'First\xd1ame Last\xd1ame',
        u'Email Address': u'[email protected]' },
    'post_data': <QueryDict: {
        u'Phone Number': [u'1234567890'],
        u'Full Name_1': [u'Last\xd1ame'],
        u'Full Name_0': [u'First\xd1ame'],
        u'Email Address': [u'[email protected]'] }>,
    'user': <User: itis>
}

The error that gets thrown is:

'utf8' codec can't decode bytes in position 52-53: invalid data.

Position 52-53 is the first instance of \xd1 (Ñ) in the pickled data.

So far, I've dug around StackOverflow and found a few questions where the database encoding for the objects was wrong. This doesn't help me because there is no MySQL query yet. This is happening before the database. Google also didn't help much when searching for unicode errors on pickled data.

It is probably worth mentioning that if I don't use the Ñ, this code works fine.

Comments (3)

感悟人生的甜 2024-09-03 05:56:11

With much thanks to @prometheus, I found a solution for this. Basically you can use base64 to encode the output of pickle.dumps() before plugging it into the database. You would then turn around and use base64 to decode the output of the database before passing it to pickle.loads().

My code now looks like this:

import base64
import pickle

# Put the information into the database:
self.raw_data = base64.b64encode(pickle.dumps(data))

# Get the information out of the database:
return pickle.loads(base64.b64decode(self.raw_data))

Again, thank you @prometheus.
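
For readers who want to try the round trip outside a Django model, here is a minimal standalone sketch of the same pickle + base64 approach. It assumes Python 3 (the original question used Python 2), and the helper names encode_for_text_column and decode_from_text_column are hypothetical, not part of Django or the original code.

```python
import base64
import pickle


def encode_for_text_column(obj):
    """Pickle obj and wrap the bytes in base64 so the result is pure ASCII text."""
    raw = pickle.dumps(obj)
    # b64encode returns bytes; decode to str so it fits a text column.
    return base64.b64encode(raw).decode("ascii")


def decode_from_text_column(text):
    """Reverse encode_for_text_column. Only use on data you trust."""
    return pickle.loads(base64.b64decode(text))


# Sample data modeled on the dict in the question (trimmed down).
data = {"ordered": [{"label": "Full Name", "value": "First\xd1ame Last\xd1ame"}]}

stored = encode_for_text_column(data)
restored = decode_from_text_column(stored)
```

Because the stored value contains only base64 characters, an implicit text conversion of it should not hit a decode error, at the cost of the stored value no longer being human-readable.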

东北女汉子 2024-09-03 05:56:11

That's a known problem, and there was a discussion about this on the Python bug-tracker:

I ran into this problem today when writing python data structures into a
database. Only ASCII is safe in this situation. I understood the
Python docs that protocol 0 was ASCII-only.

I use pickle+base64 now, however, this makes debugging more difficult.

Anyway, I think that the docs should clearly say that protocol 0 is not
ASCII-only because this is important in the Python world. For example,
I saw this issue because Django makes an implicit unicode() conversion
with my input which fails with non-ASCII.
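
The point that protocol 0 is not ASCII-only can be checked directly. The sketch below assumes Python 3, where protocol 0 still appears to write text with the raw-unicode-escape codec, so a character such as Ñ (U+00D1) ends up as the single byte 0xD1. The variable names are illustrative only.

```python
import pickle

# Pickle a string containing Ñ using the oldest, "human-readable" protocol.
pickled = pickle.dumps("First\xd1ame", protocol=0)

# The pickle contains the raw byte 0xD1, which is not ASCII.
contains_non_ascii = any(byte > 127 for byte in pickled)

# Treating the pickle as UTF-8 text fails, because 0xD1 starts a
# two-byte UTF-8 sequence and the next byte is not a continuation byte.
try:
    pickled.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

This mirrors what the question describes: the failure happens when something tries to interpret the pickled bytes as UTF-8 text, before any query is sent.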

蒲公英的约定 2024-09-03 05:56:11

I see no need for this. Normally, it should be possible to store arbitrary binary data in a database.

A worse problem is that unpickling is not safe: if the database can receive data from anywhere, it could end up holding malicious pickle data.
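
As a rough illustration of storing the pickle as binary data without base64, here is a sketch using the standard library's sqlite3 module with a BLOB column. This is an assumption-laden stand-in: the thread is about MySQL through Django, and the table name submissions and column name raw_data are made up for the example.

```python
import pickle
import sqlite3

data = {"Full Name": "First\xd1ame Last\xd1ame"}

# In-memory database used purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, raw_data BLOB)")

# Passing bytes as a parameter stores them as a BLOB, with no text decoding.
conn.execute("INSERT INTO submissions (raw_data) VALUES (?)", (pickle.dumps(data),))

(blob,) = conn.execute("SELECT raw_data FROM submissions").fetchone()

# Only unpickle values from a source you trust; pickle can run arbitrary code.
restored = pickle.loads(blob)
conn.close()
```

With a binary column the bytes are never interpreted as text, so the encoding problem does not arise. The safety concern about unpickling untrusted data applies either way.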
