HeaderParseError in Python

Posted on 2024-11-18 05:49:54

I get a HeaderParseError when I try to parse this string with decode_header() in Python 2.6.5 (and 2.7). Here is the repr() of the string:

 '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='

This string comes from a MIME email that contains a JPEG picture. Thunderbird can
decode the filename (which contains German umlauts).

>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError

Answer from 始终不够, 2024-11-25 05:49:54

This looks like an incompatibility between the base64 alphabet that Python accepts and the one the mail agent used:

>>> from email.header import decode_header
>>> a = '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='
>>> decode_header(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError
>>> a1 = a.replace('_', '/')
>>> decode_header(a1)
[('Anmeldung Netzanschluss S\xfcdring3p.jpg', 'iso-8859-1')]
>>> print _[0][0].decode(_[0][1])
Anmeldung Netzanschluss Südring3p.jpg

Python uses the standard base64 alphabet described in the Wikipedia article (i.e. A-Z, a-z, 0-9, +, /). That same article lists some variant alphabets, including ones that use the underscore that is the issue here; however, the underscore's value is ambiguous: it stands for 62 or 63, depending on the variant.
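To make the ambiguity concrete, here is a small Python 3 sketch (the session above is Python 2) that decodes the payload from the question under both readings of the underscore. The variable names are made up for illustration. Reading `_` as 63 matches the URL-safe alphabet of RFC 4648, which is what `base64.urlsafe_b64decode` assumes, and it is the only reading here that gives a plausible German filename.

```python
import base64

# Base64 payload taken from the encoded-word in the question.
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# Reading 1: "_" stands for value 63, the slot "/" has in the standard alphabet.
as_63 = base64.b64decode(payload.replace("_", "/")).decode("iso-8859-1")

# Reading 2: "_" stands for value 62, the slot "+" has in the standard alphabet.
as_62 = base64.b64decode(payload.replace("_", "+")).decode("iso-8859-1")

# The URL-safe decoder maps "-" to 62 and "_" to 63, so it agrees with reading 1.
via_urlsafe = base64.urlsafe_b64decode(payload).decode("iso-8859-1")

print(as_63)        # Anmeldung Netzanschluss Südring3p.jpg
print(as_62)        # Anmeldung Netzanschluss Sìdring3p.jpg
print(via_urlsafe)  # Anmeldung Netzanschluss Südring3p.jpg
```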

I don't know what Python could do to guess the intentions of b0rken mail agents, so I suggest you do some appropriate guessing of your own whenever decode_header fails.
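One way to do that guessing, as a rough Python 3 sketch: call decode_header first and, only if it raises HeaderParseError, rewrite the base64 encoded-words to the standard alphabet and retry. The helper names `decode_header_lenient` and `_to_standard_alphabet` are hypothetical, and mapping `-` to `+` and `_` to `/` is an assumption (the URL-safe convention) that may be wrong for other broken agents.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Matches a base64 ("B") encoded-word: =?charset?B?payload?=
_B_WORD = re.compile(r"(=\?[^?]*\?[bB]\?)([^?]*)(\?=)")


def _to_standard_alphabet(match):
    """Rewrite one encoded-word payload into the standard base64 alphabet."""
    prefix, payload, suffix = match.groups()
    # Assumption: the sender used the URL-safe alphabet ("-" = 62, "_" = 63).
    return prefix + payload.replace("-", "+").replace("_", "/") + suffix


def decode_header_lenient(value):
    """Like decode_header, but retry once with repaired base64 payloads."""
    try:
        return decode_header(value)
    except HeaderParseError:
        repaired = _B_WORD.sub(_to_standard_alphabet, value)
        if repaired == value:
            raise  # nothing we know how to fix, so re-raise the original error
        return decode_header(repaired)


header = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
data, charset = decode_header_lenient(header)[0]
print(data.decode(charset))  # Anmeldung Netzanschluss Südring3p.jpg
```

Headers that already decode cleanly never reach the repair step, so well-formed mail is not affected by the guess.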

I'm calling the mail agent “broken” because there is no need to escape either + or / in a message header: it's not a URL, so why not use the standard alphabet?
