HeaderParseError in Python
I get a HeaderParseError when I try to parse this string with decode_header() in Python 2.6.5 (and 2.7). Here is the repr() of the string:
'=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='
This string comes from a MIME email that contains a JPEG picture. Thunderbird can decode the filename (which contains German umlauts).
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError
Comments (1)
This looks like an incompatibility between Python's character set for base64-encoded strings and the mail agent's:
Python uses the character set that the Wikipedia article on base64 describes as standard (i.e. 0-9, A-Z, a-z, +, /). The same article lists some alternative alphabets, including ones that use the underscore that is the issue here; however, the underscore's value is ambiguous (it is 62 or 63, depending on the alternative).
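To make the ambiguity concrete, here is a small sketch that decodes the base64 payload from the question both ways. `PAYLOAD` is the part of the header between `?B?` and `?=`; `decode_with` is just a helper name made up for this illustration, and the code is written for Python 3.

```python
import base64

# Base64 payload copied from the encoded word in the question.
PAYLOAD = 'QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=='

def decode_with(underscore_as):
    """Decode PAYLOAD, treating '_' as the given standard base64 character."""
    raw = base64.b64decode(PAYLOAD.replace('_', underscore_as))
    # The header declares iso-8859-1 as its charset.
    return raw.decode('iso-8859-1')

print(decode_with('+'))  # '_' read as value 62
print(decode_with('/'))  # '_' read as value 63
```

Reading `_` as 62 gives "Anmeldung Netzanschluss Sìdring3p.jpg", while reading it as 63 gives "Anmeldung Netzanschluss Südring3p.jpg". Only the second contains the German umlaut the question mentions, so for this particular message the agent most likely used the URL-safe alphabet, where `_` stands in for `/`.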
I don't know what Python could do to guess the intentions of b0rken mail agents, so I suggest you do some appropriate guessing whenever decode_header fails.
I'm calling the mail agent "broken" because there is no need to escape either + or / in a message header: it's not a URL, so why not use the typical character set?
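One possible way to do that guessing, as a minimal sketch rather than a definitive fix: retry only when decode_header fails, after mapping the URL-safe characters back to the standard ones inside B-encoded words. The names `decode_header_lenient`, `_to_standard_alphabet` and `_B_WORD` are invented for this example, and the mapping assumes the sender used the URL-safe alphabet (`-` = 62, `_` = 63), which may not hold for every broken agent. The sketch is written for Python 3, where decode_header returns bytes.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Matches one B-encoded word of the form =?charset?B?payload?=
_B_WORD = re.compile(r'=\?([^?]+)\?[bB]\?([^?]*)\?=')

def _to_standard_alphabet(match):
    """Rewrite one encoded word so its payload uses '+' and '/'."""
    charset, payload = match.group(1), match.group(2)
    # Assumption: '-' and '_' are the URL-safe stand-ins for '+' and '/'.
    payload = payload.replace('-', '+').replace('_', '/')
    return '=?%s?B?%s?=' % (charset, payload)

def decode_header_lenient(header):
    """Try the normal decoder first; guess only if it fails."""
    try:
        return decode_header(header)
    except HeaderParseError:
        repaired = _B_WORD.sub(_to_standard_alphabet, header)
        return decode_header(repaired)
```

With the header from the question, `decode_header_lenient(...)` returns `[(b'Anmeldung Netzanschluss S\xfcdring3p.jpg', 'iso-8859-1')]`. Headers that already decode cleanly never reach the guessing branch, so well-formed mail is not affected.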