email.header.decode_header() throws HeaderParseError
I'm trying to decode email Subject headers.
I'm doing this (the regex is for adding a space between the two = signs):
import re
import email.header
header = '=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?==?iso-8859-1?B?ciBTdXp1a2kg?='
header = re.sub(r"(==)(?!$)", u"\0= =", header)
email.header.decode_header(header)
But that throws a HeaderParseError:
HeaderParseError Traceback (most recent call last)
/home/leon/<ipython console> in <module>()
/usr/lib/python2.7/email/header.pyc in decode_header(header)
106 # now we throw the lower level exception away but
107 # when/if we get exception chaining, we'll preserve it.
--> 108 raise HeaderParseError
109 if dec is None:
110 dec = encoded
The funny thing is, if I copy the output of the re.sub() to my clipboard and do:
email.header.decode_header('=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?= =?iso-8859-1?B?ciBTdXp1a2kg?=')
it works!
So I guess something's wrong with the encoding of re.sub() but I don't know how to fix this.
Comments (1)
The example that fails is missing a space between the two RFC 2047 encoded words. Your attempt to repair it is also incorrect, however: you should be replacing with u"= =", not u"\0= =".
It would be much better if you could find the source of such errors and correct it there, rather than trying to fix the data up afterwards based on, at best, good guesses about what it ought to be.
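To make the suggested replacement concrete, here is a minimal sketch using the header from the question. It is written for Python 3, which is an assumption, since the question uses Python 2.7. The helper decode_subject is a hypothetical name introduced only for illustration. One likely reason the pasted string worked: in an ordinary, non-raw string literal "\0" is a NUL character, which is invisible when the output is copied from a terminal.

```python
import re
from email.header import decode_header

# Header from the question: two encoded words with no space between them.
header = (
    "=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?="
    "=?iso-8859-1?B?ciBTdXp1a2kg?="
)

# In a non-raw string literal, "\0" is a NUL character, so the original
# replacement inserts "\x00" into the header instead of only adding a space.
broken = re.sub(r"(==)(?!$)", "\0= =", header)
has_nul = "\x00" in broken

# Replacement suggested in the answer: put a single space between the two "=".
fixed = re.sub(r"(==)(?!$)", "= =", header)


def decode_subject(raw):
    """Hypothetical helper: decode each part and join them into one string."""
    pieces = []
    for part, charset in decode_header(raw):
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii")
        pieces.append(part)
    return "".join(pieces)


subject = decode_subject(fixed)
print(has_nul)
print(subject)
```

This only patches the symptom. As noted above, correcting whatever produces the malformed header is the more reliable fix.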