email.header.decode_header() throws HeaderParseError

Posted on 2024-12-04 00:02:50

I'm trying to decode email Subject headers.

I'm doing this (the regex is for adding a space between the two ='s):

import re
import email.header

header = '=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?==?iso-8859-1?B?ciBTdXp1a2kg?='
header = re.sub(r"(==)(?!$)", u"\0= =", header)
email.header.decode_header(header)

But that throws a HeaderParseError:

HeaderParseError                          Traceback (most recent call last)

/home/leon/<ipython console> in <module>()

/usr/lib/python2.7/email/header.pyc in decode_header(header)
    106                         # now we throw the lower level exception away but
    107                         # when/if we get exception chaining, we'll preserve it.
--> 108                         raise HeaderParseError
    109                 if dec is None:
    110                     dec = encoded

The funny thing is, if I copy the output of the re.sub() to my clipboard and do:

email.header.decode_header('=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?= =?iso-8859-1?B?ciBTdXp1a2kg?=')

it works!

So I guess something's wrong with the encoding of re.sub() but I don't know how to fix this.

Comments (1)

绳情 2024-12-11 00:02:50

The failing example lacks a space between the two RFC 2047 encoded words. Your attempt to repair it is also incorrect, however: in a non-raw Python string literal, "\0" is a NUL character rather than a reference to the match, so you should be replacing with u"= =", not u"\0= =".
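
To make the difference visible, here is a minimal sketch, assuming Python 3 (the question uses 2.7, where the details of decode_header differ). It runs both replacements on the header from the question and checks each result for a NUL character. The helper name decode_subject is hypothetical, added only for illustration.

```python
import re
from email.header import decode_header

# The header from the question: two encoded words with no space between them.
HEADER = ("=?iso-8859-1?B?TU9UT1IubmwgbmlldXdzYnJpZWYgPiBOaWV1d2UgdmVya29vcHRvcHBl?="
          "=?iso-8859-1?B?ciBTdXp1a2kg?=")

# "\0" in a non-raw literal is the NUL character, so it ends up in the header.
broken = re.sub(r"(==)(?!$)", "\0= =", HEADER)
# A plain "= =" inserts only the missing space between the encoded words.
fixed = re.sub(r"(==)(?!$)", "= =", HEADER)


def decode_subject(header):
    """Hypothetical helper: join the decoded parts into one text string."""
    text = []
    for part, charset in decode_header(header):
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii")
        text.append(part)
    return "".join(text)


print("\x00" in broken)  # True: the repaired header contains a stray NUL
print("\x00" in fixed)   # False
print(decode_subject(fixed))
```

Printing repr() of the re.sub() result is a quick way to spot this kind of invisible character; copying the output to the clipboard likely dropped the NUL, which would explain why the pasted version decoded fine.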

It would be much better if you could find the source of such errors and correct it, rather than attempt to fix it up afterwards based on, at best, good guesses about what your data ought to be.
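
If the code that produces these headers is under your control (an assumption; the question does not say where they come from), one way to avoid the problem at the source is to let email.header.Header build the encoded words, since it puts whitespace between them when it has to split. A rough sketch, assuming Python 3 and a subject string reconstructed from the question's header:

```python
from email.header import Header, decode_header

# Assumed plain-text subject, reconstructed from the header in the question.
subject = "MOTOR.nl nieuwsbrief > Nieuwe verkooptopper Suzuki"

# Header.encode() emits RFC 2047 encoded words and separates them with
# whitespace if the value has to be split across several words.
encoded = Header(subject, "iso-8859-1").encode()

# Round trip: decode the generated header back into text.
decoded = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded)
)

print(encoded)
print(decoded == subject)
```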
