Email from PHP has broken Subject header encoding
My PHP script sends email to users, and when the email arrives in their mailboxes, the subject line ($subject) has characters like a^£ added to the end of the subject text. This is obviously an encoding problem. The email message content itself is fine; only the subject line is broken.
I have searched all over but can’t find how to encode my subject properly.
These are my headers. Notice that I’m using Content-Type with charset=utf-8 and Content-Transfer-Encoding: 8bit.
//set all necessary headers
$headers = "From: $sender_name<$from>\n";
$headers .= "Reply-To: $sender_name<$from>\n";
$headers .= "X-Sender: $sender_name<$from>\n";
$headers .= "X-Mailer: PHP4\n"; //mailer
$headers .= "X-Priority: 3\n"; //1 UrgentMessage, 3 Normal
$headers .= "MIME-Version: 1.0\n";
$headers .= "X-MSMail-Priority: High\n";
$headers .= "Importance: 3\n";
$headers .= "Date: $date\n";
$headers .= "Delivered-to: $to\n";
$headers .= "Return-Path: $sender_name<$from>\n";
$headers .= "Envelope-from: $sender_name<$from>\n";
$headers .= "Content-Transfer-Encoding: 8bit\n";
$headers .= "Content-Type: text/plain; charset=UTF-8\n";
Update: For a more practical and up-to-date answer, have a look at Palec’s answer.
The character encoding specified in Content-Type describes only the character encoding of the message body, not that of the header. For the header you need to use the encoded-word syntax, with either the quoted-printable encoding or the Base64 encoding (=?UTF-8?Q?...?= or =?UTF-8?B?...?= in place of the raw subject text).
You can use imap_8bit for the quoted-printable encoding and base64_encode for the Base64 encoding.
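Here is a minimal sketch of the Base64 variant. It assumes that $subject is UTF-8 and short enough for a single encoded word to stay within RFC 2047's 75-character limit. encode_subject_b() is a hypothetical helper name.

The quoted-printable variant is deliberately not shown. RFC 2047's Q encoding differs from body quoted-printable: spaces and "?" must be encoded inside an encoded word, which imap_8bit() does not do. The IMAP extension has also been unbundled from PHP core as of PHP 8.4.

```php
<?php
// Hypothetical helper: wrap a UTF-8 subject in one RFC 2047 "B" encoded word.
// Only spec-compliant while the whole token stays within 75 characters.
function encode_subject_b(string $subject): string
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

$subject = 'Ahoj, světe'; // assumed to be UTF-8
$encoded_subject = encode_subject_b($subject);
echo $encoded_subject, PHP_EOL;

// mail($to, $encoded_subject, $message, $headers); // $to, $message, $headers as in the question
```

Longer subjects need to be split into several encoded words, which is what the library functions discussed in the other answers do.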
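TL;DR: encode the subject as RFC 2047 encoded words, preferably with iconv_mime_encode(), and pass only the encoded value (without the "Subject: " prefix) to mail().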
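Here is a minimal sketch of that approach. It assumes the iconv extension is available and $subject is UTF-8. $to, $message and $headers stand for the question's variables, so the mail() call is left commented out.

```php
<?php
// Sketch: assumes the iconv extension is available and $subject is UTF-8.
$subject = 'Ahoj, světe';

$preferences = [
    'input-charset'  => 'UTF-8', // encoding of $subject; change it to match your data
    'output-charset' => 'UTF-8', // charset named inside the generated encoded words
];

// Returns the complete header line, e.g. "Subject: =?UTF-8?B?...?=", or false on failure.
$header = iconv_mime_encode('Subject', $subject, $preferences);
if ($header === false) {
    throw new RuntimeException('iconv_mime_encode() failed');
}

// mail() takes the subject without the "Subject: " prefix, so cut it off again.
$encoded_subject = substr($header, strlen('Subject: '));
echo $encoded_subject, PHP_EOL;

// mail($to, $encoded_subject, $message, $headers); // variables from the question
```

Encoding with the header name included lets the length limit account for "Subject: " on the first line. The prefix is then stripped because mail() adds the header name itself.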
Problem and solution
The Content-Type and Content-Transfer-Encoding headers apply only to the body of your message. For headers, RFC 2047 specifies a separate mechanism for declaring their encoding. You should encode your Subject via iconv_mime_encode(), which exists as of PHP 5.
Pass it a preferences array whose input-charset matches the encoding of your string $subject. You should leave output-charset as UTF-8. Before PHP 5.4, use array() instead of [].
The result (call it $encoded_subject) is the whole header line, starting with "Subject: " and without a trailing newline. A long $subject is split into several encoded words placed on folded continuation lines.
How does it work?
The iconv_mime_encode() function splits the text, encodes each piece separately into an <encoded-word> token, and folds the whitespace between them. An encoded word has the form =?<charset>?<encoding>?<encoded-text>?=, where <encoding> is either B (for Base64; see base64_encode()) or Q (a variant of quoted-printable; compare quoted_printable_encode()). <encoded-text> is the string encoded with <encoding>, which has charset <charset> after being decoded.
You can decode =?CP1250?B?QWhvaiwgc3bsdGU=?= into the UTF-8 string Ahoj, světe ("Hello, world" in Czech) via iconv("CP1250", "UTF-8", base64_decode("QWhvaiwgc3bsdGU=")) or directly via iconv_mime_decode("=?CP1250?B?QWhvaiwgc3bsdGU=?=", 0, "UTF-8").
Encoding into encoded words is more complicated. The spec requires each encoded-word token to be at most 75 bytes long. Each line containing any encoded-word token must be at most 76 bytes long, including the blank at the start of a continuation line. Don’t implement the encoding yourself. All you really need to know is that iconv_mime_encode() respects the spec.
Interesting related reading is the Wikipedia article Unicode and email.
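The sketch below runs both decoding routes from the text and then checks the 76-byte line rule on a deliberately long subject. It assumes the iconv extension. The Czech pangram used as the long subject is an arbitrary example, not from the original answer.

```php
<?php
// Sketch assuming the iconv extension; the encoded word is the one from the text.
$encoded_word = '=?CP1250?B?QWhvaiwgc3bsdGU=?=';

// 1) By hand: Base64-decode the <encoded-text>, then convert CP1250 -> UTF-8.
$manual = iconv('CP1250', 'UTF-8', base64_decode('QWhvaiwgc3bsdGU='));
// 2) Let iconv parse the =?charset?encoding?text?= syntax itself.
$direct = iconv_mime_decode($encoded_word, 0, 'UTF-8');
echo $manual, PHP_EOL; // Ahoj, světe
echo $direct, PHP_EOL; // Ahoj, světe

// Encode a long (hypothetical) subject and print the length of every physical line.
$long_subject = implode(' ', array_fill(0, 5, 'Příliš žluťoučký kůň úpěl ďábelské ódy.'));
$header = iconv_mime_encode('Subject', $long_subject, [
    'input-charset'  => 'UTF-8',
    'output-charset' => 'UTF-8',
    // 'line-length' => 76 and 'line-break-chars' => "\r\n" are the defaults
]);
foreach (explode("\r\n", $header) as $line) {
    // Continuation lines start with a single blank, which counts toward the limit.
    printf("%2d bytes: %s\n", strlen($line), $line);
}
```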
Alternatives
A rudimentary option is to use only a restricted set of characters. ASCII is guaranteed to work. ISO Latin 1 (ISO-8859-1), as user2250504 suggested, will probably work too, because it is often used as a fallback when no encoding is specified. But those character sets are very small, and you’ll probably be unable to encode all the characters you want. Moreover, the RFCs say nothing about whether Latin 1 should work.
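Since ASCII is guaranteed to work, one compromise is to pass pure printable-ASCII subjects through untouched and encode only the rest. This is a sketch under stated assumptions: subject_for_mail() is a hypothetical name, the input is assumed to be UTF-8, and the iconv extension is assumed to be loaded. The \A...\z anchors keep a trailing newline from slipping past the ASCII check.

```php
<?php
// Hypothetical helper: printable-ASCII subjects are returned unchanged,
// anything else is encoded as RFC 2047 encoded words via iconv_mime_encode().
function subject_for_mail(string $subject): string
{
    if (preg_match('/\A[\x20-\x7E]*\z/', $subject) === 1) {
        return $subject; // plain ASCII: safe to send as-is
    }

    $header = iconv_mime_encode('Subject', $subject, [
        'input-charset'  => 'UTF-8', // assumed encoding of $subject
        'output-charset' => 'UTF-8',
    ]);
    if ($header === false) {
        throw new RuntimeException('iconv_mime_encode() failed');
    }

    // Drop the "Subject: " prefix because mail() adds the header name itself.
    return substr($header, strlen('Subject: '));
}

echo subject_for_mail('Weekly report'), PHP_EOL;
echo subject_for_mail('Ahoj, světe'), PHP_EOL;
```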
You can also use mb_encode_mimeheader(), as Paul Norman answered, but it’s easy to use it incorrectly.
You must use mb_internal_encoding() to set the encoding that the mbstring functions use internally. The mb_* functions expect input strings to be in this encoding. Beware: the second parameter of mb_encode_mimeheader() has nothing to do with the input string (despite what the manual says). It corresponds to the <charset> in the encoded word (see How does it work? above). The input string is recoded from the internal encoding to this one before being passed to the B or Q encoding.
Setting the internal encoding might not be needed since PHP 5.6. The underlying mbstring.internal_encoding configuration option has been deprecated in favor of the default_charset option, which has defaulted to UTF-8 ever since. Note that this is just a default, and it may be inappropriate to rely on defaults in your code.
You must include the header name and colon in the input string. The RFC imposes a strict limit on line length, and it must hold for the first line, too! An alternative is to fiddle with the fifth parameter ($indent; the last one as of September 2015), but this is even less convenient.
The implementation might have bugs. Even if used correctly, you might get broken output; at least, this is what many comments on the manual page say. I have not managed to find any problem, but I know that implementing encoded words is tricky. If you find potential or actual bugs in mb_encode_mimeheader() or iconv_mime_encode(), please let me know in the comments.
There is also at least one upside to using mb_encode_mimeheader(): it does not always encode all the header contents, which saves space and leaves the text human-readable. Encoding is required only for the non-ASCII parts, so ASCII stretches of the subject can stay as plain text. This differs from iconv_mime_encode(), which encodes the whole value.
A typical use of mb_encode_mimeheader() is an alternative to the approach in the TL;DR at the top of this post. Instead of just reserving the space for "Subject:", it actually puts the header name into the input string. It then removes the name from the result, so that the value can be used with mail()’s stupid interface, which takes the subject without the header name.
If you like the mbstring functions better than the iconv ones, you might want to use mb_send_mail(). It uses mail() internally, but encodes the subject and body of the message automatically. Again, use with care.
Headers other than Subject need different treatment
Note that you must not assume that encoding the whole contents of a header is OK for every header that may contain non-ASCII characters. For example, From, To, Cc, Bcc and Reply-To may contain names for the addresses they list, but only the names may be encoded, not the addresses. The reason is that an <encoded-word> token may replace only <text>, <ctext> and <word> tokens, and only under certain circumstances (see §5 of RFC 2047).
Encoding non-ASCII text in other headers is a related but different question. If you wish to know more about this topic, search for it. If you find no answer, ask another question and point me to it in the comments.
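Here is a small sketch of the From/Reply-To case: only the display name becomes an encoded word, while the address stays plain ASCII inside the angle brackets. The sender name and address are hypothetical, and the name is assumed to be UTF-8 and short enough for one encoded word of at most 75 characters.

```php
<?php
// Hypothetical sender; only the display name may become an encoded word.
$sender_name = 'Jiří Novák';
$from = 'jiri.novak@example.com';

// Encode the name as a single Base64 ("B") encoded word...
$encoded_name = '=?UTF-8?B?' . base64_encode($sender_name) . '?=';

// ...but keep the address itself untouched. The PHP manual asks for CRLF (\r\n)
// between additional headers passed to mail().
$headers = "From: $encoded_name <$from>\r\n";
$headers .= "Reply-To: $encoded_name <$from>\r\n";
echo $headers;
```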
mb_encode_mimeheader() for UTF-8 strings can be useful here.
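Here is a minimal sketch, assuming the mbstring extension is loaded and the subject is UTF-8. It follows the caveats from Palec's answer: set the internal encoding explicitly, and put "Subject: " into the input so the first line's length accounts for it. The prefix is then stripped again for mail(). $to, $message and $headers are the question's variables, so the mail() call is left commented out.

```php
<?php
// Sketch: assumes the mbstring extension is loaded and the subject is UTF-8.
mb_internal_encoding('UTF-8'); // mb_* functions read their input in this encoding

$subject = 'Ahoj, světe';

// Include the header name so the first line's length limit accounts for it.
// Arguments: charset of the encoded words, "B" (Base64) scheme, CRLF for folding.
$header = mb_encode_mimeheader('Subject: ' . $subject, 'UTF-8', 'B', "\r\n");

// mail() expects the subject without the header name, so remove it again.
$encoded_subject = substr($header, strlen('Subject: '));
echo $encoded_subject, PHP_EOL;

// mail($to, $encoded_subject, $message, $headers); // variables from the question
```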