Email from PHP has broken Subject header encoding

Posted on 2024-10-06 15:35:21

My PHP script sends email to users, and when the email arrives in their mailboxes, the subject line ($subject) has characters like a^£ added to the end of my subject text. This is obviously an encoding problem. The email message content itself is fine; only the subject line is broken.

I have searched all over but can’t find how to encode my subject properly.

This is my header. Notice that I’m using Content-Type with charset=utf-8 and Content-Transfer-Encoding: 8bit.

//set all necessary headers
$headers = "From: $sender_name<$from>\n";
$headers .= "Reply-To: $sender_name<$from>\n";
$headers .= "X-Sender: $sender_name<$from>\n";
$headers .= "X-Mailer: PHP4\n"; //mailer
$headers .= "X-Priority: 3\n"; //1 UrgentMessage, 3 Normal
$headers .= "MIME-Version: 1.0\n";
$headers .= "X-MSMail-Priority: High\n";
$headers .= "Importance: 3\n";
$headers .= "Date: $date\n";
$headers .= "Delivered-to: $to\n";
$headers .= "Return-Path: $sender_name<$from>\n";
$headers .= "Envelope-from: $sender_name<$from>\n";
$headers .= "Content-Transfer-Encoding: 8bit\n";
$headers .= "Content-Type: text/plain; charset=UTF-8\n";

因为看清所以看轻 2024-10-13 15:35:21

Update: For a more practical and up-to-date answer, have a look at Palec’s answer.


The character encoding specified in Content-Type describes only the encoding of the message body, not that of the headers. For headers you need to use the encoded-word syntax with either the quoted-printable encoding or the Base64 encoding:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
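As a minimal sketch of this syntax, the helper below wraps a short UTF-8 subject in a single Base64 encoded-word. The function name encode_short_subject() is hypothetical, not part of PHP or of this answer, and the sketch deliberately refuses anything that would exceed the 75-character limit RFC 2047 places on one encoded-word instead of splitting it.

```php
<?php
// Hypothetical helper (illustration only): wrap a short UTF-8 subject
// in one Base64 encoded-word of the form =?charset?encoding?text?=
function encode_short_subject(string $subject): string
{
    // Printable ASCII needs no encoding, so return it untouched.
    if (!preg_match('/[^\x20-\x7E]/', $subject)) {
        return $subject;
    }

    $word = '=?UTF-8?B?' . base64_encode($subject) . '?=';

    // RFC 2047 allows at most 75 characters per encoded-word. Longer
    // subjects must be split into several words, which this sketch skips.
    if (strlen($word) > 75) {
        throw new LengthException('Subject too long for a single encoded-word');
    }

    return $word;
}

echo encode_short_subject('Hello'), "\n";        // ASCII passes through unchanged
echo encode_short_subject('Ahoj, světe'), "\n";  // becomes one =?UTF-8?B?...?= token
```

For real subjects of arbitrary length, a library function that handles splitting and folding is the safer route.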

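You can use imap_8bit for the quoted-printable encoding and base64_encode for the Base64 encoding. Note that imap_8bit produces plain quoted-printable, which is not identical to the Q encoding of RFC 2047 (spaces, "?" and "_" must be escaped there), so the Base64 form is the safer of the two: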

"Subject: =?UTF-8?B?".base64_encode($subject)."?="
"Subject: =?UTF-8?Q?".imap_8bit($subject)."?="

Bonjour°[大白 2024-10-13 15:35:21

TL;DR

$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$encoded_subject = iconv_mime_encode('Subject', $subject, $preferences);
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

or

mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n", strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

Problem and solution

The Content-Type and Content-Transfer-Encoding headers apply only to the body of your message. For headers, RFC 2047 defines a separate mechanism for specifying their encoding.

You should encode your Subject via iconv_mime_encode(), which exists as of PHP 5:

$preferences = ["input-charset" => "UTF-8", "output-charset" => "UTF-8"];
$encoded_subject = iconv_mime_encode("Subject", $subject, $preferences);

Change input-charset to match the encoding of your string $subject. You should leave output-charset as UTF-8. Before PHP 5.4, use array() instead of [].

Now $encoded_subject is (without trailing newline)

Subject: =?UTF-8?B?VmVyeSBsb25nIHRleHQgY29udGFpbmluZyBzcGVjaWFsIGM=?=
 =?UTF-8?B?aGFyYWN0ZXJzIGxpa2UgxJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHA=?=
 =?UTF-8?B?cm9kdWNlcyBzZXZlcmFsIGVuY29kZWQtd29yZHMsIHNwYW5uaW5nIG0=?=
 =?UTF-8?B?dWx0aXBsZSBsaW5lcw==?=

for $subject containing:

Very long text containing special characters like ěščřžýáíé<>?=+* produces several encoded-words, spanning multiple lines
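The same call can be sketched with every preference spelled out. The 'scheme', 'line-length' and 'line-break-chars' keys are documented options of iconv_mime_encode(); the values used here are, to my knowledge, the documented defaults, so this should behave like the shorter call above. The loop at the end is only there to make the folded lines and their lengths visible.

```php
<?php
$subject = 'Very long text containing special characters like ěščřžýáíé<>?=+* '
         . 'produces several encoded-words, spanning multiple lines';

$preferences = [
    'input-charset'    => 'UTF-8',  // encoding of $subject
    'output-charset'   => 'UTF-8',  // charset named inside each encoded-word
    'scheme'           => 'B',      // Base64; 'Q' selects the Q encoding
    'line-length'      => 76,       // maximum length of each folded line
    'line-break-chars' => "\r\n",   // separator used when folding
];

$header = iconv_mime_encode('Subject', $subject, $preferences);
if ($header === false) {
    throw new RuntimeException('iconv_mime_encode() failed');
}

// mail() adds the "Subject: " field name itself, so strip it again.
$encoded_subject = substr($header, strlen('Subject: '));

// Show each folded line together with its length in bytes.
foreach (explode("\r\n", $header) as $line) {
    echo strlen($line), ' bytes: ', $line, "\n";
}
```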

How does it work?

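The iconv_mime_encode() function splits the text, encodes each piece separately into an <encoded-word> token and folds the whitespace between them. An encoded word is =?<charset>?<encoding>?<encoded-text>?= where <charset> is the character encoding of the text, <encoding> is either B (Base64) or Q (a variant of quoted-printable), and <encoded-text> is the text after that encoding has been applied.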

You can decode =?CP1250?B?QWhvaiwgc3bsdGU=?= into UTF-8 string Ahoj, světe (Hello, world in Czech) via iconv("CP1250", "UTF-8", base64_decode("QWhvaiwgc3bsdGU=")) or directly via iconv_mime_decode("=?CP1250?B?QWhvaiwgc3bsdGU=?=", 0, "UTF-8").
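To make the three parts of an encoded word concrete, here is a small sketch that takes one B-encoded word apart by hand and compares the result with iconv_mime_decode(). The function decode_b_encoded_word() is a hypothetical helper written for this illustration; it handles only a single B-encoded word and assumes the iconv build knows the charset named in it.

```php
<?php
// Hypothetical helper: decode exactly one B-encoded word to UTF-8.
function decode_b_encoded_word(string $word): string
{
    // Capture <charset> and <encoded-text>; only the B encoding is accepted.
    if (!preg_match('/^=\?([^?]+)\?[Bb]\?([^?]*)\?=$/', $word, $m)) {
        throw new InvalidArgumentException('Not a B-encoded word');
    }

    $bytes = base64_decode($m[2], true);   // strict mode rejects invalid input
    if ($bytes === false) {
        throw new InvalidArgumentException('Invalid Base64 payload');
    }

    $utf8 = iconv($m[1], 'UTF-8', $bytes); // convert from the named charset
    if ($utf8 === false) {
        throw new RuntimeException('Charset conversion failed');
    }

    return $utf8;
}

$word   = '=?CP1250?B?QWhvaiwgc3bsdGU=?=';
$manual = decode_b_encoded_word($word);
$direct = iconv_mime_decode($word, 0, 'UTF-8');

echo $manual, "\n";
var_dump($manual === $direct);
```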

Encoding into encoded words is more complicated, because the spec requires each encoded-word token to be at most 75 bytes long, and each line containing any encoded-word token to be at most 76 bytes long (including the space at the start of a continuation line). Don’t implement the encoding yourself. All you really need to know is that iconv_mime_encode() respects the spec.
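While writing the encoder yourself is discouraged, checking an already encoded header against these two limits is simple. The following is a rough sketch of such a check; header_respects_limits() is a hypothetical name, the regular expression is a simplified approximation of the encoded-word grammar, and CRLF is assumed as the line separator.

```php
<?php
// Hypothetical checker for the two RFC 2047 length limits quoted above.
function header_respects_limits(string $header): bool
{
    foreach (explode("\r\n", $header) as $line) {
        // Simplified pattern for =?charset?B-or-Q?text?= tokens.
        $count = preg_match_all('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $line, $matches);
        if (!$count) {
            continue; // the limits apply only to lines holding encoded-words
        }
        if (strlen($line) > 76) {
            return false; // line with an encoded-word exceeds 76 bytes
        }
        foreach ($matches[0] as $word) {
            if (strlen($word) > 75) {
                return false; // single encoded-word exceeds 75 bytes
            }
        }
    }
    return true;
}

$good = 'Subject: =?UTF-8?B?' . base64_encode('Ahoj') . '?=';
$bad  = 'Subject: =?UTF-8?B?' . str_repeat('QUJD', 20) . '?=';

var_dump(header_respects_limits($good)); // short word on a short line
var_dump(header_respects_limits($bad));  // 92-byte encoded-word
```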

Interesting related reading is the Wikipedia article Unicode and email.

Alternatives

A rudimentary option is to use only a restricted set of characters. ASCII is guaranteed to work. ISO Latin 1 (ISO-8859-1), as user2250504 suggested, will probably work too, because it is often used as fallback when no encoding is specified. But those character sets are very small and you’ll probably be unable to encode all the characters you’ll want. Moreover, the RFCs say nothing about whether Latin 1 should work or not.

You can also use mb_encode_mimeheader(), as Paul Norman answered, but it’s easy to use it incorrectly.

  1. You must use mb_internal_encoding() to set the mbstring functions’ internally used encoding. The mb_* functions expect input strings to be in this encoding. Beware: The second parameter of mb_encode_mimeheader() has nothing to do with the input string (despite what the manual says). It corresponds to the <charset> in the encoded word (see How does it work? above). The input string is recoded from the internal encoding to this one before being passed to the B or Q encoding.

    Setting the internal encoding might not be needed as of PHP 5.6, because the underlying mbstring.internal_encoding configuration option was deprecated in favor of the default_charset option, which has defaulted to UTF-8 since then. Note that this is just a default, and relying on defaults in your code may be inappropriate.

  2. You must include the header name and colon in the input string. The RFC imposes a strict limit on line length, and it must hold for the first line, too! An alternative is to fiddle with the fifth parameter ($indent; the last one as of September 2015), but this is even less convenient.

  3. The implementation might have bugs. Even if used correctly, you might get broken output. At least this is what many comments on the manual page say. I have not managed to find any problem, but I know implementation of encoded words is tricky. If you find potential or actual bugs in mb_encode_mimeheader() or iconv_mime_encode(), please, let me know in the comments.
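One inexpensive way to guard against the pitfalls listed above is a round-trip check: encode the header, decode it again with mb_decode_mimeheader(), and compare. This is only a sketch under the assumption that the mbstring extension is loaded; mb_header_round_trips() is a hypothetical helper, and a failed comparison merely flags a header worth inspecting, since folding of long lines can change whitespace without the encoder being at fault.

```php
<?php
// Set the encoding the mb_* functions assume for their input strings.
mb_internal_encoding('UTF-8');

// Hypothetical helper: does "Name: value" survive encode + decode unchanged?
function mb_header_round_trips(string $name, string $value): bool
{
    $original = "$name: $value";

    // Include the field name so the first line's length is accounted for.
    $encoded = mb_encode_mimeheader($original, 'UTF-8', 'B', "\r\n");

    // mb_decode_mimeheader() converts back to the internal encoding.
    $decoded = mb_decode_mimeheader($encoded);

    return $decoded === $original;
}

var_dump(mb_header_round_trips('Subject', 'Hello'));
var_dump(mb_header_round_trips('Subject', 'Ahoj, světe'));
```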

There is also at least one upside to using mb_encode_mimeheader(): it does not always encode all the header contents, which saves space and leaves the text human-readable. The encoding is required only for the non-ASCII parts. The output analogous to the iconv_mime_encode() example above is:

Subject: Very long text containing special characters like
 =?UTF-8?B?xJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHByb2R1Y2VzIHNldmVyYWwgZW5j?=
 =?UTF-8?B?b2RlZC13b3Jkcywgc3Bhbm5pbmcgbXVsdGlwbGUgbGluZXM=?=

Usage example of mb_encode_mimeheader():

mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader("Subject: $subject", 'UTF-8');
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

This is an alternative to the snippet in the TL;DR at the top of this post. Instead of just reserving the space for Subject:, it actually puts it there and then removes it again, so that the result can be used with mail()’s stupid interface.

If you like mbstring functions better than the iconv ones, you might want to use mb_send_mail(). It uses mail() internally, but encodes subject and body of the message automatically. Again, use with care.

Headers other than Subject need different treatment

Note that you must not assume that encoding the whole contents of a header is OK for all headers that may contain non-ASCII characters. For example, From, To, Cc, Bcc and Reply-To may contain names for the addresses they carry, but only the names may be encoded, not the addresses. The reason is that an <encoded-word> token may replace only <text>, <ctext> and <word> tokens, and only under certain circumstances (see §5 of RFC 2047).
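A rough sketch of that rule for an address header: encode only the display name and leave the address itself untouched. The helper format_mailbox() is hypothetical; it assumes a UTF-8 name short enough to fit into one encoded-word and uses filter_var() as a basic sanity check on the address, which is stricter than what the mail RFCs allow.

```php
<?php
// Hypothetical helper: build "Name <address>" with only the name encoded.
function format_mailbox(string $name, string $email): string
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        throw new InvalidArgumentException('Invalid e-mail address');
    }
    if (preg_match('/[\r\n]/', $name)) {
        // Line breaks in a name would allow header injection.
        throw new InvalidArgumentException('Name must not contain line breaks');
    }

    if (preg_match('/[^\x20-\x7E]/', $name)) {
        // Non-ASCII name: a single Base64 encoded-word (no splitting here).
        $name = '=?UTF-8?B?' . base64_encode($name) . '?=';
    } else {
        // ASCII name: quoted-string with quotes and backslashes escaped.
        $name = '"' . addcslashes($name, "\"\\") . '"';
    }

    // The address is never encoded.
    return "$name <$email>";
}

echo 'From: ', format_mailbox('Jan Novák', 'jan@example.com'), "\r\n";
```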

Encoding of non-ASCII text in other headers is a related but different question. If you wish to know more about this topic, search. If you find no answer, ask another question and point me to it in the comments.

来日方长 2024-10-13 15:35:21

mb_encode_mimeheader() for UTF-8 strings can be useful here, e.g.

$subject = mb_encode_mimeheader($subjectText,"UTF-8");