Smart quotes in MimeMessage not displayed correctly in Outlook

Posted on 2024-07-20 13:20:51

Our application takes text from a web form and sends it via email to an appropriate user. However, when someone copy/pastes in the infamous "smart quotes" or other special characters from Word, things get hairy.

The user types in

he said “hello” to me—isn’t that nice?

But when the message appears in Outlook 2003, it comes out like this:

he said hello to meisnt that nice?

The code for this was:

Session session = Session.getInstance(props, new MailAuthenticator());
Message msg = new MimeMessage(session);

//removed setting to/from addresses to simplify

msg.setSubject(subject);
msg.setText(text);
msg.setHeader("X-Mailer", MailSender.class.getName());
msg.setSentDate(new Date());
Transport.send(msg);

After a little research, I figured this was probably a character encoding issue and attempted to move things to UTF-8. So, I updated the code thusly:

Session session = Session.getInstance(props, new MailAuthenticator());
MimeMessage msg = new MimeMessage(session);

//removed setting to/from addresses to simplify

msg.setHeader("X-Mailer", MailSender.class.getName());
msg.addHeader("Content-Type", "text/plain");
msg.addHeader("charset", "UTF-8");
msg.setSentDate(new Date());
Transport.send(msg);

This got me closer, but no cigar:

he said “hello” to me—isn’t that nice?

I can't imagine this is an uncommon problem--what have I missed?

Comments (4)

扭转时空 2024-07-27 13:20:51

Is the page with your form also using UTF-8, or a different charset? If you don't specify the webpage charset, the format of data coming to your script is anyone's guess.


Edit: the charset in the message should be set like this:

msg.addHeader("Content-Type", "text/plain; charset=UTF-8");

since charset is not a separate header but a parameter of the Content-Type header.
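
To see why the charset label matters, here is a minimal sketch in plain Java (no mail API involved) of what happens when UTF-8 bytes are decoded under a different charset. It assumes the receiving client falls back to windows-1252 when no charset is declared, which is only a guess about Outlook's behavior; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here comes from the question's code.
final class CharsetMismatchDemo {

    // Encodes text as UTF-8, then decodes those bytes with another charset,
    // imitating a reader that was never told the body is UTF-8.
    static String decodeUtf8BytesAs(String text, Charset assumedCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumedCharset);
    }

    public static void main(String[] args) {
        // Assumption: the client guesses windows-1252.
        Charset windows1252 = Charset.forName("windows-1252");

        // U+201C (left smart quote) is written as an escape so the
        // source file encoding cannot interfere with the demo.
        String original = "he said \u201Chello";

        // Each smart quote becomes three unrelated characters.
        System.out.println(decodeUtf8BytesAs(original, windows1252));

        // Decoding with the charset that was actually used restores the text.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.UTF_8).equals(original));
    }
}
```

If the garbled output of this sketch resembles what the mail client shows, the body is probably being sent as UTF-8 while the client is not being told so.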

九厘米的零° 2024-07-27 13:20:51

Why don't you replace the smart quotes with regular straight quotes?
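
A rough sketch of that idea, assuming it is acceptable to lose the typographic characters. The helper name and the list of characters handled are my own choices, not an exhaustive mapping of everything Word can produce.

```java
// Hypothetical helper; not part of any mail API.
final class SmartQuoteNormalizer {

    // Replaces common Word "smart" punctuation with ASCII look-alikes.
    static String toAscii(String text) {
        return text
                .replace('\u2018', '\'')    // U+2018 left single quote
                .replace('\u2019', '\'')    // U+2019 right single quote / apostrophe
                .replace('\u201C', '"')     // U+201C left double quote
                .replace('\u201D', '"')     // U+201D right double quote
                .replace("\u2013", "-")     // U+2013 en dash
                .replace("\u2014", "--")    // U+2014 em dash
                .replace("\u2026", "...");  // U+2026 horizontal ellipsis
    }

    public static void main(String[] args) {
        String pasted = "he said \u201Chello\u201D to me\u2014isn\u2019t that nice?";
        // prints: he said "hello" to me--isn't that nice?
        System.out.println(toAscii(pasted));
    }
}
```

This sidesteps the encoding problem rather than solving it: any other non-ASCII character, such as an accented letter, would still need a correctly declared charset.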

梦亿 2024-07-27 13:20:51

I would check that the data being received from the browser is correct - dump the Unicode code points and check them against the charts:

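  // Prints each Unicode code point of s in hexadecimal, one per line.
  public static void printCodepoints(char[] s) {
    for (int i = 0; i < s.length; i++) {
      int codePoint = s[i];
      // Combine a surrogate pair into one code point; a lone surrogate is printed as-is.
      if (Character.isHighSurrogate(s[i]) && i + 1 < s.length
          && Character.isLowSurrogate(s[i + 1])) {
        codePoint = Character.toCodePoint(s[i], s[++i]);
      }
      System.out.println(Integer.toHexString(codePoint));
    }
  }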

For example, the symbol LEFT DOUBLE QUOTATION MARK (“) is character U+201C.

It has been a long time since I used the mail API, but the MimeMessage.setText(text, charset) method might be worth a look. The documentation on setText(String) says it uses the default character set (probably windows-1252 if you're using English/Latin-1 Windows).
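
On Java 8 or later, the same check can be written with String.codePoints(). The sketch below lists only the non-ASCII code points together with their Unicode names, which makes it easier to compare against the charts; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper for inspecting form input.
final class CodePointReport {

    // Lists every code point above U+007F in the text as "U+XXXX NAME".
    static List<String> describeNonAscii(String text) {
        return text.codePoints()
                .filter(cp -> cp > 0x7F)
                .mapToObj(cp -> String.format("U+%04X %s", cp, Character.getName(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the text received from the web form.
        String received = "he said \u201Chello\u201D";
        describeNonAscii(received).forEach(System.out::println);
    }
}
```

If the quotes do not show up as U+201C and U+201D here, the text was already damaged before it reached the mail code, and the form's charset handling is the place to look.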

爱殇璃 2024-07-27 13:20:51

IIRC, MS Office quotes are found in the "iso-8859-1" character set.
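
This recollection can be checked against the JDK's own charset tables. The sketch below (names are illustrative) asks each charset whether it can encode U+201C. On a standard JDK I would expect ISO-8859-1 to say no and windows-1252 to say yes, which suggests the Word quotes belong to windows-1252, a closely related charset that is often confused with ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check; only standard JDK charset APIs are used.
final class CharsetCoverageCheck {

    // Reports whether the charset has an encoding for the given character.
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char leftDoubleQuote = '\u201C';
        Charset windows1252 = Charset.forName("windows-1252");

        System.out.println("ISO-8859-1:   " + canEncode(StandardCharsets.ISO_8859_1, leftDoubleQuote));
        System.out.println("windows-1252: " + canEncode(windows1252, leftDoubleQuote));
        System.out.println("UTF-8:        " + canEncode(StandardCharsets.UTF_8, leftDoubleQuote));
    }
}
```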
