What encoding does Outlook use for Text Only e-mails?
I need to decode e-mails saved from Outlook as Text Only. Unfortunately they're not in plain ISO-8859-1 since they contain special "smart quote" characters. Does the codepage used by Outlook have a real name (that I can pass to unicode.decode() in Python) or is it just some arbitrary made-up nonsense which I'll have to manually decode? And if so, does anyone have a reference for all the "special" characters Microsoft added?
Comments (2)
It's quite likely that Outlook will save messages in your current locale. My guess would be Windows-1252.
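If the guess of Windows-1252 is right, a minimal sketch of the decoding in Python 3 could look like the following. The sample bytes are made up for illustration, and `cp1252_extras` is a hypothetical helper name; `cp1252` is Python's codec name for Windows-1252. The helper also lists the characters Windows-1252 places in the 0x80-0x9F range, which is where it differs from ISO-8859-1.

```python
# Made-up sample bytes standing in for text saved by Outlook (an assumption).
raw = b"He said \x93hello\x94 \x96 it\x92s fine"

# "cp1252" is Python's codec for Windows-1252.
text = raw.decode("cp1252")
print(text)

# Decoded as ISO-8859-1, the same bytes become C1 control characters
# rather than quotation marks and dashes.
latin1 = raw.decode("iso-8859-1")
print(repr(latin1))


def cp1252_extras():
    """Map each byte in 0x80-0x9F to its Windows-1252 character.

    Bytes that Python's cp1252 codec leaves unassigned are skipped.
    """
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            extras[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # unassigned position in the codec
    return extras


for byte, char in cp1252_extras().items():
    print(f"0x{byte:02X} -> U+{ord(char):04X} {char}")
```

In Python 2, the equivalent call would be `raw.decode("cp1252")` on a `str`, which returns a `unicode` object.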
Nitpick: What you call “smart quotes” is actually the way quotes are supposed to look. The quotes you've been using in your post are known as “typewriter quotes”. On mechanical typewriters, the number of keys was a major cost factor, so the opening and closing quotes, which look very similar to one another, and the inch symbol were coalesced into a single key, aesthetics be damned.
There are many (locale-dependent) Windows code pages, so in the worst case the encoding may depend on the country in which the sender resides.
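Since the code page may vary by locale, one rough approach is to try a short list of candidates in order. This is only a heuristic sketch: `decode_with_fallback` and the candidate list are assumptions, not anything Outlook documents, and single-byte code pages accept almost any byte sequence, so a successful decode does not prove the guess was right.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "cp1251")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    The candidate list is an assumption; order it by what you expect
    your senders to use. UTF-8 goes first because it is strict and
    rejects most text written in a legacy code page.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("cp1252", errors="replace"), "cp1252 (lossy)"


text, used = decode_with_fallback(b"\x93quoted\x94")
print(used, text)
```

On the machine that saved the files, `locale.getpreferredencoding(False)` reports the locale's encoding, which may be a reasonable first candidate.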