
Why isn't UTF-8 allowed as the "ANSI" code page?

Posted 2024-09-04 06:23:52, 447 characters, 17 views, 0 comments

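The Windows _setmbcp function allows any valid code page...

(except UTF-7 and UTF-8, which are not supported)

OK, not supporting UTF-7 makes sense: characters have non-unique representations, which introduces complexity and security risks.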
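To make the UTF-7 point concrete, here is a minimal sketch. Python is used only as a neutral way to show the bytes, and the helper name decode_utf7 is made up for this illustration. It relies on Python's built-in utf-7 codec and shows two different byte sequences that decode to the same character, which is how a byte-level filter looking for "<" can be bypassed.

```python
# Two different UTF-7 byte sequences for the same character "<".
# "+ADw-" is the base64 form of the UTF-16 code unit 0x003C.
DIRECT = b"<"
SHIFTED = b"+ADw-"


def decode_utf7(data: bytes) -> str:
    """Decode UTF-7 bytes with Python's built-in codec."""
    return data.decode("utf-7")


if __name__ == "__main__":
    for raw in (DIRECT, SHIFTED):
        print(raw, "->", repr(decode_utf7(raw)))
    # A naive byte-level search for "<" does not see the shifted form.
    print("raw '<' byte present in shifted form:", b"<" in SHIFTED)
```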

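But why not UTF-8?

As far as I understand, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output back to "ANSI". That is exactly what I have been doing by hand. So why can't Windows do it for me?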


不打扰别人 2024-09-11 06:23:52

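The "ANSI" code page is basically legacy, dating from the Windows 9x era. All modern software should be Unicode-based (that is, UTF-16) anyway.

When the ANSI code page machinery was originally designed, UTF-8 had not even been invented, and support for multi-byte encodings was rather ad hoc: most ANSI code pages are single-byte, with the exception of some East Asian code pages, which use one or two bytes per character. Adding support for "proper" multi-byte encodings was probably deemed not worth it when all new development is supposed to be done in UTF-16.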


抽个烟儿 2024-09-11 06:23:52

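_setmbcp() is a VC++ run-time library (RTL) function, not a Win32 API function. It only affects how the RTL interprets strings. It has no effect on the Win32 API A functions. When they call the corresponding W functions internally, the A functions always convert with MultiByteToWideChar() and WideCharToMultiByte(), passing code page 0 (CP_ACP), which selects the system default ANSI code page.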
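The convert, call, convert-back pattern described here can be modeled in a few lines. This is only a sketch of the idea, not the Windows implementation: wide_to_upper and ansi_to_upper are hypothetical names, Python's str stands in for UTF-16 text, and the code page is passed as a parameter for the demo, whereas the real A functions always use CP_ACP.

```python
def wide_to_upper(text: str) -> str:
    """Hypothetical stand-in for a "W" function that works on Unicode text."""
    return text.upper()


def ansi_to_upper(data: bytes, code_page: str) -> bytes:
    """Model of an "A" wrapper: decode, call the wide version, encode back."""
    wide = data.decode(code_page)      # role played by MultiByteToWideChar
    result = wide_to_upper(wide)       # the real work happens on Unicode text
    return result.encode(code_page)    # role played by WideCharToMultiByte


if __name__ == "__main__":
    for page in ("cp1252", "utf-8"):
        print(page, ansi_to_upper("café".encode(page), page))
```

In this model nothing in the wrapper cares how many bytes a character takes, which is the asker's point; the objections raised in the other answers concern code that inspects the bytes directly.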


无妨# 2024-09-11 06:23:52

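Michael Kaplan, an internationalization expert at Microsoft, tried to answer this question on his blog.

Basically, his explanation is that although the "ANSI" versions of the Windows API functions are meant to handle different code pages, historically there was an implicit expectation that a character encoding needs at most two bytes per code point. UTF-8 does not meet that expectation, and changing all of those functions now would require a massive amount of testing.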
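A rough sketch of how that two-byte expectation goes wrong on UTF-8 input. The function names are invented for this example, and naive_utf8_lead is a deliberate assumption: it treats every non-ASCII byte as a lead byte, which is what DBCS-style logic amounts to when it has no notion of longer sequences. The code page 932 lead-byte ranges are the standard Shift-JIS ones.

```python
from typing import Callable, List


def dbcs_split(data: bytes, is_lead_byte: Callable[[int], bool]) -> List[bytes]:
    """Split bytes the DBCS way: a lead byte owns exactly one trail byte."""
    pieces = []
    i = 0
    while i < len(data):
        step = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        pieces.append(data[i:i + step])
        i += step
    return pieces


def cp932_lead(byte: int) -> bool:
    """Lead-byte ranges of Shift-JIS / code page 932."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def naive_utf8_lead(byte: int) -> bool:
    """Demo assumption: every non-ASCII byte is treated as a lead byte."""
    return byte >= 0x80


if __name__ == "__main__":
    text = "日本"  # two characters
    print("cp932 pieces:", len(dbcs_split(text.encode("cp932"), cp932_lead)))
    print("utf-8 pieces:", len(dbcs_split(text.encode("utf-8"), naive_utf8_lead)))
```

With code page 932 the two characters come back as two pieces. With UTF-8 the same two characters are six bytes, and the two-byte walk cuts them into three pieces, none of which is a complete character.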


旧街凉风 2024-09-11 06:23:52

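The reason is exactly what was said in jamesdlin's answer and the comments below it: in Windows, MBCS is the same as DBCS, and some functions do not work with characters longer than 2 bytes.

Microsoft said that a UTF-8 locale might break some functions because they were written on the assumption that a multibyte encoding uses no more than 2 bytes per character. Code pages that need more bytes, such as UTF-8 (and also GB 18030, code page 54936), therefore could not be set as the locale.

https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8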
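The byte counts behind that statement are easy to check. The following is a small sketch using Python's built-in codecs as stand-ins for the Windows code pages; the sample characters and the helper names encoded_length and max_length are chosen for this illustration only.

```python
SAMPLES = ["A", "中", "😀"]
ENCODINGS = ["cp1252", "cp932", "utf-8", "gb18030"]


def encoded_length(char: str, encoding: str):
    """Return the number of bytes for char, or None if it cannot be encoded."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


def max_length(encoding: str) -> int:
    """Longest encoded form among the sample characters the encoding supports."""
    lengths = [encoded_length(c, encoding) for c in SAMPLES]
    return max(n for n in lengths if n is not None)


if __name__ == "__main__":
    for enc in ENCODINGS:
        print(enc, [encoded_length(c, enc) for c in SAMPLES], "max:", max_length(enc))
```

For these samples the single-byte and double-byte code pages stay within 1 and 2 bytes, while UTF-8 and GB 18030 both reach 4 bytes, which is what the two-byte assumption cannot accommodate.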

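So UTF-8 was allowed in functions like read/write, but not as a locale.

However, Microsoft has finally fixed that, so now we can use UTF-8 as a locale. In fact, MS has even started recommending the ANSI APIs (-A) again instead of the Unicode (-W) versions as before. There are some new options in MSVC, /execution-charset:utf-8 and /utf-8, to set the charset, or you can set the ActiveCodePage property in the appxmanifest of a UWP app.

Since Windows 10 insider build 17035, before those options were introduced, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox has also been available for setting the locale code page to UTF-8.

To open that dialog box, open the Start menu, type "region" and select Region settings > Additional date, time & regional settings > Change date, time, or number formats > Administrative.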

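After enabling it, you can call setlocale() to change to a UTF-8 locale:

Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.

Source: "UTF-8 Support" section of the Microsoft setlocale documentation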
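As a hedged sketch of that call: Python's locale.setlocale forwards the name to the C runtime's setlocale, so the ".utf8" name from the quote can be tried from Python as well. Whether it is accepted depends on the C runtime in use; that it works on a given Windows build is an assumption here, not something this thread states. The helper name try_utf8_locale is made up for this example.

```python
import locale


def try_utf8_locale():
    """Try the ".utf8" locale name; return the resulting locale string or None."""
    try:
        return locale.setlocale(locale.LC_ALL, ".utf8")
    except locale.Error:
        # The C runtime in use does not accept this locale name.
        return None


if __name__ == "__main__":
    name = try_utf8_locale()
    if name is None:
        print("'.utf8' was rejected by this C runtime")
    else:
        print("locale is now:", name)
```

Checking the return value matters in C too: setlocale returns a null pointer when the name is rejected, and the locale is left unchanged.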

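You can also use this on older Windows versions:

To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.

See also

Is it possible to set "locale" of a Windows application to UTF-8?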

