Why isn't UTF-8 allowed as the "ANSI" code page?

Posted on 2024-09-04 06:23:52

The Windows _setmbcp function allows any valid code page...

(except UTF-7 and UTF-8, which are not supported)

OK, not supporting UTF-7 makes sense: characters have non-unique representations, which introduces complexity and security risks.

But why not UTF-8?

As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?

Comments (4)

不打扰别人 2024-09-11 06:23:52

The "ANSI" code page is basically a legacy of the Windows 9x era. All modern software should be Unicode-based (that is, UTF-16) anyway.

When the ANSI code page machinery was originally designed, UTF-8 had not even been invented, so support for multi-byte encodings was rather haphazard: most ANSI code pages are single-byte, with the exception of some East Asian code pages, which use one or two bytes per character. Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.

抽个烟儿 2024-09-11 06:23:52

_setmbcp() is a VC++ runtime library (RTL) function, not a Win32 API function. It only affects how the RTL interprets strings and has no effect whatsoever on the Win32 API A functions. When they call their W counterparts internally, the A functions always use MultiByteToWideChar() and WideCharToMultiByte() with code page 0 (CP_ACP), that is, the system default ANSI code page, for the conversions.
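For comparison, the manual conversion the question mentions might look roughly like the sketch below, which passes CP_UTF8 instead of CP_ACP. The helper name utf8_to_utf16 is made up for illustration, and the non-Windows branch is only a stub so the snippet compiles elsewhere.

```c
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on failure, and always returns
   NULL on non-Windows builds. The caller frees the result with free(). */
wchar_t *utf8_to_utf16(const char *utf8)
{
#ifdef _WIN32
    /* First call: ask for the required length in wchar_t units,
       including the terminating NUL (because the input length is -1). */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *wide = malloc((size_t)n * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Second call: perform the actual conversion into the buffer. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8, -1, wide, n) <= 0) {
        free(wide);
        return NULL;
    }
    return wide;
#else
    (void)utf8;   /* stub: no Win32 API available here */
    return NULL;
#endif
}
```

The returned buffer would then be passed to the W variant of whatever API is needed, and any strings coming back would go through WideCharToMultiByte() with CP_UTF8 in the other direction.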

无妨# 2024-09-11 06:23:52

Michael Kaplan, an internationalization expert from Microsoft, tried to answer this on his blog.

Basically his explanation is that even though the "ANSI" versions of Windows API functions are meant to handle different code pages, historically there was an implicit expectation that character encodings would require at most two bytes per code point. UTF-8 doesn't meet that expectation, and changing all of those functions now would require a massive amount of testing.
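To make the two-byte limit concrete, here is a small portable sketch that reads the length of a UTF-8 sequence from its lead byte. The function names are made up for illustration, and the code only inspects lead bytes; it does not validate continuation bytes.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte b,
   or 0 if b cannot start a sequence (continuation or invalid byte). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;                       /* ASCII */
    if (b >= 0xC2 && b <= 0xDF)
        return 2;                       /* U+0080..U+07FF */
    if (b >= 0xE0 && b <= 0xEF)
        return 3;                       /* U+0800..U+FFFF */
    if (b >= 0xF0 && b <= 0xF4)
        return 4;                       /* U+10000..U+10FFFF */
    return 0;
}

/* Longest sequence announced by any lead byte in the string s. */
size_t utf8_max_seq_len(const char *s)
{
    size_t max = 0;
    for (; *s != '\0'; ++s) {
        size_t n = utf8_seq_len((unsigned char)*s);
        if (n > max)
            max = n;
    }
    return max;
}
```

Any text outside the U+0000..U+07FF range, which includes all CJK text, needs three or four bytes per code point, so code that assumes "lead byte plus at most one trail byte" cannot walk such strings correctly.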

旧街凉风 2024-09-11 06:23:52

The reason is exactly what jamesdlin's answer and the comments below it say: in Windows, MBCS effectively means DBCS, and some functions don't work with characters that are longer than 2 bytes.

Microsoft said that a UTF-8 locale might break some functions as they were written to assume multibyte encodings used no more than 2 bytes per character, thus code pages with more bytes such as UTF-8 (and also GB 18030, cp54936) could not be set as the locale.

https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8

So UTF-8 was allowed in functions like read/write, but not as a locale.


However, Microsoft has finally fixed that, so now we can use UTF-8 as a locale. In fact, Microsoft has even started recommending the ANSI APIs (-A) again instead of the Unicode (-W) versions. MSVC has options for setting the charset, /execution-charset:utf-8 and /utf-8, and you can also set the activeCodePage property in the appxmanifest of a UWP app.

Since Windows 10 Insider build 17035, before those options were introduced, there has also been a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox for setting the locale code page to UTF-8.

To open that dialog box, open the Start menu, type "region", and select Region settings > Additional date, time & regional settings > Change date, time, or number formats > Administrative.
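One way to check whether the setting (or a manifest) actually took effect for the current process is to compare GetACP() against CP_UTF8. This is a minimal sketch; the function name process_acp_is_utf8 is made up, and the non-Windows branch is only a stub so the snippet compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if the process ANSI code page is UTF-8
   (65001), 0 if it is some other code page, and -1 when built for a
   platform without the Win32 API. */
int process_acp_is_utf8(void)
{
#ifdef _WIN32
    return GetACP() == CP_UTF8 ? 1 : 0;
#else
    return -1;   /* stub: the notion of an ANSI code page is Windows-only */
#endif
}
```

When this returns 1, strings passed to the A functions are interpreted as UTF-8; when it returns 0, an explicit conversion to UTF-16 and a call to the W functions is still needed.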

After enabling it, you can call setlocale() to switch to a UTF-8 locale:

Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.

UTF-8 Support

You can also use this on older Windows versions:

To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
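A minimal sketch of the setlocale() call quoted above, assuming a C runtime that accepts the ".utf8" locale name (other runtimes may reject it, in which case the function reports failure). The function name is made up for illustration, and passing NULL as the destination of mbstowcs() to get a count is an extension that the Microsoft runtime and POSIX document, not something ISO C guarantees.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: switches the C runtime to a UTF-8 locale and
   returns how many wide characters the UTF-8 string s decodes to.
   Returns -1 if the runtime rejects the ".utf8" locale name or if s
   is not a valid multibyte string in that locale.
   Side effect: changes the locale for the whole program. */
long count_wide_chars_utf8(const char *s)
{
    if (setlocale(LC_ALL, ".utf8") == NULL)
        return -1;

    /* With a NULL destination, mbstowcs() only counts the wide
       characters, excluding the terminating NUL. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return -1;
    return (long)n;
}
```

On Windows, wchar_t is a UTF-16 code unit, so characters outside the Basic Multilingual Plane count as two.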

