What is the standard algorithm for converting Unicode characters to lowercase?

Posted on 2024-09-15 02:08:19

I want to know the standard algorithm for converting unicode characters into lowercase as proposed by unicode.org.

Also, do most programming languages follow this proposed standard?

Comments (3)

紧拥背影 2024-09-22 02:08:19

> I want to know the standard algorithm for converting unicode characters into lowercase as proposed by unicode.org.

The basic algorithm is simply to concatenate the lowercase form of each individual character, as given by the simple lowercase mapping in the penultimate column of UnicodeData.txt. Special rules, listed in SpecialCasing.txt, cover the remaining cases: multiple-character mappings (İ → i̇, that is, i followed by U+0307 COMBINING DOT ABOVE), conditional mappings (Σ → ς at the end of a word, but σ otherwise), and language-sensitive rules (in Turkish and Azerbaijani, I lowercases to dotless ı).
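As a rough illustration of the basic (simple-mapping) step only, the sketch below reads the penultimate field of a few records in the UnicodeData.txt layout and applies it character by character. The names `SAMPLE_UNICODE_DATA`, `load_simple_lowercase`, and `simple_lower` are made up for this example, and only three records are embedded; a real implementation would read the full file and would also need the special rules described above.

```python
# Three sample records in the UnicodeData.txt layout: 15 semicolon-separated
# fields, with the simple lowercase mapping in the penultimate one.
SAMPLE_UNICODE_DATA = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
03A3;GREEK CAPITAL LETTER SIGMA;Lu;0;L;;;;;N;;;;03C3;
"""


def load_simple_lowercase(data):
    """Build {character: simple lowercase character} from UnicodeData-style text."""
    mapping = {}
    for line in data.splitlines():
        fields = line.split(";")
        lower_hex = fields[-2]  # penultimate field: simple lowercase mapping
        if lower_hex:  # empty means the character has no simple lowercase mapping
            mapping[chr(int(fields[0], 16))] = chr(int(lower_hex, 16))
    return mapping


def simple_lower(text, mapping):
    """Lowercase each character independently and concatenate the results."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Hypothetical module-level table built from the three sample records.
SIMPLE_LOWER = load_simple_lowercase(SAMPLE_UNICODE_DATA)
```

With this table, `simple_lower("A\u03a3", SIMPLE_LOWER)` yields `"a\u03c3"`. Note that the simple mapping always turns Σ into σ and İ into a plain i; producing ς at the end of a word, or i plus U+0307, is exactly what the special rules add.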

> Also, do most programming languages follow this proposed standard?

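Java does: String.toLowerCase() applies the special rules, including the language-sensitive ones for the given locale. Python's str.lower() originally implemented only the basic rules; since Python 3.3 it also applies the multiple-character and final-sigma mappings, but still not the language-sensitive rules. And C has no standardized Unicode case-mapping support at all.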
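How a given language behaves is easy to probe with the three special cases mentioned earlier. The snippet below is one such check, assuming Python 3.3 or later (older versions behave differently); the variable names are only for illustration.

```python
# Multiple-character mapping: U+0130 (İ) lowercases to two code points.
dotted_i = "\u0130".lower()
print([f"U+{ord(c):04X}" for c in dotted_i])  # ['U+0069', 'U+0307'] on Python 3.3+

# Conditional mapping: a capital sigma at the end of a word becomes ς (U+03C2).
word = "\u039f\u0394\u039f\u03a3".lower()  # ΟΔΟΣ
print(word[-1] == "\u03c2")  # True on Python 3.3+

# Language-sensitive rule: str.lower() takes no locale, so "I" always becomes
# "i", never the Turkish dotless ı (U+0131).
plain_i = "I".lower()
print(plain_i == "i")  # True
```

For Turkish or Azerbaijani text, a locale-aware library is needed on top of `str.lower()`.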

青芜 2024-09-22 02:08:19

.NET has Unicode support and offers built-in functions for converting between upper and lower case (for example, String.ToLower and String.ToUpper). This is probably true of some other languages as well.

一紙繁鸢 2024-09-22 02:08:19

Programming languages vary in how well they support Unicode. Most do not have Unicode characters as a built-in type. Typically Unicode is handled either in a library or by OS calls.

For instance, C++ has locale support in its standard library (which is defined as part of the language) but, until C++11 added char16_t and char32_t, had no native Unicode character type. Ada does have a native type, Wide_Character, as well as library support for manipulating it.
