Unicode lowercase characters?

Posted on 2024-07-22 18:57:08

I read somewhere that, in Unicode, there are characters other than A-Z that have a lowercase equivalent. Which characters are these, and why would any other character need an upper and lower case?

Comments (5)

勿忘初心 2024-07-29 18:57:08

The English language, and even that strange variant, American English :-) , is not the only language on the planet. There are some very strange looking ones (at least to those familiar with the Latin-based characters) but even Latin-based ones have minor variations.

Two that I am acquainted with on more than a casual basis are Greek and German:

Αα Ββ Γγ Δδ Εε Ζζ  Ηη Θθ Ιι Κκ Λλ Μμ
Νν Ξξ Οο Ππ Ρρ Σσς Ττ Υυ Φφ Χχ Ψψ Ωω

Aa Ää Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn
Oo Öö Pp Qq Rr Ss ß  Tt Uu Üü Vv Ww Xx Yy Zz

That's why we're not allowed to use bits of code like:

char lower = upper - 'A' + 'a';

any more. Doing something like that in a company that takes i18n seriously is near grounds for dismissal. Using Unicode-aware toLower()/toUpper()-type functions is the better way to go.
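
To illustrate why that arithmetic breaks outside A-Z, here is a minimal Python sketch. `ascii_lower` is a hypothetical helper that mirrors the `upper - 'A' + 'a'` trick, `str.lower()` is Python's built-in Unicode-aware mapping, and the sample letters are chosen only for illustration.

```python
def ascii_lower(ch: str) -> str:
    """Hypothetical helper: the ASCII-only trick `upper - 'A' + 'a'`."""
    return chr(ord(ch) - ord("A") + ord("a"))


# (description, uppercase letter) pairs chosen for illustration.
samples = [
    ("Latin A", "A"),
    ("Latin O with double acute", "\u0150"),  # Ő
    ("Greek Alpha with tonos", "\u0386"),     # Ά
]

for name, upper in samples:
    naive = ascii_lower(upper)
    correct = upper.lower()  # built-in, Unicode-aware
    status = "ok" if naive == correct else "WRONG"
    print(f"{name}: naive={naive!r} unicode={correct!r} {status}")
```

The trick only works where the upper and lower case forms happen to sit exactly 32 code points apart. In this sketch the naive version turns Ő into Ű and Ά into Φ, which are different letters altogether.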

红颜悴 2024-07-29 18:57:08

There are a lot of alphabets other than the usual Latin-derived western European alphabet most of us are used to seeing here. To start with, you'd need uppercase and lowercase versions of accented letters and ligatures, like Àà, Ĳĳ, and so on. There are also the fullwidth versions of Latin characters used when setting documents in Asian languages (which I'm too lazy to look up). Further, there are the other alphabets in use nowadays, like the Cyrillic (Бб) and Greek (Δδ) alphabets.
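
As a rough check of the kinds of letters listed above, the following Python sketch lowercases one capital from each group (accented Latin, the IJ ligature, fullwidth Latin, Cyrillic, Greek) and prints the Unicode names on both sides. The particular sample characters are illustrative picks, not an exhaustive list.

```python
import unicodedata

# One uppercase letter per group mentioned above (illustrative picks):
# À (accented), Ĳ (ligature), Ａ (fullwidth), Б (Cyrillic), Δ (Greek)
samples = ["\u00c0", "\u0132", "\uff21", "\u0411", "\u0394"]

# Map each uppercase letter to its lowercase equivalent.
pairs = {ch: ch.lower() for ch in samples}

for upper, lower in pairs.items():
    print(upper, lower, unicodedata.name(upper), "->", unicodedata.name(lower))
```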

There's also Turkish, which is just kind of difficult according to Jeff Atwood. Using the uppercasing/lowercasing functions provided by your environment is (usually) the way to go with user-input data.
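
The Turkish difficulty comes from the dotted and dotless i, which pair up differently than in English (i/İ and ı/I). Python's built-in `str.upper()` and `str.lower()` do not consult the locale, so the sketch below hand-rolls the two special cases. `turkish_upper` and `turkish_lower` are hypothetical helpers for illustration only; a locale-aware library would normally be the better choice.

```python
def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using the Turkish rules for i."""
    # Dotted i maps to dotted capital İ (U+0130); str.upper() already
    # maps dotless ı (U+0131) to plain I.
    return text.replace("i", "\u0130").upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish rules for I."""
    # Capital İ maps to i, and plain capital I maps to dotless ı.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


print("istanbul".upper())         # locale-independent built-in result
print(turkish_upper("istanbul"))  # result under Turkish rules
print(turkish_lower("ISPARTA"))
```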

完美的未来在梦里 2024-07-29 18:57:08

An uppercase ß is not needed in the German language because the letter is never used as the first letter of a name or a word. For the rest, in some languages (French?) uppercase accented characters are not used, just the non-accented variant.
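
Even if ß never starts a word, text set in all capitals still has to do something with it. As a small sketch of what Python's built-in mappings do (other environments may behave differently), uppercasing expands ß to two letters, so the string grows by one character. Unicode also encodes a capital sharp s at U+1E9E, which lowercases back to ß.

```python
word = "stra\u00dfe"  # straße

# Uppercasing expands the single letter ß into "SS".
upper = word.upper()
print(upper, len(word), len(upper))

# U+1E9E is the capital sharp s that Unicode also encodes.
capital_sharp_s = "\u1e9e"
print(capital_sharp_s.lower() == "\u00df")

# casefold() is intended for caseless comparison of strings.
same_ignoring_case = word.casefold() == "STRASSE".casefold()
print(same_ignoring_case)
```

This is one more reason a per-character conversion such as `char lower = upper - 'A' + 'a';` cannot be right in general: the result may not even have the same length as the input.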

森末i 2024-07-29 18:57:08

in some languages (French?) uppercase accented characters are not used (...)
[Reiner Bakels - Dec 10 '12 at 19:34]

Well, yes... but no!

In the good ol' times of manual "font" page making, that used to be true.
Since an accented uppercase letter ("É" for example) would rise too high on a line, the usual practice was to ignore the accent and just display "E" instead. So "des études" commonly appeared as "DES ETUDES" (without the accent).

But that is not recommended anymore. Whenever one can edit/type/publish the accented capital letters, we are invited to do so. Quebec's very official "Office de la langue française" has actually been promoting this for more than two decades!

That is becoming especially crucial in our era of computers and the Web, where texts are more and more often processed (read & translated) by machines. Omitting accents can entirely change the meaning:
tache (stain) -vs- tâche (task), du (of the) -vs- dû (something you have to pay), and many, many more words.
Continuing to omit the accents on uppercase letters is definitely not a good idea (even though it is a century-old legacy). Using them wherever it is now possible is a far better practice.
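
The loss of information is easy to reproduce. In the Python sketch below, `strip_accents` is a hypothetical helper that imitates the old typesetting habit by decomposing the text and dropping the combining marks. It is only an illustration, not a recommended way to process French text.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


kept = "t\u00e2che".upper()    # tâche uppercased with its accent
dropped = strip_accents(kept)  # the same word with the accent omitted

# Once the accent is gone, "task" and "stain" can no longer be told apart.
collides = dropped == "tache".upper()
print(kept, dropped, collides)

title = strip_accents("des \u00e9tudes".upper())
print(title)
```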

红颜悴 2024-07-29 18:57:08

Any letter with an accent could potentially have a different code point of its own, or be a combination of more than one code point. For example, ÂËÕÝ are uppercase characters with lowercase equivalents.

The key is to implement the standards faithfully with respect to your users' locale settings, or get the same effect by using system libraries that handle the general case of toupper()/tolower() correctly.
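
To make the "one code point or several" point concrete, here is a small Python sketch using the standard `unicodedata` module. It builds Ë both ways, shows that the two spellings are different strings until they are normalized, and checks that lowercasing works for either form. The variable names are illustrative.

```python
import unicodedata

precomposed = "\u00cb"  # Ë as a single code point
combining = "E\u0308"   # E followed by COMBINING DIAERESIS

# The two spellings look alike but are different code point sequences.
print(precomposed == combining)

# NFC normalization composes the pair into the single code point.
nfc = unicodedata.normalize("NFC", combining)
print(nfc == precomposed)

# Lowercasing works on either form; normalize before comparing results.
lower_combining = combining.lower()
lower_nfc = unicodedata.normalize("NFC", lower_combining)
print(lower_nfc == precomposed.lower())
```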
