What is the Unicode glyph used for indicating combining characters?

Posted 2024-08-20 16:01:44

My application needs to display "orphaned" combining characters. I would like to use the same format as the "official" Unicode charts, using the dotted circle placeholder. See, for example:

A quick scan through the charts and I came up with U+25CC "DOTTED CIRCLE". That looks good, but the note on this character reads:

note that the reference glyph for this character is intentionally larger than the dotted circle glyph used to indicate combining characters in this standard; see, for example, 0300

Which says (I think) that U+25CC is not the correct character. (Or, if it is, perhaps just a poorly worded note.)

So: if the dotted circle used on the "Combining Diacritical Marks" is not U+25CC, what is the correct code for that little booger?

I have tried:

  • Copying the text from the PDF and inspecting it, but the copy is disabled in the PDF.
  • Emailing it to myself in Gmail and then viewing the attachment as HTML, but there it gets converted to U+0024 ("DOLLAR SIGN"). That means either the conversion failed or they are just playing some font rendering games in the PDF.

[Clarification] I realize that U+25CC looks OK (assuming one's font supports it), but it sounds like the spec says that this is the wrong character. Many Unicode characters have similar glyphs but are different characters, semantically speaking. "Latin Capital Letter A" (U+0041) and "Greek Capital Letter Alpha" (U+0391) will look identical in most fonts, but they have different semantic meanings and are not interchangeable.
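
To make that distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. It prints the code point, official name, and general category of a character, which shows that two look-alike glyphs can still be separate characters.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"


if __name__ == "__main__":
    # Latin A, Greek Alpha, the dotted circle, and a combining grave accent
    for ch in ("\u0041", "\u0391", "\u25CC", "\u0300"):
        print(describe(ch))
```

The Latin and Greek capitals share a category but have different code points and names, which is the kind of difference I am worried about with the dotted circle.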

Comments (2)

锦欢 2024-08-27 16:01:44

I don't think there is an official placeholder character. The way I read that note, they chose U+25CC arbitrarily, purely for display purposes. Then, in the chart where the "real" dotted circle is listed, they made it a little larger to emphasize that it's not being used as a placeholder there. (Or maybe they shrank it in the other charts; as you said, the note is poorly worded.)

Whatever the case, I don't see any reason not to use U+25CC as your placeholder.
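
If you go that route, something like the following might work. It is only a sketch under a few assumptions of mine: the function name `show_orphaned_marks` is hypothetical, "combining character" is taken to mean any code point whose general category starts with `M`, and whitespace or control characters are treated as not being a usable base. It inserts U+25CC in front of any combining mark that has nothing to attach to.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def show_orphaned_marks(text: str) -> str:
    """Insert U+25CC before each combining mark that has no base character."""
    out = []
    has_base = False  # True once the current position has something to attach to
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M") and not has_base:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        # Assumption: whitespace and control/format characters are not bases.
        # A mark counts as "has a base" because it sits on a base or on the
        # placeholder that was just inserted.
        has_base = not (ch.isspace() or category.startswith("C"))
    return "".join(out)


if __name__ == "__main__":
    print(show_orphaned_marks("\u0300 a\u0301 \u0303"))
```

Whether a space should count as a base is a judgment call; drop the `ch.isspace()` check if you would rather leave space + mark sequences alone.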

哭泣的笑容 2024-08-27 16:01:44

Just tried this: create a blank .html file, paste in the text below, and load it in Firefox. It displays as expected (although I really didn't expect space + combining character to display correctly):

<html>
<body>
<font size="24pt">
◌̀
◌́
◌̂
◌̃
<br/>
À
Á
Â
Ã
<br/>
 ̀
 ́
 ̂
 ̃
</font>
</body>
</html>
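
Combining characters are easy to mangle when copying and pasting, so here is a rough Python sketch that generates an equivalent test page using numeric character references instead of literal characters. The names `MARKS`, `ncr`, `build_rows`, and `build_page` are made up for this example. I am also assuming the middle row is "A" followed by each mark, and I swapped the `font` tag for a styled `div`.

```python
MARKS = [0x0300, 0x0301, 0x0302, 0x0303]  # grave, acute, circumflex, tilde
BASES = [0x25CC, 0x0041, 0x0020]  # dotted circle, Latin A (assumed), space


def ncr(code_point: int) -> str:
    """Format a code point as an HTML hexadecimal character reference."""
    return f"&#x{code_point:04X};"


def build_rows():
    """Return one line per base, pairing that base with every mark."""
    return [" ".join(ncr(base) + ncr(mark) for mark in MARKS) for base in BASES]


def build_page() -> str:
    """Assemble the three rows into a small HTML test page."""
    body = "\n<br/>\n".join(build_rows())
    return (
        "<html>\n<body>\n"
        '<div style="font-size: 24pt">\n'
        f"{body}\n"
        "</div>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_page())
```

Redirect the output to a .html file and open it in a browser to compare the three rows.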