Finding the "actual" characters (graphemes) in a QString

Posted 2024-12-13 14:15:47

Let's say I have a QString that may consist of any Unicode characters, and I want to iterate through its characters or count them. And by "characters" I mean what the user perceives as such (so roughly equivalent to "glyphs") and not simply QChars (16-bit UTF-16 code units). Some "actual" characters are built of several QChars (surrogate pairs; base character + combining marks). For some combining characters I might get away with normalizing the string to create composite characters, but that does not always help.

Have I overlooked a built-in function that splits a QString into "actual" characters?

Or if I have to parse it myself, is this the structure (in EBNF) or am I missing something?

character = ((high_surrogate, low_surrogate) | base_character), {combining_mark}

(with base_character being every QChar that is not a surrogate or combining character)

Comments (2)

隱形的亼 2024-12-20 14:15:47

After more research I found the term for "actual character", grapheme, and with it the Qt class for finding grapheme boundaries: QTextBoundaryFinder.

最终幸福 2024-12-20 14:15:47

I am not sure about the combining marks, but for the surrogate pairs, I think you can use QString::toUcs4() which should return a 32-bit Unicode representation of your string.
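
To illustrate what a conversion to 32-bit code points does and does not solve, here is a rough standard-library sketch. It is not Qt's implementation: toCodePoints is a hypothetical helper, and passing unpaired surrogates through unchanged is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into Unicode code points.
// Each valid surrogate pair becomes one code point; an unpaired surrogate
// is copied through unchanged (an assumption made for this sketch).
std::u32string toCodePoints(const std::u16string &utf16)
{
    std::u32string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t unit = utf16[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < utf16.size()) {
            const char32_t next = utf16[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Combine the high and low surrogate into one code point.
                unit = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        out.push_back(unit);
    }
    return out;
}
```

For the five code units of "e", U+0301, U+1D11E, "a", this gives four code points: the surrogate pair collapses into one entry, but the combining mark U+0301 stays separate from its base "e". Counting code points is therefore still not the same as counting perceived characters.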
