icu4c: characters missing after shaping with ushape.c?

Posted on 2024-09-26 08:20:50

In our language we write with Arabic characters, with some differences.
ICU's ushape.c (the Arabic shaper) only handles the main Arabic characters and doesn't shape my language's specific characters (e.g. 0x6D5). I changed ushape.c to work with my language, and it works well except for one character, 0x649: in Arabic it has only 2 shapes, but in my language it has 4.

I changed line 183 from

1                + 256 * 0x7F,/*0x0649*/

to

1+2+8             + 256 * 0x98 /*0x649*/
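
The arithmetic behind these two entries can be sketched roughly as follows. This rests on assumptions about ushape.c internals that are not stated in this thread: that the low bits are link flags, that flag 8 selects the U+FB50 presentation block instead of U+FE70, and that the value multiplied by 256 is an offset into that block. The names `form_base` and `FLAG_FB50_BLOCK` are hypothetical and are not part of ICU.

```c
#include <stdint.h>

/* Hypothetical name. Assumption: when this flag is set, the entry's offset
 * is relative to U+FB50; otherwise it is relative to U+FE70. */
#define FLAG_FB50_BLOCK 8u

/* Return the first presentation-form code point that a table entry
 * would select under the assumptions above. */
static uint16_t form_base(uint16_t entry)
{
    uint16_t block = (entry & FLAG_FB50_BLOCK) ? 0xFB50u : 0xFE70u;
    return (uint16_t)(block + (entry >> 8));
}
```

Under that reading, the original entry starts at U+FEEF and the modified one at U+FBE8. If the shaper then adds a form index (0 to 3) to this base, the modified entry would cover U+FBE8 through U+FBEB, which is worth checking against the Unicode charts.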

and changed line 121 from

static const UChar yehHamzaToYeh[] =
{
/* isolated*/ 0xFEEF,
/* final   */ 0xFEF0
};

to

static const UChar yehHamzaToYeh[] =
{
/* isolated*/ 0xFEEF,
              0xFBE8, /* my language specific */
              0xFBE9, /* my language specific */
/* final   */ 0xFEF0
};

(both in ushape.c)

Now it produces 3 shapes without problems (initial, isolated, and final), but the medial shape is displayed as a square (missing character).

I tried replacing "* 0x98" with other numbers, but this is the best I could get.

What should I do?

Comments (2)

满天都是小星星 2024-10-03 08:20:50

Uighur? I have discussed Uighur rendering with a couple of people, not this particular issue but in general.

When you said you get a square, what Unicode character do you get?

What you really should do is to file a bug with ICU and discuss it there. This is a feature request, not a usage question.

My rusty recollection is that Uighur makes different use of shaping, so you will basically want a different mode on the shaper.

安静被遗忘 2024-10-03 08:20:50

ICU indeed seems to have problems for shaping with some languages, e.g. Urdu.

Your specific character, U+0649, is however probably not the character that you are looking for.

U+0649 is alef maksura, which looks identical to Farsi yeh (U+06CC); the latter is shaped properly by ICU.

They do have different presentation forms:
Alef maksura only has isolated and final forms: U+FEEF U+FEF0
Farsi yeh has all four forms: U+FBFC U+FBFD U+FBFE U+FBFF
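
To make the difference concrete, here is a rough sketch of the two mappings side by side. It assumes the usual isolated, final, initial, medial ordering and uses U+FBE8 and U+FBE9 (the values from the question; Unicode names them the Uighur Kazakh Kirghiz alef maksura initial and medial forms) for the two extra shapes. The enum and both functions are hypothetical helpers for illustration, not ICU API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ordering of contextual forms. */
enum form { FORM_ISOLATED = 0, FORM_FINAL = 1, FORM_INITIAL = 2, FORM_MEDIAL = 3 };

/* U+0649: the four forms live in two different blocks, so they need an
 * explicit table rather than a single base plus form index. */
static uint16_t alef_maksura_form(enum form f)
{
    static const uint16_t forms[4] = {
        0xFEEF, /* isolated */
        0xFEF0, /* final */
        0xFBE8, /* initial (Uighur Kazakh Kirghiz alef maksura) */
        0xFBE9  /* medial  (Uighur Kazakh Kirghiz alef maksura) */
    };
    assert((int)f >= FORM_ISOLATED && (int)f <= FORM_MEDIAL);
    return forms[f];
}

/* U+06CC: the four forms are consecutive, so base plus index is enough. */
static uint16_t farsi_yeh_form(enum form f)
{
    assert((int)f >= FORM_ISOLATED && (int)f <= FORM_MEDIAL);
    return (uint16_t)(0xFBFC + (int)f);
}
```

The point of the sketch is only that U+0649 cannot be described by one contiguous range, whereas U+06CC can; how best to wire such a special case into the shaper is a separate question.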
