Arabic: "source" Unicode to final display Unicode
Simple question:
This is the final display string I am looking for:
لعبة ديدة
Below are the separate characters before they are 'glued' together (I've put a space between each of them to stop the joining):
ل ع ب ة د ي د ة
Note how they are NOT the same characters: there is some magical transform that melds them together and converts them into new Unicode characters.
On top of that, the characters actually appear right to left (in memory, they are left to right).
So my simple question is this: where do I get a platform-independent C/C++ function that will take my source 16-bit Unicode string and transform it into the Unicode string that produces the first one quoted above? That is, one that does the RTL conversion and the joining.
That's all I want: one function that does that.
UPDATE:
OK, yes, I know that the 'characters' are the same in the two examples above; they are the same 'letters'. But (viewing in Chrome or the latest IE) anyone can CLEARLY see that the glyphs are different. Now I'm fairly confident that this transform can be done at the Unicode level, because my font file and the Unicode standard seem to specify different glyphs for the separate and the various joined versions of the characters/letters (unicode.org/charts/PDF/UFB50.pdf, unicode.org/charts/PDF/UFE70.pdf).
So, can I just put my Unicode into a function and get the transformed Unicode out?
The joining and RTL conversion don't happen at the level of Unicode characters.
In other words: the order of the characters and the actual Unicode code points are not changed during this process.
In fact, the joining and the handling of RTL/LTR transitions are done by the text rendering engine.
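To check that the stored code points really are unchanged, here is a small C sketch (the function name `equal_ignoring_spaces` is made up for illustration) that compares the space-separated letters from the question with the joined word while skipping U+0020. It assumes BMP-only UTF-16 input, which covers the Arabic block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare two UTF-16 strings, ignoring U+0020 SPACE.
   Code units are compared one by one; surrogate pairs are not special-cased,
   which is fine for Arabic letters (U+0600..U+06FF). */
bool equal_ignoring_spaces(const uint16_t *a, size_t alen,
                           const uint16_t *b, size_t blen)
{
    assert((a != NULL || alen == 0) && (b != NULL || blen == 0));
    size_t i = 0, j = 0;
    for (;;) {
        while (i < alen && a[i] == 0x0020) i++;   /* skip spaces in a */
        while (j < blen && b[j] == 0x0020) j++;   /* skip spaces in b */
        if (i == alen || j == blen)
            return i == alen && j == blen;        /* equal only if both ended */
        if (a[i++] != b[j++])
            return false;
    }
}
```

Called with `{0x0644, 0x0020, 0x0639, 0x0020, 0x0628, 0x0020, 0x0629}` ("ل ع ب ة") and `{0x0644, 0x0639, 0x0628, 0x0629}` ("لعبة"), it returns true: both hold the same letters in the same logical order, and only the glyphs the renderer picks differ.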
This quote from the Wikipedia article on the Arabic alphabet explains it quite nicely:
The processing you're looking for is called ligature formation. Unlike many Latin-based languages, where you can simply put one character after another to render the text, ligatures are fundamental in Arabic. The substitution is done in the text rendering engine, and the ligature information is generally stored in font files.
For an Arabic reader they are the same; the text is still readable.
There is no transform to apply to your 16-bit Unicode source text. You must provide the whole string to your text renderer. In C/C++, since you are going the platform-independent way, you can use Pango for rendering.
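Pango takes its text as UTF-8 (e.g. `pango_layout_set_text()`), so the only thing a 16-bit source string needs before reaching the renderer is an encoding conversion, not a shaping step. GLib's `g_utf16_to_utf8()` does this in practice; the sketch below is a self-contained stand-in, and its name `utf16_to_utf8` and error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-16 to NUL-terminated UTF-8 so the whole
   string can be handed, unchanged, to a renderer such as Pango.
   Returns the number of bytes written (excluding the NUL), or (size_t)-1
   on a malformed surrogate or if out (capacity outcap) is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t inlen, char *out, size_t outcap)
{
    assert(out != NULL && outcap > 0);
    size_t o = 0;
    for (size_t i = 0; i < inlen; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {             /* high surrogate */
            if (i + 1 >= inlen || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {      /* lone low surrogate */
            return (size_t)-1;
        }
        unsigned char buf[4];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp; n = 1;
        } else if (cp < 0x800) {                        /* Arabic letters land here */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F)); n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F)); n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F)); n = 4;
        }
        if (o + n >= outcap)                            /* keep room for the NUL */
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            out[o++] = (char)buf[k];
    }
    out[o] = '\0';
    return o;
}
```

The resulting buffer can then be passed as-is, for example with `pango_layout_set_text(layout, buf, -1)`, and Pango does the joining and RTL layout itself.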
Note: perhaps you wanted to write لعبة جديدة (i.e. "new game")? What you gave as an example has no meaning in Arabic.
I realise this is an old question, but what you're looking for is FriBidi, the GNU implementation of the Unicode bidirectional algorithm.
This library does the glyph selection asked about in the question, as well as handling bidirectional text (a mixture of right-to-left and left-to-right text).
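For intuition about the reordering half, here is a deliberately tiny C sketch of the trivial case only; it is not FriBidi's API, and `reverse_rtl_line` is a hypothetical name. It assumes the whole line is a single right-to-left run (Arabic letters and spaces only, every character at embedding level 1), in which case rule L2 of the Unicode Bidirectional Algorithm reduces to reversing the line. Anything else (digits, Latin text, mirrored brackets, surrogate pairs) needs the full algorithm, e.g. FriBidi's `fribidi_log2vis()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT FriBidi: logical-to-visual reordering for a line
   that is one pure RTL run (all characters at embedding level 1, BMP only).
   Under that assumption, UBA rule L2 is just a reversal of the code units.
   No bracket mirroring (rule L4) is done. */
void reverse_rtl_line(uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        uint16_t tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Joining forms must be chosen on the logical order, so in a pipeline like this the glyph selection happens first and the reversal last.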
What you are looking for is an Arabic script synthesis algorithm. I'm not aware of one that exists as open source. If you find one, please post it.
Some points:
At the storage level, there is no Unicode transform. As other answers point out, the string has an abstract representation.
At the rendering level, you could choose to use Unicode Presentation Forms, but you could also use other forms. Unicode Presentation Forms are not a standard for what the presentation output encoding should be; rather, they are just one example of presentation codes that a rendering engine can output using script synthesis.
To make it clearer: there is no single standard transform (i.e. synthesis algorithm) from A to B, where A is text in the standard Unicode Arabic block and B is text in the standard Unicode Arabic Presentation Forms. Rather, there are different transformations, which vary in complexity and can use different encoding systems for B; the Unicode Presentation Forms are just one encoding that can be used for B.
For example, a simple typewriter style needs only a simple rendering algorithm that does not require Presentation Forms. Indeed, there are modern writing styles (not in common use, though) where A and B are actually identical, and only a different font page is used for rendering. On the other hand, the transform for typeset or traditional calligraphic forms would be more complex and would require something similar to the Unicode Presentation Forms.
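As an illustration of a simple A-to-B transform of this kind, here is a C sketch that maps base Arabic letters to Arabic Presentation Forms-B according to their neighbours. It is an assumption-laden toy, not a reference implementation: the table covers only a few letters (including those in the question's example), and harakat, tatweel, ZWJ/ZWNJ and the mandatory lam-alef ligatures are ignored (the full joining rules are in Unicode's ArabicShaping.txt). The names `ARABIC_FORMS`, `lookup` and `shape_arabic_sketch` are hypothetical. Output stays in logical order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset table: base letter, its isolated form in Presentation
   Forms-B, and whether it is dual-joining. Forms follow each other as
   isolated, final, initial, medial (dual) or isolated, final (right-joining). */
typedef struct { uint16_t base, isolated; bool dual; } ArabicForms;

static const ArabicForms ARABIC_FORMS[] = {
    {0x0627, 0xFE8D, false}, /* ALEF */        {0x0628, 0xFE8F, true}, /* BEH */
    {0x0629, 0xFE93, false}, /* TEH MARBUTA */ {0x062A, 0xFE95, true}, /* TEH */
    {0x062C, 0xFE9D, true},  /* JEEM */        {0x062F, 0xFEA9, false}, /* DAL */
    {0x0631, 0xFEAD, false}, /* REH */         {0x0633, 0xFEB1, true}, /* SEEN */
    {0x0639, 0xFEC9, true},  /* AIN */         {0x0644, 0xFEDD, true}, /* LAM */
    {0x0645, 0xFEE1, true},  /* MEEM */        {0x0646, 0xFEE5, true}, /* NOON */
    {0x0648, 0xFEED, false}, /* WAW */         {0x064A, 0xFEF1, true}, /* YEH */
};

static const ArabicForms *lookup(uint16_t c)
{
    for (size_t i = 0; i < sizeof ARABIC_FORMS / sizeof ARABIC_FORMS[0]; i++)
        if (ARABIC_FORMS[i].base == c)
            return &ARABIC_FORMS[i];
    return NULL;
}

/* Replace each known letter in s (logical order) with the presentation form
   implied by its neighbours; unknown characters (including spaces) break the
   joining and are copied unchanged. out must not alias s. */
void shape_arabic_sketch(const uint16_t *s, size_t len, uint16_t *out)
{
    assert(len == 0 || (s != NULL && out != NULL && out != s));
    for (size_t i = 0; i < len; i++) {
        const ArabicForms *cur = lookup(s[i]);
        if (cur == NULL) { out[i] = s[i]; continue; }
        const ArabicForms *prev = i > 0 ? lookup(s[i - 1]) : NULL;
        const ArabicForms *next = i + 1 < len ? lookup(s[i + 1]) : NULL;
        /* Joins the previous letter only if that letter joins forward (dual). */
        bool join_prev = prev != NULL && prev->dual;
        /* Only dual-joining letters join the following letter. */
        bool join_next = cur->dual && next != NULL;
        unsigned offset;
        if (join_prev && join_next) offset = 3;      /* medial   */
        else if (join_next)         offset = 2;      /* initial  */
        else if (join_prev)         offset = 1;      /* final    */
        else                        offset = 0;      /* isolated */
        out[i] = (uint16_t)(cur->isolated + offset);
    }
}
```

For the question's string "لعبة ديدة" this yields U+FEDF U+FECC U+FE92 U+FE94, U+0020, U+FEA9 U+FEF3 U+FEAA U+FE93. Note how DAL, being right-joining, forces the following YEH into its initial form and the final TEH MARBUTA into its isolated form.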
Here are a couple of pointers for more information on the topic:
Please see http://www.fileformat.info/info/unicode/block/arabic_presentation_forms_b/list.htm and have a look at this repo: https://github.com/Accorpa/Arabic-Converter-From-and-To-Arabic-Presentation-Forms-B