Concatenating strings that contain both Arabic and Western characters
I'm trying to concatenate several strings that contain both Arabic and Western characters (mixed within the same string). The problem is that the result is a String that is most likely semantically correct, but different from what I want, because the order of the characters is altered by the Unicode Bidirectional Algorithm. Basically, I just want to concatenate them as if they were all LTR, ignoring the fact that some are RTL: a sort of "agnostic" concatenation.
I'm not sure if I was clear in my explanation, but I don't think I can do it any better.
Hope someone can help me.
Kind regards,
Carlos Ferreira
BTW, the strings are being obtained from the database.
EDIT
The first 2 Strings are the strings I want to concatenate and the third is the result.
EDIT 2
Actually, the concatenated String is a little different from the one in the image because it got altered during the copy+paste: the 1 is after the first A, not immediately before the second A.
Comments (3)
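You can embed bidi regions using Unicode format control code points: LEFT-TO-RIGHT EMBEDDING (U+202A), RIGHT-TO-LEFT EMBEDDING (U+202B), and POP DIRECTIONAL FORMATTING (U+202C), which ends the most recent embedding.
So in Java, to embed an RTL language like Arabic in an LTR language like English, you would put U+202B before the Arabic text and U+202C after it,
and to do the reverse, you would put U+202A before the English text and U+202C after it.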
See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.
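A minimal Java sketch of this kind of embedding might look like the following. The class and method names (`BidiEmbedding`, `embedRtl`, `embedLtr`) are made up for illustration and are not part of any library; only the three code points come from the Unicode standard.

```java
final class BidiEmbedding {
    // Unicode directional formatting characters for explicit embeddings.
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps right-to-left text so it can sit inside left-to-right text.
    static String embedRtl(String rtlText) {
        return RLE + rtlText + PDF;
    }

    // Wraps left-to-right text so it can sit inside right-to-left text.
    static String embedLtr(String ltrText) {
        return LRE + ltrText + PDF;
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // sample Arabic word
        // English sentence with an embedded Arabic region.
        String mixed = "Before " + embedRtl(arabic) + " after";
        // The stored order is unchanged; only two invisible marks were added.
        System.out.println(mixed.length());
    }
}
```

The wrapped strings contain two extra invisible characters each, so they should probably be built only for display and not written back to the database.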
It's very likely that you need to insert Unicode directional formatting codes into your string to get it to display correctly. For details, see Directional Formatting Codes in the Unicode Bidirectional Algorithm specification.
Maybe the Bidi class can help you in determining the correct sequence, as it implements the Unicode Bidirectional Algorithm.
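As a rough illustration of what `java.text.Bidi` reports, the sketch below lists the directional runs it finds in a string. The helper class `BidiRuns` and its output format are hypothetical; the `Bidi` constructor and the `getRunCount`, `getRunStart`, `getRunLimit`, and `getRunLevel` methods are the standard JDK ones.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

final class BidiRuns {
    // Returns one entry per directional run, e.g. "LTR[0,3)".
    static List<String> describeRuns(String text) {
        // Base direction is taken from the first strong character,
        // falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            int start = bidi.getRunStart(i);
            int limit = bidi.getRunLimit(i);
            // Even embedding levels are left-to-right, odd levels right-to-left.
            String direction = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            runs.add(direction + "[" + start + "," + limit + ")");
        }
        return runs;
    }
}
```

Calling `BidiRuns.describeRuns` on a string such as Latin letters followed directly by Arabic letters should yield one left-to-right run and one right-to-left run, which can help decide where formatting codes are needed.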
It's not changing the order of the code points. What's happening is that, when the string is displayed, the renderer sees that it starts with a right-to-left script, so it displays the whole string right-to-left.
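One way to check this in Java, sketched under the assumption that the display component picks the paragraph direction from the first strong character (which is what `Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT` models): the helper names below are hypothetical, and prefixing LEFT-TO-RIGHT MARK (U+200E) is one possible workaround, not something the answers above prescribe.

```java
import java.text.Bidi;

final class BaseDirection {
    // LEFT-TO-RIGHT MARK: an invisible character with strong LTR direction.
    static final String LRM = "\u200E";

    // True if a renderer that auto-detects direction would lay the text out LTR.
    static boolean baseIsLeftToRight(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).baseIsLeftToRight();
    }

    // Makes the first strong character left-to-right without touching the rest.
    static String forceLtrBase(String text) {
        return LRM + text;
    }

    public static void main(String[] args) {
        String first = "\u0645\u0631 A1";   // starts with Arabic letters
        String second = "B2 \u062D\u0628";  // starts with Latin letters
        String joined = first + second;     // plain concatenation, order preserved
        System.out.println(joined.startsWith(first) && joined.endsWith(second));
        System.out.println(baseIsLeftToRight(joined));
        System.out.println(baseIsLeftToRight(forceLtrBase(joined)));
    }
}
```

Whether the mark has the desired visual effect depends on the component that renders the string, so this is worth verifying in the actual UI.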