Why do Unicode U+202E and U+202C cause the output text to have different results?
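In Java:
- If I print "123\u202e987\u202c456abc", then the result is 123456789abc.
- If I print "123\u202e987\u202cxyzabc", then the result is 123789xyzabc.
You can see that when "456" in the string is changed to "xyz", the printed output order is different.
How does this work?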
The Unicode control characters are doing that, because each of them changes how the text that follows it is displayed.
In your question, 123\u202e987\u202cxyzabc outputs 123789xyzabc. \u202e causes the 987 to be output reversed, as 789, and \u202c stops the RIGHT-TO-LEFT override. In the other case (123\u202e987\u202c456abc), the characters after the \u202c are digits, which have weak directionality. So the Unicode algorithm moves only those digits, displaying them ahead of the reversed 789, right where the \u202e is.
EDIT: @skomisa's answer is better.
TLDR: The effect you are seeing arises because digits and alphabetic characters are treated differently by the Unicode algorithm that determines the rendering of text containing format control characters.
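For the texts you are displaying, \u202e is the RIGHT-TO-LEFT OVERRIDE (RLO) character, which forces the characters after it to be treated as strong right-to-left characters, and \u202c is the POP DIRECTIONAL FORMATTING (PDF) character, which ends the scope of the most recent embedding or override.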
That explains why the text 123\u202e987\u202cxyzabc in your example is rendered as 123789xyzabc. The RLO (\u202e) causes the text that follows to be rendered in right-to-left order (so 987 is displayed as 789), and the PDF (\u202c) terminates the reversal for the subsequent text.
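A quick way to see how Java itself classifies each character of the two strings is `Character.getDirectionality`, which returns the Unicode bidi class of a `char`. The sketch below is only an illustration: the class `BidiClasses` and its helpers are hypothetical names, and it maps just the few classes that occur here to their UAX #9 abbreviations. It confirms that \u202e is an RLO and \u202c a PDF, and shows exactly where the two strings differ: the trailing 456 are EN (European number, weak) while xyz are L (left-to-right, strong).

```java
public class BidiClasses {

    // UAX #9 abbreviation for the bidi classes that occur in the question's
    // strings; any other class is reported as "other".
    static String bidiClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:          return "L";   // strong
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:          return "R";   // strong
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:        return "EN";  // weak
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE: return "RLO"; // explicit
            case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT: return "PDF"; // explicit
            default:                                              return "other";
        }
    }

    // Space-separated bidi classes of every char in text, in logical (stored) order.
    static String classes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(bidiClass(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(classes("123\u202e987\u202c456abc"));
        // EN EN EN RLO EN EN EN PDF EN EN EN L L L
        System.out.println(classes("123\u202e987\u202cxyzabc"));
        // EN EN EN RLO EN EN EN PDF L L L L L L
    }
}
```

The stored characters are identical up to the PDF; the only difference is whether what follows it is weak (EN) or strong (L).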
But it does not explain why 123\u202e987\u202c456abc is rendered as 123456789abc. By that argument, the expected output should be 123789456abc instead.
The algorithm used to determine the output in scenarios like this is very complex, but one factor is the directionality of the characters being rendered. Alphabetic characters have strong directionality, but numbers (i.e., digit characters) have weak directionality. For full details see the Unicode document Unicode® Standard Annex #9, UNICODE BIDIRECTIONAL ALGORITHM, and especially section 3.3.4 Resolving Weak Types.
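To see the weak-type effect concretely, here is a sketch using the JDK's `java.text.Bidi` class, which implements the Unicode Bidirectional Algorithm. It prints the resolved embedding level of each visible character and the resulting display order. Two assumptions are worth stating: the paragraph direction is taken to be left-to-right (what a typical console uses, not something the question specifies), and the class `BidiVisualOrder` with its helpers is a hypothetical name for this illustration. The invisible controls are dropped before reordering, mirroring rule X9.

```java
import java.text.Bidi;

public class BidiVisualOrder {

    // U+202A..U+202E are LRE, RLE, PDF, LRO and RLO. They are invisible, and rule X9
    // of the bidi algorithm removes them before the remaining text is reordered.
    static boolean isExplicitControl(char c) {
        return c >= 0x202A && c <= 0x202E;
    }

    // Resolved embedding level of each visible character, in logical order.
    // Assumes a left-to-right paragraph, as a typical console uses.
    static String levels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (!isExplicitControl(text.charAt(i))) {
                sb.append(bidi.getLevelAt(i));
            }
        }
        return sb.toString();
    }

    // Visible characters of text in display (visual) order.
    static String visualOrder(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        byte[] lv = new byte[text.length()];
        Character[] chars = new Character[text.length()];
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isExplicitControl(c)) {
                continue;
            }
            chars[n] = c;
            lv[n] = (byte) bidi.getLevelAt(i);
            n++;
        }
        // reorderVisually performs the level-based reversals of rule L2.
        Bidi.reorderVisually(lv, 0, chars, 0, n);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            out.append(chars[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String digits = "123\u202e987\u202c456abc";
        String letters = "123\u202e987\u202cxyzabc";
        System.out.println(visualOrder(digits) + "  levels " + levels(digits));
        // 123456789abc  levels 000111222000
        System.out.println(visualOrder(letters) + "  levels " + levels(letters));
        // 123789xyzabc  levels 000111000000
    }
}
```

The levels show the weak-type rule at work. The run containing 456 starts right after the level 1 right-to-left text, so its start-of-sequence type is R; rule W7 therefore finds no preceding L and leaves 456 as European numbers, and rule I1 raises European numbers at an even level by two, to level 2. Reordering then reverses the contiguous level 1 and 2 block 987456 into 456789. The letters xyz are strong left-to-right characters, so they stay at level 0 after the reversed 789.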
That document provides an example similar to yours, with text containing a RIGHT-TO-LEFT EMBEDDING (RLE) character (rather than an RLO), later followed by a PDF and some trailing text containing digits:
Note that in their example it wasn't just the digits that were moved. The dollar sign and the period were as well, because all six of the characters in the text $19.95 have weak directionality.
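The claim about $19.95 can be checked the same way. This sketch (`WeakTypes` and `isWeak` are hypothetical names) tests each character against the weak bidi classes listed in UAX #9 (EN, ES, ET, AN, CS, NSM and BN) using `Character.getDirectionality`: the dollar sign is ET (European number terminator), the period is CS (common number separator) and the digits are EN, so none of the six characters has a strong direction of its own.

```java
public class WeakTypes {

    // The weak bidi classes of UAX #9: EN, ES, ET, AN, CS, NSM and BN.
    static boolean isWeak(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR:
            case Character.DIRECTIONALITY_ARABIC_NUMBER:
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_NONSPACING_MARK:
            case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (char c : "$19.95".toCharArray()) {
            System.out.println(c + " weak=" + isWeak(c)); // all six lines print weak=true
        }
        System.out.println("a weak=" + isWeak('a'));      // a is strong (L), so weak=false
    }
}
```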
Notes: