How does the .NET regex engine handle mixed RTL+LTR strings?
I have a mixed Hebrew/English string to parse.
The string is built like this:
[3 hebrew] [2 english 2] [1 hebrew],
So it is read as 1 2 3 but stored as 3 2 1 (that is the exact byte sequence in the file, double-checked in a hex editor; RTL is only a display attribute anyway). The .NET regex parser has an RTL option which, when applied to plain LTR text, starts processing from the right-hand side of the string.
I am wondering when this option should be applied to extract the [3 hebrew] and [2 english] parts from the string, or to check whether [1 hebrew] matches the end of the string. Are there any hidden specifics, or is there nothing to worry about (as when processing any LTR string that contains special Unicode characters)?
Also, can anyone recommend a good RTL+LTR text editor? I am afraid that VS Express sometimes displays the text wrongly, and that it might even start corrupting the saved strings. I would like to be able to re-check the files without resorting to a hex editor.
Comments (1)
The RightToLeft option refers to the order in which the regular expression walks the character sequence, and should really be called LastToFirst: in the case of Hebrew and Arabic it is actually left-to-right, and with mixed RTL and LTR text such as you describe, the expression "right to left" is even less appropriate. It has a minor effect on speed (which will only matter if the searched text is massive) and on regular expressions that are run with a startAt index (it searches the part of the string before startAt rather than the part after it). Examples; let's hope the browsers don't mess this up too much:
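A minimal C# sketch of the behaviour described above. The sample string, the BidiRegexDemo class and its helper names are illustrative assumptions, not part of the original post; the Hebrew text is written with \u escapes so that no editor or browser can reorder it on screen. It assumes the string is held in stored (logical) order, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BidiRegexDemo
{
    // Stored (logical) order: [Hebrew run] [English run] [Hebrew run].
    // Indexes: Hebrew 0-3, space 4, "hello" 5-9, space 10, Hebrew 11-14.
    public const string Sample =
        "\u05E9\u05DC\u05D5\u05DD hello \u05E2\u05D5\u05DC\u05DD";

    private const string HebrewRun = @"\p{IsHebrew}+";

    // Default scan: starts at index 0 and moves toward the end of the string.
    public static Match FirstStored(string s) =>
        Regex.Match(s, HebrewRun);

    // RightToLeft scan: starts at the end and moves toward index 0.
    public static Match LastStored(string s) =>
        Regex.Match(s, HebrewRun, RegexOptions.RightToLeft);

    // With RightToLeft, startAt bounds the search to the text before it.
    public static Match Before(string s, int startAt) =>
        new Regex(HebrewRun, RegexOptions.RightToLeft).Match(s, startAt);

    // Without RightToLeft, startAt bounds the search to the text after it.
    public static Match After(string s, int startAt) =>
        new Regex(HebrewRun).Match(s, startAt);

    // Checking the end of the stored string needs an anchor, not RightToLeft.
    public static bool EndsWithHebrew(string s) =>
        Regex.IsMatch(s, HebrewRun + @"\z");

    public static void Main()
    {
        Console.WriteLine(FirstStored(Sample).Index);   // 0
        Console.WriteLine(LastStored(Sample).Index);    // 11
        Console.WriteLine(Before(Sample, 5).Index);     // 0
        Console.WriteLine(After(Sample, 5).Index);      // 11
        Console.WriteLine(EndsWithHebrew(Sample));      // True
    }
}
```

In this sketch the option only changes which end of the stored sequence the engine starts from; the script direction of the characters plays no part in any of the results.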