Regex pattern matching in Java does not work for a specific string
I was using a regex pattern in Java (given below):
for the string:
It works fine. But when I tried using the pattern below:
for the string: str =
Sorry about the image upload. It looks like the character '[]' in a00[] is encoded differently in the browser. Is there any way to read that character differently? The same character has a different representation in Notepad++. I'm using RXTX and inputStream.read(readBuffer) to read the data. Is there any way to change the encoding in Java to overcome this?
https://i.sstatic.net/FydBg.jpg
P.S.: Sorry about relying on an image; if I type the text out, I can't represent that character. When I copy and paste it, it becomes an empty space.
Answer
The strange symbol (└) looks like how ASCII 3 is represented in some fonts.
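To check which character actually arrives, one option is to decode the raw bytes with an explicit single-byte charset and print control characters as visible escapes. This is a minimal sketch, not part of the original answer: `readBuffer` and the byte count `n` stand in for what the RXTX `inputStream.read(readBuffer)` call returns, and the sample bytes are hypothetical. ISO-8859-1 is used because it maps every byte to the code point with the same value, so a 0x03 byte always decodes to U+0003 (ETX).

```java
import java.nio.charset.StandardCharsets;

public class ControlCharDump {

    // Decode with an explicit charset instead of the platform default.
    // ISO-8859-1 maps byte 0xNN to U+00NN, so 0x03 becomes U+0003 (ETX).
    static String decode(byte[] readBuffer, int n) {
        return new String(readBuffer, 0, n, StandardCharsets.ISO_8859_1);
    }

    // Replace ASCII control characters with \xHH escapes so they show up
    // in a console or log instead of rendering as a blank or a box.
    static String escapeControls(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c == 0x7F) {
                sb.append(String.format("\\x%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical bytes standing in for data read from the serial port.
        byte[] readBuffer = {'9', 0x03, 'a', '0', '0', '!'};
        int n = readBuffer.length; // in real code: int n = inputStream.read(readBuffer);
        System.out.println(escapeControls(decode(readBuffer, n))); // prints 9\x03a00!
    }
}
```

If the output shows `\x03` where the screenshot shows "└", the character is ETX and can be matched in a regex with `\x03`.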
In regex, `\b` matches a word boundary: a position between a word character (a letter, digit, or underscore) and a non-word character. It works in the first case because there is a digit ("9") before the matched substring and an exclamation mark ("!") right after it, which is a non-word character. In the second case you changed the exclamation mark to a letter, so there is no longer a transition from a word character to a non-word character.
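The effect can be reproduced in isolation. In this sketch the pattern `\ba00\b` and the three test strings are hypothetical, since the original pattern and input are not shown. For ASCII input, Java treats letters, digits, and `_` as word characters, while `!` and the control character U+0003 are not.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Hypothetical pattern: "a00" with a word boundary required on both sides.
    private static final Pattern CODE = Pattern.compile("\\ba00\\b");

    static boolean found(String s) {
        return CODE.matcher(s).find();
    }

    public static void main(String[] args) {
        // ETX before 'a' and '!' after the last '0': a boundary on both sides.
        System.out.println(found("9\u0003a00!")); // true
        // A letter after the last '0': both sides are word characters, so \b fails.
        System.out.println(found("9\u0003a00b")); // false
        // A digit directly before 'a': no boundary in front of the match.
        System.out.println(found("9a00!"));       // false
    }
}
```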
The solution is to extend the regex so that it also matches the symbol and the digit:
I used `\\x03\\d` to match the codes. The last part, `(?= )`, is a lookahead: it checks whether the following text matches but does not consume it, so the next match can start right where the previous one ended. This lets you find multiple matches in a row. A simpler alternative would be to split the string on "└" and examine the pieces.
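Both approaches can be sketched side by side. The record layout here (ETX, one digit, then text up to the next ETX) and the sample string are assumptions for illustration only, and `findRecords` and `splitRecords` are hypothetical helper names. The lookahead `(?=\x03|$)` stops each lazy match just before the next ETX without consuming it, so that ETX can start the following match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EtxCodes {

    // Hypothetical layout: ETX (0x03), one digit, then any text up to the next
    // ETX or the end of input. The lookahead checks what follows without consuming it.
    private static final Pattern RECORD = Pattern.compile("\\x03(\\d)(.*?)(?=\\x03|$)");

    // Regex approach: one match per record, reported as "digit:text".
    static List<String> findRecords(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RECORD.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    // Split approach: cut the string at every ETX. A leading ETX yields an
    // empty first element, which is skipped here.
    static List<String> splitRecords(String s) {
        List<String> out = new ArrayList<>();
        for (String piece : s.split("\\x03")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "\u00031abc\u00032def\u00033ghi"; // hypothetical input
        System.out.println(findRecords(str));  // [1:abc, 2:def, 3:ghi]
        System.out.println(splitRecords(str)); // [1abc, 2def, 3ghi]
    }
}
```

The split version is easier to read; the regex version additionally validates that each record starts with a digit.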