Regex pattern matching in Java does not work for a particular string

Posted on 2024-12-23 09:41:38

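I am using a regex pattern in Java (shown below): [image: working pattern]

For the string [image: working string] it works fine. But when I try the following pattern: [image: non-working pattern]

on the string str = [image: non-working string]

it does not match. Sorry for uploading images. It looks like the character "[]" in a00[] is encoded differently in the browser. Is there a way to read that character differently? The same character is displayed differently in Notepad++. I am using RXTX and inputStream.read(readBuffer) to read the data. Is there a way to change the encoding used in Java to get around this? https://i.sstatic.net/FydBg.jpg

P.S.: Sorry for describing the character with images. I cannot reproduce it in text: when I copy and paste the character, it turns into a blank space.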


半﹌身腐败 2024-12-30 09:41:38

The strange symbol (└) looks like the way some fonts render ASCII character 3 (ETX, "end of text").
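One way to check this guess is to dump the code points of the received text rather than relying on how a font or an editor draws them. The sketch below is only an illustration: the class name `CodePointDump`, both helper names, and the sample bytes are made up here, and it assumes the device sends plain ASCII.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDump {
    // Decode only the bytes that were actually read, with an explicit charset
    // (US-ASCII is an assumption about what the serial device sends).
    static String decode(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.US_ASCII);
    }

    // Render every character as U+XXXX so unprintable ones become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical buffer: "a00" followed by byte 3, then unused space.
        byte[] readBuffer = {'a', '0', '0', 3, 0, 0};
        int bytesRead = 4; // stands in for the value returned by inputStream.read(readBuffer)
        System.out.println(dump(decode(readBuffer, bytesRead)));
        // prints: U+0061 U+0030 U+0030 U+0003
    }
}
```

If the last code point printed for the real data is U+0003, the character is the ETX control code and not an encoding problem.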

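In a regex, \b matches a word boundary, that is, a position between a word character (a letter, digit, or underscore) and a non-word character. It works in the first case because there is a digit ("9") before the matched substring and an exclamation mark ("!"), which is a non-word character, right after it.

In the second case you changed the exclamation mark to a letter, so there is no longer a transition from a word character to a non-word character at that position, and \b cannot match there.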
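The effect of \b can be reproduced in isolation. The original patterns were only posted as images, so the pattern `a00\b` and the two input strings below are hypothetical stand-ins chosen to show the same behaviour.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical pattern: the literal "a00" followed by a word boundary.
    private static final Pattern P = Pattern.compile("a00\\b");

    // True if "a00" occurs somewhere in the input and is followed by a word boundary.
    static boolean endsAtBoundary(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        // '0' then '!' is a word-to-non-word transition, so \b matches.
        System.out.println(endsAtBoundary("9a00!")); // true
        // '0' then 'x' are both word characters, so \b cannot match.
        System.out.println(endsAtBoundary("9a00x")); // false
    }
}
```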

The solution is to extend the regex so that it also matches the symbol and the digit:

Pattern.compile("(\\x03\\d)(a)\\w*(?=\\x03\\d)");

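I used \\x03\\d to match the codes. The last part, (?= ), is a look-ahead: it checks that the following text matches but does not consume it. That way you can get several matches in a row, because the code that ends one match is still available to start the next one.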
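A minimal sketch of how this pattern behaves over several consecutive records. The sample string and the names `LookaheadDemo` and `payloads` are assumptions made for illustration; the real data was only shown as an image.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The pattern from the answer: ETX + digit, then 'a', then word characters,
    // and finally a look-ahead requiring (but not consuming) the next ETX + digit.
    private static final Pattern P =
            Pattern.compile("(\\x03\\d)(a)\\w*(?=\\x03\\d)");

    // Returns the text of each match with the leading ETX + digit code stripped.
    static List<String> payloads(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            result.add(m.group().substring(m.group(1).length()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: four ETX+digit codes, three of them followed by data.
        String input = "\u00031a00\u00032abc\u00033b11\u00034";
        System.out.println(payloads(input)); // [a00, abc]
    }
}
```

The second record is found only because the look-ahead leaves its leading code unconsumed. The record "b11" is skipped because the pattern requires an 'a' right after the code.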

A simpler alternative is to split the string on "└" and examine the pieces:

s.split("\u0003")
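The split approach could look roughly like this. The sample input, the class name `SplitDemo`, and the filtering rule (keep pieces that start with a digit followed by 'a') are assumptions that mirror the regex above, not something stated in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Splits on the ETX character and keeps pieces of the form <digit>a...,
    // returning them without the leading digit.
    static List<String> payloads(String input) {
        List<String> result = new ArrayList<>();
        for (String piece : input.split("\u0003")) {
            // The first piece is empty when the input starts with ETX.
            if (piece.length() >= 2
                    && Character.isDigit(piece.charAt(0))
                    && piece.charAt(1) == 'a') {
                result.add(piece.substring(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "\u00031a00\u00032abc\u00033b11\u00034"; // hypothetical input
        System.out.println(payloads(input)); // [a00, abc]
    }
}
```

Unlike the look-ahead pattern, this version also accepts a final record that is not followed by another code.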
