Regex not extracting the exact pattern
I am working in Java, reading a string of over 100,000 characters.
I have a list of keywords that I search the string for, and if a keyword is present I call a function that does some internal processing.
Take the keyword "face", for example: I want to match "face" and "faces" but not "facebook". I can accept whitespace on either side of the word, so a match like " face", " faces" or "face " is fine too. However, I cannot accept "duckface" or "duckface ", etc.
I have written the regex
Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
where keyword is one entry from my list of keywords, but I am not getting the desired results. Can you read my description and suggest what the issue might be and how I can fix it?
Also, if someone could share a pointer to a really good page on regex for Java, I would appreciate that as well.
Thank you, contributors.
Edit
The reason I know it is not working is I have used the following code:
Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
Matcher m = p.matcher(myInputDataSting);
if(m.find())
{
System.out.println("Its a Match: "+m.group());
}
This returns a blank string...
Comments (2)
If keyword is "face", then your current regex is
\s+faces\s+|\s+
which matches either one or more whitespace characters, followed by faces, followed by one or more whitespace characters, or just one or more whitespace characters. (The pipe | has very low precedence.) That second alternative is why m.group() can come back as nothing but whitespace.
What you really want is
\bfaces?\b
which matches a word boundary, followed by face, optionally followed by s, followed by a word boundary.
So, you can write:
Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");
(though obviously this will only work for words like face that form their plurals by simply adding s).
You can find a comprehensive listing of Java's regular-expression support at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html, but it's not much of a tutorial. For that, I'd recommend just Googling "regular expression tutorial" and finding one that suits you. (It doesn't have to be Java-specific: most of the tutorials you'll find are for flavors of regular expression that are very similar to Java's.)
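To make the precedence point concrete, here is a minimal sketch that runs both patterns against the same text. The class name PrecedenceDemo, the helper firstMatch and the sample sentence are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecedenceDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String keyword = "face";
        // Made-up sample text containing both wanted and unwanted words.
        String input = "a duckface saw two faces on facebook";

        // Original pattern: the bare \s+ alternative matches the very first space.
        String original = "\\s+" + keyword + "s\\s+|\\s+";
        System.out.println("[" + firstMatch(original, input) + "]"); // prints "[ ]"

        // Word-boundary pattern: only "face" or "faces" as a whole word.
        String fixed = "\\b" + keyword + "s?\\b";
        System.out.println("[" + firstMatch(fixed, input) + "]");    // prints "[faces]"
    }
}
```

The brackets in the output are only there to make the whitespace-only match visible.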
You should use
Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");
where keyword is not plural. \\b means that keyword must appear as a complete word in the searched string. s? means that keyword's value may end with s.
If you are not familiar enough with regular expressions, I recommend reading http://docs.oracle.com/javase/tutorial/essential/regex/index.html, because it has examples and explanations.
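Since the question calls find() only once, it may also help to see all occurrences collected in a loop. The following is a rough sketch, not a definitive implementation: the class KeywordScan, the method findAll and the sample input are hypothetical, and wrapping the keyword in Pattern.quote is an added precaution (not part of the answers above) in case a keyword contains regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordScan {
    // Hypothetical helper: collects every whole-word occurrence of keyword,
    // with an optional trailing "s".
    static List<String> findAll(String keyword, String input) {
        // Pattern.quote makes the keyword match literally (assumption: keywords are plain text).
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "s?\\b");
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {          // keep searching until no more matches remain
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up input mixing accepted and rejected words.
        String input = "face faces facebook duckface face.";
        System.out.println(findAll("face", input)); // prints "[face, faces, face]"
    }
}
```

Note that \b relies on word characters, so a keyword that starts or ends with punctuation would need a different boundary check.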