Regular expression capturing an unknown number of repeated groups
I'm trying to write a regular expression to use in a Java program that will recognize a pattern that may appear in the input an unknown number of times. My silly little example is:
String patString = "(?:.*(h.t).*)*";
Then I try to access the matches from a line like "the hut is hot" by looping through matcher.group(i). It only remembers the last match (in this case, "hot") because there is only one capture group--I guess the contents of matcher.group(1) get overwritten as the capture group is reused. What I want, though, is some kind of array containing both "hut" and "hot."
Is there a better way to do this? FWIW, what I'm really trying to do is to pick up all the (possibly multiword) proper nouns after a signal word, where there may be other words and punctuation in between. So if "saw" is the signal and we have "I saw Bob with John Smith, and his wife Margaret," I want {"Bob","John Smith","Margaret"}.
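The overwriting behaviour described in the question can be reproduced with a minimal sketch like the one below. The class name RepeatedGroupDemo and the helper lastCapture are made up for illustration; only the pattern and the input line come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical helper: matches the whole line against the pattern from
    // the question and returns whatever group(1) holds afterwards.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(?:.*(h.t).*)*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(?:.*(h.t).*)*").matcher("the hut is hot");
        // The pattern declares exactly one capturing group, however many
        // times the surrounding non-capturing group repeats.
        System.out.println("groupCount = " + m.groupCount());
        // Only one capture is retained; the question reports "hot" here.
        System.out.println("group(1)   = " + lastCapture("the hut is hot"));
    }
}
```

Since groupCount() is fixed by the pattern rather than by the input, looping over matcher.group(i) cannot yield more than one captured value for this pattern.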
Comments (1)
(Similar question: Regular expression with variable number of groups?)
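This is not possible in Java: a capturing group inside a repeated group keeps only its last match. Your best alternative is to use the simpler pattern h.t and call matcher.find() in a loop, collecting each match as you go. The feature does exist in .NET, but as mentioned above, there's no counterpart in Java.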
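As a rough sketch of that approach, the code below collects every match of h.t with repeated calls to find(). It then applies the same loop to the proper-noun case from the question. The class and method names (FindAllMatches, findAll, properNounsAfter) are invented for illustration, and treating "one or more capitalized words separated by single spaces" as a proper noun is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllMatches {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Hypothetical helper: locates the signal word, then scans only the text
    // after it. The proper-noun pattern is a crude capitalization heuristic
    // (an assumption for this sketch).
    static List<String> properNounsAfter(String signal, String input) {
        Matcher sig = Pattern.compile("\\b" + Pattern.quote(signal) + "\\b").matcher(input);
        if (!sig.find()) {
            return new ArrayList<>();
        }
        String rest = input.substring(sig.end());
        return findAll("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)*", rest);
    }

    public static void main(String[] args) {
        // prints [hut, hot]
        System.out.println(findAll("h.t", "the hut is hot"));
        // prints [Bob, John Smith, Margaret]
        System.out.println(properNounsAfter("saw",
                "I saw Bob with John Smith, and his wife Margaret,"));
    }
}
```

Starting the scan after the signal word keeps the leading "I" out of the result. A real proper-noun matcher would likely need to handle initials, hyphenated names, and sentence-initial capitalized words, which this heuristic ignores.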