Java tool for matching multiple regular expressions against multiple strings, with priorities
I have an unbounded sequence of strings and a large number of regular expressions ordered by priority. For each string in the sequence I have to find the first matching regular expression and the matched substring. The strings are not very long (<1Kb), while the number of regular expressions may vary from hundreds to thousands.
I'm looking for a Java tool that would do this job efficiently. I guess the technique should be to build a DFA ahead of time.
My current option is JFlex. The problem I can't work around in JFlex is that its rules have no priorities: JFlex picks the rule that matches the longest part of the text.
My question is whether my problem can be solved with JFlex. If not, can you suggest another Java tool or technique that would do the job?
Comments (1)
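You could use Java's built-in regular expressions. Build the alternatives into a single RE string, with each alternative surrounded by '(' and ')+?' and separated by '|', with the highest-priority REs first. The wrapper is meant to keep the sub-REs from backtracking (note that in java.util.regex '+?' is the reluctant quantifier; the possessive '++' or an atomic group '(?>...)' is what actually suppresses backtracking). The '|' alternatives are evaluated left to right, so the highest-priority RE is tried first at each position.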
For example, given a string of "zeroonetwothreefour"
Note especially that in the last example, 'twothree' matches even though it occurs later in the target string and is shorter than the 'onetwothree' match.
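A minimal sketch of two ways to apply the prioritized rules with java.util.regex, in case it helps to compare them. The class name `PriorityRegexMatcher`, the `Hit` holder, and the generated group names `r0`, `r1`, ... are hypothetical and not part of the original answer. It assumes the individual rules contain no named groups of their own and no numbered backreferences, since wrapping them in a combined pattern would shift group numbers. Trying the rules one at a time lets a higher-priority rule win even when it starts later in the string, whereas a single combined alternation scanned with `find()` returns the leftmost match and uses priority only among alternatives that start at the same position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the original answer.
class PriorityRegexMatcher {

    /** Index of the winning rule (0 = highest priority) and the matched substring. */
    static final class Hit {
        final int ruleIndex;
        final String text;
        Hit(int ruleIndex, String text) { this.ruleIndex = ruleIndex; this.text = text; }
    }

    private final List<Pattern> rules = new ArrayList<>(); // compiled once, reused per string
    private final Pattern combined;                        // all rules joined with '|'

    PriorityRegexMatcher(List<String> regexesByPriority) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < regexesByPriority.size(); i++) {
            String re = regexesByPriority.get(i);
            rules.add(Pattern.compile(re));
            if (i > 0) sb.append('|');
            // Wrap each rule in a named group (r0, r1, ...) so the winner can be identified.
            sb.append("(?<r").append(i).append('>').append(re).append(')');
        }
        combined = Pattern.compile(sb.toString());
    }

    /** Tries the rules in priority order; the first rule that matches anywhere wins. */
    Hit firstByPriority(String input) {
        for (int i = 0; i < rules.size(); i++) {
            Matcher m = rules.get(i).matcher(input);
            if (m.find()) return new Hit(i, m.group());
        }
        return null; // no rule matched
    }

    /** One scan with the combined pattern: the leftmost match wins, priority breaks ties. */
    Hit leftmostCombined(String input) {
        Matcher m = combined.matcher(input);
        if (!m.find()) return null;
        for (int i = 0; i < rules.size(); i++) {
            // Only the alternative that took part in the match has a non-null group.
            if (m.group("r" + i) != null) return new Hit(i, m.group());
        }
        return null;
    }

    public static void main(String[] args) {
        PriorityRegexMatcher m =
                new PriorityRegexMatcher(Arrays.asList("twothree", "onetwothree"));
        Hit a = m.firstByPriority("zeroonetwothreefour");
        Hit b = m.leftmostCombined("zeroonetwothreefour");
        System.out.println(a.ruleIndex + " " + a.text); // prints "0 twothree"
        System.out.println(b.ruleIndex + " " + b.text); // prints "1 onetwothree"
    }
}
```

The per-rule loop costs one scan per rule in the worst case, so with thousands of rules it may be slow; the combined pattern makes a single pass but is still a backtracking matcher rather than a precompiled DFA.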