Can I determine the set of first characters a regex pattern can match?

Posted 2024-07-17 06:03:32

I would like to be able to compute the set of all characters that can appear as the first character of a string matched by a given instance of java.util.regex.Pattern. More formally, given the DFA equivalent to a certain regular expression, I want the set of characters labeling the outgoing transitions from the start state.

An example:

Pattern p = Pattern.compile("[abc]def|daniel|chris|\\s+");
Set<Character> first = getFirstSet(p);

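The set first should contain the following elements (by default Java's \s matches [ \t\n\x0B\f\r], so vertical tab and form feed belong in the set too):

{ 'a', 'b', 'c', 'd', ' ', '\t', '\n', '\u000B', '\f', '\r' }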
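A minimal sketch of getFirstSet (the question's own hypothetical name, wrapped here in an illustrative class FirstSets) that avoids building a DFA by probing the engine instead: for every char value c, run Matcher.matches() on the one-character string and keep c if the match succeeded or Matcher.hitEnd() reports that the engine ran out of input, meaning a longer string starting with c might still match. This assumes that reading hitEnd() as "more input could help" is good enough. It can over-report for patterns whose lookaheads or similar constructs read past the first character (for example "(?=ab)ac" reports 'a' although nothing matches), and it only probes the 65,536 BMP char values.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; getFirstSet is the hypothetical name from the question.
final class FirstSets {
    private FirstSets() {}

    /**
     * Approximates the set of chars that can begin a full match of p.
     * Assumption: hitEnd() == true on a one-char input is treated as
     * "some longer string starting with this char might match", which can
     * over-report (e.g. with lookaheads). Supplementary code points are skipped.
     */
    static Set<Character> getFirstSet(Pattern p) {
        Set<Character> result = new TreeSet<>();
        Matcher m = p.matcher("");
        // Use an int counter so the loop terminates after '\uFFFF'.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            m.reset(String.valueOf((char) c));
            // matches() resets hitEnd before running, so each probe is independent.
            if (m.matches() || m.hitEnd()) {
                result.add((char) c);
            }
        }
        return result;
    }
}
```

For the example pattern this yields ten characters: a, b, c, d, space, \t, \n, \x0B, \f and \r. The last two come from Java's \s, which matches [ \t\n\x0B\f\r] unless UNICODE_CHARACTER_CLASS is set.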

Any ideas? I'm well aware that I could construct the DFA myself and determine the relevant states that way, but I'd like to avoid that kind of hassle (read: it's not worth that much to me). Note that my host language is actually Scala, so I have access to all of the core Scala libs (for what it's worth).
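For comparison, the do-it-yourself route does not strictly require a DFA: the first set can be read off a regular-expression syntax tree with the textbook nullable/first rules (the ones used to build position, or Glushkov, automata). The sketch below assumes a hypothetical hand-built AST (Re, Chars, Seq, Alt and Star are illustrative names), because java.util.regex.Pattern does not expose its node tree as public API; parsing a pattern string into this tree is left out.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical hand-built regex syntax tree (NOT java.util.regex internals).
// first() is computed with the nullable/first rules rather than via a DFA.
abstract class Re {
    /** True if this expression can match the empty string. */
    abstract boolean nullable();

    /** Characters that can appear as the first character of a match. */
    abstract Set<Character> first();

    static Re chars(String cs) { return new Chars(cs); }
    static Re seq(Re... parts) { return new Seq(parts); }
    static Re alt(Re... parts) { return new Alt(parts); }
    static Re star(Re r) { return new Star(r); }
    // r+ is rewritten as r r*.
    static Re plus(Re r) { return seq(r, star(r)); }

    // A literal string is a sequence of single-character classes.
    static Re lit(String s) {
        Re[] parts = new Re[s.length()];
        for (int i = 0; i < s.length(); i++) {
            parts[i] = chars(String.valueOf(s.charAt(i)));
        }
        return seq(parts);
    }
}

// A character class such as [abc]; always consumes exactly one character.
final class Chars extends Re {
    private final Set<Character> set = new TreeSet<>();
    Chars(String cs) { for (char c : cs.toCharArray()) set.add(c); }
    boolean nullable() { return false; }
    Set<Character> first() { return Collections.unmodifiableSet(set); }
}

// Concatenation: a part contributes only if every part before it is nullable.
final class Seq extends Re {
    private final Re[] parts;
    Seq(Re... parts) { this.parts = parts; }
    boolean nullable() {
        for (Re p : parts) if (!p.nullable()) return false;
        return true;
    }
    Set<Character> first() {
        Set<Character> out = new TreeSet<>();
        for (Re p : parts) {
            out.addAll(p.first());
            if (!p.nullable()) break; // later parts cannot start the match
        }
        return out;
    }
}

// Alternation: the first set is the union over all branches.
final class Alt extends Re {
    private final Re[] parts;
    Alt(Re... parts) { this.parts = parts; }
    boolean nullable() {
        for (Re p : parts) if (p.nullable()) return true;
        return false;
    }
    Set<Character> first() {
        Set<Character> out = new TreeSet<>();
        for (Re p : parts) out.addAll(p.first());
        return out;
    }
}

// Kleene star: matches the empty string, otherwise starts like its body.
final class Star extends Re {
    private final Re body;
    Star(Re body) { this.body = body; }
    boolean nullable() { return true; }
    Set<Character> first() { return body.first(); }
}
```

The example pattern would be built as Re.alt(Re.seq(Re.chars("abc"), Re.lit("def")), Re.lit("daniel"), Re.lit("chris"), Re.plus(Re.chars(ws))), where ws holds the six characters of Java's \s; its first() is the ten-character set a, b, c, d plus that whitespace. The concatenation rule carries the idea: later parts of a sequence only matter while everything before them can match the empty string.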
