Intersection of regular expressions with groups: how to derive the intersection of the groups?
I'm currently trying to solve a problem similar to "Testing intersection of two regular languages", except that I already know how to compute the intersection and have an additional requirement.
The intersection logic I intend to use is the Dragon Book's algorithm for converting an NFA to a DFA, but executed on two NFAs at the same time. Since every DFA is also an NFA (just without the non-determinism), you can repeat this as needed for further intersections.
My problem is that one of my regexes has groups that can be used later as part of a new regex. Concretely:
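The simultaneous subset construction described above can be sketched roughly as follows. This is a minimal illustration, not the Dragon Book's code: the `NFA` class, the `EPS` marker, the `intersection_witness` function and the three toy automata are all names made up for this example, and the alphabet is assumed to be small enough to enumerate.

```python
from collections import deque

EPS = None  # label for epsilon transitions (a convention of this sketch)


class NFA:
    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # transitions maps (state, symbol) -> set of successor states
        self.transitions = transitions

    def eps_closure(self, states):
        """All states reachable from `states` using only epsilon moves."""
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in self.transitions.get((s, EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return frozenset(seen)

    def step(self, states, symbol):
        """Subset-construction move: consume `symbol`, then close over epsilon."""
        moved = set()
        for s in states:
            moved |= set(self.transitions.get((s, symbol), ()))
        return self.eps_closure(moved)

    def symbols(self):
        return {sym for (_, sym) in self.transitions if sym is not EPS}


def intersection_witness(a, b):
    """Return a shortest string accepted by both NFAs, or None if there is none."""
    start = (a.eps_closure({a.start}), b.eps_closure({b.start}))
    alphabet = a.symbols() & b.symbols()
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (sa, sb), word = queue.popleft()
        # A product state accepts only if both component subsets accept.
        if sa & a.accepting and sb & b.accepting:
            return word
        for sym in sorted(alphabet):
            nxt = (a.step(sa, sym), b.step(sb, sym))
            # Drop product states where either side is already dead.
            if nxt[0] and nxt[1] and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + sym))
    return None


# Toy automata: a*b, "any two symbols from {a, b}", and the single string "b".
a_star_b = NFA(0, {1}, {(0, "a"): {0}, (0, "b"): {1}})
length_two = NFA(0, {2}, {(0, "a"): {1}, (0, "b"): {1},
                          (1, "a"): {2}, (1, "b"): {2}})
only_b = NFA(0, {1}, {(0, "b"): {1}})

print(intersection_witness(a_star_b, length_two))  # ab
print(intersection_witness(length_two, only_b))    # None
```

Returning a witness string rather than a bare yes/no is a choice made for this sketch; it gives something concrete to feed into a group-capturing matcher afterwards.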
bin/x86/a.out: obj/x86/.*\.o
obj/{[a-zA-Z0-9]+}/{.*}.o: src/\2.c
At the end of the first line I have a regex that matches all objects for x86 targets. The second line has a regex that specifies a possible build rule; its first group should match the fixed "x86" and its second group any string that follows. In this example the first group isn't used yet, but it should be retrievable. To make sure that the matching terminates (and to allow recursive rules), I want to use the information gained from the first regex when matching the second. A rule is selected by taking the second regex of the first line and the first regex of the second line and determining whether their intersection (the DFA resulting from the intersection) has a reachable accepting state. If it does, there are strings that both regexes match, and therefore some values that the groups can take.
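For a single concrete string the group values are easy to recover, which may help make the intended rule semantics concrete. The sketch below uses Python's `re` module and rewrites the `{...}` groups as `(...)`, with the dot before `o` escaped; both are assumptions about the intended syntax, and `obj/x86/main.o` is just a made-up string that the first line's regex would accept.

```python
import re

# Second rule's target pattern, with {...} groups written as (...).
target = re.compile(r"obj/([a-zA-Z0-9]+)/(.*)\.o")

# A hypothetical string matched by the first rule's "obj/x86/.*\.o".
candidate = "obj/x86/main.o"

match = target.fullmatch(candidate)
groups = match.groups()
# Substitute group 2 into the dependency template "src/\2.c".
source = match.expand(r"src/\2.c")

print(groups)  # ('x86', 'main')
print(source)  # src/main.c
```

The question is about doing the same thing for the whole set of strings in the intersection rather than for one string at a time.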
Is it possible, in general, to extract information from the first regex for use in matching the group of the second regex?
If not in general, what kinds of restrictions do I need to add?
Comments (2)
I believe back-references make the language non-regular, so you won't be able to convert it to a finite automaton.
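Reading the comment above as being about back-references such as `\2`, the standard illustration is a pattern whose group is reused inside the same pattern. The example below is a small sketch using Python's `re` module; the pattern is made up for illustration and is not one of the rules from the question.

```python
import re

# (a+)b\1 matches a^n b a^n: the run of a's after the b must repeat the
# run before it exactly, which no finite automaton can check for all n.
pattern = re.compile(r"(a+)b\1")

balanced = pattern.fullmatch("aabaa")
unbalanced = pattern.fullmatch("aabaaa")

print(balanced is not None)  # True
print(unbalanced is None)    # True
```

Engines that support this typically do so by backtracking rather than by compiling to a DFA, so the product construction would not apply to such a pattern directly.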
Because that's the thing I'm trying to make (no pun intended).
None, as of yet. I'm considering writing my own based on the outcome of this question. If this isn't possible, I may make do with an existing one that supports this. If it is theoretically possible, I'll develop my own to do exactly this and build the application as I intend it.
The idea behind the intersection is to define rules that are generic and can contain multiple varying left-side parts (like the use of % in ordinary makefiles, but without having to resort to some sort of recursive make when there is more than one variation point, such as the platform, build type or file name). If I can't take the second regex into account for the group, I can't use such a rule recursively, because nothing would change between one recursion step/level and the next. That would reduce the genericity but would still be acceptable. Still, it's an interesting question to know the answer to (i.e., whether it can be done generically), and the answer will determine the requirements I have for a regex library.
(Not posted as original author because I lost my cookie & am waiting for the accounts to be merged).