Generalizing the pumping lemma for UNIX-style regular expressions

Posted 2024-08-28 06:23:44

Most UNIX regular expressions have, besides the usual `*`, `+`, `?` operators, a backslash operator: `\1`, `\2`, ... match whatever the first, second, ... parenthesized group captured. So, for example, *L* = `(a*)b\1` matches the (non-regular) language *a^n b a^n*.
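
A minimal sketch of the backreference example in Ruby (the language the answers below use), assuming Ruby 2.4 or later for `Regexp#match?`. The names `A_N_B_A_N`, `a_n_b_a_n?` and `agrees_up_to?` are hypothetical. The brute-force loop checks that the regex and the set-builder definition agree on every short string over {a, b}.

```ruby
# Matches exactly the strings a^n b a^n (n >= 0):
# \1 must repeat the text that capturing group 1, (a*), actually matched.
A_N_B_A_N = /\A(a*)b\1\z/

# Hypothetical reference definition, written without regexes.
def a_n_b_a_n?(s)
  n = s.index("b") # number of leading a's, or nil if there is no b
  !n.nil? && s == "a" * n + "b" + "a" * n
end

# Hypothetical helper: does the regex agree with the definition on every
# string over {a, b} of length 0..max_len?
def agrees_up_to?(max_len)
  (0..max_len).all? do |len|
    %w[a b].repeated_permutation(len).all? do |chars|
      s = chars.join
      A_N_B_A_N.match?(s) == a_n_b_a_n?(s)
    end
  end
end

puts A_N_B_A_N.match?("aabaa") # => true
puts A_N_B_A_N.match?("aaba")  # => false
puts agrees_up_to?(7)          # => true
```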

On one hand, this seems pretty powerful, since you can write `(a*)b\1b\1` to match the language *a^n b a^n b a^n*, which is not even context-free, so no pushdown automaton can recognize it. On the other hand, I'm pretty sure *a^n b^n* cannot be expressed this way.
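
The three-copy pattern can be checked the same way. This sketch assumes Ruby 2.4 or later for `Regexp#match?`; `THREE_RUNS`, `members_ok` and `rejected_ok` are hypothetical names. The pumping lemma for context-free languages rules this language out, which is why a pushdown automaton cannot recognize it.

```ruby
# Group 1 (a*) must be repeated twice more, giving a^n b a^n b a^n.
THREE_RUNS = /\A(a*)b\1b\1\z/

# Every member with n = 0..5 matches (n = 0 is the string "bb").
members_ok = (0..5).all? do |n|
  run = "a" * n
  THREE_RUNS.match?(run + "b" + run + "b" + run)
end

# Strings whose three runs of a's have different lengths are rejected.
non_members = %w[abaab aababa abab abb]
rejected_ok = non_members.none? { |s| THREE_RUNS.match?(s) }

puts members_ok  # => true
puts rejected_ok # => true
```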

I have two questions:

  1. Is there any literature on this family of languages (UNIX-y regular)? In particular, is there a version of the pumping lemma for these?
  2. Can someone prove, or disprove, that *a^n b^n* cannot be expressed this way?

Comments (3)

情域 2024-09-04 06:23:44

You're probably looking for

and of course follow their citations forward and backward to find more literature on this subject.

苍风燃霜 2024-09-04 06:23:44

a^n b^n is a CFL (context-free language). The grammar is:

A -> aAb | ε
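
For comparison, the grammar turns directly into a recognizer. A sketch in Ruby with hypothetical helper names `parse_a` and `anbn?`: choosing `A -> aAb` exactly when the next character is `a` is safe, because only `b` or the end of the input can follow an A.

```ruby
# Recursive-descent recognizer for  A -> a A b | ε  (hypothetical helper names).
# parse_a(s, i) returns the index just past one A that starts at i, or nil.
def parse_a(s, i)
  return i unless s[i] == "a"          # A -> ε: next char is not 'a' (or input ended)
  j = parse_a(s, i + 1)                # A -> a A b: parse the inner A
  return nil if j.nil? || s[j] != "b"  # the matching 'b' must follow
  j + 1
end

# s is in { a^n b^n : n >= 0 } exactly when a single A spans all of s.
def anbn?(s)
  parse_a(s, 0) == s.length
end

accepted = %w[ab aabb aaabbb].map { |s| anbn?(s) }
rejected = %w[aab abb ba abab].map { |s| anbn?(s) }
p accepted  # => [true, true, true]
p rejected  # => [false, false, false, false]
p anbn?("") # => true
```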

You can use the pumping lemma for regular languages to prove that the language of A is not regular.
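
The pumping-lemma argument this answer refers to, sketched for L = {a^n b^n : n ≥ 0}. It shows that L is not regular; on its own it does not show that no backreference regex describes L, which is what the second question asks.

```latex
\begin{align*}
&\text{Suppose } L = \{a^n b^n : n \ge 0\} \text{ is regular, with pumping length } p.\\
&\text{Take } s = a^p b^p \in L. \text{ Then } s = xyz \text{ with } |xy| \le p,\ |y| \ge 1,\\
&\text{and } xy^i z \in L \text{ for all } i \ge 0.\\
&\text{Since } |xy| \le p, \text{ the block } y \text{ lies inside } a^p, \text{ so } y = a^k \text{ with } k \ge 1.\\
&\text{Then } xy^2 z = a^{p+k} b^p \notin L, \text{ a contradiction, so } L \text{ is not regular.}
\end{align*}
```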

雪落纷纷 2024-09-04 06:23:44

Ruby 1.9.1 supports the following regex:

regex = %r{ (?<foo> a\g<foo>a | b\g<foo>b | c) }x

p regex.match("aaacbbb")
# the result is #<MatchData "c" foo:"c">
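
A quick check of why the unanchored match returns only "c". This is a sketch assuming a Ruby whose engine is Oniguruma or its fork Onigmo (1.9 or later). `Regexp#match` searches for the leftmost position where the whole pattern succeeds, and in "aaacbbb" that position is index 3.

```ruby
regex = %r{ (?<foo> a\g<foo>a | b\g<foo>b | c) }x

md = regex.match("aaacbbb")
# Regexp#match is not anchored: it returns the leftmost place where the
# whole pattern succeeds. No foo can start at index 0, 1 or 2 here,
# so the first match is the lone "c" at index 3.
p md[0]       # => "c"
p md.begin(0) # => 3

# On a string that really has the shape w c reverse(w), the same search
# matches the whole string.
p regex.match("abcba")[0] # => "abcba"
```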

"Fun with Ruby 1.9 Regular Expressions" has an example where he actually arranges all the parts of a regex so that it looks like a context-free grammar as follows:

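# Each (?<name>...){0} group is matched zero times, so it only defines a
# subpattern that a later \g<name> call reuses.
sentence = %r{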
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x
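
A small usage sketch for the grammar-style regex, assuming Ruby 2.4 or later for `match?`. The test sentences are made up for illustration. Because `opt_adj` allows at most one adjective, a sentence with two stacked adjectives is rejected.

```ruby
sentence = %r{
    (?<subject>   cat   | dog   | gerbil    ){0}
    (?<verb>      eats  | drinks| generates ){0}
    (?<object>    water | bones | PDFs      ){0}
    (?<adjective> big   | small | smelly    ){0}

    (?<opt_adj>   (\g<adjective>\s)?     ){0}

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x

p sentence.match?("The big cat eats smelly bones")   # => true
p sentence.match?("The gerbil generates PDFs")       # => true
p sentence.match?("The small smelly dog eats bones") # => false (two adjectives)
p sentence.match?("The cat barks water")             # => false (unknown verb)
```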

I think this means that at least Ruby 1.9.1's regex engine, which is the Oniguruma regex engine, is actually equivalent to a context-free grammar, though the capturing groups aren't as useful as an actual parser-generator.

This means that "Pumping lemma for context-free languages" should describe the class of languages recognizable by Ruby 1.9.1's regex engine.
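
To make the comparison with context-free grammars concrete: the grammar A -> aAb with an empty alternative can be transcribed into a recursive subexpression call, so the engine can describe a^n b^n. This is a sketch assuming Ruby 2.4 or later (Onigmo) rather than 1.9.1 specifically; `ANBN` and `anbn_by_definition?` are hypothetical names. It relies on recursion (`\g<A>`), not on backreferences, so it does not settle the question's second point about plain backreferences.

```ruby
# The grammar A -> a A b | (empty) as a recursive subexpression call:
# \g<A> re-enters group A, and the empty second alternative plays the role of ε.
ANBN = /\A(?<A>a\g<A>b|)\z/

# Reference definition for comparison (hypothetical helper).
def anbn_by_definition?(s)
  n = s.length / 2
  s.length.even? && s == "a" * n + "b" * n
end

# Compare on every string over {a, b} of length 0..8.
agrees = (0..8).all? do |len|
  %w[a b].repeated_permutation(len).all? do |chars|
    s = chars.join
    ANBN.match?(s) == anbn_by_definition?(s)
  end
end

p agrees                # => true
p ANBN.match?("aaabbb") # => true
p ANBN.match?("aabbb")  # => false
```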

EDIT: Whoops! I messed up, and didn't do an important test which actually makes my answer above totally wrong. I won't delete the answer, because it's useful information nonetheless.

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x
#I added anchors for the beginning and end of the string
regex.match("aaacbbb")
#returns nil, indicating that no match is possible with recursive capturing groups.

EDIT: Coming back to this many months later, I just discovered that my test in the last edit was incorrect. "aaacbbb" shouldn't be expected to match regex even if regex does operate like a context-free grammar.

The correct test should be on a string like "aabcbaa", and that does match the regex:

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x
regex.match("aaacaaa")
# => #<MatchData "aaacaaa" foo:"aaacaaa">
regex.match("aacaa")
# => #<MatchData "aacaa" foo:"aacaa">
regex.match("aabcbaa")
# => #<MatchData "aabcbaa" foo:"aabcbaa">
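
A brute-force sketch comparing the anchored regex with what the grammar foo -> a foo a | b foo b | c actually generates: strings of the form w c w', where w is any string over {a, b} and w' is w reversed. That is why "aabcbaa" matches and "aaacbbb" does not. It assumes Ruby 2.4 or later; `generated_by_foo?` is a hypothetical helper.

```ruby
regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x

# Hypothetical reference definition: foo -> a foo a | b foo b | c generates
# exactly w + "c" + w.reverse, where w is any string over {a, b}.
def generated_by_foo?(s)
  return false unless s.count("c") == 1
  w, rest = s.split("c", -1) # text before and after the single c
  rest == w.reverse
end

# Compare the regex with the definition on all strings over {a, b, c}
# of length 0..7.
agrees = (0..7).all? do |len|
  %w[a b c].repeated_permutation(len).all? do |chars|
    s = chars.join
    regex.match?(s) == generated_by_foo?(s)
  end
end

p agrees                  # => true
p regex.match?("aabcbaa") # => true
p regex.match?("aaacbbb") # => false
```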