Why do on-line parsers seem to stop at regexps?

Posted on 2024-07-18 00:55:56

I've long wondered why there don't seem to be any parsers for, say, BNF that behave like the regexps in various libraries.

Sure, there are things like ANTLR, Yacc and many others that generate code which, in turn, can parse a CFG, but there doesn't seem to be a library that can do that without the intermediate step.

I'm interested in writing a Packrat parser, to get rid of all those nested-parenthesis quirks associated with regexps (and, perhaps even more so, for the sport of it), but somehow I have the feeling that I'm just walking into another halting-problem-like class of swamps.

Is there a technical/theoretical limitation for these parsers, or am I just missing something?

Comments (8)

感情废物 2024-07-25 00:55:56

I think it's more of a cultural thing. The use of context-free grammars is mostly confined to compilers, which typically have code associated with each production rule. In some languages, it's easier to output code than to simulate callbacks. In others, you'll see parser libraries: parser combinators in Haskell, for example. On the other hand, regular expressions see wide use in tools like grep, where it's inconvenient to run the C compiler every time the user gives a new regular expression.

余生一个溪 2024-07-25 00:55:56

Boost.Spirit looks like what you are after.

If you are looking to make your own, I've used BNFC for my latest compiler project and it provides the grammar used in its own implementation. This might be a good starting point...

断舍离 2024-07-25 00:55:56

There isn't any technical/theoretical limitation lurking in the shadows. I can't say why they aren't more popular, but I know of at least one library that provides the sort of "on-line" parsing you seek.

SimpleParse is a Python library that lets you simply paste your hairy EBNF grammar into your program and use it to parse things right away, with no intermediate steps. I've used it for several projects where I wanted a custom input language but really didn't want to commit to any formal build process.

Here's a tiny example off the top of my head:

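from simpleparse.parser import Parser

# Grammar in SimpleParse's EBNF: ":=" defines a rule, "," sequences
# elements, "/" separates alternatives.
decl = r"""
    root   := expr
    expr   := term, ("|", term)*
    term   := factor+
    factor := ("(", expr, ")") / [a-z]
"""
parser = Parser(decl)
# parse() returns a success flag, the result trees, and the index where parsing stopped.
success, trees, next_index = parser.parse("(a(b|def)|c)def")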

The parser combinator libraries for Haskell and Scala also let you express the grammar for your parser in the same chunk of code that uses it. However, you can't, say, let the user type in a grammar at runtime (which might only be of interest to people making software to help people understand grammars anyway).
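To illustrate the combinator idea in Python rather than Haskell or Scala, here is a minimal sketch that builds the same grammar as the SimpleParse example out of ordinary functions. All of the names (`char`, `seq`, `alt`, `many`, `lazy`, `matches`) are made up for this illustration and do not come from any particular library.

```python
# A parser here is a function (text, pos) -> (value, new_pos), or None on failure.

def char(pred):
    """Match a single character accepted by pred."""
    def parse(s, i):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def lit(c):
    """Match exactly the character c."""
    return char(lambda ch: ch == c)

def seq(*parsers):
    """Match all parsers one after another."""
    def parse(s, i):
        values = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return values, i
    return parse

def alt(*parsers):
    """Return the result of the first parser that succeeds (ordered choice)."""
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def many(p, minimum=0):
    """Match p repeatedly; fail if it matched fewer than `minimum` times."""
    def parse(s, i):
        values = []
        r = p(s, i)
        while r is not None:
            v, i = r
            values.append(v)
            r = p(s, i)
        return (values, i) if len(values) >= minimum else None
    return parse

def lazy(thunk):
    """Defer looking up a parser so rules can refer to each other recursively."""
    return lambda s, i: thunk()(s, i)

# The grammar, written directly as code:
#   expr := term ("|" term)*    term := factor+    factor := "(" expr ")" / [a-z]
letter = char(lambda ch: "a" <= ch <= "z")
factor = alt(seq(lit("("), lazy(lambda: expr), lit(")")), letter)
term = many(factor, minimum=1)
expr = seq(term, many(seq(lit("|"), term)))

def matches(text):
    """True if the whole text is derivable from expr."""
    r = expr(text, 0)
    return r is not None and r[1] == len(text)
```

With these definitions, `matches("(a(b|def)|c)def")` is true and `matches("(a")` is false. The grammar lives in the host language, which is why it is fixed when the program is written unless you build the parser objects dynamically.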

打小就很酷 2024-07-25 00:55:56

Pyparsing (http://pyparsing.wikispaces.com) has built-in support for packrat parsing and it is pure Python, so you can see the actual implementation.
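For readers who want the gist of packrat parsing before reading a real implementation, here is a rough, library-free sketch. It is not Pyparsing's code: the tuple-based grammar encoding and the `make_matcher` helper are assumptions invented for this example. It interprets a grammar supplied as plain data at runtime and memoizes each (rule, position) result, which is the core packrat trick.

```python
# Grammar as plain data, so it could just as well be built or loaded at runtime.
GRAMMAR = {
    "expr": ("seq", ("rule", "term"),
             ("star", ("seq", ("lit", "|"), ("rule", "term")))),
    "term": ("plus", ("rule", "factor")),
    "factor": ("alt",
               ("seq", ("lit", "("), ("rule", "expr"), ("lit", ")")),
               ("range", "a", "z")),
}

def make_matcher(grammar, start):
    """Return a function that tests whether a whole string matches `start`."""
    def match(text):
        memo = {}  # (rule name, position) -> end position, or None on failure

        def apply(node, pos):
            kind = node[0]
            if kind == "lit":
                return pos + len(node[1]) if text.startswith(node[1], pos) else None
            if kind == "range":
                ok = pos < len(text) and node[1] <= text[pos] <= node[2]
                return pos + 1 if ok else None
            if kind == "rule":
                key = (node[1], pos)
                if key not in memo:  # packrat: evaluate each rule once per position
                    memo[key] = apply(grammar[node[1]], pos)
                return memo[key]
            if kind == "seq":
                for child in node[1:]:
                    pos = apply(child, pos)
                    if pos is None:
                        return None
                return pos
            if kind == "alt":  # ordered choice: first success wins
                for child in node[1:]:
                    end = apply(child, pos)
                    if end is not None:
                        return end
                return None
            if kind in ("star", "plus"):
                count = 0
                while True:
                    end = apply(node[1], pos)
                    if end is None:
                        break
                    count += 1
                    if end == pos:  # empty match: stop to avoid looping forever
                        break
                    pos = end
                return pos if (kind == "star" or count > 0) else None
            raise ValueError(f"unknown node kind: {kind!r}")

        return apply(("rule", start), 0) == len(text)
    return match
```

Calling `make_matcher(GRAMMAR, "expr")("(a(b|def)|c)def")` returns `True`. This sketch only recognizes input (it builds no tree) and, like plain packrat parsers in general, it does not handle left-recursive rules.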

爱*していゐ 2024-07-25 00:55:56

Because full-blown context-free grammars are confusing enough as they are without some cryptically dense and incomprehensible syntax to make them even more confusing?

It's hard to know what you're asking. Are you trying to create something like a regular expression, but for context-free grammars? Like, using $var =~ /expr = expr + expr/ (in Perl) and having that match "1 + 1" or "1 + 1 + 1" or "1 + 1 + 1 + 1 + 1 + ..."? I think one of the limitations of this is going to be syntax: Having more than about three rules is going to make your "grammar-expression" even more unreadable than any modern-day regular expression.

乖乖 2024-07-25 00:55:56

Side effects are the only thing I can see that will get you. Most parser generators include embedded code for processing, and you would need an eval to make that work.

One way around that would be to name actions and then make an "action" function that takes the name of the action to do and the args to do it with.
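A minimal sketch of that named-action idea in Python might look like the following. The `ACTIONS` table and the action names in it are hypothetical, chosen only for illustration; a runtime grammar would mention the names as plain strings, so no eval is needed.

```python
import operator

# Registry mapping action names (as they would appear in a grammar) to callables.
ACTIONS = {
    "number": int,        # convert matched text to an integer
    "add": operator.add,  # combine two already-computed values
    "mul": operator.mul,
}

def action(name, *args):
    """Run the action registered under `name` with the given arguments."""
    try:
        handler = ACTIONS[name]
    except KeyError:
        raise ValueError(f"unknown action: {name!r}") from None
    return handler(*args)
```

A parser that has matched `1 + 2` could then call `action("add", action("number", "1"), action("number", "2"))`, which evaluates to 3. Restricting the grammar to registered names also keeps user-supplied grammars from running arbitrary code.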

朱染 2024-07-25 00:55:56

You could theoretically do it with Boost Spirit in C++, but it is mainly made for static grammars. I think the reason this is not common is that CFGs are not as commonly used as regexes. I've never had to use a grammar except for compiler construction, but I have used regexes many times. CFGs are generally much more complex than regexes, so it makes sense to generate code statically with a tool like YACC or ANTLR.

内心旳酸楚 2024-07-25 00:55:56

tcllib has something like that, if you can put up with Parsing Expression Grammars and also Tcl. If Perl is your thing, CPAN has Parse::Earley. Here's a pure Perl variation which looks promising. PLY seems to be a plausible solution for Python.
