Nested parsers in Happy / infinite loop?

Posted 2024-10-10 04:49:36

I'm trying to write a parser for a simple markup language with Happy. Currently, I'm having some issues with infinite loops and nested elements.

My markup language basically consists of two elements, one for "normal" text and one for bold/emphasized text.

data Markup
    = MarkupText   String
    | MarkupEmph   [Markup]

For example, a text like Foo *bar* should get parsed as [MarkupText "Foo ", MarkupEmph [MarkupText "bar"]].

Lexing of that example works fine, but parsing it results in an infinite loop, and I can't see why. This is my current approach:

-- The main parser: Parsing a list of "Markup"
Markups     :: { [Markup] }
            : Markups Markup                    { $1 ++ [$2] }
            | Markup                            { [$1]       }

-- One single markup element
Markup      :: { Markup }
            : '*' Markups1 '*'                  { MarkupEmph $2 }
            | Markup1                           { $1            }

-- The nested list inside *..*
Markups1    :: { [Markup] }
            : Markups1 Markup1                  { $1 ++ [$2] }
            | Markup1                           { [$1]       }

-- Markup which is always available:
Markup1     :: { Markup }
            : String                            { MarkupText $1 }

What's wrong with that approach? How could this be resolved?

Update: Sorry. Lexing wasn't working as expected. The infinite loop was inside the lexer. Sorry. :)

Update 2: On request, this is the lexer I'm using:

lexer :: String -> [Token]
lexer [] = []
lexer str@(c:cs)
    | c == '*'              = TokenSymbol "*"   : lexer cs
    -- ...more rules...
    | otherwise             = TokenString val   : lexer rest
  where (val, rest) = span isValidChar str
        isValidChar = (/= '*')

The infinite recursion occurred because I had lexer str instead of lexer cs in the first rule, the one for '*'. I didn't see it because my actual code was a bit more complex. :)
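To make the bug concrete, here is a minimal sketch of both versions of the lexer. The Token declaration is an assumption (the post only shows the two constructors), and buggyLexer is a hypothetical reconstruction of the mistake described above, not the actual code. Because Haskell lists are lazy, the buggy version does not hang on its own: it produces an endless stream of "*" tokens, which would explain why the loop first looked like a parser problem.

```haskell
-- Assumed declaration; the post only shows the constructors in use.
data Token
    = TokenSymbol String
    | TokenString String
    deriving (Eq, Show)

-- Fixed lexer: after a '*', continue with cs (the rest of the input).
lexer :: String -> [Token]
lexer [] = []
lexer str@(c:cs)
    | c == '*'  = TokenSymbol "*" : lexer cs
    -- c is not '*' here, so span consumes at least one character.
    | otherwise = TokenString val : lexer rest
  where
    (val, rest) = span (/= '*') str

-- Hypothetical reconstruction of the bug: recursing on str instead of cs
-- re-lexes the same '*' forever, giving an infinite token list.
buggyLexer :: String -> [Token]
buggyLexer [] = []
buggyLexer str@(c:_)
    | c == '*'  = TokenSymbol "*" : buggyLexer str
    | otherwise = TokenString val : buggyLexer rest
  where
    (val, rest) = span (/= '*') str
```

With these definitions, lexer "Foo *bar*" gives [TokenString "Foo ", TokenSymbol "*", TokenString "bar", TokenSymbol "*"], while take 3 (buggyLexer "*bar") already shows the repetition: three TokenSymbol "*" in a row.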

Comments (1)

请止步禁区 2024-10-17 04:49:36

Just a warning: it has been a while since I've dealt with parser generators.

It would appear you need an LR(1) parser, which I'm not sure Happy is. I am positive that once I write this, someone will be able to correct me.

If your parser can't look ahead, it will be stuck on this rule forever:

Markups1    :: { [Markup] }
        : Markups1 Markup1 
        | Markup1

It will look for a Markups1, which in turn looks for a Markups1. My best guess is that it isn't performing a lookahead to Markup1 to see whether the next token is a string.

Try rewriting it like this:

Markups1    :: { [Markup] }
        : Markup1 Markups1                  { $1 : $2 }
        | {- empty -}                       { []      }

Essentially, you want it to find the string first and then try to look for another string. If it doesn't find one, it needs to end that rule.
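The "take one string, then look for more, otherwise stop" idea can be sketched as a small hand-written parser in plain Haskell. This is only an illustration of the right-recursive rule, not what Happy generates; the function names markups1 and markups and the Token declaration are assumptions made for the example. As far as I know, Happy generates LALR(1) parsers by default, and those accept left-recursive rules, so the rewrite may not be what removes the loop in the question.

```haskell
data Markup
    = MarkupText String
    | MarkupEmph [Markup]
    deriving (Eq, Show)

-- Assumed declaration; the question only shows the constructors.
data Token
    = TokenSymbol String
    | TokenString String
    deriving (Eq, Show)

-- Mirrors the right-recursive rule: one string, then more, or nothing.
-- Returns the parsed elements together with the unconsumed tokens.
markups1 :: [Token] -> ([Markup], [Token])
markups1 (TokenString s : ts) =
    let (more, rest) = markups1 ts
    in  (MarkupText s : more, rest)
markups1 ts = ([], ts)

-- Top level: plain strings and *...* groups.
-- Gives Nothing when a '*' is never closed or a token is unexpected.
markups :: [Token] -> Maybe [Markup]
markups [] = Just []
markups (TokenString s : ts) = (MarkupText s :) <$> markups ts
markups (TokenSymbol "*" : ts) =
    case markups1 ts of
        (inner, TokenSymbol "*" : rest) -> (MarkupEmph inner :) <$> markups rest
        _                               -> Nothing
markups _ = Nothing
```

Fed the tokens for Foo *bar*, markups returns Just [MarkupText "Foo ", MarkupEmph [MarkupText "bar"]]. Note one difference from the grammar in the question: because the rule allows an empty alternative, an empty pair of stars parses as MarkupEmph [], whereas the original Markups1 requires at least one string.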
