How to use parser combinators for conditional checks

Posted on 2024-11-17 01:48:27


I'm trying to write a simple HTML template engine (for fun), and I want to parse a structure like this:

A. normal lines are HTML

B. if a line starts with $, treat it as a line of Java code

$ if (isSuper) {
    <span>Are you wearing red underwear?</span>
$ }

C. if ${} wraps multiple lines, all code in it should be java code.

D. if a line starts with $include then do some trick on the line (call another template)

$include anotherTemplate(id, name)

This creates a new instance of anotherTemplate and calls its render() method.

E. there will be more "commands" besides $include, such as $def and $val.

How can I express this in parser combinators? In effect it is a conditional fork.

For the $ line and the ${...} block (B and C above), I got something like this:

'$' ~> ( '{' ~> upto('}') <~ '}' |  not('{') <~ newline )

where upto is borrowed from the Scalate Scaml parser (which I have just started reading and can't quite understand).

I used not('{') to distinguish a $.... code line from a ${...} block. But this is cumbersome, and it won't extend to other "commands".

So how can I do this?


不羁少年 2024-11-24 01:48:27


Your use of not is redundant. The | method implements ordered choice; the second thing is tried only if the first has failed. This should do the trick:

def directive: Parser[Directive] =
  ( '$' ~>
    ( '{' ~> javaStuff <~ '}'
    | "include" ~> includeDirective
    | "def"     ~> defDirective
    | "val"     ~> valDirective
    | javaDirective
    )
  | htmlDirective
  )

def templateFile: Parser[List[Directive]] = (directive <~ '\n').*
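
To try the parser above, the sub-parsers it refers to need some definition. The following is only a minimal sketch, assuming the scala-parser-combinators library is on the classpath. The Directive cases, the restOfLine helper and the bodies of javaStuff, includeDirective, defDirective, valDirective, javaDirective and htmlDirective are all hypothetical stand-ins that just capture text; they are not from the answer.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST: the answer never defines Directive, so these cases are assumptions.
sealed trait Directive
case class Html(text: String)      extends Directive
case class JavaLine(code: String)  extends Directive
case class JavaBlock(code: String) extends Directive
case class Include(call: String)   extends Directive
case class Def(body: String)       extends Directive
case class Val(body: String)       extends Directive

object TemplateParser extends RegexParsers {
  // Line breaks matter in this grammar, so disable automatic whitespace skipping.
  override val skipWhitespace = false

  // Assumed helper: the rest of the current line, trimmed, without the newline.
  private def restOfLine: Parser[String] = """[^\n]*""".r ^^ (_.trim)

  // Assumed sub-parsers; real ones would parse their arguments properly.
  def javaStuff: Parser[Directive]        = """[^}]*""".r ^^ (s => JavaBlock(s.trim))
  def includeDirective: Parser[Directive] = restOfLine ^^ (s => Include(s))
  def defDirective: Parser[Directive]     = restOfLine ^^ (s => Def(s))
  def valDirective: Parser[Directive]     = restOfLine ^^ (s => Val(s))
  def javaDirective: Parser[Directive]    = restOfLine ^^ (s => JavaLine(s))
  def htmlDirective: Parser[Directive]    = restOfLine ^^ (s => Html(s))

  // Same shape as the answer's parser: '|' tries the alternatives in order.
  def directive: Parser[Directive] =
    ( '$' ~>
      ( '{' ~> javaStuff <~ '}'
      | "include" ~> includeDirective
      | "def"     ~> defDirective
      | "val"     ~> valDirective
      | javaDirective
      )
    | htmlDirective
    )

  // Every directive must be terminated by a newline, as in the answer.
  def templateFile: Parser[List[Directive]] = (directive <~ '\n').*
}
```

Calling TemplateParser.parseAll(TemplateParser.templateFile, input) on a template whose lines all end in a newline should give one Directive per line or ${...} block. Note that in this sketch the "def" and "val" alternatives match on a bare prefix, so a Java line such as "$value = 1" would be taken as a val directive; a real grammar would probably require whitespace after the keyword.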

For faster parsing and better error messages, you should "commit" your parsers as often as possible. I think this is what you were trying to get at when you used not('{').

Right now, if the above parser sees a '$' followed by a '{' and then doesn't see javaStuff, it'll backtrack and consider each of the four remaining '$'-alternatives in order (include, def, val, and finally javaDirective), and then backtrack to before '$' to try htmlDirective, before failing with a baffling error message. But if we see a '{', we know that none of the other alternatives could possibly succeed, so why should we check them? Likewise, a line that starts with '$' can never be an htmlDirective.

We want things like '{' to be points of no backtrack; if the after-'{' parser fails and wants to backtrack, we should stop it in its tracks and propagate the backtrack-causing failure directly to the user as an error.

The way to do this is with commit. This function/combinator, when applied to a parser p, looks at the ParseResult coming out of p and changes it to an Error (the give-up-entirely signal) if it was originally a Failure (the backtrack signal), leaving it unchanged otherwise. With appropriate use of commit, the directive parser becomes:

def directive: Parser[Directive] =
  ( '$' ~> commit( '{' ~> commit(javaStuff <~ '}')
                 | "include" ~> commit(includeDirective)
                 | "def"     ~> commit(defDirective)
                 | "val"     ~> commit(valDirective)
                 | javaDirective
                 )
  | htmlDirective
  )
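
The difference between Failure and Error is easier to see on a cut-down grammar. This is a rough sketch, again assuming scala-parser-combinators is available; CommitDemo, block, line, loose and strict are hypothetical names used only for illustration, not part of the answer.

```scala
import scala.util.parsing.combinator.RegexParsers

object CommitDemo extends RegexParsers {
  // Keep whitespace significant so the two parsers see exactly the same input.
  override val skipWhitespace = false

  // Hypothetical stand-ins for javaStuff and javaDirective.
  def block: Parser[String] = """[^}\n]*""".r
  def line: Parser[String]  = """[^\n]*""".r

  // No commit: a missing '}' is a Failure, so '|' backtracks and tries `line`.
  def loose: Parser[String] = '$' ~> ('{' ~> block <~ '}' | line)

  // With commit: after '{', a missing '}' becomes an Error and '|' gives up.
  def strict: Parser[String] = '$' ~> ('{' ~> commit(block <~ '}') | line)
}

object CommitDemoApp {
  def main(args: Array[String]): Unit = {
    val unterminated = "${ int x = 1;" // the closing '}' is missing

    // loose quietly accepts the input as a plain code line.
    println(CommitDemo.parseAll(CommitDemo.loose, unterminated).successful)

    // strict rejects it, reporting the problem at the missing '}'.
    println(CommitDemo.parseAll(CommitDemo.strict, unterminated).successful)
  }
}
```

With loose, the unterminated block is silently reinterpreted as an ordinary $ line, which is the kind of backtracking the answer warns about. With strict, the parse stops at the point where '}' was expected.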

When I first learned to use the parsing library, I found it really helpful to look at the source code for Parsers; it makes some of this stuff a bit more clear.

(Some other tips: The purpose of append and ParseResult#append is to decide which failure from a sequence of parse-alternatives should be propagated to the user. Just ignore those for now. Also, I wouldn't worry too much about >>/flatMap/into until you've gotten some more practice; when it's time, read Daniel Sobral's explanation. Finally, I've never had to use |||, and you probably won't either. Happy parsing!)

Hope this helps.
