How to do conditional checks with parser combinators

Posted 2024-11-17 01:48:27 · 927 words · 2 views · 0 comments

I was trying to write a simple HTML template engine (for fun), and I want to parse a structure like this:

A. normal lines are HTML

B. if a line starts with $ then view it as a java code line

$ if (isSuper) {
    <span>Are you wearing red underwear?</span>
$ }

C. if ${} wraps multiple lines, all code in it should be java code.

D. if a line starts with $include then do some trick on the line (call another template)

$include anotherTemplate(id, name)

This will create a new instance of anotherTemplate and call its render() method.

E. and there would be more "commands" other than $include, such as $def, $val.

How can I express this in parser combinators? In effect it is a conditional fork.

For B. and C., I got something like this:

'$' ~> ( '{' ~> upto('}') <~ '}' | not('{') <~ newline )

where upto is borrowed from Scalate's Scaml parser (which I have only just started reading and don't quite understand).

I used not('{') to distinguish a $... code line from a ${...} block. But this is cumbersome, and won't extend to other "commands".

So how can I do this?

Comments (1)

不羁少年 2024-11-24 01:48:27

Your use of not is redundant. The | method implements ordered choice; the second alternative is tried only if the first has failed. This should do the trick:

def directive: Parser[Directive] =
  ( '$' ~> ( '{' ~> javaStuff <~ '}'
           | "include" ~> includeDirective
           | "def" ~> defDirective
           | "val" ~> valDirective
           | javaDirective
           )
  | htmlDirective
  )

def templateFile: Parser[List[Directive]] = (directive <~ '\n').*

For faster parsing and better error messages, you should "commit" your parsers as often as possible. I think this is what you were trying to get at when you used not('{').

Right now, if the above parser sees a '$' followed by a '{' and then doesn't see javaStuff, it'll backtrack and consider each of the four remaining '$'-alternatives in order (include, def, val, and finally javaDirective), and then backtrack to before '$' to try htmlDirective, before failing with a baffling error message. But if we see a '{', we know that none of the other alternatives could possibly succeed, so why should we check them? Likewise, a line that starts with '$' can never be an htmlDirective.

We want things like '{' to be points of no backtrack; if the after-'{' parser fails and wants to backtrack, we should stop it in its tracks and propagate the backtrack-causing failure directly to the user as an error.

The way to do this is with commit. This function/combinator, when applied to a parser p, looks at the ParseResult coming out of p and changes it to an Error (the give-up-entirely signal) if it was originally a Failure (the backtrack signal), leaving it unchanged otherwise. With appropriate use of commit, the directive parser becomes:


def directive: Parser[Directive] =
  ( '$' ~> commit( '{' ~> commit(javaStuff <~ '}')
                 | "include" ~> commit(includeDirective)
                 | "def" ~> commit(defDirective)
                 | "val" ~> commit(valDirective)
                 | javaDirective
                 )
  | htmlDirective
  )
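To see the difference commit makes, here is a minimal runnable sketch of both versions side by side. It assumes the scala-parser-combinators module is on the classpath. The Directive case classes, the regexes behind javaStuff, includeDirective, javaDirective and htmlDirective, and the names TemplateSketch, committed and file are all hypothetical stand-ins, since the answer leaves those parsers undefined; the def and val branches are left out to keep it short.

```scala
// Assumes the scala-parser-combinators module is on the classpath.
import scala.util.parsing.combinator.RegexParsers

sealed trait Directive
final case class JavaBlock(code: String) extends Directive
final case class Include(call: String)   extends Directive
final case class JavaLine(code: String)  extends Directive
final case class Html(text: String)      extends Directive

object TemplateSketch extends RegexParsers {
  // Newlines terminate directives, so they must not be skipped as whitespace.
  override val skipWhitespace = false

  // Hypothetical stand-ins for the parsers the answer leaves undefined.
  def restOfLine: Parser[String]          = """[^\n]*""".r
  def javaStuff: Parser[Directive]        = """[^}]*""".r ^^ (s => JavaBlock(s.trim))
  def includeDirective: Parser[Directive] = restOfLine ^^ (s => Include(s.trim))
  def javaDirective: Parser[Directive]    = restOfLine ^^ (s => JavaLine(s.trim))
  def htmlDirective: Parser[Directive]    = restOfLine ^^ (s => Html(s))

  // Plain ordered choice: every alternative may be retried after a failure.
  def directive: Parser[Directive] =
    ( '$' ~> ( '{' ~> javaStuff <~ '}'
             | "include" ~> includeDirective
             | javaDirective
             )
    | htmlDirective
    )

  // Same grammar, with commit at the points of no return.
  def committed: Parser[Directive] =
    ( '$' ~> commit( '{' ~> commit(javaStuff <~ '}')
                   | "include" ~> commit(includeDirective)
                   | javaDirective
                   )
    | htmlDirective
    )

  def file(d: Parser[Directive]): Parser[List[Directive]] = (d <~ '\n').*

  def main(args: Array[String]): Unit = {
    val ok = "<p>hi</p>\n$ if (x) {\n$include other(id)\n${ int a = 1; }\n"
    println(parseAll(file(directive), ok))

    // A block that is never closed.
    val unclosed = "${ int a = 1;\n"
    println(parseAll(file(directive), unclosed).successful) // true: silently reparsed as a Java line
    println(parseAll(file(committed), unclosed).successful) // false: the missing '}' is reported
  }
}
```

With these particular stand-ins, the uncommitted version does not even fail on the unclosed block: it backtracks into javaDirective and accepts the line as ordinary Java code. The committed version stops at the missing '}' instead.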

When I first learned to use the parsing library, I found it really helpful to look at the source code for Parsers; it makes some of this stuff a bit more clear.
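If you would rather not open the library source right away, the core idea fits in a toy model. The sketch below is not the library's code: CommitModel, P, lit and the simplified Result type are made up for illustration, and the real | additionally picks which of two failures to report. It only shows how a Failure lets the next alternative run while an Error does not.

```scala
object CommitModel {
  // Three outcomes, loosely mirroring the library's Success / Failure / Error.
  sealed trait Result[+A]
  final case class Success[A](value: A, rest: String) extends Result[A]
  final case class Failure(msg: String) extends Result[Nothing] // "try the next alternative"
  final case class Error(msg: String)   extends Result[Nothing] // "stop and report"

  final case class P[+A](run: String => Result[A]) {
    // Ordered choice: only a Failure lets the right-hand side run.
    def |[B >: A](that: => P[B]): P[B] = P { in =>
      run(in) match {
        case Failure(_) => that.run(in)
        case other      => other
      }
    }
    // Sequence, keeping the right-hand result.
    def ~>[B](that: => P[B]): P[B] = P { in =>
      run(in) match {
        case Success(_, rest) => that.run(rest)
        case f: Failure       => f
        case e: Error         => e
      }
    }
  }

  // Match a literal prefix of the input.
  def lit(s: String): P[String] = P { in =>
    if (in.startsWith(s)) Success(s, in.drop(s.length))
    else Failure(s"expected '$s'")
  }

  // commit: turn a backtrackable Failure into a fatal Error.
  def commit[A](p: => P[A]): P[A] = P { in =>
    p.run(in) match {
      case Failure(msg) => Error(msg)
      case other        => other
    }
  }

  val block: P[String]     = lit("{") ~> lit("}")
  val committed: P[String] = lit("{") ~> commit(lit("}"))
  val word: P[String]      = lit("{x")

  def main(args: Array[String]): Unit = {
    println((block | word).run("{x"))     // block fails, so word is tried and succeeds
    println((committed | word).run("{x")) // the missing '}' becomes an Error; word is never tried
  }
}
```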

(Some other tips: The purpose of append and ParseResult#append is to decide which failure from a sequence of parse-alternatives should be propagated to the user. Just ignore those for now. Also, I wouldn't worry too much about >>/flatMap/into until you've gotten some more practice; when it's time, read Daniel Sobral's explanation. Finally, I've never had to use |||, and you probably won't either. Happy parsing!)

Hope this helps.

