How do I handle tokens inside a text block in JavaCC?

Posted on 2024-10-06 17:49:53

I have a simple aspect of a DSL where I can define a key and a value as such:

mykey=\
   This is my $REF{useful}
   multiline
   string
   where I terminate with a backslash
   but I support escaped \\ characters
   and I wish to handle the value part of this string
   as 3 blocks in this example.
\

The three tokens (for the value part) I would like in this example are

  • ValueLiteral == This is my
  • ValueReference == $REF{useful}
  • ValueLiteral == multiline etc....

I defined a rule for the value as such:

void multiLineValue(): {} {
  < BACKSLASH >< EOL >
  (
    valuePartLiteralMulti() |
    valuePartRef()
  )*
  < BACKSLASH >
}

Here is my TOKEN definition for the multiline string type:

TOKEN :
{
  < MULTILINE_STRING :
    (
        ~["\\"]
      | "\\"
        (
            ["\\", "'", "\"", "$", "n", "r", "t", "b", "f"]
          | ["u", "U"] ["+"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"]
        )
    )+
  >
}

My problem is that the MULTILINE_STRING token also consumes the '$REF{' character sequence, so the reference never reaches the parser as a separate token.

I would like to modify this multi-line string so that it will stop consuming characters when it encounters an unescaped "$REF{" (but will continue reading past a "\$REF{" sequence).
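For reference, the behaviour described above can be sketched in plain Java, outside JavaCC. This is only an illustration of the chunking I am after, not a grammar fix: `ValueSplitter`, `split`, and the `LIT:`/`REF:` prefixes are made-up names, and the sketch assumes a reference always ends at the next `}`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JavaCC): splits a value into literal and reference chunks.
final class ValueSplitter {
    private static final String REF_OPEN = "$REF{";

    static List<String> split(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                // Escaped character: keep both characters in the literal,
                // so "\$REF{" never starts a reference.
                literal.append(c).append(value.charAt(i + 1));
                i += 2;
            } else if (value.startsWith(REF_OPEN, i)) {
                // Assumption: the reference runs up to the next '}'.
                int close = value.indexOf('}', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated $REF{ at index " + i);
                }
                if (literal.length() > 0) {
                    parts.add("LIT:" + literal);
                    literal.setLength(0);
                }
                parts.add("REF:" + value.substring(i, close + 1));
                i = close + 1;
            } else {
                literal.append(c);
                i++;
            }
        }
        if (literal.length() > 0) {
            parts.add("LIT:" + literal);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints one chunk per line; the escaped \$REF{kept} stays inside a literal chunk.
        for (String part : split("This is my $REF{useful} multiline \\$REF{kept} string")) {
            System.out.println(part);
        }
    }
}
```

The open question is how to get the JavaCC token manager to produce the same split.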

Any assistance would be much appreciated.

Comments (1)

夏九 2024-10-13 17:49:53

I'm not sure, but your token definition also accepts $ (as a Unicode character?). Maybe you should exclude it by adding something like ~["$"] (or the Unicode equivalent) at the beginning.

Or you can use syntactic LOOKAHEAD, something like LOOKAHEAD(valuePartRef())...

p.s. Can you have more than one REF?
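To illustrate the first suggestion, here is a rough java.util.regex translation of the MULTILINE_STRING token with an unescaped `$` excluded from the first character class (in the grammar that would presumably be written `~["\\", "$"]`). The class and method names are hypothetical, and the regex only approximates how the JavaCC token manager matches, so treat it as a sketch rather than a tested grammar change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex approximation of MULTILINE_STRING with '$' excluded.
final class LiteralTokenSketch {
    static final Pattern LITERAL = Pattern.compile(
          "(?:[^\\\\$]"                    // any character except backslash and '$'
        + "|\\\\[\\\\'\"$nrtbf]"           // simple escapes, including \$
        + "|\\\\[uU]\\+[0-9a-fA-F]{4}"     // \u+XXXX and \U+XXXX escapes
        + ")+");

    // Returns the literal text at the start of the input, or "" if there is none.
    static String leadingLiteral(String input) {
        Matcher m = LITERAL.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // The literal stops right before the unescaped "$REF{".
        System.out.println("[" + leadingLiteral("This is my $REF{useful}") + "]");
        // The escaped "\$REF{" is read through as part of the literal.
        System.out.println("[" + leadingLiteral("cost \\$REF{x} end") + "]");
    }
}
```

One caveat: with `$` excluded, a lone `$` that does not start `$REF{` also ends the literal, so the grammar would need another token (or a different approach) to cover that case.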
