How do I handle tokens inside a text block in JavaCC?
I have a simple aspect of a DSL where I can define a key and a value as such:
mykey=\
This is my $REF{useful} multiline
string where I terminate with a backslash
but I support escaped \\ characters
and I wish to handle the value part of this string
as 3 blocks in this example.
\
The three tokens (for the value part) I would like in this example are
- ValueLiteral == This is my
- ValueReference == $REF{useful}
- ValueLiteral == multiline etc....
I defined a rule for the value as such:
void multiLineValue(): {} {
    < BACKSLASH >< EOL >
    (
        valuePartLiteralMulti() |
        valuePartRef()
    )*
    < BACKSLASH >
}
Here is my TOKEN definition for the multiline string type:
TOKEN :
{
    < MULTILINE_STRING:( ( (~["\\"])
        | ("\\"
            ( ["\\", "'", "\"", "$", "n", "r", "t", "b", "f"]
            | ["u", "U"]["+"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]
            )
        ) ))+>
}
My problem is that my multiline string token also consumes the '$REF{' character sequence.
I would like to modify this multi-line string so that it will stop consuming characters when it encounters an unescaped "$REF{" (but will continue reading past a "\$REF{" sequence).
Any assistance would be much appreciated.
Comments (1)
I'm not sure, but your token definition also accepts $ (in unicode?); maybe you should add ~["$"] (or the unicode equivalent) at the beginning.
Or you can use syntactic LOOKAHEAD, something like LOOKAHEAD(valuePartRef())...
p.s. Can you have more than one REF?
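A rough sketch of the first suggestion, untested against the full grammar. It assumes BACKSLASH and EOL are defined elsewhere as in the question, and that any lexical-state handling around the value stays as it is. VALUE_REF and DOLLAR are hypothetical token names, and the bodies of valuePartLiteralMulti() and valuePartRef() are guesses, since the question does not show them. The idea is to exclude "$" from the plain-character alternative so the literal stops in front of an unescaped "$", while "\$" is still consumed through the escape alternative.

```javacc
TOKEN :
{
  // A whole reference such as $REF{useful} (VALUE_REF is a hypothetical name).
  < VALUE_REF : "$REF{" (~["}"])* "}" >

  // Literal text. '$' is excluded from the plain-character alternative, so the
  // token ends before any unescaped '$'. "\$" still matches as an escape.
| < MULTILINE_STRING :
    (
      ~["\\", "$"]
    | "\\"
      ( ["\\", "'", "\"", "$", "n", "r", "t", "b", "f"]
      | ["u", "U"]["+"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]
      )
    )+ >

  // A '$' that does not begin "$REF{...}" (hypothetical token); without it
  // such a '$' would be a lexical error. The longest match rule makes
  // VALUE_REF win whenever a complete reference follows.
| < DOLLAR : "$" >
}

void multiLineValue() : {}
{
  < BACKSLASH > < EOL >
  ( valuePartLiteralMulti() | valuePartRef() )*
  < BACKSLASH >
}

// Assumed body: a literal part is either a run of text or a stray '$'.
void valuePartLiteralMulti() : {}
{
  < MULTILINE_STRING > | < DOLLAR >
}

// Assumed body: a reference part is one VALUE_REF token.
void valuePartRef() : {}
{
  < VALUE_REF >
}
```

With tokens split this way, the two alternatives in multiLineValue() start with different tokens, so the default one-token lookahead should be enough and no syntactic LOOKAHEAD is needed. One side effect is that a stray "$" splits the surrounding text into separate literal parts, which the parser actions would have to join again. Several $REF{...} occurrences in one value are handled by the ( ... )* loop.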