JavaCC token regular expression: AND_SYMBOL_IN

Posted on 2025-01-01 04:12:34

I need to define a token that matches a word. The word may contain English letters and a few special symbols, but it must not begin with certain English letters (for example, 'O').

It looks like I need an AND_SYMBOL_IN operation or something similar, but I haven't found one in the JavaCC documentation. I need behavior something like this:

TOKEN : { < LETTERS: (
  (~["O", "-"] AND_SYMBOL_IN ["a"-"z","A"-"Z","-",".","&","|","0"-"9"])? (["a"-"z","A"-"Z","-",".","&","|","0"-"9"])+
  ) > }
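To make the intended semantics concrete: the hypothetical AND_SYMBOL_IN above reads like a character-set intersection. The sketch below expresses that idea with the `&&` operator of `java.util.regex` character classes, purely as an illustration of the desired behavior and not as JavaCC syntax. It assumes the intent is "first character from the allowed set minus 'O' and '-', followed by any allowed characters"; the class and method names are made up for this example.

```java
import java.util.regex.Pattern;

final class FirstCharIntersection {
    // Allowed word characters, copied from the token definition in the question.
    static final String ALLOWED = "[a-zA-Z0-9.&|\\-]";

    // First character: ALLOWED intersected with "anything except O or -".
    // Remaining characters: any ALLOWED character (assumption: zero or more).
    static final Pattern WORD =
            Pattern.compile("[" + ALLOWED + "&&[^O\\-]]" + ALLOWED + "*");

    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWord("Alpha-1")); // true
        System.out.println(isWord("Omega"));   // false: starts with O
        System.out.println(isWord("-abc"));    // false: starts with -
    }
}
```

In a JavaCC grammar the same set has to be spelled out by hand, which is what the helper token further down does by listing "A"-"N","P"-"Z" instead of "A"-"Z".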

I can define a helper token (as below), but I believe there is a nicer solution, isn't there?

TOKEN : { < #LETTEREX : ["a"-"z","A"-"N","P"-"Z",".","&","|","0"-"9","-"] > }

TOKEN : { < LETTERS : < LETTEREX > ( < LETTEREX > | ["O"] )+ > }

Comments (1)

〆一缕阳光ご 2025-01-08 04:12:34

JavaCC prefers the longest match, and it resolves ambiguities between equally long matches using the order in which the matching tokens are declared in the grammar: the earlier declaration wins. So one possibility is to match the token you don't want before the token you do:

For example:

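// Private helper (#): a character class only, never returned as a token itself.
TOKEN : { < #LETTER : ["a"-"z","A"-"Z","-",".","&","|","0"-"9"] > }
// Declared first so it wins ties; ( )* instead of ( )+ so that a lone "O" is caught here too.
TOKEN : { < WORDS_STARTING_WITH_O : "O" ( < LETTER > )* > }
TOKEN : { < WORDS_NOT_STARTING_WITH_O : ( < LETTER > )+ > }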

How suitable this is depends on how many special cases you have and how complicated they are.
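The tie-break rule can be tried outside JavaCC with a small simulation. The sketch below is not generated JavaCC code: it is a hand-written approximation that uses `java.util.regex` patterns as stand-ins for the two token rules, picks the longest match at the start of the input, and keeps the earlier-declared rule on a tie. The class name, method name, and the regex translations of the rules are assumptions made for illustration; the first rule is written so that a lone "O" also counts as a word starting with O.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TieBreakSketch {
    static final String LETTER = "[a-zA-Z0-9.&|\\-]";

    // Declaration order matters, so an insertion-ordered map is used.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("WORDS_STARTING_WITH_O", Pattern.compile("O" + LETTER + "*"));
        RULES.put("WORDS_NOT_STARTING_WITH_O", Pattern.compile(LETTER + "+"));
    }

    /** Returns the name of the rule chosen at the start of input, or null if none matches. */
    static String classify(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: on equal length the earlier-declared rule is kept.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("Omega")); // WORDS_STARTING_WITH_O
        System.out.println(classify("alpha")); // WORDS_NOT_STARTING_WITH_O
    }
}
```

For "Omega" both patterns match all five characters, so only the declaration order decides the outcome; swapping the two `put` calls would make the general rule win instead.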
