How can I declare and reuse character classes in a flex lexer?

Posted 2025-01-23 00:54:42

Normally, when you want to reuse a regular expression in flex, you declare it as a named definition in the definitions section. When the definition is referenced in a rule as {name}, flex encloses its expansion in parentheses by default. For example:

num_seq [0-9]+

%%

{num_seq} return INT;  // will become ([0-9]+)

{num_seq}\.{num_seq} return FLOAT;  // will become ([0-9]+)\.([0-9]+)

However, I want to reuse some character classes. Can I define custom classes like [:alpha:], [:alnum:], etc.? A toy example:

chars [a-zA-Z]

%%

  // will become (([a-zA-Z]){-}[aeiouAEIOU])+  // ill-formed
  // desired ([a-zA-Z]{-}[aeiouAEIOU])+  // correct
({chars}{-}[aeiouAEIOU])+ return ONLY_CONS;

({chars}{-}[a-z])+ return ONLY_UPPER;

({chars}{-}[A-Z])+ return ONLY_LOWER;

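But currently this fails to compile: the {-} operator only accepts bracketed character classes as operands, and the parentheses added around the expansion turn [a-zA-Z] into a parenthesized expression. Is there a proper way, or at least a workaround, to achieve this?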

Comments (1)

莳間冲淡了誓言ζ 2025-01-30 00:54:42

This would be useful from time to time, but unfortunately it has never been implemented in flex. You could suppress the automatic parentheses around definition expansions by running flex in lex-compatibility mode (-l, or --lex-compat), but that has other, probably undesirable, effects.
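To make the difference concrete, here is a toy Python model of how a definition reference such as {chars} is substituted into a rule pattern, with and without the parentheses. It is a simplified sketch, not flex's actual implementation: the helper name expand_definitions is hypothetical, nested definitions are not re-expanded, and the real scanner has other special cases.

```python
import re

# Flex definition names: a letter or underscore, then letters, digits, '_' or '-'.
_NAME_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def expand_definitions(pattern: str, defs: dict[str, str], lex_compat: bool = False) -> str:
    """Toy model of flex's {name} substitution (hypothetical helper).

    By default each expansion is wrapped in parentheses, as flex does;
    with lex_compat=True it is inserted verbatim, as in `flex -l`.
    Operators such as {-} or {3} are not names, so they are left alone.
    Nested definitions are not re-expanded in this simplified model.
    """

    def repl(m: re.Match) -> str:
        body = defs[m.group(1)]
        return body if lex_compat else f"({body})"

    return _NAME_REF.sub(repl, pattern)


if __name__ == "__main__":
    defs = {"num_seq": "[0-9]+", "chars": "[a-zA-Z]"}
    print(expand_definitions(r"{num_seq}\.{num_seq}", defs))
    # ([0-9]+)\.([0-9]+)
    print(expand_definitions("({chars}{-}[aeiouAEIOU])+", defs))
    # (([a-zA-Z]){-}[aeiouAEIOU])+   <- the ill-formed default expansion
    print(expand_definitions("({chars}{-}[aeiouAEIOU])+", defs, lex_compat=True))
    # ([a-zA-Z]{-}[aeiouAEIOU])+
```

The lex-compatible expansion is exactly the "desired" form from the question, which is why -l removes this particular obstacle; whether the rest of a given lexer still behaves as intended under -l has to be checked separately.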

POSIX requires that the regular-expression bracket syntax include, in addition to the predefined character classes,

…character class expressions of the form: [:name:] … in those locales where the name keyword has been given a charclass definition in the LC_CTYPE category.

Unfortunately, flex does not implement this requirement. It is not too difficult to patch flex to do this, but since there is no portable mechanism that lets users add charclasses to their locale (and, indeed, many standard C library implementations lack proper locale support), there is little incentive to make this change.

Having looked at all these options, I eventually convinced myself that the simplest portable solution is to preprocess the flex input file, replacing each [:name:] with the set of characters defined for name. Since that character sequence is unlikely to appear anywhere else in a flex input file, a simple-minded search and replace using sed or Python is adequate; correctly parsing the flex input file seems to me to be more trouble than it is worth.
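A minimal sketch of such a preprocessor in Python, assuming the lexer source is kept in a separate input file (lexer.l.in is a hypothetical name) and that the custom classes are listed in a dictionary inside the script. CUSTOM_CLASSES, the "vowel" class and expand_classes are likewise hypothetical; "chars" is taken from the question. Standard names such as alpha are passed through unchanged, so flex still handles them itself.

```python
import re
import sys

# Hypothetical user-defined classes. Each value is inserted verbatim inside
# the surrounding brackets, so it must already be valid bracket-expression
# text for flex (ranges like a-z are fine; escape ], \, ^ and - as needed).
CUSTOM_CLASSES = {
    "chars": "a-zA-Z",
    "vowel": "aeiouAEIOU",
}

# Matches [:name:]; negated forms such as [:^alpha:] do not match.
_CLASS_REF = re.compile(r"\[:([A-Za-z_][A-Za-z0-9_]*):\]")


def expand_classes(text: str) -> str:
    """Replace [:name:] with the characters of a custom class.

    Names not in CUSTOM_CLASSES (e.g. the predefined alpha, alnum, ...)
    are left untouched so flex still handles them itself.
    """

    def repl(m: re.Match) -> str:
        return CUSTOM_CLASSES.get(m.group(1), m.group(0))

    return _CLASS_REF.sub(repl, text)


# Act as a filter only when given an input file:
#   python3 expand_classes.py lexer.l.in > lexer.l
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as src:
        sys.stdout.write(expand_classes(src.read()))
```

With this in place, the rule from the question can be written as ([[:chars:]]{-}[[:vowel:]])+ return ONLY_CONS; and the script turns the pattern into ([a-zA-Z]{-}[aeiouAEIOU])+ before flex sees it. As with the POSIX classes, a custom [:name:] should only appear inside a bracket expression, since the replacement text carries no brackets of its own.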
