How to declare and reuse character classes in a flex lexer?
Normally, when you want to reuse a regular expression, you can declare it in the definitions section of the flex file. By default, flex encloses each expanded definition in parentheses. For example:
num_seq [0-9]+
%%
{num_seq} return INT; // will become ([0-9]+)
{num_seq}\.{num_seq} return FLOAT; // will become ([0-9]+)\.([0-9]+)
But I want to reuse some character classes. Can I define custom classes like [:alpha:], [:alnum:], etc.? A toy example:
chars [a-zA-Z]
%%
    /* will become: (([a-zA-Z]){-}[aeiouAEIOU])+   ill-formed */
    /* desired:     ([a-zA-Z]{-}[aeiouAEIOU])+     correct    */
({chars}{-}[aeiouAEIOU])+ return ONLY_CONS;
({chars}{-}[a-z])+ return ONLY_UPPER;
({chars}{-}[A-Z])+ return ONLY_LOWER;
But this currently fails to compile because of the parentheses added around the expansion. Is there a proper way, or at least a workaround, to achieve this?
This might be useful from time to time, but unfortunately it has never been implemented in flex. You could suppress the automatic parentheses around macro substitution by running flex in lex compatibility mode (-l, or %option lex-compat), but that mode has other, probably undesirable, effects.
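To make the effect of the automatic parentheses concrete, here is a simplified Python model of flex's {name} substitution. It is a hypothetical illustration, not flex's actual implementation: the function `expand` and its `lex_compat` parameter are invented for this sketch. By default it wraps each expansion in parentheses; with `lex_compat=True` it pastes the definition in verbatim, mimicking lex compatibility mode.

```python
import re

# Simplified, hypothetical model of how flex expands {name} references in a
# rule pattern. It is NOT flex's implementation; it only illustrates the
# parenthesization behaviour described in the question.

# A flex definition name starts with a letter or underscore, followed by
# letters, digits, underscores or dashes, so "{-}" and "{2,3}" never match.
NAME_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def expand(pattern: str, definitions: dict[str, str], lex_compat: bool = False) -> str:
    """Substitute each known {name} in pattern with its definition.

    By default the definition is wrapped in parentheses; with
    lex_compat=True it is inserted verbatim.
    """

    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name not in definitions:
            return m.group(0)  # leave unknown names untouched
        body = definitions[name]
        return body if lex_compat else f"({body})"

    return NAME_REF.sub(substitute, pattern)


if __name__ == "__main__":
    defs = {"chars": "[a-zA-Z]"}
    rule = "({chars}{-}[aeiouAEIOU])+"
    print(expand(rule, defs))                   # (([a-zA-Z]){-}[aeiouAEIOU])+
    print(expand(rule, defs, lex_compat=True))  # ([a-zA-Z]{-}[aeiouAEIOU])+
```

The first result is the ill-formed pattern from the question: the {-} operator now sees a parenthesized group instead of a bracket expression. The second result is the desired form, which is why lex compatibility mode avoids the problem.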
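POSIX requires that regular expression bracket syntax include, in addition to the predefined character classes, a class expression [:name:] for any name that the current locale defines as a character class (via a charclass entry in its LC_CTYPE category).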
Unfortunately, flex does not implement this requirement. It is not too difficult to patch flex to do this, but since there is no portable mechanism for users to add character classes to their locale (and, indeed, many standard C library implementations lack proper locale support), there is little incentive to make this change.
Having looked at all these options, I eventually convinced myself that the simplest portable solution is to preprocess the flex input file, replacing each [:name:] with the set of characters defined for name. Since that character sequence is unlikely to appear anywhere else in a flex input file, a simple-minded search and replace using sed or Python is adequate; correctly parsing the flex input file seems to me to be more trouble than it is worth.
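A minimal sketch of such a preprocessor in Python, assuming the custom classes are listed in a table inside the script itself. The names `CUSTOM_CLASSES`, `expand_classes`, the class names `letter` and `vowel`, and the `.lpp` input extension are all hypothetical choices for this example. Names that are not in the table, including the standard ones such as [:alpha:], are left unchanged so that flex still handles them itself.

```python
import re
import sys

# Hypothetical table of user-defined classes: name -> bracket-expression body.
# These names are examples for this sketch; flex itself defines none of them.
CUSTOM_CLASSES = {
    "letter": "a-zA-Z",
    "vowel": "aeiouAEIOU",
}

# Matches "[:name:]" as it appears inside a bracket expression, e.g. "[[:letter:]]".
CLASS_REF = re.compile(r"\[:([A-Za-z_][A-Za-z0-9_]*):\]")


def expand_classes(text: str, classes: dict[str, str]) -> str:
    """Replace [:name:] with the characters for name, if name is in classes.

    Unknown names (including standard ones like [:alpha:]) are kept as-is.
    Plain textual replacement: it does not parse the flex file, so a "[:x:]"
    inside C action code would also be rewritten if x is in the table.
    """

    def substitute(m: re.Match) -> str:
        return classes.get(m.group(1), m.group(0))

    return CLASS_REF.sub(substitute, text)


if __name__ == "__main__":
    # Usage: python3 expand_classes.py scanner.lpp > scanner.l
    # Each file named on the command line is expanded and written to stdout.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(expand_classes(f.read(), CUSTOM_CLASSES))
```

With this table, the ONLY_CONS rule could be written as ([[:letter:]]{-}[[:vowel:]])+ and the script would turn it into ([a-zA-Z]{-}[aeiouAEIOU])+, the desired form from the question, because the substitution happens inside the brackets and adds no parentheses. Running the script as a build step before flex (for example in a Makefile rule) keeps the generated .l file out of the hand-edited sources.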