Can actions in Lex access individual regex groups?

Posted 2024-11-26 21:51:54



(NOTE: I'm guessing not, since according to the documentation the grouping characters, parentheses, are only used to change precedence. If that is right, can you recommend an alternative C/C++ scanner generator that can do this? I'm not keen on writing my own lexical analyzer.)

Example:

Let's say I have this input: foo [tagName attribute="value"] bar and I want to extract the tag using Lex/Flex. I could certainly write this rule:

\[[a-z]+[[:space:]]+[a-z]+=\"[a-z]+\"\]       printf("matched %s", yytext);

But let's say I want to access certain parts of the matched string, e.g. the attribute value, without having to parse yytext again (the string has already been scanned, so scanning part of it again doesn't really make sense). Something like this, using regex groups, would be preferable:

\[[a-z]+[[:space:]]+[a-z]+=\"([a-z]+)\"\]     printf("matched attribute %s", $1);


南烟 2024-12-03 21:51:54


You can split the rule up using start conditions. Something like this:

%x VALUEPARSE ENDSTATE
%{
char string_buf[100];   /* holds the most recent attribute value */
%}
%%
<INITIAL>\[[a-z]+[[:space:]]+[a-z]+=\"     { BEGIN(VALUEPARSE); }
<VALUEPARSE>[a-z]+    { /* save the value text; yytext is NUL-terminated */
                        snprintf(string_buf, sizeof string_buf, "%s", yytext);
                        BEGIN(ENDSTATE); }
<ENDSTATE>\"\]    { BEGIN(INITIAL); }
%%

As for an alternative C/C++ scanner generator: I use the Qt class QRegularExpression for this kind of thing. It makes it very easy to get the regex groups after a match.


原谅我要高飞 2024-12-03 21:51:54

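Certainly, at least some forms of lex do.
However, the default lex/flex downloadable from sourceforge.org does not seem to list this in its documentation, and this example leaves the full string in yytext.

From IBM's LEX documentation for AIX:

(Expression)
Matches the expression in the parentheses.
The () (parentheses) operator is used for grouping and causes the expression within parentheses to be read into the yytext array. A group in parentheses can be used in place of any single character in any other pattern.
Example: (ab|cd+)?(ef)* matches such strings as abefef, efefef, cdef, or cddd; but not abc, abcd, or abcdef.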