Getting a string that ends with "lngt" in lex

Posted on 2024-12-28 08:06:17


I am writing a lex script to tokenize C ASTs. I want to write a regex in lex that matches a string ending with the specific string "lngt" but does not include "lngt" in the final string returned by lex. So the string form would be (.*lngt), but I haven't been able to figure out how to do this in lex. Any advice or direction would be really helpful.

Example: I have this line in my file:

@65  string_cst  type: @71  strg: Reverse order of the given number is : %d  lngt: 42

I want to retrieve the string after strg: and before lngt:, i.e. "Reverse order of the given number is : %d" (NOTE: this string could contain any characters).

Thanks.


你的他你的她 2025-01-04 08:06:17

This question needs an answer similar to the one I wrote here. It can be done by writing your own state machine in lex with start conditions. It could also be done by writing some C code, as shown in the cited answer or in the other texts cited below.
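A minimal sketch of the plain-C route, assuming the whole line is already in a buffer (for example, read with fgets) and that the delimiters are exactly "strg: " and "lngt:". This is not the code from the cited answer. It replaces the scanner with strstr() from <string.h>, and the helper name extract_between is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text that follows `open` and precedes the
 * next `close` into `out` (capacity `cap`), dropping trailing blanks.
 * Returns the number of bytes written, or -1 if a delimiter is missing
 * or `out` is too small. */
static int extract_between(const char *line, const char *open,
                           const char *close, char *out, size_t cap)
{
    const char *start = strstr(line, open);
    if (start == NULL)
        return -1;
    start += strlen(open);                  /* skip past "strg: " */

    const char *end = strstr(start, close); /* first "lngt:" after it */
    if (end == NULL)
        return -1;

    /* Trim the padding that precedes "lngt:" in the dump line. */
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
        end--;

    size_t len = (size_t)(end - start);
    if (len + 1 > cap)                      /* need room for the '\0' */
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

For the example line, extract_between(line, "strg: ", "lngt:", buf, sizeof buf) leaves "Reverse order of the given number is : %d" in buf. Because it stops at the first "lngt:" after "strg: ", a string that itself contains "lngt:" would be cut short.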

If we assume that the string you want always sits between "strg: " and "lngt:", then this is the same problem as any other pair of asymmetric string delimiters.

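%option noyywrap
%x STRG LETTERL LN LNG LNGT
ws [ \t\r\n]+
%%
<INITIAL>"strg: "   { BEGIN(STRG); }
<STRG>[^l]*l        { yymore(); BEGIN(LETTERL); /* text up to the next 'l' */ }
<LETTERL>n          { yymore(); BEGIN(LN); }
<LN>g               { yymore(); BEGIN(LNG); }
<LNG>t              { yymore(); BEGIN(LNGT); }
<LNGT>":"           {
                      /* yytext holds the wanted string followed by "lngt:";
                         drop those 5 characters and any blanks before them */
                      int n = yyleng - 5;
                      while (n > 0 && (yytext[n - 1] == ' ' || yytext[n - 1] == '\t'))
                          n--;
                      printf("String is '%.*s'\n", n, yytext);
                      BEGIN(INITIAL);
                    }
<LETTERL,LN,LNG,LNGT>l { yymore(); BEGIN(LETTERL); /* an 'l' may start a new "lngt" */ }
<LETTERL>[^n]       { yymore(); BEGIN(STRG); }
<LN>[^g]            { yymore(); BEGIN(STRG); }
<LNG>[^t]           { yymore(); BEGIN(STRG); }
<LNGT>[^:]          { yymore(); BEGIN(STRG); }
<INITIAL>{ws}       /* skip */ ;
<INITIAL>.          /* skip anything not in the string */ ;
%%
int main(void)
{
    yylex();
    return 0;
}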
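To make the automaton behind those start conditions explicit, here is the same machine written by hand in C. It is a sketch, not part of the original answer. The function name scan_to_lngt is hypothetical, and the function works on an in-memory string that begins right after "strg: ". An 'l' seen in any partially matched state restarts the match in LETTERL instead of falling back to STRG. Without that rule, input such as "allngt:" would be missed.

```c
#include <stddef.h>

/* States named after the start conditions in the lex answer. */
enum state { STRG, LETTERL, LN, LNG, LNGT };

/* Hypothetical helper: walk `s` (the text right after "strg: ") one
 * character at a time, as the start conditions do, and return how many
 * characters precede the first "lngt:", or -1 if it never appears. */
static long scan_to_lngt(const char *s)
{
    enum state st = STRG;

    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];

        if (c == 'l') {              /* any 'l' may begin a new "lngt:" */
            st = LETTERL;
            continue;
        }
        switch (st) {
        case LETTERL: st = (c == 'n') ? LN   : STRG; break;
        case LN:      st = (c == 'g') ? LNG  : STRG; break;
        case LNG:     st = (c == 't') ? LNGT : STRG; break;
        case LNGT:
            if (c == ':')            /* "lngt:" ends at index i ... */
                return (long)i - 4;  /* ... so the string has i - 4 chars */
            st = STRG;
            break;
        case STRG:                   /* plain text: wait for the next 'l' */
            break;
        }
    }
    return -1;
}
```

For example, scan_to_lngt("allngt:") returns 2, because only "al" precedes the terminator.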

To quote my other answer:

Several university compiler courses offer suggested solutions. One that explains it well is here (at Manchester). It cites a couple of good books that also cover the problem:
J. Levine, T. Mason & D. Brown: Lex and Yacc (2nd ed.)
M. E. Lesk & E. Schmidt: Lex - A Lexical Analyzer Generator
The two techniques described are to use start conditions to specify the state machine explicitly, or to use manual input, reading characters directly with lex's input() function.
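The manual-input technique can be sketched as follows. After the "strg: " rule fires, the action reads characters itself and stops once the collected text ends with "lngt:". The callback type next_char_fn, the functions read_until and str_next, and the string-backed source are hypothetical scaffolding that let the sketch run outside lex. Inside a flex action you would instead pass a small wrapper that calls input(), assuming it returns EOF at end of input, as flex's does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reader callback: returns the next character or EOF.
 * In a flex action this would wrap flex's input(). */
typedef int (*next_char_fn)(void *ctx);

/* Read characters into buf until the text ends with `term`, then cut
 * `term` off. Returns the collected length, or -1 on EOF or overflow. */
static long read_until(next_char_fn next, void *ctx,
                       const char *term, char *buf, size_t cap)
{
    size_t tlen = strlen(term);
    size_t n = 0;
    int c;

    while ((c = next(ctx)) != EOF) {
        if (n + 1 >= cap)
            return -1;              /* no room for this char plus '\0' */
        buf[n++] = (char)c;
        if (n >= tlen && memcmp(buf + n - tlen, term, tlen) == 0) {
            n -= tlen;              /* drop the terminator itself */
            buf[n] = '\0';
            return (long)n;
        }
    }
    return -1;
}

/* String-backed source used to exercise read_until outside lex. */
struct str_src { const char *p; };

static int str_next(void *ctx)
{
    struct str_src *s = ctx;
    return *s->p ? (unsigned char)*s->p++ : EOF;
}
```

Unlike the start-condition version above, this sketch keeps the blanks that precede "lngt:", so trim them afterwards if needed. The suffix check costs one comparison of the terminator's length per character read, which is cheap for a 5-character delimiter.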
