Command fails in lex

Posted 2024-07-15 20:45:55

I'm using lex in my program and I've run into a problem I need some help with.

My program accepts its input in the form of [something " something]. This is working correctly.

However, I also need to accept the form [something"something].

Is there a way that I can have some sort of first case in lex that all input is run through (like preprocessing), and then have that same, modified input continue on through the rest of my program?

Here's kind of what I'm talking about:

%%
.* {
   /* do preprocessing */
   }

something {
   return SOMETHING;
   }

\" {
   return QUOTE;
   }
%%

Comments (1)

掐死时间 2024-07-22 20:45:55

Well, you could actually write a preprocessor in lex and put it into your build system, but that's probably overkill!

You can also use start conditions, switching between them with BEGIN: parse the input in one start condition, use unput to push the modified characters back onto the input stream, and then let a different start condition parse the result (see the Flex manual).

I recently wrote a parser for a Python-like config language that did just that. The parser had two modes (start conditions): one to count tabs at the start of a line to determine scope, and another to do the actual parsing.
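To make the two-mode idea concrete, here is a rough sketch in plain C rather than flex. It is not the parser mentioned above: the names `scan_mode`, `MODE_INDENT`, `MODE_BODY` and `scan_lines` are made up for illustration. One mode counts leading tabs, then the scanner switches (the analogue of BEGIN) to a second mode that consumes the rest of the line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; flex would call these start conditions. */
enum scan_mode { MODE_INDENT, MODE_BODY };

/*
 * Walk over `text` and store the number of leading tabs of each line in
 * `depths` (at most `max_lines` entries are written).
 * Returns the number of newline-terminated lines that were scanned.
 */
size_t scan_lines(const char *text, int *depths, size_t max_lines)
{
    enum scan_mode mode = MODE_INDENT;
    size_t line = 0;
    int tabs = 0;

    assert(text != NULL && depths != NULL);

    for (const char *p = text; *p != '\0' && line < max_lines; ++p) {
        if (mode == MODE_INDENT) {
            if (*p == '\t') {
                ++tabs;             /* still inside the indentation */
                continue;
            }
            /* First non-tab character: record depth, switch modes (like BEGIN). */
            depths[line] = tabs;
            mode = MODE_BODY;
        }
        /* MODE_BODY: a real scanner would recognise tokens here. */
        if (*p == '\n') {
            ++line;
            tabs = 0;
            mode = MODE_INDENT;     /* next line starts in indent mode again */
        }
    }
    return line;
}
```

In flex itself the mode switch would be a BEGIN call inside a rule action, and the generated scanner keeps track of the current start condition for you.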

These methods are fine but there is usually a better way of doing it, especially if your input scheme isn't hugely complex.

Is there a grammatical difference between [something " something] and [something"something] for your program? Would a whitespace-eating rule do the trick?

Could you describe your language and grammar a little more?

After Comment:

OK, so basically you have two tokens, SOMETHING and QUOTE. If your tokens are separated by whitespace you can do the following:

%%
\"     {
       // matches a single double-quote character
       return QUOTE;
       }

[^" \t\n\r]+   {
               // matches a run of anything that's not a quote, space, tab or line ending
               return SOMETHING;
               }

[ \t\n\r]      {
               //do nothing: i.e. ignore whitespace
               }

%%
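
If you want to convince yourself that these three rules treat both input forms the same way, the logic can be mimicked by hand. The following is only a sketch in plain C, not what flex generates: the token codes and the helpers `next_token` and `tokenize` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token codes are made up for this sketch; a real parser defines its own. */
enum token { TOK_END = 0, TOK_SOMETHING, TOK_QUOTE };

/*
 * Return the next token starting at *cursor and advance *cursor past it.
 * Mirrors the three rules: whitespace is skipped, a lone '"' is QUOTE,
 * and a run of any other characters is SOMETHING.
 */
enum token next_token(const char **cursor)
{
    const char *p;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    p += strspn(p, " \t\n\r");        /* ignore whitespace */

    if (*p == '\0') {
        *cursor = p;
        return TOK_END;
    }
    if (*p == '"') {
        *cursor = p + 1;
        return TOK_QUOTE;
    }
    p += strcspn(p, "\" \t\n\r");     /* run of non-quote, non-space chars */
    *cursor = p;
    return TOK_SOMETHING;
}

/* Collect up to `max` tokens from `input` into `out`; returns the count. */
size_t tokenize(const char *input, enum token *out, size_t max)
{
    size_t n = 0;
    enum token t;

    while (n < max && (t = next_token(&input)) != TOK_END)
        out[n++] = t;
    return n;
}
```

Calling `tokenize` on `something " something` and on `something"something` gives the same sequence in both cases: SOMETHING, QUOTE, SOMETHING. The quote ends the first run whether or not a space precedes it, so no preprocessing pass is needed.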

For your SOMETHING token you could also use a pattern like [A-Za-z_][A-Za-z0-9_]*, which matches a letter or an underscore followed by zero or more letters, digits and underscores.
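
For reference, here is roughly what that identifier pattern does, written out as a plain C helper. This is a sketch only: `match_identifier` is a hypothetical name, and it assumes the default "C" locale so that `isalpha` and `isalnum` accept only ASCII letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Length of the longest prefix of `s` matching [A-Za-z_][A-Za-z0-9_]*,
 * or 0 if `s` does not start with a letter or an underscore.
 * Assumes the "C" locale, where isalpha/isalnum cover ASCII only.
 */
size_t match_identifier(const char *s)
{
    size_t n;

    assert(s != NULL);

    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;                   /* first character must be a letter or '_' */

    n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        ++n;                        /* then letters, digits and underscores */
    return n;
}
```

Because a quote is not part of the pattern, the match stops right before it, so `something"something` still splits into SOMETHING, QUOTE, SOMETHING.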

Does that help?
