Implementing word boundary states in flex/lex (parser generator)

Posted on 2024-07-12 04:28:38

I want to be able to predicate pattern matches on whether they occur after a word character or after a non-word character. In other words, I want to simulate the \b word-boundary regex assertion at the beginning of a pattern, which flex/lex does not support.

Here's my attempt below (which does not work as desired):

%{
#include <stdio.h>
%}

%x inword
%x nonword

%%
[a-zA-Z]    { BEGIN inword; yymore(); }
[^a-zA-Z]   { BEGIN nonword; yymore(); }

<inword>a { printf("'a' in word\n"); }
<nonword>a { printf("'a' not in word\n"); }

%%

Input:

a
ba
a

Expected output:

'a' not in word
'a' in word
'a' not in word

Actual output:

a
'a' in word
'a' in word

I'm doing this because I want to do something like the dialectizer and I have always wanted to learn how to use a real lexer. Sometimes the patterns I want to replace need to be fragments of words, sometimes they need to be whole words only.

绅刃 2024-07-19 04:28:38

Here's what accomplished what I wanted:

%{
#include <stdio.h>
%}

WC      [A-Za-z']
NW      [^A-Za-z']

%start      INW NIW

%%

{WC}  { BEGIN INW; REJECT; }
{NW}  { BEGIN NIW; REJECT; }

<INW>a { printf("'a' in word\n"); }
<NIW>a { printf("'a' not in word\n"); }

This way I can do the equivalent of \B or \b at the beginning or end of any pattern. To test the boundary at the end of a pattern, use trailing context: a/{WC} or a/{NW}.

I wanted to set up the states without consuming any characters. The trick is using REJECT rather than yymore(), which I guess I didn't fully understand.
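The two start conditions above amount to remembering one bit: whether the previously scanned character was a word character. As a rough illustration of that bookkeeping outside of flex, here is a minimal plain-C sketch. The names is_word_char and classify_a are hypothetical and not part of flex, and treating the start of the input as a non-word position is an assumption of this sketch, not something the flex rules above guarantee.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper mirroring the WC class from the answer: [A-Za-z']. */
static int is_word_char(int c) {
    return isalpha((unsigned char)c) || c == '\'';
}

/*
 * Hypothetical helper: for every 'a' in input, append 'I' to out if the
 * preceding character was a word character ("in word"), or 'N' otherwise
 * ("not in word"). Returns the number of tags written; out is always
 * NUL-terminated when out_size > 0. Tags that do not fit are dropped.
 */
static size_t classify_a(const char *input, char *out, size_t out_size) {
    int in_word = 0;   /* assumption: start of input counts as a non-word position */
    size_t n = 0;

    if (out_size == 0) {
        return 0;
    }
    for (const char *p = input; *p != '\0'; ++p) {
        if (*p == 'a' && n + 1 < out_size) {
            out[n++] = in_word ? 'I' : 'N';
        }
        /* Update the state only after looking at the current character,
           so the flag always describes the previous character. */
        in_word = is_word_char(*p);
    }
    out[n] = '\0';
    return n;
}
```

Calling classify_a("a\nba\na\n", buf, sizeof buf) fills buf with "NIN", the same three classifications as the expected output in the question. In the flex version the same flag lives in the scanner's start condition rather than in a local variable.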

七堇年 2024-07-19 04:28:38

%%
[a-zA-Z]+a[a-zA-Z]* {printf("a in word: %s\n", yytext);}
a[a-zA-Z]+ {printf("a in word: %s\n", yytext);}
a {printf("a not in word\n");}
. ;

Testing:

user@cody /tmp $ ./a.out <<EOF
> a
> ba
> ab
> a
> EOF
a not in word

a in word: ba

a in word: ab

a not in word

Author: 烂柯人
