Implementing word boundary states in flex/lex (parser-generators)
I want to be able to predicate pattern matches on whether they occur after word characters or after non-word characters. In other words, I want to simulate the \b word break regex char at the beginning of the pattern which flex/lex does not support.
Here's my attempt below (which does not work as desired):
%{
#include <stdio.h>
%}
%x inword
%x nonword
%%
[a-zA-Z] { BEGIN inword; yymore(); }
[^a-zA-Z] { BEGIN nonword; yymore(); }
<inword>a { printf("'a' in word\n"); }
<nonword>a { printf("'a' not in word\n"); }
%%
Input:
a
ba
a
Expected output:
'a' not in word
'a' in word
'a' not in word
Actual output:
a
'a' in word
'a' in word
I'm doing this because I want to do something like the dialectizer and I have always wanted to learn how to use a real lexer. Sometimes the patterns I want to replace need to be fragments of words, sometimes they need to be whole words only.
Comments (2)
Here's what accomplished what I wanted:
This way I can do the equivalent of \B or \b at the beginning or end of any pattern. You can match at the end by using trailing context: a/{WC} or a/{NW}.
I wanted to set up the states without consuming any characters. The trick is to use REJECT rather than yymore(), which I guess I didn't fully understand.
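A minimal sketch of the REJECT approach described above, and not necessarily the answerer's original code. WC and NW are the definitions the answer refers to; their exact character classes, the start-condition names INW and NIW, and the use of inclusive (%s) conditions are assumptions made for illustration.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

/* Assumed character classes: word characters and everything else. */
WC  [A-Za-z]
NW  [^A-Za-z]

/* Assumed names. Inclusive (%s) conditions keep the unprefixed
   rules below active in every state. */
%s INW
%s NIW

%%
{WC}            { /* remember the class of this character for the next
                     token, then hand the same text to the next rule */
                  BEGIN(INW); REJECT; }
{NW}            { BEGIN(NIW); REJECT; }
<INW>a          { printf("'a' in word\n"); }
<INITIAL,NIW>a  { printf("'a' not in word\n"); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Assuming the file is saved as scanner.l (a hypothetical name), it can be built with `flex scanner.l && cc lex.yy.c -o scanner`. The first two rules win ties against the `a` rules because they are listed first; REJECT then falls back to the next rule that matched under the start condition in effect when the token was scanned, so the BEGIN only influences the following token. Characters that no other rule matches fall through to flex's default rule and are echoed. One caveat with this sketch: a pattern longer than one character beats the one-character {WC} and {NW} rules, so its action would need to call BEGIN itself to keep the state in step with the last character it consumed.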
Testing: