Recognizing Chinese-character identifiers with Lex/Yacc

Posted 2024-09-08 01:30:26, 29 characters, 9 views, 0 comments

How can I use Lex/Yacc to recognize identifiers written in Chinese characters?

烂人 2024-09-15 01:30:26

I think you mean Lex (the lexer generator). Yacc is the parser generator.

According to "What's the complete range for Chinese characters in Unicode?", most CJK characters fall in the U+3400 to U+9FFF range.

According to http://dinosaur.compilertools.net/lex/index.html:

Arbitrary character. To match almost any character, the operator character . is the class of all characters except newline. Escaping into octal is possible although non-portable:

    [\40-\176]

matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde).

So I would assume what you need is something like [\32000-\117777].
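
The octal bounds in that class are simply the code points U+3400 and U+9FFF written in base 8. The Python sketch below checks that arithmetic. It also builds a byte-level equivalent, under the assumption that the scanner reads UTF-8 input one byte at a time (as 8-bit scanners such as flex typically do), in which case a class with bounds above \377 may not be accepted and the range has to be spelled out as byte sequences. The names `lex_octal_class`, `UTF8_CJK` and `is_cjk_utf8` are hypothetical and only for illustration; none of this comes from the Lex documentation.

```python
import re

CJK_LO, CJK_HI = 0x3400, 0x9FFF  # range quoted in the answer above


def lex_octal_class(lo: int, hi: int) -> str:
    """Format a code point range as a Lex-style octal character class."""
    return "[\\{:o}-\\{:o}]".format(lo, hi)


# Byte-level pattern for UTF-8 input (assumption: the lexer sees bytes).
# Every code point in U+3400..U+9FFF encodes as three bytes:
#   E3      90..BF  80..BF   covers U+3400..U+3FFF
#   E4..E9  80..BF  80..BF   covers U+4000..U+9FFF
UTF8_CJK = re.compile(
    rb"\xE3[\x90-\xBF][\x80-\xBF]|[\xE4-\xE9][\x80-\xBF][\x80-\xBF]"
)


def is_cjk_utf8(char: str) -> bool:
    """Return True if the character's UTF-8 bytes match the pattern."""
    return UTF8_CJK.fullmatch(char.encode("utf-8")) is not None


if __name__ == "__main__":
    print(lex_octal_class(CJK_LO, CJK_HI))
    print(is_cjk_utf8("汉"), is_cjk_utf8("a"))
```

The same alternation could in principle be written in a Lex rule with hexadecimal or octal byte escapes, but whether a given Lex implementation accepts those escapes should be checked against its own documentation.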
