Matching words in ANSI C
How can I match a word (1-n characters) in ANSI C? (In addition: what is the pattern to match a constant in C source code?)
I tried reading the file and passing it to regexec() (regex.h).
Problem: The tool I'm writing should be able to read source code and find all used constants (#define) to check whether they're defined.
The pattern used for testing is: [a-zA-Z_0-9]{1,}. But this would match words such as the "h" in "test.h".
Comments (3)
Identifiers must start with a letter or underscore, so the pattern is [a-zA-Z_][a-zA-Z_0-9]*
I know of no syntactic difference between C and preprocessor identifiers. There is a convention to use upper case for preprocessor and lowercase for C identifiers, but no actual requirement. Unless defines are guaranteed to use a distinct naming convention you would basically have to find every identifier in the source file and any included files and sort them into preprocessor identifiers, C identifiers and undeclared identifiers.
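As a minimal sketch of how such a pattern could be applied with regcomp()/regexec() from regex.h: the helper name find_identifiers and the 64-byte buffer size are assumptions made for illustration, not part of any library. It collects every identifier-shaped run in one line of text.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies each identifier-shaped match in `line` into
 * `out` (at most `max` entries, each truncated to 63 characters) and
 * returns the number of matches, or -1 if the pattern fails to compile. */
int find_identifiers(const char *line, char out[][64], int max)
{
    regex_t re;
    regmatch_t m;
    const char *p = line;
    int count = 0;

    /* First character: letter or underscore; then letters, digits, underscores. */
    if (regcomp(&re, "[a-zA-Z_][a-zA-Z_0-9]*", REG_EXTENDED) != 0)
        return -1;

    while (count < max && regexec(&re, p, 1, &m, 0) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len > 63)
            len = 63;
        memcpy(out[count], p + m.rm_so, len);
        out[count][len] = '\0';
        count++;
        p += m.rm_eo;   /* resume the search right after this match */
    }

    regfree(&re);
    return count;
}
```

Called on "#define MAX_LEN 10" this yields "define" and "MAX_LEN"; the literal 10 is skipped because a digit cannot start a match. Note that the regex alone still reports "test" and "h" for "test.h", and "x1F" for "0x1F": it has no notion of string literals, header names, or numbers, so those have to be filtered out separately.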
From the GCC manual:
Another option besides doing regex searches over C source code would be to use a preprocessor library like Boost Wave or perhaps something like Coan instead of starting from scratch.
Here is the Lexer grammar and the Parser grammar (in flex and bison format, respectively) for the entire C language. In particular, the part relevant to identifiers is:
So the id can start with any uppercase or lowercase letter or an underscore, and then have more uppercase or lowercase letters, underscores, and numbers. I believe it doesn't match parts of file names because they're quoted and it handles quotes separately.
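The same idea, handling quotes separately so that quoted file names never reach the identifier rule, can be sketched as a small hand-written scanner in C. This is not the flex rule from the linked grammar; the function name scan_identifiers and the ID_MAX limit are hypothetical, and the sketch assumes C89-style block comments only.

```c
#include <ctype.h>
#include <string.h>

#define ID_MAX 64   /* hypothetical limit on stored identifier length */

/* Hypothetical helper: stores identifiers found in `src` into `out`
 * (at most `max` entries) and returns how many were stored. String and
 * character literals, block comments, and numbers are skipped. */
int scan_identifiers(const char *src, char out[][ID_MAX], int max)
{
    const char *p = src;
    int count = 0;

    while (*p && count < max) {
        if (*p == '"' || *p == '\'') {
            /* Skip a quoted literal, honoring backslash escapes. */
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1])
                    p++;
                p++;
            }
            if (*p)
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            /* Skip a block comment. */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;
        } else if (isdigit((unsigned char)*p)) {
            /* Skip a whole number so that "0x1F" does not yield "x1F". */
            while (isalnum((unsigned char)*p) || *p == '_' || *p == '.')
                p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            /* Identifier: letter or underscore, then letters, digits, underscores. */
            const char *start = p;
            size_t len;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            len = (size_t)(p - start);
            if (len >= ID_MAX)
                len = ID_MAX - 1;
            memcpy(out[count], start, len);
            out[count][len] = '\0';
            count++;
        } else {
            p++;
        }
    }
    return count;
}
```

With this approach "test.h" inside #include "test.h" is never reported, because the whole quoted string is consumed first. Angle-bracket includes such as #include <test.h> are not covered by the sketch; a real tool would need an extra check for the #include directive, and would still have to separate keywords like int from macro names.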