C style: macros or a preprocessor?
I've written a library to match strings against a set of patterns and I can now easily embed lexical scanners into C programs.
I know there are many well-established tools for creating lexical scanners (lex and re2c, to name just the first two that come to mind), but this question is not about lexers; it's about the best approach to "extend" C syntax. The lexer example is just a concrete case of a general problem.
I can see two possible solutions:
- write a preprocessor that converts a source file with the embedded lexer to a plain C file and, possibly, to a set of other files to be used in the compilation.
- write a set of C macros to represent lexers in a more readable form.
I've already done both, but the question is: "Which one would you consider better practice according to the following criteria?"
- Readability. The lexer logic should be clear and easy to understand
- Maintainability. Finding and fixing a bug should not be a nightmare!
- Interference with the build process. The preprocessor requires an additional step in the build process, it has to be on the path, etc.
In other words, if you had to maintain or write a piece of software that uses one of the two approaches, which one would disappoint you less?
As an example, here is a lexer for the following problem:
- Sum all numbers (can be in decimal form including exponential like 1.3E-4.2)
- Skip strings (double and single quoted)
- Skip lists (similar to LISP lists: (3 4 (0 1)() 3) )
- Stop on encountering the word end (case is irrelevant) or at the end of the buffer
In the two styles.
/**** SCANNER STYLE 1 (preprocessor) ****/
#include "pmx.h"
t = buffer;
while (*t) {
switch pmx(t) { /* the preprocessor will handle this */
case "&q" : /* skip strings */
break;
case "&f<?=eE>&F" : /* sum numbers */
sum += atof(pmx(Start,0));
break;
case "&b()": /* skip lists */
break;
case "&iend" : /* stop processing */
t = "";
break;
case "<.>": /* skip a char and proceed */
break;
}
}
/**** SCANNER STYLE 2 (macros) ****/
#include "pmx.h"
/* There can be up to 128 tokens per scanner with id x80 to xFF */
#define TOK_STRING x81
#define TOK_NUMBER x82
#define TOK_LIST x83
#define TOK_END x84
#define TOK_CHAR x85
pmxScanner( /* pmxScanner() is a pretty complex macro */
buffer
,
pmxTokSet("&q" , TOK_STRING)
pmxTokSet("&f<?=eE>&F" , TOK_NUMBER)
pmxTokSet("&b()" , TOK_LIST)
pmxTokSet("&iend" , TOK_END)
pmxTokSet("<.>" , TOK_CHAR)
,
pmxTokCase(TOK_STRING) : /* skip strings */
continue;
pmxTokCase(TOK_NUMBER) : /* sum numbers */
sum += atof(pmxTokStart(0));
continue;
pmxTokCase(TOK_LIST): /* skip lists */
continue;
pmxTokCase(TOK_END) : /* stop processing */
break;
pmxTokCase(TOK_CHAR) : /* skip a char and proceed */
continue;
);
Should anyone be interested in the current implementation, the code is here: http://sites.google.com/site/clibutl .
A preprocessor will offer a more robust and generic solution. Macros, on the other hand, are quick to whip up, provide a good proof of concept, and are easy when the sample keyword/token space is small. Scaling up or including new features may become tedious with macros after a point. I'd say whip up macros to get started and then convert them to your preprocessor commands.
Also, try to be able to use a generic preprocessor rather than writing your own, if possible.
Yes. But so would any solution you write :) -- and you have to maintain it. Most of the programs you've named have a Windows port available (e.g. see m4 for Windows). The advantage of using such a solution is that you save a lot of time. Of course, the downside is that you may have to get up to speed with the source code if and when an odd bug turns up (though the folks maintaining these tools are very helpful and will certainly make sure you get every bit of help).
And again, yes, I'd prefer a packaged solution to rolling my own.
A custom preprocessor is the typical approach in parser/interpreter generators, as the possibilities of macros are very limited, and they create problems at the expansion stage that make debugging a tremendous effort.
I suggest you use a time-tested tool such as the classic Yacc/Lex Unix programs, or, if you want to "extend" C, use C++ and Boost::spirit, a parser generator that uses templates extensively.